Some applications of machine learning and artificial intelligence are recognizably impressive – predicting future hospital readmission of discharged patients, for example, or diagnosing retinopathy. Others – self-driving cars, for example – seem almost magical. The matching problem, though, is one where your first reaction might be “What’s so hard about that?” For example, to take the application of finding duplicates, if a customer by the name of Elliot Sanderson places an order at a web site, the task of matching him up to the Elliot Sanderson already on the customer list would seem easy.
A bit of reflection quickly reveals the difficulty involved – suppose it is a common name, like Robert Smith? Or suppose the name matches but the email doesn’t? Or suppose the first name is missing and there’s only an initial?
Matching two potentially identical individuals is known as “entity resolution.” One company, Senzing, is built around software specifically for entity resolution. Other matching problems seek compatibility between two different people or entities. Both are best done using machine learning rather than simple rule-based logic. The best-known compatibility matching problem? Online dating! Entity resolution is used in:
Marketing (merging duplicate customers into the same record)
Law enforcement (is person “X” the same as the known criminal “Y”?)
Financial compliance and transportation security (is person “X” on a watch list?)
In this blog, we’ll look at two applications of person-matching, one commercial and one academic:
Matching call center agents to incoming callers (compatibility)
Matching census records of slaveholders in 1860 to post-Civil War records (entity resolution)
Compatibility Matching – Agents and Callers
Online dating applications are the best-known application of compatibility matching, but they are difficult to study and assess because of the lack of a good baseline (even if you could reliably measure the outcome of a dating algorithm, what would you compare it to?).
Matching call center agents to incoming callers can be more precisely assessed, and is the key offering of the Afiniti company. Tech support centers and customer service centers handle large volumes of calls from customers, and some customer-agent pairings are more “simpatico,” and hence more efficient, than others. A traditional solution has been to measure the outcome of calls (duration, customer satisfaction score, solution or no solution) and route incoming calls to agents in order of overall agent performance on these metrics. This brute-force solution fails to take into account the fact that an agent might do well with some customers and not with others. Moreover, in times of agent shortage (which is most of the time, as anyone calling customer service for just about anything can attest), there is little choice about routing, and forcing customers to wait longer for the best agents (while leaving less capable agents under-employed) is not an optimal solution.
Afiniti’s solution is to pair agents and callers in such a way as to optimize the outcome of calls overall. The input data consist of attributes of agents (measured by assessing agents on style and other characteristics of their interactions), attributes of callers (drawn from internal company data if they are previous callers, and also from external demographic data drawn from caller ID), and scoring of call outcomes. During a training period of several weeks, a model is developed that is tweaked in such a way as to increasingly improve outcomes. The model is proprietary, but Afiniti describes its general form in the following way.
Suppose you have a set of 5 incoming callers and 5 agents who will shortly be available. For each possible pairing, the model (which was fit to earlier data) uses the caller and agent data to predict the outcome (say, a customer satisfaction score). The model then chooses the pairings that, together, maximize the overall customer satisfaction. The model is data-intensive (Afiniti says it processes over 2 petabytes of data per day), and the algorithms must operate with extreme efficiency, since the system must make its allocation decisions quickly enough not to materially delay the response to the customer.
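Conceptually, this is an instance of the classic assignment problem. The sketch below illustrates the idea with made-up scores and a brute-force search; Afiniti's actual model and optimization method are proprietary, so none of this should be read as their implementation.

```python
from itertools import permutations

# Hypothetical predicted outcomes: scores[i][j] is the model's predicted
# customer-satisfaction score if caller i is routed to agent j.
# (All numbers are made up for illustration.)
scores = [
    [0.9, 0.3, 0.4],
    [0.2, 0.8, 0.5],
    [0.6, 0.4, 0.7],
]

def best_pairing(scores):
    """Brute-force the caller-agent assignment that maximizes total score."""
    n = len(scores)
    best_perm, best_total = None, float("-inf")
    for perm in permutations(range(n)):  # perm[i] = agent assigned to caller i
        total = sum(scores[i][perm[i]] for i in range(n))
        if total > best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total

pairing, total = best_pairing(scores)
print(pairing, round(total, 2))  # [0, 1, 2] 2.4
```

Brute force works for a handful of callers but grows factorially; at real scale one would use a polynomial-time method such as the Hungarian algorithm (available as scipy.optimize.linear_sum_assignment).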
Afiniti then alternates short periods of “model operating” and “model not operating” (i.e. the client company uses its old agent assignment system) to provide its client with an ongoing metric of the model’s benefit. Afiniti claims benefits in the low single-digit percentage range (2-4%), which seem small but can amount to large savings over time.
Tracking Slaveholder Wealth via Census Data
The compatibility matching problem above was a form of supervised learning, in which supervision was provided by the call outcomes, which were known. The outcomes were scores to be maximized. In a more typical form of supervision, a model attempts to estimate a known outcome that is hidden from the model, and the goal is to estimate it correctly. A recurring challenge is obtaining data where the outcome is known – i.e., data with “ground truth.” We will look at that issue in the context of entity resolution, and a study of slavery in the U.S.
As the United States inexorably headed down the road to civil war in the mid-1800s, one propelling factor in the conflict between the slave-holding south and the free north was the immense wealth that slaves represented. Even those in the south who were unenthusiastic about slavery faced the conundrum that ending the injustice would kick the legs out from under the southern economy, and impoverish the politically powerful. Thomas Jefferson, a Virginian whose intellectual “brand” was individual rights, recognized the contradiction between his political philosophy and his personal practice. He signed legislation to outlaw the slave trade and put forth proposals for gradual emancipation, yet he himself freed only a handful of his more than 600 slaves and left his estate so debt-ridden that his slaves had to be sold off on his death, often splitting up families. Roughly 50% of the wealth in the U.S. South in 1860 was in the form of slaves, so the abolition of slavery in 1865 represented a sudden and tremendous loss of wealth almost without parallel.
Recently, historians have been exploring the longer-term effects of this wealth loss and, surprisingly, have found that it was minimal. Research by Ager, Boustan and Eriksson concluded that, within a single generation, the sons of slaveholders had virtually restored the economic position of their families. Ager et al. speculate that this restoration arose from the extensive and powerful social network available to these families.
The Interesting Statistical Question
A methodological question of particular statistical interest is the problem of matching up the slaveholders in 1860 to their sons in 1880. This question has two components:
Obtaining data for which “ground truth” is known, so the matching process, whether manual or automated, can be assessed
Determining if an automated way of matching names will work better than human matching
Obtaining “ground truth” for supervised learning is an unglamorous part of machine learning, but a crucial one. For an AI-based medical diagnosis system (say, one reading x-rays), have you ever wondered where the benchmark “true diagnosis” comes from? For the slaveholder research, how did the investigators know whether the match of an 1860 father to an 1880 son was correct?
The answer is that they didn’t. Instead, the matching algorithm was tested on data where they did know, or at least came much closer to knowing.
One set of calibration data was a comparison of the 1900 Census to Union Army Civil War records. The Census and Army records were “carefully (and expensively) hand collected using trained research assistants who had access to extra information not typically available for linking.” 
Another set of data was the 1900 and 1910 Census data, which the Brigham Young University Record Linking Lab had analyzed using a crowd-sourced method. High quality supplemental information provided by users of the website FamilySearch.org (while filling out and researching their family tree) allowed the Lab to establish strong matches – these data are considered the “gold standard” for census benchmarking.
Abramitzky et al. tested several automated systems incorporating varying elements of statistical learning. One was a rule-based system tuned by machine learning. Each row in the dataset is a potential name match that meets some minimum threshold of feasibility. Features for a potential match include the absolute difference in ages and measures of similarity for first name, last name and place of birth. One popular similarity measure is the “Jaro-Winkler” distance, which is based on the number of matching characters and transpositions between two strings, with extra weight given to agreement in the initial letters. It is scaled to 0-1, with 0 indicating a perfect match. Each record is labeled by human review as a 0 (no match) or 1 (match), and a model is fit to estimate the relationship between the features and the probability of a match.
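To make the similarity measure concrete, here is a minimal pure-Python sketch of the Jaro-Winkler calculation. It computes the similarity form (1 = identical strings); the distance form used above is simply one minus this value. The researchers’ actual implementation and parameter choices may differ.

```python
def jaro(s, t):
    """Jaro similarity: based on matching characters and transpositions."""
    if s == t:
        return 1.0
    ls, lt = len(s), len(t)
    if ls == 0 or lt == 0:
        return 0.0
    window = max(ls, lt) // 2 - 1  # how far apart matching chars may be
    s_flags, t_flags = [False] * ls, [False] * lt
    matches = 0
    for i, c in enumerate(s):
        for j in range(max(0, i - window), min(lt, i + window + 1)):
            if not t_flags[j] and t[j] == c:
                s_flags[i] = t_flags[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count transpositions: matched chars appearing in a different order.
    t_matched = [t[j] for j in range(lt) if t_flags[j]]
    k = transpositions = 0
    for i in range(ls):
        if s_flags[i]:
            if s[i] != t_matched[k]:
                transpositions += 1
            k += 1
    transpositions //= 2
    m = matches
    return (m / ls + m / lt + (m - transpositions) / m) / 3

def jaro_winkler(s, t, p=0.1):
    """Boost the Jaro score for strings sharing a prefix (up to 4 chars)."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s, t):
        if a == b and prefix < 4:
            prefix += 1
        else:
            break
    return j + prefix * p * (1 - j)

print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

Production record-linkage code would typically call a tested library implementation (e.g. the jellyfish package) rather than hand-rolling this.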
The decision rule specifies a minimum probability of a match (i.e. the records must be similar), and a minimum “lead” over the second-place contender (i.e. there cannot be uncertainty about which potential match is the likely one). These two parameters are tuned by cross-validation within the training data, to maximize the overall matching performance.
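A minimal sketch of such a two-threshold decision rule follows; the parameter values here are hypothetical stand-ins for the ones the researchers tune by cross-validation.

```python
def resolve(candidates, p_min=0.8, min_lead=0.2):
    """Declare a match only if the top candidate clears both thresholds.

    candidates: list of (record_id, match_probability) pairs for one
    person being linked. p_min and min_lead are hypothetical values.
    """
    if not candidates:
        return None
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    top_id, top_p = ranked[0]
    runner_up_p = ranked[1][1] if len(ranked) > 1 else 0.0
    if top_p >= p_min and top_p - runner_up_p >= min_lead:
        return top_id
    return None  # abstain: no confident match

print(resolve([("A", 0.95), ("B", 0.40)]))  # A
print(resolve([("A", 0.95), ("B", 0.90)]))  # None (lead too small)
```

Abstaining when two candidates are nearly tied is what keeps the false positive rate low, at the cost of leaving some records unlinked.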
Abramitzky et al. report that the automated methods they tested all performed as well as, or better than, human reviewers, with a false positive rate of less than 5%.
Bottom line: The use of well-performing statistical matching algorithms, coupled with the increasing availability of digitized Census and other historical family data, is enabling ground-breaking economic and social research that can track family fortune and status over time.
Reference: Abramitzky, Ran, Leah Platt Boustan, Katherine Eriksson, James J. Feigenbaum, and Santiago Pérez, “Automated Linking of Historical Data,” Working Paper 25825, National Bureau of Economic Research.