Unforeseen Consequences in Data Science

After the massive Exxon Valdez oil spill, states passed laws increasing tanker companies’ liability for future spills. The result was not what lawmakers intended: fly-by-night operators, which had little to lose if a spill bankrupted them, took over the trade. In this blog post we look at some notable examples of the unforeseen consequences of analytics algorithms.

Recommender Algorithms

How is it that we get so much online content, service, and value free of charge? Through advertising and promotion, of course, both of which rely on effective recommendation algorithms. Ads that show individual consumers goods or services they want or need are much more powerful, and hence more lucrative, than ads that show the same thing to everyone. The importance of the algorithms that serve up useful recommendations (and predict which ads you will click on) was illustrated just recently when, owing to a slip-up, Twitter lost some of its ability to target ads. The result was a 20% drop in its share price, erasing $5 billion in market value.

These same predictive recommendation algorithms fuel much of the value that we get out of the internet: they anticipate what we are looking for, show us products and services that we are likely to want, and connect us with people we are likely to like.
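To make “anticipate what we are looking for” concrete, here is a minimal sketch of one classic approach, item-based collaborative filtering. It is not any particular company’s system. The users, items, and ratings are hypothetical, and scoring an unseen item as the sum of (cosine similarity × rating) over the items a user has already rated is one simple choice among many.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "ana":   {"hiking boots": 5, "tent": 4, "camp stove": 4},
    "ben":   {"hiking boots": 4, "tent": 5},
    "chloe": {"tent": 4, "camp stove": 5, "novel": 2},
    "dev":   {"novel": 5, "reading lamp": 4},
}

def item_vectors(ratings):
    """Invert user->item ratings into item->{user: rating}."""
    items = {}
    for user, prefs in ratings.items():
        for item, r in prefs.items():
            items.setdefault(item, {})[user] = r
    return items

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(user, ratings, top_n=3):
    """Rank items the user hasn't rated by similarity to items they have rated."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, vec in items.items():
        if candidate in seen:
            continue
        # Sum of similarity x rating: items resembling several liked items score higher.
        score = sum(cosine(vec, items[s]) * r for s, r in seen.items())
        if score > 0:
            scores[candidate] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

print(recommend("ben", ratings))
```

Ben has rated only boots and a tent, so the camp stove (liked by people who liked those) ranks first. Nothing in this logic looks at *what* the items are, which is exactly why the same machinery serves camping gear and extremist videos equally well.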

These algorithms also have the capacity for harm, either in the normal course of their operation or by being repurposed. Some examples:

TikTok, the popular video-sharing app, has powerful recommender algorithms that bring videos to the attention of like-minded users. The terrorist group ISIS exploited these capabilities to spread videos of corpses being dragged through streets in an effort to reach new recruits (TikTok did eventually remove the postings, after complaints).

The messaging app WhatsApp focuses on direct messaging and forwarding, and it collects and shares user data that helps target advertising. In India in 2017, fake stories about child abductions proliferated rapidly on WhatsApp and led to a number of lynchings of people thought responsible.

Over the last couple of years, attention on Facebook and Twitter has centered on fake accounts that malevolent actors such as Russia used to manipulate the American body politic. The social media giants’ contribution to the coarsening and tribalization of politics, however, lies more in the nature of the “connection” algorithms they use. On the one hand, these algorithms connect us to family and friends who might otherwise be hard to keep up with. On the other, they also connect like-minded extremists and sociopaths to each other – people who, in the pre-online world, remained isolated.
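The platforms’ actual connection algorithms are proprietary, but a minimal sketch of one well-known heuristic, “common neighbors” (suggest people with whom you share many connections), shows why the same logic that reunites old friends also links members of a fringe cluster. The graph and names below are hypothetical.

```python
from collections import Counter

# Hypothetical toy social graph: person -> set of direct connections
graph = {
    "alice": {"bob", "carol"},
    "bob":   {"alice", "dan", "erin"},
    "carol": {"alice", "dan"},
    "dan":   {"bob", "carol", "frank"},
    "erin":  {"bob"},
    "frank": {"dan"},
}

def suggest_connections(person, graph, top_n=3):
    """Rank non-connections of `person` by how many connections they share."""
    friends = graph[person]
    counts = Counter()
    for friend in friends:
        for candidate in graph[friend]:
            # Skip the person themself and anyone they're already connected to.
            if candidate != person and candidate not in friends:
                counts[candidate] += 1
    return counts.most_common(top_n)

print(suggest_connections("alice", graph))
```

The heuristic only counts shared ties; it has no notion of *why* people are connected, so a tight-knit group of extremists gets the same nudge toward each other as a family reunion.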

Predictive Model Bias

Healthcare data scientists have been developing and using algorithms to predict which patients are more likely to require additional care, so that healthcare providers can monitor them more closely and help manage their care. The goal was improved health outcomes. For Optum, which used such an algorithm as part of its services to doctors and hospitals, the outcome turned out to be bias against black patients. The reason? Researchers found that the algorithm was trained partly on prior consumption of healthcare services, on the presumption that people who spent more on healthcare were sicker and needed more attention. And, indeed, prior healthcare spending was predictive of future healthcare needs. However, it also reflected wealth: wealthier patients, who tended to be white, were more inclined to seek care, other things being equal. The unintended result was that white patients were prioritized over black patients who were sicker but simply hadn’t been able to spend as much on their own healthcare. This illustrates the benefits of a good understanding of the data, the variables, and how a model works, all part of the landscape of good statistical practice (but less of a tradition in the machine learning community).
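The mechanism can be reproduced with a small simulation. This is a hypothetical sketch, not Optum’s model or data: two groups have the same distribution of true health need, but group B converts need into spending at a lower rate (the `access_gap` parameter is an assumption standing in for unequal access to care). Ranking patients by spending, the proxy, then flags far fewer group B patients than ranking by true need would.

```python
import random

def simulate(n=10_000, access_gap=0.6, seed=0):
    """Hypothetical patients: same true-need distribution in both groups,
    but group B spends less per unit of need (less access to care)."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        group = rng.choice("AB")
        need = rng.gauss(50, 15)                    # true (unobserved) health need
        access = 1.0 if group == "A" else access_gap
        spending = need * access + rng.gauss(0, 5)  # observed proxy used as the label
        patients.append((group, need, spending))
    return patients

def flag_top(patients, key, frac=0.1):
    """Flag the top `frac` of patients by `key` for extra care."""
    k = int(len(patients) * frac)
    return sorted(patients, key=key, reverse=True)[:k]

def share_b(flagged):
    """Fraction of flagged patients who belong to group B."""
    return sum(1 for group, _, _ in flagged if group == "B") / len(flagged)

patients = simulate()
by_spending = flag_top(patients, key=lambda p: p[2])
by_need = flag_top(patients, key=lambda p: p[1])

print(f"Group B share when ranking by spending:  {share_b(by_spending):.2f}")
print(f"Group B share when ranking by true need: {share_b(by_need):.2f}")
```

Spending genuinely does predict need *within* each group, so the proxy looks reasonable in aggregate; the bias only shows up when you ask how the label was generated and compare it against the quantity you actually care about.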

I reported last month on Amazon’s effort to build a machine learning model to triage resumes for hiring decisions. In addition to saving money, the company hoped to institutionalize a more consistent and objective hiring process. Instead, the machine learning algorithm internalized existing prejudices against women and effectively learned that one of the best predictors of whether a new candidate would be a good fit was their gender. Amazon eventually abandoned the effort, concluding it was doomed to failure.
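A model does not need an explicit gender field to learn gender. The hypothetical sketch below (not Amazon’s system, whose details the story above does not describe) computes smoothed per-token log-odds of “hired” versus “rejected”, as a naive Bayes text classifier would, from made-up historical decisions. Because past decisions were biased, a token that merely correlates with gender ends up with the most negative weight.

```python
import math
from collections import Counter

# Hypothetical historical decisions: (resume text, 1 = hired / 0 = rejected).
# The labels record past human choices, so any bias in them becomes "ground truth".
history = [
    ("python sql chess club captain", 1),
    ("java sql rugby team", 1),
    ("python statistics chess club", 1),
    ("java python sql", 1),
    ("python sql womens chess club captain", 0),
    ("java statistics womens rugby team", 0),
    ("python sql statistics womens coding society", 0),
    ("java sql", 0),
]

def token_log_odds(history, alpha=1.0):
    """Laplace-smoothed log-odds that a resume containing each token was hired."""
    hired = [set(text.split()) for text, y in history if y == 1]
    rejected = [set(text.split()) for text, y in history if y == 0]
    df_hired = Counter(t for doc in hired for t in doc)      # document frequencies
    df_rejected = Counter(t for doc in rejected for t in doc)
    scores = {}
    for t in set(df_hired) | set(df_rejected):
        p_h = (df_hired[t] + alpha) / (len(hired) + 2 * alpha)
        p_r = (df_rejected[t] + alpha) / (len(rejected) + 2 * alpha)
        scores[t] = math.log(p_h / p_r)
    return scores

scores = token_log_odds(history)
for token, s in sorted(scores.items(), key=lambda kv: kv[1])[:3]:
    print(f"{token:10s} {s:+.2f}")
```

Deleting an explicit gender column would not help here: the signal leaks in through proxies, and the model faithfully reproduces the historical labels it was asked to predict.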

Synthesis of Fake Content

In the old days, visual artists made fun of politicians with cartoons, which nobody mistook for real images. Sophisticated image and voice algorithms, built on deep learning neural networks, now enable internet trolls to create “deep fake” video hoaxes and other malicious content, generally centered on celebrities, with fake images so realistic that they are difficult to distinguish from genuine photos and videos. Fabricated pornographic videos of a target are a favorite tactic.

A related arena is the generation of fake but realistic text from an initial prompt. This is still in the realm of “parlor tricks,” but it is not hard to imagine its future. The recent scandal involving hundreds of thousands of fake comments on proposed government rules is an example. These efforts to overwhelm the public comment process for rule-making with misleading or irrelevant content were only partly automated. A more sophisticated text generation process could make such efforts both more powerful (by lowering the probability of detection) and easier to launch at volume.
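The basic mechanic of prompt-driven generation, repeatedly predicting a plausible next word given what came before, can be shown with a deliberately crude stand-in: a bigram Markov chain. This is a different and far weaker technique than the neural networks behind systems like GPT-2, and the tiny corpus is hypothetical; the point is only to show how a prompt is extended one sampled word at a time.

```python
import random
from collections import defaultdict

def build_bigrams(corpus):
    """Map each word to the list of words that follow it in the corpus."""
    words = corpus.split()
    followers = defaultdict(list)
    for w1, w2 in zip(words, words[1:]):
        followers[w1].append(w2)
    return followers

def generate(followers, prompt, max_words=12, seed=None):
    """Extend `prompt` by sampling, at each step, a word that followed the previous word."""
    rng = random.Random(seed)
    out = prompt.split()
    for _ in range(max_words):
        choices = followers.get(out[-1])
        if not choices:
            break  # dead end: the last word never had a successor in the corpus
        out.append(rng.choice(choices))
    return " ".join(out)

corpus = ("the proposed rule will harm small business . "
          "the proposed rule will protect consumers . "
          "small business owners oppose the rule .")
followers = build_bigrams(corpus)
print(generate(followers, "the proposed", seed=42))
```

A bigram model only looks one word back, so its output quickly drifts into nonsense; modern neural models condition on much longer context, which is what makes their output harder to detect.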

The developers of AI algorithms that synthesize context-appropriate text from a prompt did think about the possible implications of their work, and released only a limited version of their program, GPT-2, to the public (read this story for more). They worried that the fully capable version could inundate public online discussion spaces with useless synthetic content. Of course, releasing a limited version only whetted the online programming community’s appetite for more, and laid down a challenge for others to match what the developers had done, which at least one group has claimed to do (see this article).

The potential for harm from synthetic content is obvious. What benefits did academic researchers have in mind when they pursued this type of research in the first place? Some literature in the computer science field seems simply to presume that overcoming the technical challenge of, say, fabricating Nicolas Cage’s face in a video is justification enough in and of itself.

One oft-cited early work points to the benefits for the Hollywood special effects industry, saying

“close-ups in dubbed movies are often disturbing due to the lack of lip sync.”

Does innovation with such potential for harm have no better justification than improved lip sync?

China

Nowhere is AI placed more squarely in the service of Big Brother than in China, and nearly all the technologies employed there originated in more innocent uses.

Facial recognition methods, used by Facebook to tag photos and by phone makers to secure phones, are widely used in China (and elsewhere) for mass surveillance. In some regions, police patrol booths and cameras are omnipresent, and individuals are required to provide detailed, high-resolution facial imagery to the police.

Entity resolution techniques were first used long ago in customer databases to identify and merge potentially duplicate customer records. As their capabilities expanded, they came into wider use in law enforcement, linking records across different data sources to identify fugitive criminals and potential terrorists. More recently they have been used to track down otherwise law-abiding undocumented immigrants in the U.S. In China, they ease the otherwise crushing administrative burden of mass surveillance by automating the monitoring and activity tracking of suspicious individuals.
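Here is a minimal sketch of the original, benign use case: deduplicating customer records. The records, the 0.8 similarity threshold, and the matching rule (exact match on a normalized email, or a fuzzy name match via Python’s `difflib.SequenceMatcher`) are all illustrative assumptions. Production systems typically add blocking to avoid comparing every pair and use probabilistic matching models, but the core idea of pairwise matching followed by grouping is the same.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical customer records with typical inconsistencies.
records = [
    {"id": 1, "name": "Jonathan Smith",  "email": "j.smith@example.com"},
    {"id": 2, "name": "Jon Smith",       "email": "J.Smith@Example.com "},
    {"id": 3, "name": "Maria Garcia",    "email": "mgarcia@example.org"},
    {"id": 4, "name": "Smith, Jonathan", "email": "jsmith99@example.net"},
]

def normalize_name(name):
    """Lowercase, turn 'Last, First' into 'First Last', and collapse whitespace."""
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    return " ".join(name.lower().split())

def is_match(a, b, threshold=0.8):
    """Pairwise rule: same normalized email, or sufficiently similar normalized names."""
    if a["email"].strip().lower() == b["email"].strip().lower():
        return True
    ratio = SequenceMatcher(None, normalize_name(a["name"]), normalize_name(b["name"])).ratio()
    return ratio >= threshold

def resolve(records):
    """Group matching records into entities using union-find (matches are transitive)."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if is_match(a, b):
            parent[find(a["id"])] = find(b["id"])
    groups = {}
    for r in records:
        groups.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(groups.values())

print(resolve(records))
```

Records 2 and 4 share neither an email nor a close name match, yet they end up in the same entity through record 1. That transitivity is what makes entity resolution powerful for stitching together scattered data sources, whether the goal is a cleaner mailing list or tracking a person.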

Privately calculated financial credit scores, applied to individuals by machine learning algorithms, have been a mainstay of the financial system for decades. The government of China is in the process of replacing traditional financial credit scores with more comprehensive “social credit scores,” which can take into account non-financial behavioral variables such as propensity for online gaming, personal shopping habits, and religious affiliation. The goal seems to be automated tracking and assessment of “social suitability.”

For more details, and a global picture, check out this Carnegie Endowment report.

Data science researchers owe it to society to think through the implications of their work. It’s difficult to escape the conclusion, though, that “technology will out,” and, if something can be done, it will be done. It’s like water running downhill – it may dam up temporarily when it encounters an obstacle, but it will eventually find a way around.