Ethical Dilemmas in Data Science

Know those ads that follow you around the web, courtesy of tracking cookies?  Many see them as an invasion of privacy, and EU rules made them subject to user consent.  Google recently announced that Chrome will eventually stop supporting these cookies.  A win for the consumer?  Perhaps, but there is another side to the coin.  Google’s Chrome is the dominant browser, so Google has its own special window into user behavior and attributes.  Cutting out third-party cookies will strengthen Google’s dominant position in online advertising. Should there be regulation to protect the other advertisers?  That would promote more competition, but wouldn’t it diminish user privacy protections? Apple’s decision in late 2019 to restrict apps’ ability to track you while you are not using the app was a win for consumer privacy, but also a blow to the app maker Tile, whose product consumers use to track lost items.

The ethical practice of data science is a hot topic now (see this list of readings on the subject), with knotty issues such as these.  Can more be done to promote ethical practice, besides lots of hand-wringing?  

Government Regulation

The EU has one solution – the 2016 General Data Protection Regulation (GDPR).  The European Commission has 32,000 employees, and a good number of them must have been involved in producing this voluminous, complex and bureaucratic regulation.  The Table of Contents alone runs to 8 pages, and the document itself features 88 pages of fine print.     

While comprehensive in wording, the regulation is of questionable usefulness in addressing ethical issues in data science.

  • The regulation mainly addresses the use of individuals’ data by big corporations.  Privacy, and the desire to avoid targeted corporate advertising, are no doubt concerns of many, but they are only a small part of the broader concerns of data ethicists.
  • The regulation is so costly to comply with, and so punitive of infractions (fines of up to €20 million, or 4% of global revenue, whichever is greater), that it will either discourage small startups or lead them to ignore it.
  • The regulation addresses in minute detail the structure of a fast-growing industry that is only 10 years old.  We at Statistics.com learned how this turns out when, in 2002, we had to navigate state education certification rules that were written before the advent of online education. 

Moreover, the regulation applies only to Europe.  For big corporations, it will have spillover effects in other markets.  However, it will have no impact on some of the most egregious government “big brother” activities, such as those in China. 

Corporate and Individual Action

The big social media and internet companies of Silicon Valley have developed their own responses to public pressure.  For example, Google now says it will disallow microtargeting of political ads.  Twitter is banning political ads altogether.  Facebook garnered much criticism for its refusal to disallow “lies” in political ads, though one can easily see how a private company would want to avoid being cast as the arbiter of “truth” in politics.

And then there is the issue of action at the level of the individual data scientist.  There is increasing awareness – in conferences, books, academic programs – of issues of ethics in the practice of data science and the need for individual data scientists to consider the consequences of their work.

The Case for Hand-Wringing

At first, this may seem like so much hand-wringing, with no action.  But, as frustrating as it seems, hand-wringing may be the optimal course at the moment (or, perhaps better, the only feasible course). Many issues cut two ways:

  • The spread of DNA testing and matching algorithms brings the promise of medical advances at the individual level; it also provides a basis for catching violent criminals via DNA evidence.  On the other hand, it opens the prospect of health and life insurance companies denying coverage, or rating individuals differentially, based on the genetic hand they were dealt.
  • Models based on geolocation data can suggest where police should focus preventive policing on an hour-by-hour basis but, when applied at the individual level, can result in prejudicial profiling of people who have done nothing wrong.
  • Over the past half-century, predictive models that generate credit scores have opened up credit to a wide variety of people who would not have qualified in earlier times; yet China has now used the same modeling principles to characterize a person’s worth to society, and to treat families differentially based on the results.
  • Recommendation algorithms connect you with products that might appeal to you and with social groups that may be of interest; still, the same algorithms help put previously isolated extremist voices in touch with one another on social media.

Hasty regulation can discard the good with the bad and leave the sector burdened with counterproductive and ill-advised rules.  In responding to specific, visible ills of the Depression in the 1930s, well-intentioned government officials created rules on everything from airline routes to apricot harvests – rules that achieved little overall benefit but created entitled classes that ensured the rules would hamstring the economy for decades to come.  

So hand-wringing may be better than jumping into regulatory regimes like the GDPR, and an open discussion of the issues can help guide individual ethical choices and company policies.  Here are some guideposts for the hand-wringing:

Categories to Consider

  1. Behavioral modification and manipulation. A variety of AI and machine learning techniques serve to modify and manipulate individual behavior.  Many people are most aware of this when they see ads follow them around – for example, that small sea kayak they viewed at the REI website keeps popping up when they look at the news.  Businesses use predictive modeling, incorporating uplift models, to microtarget advertising for the greatest effect at the least cost (a minimal sketch of uplift modeling appears after this list). A/B testing is used at scale to determine the most effective message for different segments of people, or even for individuals. Politicians use the same techniques to target voters as individuals or small groups.  Social media collaborative filtering algorithms sustain useful professional and social collaboration, and recommender systems help you find products of likely interest. For nearly every positive and constructive application of these algorithms, it is also possible to point to destructive and dangerous ones.
  2. Fakery – surface and deep.  When people got their news from established media organizations, outright fabrication faced a high bar in reaching the public. Twitter, Facebook, and their sharing algorithms eliminated this bar, helping to channel and focus the spread of fabricated information.  The desire to make mischief is latent in many, perhaps most, people, and full-blown in the 3% of males and 1% of females estimated to be sociopaths. The internet presents them with the tools to cause trouble – the extremely alluring and dangerous tools of image, video, and voice synthesis. The technology now exists to create videos and voice recordings that mimic real people so closely as to be indistinguishable (such as this fake of Mark Zuckerberg). Imagine the havoc that could result from fabricated instructions from political leaders, or the financial harm that could result from gaining access to banking relationships that rely on trusted voice communications.  Google recently announced that the same types of minds that created this Frankenstein technology will now be devoted to detecting it.
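To make “uplift” concrete, here is a minimal sketch (not from the original post) of the common two-model approach to uplift estimation, written in Python with scikit-learn on synthetic A/B-test data. The customer attributes, effect sizes, and the budget of 1,000 targeted customers are all illustrative assumptions.

```python
# A minimal two-model uplift sketch on synthetic data.
# All variable names and effect sizes are illustrative assumptions,
# not taken from any real campaign.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Two customer attributes (e.g., recency and spend, standardized).
X = rng.normal(size=(n, 2))

# Random assignment, as in an A/B test: half see the ad, half do not.
treated = rng.integers(0, 2, size=n)

# Simulated purchase behavior: a baseline tendency plus an ad effect
# that is larger for customers with a high first attribute ("persuadables").
base = -1.0 + 0.8 * X[:, 0]
effect = treated * (0.3 + 0.6 * (X[:, 0] > 0))
p = 1 / (1 + np.exp(-(base + effect)))
bought = rng.binomial(1, p)

# Fit one response model on the treated group, one on the control group.
m_t = LogisticRegression().fit(X[treated == 1], bought[treated == 1])
m_c = LogisticRegression().fit(X[treated == 0], bought[treated == 0])

# Uplift = predicted purchase probability if shown the ad
#          minus predicted probability if not shown the ad.
uplift = m_t.predict_proba(X)[:, 1] - m_c.predict_proba(X)[:, 1]

# Microtargeting: spend the ad budget on the customers with highest uplift.
top = np.argsort(uplift)[::-1][:1000]
print(f"mean estimated uplift, targeted group: {uplift[top].mean():.3f}")
print(f"mean estimated uplift, everyone:       {uplift.mean():.3f}")
```

The ethically salient design choice is that the model singles out the “persuadables” – the people whose behavior the ad is predicted to change – which is precisely the behavioral-modification capability at issue.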

Users of Algorithms

One way to think about, or categorize, ethical issues in data science is to look at who is using these methods.  

  1. Business.  This issue seems to get more attention than anything else (it is the main target of the GDPR), but I believe it is among the least problematic.  The public receives tremendous value in services in return for allowing use of their data, and it seems clear that (a) these services are very important to people, and (b) people are accustomed to the advertising-based model and are not likely to pay for these services.  Moreover, they do have some options in configuring what they share and don’t share. One study found that only 15% of users blocked cookies or JavaScript.  
  2. Government.  The “surveillance state” is looming. There are over half a million surveillance cameras in London; they feature in most British crime dramas. British citizens may not feel the presence of “big brother” at a practical level, but in autocratic states it is a different matter. Collection of comprehensive and detailed individual data enables authorities in some non-democratic societies to exercise arbitrary and destructive control over individuals and communities in ways that Stalin, Hitler and Mao could only dream of.  
  3. Personal data, used by thieves and con artists. Not a week goes by without an alarming story about a data breach, or an interesting anecdote about an internet scam.  Theft of credit card data is indeed annoying to the affected individuals, but it does not rise to the level of cataclysmic or life-changing, given the AI-enabled protections that limit loss (a sketch of the kind of anomaly screening involved follows this list).  Theft of identity is another matter – it can ruin a person’s well-being and completely disrupt their life. It seems reasonable to believe, though, that identity theft can be combated by better algorithms (even as clever but malevolent actors devise algorithms that facilitate better identity theft).
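As an illustration of what such “AI-enabled protections” can look like, here is a minimal sketch of unsupervised anomaly screening using scikit-learn’s IsolationForest on simulated transaction data. The transaction features and their distributions are assumptions made for illustration, not a description of any real bank’s system.

```python
# A minimal sketch of algorithmic fraud screening on simulated data.
# Feature choices and distributions are illustrative assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)

# Simulated card transactions: (amount in dollars, hour of day,
# distance in km from the cardholder's home).
normal = np.column_stack([
    rng.gamma(2.0, 30.0, 5000),        # typical amounts
    rng.normal(14, 4, 5000) % 24,      # daytime-centered hours
    rng.exponential(5.0, 5000),        # mostly local purchases
])
fraud = np.column_stack([
    rng.gamma(5.0, 200.0, 50),         # unusually large amounts
    rng.uniform(0, 24, 50),            # any hour of the day
    rng.exponential(2000.0, 50),       # far from home
])
X = np.vstack([normal, fraud])

# Isolation forests flag points that are easy to isolate, i.e. outliers,
# without needing labeled examples of fraud.
clf = IsolationForest(contamination=0.01, random_state=0).fit(X)
flags = clf.predict(X)  # -1 = flagged as anomalous, 1 = looks normal

print(f"flagged {np.sum(flags == -1)} of {len(X)} transactions for review")
```

Real systems are far more elaborate, but the design principle is the same: flag the transactions that look least like a cardholder’s normal behavior and route them for review, limiting losses without needing labeled examples of every new scam.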

“Ethics really isn’t about agreeing to a set of principles.  It’s about changing the way you act.”

Mike Loukides, Hilary Mason and DJ Patil:  Ethics and Data Science (O’Reilly)