Machine Learning and Human Bias

Does better AI offer the hope of prejudice-free decision-making?  Ironically, the reverse might be true, especially with the advent of deep learning.  

Bias in hiring is one area where private companies move with great care, since there are thickets of laws and regulations in most countries governing bias in employment.  The total cost of recruiting, interviewing, reviewing candidates, and hiring is substantial (up to $50,000 for software engineers, by one estimate), so it is no wonder that some companies have turned to machine learning in an effort to automate decisions and take human discretion out of the picture.

In 2015, Google introduced its Google Hire app to simplify the hiring process. A year earlier, Amazon had started working on its own internal software to sort through resumes and make hiring recommendations. Both have since been abandoned.

Google’s decision to discontinue Google Hire was cast as a purely commercial one – part of an ongoing process of testing new applications and focusing on those that showed the most promise. Amazon, on the other hand, reportedly abandoned its effort to build an effective AI tool for recruiting and hiring because every algorithm it developed showed persistent bias against women.

At first you might think that statistical and machine learning algorithms, being objective and non-human, would not show human bias. But the algorithms were trained on data recording which candidates had previously been deemed “successful.” Since women are a small minority of tech employees in Silicon Valley (less than 25%), a clever algorithm will learn that “male” is an excellent predictor of success.
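A minimal synthetic sketch makes the mechanism concrete. Everything here is invented for illustration (the feature names is_male and skill, the numbers, the label-generating rule); it is not Amazon’s system. The point is simply that when the historical “success” labels favor men, a model fit to them puts a large weight on gender.

```python
# Toy illustration (synthetic data, hypothetical features): biased historical
# labels teach the model that being male predicts "success."
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

is_male = rng.binomial(1, 0.75, n)   # ~75% of the historical candidate pool is male
skill = rng.normal(0, 1, n)          # genuine ability, independent of gender

# Past hiring decisions favored men over and above skill.
p_hired = 1 / (1 + np.exp(-(1.5 * is_male + 0.5 * skill - 1.0)))
hired = rng.binomial(1, p_hired)

model = LogisticRegression().fit(np.column_stack([is_male, skill]), hired)
print(dict(zip(["is_male", "skill"], model.coef_[0].round(2))))
# The coefficient on is_male dominates: the model has learned the past bias.
```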

Amazon tried to rein in its algorithms by disabling rules it found to be discriminatory, such as rules that downgraded resumes containing the word “women’s” (e.g., indicating membership in a women’s organization) or that worked against graduates of women’s colleges. However, the algorithms seemed to find ways around these restrictions. And since they had no innate sense of appropriateness, they would use effective predictors like “male” without compunction.
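Why does disabling the obvious rules not fix this? A hedged continuation of the sketch above: gender itself is withheld from the model, but a correlated proxy remains (here a hypothetical womens_keyword flag standing in for phrases such as membership in a women’s organization), and the weight simply shifts onto the proxy.

```python
# Toy illustration: remove the explicit gender feature and the bias re-enters
# through a correlated proxy (a hypothetical flag for "women's"-related phrases).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 10_000
is_male = rng.binomial(1, 0.75, n)
skill = rng.normal(0, 1, n)

# Proxy feature: present on most women's resumes, absent from men's.
womens_keyword = (1 - is_male) * rng.binomial(1, 0.8, n)

# Same biased historical labels as before.
p_hired = 1 / (1 + np.exp(-(1.5 * is_male + 0.5 * skill - 1.0)))
hired = rng.binomial(1, p_hired)

# Gender is excluded from the predictors; only the proxy and skill remain.
model = LogisticRegression().fit(np.column_stack([womens_keyword, skill]), hired)
print(dict(zip(["womens_keyword", "skill"], model.coef_[0].round(2))))
# womens_keyword picks up a large negative coefficient: the same bias, relabeled.
```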

A preference for male company at work seems to be part of Silicon Valley culture; it is reflected in prior hiring decisions and in judgments about which employees are counted as successful. Any good machine learning algorithm will learn this fact and use it. This is especially true of deep learning algorithms: their flexible, complex neural networks, coupled with effectively unlimited training time, will ferret out the factors that actually drive the “successful” labels, even if the humans who made those labeling decisions are not explicitly aware of them. And their black box nature helps keep the basis for the decisions hidden.
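The sketch below (again entirely synthetic, with twenty hypothetical resume features) illustrates that capacity: no single feature names gender, and each one is only weakly correlated with it, yet a small neural network trained on biased labels reassembles the gender signal and scores men systematically higher.

```python
# Toy illustration: a flexible model reconstructs gender from many weak proxies.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
n, k = 10_000, 20
is_male = rng.binomial(1, 0.75, n)

# Twenty individually innocuous features (imagine word counts, hobbies,
# schools), each shifted only slightly for male candidates.
X = rng.normal(0, 1, (n, k)) + 0.3 * is_male[:, None]

# Biased historical labels: past "success" favored men.
p_hired = 1 / (1 + np.exp(-(1.5 * is_male - 1.0)))
hired = rng.binomial(1, p_hired)

net = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
net.fit(X, hired)

scores = net.predict_proba(X)[:, 1]
print("mean predicted score, men:  ", round(float(scores[is_male == 1].mean()), 3))
print("mean predicted score, women:", round(float(scores[is_male == 0].mean()), 3))
# The gap persists even though gender was never given to the model directly.
```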

Classical statistical algorithms like logistic regression and discriminant analysis are less prone to this problem. They force the analyst to define a structural model in advance, one that simply omits forbidden predictors, and they are not free to explore endless combinations of predictors in automated fashion. But they are not the favored tools for dealing with the unstructured text in resumes and, in any case, do not fully solve the discrimination problem if they are trained on data that are labeled in discriminatory fashion.
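As an illustration of that contrast (with invented variable names and synthetic data), a classical logistic regression specified through a formula requires the analyst to list every predictor explicitly, so a forbidden variable can simply be left out. Note that this controls only which predictors enter the model; it does nothing about labels that already encode past discrimination.

```python
# Toy illustration: in a pre-specified structural model, the analyst decides
# exactly which predictors appear, and "is_male" is simply never included.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 5_000
df = pd.DataFrame({
    "years_experience": rng.normal(5, 2, n),
    "degree_level": rng.integers(0, 3, n),   # 0 = none, 1 = bachelor's, 2 = graduate
    "is_male": rng.binomial(1, 0.75, n),     # present in the data, excluded by the analyst
})
p = 1 / (1 + np.exp(-(0.3 * df["years_experience"] + 0.4 * df["degree_level"] - 2.5)))
df["hired"] = rng.binomial(1, p)

# The formula names the predictors up front; is_male never enters the model.
model = smf.logit("hired ~ years_experience + degree_level", data=df).fit(disp=0)
print(model.params.round(2))
```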