Avoiding Algorithmic Bias

Three Scenarios

The three scenarios below are excerpted from “How Algorithms Rule Our Working Lives” by Cathy O’Neil (author of Weapons of Math Destruction).

Scenario 1: Financial Market Crash of 2008

After the financial crash [of 2008], it became clear that the housing crisis and the collapse of major financial institutions had been aided and abetted by mathematicians wielding magic formulas. If we had been clear-headed, we would have taken a step back at this point to figure out how we could prevent a similar catastrophe in the future. But instead, in the wake of the crisis, new mathematical techniques were hotter than ever, and expanding into still more domains. They churned 24/7 through petabytes of information, much of it scraped from social media or e-commerce websites. And increasingly they focused not on the movements of global financial markets but on human beings, on us. Mathematicians and statisticians were studying our desires, movements, and spending patterns. They were predicting our trustworthiness and calculating our potential as students, workers, lovers, criminals.

This was the big data economy, and it promised spectacular gains. A computer program could speed through thousands of résumés or loan applications in a second or two and sort them into neat lists, with the most promising candidates on top. This not only saved time but also was marketed as fair and objective. After all, it didn’t involve prejudiced humans digging through reams of paper, just machines processing cold numbers.

Few of the algorithms and scoring systems have been vetted with scientific rigor, and there are good reasons to suspect they wouldn’t pass such tests. For instance, automated teacher assessments can vary widely from year to year, putting their accuracy in question. Tim Clifford, a New York City middle school English teacher of 26 years, got a 6 out of 100 in one year and a 96 the next, without changing his teaching style. Of course, if the scores did not matter, that would be one thing, but sometimes the consequences are dire, leading to teachers being fired.

There are also reasons to worry about scoring criminal defendants rather than relying on a judge’s discretion. Consider the data pouring into the algorithms. In part, it comes from police interactions with the populace, which is known to be uneven, often race-based. The other kind of input, usually a questionnaire, is also troublesome. Some of them even ask defendants if their families have a history of being in trouble with the law, which would be unconstitutional if asked in open court but gets embedded in the defendant’s score and labelled “objective”.

[The popularity of these predictive algorithms] relies on the notion that they are objective, but the algorithms that power the data economy are based on choices made by fallible human beings. And, while some of those choices were made with good intentions, the algorithms encode human prejudice, misunderstanding, and bias into automatic systems that increasingly manage our lives.

Scenario 2: Automation in Hiring

Finding work used to be largely a question of whom you knew. Companies like Kronos brought science into corporate human resources in part to make the process fairer. The hiring business is becoming automated, and many of the new programs include personality tests. Such tests are now used on 60 to 70% of prospective workers in the US.

Defenders of the tests note that they feature lots of questions and that no single answer can disqualify an applicant. Certain patterns of answers, however, can and do disqualify them. And we do not know what those patterns are. We’re not told what the tests are looking for. The process is entirely opaque. What’s worse, after the model is calibrated by technical experts, it receives precious little feedback.

Sports provide a good contrast here. Most professional basketball teams employ data geeks, who run models that analyse players by a series of metrics, including foot speed, vertical leap, free-throw percentage, and a host of other variables. Teams rely on these models when deciding whether or not to recruit players. But if, say, the Los Angeles Lakers decide to pass on a player because his stats suggest that he won’t succeed, and then that player subsequently becomes a star, the Lakers can return to their model to see what they got wrong. Whatever the case, they can work to improve their model.
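
The feedback loop described here can be made concrete with a short sketch. The snippet below is a minimal illustration only: the metrics, the synthetic data, and the scikit-learn logistic-regression model are assumptions made for the example, not anything an actual team is known to use.

```python
# Minimal sketch of the feedback loop described above: score a prospect,
# record the prediction, and audit it against the real outcome so the model
# can be corrected. The metrics, the synthetic data, and the use of
# scikit-learn are assumptions for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(seed=0)

# Hypothetical historical scouting data: foot speed, vertical leap,
# free-throw percentage (standardised), plus whether the player succeeded.
X_history = rng.normal(size=(200, 3))
y_history = (X_history.sum(axis=1) + rng.normal(size=200) > 0).astype(int)

model = LogisticRegression().fit(X_history, y_history)

# Score a new prospect and keep the prediction on file.
prospect = rng.normal(size=(1, 3))
predicted_star = int(model.predict(prospect)[0])

# Years later the real outcome is known. If the model said "pass" and the
# player became a star, the miss is fed back in and the model is refit.
actual_star = 1
if predicted_star != actual_star:
    X_history = np.vstack([X_history, prospect])
    y_history = np.append(y_history, actual_star)
    model = LogisticRegression().fit(X_history, y_history)
```

The last step is the point of the contrast: hiring models rarely see outcome data like this, so, as the previous scenario notes, they receive precious little feedback and their mistakes are never surfaced.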

Scenario 3: Churn

Naturally, many hiring models attempt to calculate the likelihood that a job candidate will stick around. Evolv, Inc., helped Xerox scout out prospects for its call centres, which employ more than 40,000 people. The churn model took into account some of the metrics you might expect, including the average time people stuck around on previous jobs. But they also found some intriguing correlations. People the system classified as “creative types” tended to stay longer at the job, while those who scored high on “inquisitiveness” were more likely to set their questioning minds towards other opportunities.

But the most problematic correlation had to do with geography. Job applicants who lived farther from the job were more likely to churn. This makes sense: long commutes are a pain. But Xerox managers noticed another correlation. Many of the people suffering those long commutes were coming from poor neighbourhoods. So Xerox, to its credit, removed that highly correlated indicator of churn from its model. The company sacrificed a bit of efficiency for fairness.
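
The step Xerox took can be pictured as a small change to the model’s feature set. The sketch below is an assumption-laden illustration: the column names, the toy data, and the pandas/scikit-learn pipeline are invented for the example and do not describe Evolv’s or Xerox’s actual system.

```python
# Minimal sketch of the fairness step described above: check how strongly
# commute distance tracks churn, then exclude it from the model because it
# also acts as a proxy for living in a poor neighbourhood. Column names and
# toy data are illustrative assumptions, not the real pipeline.
import pandas as pd
from sklearn.linear_model import LogisticRegression

applicants = pd.DataFrame({
    "avg_tenure_prev_jobs": [14, 3, 22, 8, 30, 5],   # months at earlier jobs
    "creativity_score":     [0.7, 0.2, 0.9, 0.4, 0.8, 0.3],
    "inquisitiveness":      [0.3, 0.8, 0.2, 0.9, 0.1, 0.7],
    "commute_km":           [5, 40, 8, 35, 6, 45],    # the problematic proxy
    "churned":              [0, 1, 0, 1, 0, 1],
})

# The proxy is a strong predictor of churn, which is exactly why it is
# tempting to keep it in the model.
print(applicants["commute_km"].corr(applicants["churned"]))

# Trade a little predictive power for fairness: drop the proxy before fitting.
features = applicants.drop(columns=["commute_km", "churned"])
model = LogisticRegression().fit(features, applicants["churned"])
```

Dropping the column costs some predictive accuracy, which is the trade the excerpt describes: Xerox sacrificed a bit of efficiency for fairness.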