FAQ: What is artificial intelligence, and why do many researchers believe it is prone to bias?
Correction: The estimate that on YouTube “80% of what users watch is dictated by recommendation algorithms” is not Karen Hao’s own. The figure comes from YouTube itself, which puts it at around 70%, according to Quartz.
Artificial intelligence is a large part of our daily lives, often in ways that we don’t realize. AI helps Google Maps shave five minutes off your commute, enables Netflix to recommend a new TV show, and assists Bumble with finding your next date.
But, as AI becomes more embedded in our day-to-day lives, its meaning remains nebulous and ever-changing. “I don’t feel like the academic community has a good definition,” says Calvin Lai, assistant professor of Psychology and Brain Sciences at Washington University in St. Louis. “The definition is constantly evolving,” says Karen Hao, artificial intelligence reporter at MIT Technology Review.
To understand what AI is, how it can become biased and how to combat the effects of that bias, I spoke to AI reporters, researchers and experts for this primer.
Q: What is artificial intelligence?
A: Artificial intelligence is a technology system that can mimic human intelligence to solve problems. AI can “learn, reason and act,” says Hao. AI “learns” by taking in information and instructions about a problem. AI “reasons” by finding patterns using these instructions and information. AI “acts” by making predictions and conclusions based on these patterns.
Q: What is natural language processing?
A: It is a subset of AI that interprets written and spoken language. It is most commonly used in voice assistants like Apple’s Siri and Google Home.
Q: What is machine learning?
A: It is another subset of AI that uses statistics to find patterns and draw conclusions from them. Machine learning is often used to determine a person’s credit score, which is based on data points like income fluctuations, mortgage payments and spending patterns, explains Tom Taulli, author of Artificial Intelligence Basics.
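For readers who want a concrete picture, here is a minimal sketch of that idea in Python, using scikit-learn’s LogisticRegression. The feature names, the numbers and the choice of model are illustrative assumptions, not a description of how any real credit-scoring system works.

```python
# A minimal sketch of "learning patterns from data," not a real credit model.
# Features and labels are invented for illustration.
from sklearn.linear_model import LogisticRegression

# Each row: [income_fluctuation, missed_mortgage_payments, monthly_spending]
past_applicants = [
    [0.1, 0, 2500],
    [0.4, 2, 900],
    [0.2, 0, 1800],
    [0.6, 3, 700],
]
repaid = [1, 0, 1, 0]  # 1 = repaid a past loan, 0 = defaulted

model = LogisticRegression()
model.fit(past_applicants, repaid)         # "learn" patterns from past data
new_applicant = [[0.3, 1, 1200]]
print(model.predict_proba(new_applicant))  # "act": estimate repayment odds
```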
Q: What is deep learning?
A: It is a type of machine learning. In deep learning, data is processed in a neural network. A neural network is made up of layers of interconnected nodes, and each connection carries a numerical weight, called a “parameter,” that the system adjusts as it learns to solve the problem.
Deep learning can be used in image-recognition technology. Taulli references a scene from the HBO show Silicon Valley to explain deep learning. A computer scientist on the show wants to create a food-recognition system. He scans a picture of a hot dog as one data point and labels it: “This is a picture of a hot dog.” He then scans a picture of french fries as a second data point and labels it: “This is not a picture of a hot dog.”
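To make that example slightly more concrete, here is a hedged sketch of a tiny “hot dog / not hot dog” network in Python with Keras. The images are random stand-ins and the layer sizes are arbitrary assumptions; the point is only to show how labeled examples and adjustable parameters fit together, not how the show’s system or any production image recognizer is built.

```python
# A toy "hot dog / not hot dog" neural network; the data is random filler.
import numpy as np
from tensorflow import keras

# Pretend each "image" is a 64x64 grayscale picture flattened into 4,096 numbers.
images = np.random.rand(100, 64 * 64)
labels = np.random.randint(0, 2, size=100)   # 1 = hot dog, 0 = not hot dog

model = keras.Sequential([
    keras.layers.Input(shape=(64 * 64,)),
    keras.layers.Dense(32, activation="relu"),    # a hidden layer of nodes
    keras.layers.Dense(1, activation="sigmoid"),  # outputs a hot-dog probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Training adjusts the network's parameters (weights) to better fit the labels.
model.fit(images, labels, epochs=3, verbose=0)
print(model.predict(images[:1]))  # probability the first "image" is a hot dog
```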
Q: What is bias in AI?
A: Bias occurs when your data set is not representative of the intended population, says Cassie Kozyrkov, chief decision scientist at Google, in her blog. Because the data set is skewed, conclusions drawn from it will not reliably apply to the larger population.
Q: When does AI become biased?
A: AI can become biased at any step of the process: through the people inputting data, the data sets themselves, or the algorithms that learn from those data sets.
Q: How can bias become injected into AI systems?
A: Programmers are people, and people have biases. “AI systems are created by people with their own experiences, backgrounds and blind spots,” says Taulli. These people are typically white men making more than $90,000 a year, according to a 2018 Wired report. They are “not representative of the society at large,” Taulli emphasizes. So, when interacting with data sets, they may frame a question incorrectly or fail to spot a bias because of their blind spots.
According to IBM’s Bias in AI 2018 report, a data set can have “implicit racial, gender, or ideological biases.” For example, if affluent people participate in more activities that collect their data — like banking, online shopping, using smartphones — there will be more data on people with higher incomes. If an AI system is using this high-income data set to calculate a credit score, the system may primarily learn the spending behaviors and habits of wealthy people. So, the system may not accurately score a low-income person. “Your AI may actually learn to be predatory to low-income people,” says Hao.
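The credit-scoring risk Hao describes can be sketched with synthetic numbers. In the toy Python example below, repayment follows different invented rules for high- and low-income applicants, and the training set contains almost no low-income examples, so the resulting model is far less accurate for the group it rarely saw. Every value is made up; the sketch only illustrates the mechanism, not any real lender’s data.

```python
# A toy illustration of a skewed training set; all numbers are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# High-income applicants: spending around $3,000; in this made-up world,
# they tend to repay when spending is high.
spend_high = rng.normal(3000, 200, 500)
repaid_high = (spend_high > 2900).astype(int)

# Low-income applicants: spending around $800; here they tend to repay
# when spending is low, a different pattern from the high-income group.
spend_low = rng.normal(800, 200, 500)
repaid_low = (spend_low < 900).astype(int)

# Training set: 500 high-income examples but only 10 low-income examples.
X_train = np.concatenate([spend_high, spend_low[:10]]).reshape(-1, 1)
y_train = np.concatenate([repaid_high, repaid_low[:10]])
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The model scores well on the group it saw and poorly on the group it didn't.
print("high-income accuracy:", model.score(spend_high.reshape(-1, 1), repaid_high))
print("low-income accuracy:", model.score(spend_low[10:].reshape(-1, 1), repaid_low[10:]))
```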
Q: Are all biased AI systems dangerous?
A: No, manifestations of bias are not always harmful. AI can be used to recommend content to users based on what they have already watched. Hao estimates that on YouTube, “80% of what users watch is dictated by recommendation algorithms.”
Q: What are some negative effects of biased AI?
A: In an MIT Media Lab study, facial-recognition technology was 99 percent accurate when evaluating white men’s faces. But when evaluating the faces of darker-skinned women, the error rate rose to nearly 35 percent. In another study, a commonly used facial-recognition data set was “more than 75 percent male and more than 80 percent white,” according to a 2018 New York Times article.
Another case: African-American men are “disproportionately represented in mugshot databases,” according to Georgetown Law School’s research. The Georgetown researchers estimated that 117 million American adults are in law enforcement facial-recognition databases. So, if computer scientists use law enforcement data to build an AI system, the system risks learning to over-represent African-American men as criminals.
Q: How do we remove bias from AI?
A: We can remove bias retroactively, but it is complicated. “It continually surprised me that it is so difficult to change implicit bias,” says Hao. Humans must look back at data sets and determine at what step of the process bias was introduced, says Taulli.
Removing, blinding or adding data to a data set before performing analysis are important ways to remove bias.
Removing specific data from a data set can lower the likelihood of bias. For example, a company using an AI system to hire candidates may delete race or gender from its data sets because these points often carry implicit bias, says Taulli.
Or, blinding — covering parts of the data from the viewer — can lower the likelihood of biased conclusions. For example, a company might hide job candidates’ names and genders from the first round of culling by managers. That can reduce the impact of implicit biases towards a candidate’s gender or ethnic background.
Also, adding more diverse data can “soften and smooth” out your data set, says Taulli. In the same recruiting example, if the company’s applicant pool includes few non-white candidates, it could recruit more racially diverse candidates.
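Here is a brief pandas sketch of the “remove” and “blind” steps in that hiring example. The column names and records are hypothetical, and real de-biasing work involves far more than dropping or masking columns; this only shows the basic moves.

```python
# A sketch of "remove" and "blind" on a hypothetical candidate table.
import pandas as pd

candidates = pd.DataFrame({
    "name": ["A. Rivera", "B. Chen", "C. Okafor"],
    "gender": ["F", "M", "F"],
    "race": ["Hispanic", "Asian", "Black"],
    "years_experience": [4, 6, 3],
    "led_team": [True, True, False],
})

# Remove: drop fields that often carry implicit bias before any analysis.
for_scoring = candidates.drop(columns=["race", "gender"])

# Blind: replace names with neutral identifiers for the first round of culling.
blinded = for_scoring.assign(
    name=[f"Candidate {i + 1}" for i in range(len(for_scoring))]
)
print(blinded)
```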
Q: How can you frame your questions to avoid biased answers?
A: By asking the right questions of your data, you can reduce “subjectivity or wiggle room,” explains Lai.
For example, a start-up might want to hire candidates with leadership and collaboration skills, but its interview questions might include, “Could I see myself grabbing a beer with this candidate?” or “What hobbies does this candidate enjoy?” This is where bias may seep in, favoring candidates with similar socio-economic, ethnic and gender characteristics. As the start-up grows, the biased data about current employees accumulates, and the AI system that suggests whom the company should hire can reinforce this bias.
Lai suggests setting a specific set of questions from the beginning, like “How does this candidate work in a team?” and “How has this candidate displayed leadership skills in their past roles?”