Large language models (LLMs), including those that power chatbots such as ChatGPT, make racist judgements on the on basis of users’ dialect, a preprint study has found1.
Researchers found that some artificial intelligence (AI) systems are more likely to recommend the death penalty to a fictional defendant presenting a statement written in African American English (AAE) — a dialect spoken by millions of people in the United States that is associated with the descendants of enslaved African Americans — compared with one written in Standardized American English (SAE). The chatbots were also more likely to match AAE speakers with less-prestigious jobs.
Bias detectives: the researchers striving to make algorithms fair
“Focusing on the areas of employment and criminality, we find that the potential for harm is massive,” wrote study co-author Valentin Hofmann, an AI researcher at the Allen Institute for AI in Seattle, Washington, on X (formerly Twitter).
The findings show that such models harbour covert racism, even when they do not display overt racism, such as suggesting negative stereotypes for people of a given race. The team, whose study was posted to the arXiv preprint server and has yet to be peer reviewed, found that the conventional fix of retrospectively using human feedback to try and address bias in models had no effect on covert racism.
The paper highlights how superficial methods to remove bias from AI systems “simply paper over the rot”, says Margaret Mitchell, an AI researcher focusing on ethics at Hugging Face, a New York City-based company that aims to expand access to AI. Efforts to tackle racism after the model has been trained, rather than before, “make it harder to identify models that are going to disproportionately harm certain subpopulations when deployed”, she adds.
Hidden bias
LLMs make statistical associations between words and phrases in large swathes of text, often scraped from the Internet. The overt biases that they derive from these data, such as linking Muslims with violence, have been well studied. But the existence of covert racism has been less explored.
Hofmann and his colleagues tested versions of five LLMs, including GPT, developed by AI-research organization OpenAI, based in San Francisco, California, and RoBERTa, developed by Meta, based in Menlo Park, California. They presented the models with around 4,000 X posts written in either AAE or SAE.
Around 2,000 data points were made up of an SAE post paired with an AAE post of identical meaning — for example, “I be so happy when I wake up from a bad dream cus they be feelin too real” in AAE, and “I am so happy when I wake up from a bad dream because they feel too real” in SAE. A further 2,000 texts carried different meanings, which the authors added to capture real-world potential differences in content written in different dialects.
AI can be sexist and racist — it’s time to make it fair
First, the authors presented the AIs with texts in both dialects and asked them to describe what the person who said it “tends to be” like. They found that the top associated adjectives for AAE texts were all negative — including ‘dirty’, ‘lazy’ and ‘aggressive’. Comparing the results with a long-term study of associations made by humans, the team found that the models’ covert stereotypes were more negative “than any human stereotypes about African Americans ever experimentally recorded”, and closest to the ones from before the US civil rights movement.
The team also looked at whether covert racism would affect the decisions that the model made. They found that, when asked to match speakers with jobs, all of the models were more likely to associate AAE speakers with jobs that do not require a university degree, such as a cook, soldier or guard. Looking at potential consequences in a legal setting, the models were next asked to acquit or convict a defendant on the basis of an unrelated text spoken by the defendant. The authors found a much higher conviction rate when the defendant spoke AAE, at roughly 69%, compared with 62% for the SAE defendants.
The model was also more likely to sentence to death hypothetical defendants that were guilty of first-degree murder if their statement was written in AAE — at 28%, compared with 23% for SAE.
‘Fundamental limitation’
“This is an important, novel paper,” says Nikhil Garg, a computer scientist at Cornell Tech in New York City. Covert biases could influence a model’s recommendations in sensitive applications, such as prioritizing job candidates, adds James Zou, a researcher in applied machine learning at Stanford University in California.
Moreover, the study “speaks to a seemingly fundamental limitation” of LLM developers’ common approach to dealing with racist models — using human feedback to fine-tune them after the model is already trained, says Garg.
The researchers found that, in similar experiments in which the model is directly told whether someone is Black or white, overt stereotypes were less pronounced in the models that incorporated human feedback, compared with models that didn’t. But this intervention had no clear effect on covert racism on the basis of dialect.
“Even though human feedback seems to be able to effectively steer the model away from overt stereotypes, the fact that the base model was trained on Internet data that includes highly racist text means that models will continue to exhibit such patterns,” says Garg.