While bias in generative AI is a well-known phenomenon, it’s still surprising what kinds of biases sometimes get unearthed. TechCrunch recently ran a test using Meta’s AI chatbot, which launched in April 2024 in over a dozen countries including India, and found an odd and disturbing trend.
When generating images from the prompt “Indian men,” the vast majority of the results show the men wearing turbans. Many Indian men do wear turbans, mainly practicing Sikhs. But according to the 2011 census, Sikhs make up only about 3.4% of the population of Delhi, India’s capital, while three to four out of every five men in the AI-generated images wore one.
Unfortunately, this isn’t the first time generative AI has been caught up in a controversy related to race and other sensitive topics, and this is far from the worst example either.
How far does the rabbit hole go?
In August 2023, Google’s SGE and Bard AI (the latter now called Gemini) were caught with their pants down arguing the ‘benefits’ of genocide, slavery, fascism, and more. They also put Hitler, Stalin, and Mussolini on a list of “greatest” leaders, with Hitler also making a list of “most effective leaders.”
Later that year, in December 2023, there were multiple incidents involving AI. The most awful of them was Stanford researchers finding child sexual abuse material (CSAM) in LAION-5B, a popular image dataset that many AI image generators are trained on. The study found more than 3,000 known or suspected CSAM images in that dataset. Stability AI, the maker of Stable Diffusion, which uses that dataset, claims that it filters out harmful images. But how can that be verified? Those images could easily have influenced the results of more benign prompts such as ‘child’ or ‘children.’
There’s also the danger of AI being used for facial recognition, especially by law enforcement. Numerous studies have already documented clear racial and ethnic bias in who gets arrested at the highest rates, regardless of whether any wrongdoing has occurred. Combine that with the human biases AI absorbs from its training data, and you have technology that could lead to even more false and unjust arrests. It has reached the point that Microsoft doesn’t want its Azure AI being used by police forces.
It’s rather unsettling how quickly AI has taken over the tech landscape, and how many hurdles remain before it advances enough to finally be rid of these issues. But one could argue that these issues arose in the first place because AI is trained on practically any data it can access, without the content being properly filtered. If we’re to address AI’s massive bias, we need to start properly vetting its datasets, not only for copyrighted sources but also for actively harmful material that poisons the information well.