Wednesday, July 9, 2025

AI World - Biases in Algorithms

Introduction

In the world of AI, a challenge that is increasingly being discussed is the bias that creeps into algorithms. When we leave decisions to such algorithms, these biases can produce unintended consequences.

In the past, developing an algorithm meant writing a series of steps that a machine could execute again and again without getting tired or making a mistake. Today's algorithms, based on machine learning, do not follow a programmed sequence of instructions. Instead, they ingest data and work out for themselves the most logical sequence of steps, and they keep refining that sequence as they consume more and more data.
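The difference is easier to see in code. Below is a minimal sketch (the spam-filter task and the toy data are my own illustration, not drawn from any specific system): the first function follows steps a programmer wrote down, while the second model infers its own decision rule from labeled examples and will change whenever the data changes.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Classic approach: a programmer spells out the exact steps.
def rule_based_spam_check(message: str) -> bool:
    banned = {"free", "winner", "prize"}
    return any(word in message.lower() for word in banned)

# Machine-learning approach: the model infers its own decision logic
# from labeled examples and changes as the data changes.
messages = ["free prize inside", "winner winner", "lunch at noon?", "meeting moved to 3pm"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam (toy data)

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(messages, labels)

print(rule_based_spam_check("claim your free prize"))  # True, by an explicit rule
print(model.predict(["claim your free prize"])[0])     # 1, learned from the data
```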

 

Unintended consequences: Some examples

In March 2016, Microsoft released a Twitter chatbot named Tay, built by mining public conversations. The bot was designed to learn and adapt from its conversations on Twitter. But in less than 24 hours, Tay began to spew out offensive tweets. What began as a fun chatbot meant to engage in “casual and playful” conversation turned into a PR nightmare.

Facebook developed a feature called “Memories”, which shows users what happened on the same date in previous years. It could remind people of memorable vacations, friends’ weddings, or other joyful occasions. However, it could also resurface painful events, such as the anniversary of a family member’s death, or prompt someone to wish a deceased friend a happy birthday.

Google Translate uses AI and deep learning to provide automated translation for dozens of languages. But in November 2017, there were complaints that the algorithm was sexist. Languages like Turkish have a single, gender-neutral third-person pronoun, “o”, whereas English typically uses “he” or “she” depending on the gender. When translating from Turkish to English, the algorithm had to pick a gender for the neutral “o”, producing translations like “he is a doctor,” “she is a nurse,” “he is hard working” or “she is lazy.” The problem was identified not only for Turkish but also for many other languages that mark gender differently. While Google quickly fixed the problem, the incident was an embarrassment for the tech giant.
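A toy caricature of the mechanism (this is not how Google Translate actually works, just an illustration of the underlying statistics): if a system simply picks whichever pronoun it has most often seen with a given profession, it reproduces the bias of its corpus.

```python
# Toy caricature (not Google's actual method): pick whichever pronoun
# co-occurs more often with the profession in a hypothetical corpus.
corpus_counts = {
    ("he", "doctor"): 900, ("she", "doctor"): 100,
    ("he", "nurse"): 150,  ("she", "nurse"): 850,
}

def translate_o(profession: str) -> str:
    """Render the gender-neutral Turkish 'o' using corpus frequency alone."""
    he = corpus_counts[("he", profession)]
    she = corpus_counts[("she", profession)]
    pronoun = "he" if he >= she else "she"
    return f"{pronoun} is a {profession}"

print(translate_o("doctor"))  # "he is a doctor"  (bias inherited from the data)
print(translate_o("nurse"))   # "she is a nurse"
```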

Needing to deal with millions of resumes, Amazon developed an algorithm to screen job applicants. The algorithm looked at patterns in the resumes of previous successful hires and applied those characteristics to new applicants. Unfortunately, it thereby reinforced the existing male dominance of roles like software engineering: it taught itself that resumes containing phrases like “Society for Women Scientists” were less preferable because they contained the word “women.” In October 2018, the company scrapped the project.
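The mechanism is easy to reproduce in miniature. The sketch below is a deliberately tiny, hypothetical example, not Amazon's actual model: a classifier trained on biased historical hiring outcomes ends up assigning a negative weight to the word "women".

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical past hiring outcomes for a male-dominated role (toy data).
resumes = [
    "captain of chess club, java developer",         # hired
    "java developer, hackathon winner",              # hired
    "society for women scientists, java developer",  # rejected
    "women in tech mentor, java developer",          # rejected
]
hired = [1, 1, 0, 0]

vec = CountVectorizer()
X = vec.fit_transform(resumes)
clf = LogisticRegression().fit(X, hired)

# Inspect what the model learned: the token "women" receives a negative
# weight, i.e. the historical bias is now encoded in the screening model.
weights = dict(zip(vec.get_feature_names_out(), clf.coef_[0]))
print(sorted(weights.items(), key=lambda kv: kv[1])[:3])
```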

Princeton University researchers used off-the-shelf machine learning software to analyze the associations among 2.2 million words. They found that European-American names were associated with more pleasant words than African-American names. The words “woman” and “girl” were more likely to be associated with the arts, while science and maths were more likely to be connected with males. In learning these word associations from the training data, the machine learning algorithm picked up the existing racial and gender biases shown by humans.
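The kind of measurement behind this finding can be sketched with cosine similarity between word vectors. The tiny two-dimensional "embeddings" below are made up purely for illustration; real studies use pretrained vectors such as GloVe or word2vec.

```python
import numpy as np

# Made-up 2-D "embeddings" for illustration only; real studies use
# pretrained vectors such as GloVe or word2vec.
emb = {
    "woman": np.array([0.9, 0.1]), "man": np.array([0.1, 0.9]),
    "arts": np.array([0.8, 0.2]), "science": np.array([0.2, 0.8]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# An association test in miniature: which gendered word sits closer to which field?
for field in ("arts", "science"):
    print(field,
          "woman:", round(cosine(emb["woman"], emb[field]), 3),
          "man:", round(cosine(emb["man"], emb[field]), 3))
```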

Latanya Sweeney, a Harvard researcher and former chief technology officer at the Federal Trade Commission (FTC), found that online searches for African-American names were more likely to return ads from services offering arrest-record lookups than searches for white names. The same differential treatment occurred in the micro-targeting of higher-interest credit cards and other financial products when the computer inferred that the subjects were African-American, even though their backgrounds were the same as those of whites.

MIT researcher Joy Buolamwini found that facial recognition systems performed far worse on darker-skinned faces. Most facial recognition training data sets are estimated to be more than 75% male and more than 80% white. When the person in the photo was a white man, the software identified the person as male 99% of the time. According to Buolamwini’s research, the error rates for the three popularly used systems she tested were less than 1% overall, but rose to more than 20% in one product and 34% in the other two when identifying darker-skinned women as female.
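Buolamwini's methodological point is that a single overall accuracy figure can hide large subgroup gaps. Below is a minimal sketch of that kind of disaggregated evaluation, using an invented evaluation log rather than her actual data.

```python
from collections import defaultdict

# Invented evaluation log: (skin_tone, gender, prediction_correct).
results = [
    ("lighter", "male", True), ("lighter", "male", True),
    ("lighter", "female", True), ("lighter", "female", False),
    ("darker", "male", True), ("darker", "male", False),
    ("darker", "female", False), ("darker", "female", False),
]

totals, errors = defaultdict(int), defaultdict(int)
for skin, gender, correct in results:
    group = (skin, gender)
    totals[group] += 1
    errors[group] += 0 if correct else 1

# Report the error rate per subgroup rather than a single overall number.
for group in totals:
    print(group, f"error rate: {errors[group] / totals[group]:.0%}")
```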

The COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) algorithm is used by judges to decide whether defendants should be detained or released on bail pending trial. The algorithm was found to be biased against African-Americans. Compared to whites who were equally likely to re-offend, African-Americans were more likely to be subjected to longer periods of detention while awaiting trial.
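The disparity described here is often summarized as a gap in false positive rates between groups. Below is a minimal sketch of how such a check can be computed; the records are invented, not real COMPAS data.

```python
# Invented records: (group, predicted_high_risk, actually_reoffended).
records = [
    ("A", True, False), ("A", True, False), ("A", False, False), ("A", True, True),
    ("B", False, False), ("B", True, False), ("B", False, False), ("B", True, True),
]

def false_positive_rate(group: str) -> float:
    """Among people who did NOT re-offend, how often was this group flagged high risk?"""
    negatives = [r for r in records if r[0] == group and not r[2]]
    flagged = [r for r in negatives if r[1]]
    return len(flagged) / len(negatives)

for g in ("A", "B"):
    print(g, f"false positive rate: {false_positive_rate(g):.0%}")  # A: 67%, B: 33%
```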

The Predictability Paradox

We can create intelligent algorithms in highly controlled environments so that their behavior is highly predictable. However, such algorithms will struggle with problems they were not prepared for. Alternatively, we can use machine learning to create algorithms that are resilient but also somewhat unpredictable. This is the predictability-resilience paradox. While we would prefer fully explainable and interpretable algorithms, the balance between predictability and resilience seems to be tilting inevitably toward resilience.

One way to resolve the predictability-resilience paradox is to combine multiple approaches. In a self-driving car, for example, machine learning might run the show, but when the system is confused by a road sign, a set of hand-coded rules might kick in, as sketched below. Another solution being discussed is the development of explainable or interpretable systems.
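A minimal sketch of that hybrid idea, with invented names and thresholds rather than any real self-driving stack: trust the learned model only when it is confident, and otherwise fall back to a conservative hand-written rule.

```python
CONFIDENCE_THRESHOLD = 0.9  # assumed cut-off, purely illustrative

def rule_based_fallback(observation: str) -> str:
    """Conservative hand-written rule used when the learned model is unsure."""
    return "slow_down_and_alert_driver"

def decide(observation: str, model_predict) -> str:
    """Trust the learned model only when it is confident; otherwise use the rules."""
    label, confidence = model_predict(observation)
    return label if confidence >= CONFIDENCE_THRESHOLD else rule_based_fallback(observation)

# Stand-in for a trained model; a real system would use a neural network.
def toy_model(observation: str):
    return ("stop_sign", 0.97) if observation == "clear stop sign" else ("speed_limit_40", 0.55)

print(decide("clear stop sign", toy_model))   # model is confident -> "stop_sign"
print(decide("faded, bent sign", toy_model))  # model unsure -> fallback rule
```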

What Can We Learn From these AI Failures?

Part of the power of AI and deep learning is that training can indiscriminately absorb every nuance of language, even when we do not explicitly instruct it to. Unfortunately, it can also pick up patterns we would rather it did not, such as the gender bias inherent in our use of language. AI thus reinforces biases and stereotypes we hold without even realizing it.

AI itself is not fundamentally biased. The biases in these algorithms are the result of biased training data produced by humans. It is simplistic to suggest that we simply collect unbiased data: almost all human-generated data is biased in some way.

Companies need to incorporate anti-bias training alongside their AI and ML training, spot the potential for bias in what they are doing, and actively correct for it. Along with the usual software testing processes, AI needs an additional layer of social testing so that problems are caught before they reach consumers and trigger a massive backlash. Additionally, the data scientists and AI engineers developing the models must be formally trained on the risks of AI.
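One way to make such "social testing" concrete is to add fairness checks to the ordinary test suite. The sketch below (the subgroup names, the data, and the five-percentage-point threshold are all assumptions for illustration) is a pytest-style test that fails the build if accuracy diverges too much between subgroups.

```python
MAX_ALLOWED_GAP = 0.05  # assumed threshold: 5 percentage points between subgroups

def subgroup_accuracy(predictions):
    """predictions: list of (subgroup, correct) tuples -> {subgroup: accuracy}."""
    stats = {}
    for group, correct in predictions:
        hits, total = stats.get(group, (0, 0))
        stats[group] = (hits + int(correct), total + 1)
    return {g: hits / total for g, (hits, total) in stats.items()}

def test_no_large_subgroup_gap():
    # In a real pipeline these predictions would come from the current model build.
    predictions = [
        ("group_a", True), ("group_a", True), ("group_a", True), ("group_a", False),
        ("group_b", True), ("group_b", True), ("group_b", True), ("group_b", False),
    ]
    accuracy = subgroup_accuracy(predictions)
    assert max(accuracy.values()) - min(accuracy.values()) <= MAX_ALLOWED_GAP, accuracy
```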

Most importantly, business leaders need specialized AI training to understand both the possibilities and the risks. Executives need to understand data science and AI well enough to manage AI-centric products and services and to appreciate the potential shortcomings of AI.

Understanding these dangers is the responsibility of not just those leading AI initiatives, but all executives. A PR leader who understands social media dynamics and the vicious troll culture can avert the dangers of a self-learning AI Twitter bot. An executive familiar with HR and employment discrimination law can help flag potential dangers of resume screening bots. And a manager with operating experience across multiple countries might be able to spot the pitfalls in translating genderless pronouns.

No single manager has the contextual knowledge to spot everything that can go wrong. So companies need to train all their business leaders on AI’s potential and risks, so that every line of business can spot opportunities and flag concerns.

Concluding notes

Algorithms have the potential to change the world for the better. To leverage that potential, it is important that confidence in algorithms is not undermined. We lose confidence in algorithms much more quickly than we do in human forecasters when both make the same mistake: we are unforgiving of algorithms when they fail, yet we accept that humans are fallible. Take the case of autonomous cars. According to some estimates, driverless cars could save up to 1.5 million lives in the US alone and close to 50 million lives globally over the next 50 years. Yet, in a poll conducted in April 2018, 50% of respondents said they considered autonomous cars less safe than cars driven by humans.

When it comes to trust, two factors are very important: control and transparency. Research indicates that as long as we have some control, even minimal control, trust is significantly enhanced. So we should introduce simple features that give people a feeling of being in control. When automated elevators were introduced, people were scared to use them without an operator. What reassured riders was the introduction of a red button labeled “Stop”. The button did very little: if someone pressed it, they were directed to pick up a phone and speak to a remote operator. Yet the Stop button made users feel as if they were in control.

Transparency can also improve trust in an algorithm. When we know how the algorithm works, we feel more in control. But again, the reality is nuanced. When we meet new people, we expect a degree of transparency; we tend to distrust people who are too cagey and seem to be concealing their thoughts and intentions. At the same time, we get suspicious when people try too hard to be transparent, and start wondering what they are trying to prove. For human beings, there is a right level of transparency, and the same logic applies to algorithms: too much information about how they work can undermine user trust just as much as too little.

 

There are different dimensions of trust to keep in mind when we design an algorithm. In general, customers should be convinced that the algorithm knows what it is doing, has good motives, and upholds basic human values such as honesty and fairness. That is, users should know how the algorithm works, why it takes decisions in particular ways, and what tradeoffs it makes when producing a recommendation. However, not all of these aspects matter to everyone or in every situation; different aspects of trust become important in different contexts. Explanations about the algorithm therefore have to be carefully prepared and calibrated to address the specific trust issues at hand.
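For a simple scoring model, such calibrated explanations can be as basic as showing the few factors that moved a particular decision, rather than the full model internals. The sketch below uses invented feature names and weights.

```python
# Invented feature names and weights for a simple linear scoring model.
weights = {"income": 0.6, "existing_debt": -0.8, "years_at_job": 0.3}
applicant = {"income": 0.7, "existing_debt": 0.9, "years_at_job": 0.2}

# For a linear model, each feature's contribution is simply weight * value,
# which can be surfaced as a short, human-readable explanation of one decision.
contributions = {f: weights[f] * applicant[f] for f in weights}
score = sum(contributions.values())

# Show only the top factors, calibrated to what the user actually needs to know.
top = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:2]
print(f"score = {score:.2f}")
for feature, contribution in top:
    direction = "raised" if contribution > 0 else "lowered"
    print(f"{feature} {direction} the score by {abs(contribution):.2f}")
```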

While designing and deploying algorithms, companies must consider the diversity of their teams and training data, and the level of cultural sensitivity in their decision-making processes. Greater diversity can help avoid harmful discriminatory effects on protected groups, especially racial and ethnic minorities. The use of cross-functional teams to develop algorithms is emerging as a best practice. Rather than trying to automate everything, keeping humans involved tends to produce better algorithms. Last but not least, the formal and regular auditing of algorithms can also help detect and mitigate bias.

 

Hope you found this useful.
