Introduction
In the world of AI, a challenge being discussed with increasing urgency is the bias that creeps into algorithms. Because of these biases, when we leave decisions to algorithms, there can be unintended consequences.
In the past, developing an algorithm meant writing a series of steps that a machine could execute again and again without tiring or making a mistake. But today's algorithms, based on machine learning, do not follow a programmed sequence of instructions. Rather, they ingest data and work out for themselves the most logical sequence of steps, and they keep refining that sequence as they consume more and more data.
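To make the contrast concrete, here is a minimal sketch (not from any of the systems discussed in this article) of a hand-coded rule next to a model that learns its decision logic from data; it assumes scikit-learn is available and uses invented numbers purely for illustration.

```python
# Minimal sketch (invented data): a fixed, hand-written rule versus a model
# that infers its own decision logic from examples.
from sklearn.tree import DecisionTreeClassifier

# Traditional algorithm: an explicit, fixed sequence of steps.
def approve_loan_by_rule(income, debt):
    return income > 50_000 and debt < 10_000

# Learned algorithm: the model derives its own rules from labelled examples
# and can be refit as more data arrives.
X = [[60_000, 5_000], [30_000, 20_000], [80_000, 2_000], [25_000, 15_000]]
y = [1, 0, 1, 0]  # past decisions (toy labels)

model = DecisionTreeClassifier().fit(X, y)
print(model.predict([[55_000, 8_000]]))  # decision inferred from data, not from written rules
```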
Unintended consequences: Some examples
In March 2016,
Microsoft developed a Twitter chatbot named Tay, built by mining public
conversations. The bot would learn and adapt based on its conversations on
Twitter. But in less than 24 hours, Tay began spewing offensive tweets. What began as a fun chatbot designed to engage in “casual and playful” conversation turned into a PR nightmare.
Facebook developed a feature called
“Memories”, which shows users what happened on the same date in previous years. It could remind people of memorable vacations, friends’ weddings, or other joyful occasions. However, it could also resurface painful events, such as the anniversary of a family member’s death, or prompt someone to wish a deceased friend a happy birthday.
Google Translate uses AI and deep learning
to provide automated translation for dozens of languages. But in November 2017, there were complaints that the algorithm was sexist. In languages like Turkish, there is a single, genderless third-person pronoun “o,” whereas English typically uses “he” or “she” depending on the person’s gender. When translating Turkish into English, the algorithm had to assign a gender to the gender-neutral “o,” producing translations like “he is a doctor,” “she is a nurse,” “he is hard working” or “she is lazy.” The problem was identified not
only for Turkish but also for many other languages that mark gender
differently. While Google quickly fixed the problem, the incident was an
embarrassment to the tech giant.
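A toy sketch of the underlying mechanism may help; this is not Google's actual system, and the counts below are invented. If a translator always picks the single most probable English rendering under corpus statistics that skew by gender, the genderless “o” inevitably gets mapped to “he” or “she” along stereotypical lines.

```python
# Toy illustration (invented counts, not Google's model): choosing the most
# frequent phrase in a biased corpus reproduces the bias.
corpus_counts = {
    ("doctor", "he"): 900, ("doctor", "she"): 100,
    ("nurse", "he"): 80,   ("nurse", "she"): 920,
}

def translate_o(profession):
    """Render the Turkish 'o' by choosing the likelier English pronoun."""
    he = corpus_counts.get((profession, "he"), 0)
    she = corpus_counts.get((profession, "she"), 0)
    pronoun = "he" if he >= she else "she"
    return f"{pronoun} is a {profession}"

print(translate_o("doctor"))  # -> "he is a doctor"
print(translate_o("nurse"))   # -> "she is a nurse"

# One possible mitigation, in the spirit of the fix mentioned above:
# surface both renderings instead of silently choosing one.
def translate_o_both(profession):
    return [f"he is a {profession}", f"she is a {profession}"]
```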
Needing to deal with millions of resumes, Amazon developed an algorithm to screen potential applicants. The algorithm looked for patterns in the resumes of previously successful hires and applied those characteristics to new applicants. Unfortunately, it ended up reinforcing the gender imbalance already present in male-dominated roles like software engineering. The algorithm taught itself that resumes that
included phrases like “Society for Women Scientists” were less preferred
because they contained the word “women.” In October 2018, the company scrapped
the project.
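A hedged sketch of how a screener can teach itself such a rule: the resumes, labels, and model below are invented for illustration (this is not Amazon's system), but a linear text classifier trained on historically skewed outcomes will assign a negative weight to a word like “women.”

```python
# Illustrative sketch (toy data, not Amazon's system): a text classifier trained
# on skewed historical hiring outcomes learns a negative weight for "women".
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

resumes = [
    "software engineer java captain chess club",
    "software engineer python robotics club",
    "software engineer java society for women scientists",
    "software engineer python women in computing chapter",
]
hired = [1, 1, 0, 0]  # biased historical labels: past hires skewed male

vec = CountVectorizer()
X = vec.fit_transform(resumes)
clf = LogisticRegression().fit(X, hired)

weight = clf.coef_[0][vec.vocabulary_["women"]]
print(f"learned weight for 'women': {weight:.2f}")  # negative: the word is penalized
```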
Princeton University researchers used off-the-shelf machine
learning software to analyze the associations among 2.2 million words. They found that European names were associated with more pleasant words than African-American names. The words “woman” and “girl” were more likely to be associated with the arts, while science and maths were more likely to be connected with males. In learning these word associations from its training data, the algorithm picked up the racial and gender biases already present in human language.
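Associations of this kind are typically measured as differences in cosine similarity between word vectors. The sketch below uses made-up two-dimensional vectors purely for illustration; the actual study worked with high-dimensional pretrained embeddings.

```python
# Sketch of how word-embedding bias is measured (vectors invented for
# illustration; real studies use high-dimensional pretrained embeddings).
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 2-D "embeddings" encoding a stereotyped association.
vectors = {
    "woman":   np.array([0.9, 0.1]),
    "man":     np.array([0.1, 0.9]),
    "arts":    np.array([0.8, 0.2]),
    "science": np.array([0.2, 0.8]),
}

def lean(word):
    # Positive: the word sits closer to "arts"; negative: closer to "science".
    return cosine(vectors[word], vectors["arts"]) - cosine(vectors[word], vectors["science"])

print(f"woman: {lean('woman'):+.2f}")  # positive -> associated with arts
print(f"man:   {lean('man'):+.2f}")    # negative -> associated with science
```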
Latanya Sweeney, Harvard researcher and former chief technology
officer at the Federal Trade Commission (FTC), found that online searches for African-American names were more likely to return ads from services offering arrest-record lookups than searches for white names. The same differential treatment occurred in the micro-targeting of higher-interest credit cards and other financial products: when the computer inferred that the subjects were African-American, they were shown these offers despite having the same financial backgrounds as whites.
MIT researcher Joy Buolamwini found that facial recognition
software systems performed poorly on darker-skinned faces. Most facial recognition training data sets are estimated to be more than 75% male and more than 80% white. When the person in the photo was a white man, the software identified the person as male 99% of the time. According to Buolamwini’s research, error rates for three widely used systems were under 1% for lighter-skinned men, but rose to more than 20% for one product and over 34% for the other two when identifying darker-skinned women as female.
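The key methodological point is disaggregated evaluation: reporting error rates per subgroup rather than a single overall number. A minimal sketch on invented records (not the Gender Shades data):

```python
# Sketch of disaggregated evaluation (toy records): the overall error rate
# hides how unevenly errors fall across subgroups.
import pandas as pd

results = pd.DataFrame({
    "skin_tone":  ["lighter"] * 4 + ["darker"] * 4,
    "gender":     ["male", "male", "female", "female"] * 2,
    "true_label": ["male", "male", "female", "female"] * 2,
    "predicted":  ["male", "male", "female", "female",
                   "male", "male", "male",   "female"],
})
results["error"] = results["true_label"] != results["predicted"]

print("overall error rate:", results["error"].mean())
print(results.groupby(["skin_tone", "gender"])["error"].mean())
```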
The COMPAS (Correctional Offender Management Profiling for
Alternative Sanctions) algorithm is used by judges to decide whether defendants
should be detained or released on bail pending trial. The algorithm was found
to be biased against African-Americans. Compared to whites who were equally
likely to re-offend, African-Americans were more likely to be subjected to
longer periods of detention while awaiting trial.
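Disparities like this are often quantified by comparing how an algorithm's errors fall on each group, for example the rate at which people who did not re-offend were nonetheless rated high risk. A toy sketch with invented records (not the COMPAS data):

```python
# Sketch of a group-wise false positive rate check (invented records): among
# people who did NOT re-offend, how often was each group rated high risk?
import pandas as pd

df = pd.DataFrame({
    "group":      ["A", "A", "A", "A", "B", "B", "B", "B"],
    "high_risk":  [1,   1,   0,   1,   0,   0,   1,   0],
    "reoffended": [0,   1,   0,   0,   0,   1,   0,   0],
})

no_reoffense = df[df["reoffended"] == 0]
fpr = no_reoffense.groupby("group")["high_risk"].mean()
print(fpr)  # unequal rates mean the errors fall more heavily on one group
```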
The Predictability Paradox
We can create intelligent algorithms in tightly controlled environments to ensure their behavior is highly predictable, but such algorithms will stumble on problems they were not prepared for. Alternatively, we can use machine learning to create systems that are resilient but also somewhat unpredictable. This is the predictability-resilience paradox. While we would prefer fully explainable and interpretable algorithms, the balance between predictability and resilience seems to be tilting inevitably towards the latter.
One way to resolve the predictability-resilience paradox is to combine approaches. In a self-driving car, for example, machine learning might run the show, but if the system is unsure about a road sign, a set of explicit rules might kick in, as sketched below. Another solution being discussed is the development of explainable or interpretable systems.
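A minimal sketch of that hybrid idea, with a confidence threshold and rule table invented for illustration (not any vendor's actual system): the learned model acts when it is confident, and explicit, predictable rules take over when it is not.

```python
# Sketch of an ML-plus-rules hybrid (threshold and rules invented): the learned
# model acts when confident; a fixed rule table takes over otherwise.
CONFIDENCE_THRESHOLD = 0.9

# Explicit, predictable fallback rules for road signs.
FALLBACK_RULES = {
    "octagon_red": "stop",
    "triangle_red_border": "yield",
}

def classify_sign(model_label, model_confidence, sign_shape):
    if model_confidence >= CONFIDENCE_THRESHOLD:
        return model_label, "machine-learning model"
    # Low confidence: fall back to hand-written rules, defaulting to the safest action.
    action = FALLBACK_RULES.get(sign_shape, "slow down and hand control to driver")
    return action, "rule-based fallback"

print(classify_sign("stop", 0.97, "octagon_red"))            # model is confident
print(classify_sign("speed_limit_30", 0.55, "octagon_red"))  # rules take over
```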
What Can We Learn From These AI Failures?
Part of the power of AI and deep learning is that training can indiscriminately absorb every nuance of language, even when we do not explicitly instruct it to. Unfortunately, it can also pick up patterns we would rather it did not, such as the gender bias inherent in our use of language. In this way, AI reinforces biases and stereotypes we hold without even realizing it.
AI itself is not fundamentally biased. The biases in these algorithms are the result of biased training data created by humans. It is simplistic to suggest that we just need to collect unbiased data, because almost all human data is biased in some way.
Companies need to incorporate anti-bias
training alongside their AI and ML training, spot potential for bias in what
they are doing, and actively correct for it. And along with the usual testing
processes for software, AI needs to undergo an additional layer of social
testing so that problems can be caught before they reach the consumer and
result in a massive backlash. Additionally, the data scientists and AI
engineers developing the models must be formally trained on the risks of AI.
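One way to make such social testing concrete is to put bias checks in the same automated test suite as functional tests. The sketch below assumes a hypothetical `score_resume` function standing in for a deployed model and uses pytest; the counterfactual check asserts that swapping gendered terms should not move the score.

```python
# Sketch of a "social test" alongside ordinary software tests.
# `score_resume` is a hypothetical stand-in for the real, deployed model.
import pytest

def score_resume(text: str) -> float:
    # Placeholder scorer for illustration; replace with the production model call.
    return 0.5

GENDER_SWAPS = {"women's": "men's", "she": "he", "her": "his"}

def swap_gendered_terms(text: str) -> str:
    for a, b in GENDER_SWAPS.items():
        text = text.replace(a, b)
    return text

@pytest.mark.parametrize("resume", [
    "captain of the women's chess club, 5 years of python",
    "she led her robotics team to a national title",
])
def test_score_is_stable_under_gender_swap(resume):
    # Counterfactual fairness check: swapping gendered terms should not change the score.
    assert abs(score_resume(resume) - score_resume(swap_gendered_terms(resume))) < 0.01
```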
Most importantly, business leaders need
specialized AI training to understand both the possibilities and the risks.
Executives need to understand data science and AI well enough to manage AI-centric products and services and to appreciate AI's potential shortcomings.
Understanding these dangers is the
responsibility of not just those leading AI initiatives, but all executives. A
PR leader who understands social media dynamics and the vicious troll culture
can avert the dangers of a self-learning AI Twitter bot. An executive familiar
with HR and employment discrimination law can help flag potential dangers of
resume screening bots. And a manager with operating experience across multiple
countries might be able to spot the pitfalls in translating genderless
pronouns.
No single manager has the contextual
knowledge to spot everything that can go wrong. So companies need to train all
their business leaders on AI’s potential and risks, so that every line of
business can spot opportunities and flag concerns.
Concluding notes
Algorithms have the potential to change the world for
the better. To leverage that potential, it is important that confidence in
algorithms is not undermined. We lose confidence in algorithms much more
quickly than we do in human forecasters when both make the same mistake. We are
unforgiving of algorithms when they fail, but we accept that humans are
fallible. Take the case of autonomous cars. According to some estimates,
driverless cars would save up to 1.5 million lives just in the US and close to
50 million lives globally in the next 50 years. Yet, in a poll conducted in
April 2018, 50% of the respondents said they considered autonomous cars less
safe than cars driven by human beings.
When it comes to trust, two factors are very important: control
and transparency. Research indicates that as long as we have some
control, even minimal, trust is significantly enhanced. We should introduce
simple features that give people a feeling that they are in control. When
automated elevators were introduced, people were scared to use them in the
absence of an operator. What reassured lift users was the introduction of a red “Stop” button. The button itself did little: if someone pressed it, they were simply directed to pick up the phone and speak to a remote operator. Yet the Stop button made users feel as if they were in control.
Transparency can improve trust in an algorithm. When we know how
the algorithm works, we feel more in control. But again, the reality is
nuanced. When we meet new people, we expect a degree of transparency. We tend
to distrust people who are too cagey and seem to be concealing their thoughts
and intentions. At the same time, we get suspicious when people try to be too
transparent. We start wondering what they are trying to prove. Therefore, for
human beings, there is a right level of transparency. The same logic applies to
algorithms. Too much information about an algorithm can undermine user trust just as easily as too little.
There are different dimensions of trust that we must
keep in mind when we design an algorithm. In general, customers should be
convinced that the algorithm knows what it is doing, has good motives and
upholds basic human values such as honesty and fairness. That is, users must know how the algorithm works, why it makes decisions the way it does, and
what tradeoffs it makes while making a recommendation. However, not all these
aspects may be important for everyone and for all situations. Different aspects
of trust may become important in different contexts. So, explanations about the
algorithm have to be carefully prepared and calibrated to address the specific
trust issues.
While designing and deploying algorithms, companies must
consider the role of diversity within their work teams, training data, and the
level of cultural sensitivity within their decision-making processes. Greater
diversity can potentially avoid harmful discriminatory effects on certain
protected groups, especially racial and ethnic minorities. The use of cross-functional teams in developing algorithms is emerging as a best practice. Rather than trying to automate everything, keeping humans involved tends to produce better algorithms. Last but not least, formal and regular auditing
of algorithms can also help in detecting and mitigating bias.
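An audit of this kind can be as simple as periodically recomputing a few disparity metrics on logged decisions. A minimal sketch, with a toy log and hypothetical column names, using the selection-rate ratio (the “four-fifths rule”) as one possible check:

```python
# Sketch of a periodic bias audit (toy log, hypothetical columns): compare
# selection rates across groups and flag ratios below the four-fifths threshold.
import pandas as pd

decisions = pd.DataFrame({
    "group":    ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"],
    "selected": [1,   1,   1,   0,   1,   1,   0,   0,   1,   0],
})

rates = decisions.groupby("group")["selected"].mean()
ratio = rates.min() / rates.max()

print(rates)
print(f"disparate impact ratio: {ratio:.2f}")
if ratio < 0.8:  # the conventional four-fifths threshold
    print("flag for review: selection rates differ materially across groups")
```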
Hope you found this useful.