Bias in Machine Learning

Kushal Gupta
5 min read · Aug 24, 2024

--

Identifying, Measuring, and Mitigating

In the realm of machine learning, bias isn’t just a technical issue — it’s a reflection of deeper societal challenges. Consider a scenario where a company develops a machine learning model to screen job applicants. The model, trained on historical hiring data, consistently favors male candidates over female candidates for technical roles. The company’s data scientists are perplexed. How could an algorithm, which is supposed to be objective, exhibit such biased behavior? The answer lies in understanding the nature of bias in machine learning.

Understanding Bias: The Basics

At its core, bias in machine learning occurs when an algorithm produces results that are systematically prejudiced due to erroneous assumptions in the machine learning process. These biases can arise from various sources, including the data used to train the model, the way the data is labeled, or even the algorithm itself.

As data scientist Cathy O’Neil once said, “Algorithms are opinions embedded in code.” This highlights the fact that any algorithm is only as objective as the data and assumptions it is built upon.

Example: In our job screening example, the model was trained on historical hiring data where men were predominantly hired for technical roles. The model learned from this data and began to associate technical competence more strongly with male candidates, leading to biased outcomes.

This type of bias is often referred to as historical bias, where the model inherits the biases present in the training data. Other common types include selection bias (when the data used to train the model is not representative of the population it is applied to) and measurement bias (when the data collected is not accurate or consistent).

Measuring Bias: Quantifying the Problem

Identifying bias is only the first step. The real challenge lies in measuring it. This involves comparing the model’s performance across different groups (e.g., gender, race, age) to see if it systematically favors one group over another.

As AI ethics researcher Ruha Benjamin notes, “The technologies we create often reflect the biases we have.” Measuring bias is about ensuring these reflections don’t lead to unjust outcomes.

One common metric is disparate impact, which compares the rate at which the model produces favorable outcomes for different groups. For instance, in the job screening example, if the model recommends 70% of male applicants but only 30% of female applicants for interviews, this would be a clear indication of bias.
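To make this concrete, here is a minimal sketch of how the disparate impact ratio could be computed, using toy numbers that mirror the 70%/30% figures above (the group labels and data are illustrative, not from a real dataset). A common rule of thumb, the “80% rule,” flags ratios below 0.8.

```python
import numpy as np

def disparate_impact(y_pred, group):
    """Ratio of positive-outcome rates between the unprivileged and
    privileged groups; values below 0.8 fail the common "80% rule"."""
    rate_priv = y_pred[group == "male"].mean()
    rate_unpriv = y_pred[group == "female"].mean()
    return rate_unpriv / rate_priv

# Toy data matching the example: 7 of 10 men vs. 3 of 10 women recommended
y_pred = np.array([1] * 7 + [0] * 3 + [1] * 3 + [0] * 7)
group = np.array(["male"] * 10 + ["female"] * 10)

print(disparate_impact(y_pred, group))  # 0.3 / 0.7 ≈ 0.43, well below 0.8
```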

Another approach is using equalized odds, which evaluates whether the model’s error rates are the same for all groups. If the model incorrectly rejects female candidates more often than male candidates, it indicates a biased error rate.
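A rough sketch of how per-group error rates might be compared is shown below; the labels and predictions are made up purely for illustration. Equalized odds is (approximately) satisfied when the false negative and false positive rates match across groups.

```python
import numpy as np

def error_rates_by_group(y_true, y_pred, group):
    """False negative and false positive rates per group; equalized odds
    asks for these to be roughly equal across all groups."""
    rates = {}
    for g in np.unique(group):
        mask = group == g
        yt, yp = y_true[mask], y_pred[mask]
        fnr = ((yt == 1) & (yp == 0)).sum() / max((yt == 1).sum(), 1)
        fpr = ((yt == 0) & (yp == 1)).sum() / max((yt == 0).sum(), 1)
        rates[g] = {"FNR": fnr, "FPR": fpr}
    return rates

# Hypothetical labels and predictions, for illustration only
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 0, 0, 0, 1, 0, 1])
group = np.array(["male"] * 4 + ["female"] * 4)
print(error_rates_by_group(y_true, y_pred, group))
```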

Real-Life Scenario: In 2018, a widely reported case involved an AI algorithm used by a major tech company to assist in hiring. The model, trained on resumes submitted over a ten-year period, was found to downgrade resumes that included the word “women’s,” as in “women’s chess club captain.” The bias stemmed from the fact that the majority of resumes the model had been trained on were from men, leading it to favor male applicants.

Mitigating Bias: Strategies for Fairer Models

Once bias is identified and measured, the next step is mitigation. Various strategies can be employed to reduce or eliminate bias from machine learning models.

1. Data Preprocessing:
— One approach is to modify the training data to remove or reduce bias. This can involve balancing the dataset so that underrepresented groups are more equally represented or correcting for any inaccuracies in the data.

Example: In the job screening scenario, the company could re-train the model on a dataset that includes an equal number of male and female candidates, ensuring that the model learns to treat both groups fairly.
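One simple way to do this is to resample the training data so that each gender-and-outcome combination appears equally often. The sketch below assumes a pandas DataFrame with hypothetical gender and hired columns; a real pipeline would likely use more careful reweighting or data augmentation instead of plain downsampling.

```python
import pandas as pd

def balance_by_group(df, group_col="gender", label_col="hired", seed=42):
    """Downsample each (group, label) cell to the size of the smallest cell
    so the model sees every group/outcome combination equally often."""
    n_min = df.groupby([group_col, label_col]).size().min()
    return (
        df.groupby([group_col, label_col], group_keys=False)
          .apply(lambda cell: cell.sample(n=n_min, random_state=seed))
          .reset_index(drop=True)
    )

# Toy historical data: men hired far more often than women (illustrative only)
history = pd.DataFrame({
    "gender": ["male"] * 8 + ["female"] * 4,
    "hired":  [1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0],
})
balanced = balance_by_group(history)
print(balanced.groupby(["gender", "hired"]).size())  # every cell now the same size
```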

2. Algorithmic Adjustments:
— Another method is to adjust the algorithm itself. Techniques like fairness constraints can be incorporated into the model to ensure that it does not disproportionately favor one group over another.

Example: By introducing fairness constraints, the job screening model could be adjusted to ensure that its recommendations are equally likely to include qualified candidates from both genders.
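As one possible illustration (the article does not prescribe a specific tool), the open-source Fairlearn library wraps a standard classifier in a reductions-based approach that enforces a constraint such as demographic parity during training. The data below is synthetic and purely for demonstration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                      # synthetic applicant features
gender = rng.choice(["male", "female"], size=200)  # synthetic sensitive attribute
# Labels deliberately skewed toward "male" to mimic biased historical data
y = (X[:, 0] + (gender == "male") * 0.5 + rng.normal(size=200) > 0).astype(int)

base = LogisticRegression(max_iter=1000)
mitigator = ExponentiatedGradient(base, constraints=DemographicParity())
mitigator.fit(X, y, sensitive_features=gender)

pred = mitigator.predict(X)
for g in ["male", "female"]:
    # Selection rates for the two groups should now be much closer together
    print(g, pred[gender == g].mean())
```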

3. Post-Processing:
— This involves adjusting the model’s predictions after it has been trained to reduce bias. For example, if the model is found to be biased, its outputs can be corrected to ensure fair treatment across different groups.

Example: After running the job screening model, the company could apply a correction to the model’s recommendations to ensure gender parity among the shortlisted candidates.
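One simple post-processing scheme is to choose a separate score cutoff per group so that both groups are shortlisted at the same rate. The sketch below uses made-up scores; other post-processing methods instead calibrate thresholds to equalize error rates.

```python
import numpy as np

def group_thresholds(scores, group, target_rate=0.5):
    """Pick a score cutoff per group so each group is shortlisted at
    roughly the same target rate."""
    return {g: np.quantile(scores[group == g], 1 - target_rate)
            for g in np.unique(group)}

def shortlist(scores, group, cutoffs):
    """Apply each group's own cutoff to its candidates."""
    return np.array([s >= cutoffs[g] for s, g in zip(scores, group)])

# Hypothetical model scores, for illustration only
scores = np.array([0.9, 0.8, 0.4, 0.3, 0.7, 0.6, 0.2, 0.1])
group = np.array(["male"] * 4 + ["female"] * 4)
cutoffs = group_thresholds(scores, group, target_rate=0.5)
print(shortlist(scores, group, cutoffs))  # two candidates shortlisted per group
```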

4. Continuous Monitoring and Feedback:
— Bias mitigation is not a one-time process. Models need to be continuously monitored and updated as new data comes in. Regular audits can help identify emerging biases and ensure that the model remains fair over time.
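In practice, this can be as simple as recomputing a fairness metric on each new batch of decisions and raising a flag when it drifts past an agreed threshold. The snippet below is a rough sketch of such an audit check using the disparate-impact ratio from earlier; the data and the 0.8 threshold are illustrative assumptions.

```python
import numpy as np

def audit_batch(y_pred, group, threshold=0.8):
    """Recompute the disparate-impact ratio on a new batch of predictions
    and flag the batch for review if the ratio falls below the threshold."""
    rates = {g: y_pred[group == g].mean() for g in np.unique(group)}
    ratio = min(rates.values()) / max(rates.values())
    return {"selection_rates": rates, "ratio": ratio, "needs_review": ratio < threshold}

# Hypothetical monthly batch of model decisions
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
group = np.array(["male"] * 5 + ["female"] * 5)
print(audit_batch(y_pred, group))  # ratio ≈ 0.67, so the batch is flagged
```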

Real-Life Scenario: Financial institutions use machine learning models to assess creditworthiness. These models must be regularly updated and audited to ensure they do not inadvertently discriminate against certain groups. By implementing continuous feedback loops, banks can adjust their models to align with changing social and regulatory standards.

As machine learning pioneer Andrew Ng puts it, “AI is not just about machines; it’s about making better decisions.” Mitigating bias is key to ensuring these decisions are just and equitable.

Conclusion: The Path Forward

As machine learning becomes increasingly integrated into decision-making processes across industries, addressing bias is crucial. Unchecked bias can lead to unfair outcomes, perpetuate existing inequalities, and erode trust in AI systems. By understanding the sources of bias, employing robust measurement techniques, and adopting effective mitigation strategies, data scientists can build models that are not only accurate but also fair and just.

The goal is not just to create intelligent systems but to ensure that these systems reflect the values of equity and fairness in society. As we move forward, it’s essential to remember the words of renowned data ethicist Virginia Eubanks: “The most important part of any algorithm is the person who is accountable for its effects.”

--

Written by Kushal Gupta

ML Engineer | Full Stack Developer | Data Analyst | Final Year Undergrad Student at GGSIPU