Bias in machine learning
If you think AI tech is flawless, think again. Machine learning models can learn biased results from the training data they are given. Read on to see this “behavior” exposed.
Machine learning is one of the driving forces of AI, and, at the same time, it is changing the world as we know it. It promises innovative solutions to rising environmental issues, healthcare challenges, educational obstacles, and challenges across many other industries. But beneath that bright future lurks a dark secret - the risk of bias.
Bias in machine learning can lead to negative outcomes, distort decision-making, and sometimes reinforce existing social prejudices.
That’s why we will start with one of the “loudest” examples of ML bias on the internet - Amazon’s algorithm that taught itself to favor men’s resumes over women’s.
Do you want to know how this happened?
Amazon's hiring algorithm
Bias in machine learning has been a known issue for a long time. Even the largest of companies are not immune to it. Amazon, one of the largest tech giants in the world, had an issue with its hiring algorithm, which turned out to be biased against women.
Amazon's idea was simple: feed 100 resumes into an algorithm and instantly hire the top five it picks. To the HR department, it sounded like the Holy Grail of the ideal recruitment process.
To train that model, Amazon used ten years' worth of resumes they had received. With the current overrepresentation of males in the tech industry, the model taught itself that male candidates were preferred.
Resumes containing the word "women's" were downgraded, and resumes from graduates of two all-women's colleges in the USA were declined.
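The dynamic Amazon ran into can be sketched with a toy word-count scorer. The data and the scorer below are entirely hypothetical (nothing like Amazon's actual model); the point is that a male-dominated hiring history is enough to make a word like "women's" carry a penalty:

```python
from collections import Counter

# Hypothetical historical data: (resume keywords, hired?). The history
# is male-dominated, so the word "women's" only appears with rejections.
history = [
    (["software", "engineer"], 1),
    (["software", "engineer"], 1),
    (["chess", "club", "captain"], 1),
    (["women's", "chess", "club"], 0),
    (["women's", "college", "graduate"], 0),
]

hired, rejected = Counter(), Counter()
for keywords, was_hired in history:
    (hired if was_hired else rejected).update(keywords)

def score(keywords):
    # Higher = more "hireable" according to the biased history.
    return sum(hired[k] - rejected[k] for k in keywords)

print(score(["software", "engineer"]))      # 4: leans toward hiring
print(score(["women's", "chess", "club"]))  # -2: penalized for "women's"
```

The scorer never sees gender directly; it learns the penalty purely from correlations in the history - which is exactly what happened, at much larger scale, in Amazon's case.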
That is just one example of data making machine learning biased. Suppose the data is incomplete, outdated, or has confounding variables (external factors that affect both independent and dependent variables, causing them to appear correlated even when there is no direct causal relationship between them). In that case, it can lead to biased machine learning models.
Allow me to return to the algorithm in question. As mentioned, Amazon had a data bias problem, but that was not all. The algorithm also ranked resumes on a scale of 1 to 5, and the historical data used to train that ranking reflected the company's own hiring decisions - so the existing human bias carried straight into the model.
Although not specifically mentioned, measurement bias was probably part of the issue as well. If certain skills or qualifications were not represented in the resumes in a way the algorithm could understand, those candidates could not progress any further, adding exclusion bias to the mix.
Popular types of bias
In the Amazon example, we saw what can happen when data is not properly prepared for machine learning. Let me explain the most common biases and how they can affect your machine learning algorithms:
Data bias occurs when the dataset used to train an ML model does not represent all the data the model is meant to serve. This can be caused by:
- Underrepresentation or overrepresentation - If certain groups of people are underrepresented or overrepresented, the model's predictions may be biased toward the majority group.
- Sampling bias - If the data collected doesn't accurately represent the population because of the sampling method, it can lead to data bias. For instance, using voluntary surveys can skew data toward those who are more likely to respond.
- Historical bias - If the data reflects historical or societal favoritism, the trained model may perpetuate these biases.
- Exclusion bias - This occurs when some of the data is systematically excluded from the dataset used for model training. Causes can include incomplete data collection or certain data preprocessing decisions - for example, based on age. Healthcare models can display age bias during disease identification and therapy advice. Bias towards young patients in training data may lead to erroneous or less effective healthcare choices for the elderly, resulting from age-dependent prejudices in the algorithm.
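The healthcare scenario in the last bullet can be shown in a few lines. The patient records below are made up; the assumed (and plausible) pattern is that older patients more often lack a digitized lab value, so a routine "drop incomplete rows" step quietly removes them:

```python
# Hypothetical records; None marks a missing lab value. In this made-up
# data, missing values happen to cluster in the elderly group.
records = [
    {"age": 25, "lab": 1.1}, {"age": 30, "lab": 0.9},
    {"age": 35, "lab": 1.0}, {"age": 72, "lab": None},
    {"age": 78, "lab": None}, {"age": 81, "lab": 1.2},
]

# A routine preprocessing decision: drop incomplete rows.
kept = [r for r in records if r["lab"] is not None]

elderly_share_before = sum(r["age"] >= 65 for r in records) / len(records)
elderly_share_after = sum(r["age"] >= 65 for r in kept) / len(kept)

print(elderly_share_before)  # 0.5
print(elderly_share_after)   # 0.25
```

Half the original dataset was elderly; after "cleaning", only a quarter is. A model trained on the kept rows will simply see far fewer elderly patients.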
Algorithmic bias happens when the algorithm still produces biased outcomes, even when the data is fair and unbiased. The main causes are:
- Preprocessing choices - Decisions made during data preprocessing, such as how to handle missing data or which features to include can create bias.
- Modeling assumptions - All machine learning algorithms make certain assumptions. If those assumptions don't hold for the data at hand, they can introduce bias.
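One concrete preprocessing choice that creates bias even on "clean" data is imputing missing values with the overall mean. A minimal sketch with hypothetical scores, where the minority group is small and the overall mean is therefore dominated by the majority:

```python
# Hypothetical scores; the minority group is small and has one missing
# value, so the "overall mean" is dominated by the majority group.
majority = [90.0, 90.0, 90.0]
minority = [60.0, None]

known = majority + [x for x in minority if x is not None]
overall_mean = sum(known) / len(known)

filled = [x if x is not None else overall_mean for x in minority]
minority_mean = sum(filled) / len(filled)

print(overall_mean)   # 82.5
print(minority_mean)  # 71.25, vs. the group's true mean of 60.0
```

A single imputed value drags the minority group's average toward the majority, distorting whatever the model later learns about that group.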
Measurement bias arises when there are errors in data collection, measurement, or processing. Causes can include:
- Inaccurate measurements - These occur when the tools or techniques used to collect or measure the data are flawed.
- Systematic error - This is a consistent and repeatable error associated with faulty equipment or a flawed experimental design.
Label bias occurs when the target labels used for training the machine learning model are incorrect or biased. Causes include:
- Human error or bias - If humans are involved in data labeling, their biases or errors can affect the labels.
- Biased instructions - If the guidelines for labeling tasks are biased, it can lead to unfair labels. For example, if labelers are instructed to label images of professional attire for one gender more readily than for others, a model trained on this data will display the same bias.
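The biased-instructions scenario can be sketched as follows. Everything here is hypothetical: the "instruction" is deliberately exaggerated (labelers never tag group F as professional) to make the effect obvious in a tiny dataset:

```python
# Hypothetical ground truth: (group, is it actually professional attire?).
images = [("M", True), ("M", True), ("M", True), ("M", False),
          ("F", True), ("F", True), ("F", False), ("F", False)]

def biased_label(group, truly_professional):
    # Exaggerated biased instruction: never tag group F as professional.
    if group == "F":
        return False
    return truly_professional

labels = [(g, biased_label(g, t)) for g, t in images]

def positive_rate(group):
    group_labels = [lab for g, lab in labels if g == group]
    return sum(group_labels) / len(group_labels)

print(positive_rate("M"))  # 0.75
print(positive_rate("F"))  # 0.0 -- even though the true rate was 0.5
```

The ground truth is identical in spirit for both groups, but the labels the model actually trains on encode the instruction's bias, so the model inherits it.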
Bias in the justice system
Now that we’ve explained the most common types of bias, let’s turn our attention to another controversial case of ML discrimination.
One more famous example of bias in machine learning is COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), a risk assessment tool used in the US judicial system to predict the likelihood of a defendant's re-offending or recidivism.
Developed by Northpointe (now Equivant), the tool's use has sparked controversy due to its alleged racial bias, as revealed in a 2016 analysis by ProPublica.
ProPublica found that COMPAS wrongly flagged black defendants as likely future criminals at almost twice the rate of white defendants. Conversely, white defendants were more likely to be mislabeled as low-risk, even when they went on to commit further crimes.
The bias stemmed mainly from the following:
- Inaccurate historical data - The criminal history data COMPAS relied on is shaped by societal biases, such as racial profiling or over-policing in certain communities. The algorithm can learn and perpetuate these biases.
- Proxy variables - Some of the inputs COMPAS used may act as proxies for race. For example, factors like zip codes can indirectly introduce racial bias, given the racial segregation of many US neighborhoods.
- Imbalanced data: If the data used to train the model is imbalanced, such as having a disproportionate representation of one racial group over another, it can lead to biased predictions.
The COMPAS case highlights the serious real-world implications of bias in ML and the need for rigorous measures to ensure fairness and transparency in such systems.
Ethical and regulatory aspects of bias
Just as COMPAS affected many people with its biased predictions, bias in ML more broadly raises ethical and regulatory questions. To avoid future incidents like that, there are ethical and regulatory aspects to consider when using ML to ensure impartiality, transparency, and accountability.
Various organizations have proposed ethical guidelines for the use and development of AI and ML.
The “Ethics Guidelines for Trustworthy AI”, developed by the European Commission's High-Level Expert Group on AI, include the principles of beneficence, non-maleficence, autonomy, justice, and explicability as core components. These principles are fundamental to the ethical and responsible development and deployment of AI systems. Alongside those core principles, there are also:
- Fairness - Models should treat all individuals or groups impartially. This includes avoiding harmful bias or discrimination based on characteristics such as race, gender, age, or disability.
- Transparency - Understanding how ML models reach their decisions helps build trust and allows better scrutiny and accountability over the processes they drive.
- Accountability - There should be mechanisms to hold those who develop and deploy machine learning models accountable for their performance, including any biased or unfair outcomes.
With the increase of machine learning affecting everyday life in the way we’ve explained, several jurisdictions have started implementing laws that touch upon aspects of machine learning bias:
- General Data Protection Regulation (GDPR) - includes provisions on automated decision-making and profiling. It gives individuals the right not to be subject to decisions based solely on automated processing that have legal or similarly significant effects.
- Algorithmic Accountability Act - proposed in the US, this act would require companies to conduct impact assessments of automated decision systems and any significant changes to them, evaluating their accuracy, fairness, bias, discrimination, privacy, and security.
As we continue to rely on machine learning, navigating these ethical and regulatory aspects is crucial. It's a delicate balance between encouraging innovation and ensuring the technology is used responsibly and fairly. The goal should always be to create machine learning systems that respect human rights, social values, and the norms of fairness and justice.
Who is to blame, and how to fix it?
To put this in the perspective of the COMPAS case: even though there is significant controversy and debate around these responsibilities, many consider developers to have an ethical responsibility to make the tool as fair and unbiased as possible. This involves using representative and objective training data, transparent development processes, and rigorous testing for potential biases. When bias is identified, developers should work to mitigate it. They should also inform users about the tool's limitations and possible biases.
Courts and judges, as users of the COMPAS tool, also bear responsibility. They should be aware of the limitations and potential prejudices of their tools. This includes understanding that risk scores are statistical predictions, not deterministic forecasts, and can be influenced by systemic bias. Judges should consider risk scores as just one factor in their decision-making process and apply professional judgment.
Regulatory bodies also have a role to play in monitoring and controlling the use of such tools. They can establish guidelines and regulations for AI systems' fairness, transparency, and responsibility. They can also enforce laws against discrimination that biased AI systems may perpetuate.
The COMPAS case underscores the need for ethical and regulatory frameworks for AI systems, particularly in high-stakes areas like criminal justice.
Hey, you! What do you think?
They say knowledge has power only if you pass it on - we hope our blog post gave you valuable insight.
If you need a dedicated team to make your AI project come to life, or you just want to share your thoughts on this topic, feel free to contact us.
We'd love to hear what you have to say!