Towards Equitable Artificial Intelligence: Strategies for Mitigating Bias and Ensuring Fairness

Heka.ai
Mar 8, 2024

Embark on a journey where the prowess of AI algorithms transcends institutional boundaries, empowering private entities and organizations to enhance capabilities across vital realms like healthcare management, financial lending, education admissions, and hiring. In this technological revolution, the omnipresence of AI introduces unprecedented efficiency, yet shadows loom over crucial fairness concerns, making equitable AI indispensable for fostering just societies and progressive institutions. Although algorithms have proved highly performant, their success is fundamentally grounded in the quality of the data they were trained on. What if that input data reflects some bias? For both moral and legal reasons, we should ensure that a model’s predictions are not biased with respect to sex, race, age, or any other sensitive attribute relevant to the model’s perimeter. We expect AI systems to be fair.

Unfair systems have already made headlines over the past two decades. You have probably heard of Amazon’s hiring system, which was biased against women (Reuters 2018), or of COMPAS, a tool used to assist decisions about pretrial release and sentencing in the US, which was biased against Black people (Jeff Larson 2016). More recently, Google’s Gemini multimodal model encountered calibration problems in its handling of bias (The Guardian 2024).

Sensitive attributes extend far beyond race alone. They are well defined and protected by law.

Article 14 of the European Convention on Human Rights (1953) lists the following as protected attributes: “sex, race, colour, language, religion, political or other opinion, national or social origin, association with a national minority, property, birth or other status”.

The EU AI Act will consider an AI system that classifies people based on personal characteristics as an unacceptable risk. Ensuring fair treatment of every person affected by an AI system thus becomes either mandatory or highly recommended.

This article walks through the entire process of understanding bias, detecting it, and finally mitigating it when building AI systems. Among all ML tasks, binary classification is the simplest one on which to demonstrate fairness manipulations. We will showcase some off-the-shelf methods using the public German Credit dataset. Feel free to try them out yourself!

Presentation — German Credit Use Case

The German Credit dataset is a pivotal resource for demonstrating principles of AI fairness in financial applications. It contains data from 1,000 loan applicants, described by 20 variables, with the goal of classifying each applicant as a good or poor credit risk. This classification is pivotal in the context of loan approvals and financial reliability assessments (Hofmann 1994).

Key attributes in the dataset include financial and banking details crucial for evaluating creditworthiness. These variables cover a spectrum of information such as the status of existing checking accounts, credit history, loan duration, the purpose of the loan, and the total credit amount requested.
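
For readers who want to follow along, one way to load a version of the data is via OpenML with scikit-learn. Note that column names and encodings differ between public copies of the dataset, so treat this as an illustrative sketch rather than the exact preprocessing used here.

from sklearn.datasets import fetch_openml

# Fetch the German Credit ("credit-g") dataset from OpenML.
# Column names and encodings vary between public copies; adapt to yours.
credit = fetch_openml("credit-g", version=1, as_frame=True)
df = credit.frame

print(df.shape)  # (1000, 21): 20 features plus the target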

The German Credit dataset, while focusing primarily on financial attributes, does raise ethical concerns due to its inclusion of sex information. This inclusion could potentially lead to biased AI models. When AI systems are trained on datasets containing gender-related information, there is a risk that the resulting models may inadvertently learn and perpetuate existing gender biases.

The following table provides an overview of the raw dataset.

To use the categorical features, we apply a straightforward label encoding, which yields the preprocessed dataset shown below.
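
As a minimal sketch, assuming the raw dataframe from the loading step is called df, the encoding could look like this (scikit-learn’s LabelEncoder is one of several options):

from sklearn.preprocessing import LabelEncoder

df_encoded = df.copy()

# Replace every categorical column by integer codes.
for col in df_encoded.select_dtypes(include=["object", "category"]).columns:
    df_encoded[col] = LabelEncoder().fit_transform(df_encoded[col].astype(str))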

The dataset features a continuous target variable: the credit amount allocated to each individual. Suppose we want to determine whether an individual has received a loan amount equal to or greater than a certain threshold. By doing so, we convert the regression problem into a binary classification problem. Let us arbitrarily set $2,000 as the threshold above which an individual is considered to have received a favorable outcome. About 57% of individuals exceed this threshold, so the dataset is roughly balanced.
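
Assuming the credit amount sits in a column named "Credit amount" (the column name is an assumption and may differ in your copy of the data), the binarization is a one-liner:

THRESHOLD = 2000  # arbitrary cut-off used in this article

# 1 = favorable outcome (credit amount >= threshold), 0 = unfavorable.
y = (df_encoded["Credit amount"] >= THRESHOLD).astype(int)
X = df_encoded.drop(columns=["Credit amount"])

print(y.mean())  # roughly 0.57 of applicants exceed the threshold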

Bias Detection

To assess the fairness of machine learning models, we compare the performance of different approaches with and without bias mitigation techniques. In this analysis, we test one simple model, the Ridge Classifier. The demonstration would be the same with any other inference model.

First, we define a baseline and then benchmark it against the outputs given by a chosen mitigation technique. After initializing the model, we use the dalex library (dalex n.d.) to evaluate both the performance and the fairness of the model. This library comes with an Explainer object that provides five key fairness metrics to evaluate our model.

from sklearn.linear_model import RidgeClassifier
import dalex as dx

lc = RidgeClassifier(random_state=random_state)
lc.fit(X_train, y_train)
# We then initialize an Explainer object on the held-out test set.
exp_lc = dx.Explainer(lc, X_test, y_test, verbose=False,
                      model_type='classification',
                      label='Ridge Classifier')

The Explainer computes several performance metrics that are standard for classification tasks (recall, precision, F1-score, accuracy, AUC).

exp_lc.model_performance().result

While typical performance metrics are important, they alone are not sufficient for a full evaluation of this dataset, since we also aim to ensure fair data usage. Notably, the Ridge Classifier falls short on fairness, showing noticeable disparities in the treatment of male and female applicants across three criteria: True Positive Rate (TPR), False Positive Rate (FPR), and Statistical Parity (STP).

The library sets a standard for fairness, flagging any ratio below 0.8 or above 1.25 as potentially unfair. This threshold, although arbitrary, is widely used to compare fairness (Disparate Impact n.d.) (Aequitas n.d.). However, it is not, per se, a sufficient condition to assess fairness, and should therefore be used carefully.

fobject_lc = exp_lc.model_fairness(protected=X_test[sensitive], privileged=1)
fobject_lc.fairness_check(epsilon=0.8)
  • Output: Bias detected in 3 metrics: TPR, FPR, STP
  • Conclusion: your model is not fair because 2 or more criteria exceeded acceptable limits set by epsilon.
  • Ratios of metrics, based on ‘1’. Parameter ‘epsilon’ was set to 0.8 and therefore metrics should be within (0.8, 1.25)

The results can be viewed in a graph depicting five fairness metrics. It shows the ratio between the fairness metric values of the unprivileged group and those of the privileged group. The goal is for this ratio to be as close to 1 as possible, indicating that the model performs equally for both groups according to a given metric. To assess fairness, we often apply the four-fifths rule, which deems a model sufficiently fair if the ratio falls within the range of 0.8 to 1.25 (Fairlearn n.d.).

fobject_lc.plot()

Before delving further into the graph associated with the Ridge Classifier, let us offer some general insights to help interpret the significance of these metrics (a hands-on sketch of how they can be computed is given after the list):

  • Predictive Parity and Accuracy Equality: These metrics closely resemble traditional performance metrics such as accuracy. Accuracy Equality assesses the probability of an individual being correctly assigned to their respective class, effectively breaking accuracy down into subgroups. Consequently, base models often exhibit robust performance on these two metrics. Note, however, that the imbalance in the German Credit dataset, where males comprise 69% and females 31%, is significant. This disparity could inherently lead to biases, making it especially important to focus on fairness during analysis and model development.
  • Equal Opportunity and Predictive Equality: Equal Opportunity focuses on achieving equality in the False Negative Rate (FNR), while Predictive Equality aims to equalize the False Positive Rate (FPR). These metrics share a similar underlying philosophy, differing only in whether they compare FNR or FPR. The choice between them depends on the specific fairness objectives one wishes to enforce. Additionally, another metric called Equalized Odds — though not represented on the graph — addresses both FNR and FPR simultaneously.
  • Statistical Parity: This metric strives for both subgroups to have an equal probability of being assigned to the positive class, irrespective of their target values in the dataset. It proves particularly important when the target variable itself exhibits bias. However, it often turns out to be challenging for base models to achieve this fairness metric when the actual targets are biased.
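
To make these definitions concrete, here is a minimal, hand-rolled sketch of how the group-wise rates and the corresponding ratios could be computed from a vector of predictions. dalex does all of this internally; the snippet is purely illustrative.

import numpy as np

def group_rates(y_true, y_pred, mask):
    """TPR, FPR and positive-prediction rate (STP) for one subgroup."""
    yt, yp = np.asarray(y_true)[mask], np.asarray(y_pred)[mask]
    return yp[yt == 1].mean(), yp[yt == 0].mean(), yp.mean()

def fairness_ratios(y_true, y_pred, protected, privileged=1):
    """Unprivileged / privileged ratios; values in (0.8, 1.25) pass the four-fifths rule."""
    protected = np.asarray(protected)
    priv = group_rates(y_true, y_pred, protected == privileged)
    unpriv = group_rates(y_true, y_pred, protected != privileged)
    return dict(zip(["TPR", "FPR", "STP"], (u / p for u, p in zip(unpriv, priv))))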

On the graph below depicting the performance of the Ridge Classifier model, it becomes evident that three metrics fail to meet this criterion: Equal Opportunity, Predictive Equality, and Statistical Parity.

Fairness check for the Ridge Classifier on the German Credit dataset

The graph is interpreted as follows: each horizontal bar represents a fairness metric. A metric is considered perfectly fair if the associated horizontal bar has a value equal to 1. Within the central zone, ranging from 0.8 to 1.25, the fairness disparity is not deemed prohibitive. However, beyond this range, in the grey zone, the difference is considered significant. This is where bias mitigation efforts should be concentrated.

When assessing the performance of the Ridge Classifier model according to this convention, the model outputs are unfair on the Equal Opportunity Ratio, Predictive Equality Ratio, and Statistical Parity Ratio metrics.

Bias Mitigation

Several libraries help mitigate bias, although most findings from academic research are not yet implemented or readily accessible to data scientists. Nevertheless, there are robust packages available that specifically address unfairness in binary classification and regression scenarios. IBM’s AIF360 and Microsoft’s Fairlearn are two notable libraries providing a wide range of algorithms to curb the bias inherent in these models. Additionally, the dalex library remains a valuable tool to evaluate the fairness of both types of models.

For this part, we will focus on optimizing models according to the False Positive Rate (FPR), which is especially critical in credit lending scenarios where banks aim to minimize errors to avoid financial losses. Beyond general vigilance over the FPR, it is essential to ensure equitable treatment across different groups. A naive approach would be to remove the protected column (here, ‘Sex’) from the dataset and train a model on this modified dataset. However, this approach is insufficient, as the information in this variable may be dispersed throughout the rest of the dataset via proxy variables, as the sketch below illustrates.
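
One quick way to convince yourself that dropping the protected column is not enough is to look at how strongly the remaining encoded features correlate with it. A rough sketch, assuming the df_encoded dataframe and the 'Sex' column name from the earlier steps:

# Correlation of each remaining feature with the protected attribute.
# Non-negligible values hint at proxy variables that would leak the
# protected information even after the 'Sex' column is removed.
proxy_corr = (
    df_encoded.drop(columns=["Sex"])
    .corrwith(df_encoded["Sex"])
    .abs()
    .sort_values(ascending=False)
)
print(proxy_corr.head())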

We therefore turn to Exponentiated Gradient Reduction, an in-processing technique introduced by Agarwal et al. at the International Conference on Machine Learning in 2018. It reduces fair classification to a sequence of cost-sensitive classification problems, producing a randomized classifier that minimizes empirical error while adhering to fair classification constraints. This bias mitigation technique is interesting for two reasons:

  • As an in-processing method, it modifies neither the input nor the output data. This is particularly useful when we are also interested in model explainability analyses.
  • It allows you to choose from several fairness constraints, and can therefore be adapted to different ML scenarios.

A mitigated model can be trained as follows by applying the Exponentiated Gradient Reduction technique.

from aif360.sklearn.inprocessing import ExponentiatedGradientReduction

# Wrap the baseline estimator in the reduction, constraining Equalized Odds
# with respect to the protected attribute 'Sex'.
exg_lc = ExponentiatedGradientReduction(
    prot_attr='Sex',
    estimator=RidgeClassifier(random_state=random_state),
    constraints='EqualizedOdds',
)
exg_lc.fit(X_train, y_train)

Fairness improvement

In comparison to the baseline, the results depicted in the graph below reveal significant enhancements across all fairness metrics following the implementation of the exponentiated gradient. Notably, a substantial improvement is evident in the Predictive Equality Ratio, indicating a notable reduction in bias within the False Positive Rate.
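
The comparison shown below can be reproduced by wrapping the mitigated model in its own dalex Explainer and overlaying both fairness objects. This is a sketch reusing the objects defined earlier; the label string is arbitrary.

# Wrap the mitigated model in a dalex Explainer, exactly as for the baseline.
exp_exg = dx.Explainer(exg_lc, X_test, y_test, verbose=False,
                       model_type='classification',
                       label='Ridge Classifier + ExpGrad')

fobject_exg = exp_exg.model_fairness(protected=X_test[sensitive], privileged=1)

# Overlay both models on a single fairness-check plot.
fobject_exg.plot(objects=[fobject_lc])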

Fairness comparison for the Ridge Classifier before and after applying the exponentiated gradient

It’s crucial to recognize that achieving a scenario where all metrics align with fairness criteria at the 80% threshold cannot be taken for granted. In fact, this scenario is rare in real-world applications. This rarity stems from the fact that certain fairness metrics are inherently incompatible with one another, making it unrealistic to expect compliance with all of them simultaneously if the raw data is too biased. Nevertheless, the mitigation technique remains a valuable approach to enforce at least one criterion and assess the model’s performance on the dataset.

As a result, of the 300 rows in the test dataset, 12 points with incorrect outputs (7 women, mostly with few resources, and 5 men) were corrected by mitigation: all 7 women receive a credit afterwards, 3 out of 5 men also benefit from the correction, and the last 2 lose their credit. These two have undefined “Saving accounts” and “Checking account” values. Mitigation therefore corrects the results favorably and fairly for the women, but only to a limited extent for the men.

On the other hand, mitigation also alters predictions that were right in the first place. These errors benefit women, as illustrated below: 10 out of 13 were awarded a loan they should not have received according to the reference dataset, and only 3 lost theirs unfairly. As for men, 10 out of 15 unfairly lost their loan and 5 were wrongfully awarded one.
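
This kind of per-group audit can be reproduced by comparing the two prediction vectors directly. A rough sketch, reusing lc, exg_lc and the sensitive column name defined earlier:

import numpy as np
import pandas as pd

audit = pd.DataFrame({
    "group": np.asarray(X_test[sensitive]),
    "y_true": np.asarray(y_test),
    "baseline": lc.predict(X_test),
    "mitigated": exg_lc.predict(X_test),
})

# Keep only the rows whose prediction changed after mitigation, then
# split them into corrections (now matching the reference labels)
# and newly introduced errors, broken down by group.
flipped = audit[audit["baseline"] != audit["mitigated"]]
flipped = flipped.assign(now_correct=flipped["mitigated"] == flipped["y_true"])
print(flipped.groupby(["group", "now_correct"]).size())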

Performance vs Fairness tradeoff

The graph below represents the tradeoff between fairness and accuracy in machine learning, exemplified by the impact of the Exponentiated Gradient Reduction technique on a Ridge Classifier. It illustrates that as fairness metrics improve (the FPR parity increases), particularly in reducing bias in the False Positive Rate, there is a corresponding, unintended increase in the overall FPR.

The main take-away is that achieving fairness in machine learning models often involves making concessions for other performance metrics. Striking the right balance between these two objectives is a nuanced and context-specific procedure.

Warning: we cannot evaluate the success of an approach based on a single mitigation run, as the algorithms involve random steps. Before validating an approach, a broader study on a random sample of models should be conducted; a sketch of such a study follows. In the previous example, we obtained a significant increase in FPR parity (~ +50%) at the cost of an increase in the overall FPR. However, this result should be considered carefully, as in some cases optimizing FPR parity could be “more important” than the FPR itself. The trade-off between fairness and performance thus lies in the hands of the data scientist and the business teams.
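
A rough sketch of such a study, assuming the variables defined earlier in the article: we re-fit and re-evaluate the mitigated pipeline several times (the reduction yields a randomized classifier, so repeated runs differ) and record the overall FPR and the FPR ratio each time.

import numpy as np
import pandas as pd

runs = []
for run in range(10):
    model = ExponentiatedGradientReduction(
        prot_attr='Sex',
        estimator=RidgeClassifier(random_state=random_state),
        constraints='EqualizedOdds',
    )
    model.fit(X_train, y_train)
    yp = np.asarray(model.predict(X_test))
    yt, grp = np.asarray(y_test), np.asarray(X_test[sensitive])

    runs.append({
        "overall_FPR": yp[yt == 0].mean(),
        "FPR_ratio": yp[(yt == 0) & (grp != 1)].mean() / yp[(yt == 0) & (grp == 1)].mean(),
    })

print(pd.DataFrame(runs).describe())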

Conclusion

The preceding analysis has shown that algorithmic strategies can enhance fairness in machine learning model predictions. However, these approaches cannot overcome all AI fairness issues. Emphasizing fairness often comes at a cost, impacting other performance criteria, computational time, and the interpretability of the developed model. Additionally, bias mitigation algorithms sometimes fail to fully eliminate bias. Even though we could maintain good performance on traditional measures while satisfying these fairness criteria with other methods, the simplicity of the case studied here must be acknowledged. In practical scenarios, a purely technical approach is likely insufficient. Therefore, the entire corporate activity must adhere to a rigorous ethical framework.

Notably, organizations should focus on several key strategies: dedicating resources to educate and raise awareness among designers, developers, and managers; ensuring a diverse team composition; engaging with relevant social organizations and impacted communities to define fairness and address intersectionality issues; developing clear methodologies, adoption, and governance frameworks for revising AI pipelines sustainably, including steps to detect and mitigate bias; and creating transparency and explainability tools to identify bias and understand its impact on AI decision-making.

Though fairness in AI is a critical component, it remains just one aspect of a larger ethical AI framework. For a responsible approach, it is key to recognize the need to address all aspects of ethical AI simultaneously and to integrate fairness into a wider reflection that also takes into account transparency and privacy concerns (IBM n.d.) (UNESCO n.d.).

Stay alert for the release of our next article which will mainly focus on transparency concerns!

References

Disparate Impact. n.d. https://en.wikipedia.org/wiki/Disparate_impact.

Aequitas. Understanding Output. n.d. https://dssg.github.io/aequitas/output_data.html.

Fairlearn. The Four Fifths Rule: Often Misapplied. n.d. https://fairlearn.org/main/user_guide/assessment/common_fairness_metrics.html.

IBM. AI Ethics. n.d. https://www.ibm.com/impact/ai-ethics.

UNESCO. “Recommendation on the Ethics of Artificial Intelligence.” UNESDOC. n.d.
