In recent years, algorithmic bias has become an important research topic in machine learning. Sometimes, because of imbalances in training data or other factors, machine learning models will yield different results for different populations of users, when we want them to treat all populations the same.
At this year’s AAAI/ACM Conference on Artificial Intelligence, Ethics and Society (AIES), our colleagues and we are presenting a paper that demonstrates how to help mitigate bias simply by tuning a model’s hyperparameters.
Our method is a variation on Bayesian optimization (BO), a technique for sampling input-output pairs to efficiently estimate an unknown function. Our method, which we call fair Bayesian optimization, simultaneously models two functions: one that maps hyperparameters to model accuracy and one that maps them to a fairness measure. The approach is agnostic to the choice of fairness measure.
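To make the idea of modeling two functions concrete, here is a minimal sketch of one common way to combine them in constrained BO: expected improvement on accuracy, weighted by the probability that the fairness constraint holds. It assumes Gaussian posteriors for both quantities at a candidate point. The function names are ours, for illustration only, and this is not necessarily the exact acquisition function implemented in AutoGluon.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, best):
    """Expected improvement over `best` when maximizing accuracy.

    mu, sigma: posterior mean and standard deviation of accuracy at a candidate.
    """
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)


def prob_feasible(mu_c, sigma_c, eps):
    """Probability that the fairness measure is at most eps.

    mu_c, sigma_c: posterior mean and standard deviation of the fairness measure.
    """
    if sigma_c <= 0.0:
        return 1.0 if mu_c <= eps else 0.0
    return normal_cdf((eps - mu_c) / sigma_c)


def constrained_ei(mu, sigma, best, mu_c, sigma_c, eps):
    """Score a candidate: improvement in accuracy, discounted by infeasibility risk."""
    return expected_improvement(mu, sigma, best) * prob_feasible(mu_c, sigma_c, eps)
```

A candidate that promises high accuracy but is very likely to violate the constraint gets a score near zero, which is what steers the search toward the fair region.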
Amazon Science wrote about our approach when an earlier version of our paper won a best-paper award at an ICML workshop. But with the AIES paper, we have released our code using Amazon’s AutoML framework AutoGluon. In this post, we’d like to demonstrate how to apply constrained BO to mitigate unfairness while optimizing the accuracy of a machine learning model, using our code.
How to use fair BO
As a running example, we are going to use the German Credit Data dataset from the UCI Machine Learning Repository. The dataset is annotated for a binary classification task, predicting whether a person is a “good” or “bad” credit risk. The dataset is imbalanced, with more than twice as many positive examples as negative ones. The imbalance is even greater if we focus our attention on two subgroups: foreign and local workers.
First, we need to choose a base model, whose hyperparameters we will tune. In this example, we select a random forest and tune three hyperparameters: min_samples_split, max_depth, and criterion.
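As a rough illustration of what such a search space could look like, here is a plain-Python sketch. The three names match arguments of scikit-learn's RandomForestClassifier, but the ranges and the `sample_config` helper are assumptions made for this example; they are not the values or the API used in our AutoGluon code.

```python
import random

# Hypothetical search space: the hyperparameter names come from the text,
# the ranges and choices below are illustrative assumptions.
SEARCH_SPACE = {
    "min_samples_split": ("int", 2, 20),
    "max_depth": ("int", 1, 50),
    "criterion": ("cat", ["gini", "entropy"]),
}


def sample_config(space, rng):
    """Draw one hyperparameter configuration uniformly at random."""
    config = {}
    for name, spec in space.items():
        if spec[0] == "int":
            _, low, high = spec
            config[name] = rng.randint(low, high)  # inclusive on both ends
        else:
            config[name] = rng.choice(spec[1])
    return config


config = sample_config(SEARCH_SPACE, random.Random(0))
```

Random sampling like this is only a way to inspect the space; BO replaces it with model-guided choices of the next configuration to try.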
We also need to select a fairness measure. We use the notion of statistical parity, which holds that the probability of a positive classification should be the same across subgroups. More precisely, we use difference in statistical parity (DSP), which requires that for two subgroups, A and B, the difference between their probabilities of positive classification should fall below some threshold, ϵ.
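The DSP between two subgroups can be computed directly from a model's binary predictions. The sketch below is a minimal, dependency-free version; the function names and the toy predictions are ours and are made up purely for illustration.

```python
def positive_rate(predictions, groups, group):
    """Fraction of members of `group` that received a positive (1) prediction."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    if not selected:
        raise ValueError(f"no examples for group {group!r}")
    return sum(selected) / len(selected)


def dsp(predictions, groups, group_a, group_b):
    """Absolute difference in positive-classification rates between two groups."""
    rate_a = positive_rate(predictions, groups, group_a)
    rate_b = positive_rate(predictions, groups, group_b)
    return abs(rate_a - rate_b)


# Toy data, invented for this example: 1 = "good" credit risk, 0 = "bad".
preds = [1, 1, 0, 1, 1, 0, 0, 1]
groups = ["local"] * 4 + ["foreign"] * 4

gap = dsp(preds, groups, "local", "foreign")  # 0.75 - 0.5 = 0.25
is_fair = gap <= 0.01                         # False for this toy data
```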
We next create the black box to optimize and set a fairness constraint on the DSP between local and foreign workers, with a value of ϵ equal to 0.01.
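Conceptually, the black box is a function of the hyperparameters alone that reports both the quantity to maximize and the quantity to constrain. The sketch below shows that shape in plain Python. `train_and_predict` is a hypothetical callable standing in for training the random forest and predicting on validation data, and the convention that a constraint value at or below zero means "feasible" is an assumption of this example, not a description of the AutoGluon interface.

```python
def make_black_box(train_and_predict, labels, groups, eps=0.01):
    """Wrap a training routine into a function of the hyperparameters only.

    train_and_predict: hypothetical callable, config -> list of 0/1 predictions.
    labels, groups: validation labels and subgroup membership, aligned by index.
    """
    def black_box(config):
        preds = train_and_predict(config)
        accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)

        # Positive-classification rate per subgroup.
        rates = {}
        for group in set(groups):
            member_preds = [p for p, g in zip(preds, groups) if g == group]
            rates[group] = sum(member_preds) / len(member_preds)

        # With two subgroups, max - min is the DSP.
        gap = max(rates.values()) - min(rates.values())
        return {"accuracy": accuracy, "dsp": gap, "constraint": gap - eps}

    return black_box


# Toy stand-in for a trained model: ignores the config, returns fixed predictions.
labels = [1, 1, 0, 0, 1, 0, 1, 1]
groups = ["local"] * 4 + ["foreign"] * 4
black_box = make_black_box(lambda config: [1, 1, 0, 1, 1, 0, 0, 1], labels, groups)
result = black_box({"max_depth": 5})
```

Here `result["constraint"]` is positive, so this toy configuration would count as infeasible under ϵ = 0.01.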
We are now ready to create the scheduler and searcher and run a hyperparameter-tuning experiment through constrained Bayesian optimization:
Let’s compare the models obtained by using standard BO and constrained BO (CBO) after 50 iterations:
In the plots above, the horizontal line is the fairness constraint, set to DSP ≤ 0.01, and darker dots correspond to later BO iterations. Standard BO (left) can get stuck in high-performing yet unfair regions and fail to return a well-performing, feasible solution. Our CBO approach (right) focuses its exploration on the fair region of the hyperparameter space and finds a more accurate fair solution.
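Whichever optimizer produced the trials, the model we finally return is the most accurate one among those that satisfy the constraint. A minimal sketch of that selection step follows; the trial records are invented numbers used only to exercise the function, not results from our experiments.

```python
def best_feasible(trials, eps=0.01):
    """Return the most accurate trial whose DSP is at most eps, or None."""
    feasible = [t for t in trials if t["dsp"] <= eps]
    if not feasible:
        return None
    return max(feasible, key=lambda t: t["accuracy"])


# Made-up trial records for illustration only.
trials = [
    {"id": 0, "accuracy": 0.76, "dsp": 0.080},  # accurate but unfair
    {"id": 1, "accuracy": 0.71, "dsp": 0.004},  # fair
    {"id": 2, "accuracy": 0.73, "dsp": 0.009},  # fair and more accurate
]

chosen = best_feasible(trials)
```

If the search never visits the feasible region, `best_feasible` returns `None`, which corresponds to the failure mode of standard BO described above.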
Feel free to check out our code in AutoGluon. We have also published a full tutorial on the use of our code.