Machine learning (ML) algorithms are increasingly being used to solve real-world problems, which is a welcome development. The main warning many have probably heard by now concerns algorithmic bias. This post briefly surveys some recent research on mitigating bias in machine learning algorithms.
All kinds of ML algorithms can have bias. The animal images at the top of this post demonstrate the bias of a deep learning model that detects the animal in an image (I encourage you to try the model using the web interface on that page). The input images are in the top row, and the model's classification for each input is in the bottom row. The model clearly struggles with albino wallabies (sometimes classifying them as white cats) and has apparently never seen an albino gorilla (it always classifies them as some other animal).
ML algorithms learn bias from their training data; that is almost always the source of bias in ML models. While the bias in the above example is benign, in some applications it can have serious, life-changing consequences. In the TED talk embedded below, Hany Farid discusses one of these cases: bail algorithms. The takeaway question is this: if the source of the bias is the data, how can one identify the bias in the data itself and attempt to remove it?
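As a concrete (and deliberately simplified) illustration of inspecting the data itself, one can compare outcome rates across groups defined by a sensitive attribute. The sketch below computes the gap in positive-outcome rates between two groups, a quantity often called the demographic parity difference; the field names and toy data are made up for illustration and are not from any real dataset.

```python
def positive_rate(records, group):
    """Fraction of records belonging to `group` that have a positive outcome."""
    members = [r for r in records if r["group"] == group]
    if not members:
        return 0.0
    return sum(r["outcome"] for r in members) / len(members)

def demographic_parity_difference(records, group_a, group_b):
    """Gap in positive-outcome rates between two groups (0 means parity)."""
    return abs(positive_rate(records, group_a) - positive_rate(records, group_b))

# Toy data: outcome 1 is the favorable decision.
data = [
    {"group": "A", "outcome": 1},
    {"group": "A", "outcome": 1},
    {"group": "A", "outcome": 0},
    {"group": "B", "outcome": 1},
    {"group": "B", "outcome": 0},
    {"group": "B", "outcome": 0},
]

gap = demographic_parity_difference(data, "A", "B")  # 2/3 vs 1/3 -> gap of 1/3
```

A large gap does not by itself prove unfair treatment (the groups may differ on legitimate grounds), which is exactly why the causal approaches discussed next go beyond such surface statistics.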
Hany touches on using causal models to model the data and identify the causal relationships between its attributes. The Nature article below goes into further detail, explaining three ways causal models can test the fairness of an algorithm. The references in the article cover mostly recent research in this area, if you'd like to learn more about causal models.
The long road to fairer algorithms
An algorithm deployed across the United States is now known to underestimate the health needs of black patients.
The bias problem has also been getting more attention in the computer vision (CV) community. At ECCV 2020, there was a workshop on fair face recognition and a parallel challenge whose goal was to compare face recognition algorithms on the amount of bias they exhibit, for instance with respect to skin tone. For the competition, a causal model was used to design the evaluation metric, a good example of applying such models to computer vision algorithms.
More papers on bias mitigation have recently been published at the main ML and CV conferences. One class of algorithms attempts to determine when a model should be less confident because the input is dissimilar to the training data. For instance, if the animal classification model at the beginning of this post could tell that it had never seen an albino gorilla, it would not mistake it for another animal; hence this approach can be used to mitigate bias in ML algorithms. The following figure is from a paper representative of Bayesian methods applied to deep learning models.
The two arrows point to the same point in the embedding space. On the left, the deep learning model is confident about the class of points far from the training points; the bottom row shows the same space zoomed out to illustrate the uniformly high confidence even at very distant points. On the right, the Bayesian method applied to the deep learning model reduces the model's confidence for points far from the training points. One can thus refuse to classify an input when the model predicts low confidence.
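The "refuse to classify at low confidence" idea can be sketched in a few lines. This is not the paper's actual method, just an illustrative stand-in: average the softmax outputs of several stochastic forward passes (in the spirit of Monte Carlo dropout) and abstain when the resulting top-class probability is low. The logits and the 0.7 threshold are invented for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_rejection(logit_samples, threshold=0.7):
    """Average class probabilities over stochastic passes; return the top
    class index, or None (abstain) if its mean probability is below
    `threshold` -- i.e. the input looks unlike the training data."""
    probs = [softmax(sample) for sample in logit_samples]
    n_classes = len(probs[0])
    mean = [sum(p[i] for p in probs) / len(probs) for i in range(n_classes)]
    top = max(range(n_classes), key=lambda i: mean[i])
    if mean[top] < threshold:
        return None
    return top

# In-distribution input: the stochastic passes agree, so the prediction is kept.
confident = predict_with_rejection([[5.0, 0.0, 0.0]] * 5)  # -> 0

# Far-from-training input: the passes disagree, so the model abstains.
uncertain = predict_with_rejection([[2.0, 0.0, 0.0],
                                    [0.0, 2.0, 0.0],
                                    [0.0, 0.0, 2.0]])  # -> None
```

The key design choice is that disagreement between the stochastic passes flattens the averaged distribution, so distance from the training data shows up as low confidence rather than a confidently wrong label.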
I have started collecting papers, presentations, and open-source projects on algorithmic bias mitigation in a GitHub repository, and I need your help. If you are working on relevant methods or code, or come across some, I look forward to your pull requests. This project is supported by Trueface.ai.