Have you ever considered the gender bias issue in Machine Learning?

*This blog post is my review of the paper "Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints" (EMNLP 2017).*

Machine learning shows powerful performance nowadays. However, the paper "Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints" points out that machine learning models can be influenced by bias in their datasets. This reminds me of the TED talk "How I'm Fighting Bias in Algorithms," which found that many face recognition systems fail to detect the faces of Black people.

I think the issue of bias in data is really important because most machine learning systems, especially deep learning models, depend heavily on training data, whose labels may encode stereotypes from human society. For example, most of the cooks in cooking images are women: in the imSitu visual semantic role labeling (vSRL) dataset, 67% of cooking images have a woman in the agent role. Most interestingly, models may even amplify this bias at test time. The paper introduces Reducing Bias Amplification (RBA), a debiasing technique for calibrating the predictions of a structured prediction model.
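To make the amplification idea concrete, here is a minimal sketch (not the paper's code) of the corpus-level bias score: for an activity, the fraction of images whose agent is labeled woman. The 67% training figure comes from the paper; the prediction counts below are hypothetical, chosen only to illustrate how a model's predictions can be more skewed than the data it was trained on.

```python
def bias_score(count_woman, count_man):
    """Fraction of images for an activity whose agent is labeled woman."""
    return count_woman / (count_woman + count_man)

# Training set: 67% of cooking images have a woman in the agent role (from the paper).
train_bias = bias_score(67, 33)

# Hypothetical model predictions on a test set: the model labels the agent
# "woman" even more often than the training distribution would suggest.
pred_bias = bias_score(84, 16)

# Amplification: how much more biased the predictions are than the data.
amplification = pred_bias - train_bias
print(f"training bias:  {train_bias:.2f}")    # 0.67
print(f"predicted bias: {pred_bias:.2f}")     # 0.84
print(f"amplification:  {amplification:.2f}") # 0.17
```

RBA's goal is to push that predicted ratio back toward the training-set ratio without retraining the model.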

The method applies corpus-level constraints to the CRF score function on test instances and uses Lagrangian relaxation to optimize the resulting objective. The authors evaluate it on visual semantic role labeling and multilabel classification tasks. Best of all, the approach works as a meta-algorithm, so developers do not need to implement a new inference algorithm.
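The Lagrangian-relaxation loop can be sketched roughly as follows. This is a simplified toy version, not the authors' implementation: each test instance has a model score for predicting the agent as "woman" versus "man" (the scores and the margin below are made up), and a corpus-level constraint keeps the overall fraction of "woman" predictions within a margin of the training-set ratio. The Lagrange multipliers are folded into the per-instance scores, so each instance is still decoded independently with the original inference procedure, which is the meta-algorithm property.

```python
import random

random.seed(0)
# Per-instance model scores (e.g., CRF potentials) for agent = woman / man.
scores = [(random.gauss(0.2, 1.0), random.gauss(0.0, 1.0)) for _ in range(200)]

target_ratio = 0.67        # training-set bias b* for the activity (from the paper's cooking example)
margin = 0.05              # user-specified slack around b* (hypothetical value)
lam_lo, lam_hi = 0.0, 0.0  # multipliers for the lower/upper inequality constraints
step = 1.0                 # subgradient step size (hypothetical value)

for _ in range(200):
    # Inference with the multipliers folded into the "woman" score.
    # Each instance is solved on its own -- no new inference algorithm needed.
    preds = [1 if sw + (lam_lo - lam_hi) > sm else 0 for sw, sm in scores]
    ratio = sum(preds) / len(preds)
    # Subgradient updates: push the ratio back inside [b* - margin, b* + margin].
    lam_hi = max(0.0, lam_hi + step * (ratio - (target_ratio + margin)))
    lam_lo = max(0.0, lam_lo + step * ((target_ratio - margin) - ratio))

print(f"final woman-agent ratio: {ratio:.2f}")
```

After the loop, the corpus-wide prediction ratio sits near the constrained interval, while each individual prediction still comes from the model's own scores.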

Open Questions (welcome to comment and discuss)

  1. The corpus-level constraints are applied on test instances. However, the authors should divide the data into three parts (training, validation, and testing) and calculate the constraints on the validation data. Optimizing the bias objective directly on the test set could be seen as cheating.
  2. Is it possible to learn the user-specified margin in the corpus-level constraints rather than setting it by hand?
  3. How can this algorithm be applied to other tasks, such as segmentation and detection? Do those tasks suffer from bias issues as well?