“Lipstick on a Pig — Existing Debiasing Methods Simply Cover-up Systematic Gender Biases in Word Embeddings But do not Remove Them”

Aakash Srinivasan
Feb 10, 2020


Wow!! This looks like a powerful statement. Well, this is the title of a recently published NAACL 2019 paper by Hila Gonen and Yoav Goldberg. “Lipstick on a Pig: Debiasing Methods Cover-up Systematic Gender Biases in Word Embeddings But do not Remove Them” has gathered a lot of attention in the NLP community, particularly among researchers working on debiasing word embeddings. The arXiv version was posted only a few months earlier, and its citation count as of Feb 7, 2020, is 52.

The paper argues that the current methods proposed in the literature for removing gender bias in word embeddings are only superficial. Through experimental analysis, it makes a strong claim that these methods merely hide the bias rather than eliminate it. In particular, it examines two popular gender debiasing methods: one from Bolukbasi et al. [2] and the other from Zhao et al. [3].

Background

Word embeddings provide a powerful way of representing words as low-dimensional vectors. In some sense, they are like a dictionary: we look up a word and get back a bunch of numbers representing its meaning. Word2Vec [10] and GloVe [11] are two popular data-driven approaches that learn word representations from corpora. The broad idea behind these methods is that words appearing in similar contexts should have similar vectors; in other words, the cosine similarity between the vectors of two similar words should be as large as possible. These embeddings were shown to be useful for several downstream NLP tasks, such as analogy prediction and part-of-speech tagging (noun, verb, etc.).
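To make this concrete, here is a minimal sketch of loading pre-trained vectors and comparing two words with cosine similarity. The file name is only a placeholder; any Word2Vec/GloVe vectors saved in word2vec format would work.

# Load pre-trained vectors (placeholder path) and compare two words.
import numpy as np
from gensim.models import KeyedVectors

word_vectors = KeyedVectors.load_word2vec_format("embeddings.bin", binary=True)

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Similar words should have a large cosine similarity.
print(cosine_similarity(word_vectors["king"], word_vectors["queen"]))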

Hard-Debiased Algorithm

Bolukbasi et al. [2] showed that word embeddings learned from publicly available human-generated corpora such as Reddit and Google News capture not only the meanings of words but also the gender stereotypes perceived by society. They defined a gender direction in the vector space as:

gender_direction = word_vectors['she'] - word_vectors['he']

The authors observed that most of the ‘gender-neutral’ professions were inclined towards either the ‘he’ or the ‘she’ direction. For example, ‘nurse’ was shown to be more inclined towards ‘she’, while ‘programmer’ was inclined towards ‘he’.

Word cloud of the top occupation words inclined towards ‘she’ (left) and ‘he’ (right)

Using this observation, the authors defined a metric called ‘Direct Bias’, obtained by averaging the absolute cosine similarity between gender-neutral words and the gender direction. The gender bias associated with a word is thus proportional to its projection onto the gender direction. Ideally, debiased embeddings should exhibit zero bias along the gender direction.
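Roughly, the metric looks like the sketch below, reusing word_vectors and gender_direction from above. The profession list is illustrative, and Bolukbasi et al. additionally use a "strictness" exponent c; c = 1 reduces to a plain average.

# Direct Bias: average absolute cosine similarity with the gender direction.
import numpy as np

def direct_bias(word_vectors, neutral_words, gender_direction, c=1.0):
    g = gender_direction / np.linalg.norm(gender_direction)
    bias = 0.0
    for w in neutral_words:
        v = word_vectors[w]
        v = v / np.linalg.norm(v)
        bias += abs(np.dot(v, g)) ** c
    return bias / len(neutral_words)

professions = ["nurse", "programmer", "doctor", "receptionist"]  # illustrative
print(direct_bias(word_vectors, professions, gender_direction))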

To remove gender bias in neutral words, the authors proposed two algorithms: Neutralize and Equalize (hard debiasing) and Soften (soft debiasing). The broad idea of both methods is to subtract from gender-neutral word vectors their projection onto the gender direction. The authors showed improvements in the form of fewer stereotypical analogies generated from the debiased embeddings.
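Here is a minimal sketch of the Neutralize step, assuming the single gender direction defined above. The original algorithm actually uses a bias subspace obtained via PCA over several definitional pairs, and also equalizes pairs such as ‘he’/‘she’; that part is omitted here.

# Neutralize: remove the component of a word vector along the gender direction.
import numpy as np

def neutralize(v, gender_direction):
    g = gender_direction / np.linalg.norm(gender_direction)
    projection = np.dot(v, g) * g          # component along the gender direction
    debiased = v - projection              # drop that component
    return debiased / np.linalg.norm(debiased)

nurse_debiased = neutralize(word_vectors["nurse"], gender_direction)
# The debiased vector is now (numerically) orthogonal to the gender direction.
print(np.dot(nurse_debiased, gender_direction / np.linalg.norm(gender_direction)))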

GN-GloVe Algorithm

The debiasing method proposed by Bolukbasi et al. [2] is a post-processing step: the embeddings are first learned from corpora and then adjusted to remove their projection onto the gender subspace. Zhao et al. [3] proposed a different approach in which the debiased embeddings are trained from scratch. The authors allocate exclusive dimensions for capturing gender information. They do so by modifying GloVe's objective function with additional loss terms that push the definitional and stereotypical gender information into the gendered dimensions, while the neutral dimensions are constrained to lie in the nullspace of the gender direction (a rough sketch follows the figure below).

The GloVe vectors are split into two components — one that is gender-neutral and the other part captures the definitional/stereotypical relations
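The following is only a rough, conceptual sketch of these two ideas, not the authors' exact GN-GloVe objective: the embedding is split into a neutral part and a reserved gendered part, and a penalty keeps the neutral part of gender-neutral words orthogonal to a gender direction (here assumed to be estimated within the neutral dimensions).

# Conceptual sketch only; NOT the exact GN-GloVe objective.
import numpy as np

K_GENDER = 1  # number of dimensions reserved for gender information

def split_embedding(w):
    """Split a vector into its gender-neutral part and its reserved gendered part."""
    return w[:-K_GENDER], w[-K_GENDER:]

def neutral_penalty(neutral_parts, gender_direction):
    """Sum of squared projections of the neutral parts onto a gender direction.
    Driving this term to zero during training keeps the neutral dimensions in
    the nullspace of the gender direction."""
    g = gender_direction / np.linalg.norm(gender_direction)
    return sum(float(np.dot(v, g)) ** 2 for v in neutral_parts)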

The authors show that these embeddings perform significantly better than the Hard-Debiased embeddings on word similarity and analogy tasks.

Lipstick on a pig

“Lipstick on a pig” is an expression used to convey that making superficial or cosmetic changes is a futile attempt to disguise the true nature of a product (Wikipedia). Image: https://allenrsmith.com/2019/05/31/putting-lipstick-on-a-pig/

Gonen and Goldberg [1] showed that existing debiasing efforts such as Hard-Debiased and GN-GloVe simply hide the bias rather than remove it. In this sense, these efforts are compared to the popular phrase “lipstick on a pig”.

The problem with the Hard-Debiased and GN-GloVe algorithms is that both work with the same definition of bias: the projection onto the gender direction defined by ‘she’ and ‘he’. The authors show that this definition of gender bias is incomplete and that the bias information can still be recovered after debiasing. The bias is much more profound and systematic, and simply reducing the projection of words onto a gender direction is insufficient to remove it.

It is important to note that both methods are based on removing the component of neutral word embeddings along the gender direction. Therefore, the relative positions of the gender-neutral word embeddings are largely unaffected by the debiasing step, meaning that the spatial geometry among these words stays mostly the same.

The authors make the following observations to support the claim that the “debiased” embeddings from Hard Debiased and GN-GloVe still contain significant bias:

Observation 1: Male-biased and Female-biased words cluster together

t-SNE projection of the embeddings into two dimensions. Adapted from [1]

The authors hypothesized that gender bias can still be recovered by looking at the neighbors of a neutral word. They verified this experimentally by clustering the top 500 most male-biased words (in purple) and the top 500 most female-biased words (in green).

The labeling is done with respect to the projection of the original (biased) embeddings onto the gender axis, while the clustering is done using the debiased word embeddings. From the t-SNE plot, we can clearly observe that the male-biased and female-biased words still form separate clusters, even with the “debiased” embeddings from Hard-Debiased and GN-GloVe. For the Hard-Debiased embeddings, the clusters align with the gender-bias labels with 92.5% accuracy, and for GN-GloVe the accuracy is 85.6%. These numbers mean that while we can no longer observe the bias directly along the gender direction, it is still reflected indirectly in the neighbors of a neutral word.
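A minimal sketch of this experiment is given below, assuming biased_vectors and debiased_vectors are gensim KeyedVectors for the original and debiased embeddings and vocab is a list of candidate words; these names are placeholders, not the paper's code.

# Label words by original bias-by-projection, then cluster the debiased vectors.
import numpy as np
from sklearn.cluster import KMeans

g = biased_vectors["she"] - biased_vectors["he"]
g = g / np.linalg.norm(g)
proj = {w: float(np.dot(biased_vectors[w], g)) for w in vocab}
by_bias = sorted(vocab, key=lambda w: proj[w])
male_words, female_words = by_bias[:500], by_bias[-500:]   # <0: closer to 'he'
words = male_words + female_words
labels = np.array([0] * 500 + [1] * 500)

X = np.stack([debiased_vectors[w] for w in words])
pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Cluster ids are arbitrary, so take the better of the two possible assignments.
acc = max((pred == labels).mean(), (pred != labels).mean())
print(f"cluster/bias alignment: {acc:.1%}")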

Observation 2: Bias-by-projection correlates with bias-by-neighbors

The previous observation reveals that the debiased embeddings still carry indirect bias information in their neighborhoods. The authors quantified this by computing the correlation between two terms: the original bias-by-projection of a word and its bias-by-neighbors in the debiased space.

Intuitively, this correlation captures how well the “gender bias” remaining in the debiased embeddings tracks the original bias. The original bias is defined as the projection onto the gender direction, and the bias in the “debiased” embeddings is defined as the fraction of male-biased (or female-biased) words among a word's 100 nearest neighbors. It turns out that the correlation is above 0.7 for both Hard-Debiased and GN-GloVe, indicating that the embeddings still hold significant indirect bias.

Plot of the number of male neighbors of each occupation (target) word versus its original bias score; a clear linear relation is visible. Adapted from [1]
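Below is a minimal sketch of the bias-by-neighbors statistic and its correlation with the original bias, reusing debiased_vectors and the proj dictionary from the previous sketch; target_words (e.g. a list of professions) is assumed.

# Bias-by-neighbors: fraction of originally male-biased words among the
# 100 nearest neighbors in the debiased space.
import numpy as np
from scipy.stats import pearsonr

def bias_by_neighbors(word, k=100):
    neighbors = [w for w, _ in debiased_vectors.most_similar(word, topn=k)]
    male = sum(1 for w in neighbors if proj.get(w, 0.0) < 0)  # < 0 ~ closer to 'he'
    return male / k

neighbor_bias = [bias_by_neighbors(w) for w in target_words]
original_bias = [proj[w] for w in target_words]
print(pearsonr(original_bias, neighbor_bias))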

Observation 3: Previously female- and male-biased words can be classified accurately by simple classifiers

The authors asked an interesting question: can a classifier separate the previously biased words using only the debiased embeddings? They considered the top 5000 most male- and female-biased words and trained an SVM with an RBF kernel. The accuracy on the test set was 89% for the Hard-Debiased embeddings and 96% for GN-GloVe. This illustrates that gender bias can easily be recovered from the “debiased” embeddings.
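A minimal sketch of this classification test, reusing male_words, female_words and debiased_vectors from the earlier sketches; the paper uses larger word lists and its own train/test split.

# Train an RBF-kernel SVM to separate previously male- and female-biased words
# using only their debiased vectors.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

words = male_words + female_words
X = np.stack([debiased_vectors[w] for w in words])
y = np.array([0] * len(male_words) + [1] * len(female_words))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

clf = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.1%}")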

Conclusion

The aforementioned experiments clearly indicate that the so-called “debiased” embeddings produced by the two popular methods still retain gender bias information, even though it is reduced.

The authors provide three major findings:

  1. Gender biased words still cluster together in the “debiased” space.
  2. The definition of gender bias used by Bolukbasi et al. [2] and Zhao et al. [3] is quite narrow and limited. The authors provide an alternative definition of indirect bias based on the correlation scores mentioned before.
  3. Male- and female-biased words are easily separable in the debiased space.

While the gender direction provides a good way to measure bias, it is not the only factor determining bias. Methods such as Hard-Debiased and GN-GloVe, which only aim to eliminate the gender-subspace components from the embeddings, merely hide the bias rather than remove it.

Related Works

1. Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings [5]

This NAACL 2019 paper extends the debiasing idea of Bolukbasi et al. [2] to a multi-class setting. While gender is mostly treated as a binary attribute, other protected attributes such as race and religion have more than two classes. The authors propose a straightforward extension for finding a bias subspace in the multi-class scenario: mean-subtract the vectors within each defining set and then apply PCA (as sketched below). The Hard-Debiasing and Soft-Debiasing methods from Bolukbasi et al. are then used to remove the bias components from the word vectors. The paper introduces a new metric to evaluate bias, Mean Average Cosine Distance (MAC), and shows statistically significant debiasing results. It also shows that there is negligible loss of performance in downstream tasks such as POS tagging and NER after debiasing the embeddings.
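A minimal sketch of that multi-class bias-subspace construction; the word lists are illustrative, not the ones used in the paper.

# For each defining set, subtract the set mean from its members, then run PCA
# over all the centered vectors; the top components span the bias subspace.
import numpy as np
from sklearn.decomposition import PCA

defining_sets = [
    ["judaism", "christianity", "islam"],
    ["jew", "christian", "muslim"],
]

centered = []
for group in defining_sets:
    vectors = np.stack([word_vectors[w] for w in group])
    centered.append(vectors - vectors.mean(axis=0))

pca = PCA(n_components=2)
pca.fit(np.concatenate(centered))
bias_subspace = pca.components_  # rows span the (approximate) bias subspace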

The authors also discuss a simple extension of Gonen and Goldberg [1] for multi-class cluster-bias analysis. They found that simply removing the bias component is insufficient to remove multi-class cluster bias, in line with the core idea of Gonen and Goldberg [1], and they observed a strong correlation between the original bias and the neighbor ratio for protected multi-class attributes such as race and religion.

Plot of the number of neighbors exhibiting positive bias with respect to “Jew” as a function of the original bias. Pearson's correlation coefficient is 0.772 in this case, clearly indicating indirect bias prevalent in the embedding. Adapted from [5]

2. Gender-preserving Debiasing for Pre-trained Word Embeddings [6]

This paper takes a different debiasing approach. The authors split the vocabulary into four sets: male-specific, female-specific, gender-neutral, and stereotypical words. The task is to learn three components jointly: a denoising autoencoder that projects the embeddings of all words into a lower-dimensional space, a feminine regressor that takes the autoencoder's latent representation as input and outputs a value close to 1 for female-specific words, and a masculine regressor that does the same for male-specific words. The hidden representation of the autoencoder is then used as the final embedding.

Along with these losses, there is also an objective that minimizes the projection of the stereotypical words onto the gender subspace; a rough sketch of these ingredients follows below. The authors used manually annotated word lists to partition the vocabulary. One of the key highlights of the paper is that the algorithm can further debias embeddings already processed by Hard-Debiased and GN-GloVe. The learned embeddings preserve semantic information better than prior debiasing methods, primarily thanks to the explicit objectives that preserve gender information, and were shown to perform well on word similarity and analogy tasks.
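The sketch below is only a rough PyTorch illustration of these ingredients, not the authors' exact architecture or hyperparameters; the dimensions, the sigmoid outputs, and the way the gender direction is obtained in the encoded space are all assumptions here.

# Rough sketch: autoencoder + two gender regressors + a projection penalty.
import torch
import torch.nn as nn

dim, hidden = 300, 300  # illustrative sizes

encoder = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh())
decoder = nn.Linear(hidden, dim)
fem_regressor = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())
masc_regressor = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

def losses(batch, kind, gender_direction):
    """batch: word vectors; kind: 'female', 'male', 'neutral' or 'stereotype'.
    gender_direction: a direction in the encoded space, e.g.
    encoder(she_vec) - encoder(he_vec)."""
    h = encoder(batch)
    loss = nn.functional.mse_loss(decoder(h), batch)       # (denoising) reconstruction
    if kind == "female":
        loss = loss + ((fem_regressor(h) - 1.0) ** 2).mean()   # output ~1 for female words
    elif kind == "male":
        loss = loss + ((masc_regressor(h) - 1.0) ** 2).mean()  # output ~1 for male words
    elif kind == "stereotype":
        g = gender_direction / gender_direction.norm()
        loss = loss + ((h @ g) ** 2).mean()                 # remove gender component
    return loss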

3. It’s All in the Name: Mitigating Gender Bias with Name-Based Counterfactual Data Substitution [7]

This paper proposes a debiasing strategy based on Counterfactual Data Augmentation (CDA), as opposed to the usual gender-projection methods. CDA removes biases by directly altering the training corpus: the corpus is duplicated and the gendered words in the copy are swapped. It turns out that word embeddings trained with CDA are still affected by direct bias. To overcome this, the authors propose two novel contributions: Counterfactual Data Substitution (CDS) and the Names Intervention. CDS avoids duplicating the corpus and instead applies the substitution probabilistically (see the sketch below), while the Names Intervention handles the bias carried by first names by pairing names via a graph-matching algorithm.
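Here is a minimal sketch of the difference between CDA and CDS on a tokenized toy corpus. The gender-word map is illustrative, and the real method also handles grammatical agreement and (via the Names Intervention) first names.

# CDA duplicates the corpus with genders flipped; CDS flips documents in place
# with some probability instead of duplicating.
import random

SWAP = {"he": "she", "she": "he", "his": "her", "her": "his",
        "man": "woman", "woman": "man"}  # simplified, illustrative map

def flip(document):
    return [SWAP.get(tok, tok) for tok in document]

def cda(corpus):
    """Counterfactual Data Augmentation: keep the original AND a flipped copy."""
    return corpus + [flip(doc) for doc in corpus]

def cds(corpus, p=0.5, seed=0):
    """Counterfactual Data Substitution: flip each document with probability p."""
    rng = random.Random(seed)
    return [flip(doc) if rng.random() < p else doc for doc in corpus]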

Gender-biased words are far less clusterable in the debiased embeddings generated by the CDS algorithm. Image adapted from [7]

It was observed that the proposed algorithm reduces both direct bias and the indirect bias defined by Gonen and Goldberg [1]. In fact, the cluster purity obtained with these debiased embeddings was shown to be less than 50%. The debiased embeddings also performed well on other downstream tasks.

4. A Causal Inference Method for Reducing Gender Bias in Word Embedding Relations [8]

This paper presents a post-processing method that uses causal inference to remove spurious gender information from the embeddings. The core idea is based on the following observation: while both gender-definitional and non-gender-definitional words carry similar gender information, gender-definitional words carry very limited semantic information compared to gender-neutral words. The task is to eliminate the gender information and keep only the semantic information in the gender-neutral embeddings. The authors propose a Half-Sibling Regression (HSR) framework to solve this problem; a rough sketch follows the figure below.

Relationship between Gender-Definitional and Gender-biased neutral words. Image adapted from [8]
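A rough sketch of the Half-Sibling Regression idea as described above: use ridge regression to predict gender-neutral word vectors from gender-definitional word vectors, treat the predicted part as (spurious) gender information, and subtract it. The word lists and ridge strength are illustrative, and the paper's exact formulation may differ.

# Ridge-regress neutral vectors on definitional vectors; subtract the prediction.
import numpy as np

definitional = ["he", "she", "man", "woman", "father", "mother"]  # illustrative
neutral = ["nurse", "programmer", "doctor", "homemaker"]          # illustrative

D = np.stack([word_vectors[w] for w in definitional], axis=1)  # (dim, |D|)
N = np.stack([word_vectors[w] for w in neutral], axis=1)       # (dim, |N|)

lam = 60.0  # ridge strength (illustrative)
W = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ N)  # (|D|, |N|)
gender_part = D @ W           # predicted (gender-related) component
N_debiased = N - gender_part  # remove it, keep the semantic remainder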

The paper reproduces the experiments proposed by Gonen and Goldberg [1] and reports state-of-the-art performance on the correlation, clustering, and classification tests, indicating a decrease in indirect bias.

5. Gender Bias in Contextualized Word Embeddings [9]

While many earlier works such as Bolukbasi et al. [2] focused on discovering gender bias in static word embedding models such as Word2Vec and GloVe, this paper identifies gender bias in contextualized embedding methods such as ELMo [12]. The authors trained the model on a large-scale dataset with a significant gender skew and observed that the ELMo embeddings were extremely sensitive to gender. These biases also propagate to downstream NLP tasks such as coreference resolution. The paper provides two ways of mitigating the bias in this setting: data augmentation and neutralization. The idea behind data augmentation is to simply flip the gender entities in the training data and concatenate the result to the original dataset [4]. The neutralization approach does not modify the training instances; instead, each test sentence is gender-swapped, and the final contextual representation is the mean of the ELMo representations of the two gender-swapped variants (sketched below).
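A minimal sketch of the neutralization idea at test time; contextual_embed is a placeholder for whatever ELMo (or other contextual encoder) interface is in use, not a real API, and the swap table is simplified.

# Average the contextual representations of a sentence and its gender-swapped copy.
import numpy as np

SWAP = {"he": "she", "she": "he", "him": "her", "her": "him",
        "his": "her", "himself": "herself", "herself": "himself"}  # simplified

def gender_swap(tokens):
    return [SWAP.get(t.lower(), t) for t in tokens]

def neutralized_representation(tokens, contextual_embed):
    original = contextual_embed(tokens)               # (num_tokens, dim)
    swapped = contextual_embed(gender_swap(tokens))   # (num_tokens, dim)
    return (original + swapped) / 2.0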

References

[1] Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them. Hila Gonen and Yoav Goldberg — Proceedings of NAACL-HLT 2019.

[2] Bolukbasi, T., Chang, K.W., Zou, J.Y., Saligrama, V. and Kalai, A.T., 2016. Man is to computer programmer as woman is to homemaker? debiasing word embeddings. In Advances in neural information processing systems (pp. 4349–4357).

[3] Zhao, J., Zhou, Y., Li, Z., Wang, W. and Chang, K.W., 2018. Learning gender-neutral word embeddings. EMNLP 2018.

[4] Zhao, J., Wang, T., Yatskar, M., Ordonez, V. and Chang, K.W., 2018. Gender bias in coreference resolution: Evaluation and debiasing methods. NAACL 2018.

[5] Manzini, T., Lim, Y.C., Tsvetkov, Y. and Black, A.W., 2019. Black is to criminal as caucasian is to police: Detecting and removing multiclass bias in word embeddings. arXiv preprint arXiv:1904.04047. NAACL 2019.

[6] Kaneko, M. and Bollegala, D., 2019. Gender-preserving debiasing for pre-trained word embeddings. arXiv preprint arXiv:1906.00742. ACL 2019.

[7] Maudslay, R.H., Gonen, H., Cotterell, R. and Teufel, S., 2019. It's All in the Name: Mitigating Gender Bias with Name-Based Counterfactual Data Substitution. arXiv preprint arXiv:1909.00871. EMNLP 2019.

[8] Yang, Z. and Feng, J., 2019. A Causal Inference Method for Reducing Gender Bias in Word Embedding Relations. arXiv preprint arXiv:1911.10787. AAAI 2020.

[9] Zhao, J., Wang, T., Yatskar, M., Cotterell, R., Ordonez, V. and Chang, K.W., 2019. Gender bias in contextualized word embeddings. arXiv preprint arXiv:1904.03310. NAACL 2019.

[10] Mikolov, T., Chen, K., Corrado, G. and Dean, J., 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

[11] Pennington, J., Socher, R. and Manning, C.D., 2014, October. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532–1543).

[12] Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K. and Zettlemoyer, L., 2018. Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
