How Fair Are Language Representations?

Sunipa Dev · Published in The Startup · 3 min read · Oct 22, 2020

Distributed representations are a widely used tool in machine learning and data mining because of their ability to compress large-scale, real-world data into representations with relatively few dimensions. Images, text, networks, and graphs are increasingly expressed in this format before being used in downstream tasks.

For text or language representations, GloVe and word2vec were significant advancements over extremely high-dimensional and sparse one-hot encodings. Both produce representations of a few hundred dimensions that exhibit meaningful linear relationships between words. For example, the analogy ‘man : woman :: king : ?’ returns ‘queen’. These embeddings also capture word similarity through vector distances, which allows words to be clustered and synonyms to be predicted, and even aids translation tasks through nearest-neighbor lookups.
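If you would like to poke at these relationships yourself, here is a minimal sketch using gensim and its hosted pretrained GloVe vectors (the “glove-wiki-gigaword-100” dataset name is one of gensim’s downloadable options; any embedding in KeyedVectors format would behave the same way):

```python
# Explore analogies and word similarity in pretrained GloVe vectors.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # downloads ~128 MB on first use

# man : woman :: king : ?  -- "queen" is expected near the top of the list
print(vectors.most_similar(positive=["woman", "king"], negative=["man"], topn=3))

# Cosine similarity as a proxy for relatedness, and nearest neighbors
print(vectors.similarity("doctor", "physician"))
print(vectors.most_similar("doctor", topn=5))
```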

However, along with meaningful word relationships, these embeddings are also known to learn and amplify invalid and often stereotypical associations, such as preferentially associating adjectives like ‘important’, ‘strong’, or ‘intelligent’, or occupations like ‘doctor’ or ‘programmer’, with statistically male names or words like ‘man’, ‘father’, etc. Similar disparate associations of occupations, adjectives, and sentiments are seen across races, age groups, nationalities, and so on.

The image depicts the association of different adjectives along the male-female axis. The average female vector is computed from the words woman, she, girl, mother, and lady, and the average male vector from man, he, boy, father, and gentleman. We see how adjectives like ‘glamorous’ and ‘shy’ lie closer to the average female vector, whereas the average male vector is more strongly associated with ‘strong’ and ‘important’.
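A rough way to reproduce this kind of association is to average the seed vectors of each group and compare an adjective’s cosine similarity to the two averages. The sketch below assumes the same gensim GloVe vectors as above; the exact numbers depend on the embedding and corpus used:

```python
# Measure how strongly a few adjectives associate with the averaged
# female and male seed vectors.
import numpy as np
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")

def mean_vector(words):
    return np.mean([vectors[w] for w in words], axis=0)

female_avg = mean_vector(["woman", "she", "girl", "mother", "lady"])
male_avg = mean_vector(["man", "he", "boy", "father", "gentleman"])

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

for adj in ["glamorous", "shy", "strong", "important"]:
    print(adj,
          "female:", round(cos(vectors[adj], female_avg), 3),
          "male:", round(cos(vectors[adj], male_avg), 3))
```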

These associations are learnt from the statistical distributions of occupation and sentiment words across genders, races, ages, nationalities, and so on. While representative of the underlying data, the embeddings are also prone to amplifying these patterns. Further, they propagate these associations into downstream tasks and applications, for example in the form of unfair predictions. Even if, statistically, most doctors in the data are men, this should not lead to prejudice about whether people of any gender are or can be doctors. Nor should it lead to discrimination based on gender when selecting between applicants.

For instance, let’s consider three sentences:

(a) The doctor bought a coat, (b) The woman bought a coat, and (c) The man bought a coat.

The task, given sentence (a), is to establish a directional relationship and predict whether it entails, contradicts, or is neutral with respect to sentences (b) and (c). As humans, we see that sentence (a) does not specify the doctor’s gender, so we would label it neutral with respect to both (b) and (c). However, standard textual entailment models built on GloVe vectors predict that sentence (a) contradicts sentence (b) but entails sentence (c). The stereotypical associations learnt by the representations are thus propagated, causing harm in downstream tasks as well.
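To try this kind of probe yourself, here is a hedged sketch using a pretrained MNLI checkpoint from Hugging Face (“roberta-large-mnli”) rather than the GloVe-based entailment models used in the papers below, so its exact predictions may differ from the ones described above:

```python
# Probe a pretrained natural language inference model with the doctor sentences.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

premise = "The doctor bought a coat."
for hypothesis in ["The woman bought a coat.", "The man bought a coat."]:
    inputs = tokenizer(premise, hypothesis, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1).squeeze()
    # Read the label names from the model config instead of hardcoding them
    scores = {model.config.id2label[i]: round(float(p), 3)
              for i, p in enumerate(probs)}
    print(hypothesis, scores)
```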

The question then becomes: how do we mitigate these invalid and unfair associations before they percolate into tasks and result in undesirable, possibly harmful outcomes? There are a few different methods aimed at this; I will describe one of the most straightforward here.

The first step is identifying the subspace within the distributed representation that captures the most bias, say the gender-bias subspace. For this, we first need seed words, for example the word sets used in the image above:

F = {woman, she, mother, girl, lady} and M = {man, he, father, boy, gentleman}.

First, we determine the average vector of each group, say f and m respectively. Then the unit vector v = (f − m) / ‖f − m‖ captures the direction of their difference, which is the direction along which the bias lies.

Given this bias direction v, to reduce its effect on the other vectors in the space, we perform a step of linear projection. For every vector u in the embedding space, we redefine it as

u′ = u − ⟨u, v⟩ v.
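Putting the two steps together, here is a minimal NumPy sketch, assuming the same gensim GloVe vectors and seed word sets as above. It illustrates the projection step; it is not the exact implementation released with the papers below.

```python
# Remove the gender-bias direction from every vector by linear projection.
import numpy as np
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")

def mean_vector(words):
    return np.mean([vectors[w] for w in words], axis=0)

f = mean_vector(["woman", "she", "girl", "mother", "lady"])
m = mean_vector(["man", "he", "boy", "father", "gentleman"])

# Bias direction: v = (f - m) / ||f - m||
v = (f - m) / np.linalg.norm(f - m)

# Projection u' = u - <u, v> v, applied to the whole embedding matrix at once
E = vectors.vectors                      # (vocab_size, dim) matrix
E_debiased = E - np.outer(E @ v, v)

# Sanity check: every debiased vector is orthogonal to v (up to float error)
print(np.abs(E_debiased @ v).max())
```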

As a result, all vectors now lie in a lower-dimensional subspace, the hyperplane orthogonal to v, where they retain all the information of the original embedding apart from the component along the bias direction. Empirically, this has been shown to improve intrinsic associations between vectors by minimizing invalid ones. It also improves the sentence-relation task described above, increasing the probability of a neutral classification.

More information about the method, the data and the tests can be found in:

[1] Sunipa Dev and Jeff M Phillips; Attenuating Bias in Word Vectors; AISTATS 2019. https://arxiv.org/abs/1901.07656

[2] Sunipa Dev, Tao Li, Jeff M Phillips, Vivek Srikumar; On Measuring and Mitigating Biased Inferences of Word Embeddings; AAAI 2020. https://arxiv.org/abs/1908.09369
