How we aim to raise awareness of and create a database for machine translation gender bias…

Janiça Hackenbuchner
Aug 3, 2022 · 5 min read


… and how this originated at the Goethe-Institut’s Artificially Correct Hackathon

In the field of translation studies, professional translation, and research, the topic of bias in machine translation (MT) is not new. By now, however, laypersons are increasingly becoming aware of, and (personally) affected by, such occurrences of bias in MT. This is intensified by the growing discussions in society about diversity and inclusivity, gender and social impact. The stereotypes and perceptions we form in society are captured and represented in text, and with that, also in translation. As MT systems are trained on the data humans create, how can we be surprised that they produce translations with biased connotations, if our society (and the data we have accumulated over the past decades) is full of biases?
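To make this concrete, here is a deliberately tiny sketch of how a data-driven system inherits bias from its training text. The "corpus" and the majority-vote rule are illustrative assumptions, not how any real MT system is implemented, but they capture the core mechanism: when a source language has a genderless pronoun (such as Hungarian "ő"), the system falls back on whichever gender dominates its training data.

```python
from collections import Counter

# Toy "training corpus": gendered English sentences a data-driven system
# might learn from (invented for illustration, not a real corpus).
corpus = [
    "he is a doctor", "he is a doctor", "she is a doctor",
    "she is a nurse", "she is a nurse", "he is a nurse",
]

def translate_pronoun(profession: str) -> str:
    """Pick the pronoun most often seen with a profession in the corpus.

    This mimics how a system trained on skewed data, translating from a
    language with a genderless pronoun, defaults to the majority gender.
    """
    counts = Counter(
        sentence.split()[0]  # the pronoun: 'he' or 'she'
        for sentence in corpus
        if sentence.endswith(profession)
    )
    return counts.most_common(1)[0][0]

print(translate_pronoun("doctor"))  # the corpus skew decides the pronoun
print(translate_pronoun("nurse"))
```

The toy system is not "wrong" by its own logic: it faithfully reproduces the statistics of its data. That is exactly why biased data yields biased translations.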

Our Foundation

The presence of bias in MT is something we aim to change. Luckily for us, the Goethe-Institut is equally interested in raising awareness of the topic and in spurring research in the field of bias in MT. For this reason, last October they hosted the Artificially Correct Hackathon. Check out their website for memories of the hackathon and the interview with our team (one of the two winning teams of the hackathon), where we present our solution and talk about our hackathon experience. From our initial team of five, in which we all contributed equally to the ideation and creation of our project, two of us decided to invest substantial time and continue working on the project. The current DeBiasByUs team consists of Joke Daems, soon-to-be Assistant Professor at Ghent University (who, at the time of the hackathon, was a Post-Doc), and myself, Research Associate at the Cologne University of Applied Sciences (at the time of the hackathon, I was writing my MA thesis).

Our Aim

We want to raise awareness by informing people where gender bias in MT comes from and how it comes into play when using an online MT system such as Google Translate, DeepL or Microsoft Translator. We want to explain the science behind this phenomenon to laypersons, but also to translation students or anyone learning about or researching in this field, and start a discussion about what we can do to change this. The better we understand the issue of bias in MT, the more we can focus on forming language in everyday life to be more inclusive and to naturally help create more inclusive datasets. Ideally, such naturally inclusive datasets can be fed to machines, to train them with naturally inclusive language, thus yielding unbiased translation outputs.

However, the process of shaping everyday language to become more inclusive will take time, as any change in society does. Until then, we aim to work with the data we have now and to balance this data to help machines produce less biased translations. To do so, we will create a community-driven database of cases of bias in machine translation that can currently be found “in the wild”. Online commercial MT systems are constantly developing alongside research and society. Even though there have been numerous attempts at providing unbiased outputs, the ideal solution has not yet been found. Our database aims to spur research in this field by providing a large sample of biased outputs from such online MT systems, which can be used to analyse the problem and to develop potential solutions.

Our Concept

To combine these two aspects — raising awareness and creating a database — we will be creating a public platform. During the hackathon weekend, we created a proof-of-concept platform using Wix to showcase our solution.

The DeBiasByUs “Learn” section.

To raise awareness about the topic of (gender) bias in MT, we will provide extensive information about the topic, its origins, its impact on society, and new research in the field. Anyone working in this field or interested in this topic can check out our website and find information.

Alongside providing context and information about the topic of bias in MT, we have created a landing page where anyone is welcome to add cases of bias in an MT output that they have encountered online.

Adding examples of bias in MT — DeBiasByUs.

This is done simply by adding the source sentence and source language, as well as the target language and the biased target sentence (as given by an online commercial MT system). Optionally, the user can type in an unbiased target sentence, as they would have preferred it. Bias categories (note: in the screenshot above, not yet changed from the hackathon weekend) can also be selected, and comments can be added. The checked and verified datasets will be open source and thus freely available for download.
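The fields described above can be sketched as a simple record type. The field names and the Hungarian example below are my own illustrative assumptions; the actual DeBiasByUs schema may look different.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class BiasReport:
    """One user-submitted case of bias in an MT output.

    Field names are illustrative, not the actual DeBiasByUs schema.
    """
    source_language: str
    source_sentence: str
    target_language: str
    biased_target_sentence: str  # as produced by the MT system
    # Optional: the unbiased version the user would have preferred.
    preferred_target_sentence: Optional[str] = None
    # Optional: bias categories and a free-text comment.
    bias_categories: List[str] = field(default_factory=list)
    comment: Optional[str] = None

# Example submission: Hungarian "ő" is gender-neutral, but the MT
# output picks a gendered English pronoun.
report = BiasReport(
    source_language="hu",
    source_sentence="ő egy orvos",
    target_language="en",
    biased_target_sentence="he is a doctor",
    preferred_target_sentence="they are a doctor",
    bias_categories=["gender"],
)
```

Keeping the preferred sentence, categories, and comment optional mirrors the submission flow: only the source/target pair is strictly required to report a case.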

Next up

Following up on the Goethe-Institut’s Artificially Correct Hackathon, we presented our DeBiasByUs project at the 23rd Annual Conference of the European Association for Machine Translation in 2022 and published our first paper. Together with Ghent University, we are currently developing our own website to serve as our platform, both to raise awareness and to collect and store the database. We are also in talks with the developer of Fairslator, Michal Měchura — whom you may already know from this article — to develop a plug-in that users can install on top of online commercial MT systems (such as those mentioned above) to directly add a target sentence containing bias to our database. The user may choose the potentially resolved version provided by Fairslator, or provide their own unbiased example. We aim to have a fully functioning platform and plug-in up and running by the end of the year, as well as a sample database ready for download.

This is a community-driven project and all your opinions and inputs matter. So please reach out to us if you would like to discuss or comment on our project. We would love to hear from as many of you as possible!


Janiça Hackenbuchner

Translator & researcher @TH Köln, interested in (bias in) machine translation and sports; (veg.) food aficionada