Bias in the System

“Grid distortion” by Marius Watz

These last few weeks we’ve been designing an AI application to help people learn new languages faster and more fluently. Our current concept centers on an app where real people who speak different languages talk with one another while an AI facilitates the conversation and ensures that both sides understand each other. Users would be matched based on the language they want to learn, and by conversing together both sides would gain practical conversation skills. The conversations happen asynchronously so that the AI assistant can work with each side, helping them understand the spoken word as well as craft a coherent reply. Conversation partners would periodically switch which language is being spoken, allowing each person to both learn and teach.
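As a rough illustration of that matching rule (a minimal sketch; the names and data shapes here are hypothetical, not our actual implementation), a pairing might simply look for two users whose native and target languages mirror each other, so that both can take turns teaching:

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    native: str   # language the user already speaks
    target: str   # language the user wants to learn

def is_match(a: User, b: User) -> bool:
    """Two users match when each speaks the language the other
    wants to learn, so both can alternate learning and teaching."""
    return a.native == b.target and b.native == a.target

alice = User("Alice", native="en", target="es")
bruno = User("Bruno", native="es", target="en")
print(is_match(alice, bruno))  # True
```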

For this application to be widely useful, several sources of bias and accessibility gaps need to be addressed. The core of the application is a system that can understand spoken language and convey its meaning to a user of another language. Our main concern is that the model will only be well trained on major language groups, so the application won’t be useful for people from minority language groups. Another important aspect is understanding differing accents and giving users feedback on correct pronunciation. This requires training the model on many different accent variations, and again some groups may not be well represented, so the application will struggle in those cases. Each user, depending on their background, learning style, and culture, will progress differently, and the system needs to be able to teach all of them effectively. If the system is not trained on a large set of people of varying ages, languages, cultures, and socio-economic backgrounds, it won’t be able to effectively teach or engage the user.
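One way to make that concern measurable (a minimal sketch, assuming we log per-group recognition outcomes; the group labels and data shape are illustrative) is to disaggregate the recognizer’s error rate by language or accent group and watch for gaps:

```python
from collections import defaultdict

def error_rate_by_group(samples):
    """Compute a simple error rate per accent/language group.

    `samples` is an iterable of (group, was_correct) pairs, e.g. the
    group label might be "English (Nigerian accent)". A large gap
    between groups suggests some accents are underrepresented in
    the training data.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, was_correct in samples:
        totals[group] += 1
        if not was_correct:
            errors[group] += 1
    return {g: errors[g] / totals[g] for g in totals}

# Toy example data, purely for illustration:
results = [
    ("es-MX", True), ("es-MX", True), ("es-MX", False),
    ("quechua", False), ("quechua", False), ("quechua", True),
]
print(error_rate_by_group(results))
# {'es-MX': 0.33..., 'quechua': 0.66...}
```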

Most of these challenges can be addressed by gathering the right data and being intentional throughout the design process. If, from the beginning, we set out to gather a diverse set of languages and include as many accent variations as possible, our model will have a strong base. On a personal level, this means that when we conduct research we need to seek out and speak with the people we ourselves have trouble understanding. They are the individuals the system is most likely to struggle with, and therefore the people who stand to benefit most from a tool that helps them be understood.

To improve the application continually as it is used, data from our users’ conversations could be used to further train the model. If the system is ever unsure of what a user has said, it can ask for a written version of the words so that it can add to its understanding. By gathering a diverse set of languages and accents, and building a system that understands how they are used in context, we could help improve natural language processing as a whole.
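A minimal sketch of that fallback (assuming a speech recognizer that reports a confidence score; the threshold, function names, and response shape here are hypothetical) might look like this:

```python
CONFIDENCE_THRESHOLD = 0.6  # hypothetical cutoff; would be tuned empirically

def handle_utterance(transcript: str, confidence: float) -> dict:
    """Decide whether to trust a transcription or ask the user to clarify.

    `transcript` and `confidence` stand in for the output of a speech
    recognizer; nothing here refers to a specific API.
    """
    if confidence >= CONFIDENCE_THRESHOLD:
        return {"action": "translate", "text": transcript}
    # Low confidence: ask the speaker to type what they said, and keep
    # the audio/text pair as a candidate training example.
    return {
        "action": "request_written_clarification",
        "prompt": "We couldn't quite catch that. Could you type what you said?",
    }

print(handle_utterance("hola, ¿cómo estás?", confidence=0.9)["action"])  # translate
print(handle_utterance("...", confidence=0.3)["action"])  # request_written_clarification
```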

It is impossible to create a system that works perfectly for all users; there will always be some bias in the system. Our goal is to reduce it as much as possible so that many people can become fluent in new languages and connect with one another. By helping people across the world understand one another, we can minimize the bias in ourselves and improve the world!
