What I Learned Exploring Bias in My Company’s Technology

Recognizing and addressing bias is key to productivity in our modern world.

Maxwell Wulff
The Startup
12 min read · Nov 14, 2020


Art By: Fireside Tech Solutions

I’m the co-founder and CTO of Anu. Integral to Anu (which I’ll explain a little later) is understanding, in an equitable and scalable way, how committed, excited, and knowledgeable someone is about their business. The technology we use relies on Natural Language Processing (NLP) for this task. In this article, I’ll dive into one of the central challenges with deploying models like this in real business scenarios: bias.

Bias is a broad term that can be both a positive and a negative. As humans, we rely on our own biases every day to better understand the world. However, bias can be harmful when we jump to conclusions that automatically put someone at a disadvantage without understanding their unique circumstances [1, 2, 3].

With technology, we can try to reduce or eliminate human bias when judging someone for something important, whether that is for a job or, in Anu’s case, for access to professional services. However, bias is not just a human phenomenon; it is also a huge challenge in technology. Recognizing and addressing this bias is important for anyone who wants to bring complex technology to a business setting with real-world consequences.

What underlies bias in machine learning (ML)

Anu began in May of 2020, after my co-founder and I finished graduate school. When starting Anu, I was excited to bring the tech I learned about in my program to the business world. My background is in data science (primarily in manufacturing), and I went to graduate school to focus on the forefront of machine learning research at the time. I built on my prior knowledge of basic ML techniques and was able to dive into deep learning, neural networks, and all of their fascinating applications. Some of my favorites include deep fakes [4], adversarial examples crafted to confuse image recognition models [5], and speech/language recognition.

Speech, intent, and grammar are processes we take for granted on a daily basis. No matter what language you speak, language is hard-wired and second nature to humans [6]. Frankly, we tend to overlook how complex and intricate language actually is. Think about negations, sarcasm, regional dialects, and even just plain old confusing words. Set has 430 distinct senses in the English dictionary, and clip means both to cut something off and to fasten something on. Here’s a fun thread of more ways the English language can be weird.

Challenge of Understanding Sentence Meaning. From Bored Panda.

How can we expect machines that rely on logic to possibly understand someone who doesn’t have to think twice about what they’re saying?

Well, to put it briefly, we need incredibly complex, fascinating statistical models that take into account word order, distance between words in a sentence, dependency trees, parts of speech, and much more. Here are some great Medium articles that explain NLP from a high level [7, 8]. It’s important for this discussion to get a sense of how complex these models are: they can have hundreds of millions of parameters and take days to train [9, 10].
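To make that concrete, here is a tiny sketch using the open-source spaCy library (my illustration, not necessarily part of any production stack) that surfaces a little of the structure these models have to account for, like parts of speech and dependency relations:

```python
import spacy

# Assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("The chef didn't set the table before the guests arrived.")
for token in doc:
    # Surface form, part of speech, dependency label, and syntactic head.
    print(f"{token.text:10} {token.pos_:6} {token.dep_:10} head={token.head.text}")
```

Even this one sentence involves a negation, a verb with dozens of senses ("set"), and a subordinate clause, and the parser has to get all of it right before any higher-level model can reason about meaning.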

The bias-variance tradeoff

It’s also important to understand the bias-variance tradeoff, one of the most fundamental principles of data science, governing basically how all models are trained. As a model increases in complexity and is more finely tuned to one set of data, its bias (the error that comes from overly simple assumptions, i.e., how far its average prediction sits from the truth) goes down, but its variance (how much its predictions swing when the training data changes) goes up, and it tends to do worse on data it wasn’t trained on. Striking a balance between a model complex enough to capture the nuance of your data and simple enough to be widely applicable is at the heart of data science [11].

Example of Bias-Variance Tradeoff. Graphics by Author.
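To see the tradeoff in action, here is a minimal sketch (my own toy example, using NumPy) that fits polynomials of increasing degree to noisy data. Training error keeps falling as the model gets more complex, while test error eventually climbs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples from a smooth underlying function.
x_train = np.sort(rng.uniform(0, 1, 20))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, 20)
x_test = np.sort(rng.uniform(0, 1, 200))
y_test = np.sin(2 * np.pi * x_test)

for degree in (1, 4, 15):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # Degree 1 underfits (high bias); degree 15 chases the noise (high variance).
    print(f"degree {degree:2}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

Degree 1 is too simple to capture the curve, degree 15 memorizes the noise, and something in between generalizes best.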

In order to get passable results on an NLP task, models need to be incredibly finely tuned to the data they’re trained on, with an enormous number of parameters. A model trained with this specificity to a certain dataset can start to have complications when deployed in an uncontrolled, uncertain setting, facing data points unlike anything it has seen.

Messiness of language data

This really becomes a problem when we consider just how messy real-world language data is. When these models are trained, they need pristine, tightly controlled, processed, clean data. This is why a good portion of getting good results on an NLP task isn’t actually the architecture and training of the model, but preprocessing the data. We need a dataset that keeps as much of the underlying meaning as possible while being manageable enough to feed into a model and train in a reasonable amount of time [12, 13].

For example, in graduate school I had tasks with corpora of 100,000+ unique words that I had to clean and cut down to fewer than 10,000 tokens before passing them into my models. While big companies and research groups train on machines tens of thousands of times more powerful than my MacBook Pro [14], the problem scales with them. Just think about training on the 55 million Wikipedia articles, 184 million Yelp reviews, or 200 billion new Tweets every year.

Example of Raw Language Data
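For a flavor of what that preprocessing looks like, here is a minimal sketch in Python. This is my illustration of the general technique (lowercasing, stripping punctuation, and capping the vocabulary at the most frequent tokens), not Anu’s actual pipeline:

```python
import re
from collections import Counter

def build_vocab(texts, max_size=10_000):
    """Keep only the most frequent tokens; map everything else to <unk>."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    vocab = {"<unk>": 0}
    for word, _ in counts.most_common(max_size - 1):
        vocab[word] = len(vocab)
    return vocab

def encode(text, vocab):
    """Turn raw text into a list of token ids the model can consume."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in re.findall(r"[a-z']+", text.lower())]

vocab = build_vocab(["The staff were SO friendly!!", "Best pasta I've ever had."])
print(encode("The pasta was so so good", vocab))
```

Notice that even this simple regex silently drops accented characters, a small example of how preprocessing choices can quietly bake bias into a dataset.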

Making a good language dataset can take hundreds of hours of tedious person-power. Because of this barrier, once a dataset is created it is often shared and reused thousands of times for widely varying tasks. This concentrates an incredible amount of power in these few datasets to shape machine understanding of human language [15, 16, 17]. Check out some of the most common here.

As an illustration, let’s say I want to build a model to predict how positive or negative a restaurant review is. I’ll train the model on hand-labeled reviews. Due to language and time constraints, my team builds, labels, and cleans a dataset of 10,000 reviews of restaurants in the United States. I train my model, get satisfactory results, and am ready to ship. My business is international, however, and someone tries to use the algorithm on a restaurant review from Italy. They are confused because the results make no sense. The words, sentence structure, and grammar of an Italian review differ so much from those of an American review that an algorithm tailored so specifically to US English cannot perform well on something written in Italian.
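Here is a hypothetical version of that failure in code, using scikit-learn with a toy training set standing in for the 10,000 labeled US reviews:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for the labeled US review dataset (1 = positive, 0 = negative).
train_texts = [
    "the food was amazing and the staff were friendly",
    "terrible service and the food arrived cold",
    "loved the pasta, we will definitely be back",
    "overpriced, bland, and the waiter was rude",
]
train_labels = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# An Italian review: almost none of its words appear in the training
# vocabulary, so the TF-IDF vector is nearly empty and the predicted
# probabilities are essentially arbitrary.
print(model.predict_proba(["il cibo era fantastico, torneremo di sicuro"]))
```

The model isn’t “wrong” about Italian so much as blind to it: the features it learned simply don’t exist in the new input, which is exactly the confusion the international user would see.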

Where we can find bias in Anu

At Anu, we are changing the way people starting a business find professional services, disrupting traditional referral networks. Introductions to professionals like lawyers, accountants, and publishers rely almost entirely on referrals from professional networks. This is great for the professionals, because they trust the people in their network, but access to these networks can be limited by human bias. A professional network is usually composed of people with similar education, background, and interests, and when meeting someone who doesn’t fit those characteristics, people tend to be distrustful. Again, this is not necessarily a negative, but rather the way we are hardwired to think.

It is becoming vitally clear that recognizing and addressing this bias is essential to productivity in our modern world.

There are thousands of examples of people who overcame bias to become successful, from Copernicus persevering through bias against a heliocentric model of the solar system [18], to Arlan Hamilton building her venture fund while homeless [19], to ClassPass becoming a $600 million business when no one thought Payal Kadakia could do it [20].

Heliocentric Model From De revolutionibus orbium coelestium by Copernicus

Think of the countless others whose great ideas never reached their full potential because of bias. We are more connected than ever before, and with this connectivity, the power of bringing together people with different backgrounds, values, and ideas is becoming abundantly clear [21, 22, 23].

This is the opportunity my company is designed to tackle. Why can’t we introduce professionals and clients outside of their traditional referral networks, while using great technology to maintain the inherent level of trust? Our professionals will see clients they traditionally never would have interacted with before, and we will all but eliminate the human bias that has been baked into referral networks since their inception.

This is a gigantic task. Our beachhead is connecting young startups with their first lawyer. We chose this space because we saw for ourselves how strongly a professional network indicates success. We’ve seen countless entrepreneurs get shaky legal work (and suffer disastrous consequences down the road) just because they didn’t have competent, experienced startup lawyers in their professional network.

In the current iteration, we use two language models to determine how committed someone starting a business is to their idea. This is one step in our vetting mechanism: determining whether an entrepreneur is ready to speak to one of our partner legal professionals. Using NLP serves our goal of reducing human bias by not making determinations about clients on the basis of education, race, or background. However, it introduces the technological bias discussed earlier.

We didn’t have our own data to train our models on, so the datasets we used come from academic sources, which runs straight into the issues around recycling commonly used datasets. We use BERT as our primary architecture. Even though it is a consistent top performer on natural language tasks, it is an incredibly complex model (about 1GB worth of parameters) that is almost impossible to dive into and fully understand. This means all of the issues with NLP I talked about before apply directly to the models we use in our business. Darn.
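For a sense of scale, here is a quick sketch using the Hugging Face transformers library (our production setup may differ in detail) that loads a pretrained BERT and counts its parameters:

```python
from transformers import AutoModel

# bert-base has ~110M parameters; bert-large has ~340M, which is
# roughly 1.3 GB stored as 32-bit floats.
model = AutoModel.from_pretrained("bert-base-uncased")

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters (~{n_params * 4 / 1e9:.1f} GB as float32)")
```

A hundred million learned numbers is far beyond what any person can inspect by hand, which is why the biases inside a model like this are so hard to pin down.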

So, what are we doing about it?

There actually is a lot we can do about the bias in our technology. Here are a few steps I’ve taken so far.

Most importantly, we can be aware of the bias.

It seems simple, but knowing the bias is there and talking about it are really powerful ways to start addressing it. Spending some time digging into the possible sources of bias, and the places it can seep in, is a great way to think about how it might impact your business.

Secondly, we did a deep dive into the data we used to train our models. Especially because the data wasn’t our own, we spent time analyzing its sources, seeing how it was processed, and looking at the raw data itself. Since it was language data, we checked whether it was representative of our business case across important factors like race, education, location, languages, dialects, and more, and did our best to quantify the data in terms of these metrics.
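Here is a sketch of what that kind of audit can look like. The columns and values are hypothetical, purely to show the shape of the analysis, not our actual schema:

```python
import pandas as pd

# Hypothetical slice of a labeled training set with demographic metadata.
df = pd.DataFrame({
    "text": ["...", "...", "...", "...", "..."],
    "region": ["US-NE", "US-NE", "US-West", "ZA", "US-NE"],
    "native_language": ["en", "en", "en", "zu", "es"],
})

for col in ["region", "native_language"]:
    # Share of examples per group; underrepresented slices stand out fast.
    print(df[col].value_counts(normalize=True).round(2), "\n")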

Third, we tested. Since we score people’s written statements, I asked people from many different groups to write the same statement in their own words and compared the scores. We compared statements from a native speaker of a different language, someone from the other side of the country, and people of different ages, and we’re continuing to do this as much as we can, with different statements from many diverse people. This gave us a good understanding of how our models are biased toward certain ways of speaking and certain grammatical styles.

Below you can see the scores our models gave Anu’s mission statement, written in our own words by myself, a white man from the United States, and my co-founder, a black woman from South Africa. Although we are writing essentially the same statement, our model gives varied results. This is anecdotal evidence, so repeating experiments like this over and over is what produces a good sense of whether the models have biases.

Statements Scored By Our AI
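Here is the shape of that test in code. The scorer below uses a public sentiment model as a stand-in for our proprietary commitment model, and the rewrites are illustrative, not the actual statements we tested:

```python
from statistics import mean, stdev
from transformers import pipeline

# Stand-in scorer: a public sentiment model in place of our own model,
# just to show the structure of the probe.
scorer = pipeline("sentiment-analysis")

def score(text):
    out = scorer(text)[0]
    # Map labels to a single 0-1 score so outputs are comparable.
    return out["score"] if out["label"] == "POSITIVE" else 1 - out["score"]

# The same idea, rewritten in each person's own words (illustrative examples).
rewrites = {
    "writer A": "We connect founders with their first startup lawyer.",
    "writer B": "We help founders to find the first lawyer for their startup.",
    "writer C": "We hook new founders up with their first startup lawyer.",
}

scores = {who: score(text) for who, text in rewrites.items()}
print(scores)
# A wide spread on semantically identical statements is a red flag for bias.
print(f"mean={mean(scores.values()):.2f} stdev={stdev(scores.values()):.2f}")
```

Run across enough writers and enough statements, the spread stops being an anecdote and starts being a measurement.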

Finally, when we have the capacity, we will develop our own datasets. A dataset we build from the ground up will not only be the best data for our use case but will also be built with bias in mind. We will be cognizant of taking material from wide-ranging sources, from people of varied backgrounds, and in different mediums. At the end of the day, a model is only as biased as its data. Because these models are so complex, they will do a great job with whatever data you give them, so we need to make sure that data reflects the full audience of our business.

Final thoughts

Addressing bias is one of the central missions we have at Anu. As a team composed of different genders and races, we have benefited firsthand from bringing together different viewpoints, communities, and ways of thinking. We want to lead by example and promote these ideals in entrepreneurship, all while making access to professionals about the strength of a business and the founder’s passion, rather than their school or race. Our technology is created to directly combat the human biases that build these barriers. Concurrently combating the bias that exists in the technology itself will edge us closer to our goal.

Myself and My Co-Founder Tiyani Majoko. Photo by Willy Lin.

Bias is a problem in tech. This is not a new revelation; many thoughtful pieces have been written about it (here is one of my favorites). It is ironic that the more we rely on technology to reduce human bias, the more tech-based bias seeps into our company. This is a problem we will, in all likelihood, always be trying to solve. No matter how many biases we tediously address, dozens more pop up. But that doesn’t mean we can’t take steps to mitigate these biases, or at least be aware of them.

Like I said, the most important and easiest thing to do is simply be aware that your tech has bias, and start to think about it critically.

Check Out Anu Here

Further reading

  1. This is an awesome article touching on many similar concepts I talked about here in relation to AI chatbots.
  2. Here is an article talking about some of the real-world consequences of bias in technology.

References

  1. “Bias”. 2020. Psychology Today. https://www.psychologytoday.com/us/basics/bias.
  2. Suttie, Jill. 2019. “How To Work With The Bias In Your Brain”. Greater Good. https://greatergood.berkeley.edu/article/item/how_to_work_with_the_bias_in_your_brain.
  3. Amodio, David M. 2014. “The Neuroscience Of Prejudice And Stereotyping”. Nature Reviews Neuroscience 15 (10): 670–682. doi:10.1038/nrn3800.
  4. Porup, J.M. 2019. “Deepfake Videos: How And Why They Work — And What Is At Risk”. CSO Online. https://www.csoonline.com/article/3293002/deepfake-videos-how-and-why-they-work.html.
  5. Geng, Daniel. 2018. “Tricking Neural Networks: Create Your Own Adversarial Examples”. Medium. https://medium.com/@ml.at.berkeley/tricking-neural-networks-create-your-own-adversarial-examples-a61eb7620fd8.
  6. Northeastern University College of Science. “Our brains are hardwired for language.” ScienceDaily. www.sciencedaily.com/releases/2014/04/140417191620.htm
  7. Horev, Rani. 2018. “BERT Explained: State Of The Art Language Model For NLP”. Medium. https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270.
  8. Dass, Riti. 2018. “The Essential Guide To How NLP Works”. Medium. https://medium.com/@ritidass29/the-essential-guide-to-how-nlp-works-4d3bb23faf76.
  9. Bapna, Ankur, and Orhan Firat. 2019. “Exploring Massively Multilingual, Massive Neural Machine Translation”. Google AI Blog. https://ai.googleblog.com/2019/10/exploring-massively-multilingual.html.
  10. Toole, Jameson. 2019. “Deep Learning Has A Size Problem”. Medium. https://heartbeat.fritz.ai/deep-learning-has-a-size-problem-ea601304cd8.
  11. Fortmann-Roe, Scott. 2012. “Understanding The Bias-Variance Tradeoff”. Personal Blog. http://scott.fortmann-roe.com/docs/BiasVariance.html.
  12. “Text Preprocessing In Natural Language Processing Using Python”. 2019. Medium. https://towardsdatascience.com/text-preprocessing-in-natural-language-processing-using-python-6113ff5decd8.
  13. Woo, HoSung, JaMee Kim, and WonGyu Lee. 2020. “Validation Of Text Data Preprocessing Using A Neural Network Model”. Mathematical Problems In Engineering 2020: 1–9. doi:10.1155/2020/1958149.
  14. Tung, Liam. 2017. “GPU Killer: Google Reveals Just How Powerful Its TPU2 Chip Really Is | Zdnet”. Zdnet. https://www.zdnet.com/article/gpu-killer-google-reveals-just-how-powerful-its-tpu2-chip-really-is/
  15. Peirsman, Yves. 2019. “Dealing With Data Scarcity in Natural Language Processing.” Medium. https://medium.com/nlptown/dealing-with-data-scarcity-in-natural-language-processing-95ac035fa76b
  16. Gregory, Kathleen. 2020. “A Dataset Describing Data Discovery And Reuse Practices In Research”. Scientific Data 7 (1). doi:10.1038/s41597-020-0569-5.
  17. Tresoldi, Tiago. 2019. “Illustrating Linguistic Data Reuse: A Modest Database For Semantic Distance”. Computer-Assisted Language Comparison In Practice. https://calc.hypotheses.org/1980.
  18. Fitzpatrick, Richard. “Gravity: Historical Background,” 2006. http://farside.ph.utexas.edu/teaching/301/lectures/node151.html.
  19. Hamilton, Arlan, and Rachel L Nelson. 2020. It’s About Damn Time. 1st ed. New York.
  20. Gelles, David. “How Payal Kadakia Danced Her Way to a $600 Million Start-Up.” The New York Times. The New York Times, August 16, 2019. https://www.nytimes.com/2019/08/16/business/payal-kadakia-classpass-corner-office.html.
  21. Marshall, Melinda, Sylvia Hewlett, and Laura Sherbin. “How Diversity Can Drive Innovation.” Harvard Business Review, August 1, 2014. https://hbr.org/2013/12/how-diversity-can-drive-innovation.
  22. Levine, Stuart R. “Diversity Confirmed To Boost Innovation And Financial Results.” Forbes. Forbes Magazine, January 15, 2020. https://www.forbes.com/sites/forbesinsights/2020/01/15/diversity-confirmed-to-boost-innovation-and-financial-results/?sh=6cd9c45cc4a6.
  23. Vaze, Shrikant. “Why Diversity Is Necessary For Innovation At the Workplace.” Entrepreneur, April 17, 2020. https://www.entrepreneur.com/article/349419.
