Design Inclusive Machine Learning

Design and engineer systems that truly bring value to people’s lives

“Alexa, play Explosions in the Sky from Spotify.”

“Alexa, what’s the weather now?”

“Alexa, wake me up at 7:30.”

Since getting an Amazon Echo Dot, aka Alexa, I have talked to her almost daily. Our conversations are usually simple and task-oriented because, well, she only knows basic commands and depends heavily on the skills she knows.

However, the fact she is able to execute my asks, as basic as they are, makes her a superior virtual assistant. To be fair, I haven’t had the chance to compare her with other assistants besides Siri, which I now rarely interact with despite owning several Apple products. The reason I avoid Apple’s virtual assistant is that Siri has a hard time understanding me — or, more accurately, my accent.

Recognizing human intention

Virtual assistants, like many other digital products we use, are a type of machine learning application. Tons and tons of data are fed into such an application to help it see, hear and understand the world. Much like how we learn to recognize spoken words, a virtual assistant that has to recognize speech in a wide range of contexts needs to be trained on numerous audio samples that vary in speed, tone, pitch, volume, and pronunciation.

Failing to properly train virtual assistants before shipping will very likely aggravate users or, worse, make them lose trust in speech recognition altogether. This is exactly what happened to me five years ago, when I first encountered Siri on my then-brand-new iPhone 4S. Oftentimes, the words Siri detected from my speech were rarely accurate, and I felt almost ashamed of not being understood by Siri.

In this case, understanding every user is the crucial first impression the machine has to make.

Evolving as human-machine interaction increases

To better understand the specific audience a virtual assistant is interacting with, a robust feedback loop needs to be in place to allow it to continue taking in new inputs and evolving as human-machine interaction increases.

Amazon claims that since Alexa, the brain behind Echo, is built in the cloud, it’s always getting smarter. In other words, the more we use Echo, the more it adapts to our speech patterns, vocabulary, and personal preferences.

Though I don’t have concrete evidence that Alexa is adapting to how I speak, I do feel she is getting better at understanding me.

For example, the first time I asked Alexa “When is sunset”, she didn’t recognize it and interpreted the sentence as “Why is sunset”.

I guess she was confused by my “h” sound in “When”, which, in theory, should be silent.

The second time, without changing my pronunciation, I put more emphasis on the word “When”. For some reason, she recognized my intention and gave me a proper response: “Sunset is at 8:19 p.m.”

The third time and beyond, even when I followed the same speech pattern as my first attempt, with no word emphasized and a clear “h” in “When”, she got it, and told me the time of sunset.

The role of design in machine learning

Anecdotes like these illuminate the role of user experience design in machine learning. However, without a strong technical background, how do designers approach machine learning?

A few months ago, I worked on a client project that required some understanding of machine learning. During this project, I put a lot of effort into researching high-level machine learning concepts and found myself enjoying the process of learning them.

What got me interested in machine learning is that it reflects something we designers are all so familiar with — the human-centered design (HCD) process. At a high level, machine learning and HCD both go through these three stages: Define, Ideate, and Implement.

Let’s look at their similarities with a simple machine learning example: spam email detection.

Machine learning workflow for spam detection


The goal of spam detection is to prevent users from being bombarded or harmed by irrelevant information. To solve this problem, we need a labeled training data set that contains enough samples of spam and non-spam emails.

Once collected, these samples will then be pre-processed and cleaned to ensure data quality.
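To make the cleaning step concrete, here is a minimal sketch in Python. The sample emails, the "spam" and "ham" label names, and the helper functions are all invented for illustration. Real pipelines do far more, but the idea is the same: normalize the text, then drop samples that are empty, duplicated, or labeled with something unusable.

```python
import re

# Hypothetical labeled samples: (raw email text, label). Invented for illustration.
RAW_SAMPLES = [
    ("WIN a FREE prize now!!!", "spam"),
    ("win a free prize now", "spam"),        # duplicate once normalized
    ("Lunch at noon tomorrow?", "ham"),
    ("   ", "ham"),                          # empty after cleaning
    ("Meeting notes attached", "unknown"),   # unusable label
]

VALID_LABELS = {"spam", "ham"}


def normalize(text):
    """Lowercase the text and keep only runs of letters, digits and apostrophes."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def clean(samples):
    """Drop samples with bad labels, empty text, or duplicated normalized text."""
    seen = set()
    cleaned = []
    for text, label in samples:
        norm = normalize(text)
        if label not in VALID_LABELS or not norm or norm in seen:
            continue
        seen.add(norm)
        cleaned.append((norm, label))
    return cleaned


CLEANED = clean(RAW_SAMPLES)
```

Of the five raw samples, only two survive: one spam email and one non-spam email.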

The next steps are to identify the relevant aspects of the data and to select or create models that fit the problem we are trying to solve. We look for features, i.e. data properties, that help determine whether an email is spam, such as email title, content, sender, receiver, etc., and use them to train a classification model, such as Naive Bayes.
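As a rough sketch of what training a Naive Bayes classifier can look like, the Python below counts how often each word appears in spam and non-spam emails, then scores a new email by combining those word frequencies. It is a simplification under stated assumptions: the training emails are invented, only the words in the email content are used as features (title, sender and receiver are left out), and add-one smoothing is my choice for handling words a label has never seen.

```python
import math
import re
from collections import Counter

# Hypothetical training emails, invented for illustration.
TRAIN = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("lunch at noon tomorrow", "ham"),
    ("project meeting notes attached", "ham"),
]


def tokenize(text):
    """Split text into lowercase word tokens (the features in this sketch)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(samples):
    """Count how often each word appears under each label."""
    label_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(n / total)                # prior for this label
        for word in tokenize(text):
            if word in vocab:                      # skip words never seen in training
                score += math.log((counts[word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


MODEL = train(TRAIN)
```

With this toy data, `predict(MODEL, "claim your free prize")` comes out as spam and `predict(MODEL, "meeting at noon")` as non-spam, simply because of which words appeared under which label during training.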


After training, we will validate the model’s performance and accuracy on a held-out validation set, i.e. labeled emails the model has not seen during training, and iterate on model selection or creation based on the result.
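Validation boils down to comparing what the model predicted with what the labels actually say. The sketch below uses made-up predictions and labels, and the choice of metrics is mine: it reports accuracy alongside precision (how many flagged emails were really spam) and recall (how much of the spam was caught), since accuracy alone can hide a model that misses most spam.

```python
def evaluate(predictions, labels, positive="spam"):
    """Compare predicted labels with true labels from a held-out validation set."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == positive and y == positive)
    fp = sum(1 for p, y in pairs if p == positive and y != positive)
    fn = sum(1 for p, y in pairs if p != positive and y == positive)
    correct = sum(1 for p, y in pairs if p == y)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical validation results, invented for illustration.
TRUE_LABELS = ["spam", "spam", "ham", "ham", "ham"]
PREDICTED = ["spam", "ham", "ham", "spam", "ham"]
METRICS = evaluate(PREDICTED, TRUE_LABELS)
```

Here three of five emails are classified correctly, but only half of the spam is caught and half of the flagged emails are false alarms, which is the kind of result that would send us back to rethink features or the model.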


Once the model is ready to go, we will deploy it to real-world applications, and start using it to predict and classify new emails. These new emails will then become new data input to the system.
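One way to picture new emails becoming new data is a small feedback loop: every prediction is stored, and when a user says the prediction was wrong, the stored label is corrected so that the next round of training can learn from it. The class, the method names, and the keyword-based stand-in classifier below are all hypothetical, a sketch of the idea rather than how any real email product works.

```python
class FeedbackLoop:
    """Collects deployed predictions and user corrections as new labeled data."""

    def __init__(self, classify):
        self.classify = classify   # any function mapping email text to a label
        self.new_samples = []      # (text, label) pairs for the next retraining

    def handle(self, text):
        """Classify an incoming email and remember it as a new sample."""
        label = self.classify(text)
        self.new_samples.append((text, label))
        return label

    def correct(self, text, true_label):
        """Replace the stored label when a user marks a prediction as wrong."""
        self.new_samples = [
            (t, true_label if t == text else lbl) for t, lbl in self.new_samples
        ]


def keyword_classifier(text):
    # Stand-in for a trained model: flags any email mentioning "free".
    return "spam" if "free" in text.lower() else "ham"


loop = FeedbackLoop(keyword_classifier)
first = loop.handle("Free parking passes for the team offsite")
loop.correct("Free parking passes for the team offsite", "ham")
```

The stand-in model wrongly flags the parking email as spam, and the user's correction turns that mistake into a correctly labeled sample for retraining.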

Asking the right questions

In this workflow example, we can see that machine learning, just like HCD, starts with asking why we are doing this, who it serves, what the user is trying to do, which contexts matter, and what success looks like (Define). It then moves on to figuring out how we are going to do this (Ideate) and what we can learn from real-world use (Implement).

If we do not address these questions properly, especially those at the Define stage, where we decide what and whom the machine is learning from, we will end up with biased systems that make some people feel that they, or their experiences, don’t matter.

Is dark skin not normal to cameras? Shirley cards (named after the original model, a Caucasian woman with a light skin tone and blue eyes) set photography’s skin-tone standard for many years.

Is a person with a different accent not the user? Switchboard — a collection of roughly 2,400 telephone conversations that skewed toward a narrow set of American accents — is a benchmark for the models used in numerous voice recognition systems.
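A simple way to see whether a system leaves some users behind is to break its results down by group instead of reporting a single overall number. The sketch below does this for speech recognition; the group names and the results are entirely made up for illustration and do not describe any real product or dataset.

```python
from collections import defaultdict

# Hypothetical evaluation records, invented for illustration:
# (speaker group, was the utterance recognized correctly?)
RESULTS = [
    ("accent_a", True), ("accent_a", True), ("accent_a", True), ("accent_a", False),
    ("accent_b", True), ("accent_b", False), ("accent_b", False), ("accent_b", False),
]


def accuracy_by_group(results):
    """Return the share of correctly recognized utterances for each group."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, ok in results:
        totals[group] += 1
        correct[group] += int(ok)
    return {group: correct[group] / totals[group] for group in totals}


BY_GROUP = accuracy_by_group(RESULTS)
OVERALL = sum(ok for _, ok in RESULTS) / len(RESULTS)
```

In this invented example the overall rate is 50 percent, which hides the fact that one group is understood three times out of four and the other only once out of four.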

The HCD process, when done well, is an inclusive approach to problem solving. By asking and answering these questions, the UX community has helped make the digital world a much more pleasant place to be. This is why we need designers and researchers at the machine learning table, along with other professionals, to design and engineer systems that truly bring value to people’s lives.

Inclusive machine learning

As Kat Holmes, the founder of KATA and formerly Microsoft’s Principal Director for Inclusive Design, puts it:

Thinking about machine learning, data-driven decision-making, conversational interfaces and artificial intelligence…we need design in those conversations, we need designers in the decisions that are made around all those technologies…machine that’s learning, well, who’s it learning from and who designed it to learn in what way? Do they design it to learn in the way that they learn? Do they process with their own biases into the design and is that AI?

With progress in data collection and algorithms, as well as our growing awareness of design’s role in machine learning, I’m hopeful that future technologies will be more inclusive, rather than favoring certain skin tones, accents, abilities, resources, and contexts. And maybe, just maybe, this won’t happen to me again:

“Siri, what is machine learning?”

“Here’s what I found on the web for ‘What is mushroom landing.’”

That was a good laugh, though.

Special thanks to Laura Mattis, Lilian Qian, Mick McGee, Rawan AbuShaban, Wan-Ting Huang, Yalu Ye and Zach Herring for their feedback and support! Also, thank you Adam Geitgey for writing an awesome series about machine learning — I learned a lot about machine learning from your articles 🙂