Driven by data: the intersection of ML and UX — with Sam Gracie, CPO of WhyLabs.ai, MLUX Tech Talk

Zach Cohen
Machine Learning and UX
May 26, 2022
Machine learning and design Venn diagram with data in the middle.

As designers, we need to be data driven, but in practice, making use of much of the data at our fingertips can be challenging. How do you know which data is the right data to use, and whether it is high quality and trustworthy? Data scientists and engineers face these same problems when building and operating AI applications.

Sam Gracie from WhyLabs, a company looking to build the interface between humans and AI applications, discusses all things data and why developing data literacy is important for designers.

Why we all need data

Every AI and design practitioner needs data. Data is the intersection between human-centered design and machine learning: designers need data to improve the user experience, and AI practitioners need data to build and improve models, as well as to demonstrate ROI to business stakeholders and customers.

Whether we are looking for patterns in data to make predictions or analyzing data to generate insights that measure our designs, we all need to be comfortable with the processes, challenges, and formats that come with working with data.

Data understanding between qualitative and quantitative data.

Building your data understanding

Understanding data, and having access to it, can be the difference between a project failing and succeeding in the human-centered design process.

We can break data down into two buckets:

  1. Qualitative data: used in methods that generate and evaluate ideas
  2. Quantitative data: used to evaluate and analyze

Even more simply, qualitative data is non-technical while quantitative is technical.

A common refrain in business is “I’ve got an idea and just need the data to prove it…”, which often runs into obstacles when it comes to gathering the data: needing expensive analytics software, not knowing what type of data is being collected, or wondering whether the data is high quality. These obstacles show that working with data involves more than the data itself; there is an ecosystem around the data as well.

Here’s an example:

If we walked into a library to find a book, we would know exactly what to do: figure out which section the book is in, go to that section, find the author on the shelf, and then look for the title. Finding the data you want to work with is much harder, because it is often unclear where to start. This is a problem for AI practitioners as well as non-technical practitioners, although data scientists are usually more comfortable navigating the obstacles to getting at data because they do it regularly.

Let’s say you’ve found where your data lives and managed to get it somewhere you can access it easily. Before you can actually work with it, you need to get a sense of what shape the data is in. In most cases it will be in a structured format such as a table or CSV file, but it could also be semi-structured, such as JSON, or unstructured, such as images, video, text, or audio. It’s important to know the data format because it affects how you can work with the data.
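To make the point about formats concrete, here is a minimal Python sketch (standard library only) that loads the same two records once as CSV and once as JSON. The records and field names are hypothetical. It shows one practical difference: Python's `csv` module returns every value as a string, while JSON keeps numbers and booleans typed and allows nested values such as lists.

```python
import csv
import io
import json

# Hypothetical sample: the same two user records in two formats.
CSV_TEXT = """user_id,country,sessions,is_trial
1,US,14,false
2,DE,3,true
"""

JSON_TEXT = """[
  {"user_id": 1, "country": "US", "sessions": 14, "is_trial": false,
   "devices": ["ios", "web"]},
  {"user_id": 2, "country": "DE", "sessions": 3, "is_trial": true,
   "devices": ["android"]}
]"""


def load_csv(text: str) -> list[dict]:
    """Structured data: every row has the same columns,
    but csv.DictReader gives back every value as a string."""
    return list(csv.DictReader(io.StringIO(text)))


def load_json(text: str) -> list[dict]:
    """Semi-structured data: numbers and booleans stay typed,
    and a record can hold nested values such as lists."""
    return json.loads(text)


csv_rows = load_csv(CSV_TEXT)
json_rows = load_json(JSON_TEXT)

# "14" from the CSV is text and must be converted before doing arithmetic.
total_csv_sessions = sum(int(row["sessions"]) for row in csv_rows)
# 14 from the JSON is already an int.
total_json_sessions = sum(row["sessions"] for row in json_rows)

print(type(csv_rows[0]["sessions"]).__name__)   # str
print(type(json_rows[0]["sessions"]).__name__)  # int
print(total_csv_sessions, total_json_sessions)  # 17 17
```

The CSV path needs an explicit conversion step, and it is easy to get wrong: `bool("false")` is `True` in Python, so a text column like `is_trial` has to be parsed deliberately rather than simply cast.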

Once you understand the structure, the last step is to look at the data itself, assess its quality, and check for any issues that need fixing. Data quality issues can cause significant financial and productivity losses for a business, so it is imperative to fix them before starting any more sophisticated processing.
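A first look at data quality can be as simple as profiling each column. The sketch below is a hypothetical example in plain Python, not any particular tool's method: for each column it counts missing values, lists the types it sees, and counts distinct values, and it separately counts exact duplicate rows. The sample rows are made up to contain the kinds of problems discussed next.

```python
from collections import Counter

# Hypothetical survey export with a few deliberate problems.
ROWS = [
    {"user_id": 1, "plan": "Pro", "nps": 9},
    {"user_id": 2, "plan": "pro", "nps": "7"},    # casing differs; number stored as text
    {"user_id": 3, "plan": None, "nps": 10},      # missing plan
    {"user_id": 3, "plan": None, "nps": 10},      # exact duplicate of the row above
    {"user_id": 4, "plan": "Free", "nps": None},  # missing score
]


def profile(rows: list[dict]) -> dict[str, dict]:
    """Summarize each column: missing count, types seen, distinct values."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        report[col] = {
            "missing": len(values) - len(present),
            "types": sorted({type(v).__name__ for v in present}),
            "distinct": len(set(present)),
        }
    return report


def count_duplicate_rows(rows: list[dict]) -> int:
    """Count rows that exactly repeat an earlier row."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return sum(n - 1 for n in counts.values())


report = profile(ROWS)
dupes = count_duplicate_rows(ROWS)
for column, stats in report.items():
    print(column, stats)
print("duplicate rows:", dupes)
# nps {'missing': 1, 'types': ['int', 'str'], 'distinct': 3}
# plan {'missing': 2, 'types': ['str'], 'distinct': 3}
# user_id {'missing': 0, 'types': ['int'], 'distinct': 4}
# duplicate rows: 1
```

Even on five rows, the profile surfaces missing values, an `nps` column that mixes integers and strings, a `plan` column with three distinct values where “Pro” and “pro” are really the same plan, and a repeated row.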

Data quality issues… and what to do about them

Data quality is a problem for everyone. Higher-quality data enables higher-quality decisions, and luckily there are ways to make sure your data is good enough before you start working with it. Here are some things to consider:

  1. Remove irrelevant values: be careful when deciding what’s irrelevant, as correlated values may need to be checked later
  2. Remove duplicate values: is the data combined from multiple sources? Merging sources is a common way for the same record to appear twice
  3. Resolve typos and similar errors: models treat every distinct value as different, and strings depend heavily on spelling and casing, so “New York” and “new york” count as two separate values
  4. Resolve data type issues: data types can be strings, numbers, booleans, fractions, or nulls, and each column should use one type consistently across the data set
  5. Address any missing values: if a column is missing too many values, either remove the column entirely or impute (fill in) the missing values
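Here is one way the checklist above could look in code: a minimal sketch in plain Python, assuming small in-memory records with hypothetical column names. It applies the steps in a slightly different order, resolving typos and types before removing duplicates, because two copies of the same record often differ only in casing or in `"14"` versus `14`. Title-casing as the canonical text form, the 50% missing-value cutoff, and filling gaps with the low median (`statistics.median_low`, which always returns an observed value and so keeps integer columns integer) are illustrative choices, not rules from the talk.

```python
import statistics

# Hypothetical records merged from two sources.
RAW = [
    {"user_id": "1", "city": "New York ", "sessions": "14", "internal_note": "a"},
    {"user_id": 1, "city": "new york", "sessions": 14, "internal_note": "a"},
    {"user_id": "2", "city": "boston", "sessions": 3, "internal_note": ""},
    {"user_id": "3", "city": "Chicago", "sessions": None, "internal_note": ""},
]

IRRELEVANT = {"internal_note"}  # judged irrelevant for this analysis (assumption)
TEXT_COLUMNS = {"city"}
INT_COLUMNS = {"user_id", "sessions"}
MAX_MISSING_FRACTION = 0.5      # drop a column if more than half of it is missing


def clean(rows: list[dict]) -> list[dict]:
    # 1. Remove irrelevant columns.
    rows = [{k: v for k, v in r.items() if k not in IRRELEVANT} for r in rows]

    # 3. Resolve typos and casing: collapse whitespace, use one casing style.
    # 4. Resolve type issues: every value in an int column becomes an int.
    normalized = []
    for r in rows:
        row = dict(r)
        for col in TEXT_COLUMNS:
            if row.get(col) is not None:
                row[col] = " ".join(str(row[col]).split()).title()
        for col in INT_COLUMNS:
            if row.get(col) is not None:
                row[col] = int(row[col])
        normalized.append(row)

    # 2. Remove duplicates, after normalizing so near-copies compare equal.
    seen, unique = set(), []
    for row in normalized:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # 5. Missing values: drop mostly-empty columns, impute int columns.
    columns = {k for row in unique for k in row}
    for col in sorted(columns):
        present = [row[col] for row in unique if row.get(col) is not None]
        if len(present) < (1 - MAX_MISSING_FRACTION) * len(unique):
            for row in unique:
                row.pop(col, None)
        elif len(present) < len(unique) and col in INT_COLUMNS:
            fill = statistics.median_low(present)
            for row in unique:
                if row.get(col) is None:
                    row[col] = fill
    return unique


cleaned = clean(RAW)
for row in cleaned:
    print(row)
# {'user_id': 1, 'city': 'New York', 'sessions': 14}
# {'user_id': 2, 'city': 'Boston', 'sessions': 3}
# {'user_id': 3, 'city': 'Chicago', 'sessions': 3}
```

The two “New York” records only collapse into one because typos and types were resolved first; deduplicating the raw rows would have kept both.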

How can we keep an eye on data quality without a lot of manual work? One way is to monitor the data as it flows through the ML system and set up alerts that fire when issues arise.

WhyLabs being used to generate alerts when issues in the data pipeline arise.
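The monitoring idea can be sketched as a check that runs on every incoming batch and compares it with a baseline. This is a simplified, hypothetical illustration in plain Python, not WhyLabs’ implementation or API: the thresholds, the `Alert` type, and the `check_batch` function are all assumptions. It flags a column when its missing-value rate is too high or when its mean has shifted too far from the baseline.

```python
from dataclasses import dataclass

# Hypothetical thresholds; sensible values depend on the data and the model.
MAX_MISSING_RATE = 0.10  # alert if more than 10% of a column is missing
MAX_MEAN_SHIFT = 0.25    # alert if the mean moves more than 25% from baseline


@dataclass
class Alert:
    batch_id: str
    column: str
    message: str


def check_batch(batch_id: str, rows: list[dict],
                baseline_means: dict[str, float],
                columns: list[str]) -> list[Alert]:
    """Compare one batch of numeric records against baseline statistics."""
    alerts = []
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        missing_rate = 1 - len(present) / len(values) if values else 0.0
        if missing_rate > MAX_MISSING_RATE:
            alerts.append(Alert(batch_id, col, f"missing rate {missing_rate:.0%}"))
        base = baseline_means.get(col)
        # Skip the drift check when there is no usable baseline (absent or zero).
        if present and base:
            mean = sum(present) / len(present)
            if abs(mean - base) / abs(base) > MAX_MEAN_SHIFT:
                alerts.append(Alert(batch_id, col,
                                    f"mean moved from {base:.1f} to {mean:.1f}"))
    return alerts


# Usage with made-up batches: the second one has a gap and a jump in values.
baseline = {"sessions": 10.0}
batches = {
    "batch-1": [{"sessions": 9}, {"sessions": 11}, {"sessions": 10}],
    "batch-2": [{"sessions": 30}, {"sessions": None}, {"sessions": 24}],
}
alerts = [alert
          for batch_id, rows in batches.items()
          for alert in check_batch(batch_id, rows, baseline, ["sessions"])]
for alert in alerts:
    print(alert)  # stand-in for a real notification channel
```

In a real pipeline the alerts would be routed to wherever a team already looks, such as chat or email, and thresholds would be tuned per column; a dedicated monitoring tool also tracks far more than missing rates and means.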

Opportunities driven by data

Data-driven practitioners have a world of opportunities. Businesses are continuing to invest in building AI systems, which means there is a growing need for data scientists, ML engineers, and data engineers to build and operate ML applications responsibly. The software and tooling used to build and operate these ML systems is just as important: these tools should let every practitioner do their job with minimal friction, and they should also act as a bridge across the whole organization. They should provide transparency into how AI systems behave, to help ensure those systems are operated responsibly.

Because businesses are making significant investments in machine learning, data scientists are not the only ones with a large part to play: the designers, researchers, software engineers, and product managers who build these ML tools have one too. This creates opportunities to work on some of the toughest design and engineering challenges, because designing these systems well requires both empathy for your users and an understanding of the data they work with. The common factor across all of these tools and processes is that they are driven by data.

Watch Sam’s talk on our YouTube channel:

About the Machine Learning and User Experience (“MLUX”) Meetup

We’re excited about creating a future of human-centered smart products, and we believe the first step is to bring UX and Data Science/Machine Learning folks together to learn from each other at regular meetups, tech talks, panels, and events (held remotely).

Interested in learning more? Join our meetup, be the first in the know about our events by joining our mailing list, watch past events on our YouTube channel, and follow us on Twitter (@mluxmeetup) and LinkedIn.
