Applying Machine Learning To User Research: 6 Machine Learning Methods To Yield User Experience Insights

Aaron Powers
athenahealth design
18 min read · Mar 6, 2018

by Aaron Powers, Sr. Manager of Experience Measurement, & Jennifer Cardello, Executive Director Of DesignOps

Data analytics is a hot topic, and nothing about it is more popular than machine learning. But how can user experience researchers lean on machine learning to test hypotheses and assumptions and understand more about users? While there are thousands of articles about machine learning, most of them focus on how machine learning can automate work. This article answers a very specific question: which machine learning methods can be used to answer specific user research questions?

Among the dozens of common machine learning techniques, we’ve zeroed in on 6 key algorithms that UX researchers can apply to achieve significant results. These machine learning algorithms are:

  1. Regression
  2. Decision Trees
  3. Clustering
  4. Association Rules
  5. Process Mining
  6. Dimensionality Reduction

These algorithms share 3 critical traits for deriving user research value:

  1. They have been used successfully to answer questions about users
  2. They produce human-understandable output
  3. They are appropriate for large data sets

This article won’t teach everything you need to know to technically implement these algorithms — instead, use it to understand the lay of the land and pick which algorithms might work for your problem. UX researchers have questions that data can help answer — these fall into three stages of UX problems:

  1. Discover: When your project is early on, you don’t know what you don’t know, and are open to anything — this is when “unsupervised learning” techniques will help you the most. These techniques help you to explore the data when you don’t have a right answer, want to learn more about your customers and then use that to shape your future goals.
  2. Testing Hypotheses: When your team has specific ideas and you’ve collected specific data around those ideas, you’re at the stage to test out your theories. You may have several competing ideas and want to find out which one makes the most sense. You would use these techniques when you’ve narrowed in on the problem. These machine learning algorithms are called “supervised learning”.
  3. Simplify Problems: Maybe your results are too complex for people to understand, or the problem just seems too big to pull it together. These techniques support the first two categories of machine learning. While these techniques don’t directly result in answered questions, they are key tools when applying machine learning to answer user research questions.

1. Discover: Questions & Algorithms

The algorithms used in discovery have one thing in common: there is no “goal” within the data. Instead, the goal is to look at a pile of existing data and extract insights in a bottom-up manner — learn about customers, group them together, and understand how they are already working.

While these techniques are best known when applied to “big” data sets with log data from millions of users, they’ve also been applied successfully to small datasets — each should seem eerily similar to the qualitative methods that extract the same kinds of insights.

Discover Algorithm #1: Clustering

Also Known As: Segmentation analysis
Qualitative Methods That Yield Similar Insights: Personas, roles & goals, affinitization
Use When: You have a set of data about people (from logs, or manually collected, e.g. in surveys), you don’t want to look at every individual customer row by hand, and you want to extract themes or groupings to form a bigger picture — e.g. if you hope to describe your customers in 4 groups. You may also segment your customers in order to use those segments to slice other data, e.g. to describe how behavior patterns differ between 4 groups of users. This can feed into qualitative methods such as personas or roles & goals.
Data Recommendations: >100 rows of data where each row represents a person or thing, 2 or more descriptive columns

Cluster analysis is most often used to segment & group users into a smaller set of groups — instead of working with 100 users who all seem unique, you can use clustering to sort them into a smaller number of groups (typically between 3–8 groups so it’s easy to present). If your goal is to create personas and you want to supplement your qualitative research, clustering is the best method to help you using bottom-up data: start from a data set with traits about users, cluster them together, and then label the groups and create personas out of them.

Clustering can be used on many other data sets; however, using it to segment people into several groups is the most common teaching example — it’s easy to understand why a marketer or designer would want to split users into a set of groups that behave similarly, not unlike cliques in high school.

It’s very common to use clustering on a data set with a lot of information about user behavior, and then use this to help pick a number of groups and look at traits. You might look at groups and create personas based off the groups identified in the data, for example “Sandra the socialite” (because people in that group use a lot of social media functions), “Ivan the introvert” (if users in that group don’t use a lot of social media), and so on.

Figure courtesy of Wikimedia Commons.

Types Of Clustering: There are two key types of cluster analysis that might be useful to UX research:

  1. k-means
  2. hierarchical clustering

These two work from different directions — with k-means you start by telling it how many groups you want to create (that’s the “k” in “k-means”). For hierarchical clustering you run the clustering analysis and then look at the traits of the groups to decide how many groups to pick.
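To make the mechanics concrete, here’s a minimal k-means sketch in pure Python — a toy implementation with made-up data and hand-picked starting centers (a real project would reach for a library such as scikit-learn’s KMeans):

```python
def kmeans(points, init_centers, iterations=20):
    """Toy k-means on 2D points: assign to nearest center, recompute centers, repeat."""
    centers = list(init_centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centers[i] = (
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                )
    return centers, clusters

# Hypothetical users described by two traits (say, logins per week and features used).
points = [(1, 2), (1, 1), (2, 2), (9, 9), (10, 8), (9, 10)]
centers, clusters = kmeans(points, init_centers=[points[0], points[3]])
```

In practice you’d run this for several values of k and pick the grouping that’s easiest to label and present.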

Further Reading: How To Make Personas More Scientific

Discover Algorithm #2: Association Rules

Also Known As: Market basket analysis, Shopping cart analysis
Qualitative Methods That Yield Similar Insights: Affinitization
Use When: You have a set of data about behaviors
Data Recommendations: >100 rows of data that could be arranged into a co-occurrence matrix, or arranged as a long list of 2 or more items that appeared together in the data

If you’ve ever shopped on Amazon or Netflix, you’ve experienced association rules in action — so much so that association rules are often called “shopping cart analysis”. Clustering is often used to group users together, and each user can only be in one group at a time; association rules are more appropriate when you want to look at which activities users do together, since users do many things. If your product has many different features used by different people, you might use association rules to ask, say, which two activities users are most likely to do one after another, or during the same session, or whether a single user ever uses the two together.

Amazon.com’s product recommendation engine is a well-known example of association rules, often called shopping cart analysis.
Netflix recommends movies based on what users have previously watched.

Types of Association Rules: First time users of association rules often start with the apriori algorithm — there are many different implementations of association rules, and apriori is a relatively easy one to understand and get started with. When you use this, the output is a sorted list of pairs — “if this then that”, e.g. using the Netflix recommendation pictured above, “if someone watched Inside Job, they’re most likely to have also watched The Big Short, and second most likely to have also watched Silicon Cowboys.”
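A minimal sketch of the counting behind association rules, in pure Python with invented session data — the apriori algorithm generalizes this from pairs to larger item sets, and in practice you’d use a library rather than roll your own:

```python
from itertools import combinations
from collections import Counter

# Hypothetical sessions: the set of features each user touched in one visit.
sessions = [
    {"search", "watch", "rate"},
    {"search", "watch"},
    {"search", "rate"},
    {"browse", "watch"},
    {"search", "watch", "browse"},
]

# How often each item, and each pair of items, appears across sessions.
item_counts = Counter(item for s in sessions for item in s)
pair_counts = Counter(frozenset(p) for s in sessions for p in combinations(sorted(s), 2))

def confidence(a, b):
    # Confidence of the rule "if A then B": P(B in session | A in session).
    return pair_counts[frozenset((a, b))] / item_counts[a]

# All "if this then that" rules, strongest first.
rules = sorted(
    ((a, b, confidence(a, b)) for a in item_counts for b in item_counts if a != b),
    key=lambda r: -r[2],
)
```

Sorting the rules by confidence gives exactly the “people who did X also did Y” list that recommendation engines surface.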

Further Reading: Association Rules and the Apriori Algorithm: A Tutorial, What is a co-occurrence matrix?

Discover Algorithm #3: Process Mining

Qualitative Methods That Yield Similar Insights: Workflow analysis, journey mapping
Use When: You have a mess of log data showing a sequence of events that users performed, and you want to extract workflows.
Data Recommendations: >100 rows of data from an event log. One column is often used to identify individual users, because a single user will have multiple events.

Process mining has a special, up-and-coming place within the UX analytics field: it’s the only machine learning algorithm that operates directly on untransformed event logs and analyzes workflows directly. Workflows are a big part of UX — there are many qualitative approaches to designing workflows. Process mining joins this set of techniques from the opposite direction — take a set of event logs and figure out what workflow users are actually doing.

This can be used to answer many questions about workflow, including:

  • What are people actually doing?
  • What’s the most common set of steps users do?
  • What steps are most common?
  • What’s the most common sequence of steps?
  • Which steps do users repeat?
  • What do users do before activity X or after activity X?
  • Where’s the slowest part of the workflow?

A screenshot from Disco, a process mining tool, showing a process derived from an event log. Figure courtesy of Fluxicon.
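At its core, a process-mining tool starts by building a “directly follows” graph from the event log — which activity immediately follows which, and how often. A toy sketch with invented log data:

```python
from collections import Counter, defaultdict

# Hypothetical event log: (user_id, activity), already sorted by timestamp per user.
log = [
    ("u1", "login"), ("u1", "search"), ("u1", "checkout"),
    ("u2", "login"), ("u2", "search"), ("u2", "search"), ("u2", "checkout"),
    ("u3", "login"), ("u3", "checkout"),
]

# Group events into one ordered trace per user.
traces = defaultdict(list)
for user, activity in log:
    traces[user].append(activity)

# Count "directly follows" transitions: activity A immediately followed by B.
transitions = Counter(
    (a, b) for trace in traces.values() for a, b in zip(trace, trace[1:])
)

most_common = transitions.most_common(3)
```

Tools like Disco and ProM build on these counts to lay out the workflow graph and let you filter it down to the most common paths.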

If you’ve been analyzing web logs, you may have used some versions of these kind of analytics — the field of process mining goes far beyond the capabilities we’ve seen built into analytics tools like Adobe Analytics.

One technique lets you use a slider to focus on just the most common activities, and then drag the slider to increase the number of activities shown to analyze more complex portions of the workflow. This single tool is very powerful, allowing you to pick the right level of depth for the workflow such that it can be visualized and presented, while yet being customized to not miss key activities.

Among machine learning algorithms, process mining is a bit different — it’s a newer field with a much stronger following in Europe than in the United States, and it’s only beginning to be included in lists and courses about machine learning.

Tools For Process Mining: An open source tool, ProM, has over 600 different algorithms that you can use to analyze these processes — while not all will be useful, there are many different angles that can shed light on those event logs that have been so hard to step through. For UX’ers, we have to make a special mention of Disco as a commercial leader — it has a highly simplified user interface that UX’ers will find comfortable and easy to explore.

Further Reading: Process Mining For Usability Tests, Agile Development With Software Process Mining, A process mining approach to measure how users interact with software

2. Test Hypotheses

These algorithms are all used when you have a goal in sight: a key variable you want to optimize or understand. You may have a key business metric, such as attrition, referral growth, or recurring visits, that you want to understand and break down into influences — e.g. so that you can discuss the lever that is most important to improve your user experience.

Some hypotheses are very general and can be tested using either type of algorithm — the one that makes the most sense or has the best prediction may be the one you use.

Personally, we find these tools particularly helpful when trying to model UX’s effects on business metrics — the thing we want to change is the UX or design, and we want to see how those differences benefit or hurt the business.

Test Hypotheses Algorithm #1: Regression

Qualitative Methods That Yield Similar Insights: Logical reasoning as to what causes what, e.g. root cause analysis
Use When: You have target metrics and you want to understand what might influence them. It’s particularly useful when you have many different factors that might be influencing the target metric.
Data Recommendations: >10 rows of data from an experiment or >50 rows from natural observation, starting with one column you want to understand, and any number of columns that you hypothesize might influence it.

Many UX’ers are surprised to find out that regression is considered a type of machine learning — after all, we’ve been using regression for >20 years and the term “machine learning” has become popular only more recently.

Regression is most often used to test a hypothesis that there is a causal relationship between variables — for example, let’s say you have a bunch of users and for each user you have these two variables:

  • Number of errors encountered
  • Did user make a purchase

If your hypothesis is that more errors encountered reduce the chances of users making a purchase, you can use logistic regression to test it. The regression analysis will tell you how well the data support or contradict the hypothesis.
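As an illustration, here’s a toy logistic regression fit by gradient descent on invented data — in practice you’d use a statistics library, which would also give you significance tests:

```python
import math

# Hypothetical data: errors each user encountered, and whether they purchased (1 = yes).
errors    = [0, 0, 1, 1, 2, 3, 4, 5, 6, 7]
purchased = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Fit P(purchase) = 1 / (1 + exp(-(b0 + b1 * errors))) by gradient descent.
b0, b1 = 0.0, 0.0
learning_rate = 0.1
for _ in range(5000):
    grad0 = grad1 = 0.0
    for x, y in zip(errors, purchased):
        p = 1 / (1 + math.exp(-(b0 + b1 * x)))  # predicted purchase probability
        grad0 += p - y
        grad1 += (p - y) * x
    b0 -= learning_rate * grad0 / len(errors)
    b1 -= learning_rate * grad1 / len(errors)

# A negative b1 means each extra error lowers the odds of purchase.
```

With this made-up data, the fitted curve predicts a high purchase probability at zero errors and a low one at seven.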

This example of linear regression shows a 2 dimensional set of data points and plots a line that can be used to illustrate the relationship between the two dimensions. Figure courtesy of Wikimedia Commons.

Types of Regression: There are three basic categories of regression analysis and two higher-level analyses:

  1. Logistic regression
  2. Linear regression
  3. Nonlinear regression
  4. Mediation (higher level analysis using regression)
  5. Most influential variables aka key driver analysis (higher level analysis using regression)

You’ll pick between the 3 basic types (logistic, linear, and nonlinear regression) depending on the variable you’re trying to predict and the complexity of relationships between the other variables. In the example above (relating number of errors to whether users converted) logistic regression is a solid choice, because there are only two possibilities: converted or not.

Regression is often used as a swiss army knife for user research — it can answer many questions about the relationship between variables, separate the effects of confounding variables, or discover trends and patterns in log/user tracking data. When there is a lot of data and you are trying to understand the relationship between multiple variables, regression is a great place to start.

Sometimes things get more complicated — you have a lot of variables, and you think there’s a chain of relationships between them. For example, perhaps your theory is that the age of your participants affects their access to mobile devices, and that it’s this access — not age itself — that drives how much they use your mobile product. To test this kind of chained hypothesis, use mediation analysis, for example:

Better teachers -> Smarter students -> Better grades -> Higher graduation rates

In this example, teachers aren’t the ones who graduate — so models that complete the chain between causes and effects will be more likely to explain the data well than a simple two-variable model.

Mediation is a complementary analysis to regression — extending the ability to build and model more complex relationships. This helps to reduce noise and is particularly helpful when you have many variables that you believe have a causal relationship.

Figure courtesy of Dan Soper, Wikimedia Commons.

Example research question:

“Which makes more sense: ease of use affects satisfaction which in turn affects loyalty, or would it make more sense to link ease of use directly to loyalty?”

To answer this question, you’ll use 3 separate regression models, comparing how the variables influence each other and how those influences compare between models. Basic mediation modelling techniques can be learned in a few hours using some of the resources below.
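The three-regression comparison can be sketched in a few lines of Python. This toy example uses invented survey numbers constructed so that satisfaction fully mediates the ease-of-use/loyalty relationship — with real data the direct effect would shrink rather than vanish, and you’d check significance with a statistics package:

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def slope(x, y):
    # Simple (one-predictor) regression coefficient of y on x.
    return cov(x, y) / cov(x, x)

def two_predictor_coeffs(x1, x2, y):
    # OLS coefficients for y ~ x1 + x2, via the normal equations.
    v1, v2, c12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    c1y, c2y = cov(x1, y), cov(x2, y)
    det = v1 * v2 - c12 ** 2
    return (c1y * v2 - c2y * c12) / det, (c2y * v1 - c1y * c12) / det

# Hypothetical ratings: satisfaction tracks ease of use, loyalty tracks satisfaction.
ease = [1, 2, 3, 4, 5, 6]
noise = [0.1, -0.2, 0.15, -0.1, 0.05, -0.05]
satisfaction = [2 * e + n for e, n in zip(ease, noise)]
loyalty = [3 * s for s in satisfaction]

total_effect = slope(ease, loyalty)        # ease -> loyalty, ignoring satisfaction
a_path = slope(ease, satisfaction)         # ease -> satisfaction
direct_effect, b_path = two_predictor_coeffs(ease, satisfaction, loyalty)
```

Here the direct effect of ease on loyalty drops to zero once satisfaction is in the model — the signature of full mediation.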

Further Reading: Regression analysis on Wikipedia, Mediation (statistics) on Wikipedia, How to identify the most influential variables on minitab.com, Key Driver Analysis By Jeff Sauro

Test Hypotheses Algorithm #2: Decision Trees

Qualitative Methods That Yield Similar Insights: Logical reasoning as to what causes what
Use When: You have target metrics and you want to understand what might influence them. It’s particularly useful when you have many different factors that might be influencing the target metric.
Data Recommendations: >10 rows of data from an experiment or >50 rows from natural observation, starting with one column you want to understand, and any number of columns that you hypothesize might influence it.

Decision trees are the next step after regression — many machine learning courses teach it next. It can sometimes be applied to similar types of problems — you can even use decision trees on some of the same data sets as regression to answer some of the same kinds of questions, sometimes finding the same results.

While regression is a great representation for linear relationships, decision trees are more useful when there are sharper breaking points in human decision making — hence the name: decision trees work better when there are turning points that change the relationships between variables.

A classic teaching example of decision trees uses a data set of Titanic survivors — the first decision point is gender. The following decision tree shows that 73% of women survived the Titanic sinking, but for men it was much more complicated (in the chart below, “sibsp” is a count of the number of siblings and spouses this person had).

A simple decision tree based on a Titanic survivor dataset. Figure courtesy of Stephen Milborrow, Wikimedia Commons.

Decision trees are often a good choice when you have a large number of variables and aren’t sure which ones are key or how they relate — the decision at the top of the model is always the variable that most affects the outcome. One way to think about when to apply decision trees instead of a regression model: when you have a large amount of data (e.g. log data collected over months or years) and you don’t know what factors matter, decision trees can help discover relationships, where regression is better for testing hypotheses. However, there are a lot of problems like the Titanic example above that can be modelled relatively well with either regression or decision trees — this one happens to be simpler to model and understand as a decision tree.

In UX, decision trees may be particularly helpful when you are modelling users who change their behavior significantly by demographic or group — for example, if college students use your software very differently from retired users, or if usage patterns vary by role, such as when your users include both engineers and elementary school teachers and you have a variable that tracks their age or role.

Types of Decision Trees: The two main types of decision trees:

  1. Classification Trees — when the variable you want to model is a class or categorical variable (e.g. male vs female, or survived vs died).
  2. Regression Trees — when the variable you want to model is a numeric variable. This combines decisions and regression relationships. It can model relationships similarly to a regression model that uses interactions.
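Under the hood, a classification tree repeatedly picks the split that best separates the outcome. Here’s a toy single-split finder using Gini impurity, on invented data — real libraries apply this recursively and add pruning:

```python
def gini(labels):
    # Gini impurity: chance that two randomly drawn labels disagree.
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(xs, ys):
    # Try every threshold; keep the one with the lowest weighted impurity.
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical data: user age vs. whether they used an "advanced" feature (1 = yes).
ages = [18, 20, 22, 24, 40, 45, 50, 60]
used = [1,  1,  1,  1,  0,  0,  0,  0]
threshold, impurity = best_split(ages, used)
```

With this data the tree finds a clean break at age 24 — exactly the kind of sharp turning point that regression’s straight lines struggle to capture.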

There are some warnings to consider when using decision trees. Just like in other models, it can be tempting to put together a decision tree with a hundred variables and get a great fit, but end up with an indecipherable model. If you’re trying to understand users, it may make more sense to start with just a few key variables and let the model grow only as you understand the relationships between them. It’s easy and tempting to “overfit” a decision tree and get something that is too difficult to understand or explain.

Further Reading: Decision Tree Learning on Wikipedia

3. Simplify Problems

These algorithms are often used to preprocess data before applying the above machine learning algorithms — strictly speaking, most aren’t machine learning techniques, but rather techniques that allow you to use machine learning when the data is messy.

Simplify Problems Algorithm #1: Dimensionality Reduction

Qualitative Methods That Yield Similar Insights: Logical reasoning as to what causes what
Use When: You have data with too many variables for people to understand, or too many variables to put into a machine learning algorithm. For example, if you have 10 variables that might mean almost exactly the same thing (as in a survey scale like SUS that has 10 questions but is reported as a single metric).
Data Recommendations: any data set that has a larger number of columns that can be collapsed

Dimensionality reduction can be useful when you have many different variables but you want to explain the data using a smaller number. Typically you would apply dimensionality reduction before applying another algorithm or visualization — e.g. dimensionality reduction is more of a technique to do data preprocessing but it rarely answers questions on its own.

We’ve heard of two ways dimensionality reduction has been used successfully in user research.

First, if you want to visualize data that has many dimensions — for example, groups of users — you can reduce the number of dimensions down to just two. Then, for example, you could run a clustering analysis using those two dimensions, and visualize the result of the cluster analysis in a 2D visualization, coloring each group.

A visualization of a 2D cluster analysis. Any number of dimensions can be visualized in 2D after using dimensionality reduction. Figure courtesy of Wikimedia Commons.

The second way would be if you wanted to rank order a group — for example, rank ordering the features used in a product, or error messages — but you have many variables measuring similar things. If you have many different variables that show how people use a feature or react to an error message, and there’s no obvious single winner, you can use dimensionality reduction to collapse those variables into a single one, and then sort by that variable — showing users just the top 10 items.

Types of Dimensionality Reduction: While there are many different kinds of dimensionality reduction, the algorithms we’ve heard used successfully on UX projects in this space include:

  1. t-SNE (t-distributed stochastic neighbor embedding)
  2. PCA (principal component analysis)
  3. Cronbach’s alpha (to decide whether survey scales can be combined by averaging)
  4. Convert to z-score or percentile, then average
  5. Factor analysis

The first two are machine learning techniques proper — the last three are not machine learning, but are often used by survey experts to achieve the same result of fewer dimensions. Cronbach’s alpha is most often used by people implementing surveys to decide whether multiple questions can be combined into a single scale. Converting to z-scores or percentiles may also be useful when combining measures of diverse forms — when your metrics all use very different scales, z-scores unify them by measuring each value in standard deviations from its metric’s mean, and percentile ranks by its position in the distribution. Both work well when you have all of the data — but note that if data is still streaming in, these scores will change over time (e.g. you run one study and the average is 5, then you run another and the average is 6 — the percentiles will shift, which could confuse your stakeholders when you report findings).
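A minimal sketch of the z-score approach, with invented metrics for four features — note the sign flips so that “larger is better” holds for every metric before averaging:

```python
import statistics

def z_scores(xs):
    # Standardize: how many standard deviations each value sits from the mean.
    mu = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]

# Hypothetical metrics on wildly different scales, one row per feature.
task_time_sec = [30, 45, 60, 90]   # lower is better
sus_score     = [85, 70, 60, 40]   # higher is better
error_count   = [1, 2, 4, 8]       # lower is better

# Flip the "lower is better" metrics, then average the three z-scores per feature.
combined = [
    (a + b + c) / 3
    for a, b, c in zip(
        [-z for z in z_scores(task_time_sec)],
        z_scores(sus_score),
        [-z for z in z_scores(error_count)],
    )
]
ranking = sorted(range(4), key=lambda i: -combined[i])  # best feature first
```

The single combined score gives a defensible rank ordering even though no one raw metric could be compared directly with the others.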

If you’re familiar with factor analysis, as UX’ers have sometimes used, you may find machine learning algorithms that do dimensionality reduction familiar, solving a similar problem in a different way — Sunil Ray has even argued that factor analysis is a type of dimensionality reduction.

Further Reading: Wikipedia Article on Dimensionality Reduction How to calculate a z-score by Jeff Sauro

Simplify Problems #2: Data Preparation

Also Known As: Data preprocessing, data munging, data wrangling, data cleaning, data prep
Qualitative Methods That Yield Similar Insights: Organizing notes before a meeting, or any other preparation you do to clean things up before another method is applied
Use When: Your data is complex, messy, or not organized in a way that can be just plugged into a machine learning algorithm.
Data Recommendations: any data set

Data scientists are often said to spend most of their time doing data preparation. In the world of “big” data where data is often collected in massive quantities and then reused for a different purpose, many analysts can find the data but it’s not organized the way we’d like. It can take hours to months to reorganize the data into a format that is ready to analyze.

This technique involves many sub-techniques, such as connecting to databases, importing data, filtering data, removing missing values, grouping and aggregation, joining, union/concatenation, import and export to and from data sources such as databases, & many more. There are whole job categories that deal just with data preparation topics, including ETL (Extract, Transform, Load), Data Warehousing, & Data Engineering.
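A tiny data preparation sketch using only the Python standard library — a hypothetical messy export is cleaned (missing values dropped, types converted) and aggregated into a per-user table ready for one of the algorithms above:

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw export: rows with missing durations and a missing user id.
raw = """user,feature,duration_sec
u1,search,12
u1,checkout,
u2,search,30
u2,search,18
,checkout,5
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Cleaning: drop rows missing a user id or a duration, and convert types.
clean = [
    {"user": r["user"], "feature": r["feature"], "duration": float(r["duration_sec"])}
    for r in rows
    if r["user"] and r["duration_sec"]
]

# Aggregation: total time per user, ready for a clustering or regression step.
totals = defaultdict(float)
for r in clean:
    totals[r["user"]] += r["duration"]
```

Even this toy pipeline shows the pattern: most of the work is deciding what counts as a bad row and what shape the next algorithm needs.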

Tools for Data Preparation: While there are a variety of tools for this that have been around for >30 years, and general purpose programming languages like MATLAB, R, & Python can be used, UX’ers are more likely to drift towards a new crop of tools designed for ease of use. These are often considered self-service data preparation tools, because they’re designed for a larger audience. Leaders in this market include Trifacta, KNIME, Alteryx, & Tableau’s Maestro — however there are dozens of additional products in this field and a lot of venture capital investment. We hope to see tools in this space become significantly easier to use over the next few years.

Further Reading: Gartner on Self Service Data Preparation, Bloor Data Preparation (self-service)

Machine Learning For UX Is An Evolving Space

While we’ve found 6 machine learning algorithms that have been successfully used by UX researchers, we believe there will be many more. These 6 are commonly taught outside the UX field, and the number of practitioners using these techniques is rapidly growing amidst the excitement of machine learning.

Special thanks to Sam Zaiss for suggesting Process Mining, to Clayton Stanley for suggesting Dimensionality Reduction, to Yiran Buckley for help with visual design, and to Kendall Eskine, Will Benson, and Jen Cardello for asking hard questions that couldn’t be answered in 30 seconds or less.

If you think there’s an 8th machine learning algorithm, or if there should only be 5, we’d like to hear from you — contact me directly or post comments or questions.

For more information on opportunities with athenahealth, please visit us at https://www.athenahealth.com/careers/design
