The Role of Design in Machine Learning

Owen Schoppe
Salesforce Designer
Aug 4, 2020


Today, machine learning (ML) is a component of practically all new software products. Designers sometimes ask, “What is the role of design in machine learning?” How can designers engage in the process of creating a machine-learning-powered product? We’ve been working on Salesforce’s new Einstein Designer tool, which automatically generates design variations to improve UX. While this project was specifically focused on ML for design, what we learned is more broadly applicable. During the Einstein Designer project, our team of hybrid designers/developers/data scientists discovered new ways of incorporating design techniques into ML projects.

Overview of the machine learning development process

When you’re working with machine learning, the traditional functions of design (crafting a product vision and communicating with stakeholders) apply, but ML also brings new factors to the table. This article explores how design techniques can be applied in ML development. At base, it’s all about data: getting it (a lot of it!), cleaning it, understanding it, and ultimately building software on top of it. The process goes something like this:

  1. Collecting data
  2. Visualizing and cleaning that data
  3. Creating models and algorithms
  4. Evaluating those models and algorithms

Collecting Data

Design can contribute to data collection in several ways; the most direct is designing data collection tools. Think of this as a subset of product design, with a few key distinctions. Data collection tools are usually internal-facing, with the product team as the user. Productivity is key, and the tool needs to evolve along with the team. Data collection is also part of the discovery process. Labels and visualizations have a huge impact on team conversations and direction, and clear and effective UIs are crucial.

In an ideal world, development teams have access to large, clean, public datasets, but we often need to collect our own data. This part of the process is rarely published or discussed; most teams create their own solutions, which adds time and cost. Sometimes you’ll have to log data over a period of time. Other times, the raw bytes exist but must be gathered into a unified dataset, which makes collection tools extremely valuable.

To figure out what features you need in a collection tool, view the problem through a product-design lens. For Einstein Designer, for example, we built a tool that reads a webpage, takes a screenshot, and uses the information in it to create a high-level preview of the site’s design system. We’ve run this process on many sites and repeat it as we learn more about what information is important, so a tool that makes it more efficient is critical.

Some design contributions are more elemental. Take button and section labels, which often include critical terminology. Making them clear and consistent helps clarify design problems. In our data collection tool, we labeled one of the matrices and the list of individual text styles “palette,” which helped team members understand the scope of that term and differentiated it from other key concepts.

Applying clear, consistent labels to tool elements

Data collected for machine learning is often unlabeled, lacking descriptors telling how to categorize it. For example, we might have a pile of images, but no descriptors about what they contain — e.g., cats or dogs. Often, humans must label the data.

To expedite this process for Einstein Designer, we built a game-like experience in which human users were challenged to classify product tile designs as good or bad, using keyboard arrow keys. Design quality was then defined by an image’s aggregate score. As designers, we contributed to this process by creating a clear, functional UI and making the evaluation process as seamless as possible. The simpler the interaction, the more images users could evaluate. In this case, speed also forced users to make intuitive judgments. Since consumer users will also be making split-second decisions, this helped create analogous data.

Product tile classification training game
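
To make the idea of an aggregate score concrete, here is a minimal Python sketch. It assumes each vote is recorded as an (image id, good or bad) pair and that the aggregate is simply the fraction of "good" votes; the function name, the sample ids, and the scoring rule are illustrative assumptions, not the scoring used in Einstein Designer.

```python
from collections import defaultdict


def aggregate_scores(votes):
    """Return {image_id: fraction of 'good' votes} from (image_id, is_good) pairs.

    The fraction-of-good-votes rule is an assumption made for illustration.
    """
    good = defaultdict(int)
    total = defaultdict(int)
    for image_id, is_good in votes:
        total[image_id] += 1
        if is_good:
            good[image_id] += 1
    return {image_id: good[image_id] / total[image_id] for image_id in total}


# Hypothetical votes: True = "good" arrow key, False = "bad" arrow key.
votes = [
    ("tile_a", True), ("tile_a", True), ("tile_a", False),
    ("tile_b", False), ("tile_b", False),
]
scores = aggregate_scores(votes)
```

A score near 1 means most reviewers judged the tile good, and a score near 0 means most judged it bad. Scores near 0.5 flag designs that reviewers disagree on.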

Another exciting area of machine learning is synthetic data. Constructed to reflect expectations for real data, it’s used to bootstrap ML models before you have access to a large real dataset. For Einstein Designer, where we’re training models to understand design, we’re lucky to have Sketch, a powerful tool for generating new designs that can provide both an image of the generated design and a JSON representation that closely mirrors collected web data. We first train models on synthetic Sketch data, then later augment those same models with real data from the web.

The Sketch app interface showing synthetic data

We used Sketch to create 50 design templates, making heavy use of symbols and auto-layouts so they’d still look good as the data changed, a process much like real web design engineering. Next, we used Sketch’s data features to wire up the templates with custom values. For each template we created good and bad versions, making the bad version by breaking the design in some specific way, such as overlapping or misaligned text. We then cloned each version 25 times to generate 2,500 labeled examples (50 × 2 × 25 = 2,500). Each clone with attached Sketch data creates a unique example, resulting in a balanced dataset. Finally, we systematically labeled all layers.
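
The arithmetic behind the dataset size, and why the result is balanced, can be checked with a short Python sketch. It only enumerates the combinations of template, label, and clone. The constant names and the manifest structure are hypothetical, and the actual designs were generated in Sketch, not in code like this.

```python
from itertools import product

NUM_TEMPLATES = 50          # design templates
LABELS = ("good", "bad")    # one good and one broken version per template
CLONES_PER_VERSION = 25     # clones per version, each with different data


def build_manifest():
    """List one record per synthetic example (hypothetical structure)."""
    return [
        {"template": template, "label": label, "clone": clone}
        for template, label, clone in product(
            range(NUM_TEMPLATES), LABELS, range(CLONES_PER_VERSION)
        )
    ]


manifest = build_manifest()
counts = {
    label: sum(1 for row in manifest if row["label"] == label)
    for label in LABELS
}
```

Because every template contributes the same number of good and bad clones, the two classes each end up with exactly half of the examples.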

Cleaning Data

The next step in ML development is cleaning your data. If you’re using synthetic data, this is already done — otherwise, some cleaning is necessary. Either way, visualizing the data will help you understand what’s actually in there.

Here again, designers have valuable skills to bring to the table. Whether you’re drawing possible plots on a whiteboard or creating them with Python or D3, you can use the design process to think through visualizations.

Visualization of colors across 1000 top websites, by Moritz Stefaner.

Creating Models

Once you have a clean dataset and an understanding of its contents, the next step is to create algorithms or models based on your insights. While it can be tempting to jump straight into code and start training models, we’ve found it useful to first use lo-fi design prototypes to try out algorithms. For example, we prototyped the output of our generative design tool by simply creating a matrix of design options on a slide. We then asked our customers to eliminate any designs that didn’t meet their brand guidelines. This gave us confidence in the product requirements for our service and helped us understand what actions users might want to take.

You can also prototype the algorithm itself, to understand what kinds of predictions might be possible with given data. For example, we wanted to train a model to understand a brand’s existing style guide. We could collect individual styles used on a site, but wanted to understand how those styles work together. We realized we could prototype multiple approaches to this task and inform our intuition about how to use styles appropriately just by drawing them.

We created input data and wrote down the rules of the proposed algorithm, which team members then executed manually in Sketch. This process revealed that creating pairs of styles that follow each other in the original page was the best way to understand relationships between designs. It also gave us the confidence to start automating style relationship collection and further refine our methods.

Sketch algorithm prototype
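
Once the manual exercise pointed to pairs of consecutive styles, the automated version is easy to imagine. The Python sketch below assumes a page has already been reduced to a list of style names in the order they appear; the style names and the function are hypothetical and only illustrate the pairing idea.

```python
from collections import Counter


def style_pairs(styles_in_page_order):
    """Count (style, next_style) pairs for styles that follow each other."""
    return Counter(zip(styles_in_page_order, styles_in_page_order[1:]))


# Hypothetical sequence of text styles, in the order they appear on a page.
page = ["heading", "body", "body", "caption", "heading", "body"]
pairs = style_pairs(page)
```

Pairs that occur often, such as a heading followed by body text in this made-up page, hint at which styles are meant to work together.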

This exercise also supported our intuition that certain algorithms wouldn’t work. Humans can’t perfectly simulate a machine learning algorithm, but machine learning isn’t magic; there must be some underlying pattern or information that tells the algorithm what to do. For example, we wanted to know how little information the design generator would need to pick an appropriate font size. We asked team members to draw designs based solely on data type and place in the design hierarchy, and quickly discovered that it’s necessary to know more about the values. To make appropriate choices, the algorithm, like a human designer, needs to know the approximate amount of text in each field.

Whiteboard and sticky notes from manual algorithm exercises

Evaluating Models

Once you’ve created your algorithms and models, you need to identify, learn from, and correct mistakes and biases. Design can play an important role here, in creating tools to streamline the evaluation process.

For Einstein Designer, we needed to evaluate the output of our generative models. While there’s no easy way to automate scoring the quality of design, we could build a tool for evaluating the results. Our process drew on best practices from user research and from design leaders, including Charles Owen of Chicago’s IIT Institute of Design and Miles and Huberman’s Qualitative Data Analysis.

First we created a tool that would allow everyone on the team to visualize and review the same designs. It codified a set of test cases, created standardized inputs for each, and let us consistently test our system. Unlike in an actual product, where users should see only the highest quality output, it showed the model’s complete output to help us understand how it was functioning.

Our model visualization and review tool

Next, each member of the team used the tool to independently rank designs, much as researchers might code a transcript to quantify qualitative data.

In Owen’s IIT Structured Planning course, for example, each student is asked to rank the relatedness of individual topics. The resulting data is then combined, and used to inform which themes are most central to a design project. Standardizing the questions students ask when comparing topics makes it possible to distribute work and cluster and compare the results. And having multiple students rank the same data helps reduce the impact of individual bias.

We took the same approach, having each team member rank the designs, then combining results to generate a more objective view of design quality, an inherently subjective topic. Repeating this process at the end of each design sprint helped us track our progress toward the goal of a robust design generation product.

Spreadsheet tracking team members’ rankings
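
Combining independent rankings can be sketched in a few lines of Python. This example assumes each reviewer submits an ordered list of design ids, best first, and that the combined view is the mean rank per design; the reviewer names, the design ids, and the mean-rank rule are assumptions for illustration, not a description of the spreadsheet above.

```python
from statistics import mean


def combine_rankings(rankings):
    """Combine per-reviewer orderings (best first) into a mean-rank ordering.

    Returns (design ids sorted by mean rank, {design id: mean rank}).
    Ties are broken alphabetically by design id.
    """
    positions = {}
    for order in rankings.values():
        for rank, design in enumerate(order, start=1):
            positions.setdefault(design, []).append(rank)
    mean_rank = {design: mean(ranks) for design, ranks in positions.items()}
    ordered = sorted(mean_rank, key=lambda design: (mean_rank[design], design))
    return ordered, mean_rank


# Hypothetical rankings from three reviewers.
rankings = {
    "reviewer_1": ["A", "B", "C"],
    "reviewer_2": ["B", "A", "C"],
    "reviewer_3": ["A", "C", "B"],
}
order, mean_rank = combine_rankings(rankings)
```

Averaging across reviewers dampens any one person’s preferences, which is the same reason for having multiple students rank the same data.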

As machine learning becomes ever more present in product design, remember to consider how design can play a role. Core design skills such as identifying and communicating needs, visualizing data, prototyping, building tools, and doing research all play important roles in the machine learning process. Most of all, remember that the process is collaborative, everyone can participate, and more perspectives lead to better products.


Sönke Rohde, Jessica Lundin, Michael Sollami, Tim Sheiner, Alan Ross, Brian Lonsdorf, David Woodward

Follow us at @SalesforceUX.

Check out the Salesforce Lightning Design System