Google AI Products

Design of Artificial Intelligence Products

Spring 2018 | Carnegie Mellon University

These are my class notes for Design of Artificial Intelligence Products, taught by John Zimmerman in the HCII at Carnegie Mellon University.

Week 1–1

Before class

The professor asked us to pick an AI product or service and think about the questions below.

  1. How is AI used in the product or service (what does it actually do)?
  2. What does the AI take as input and provide as output?
  3. How does the use of AI produce value for the user?
  4. How does the use of AI produce value for the product maker or service company?
  5. What kind of AI is used … what machine learning method, etc.? It’s OK if you do not know. Many products and services do not make this clear.
  6. What kind of errors does the system make?
  7. How well does it need to work to be valuable?


When we discussed the examples we brought to class, John pushed us on a few points.

  1. A recommendation engine can run on different algorithms, each with its own pros and cons. It would be useful to learn about them.
  2. What is good enough, and which services require the absolute best? For voice search, there is a term called ‘soft fail’. When someone asks a question, a single answer is either correct or wrong. A page of search results where the answer is somewhere on the page is more beneficial to users.
  3. As for VUI, it is easy for users to skim text, but there is no way to skim voice. Furthermore, how do you know the search result isn’t an ad?
  4. Once AI is introduced, humans often try to cheat it. We need to consider this kind of situation.
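
The ‘soft fail’ point in item 2 can be made concrete by comparing how often a single answer is right with how often the right answer appears anywhere in a short list of results. This is only a minimal sketch: the queries, the ranked results, and the function name `hit_rate_at_k` are made up for illustration and are not from the class.

```python
def hit_rate_at_k(ranked_results, correct_answers, k):
    """Fraction of queries whose correct answer appears in the top k results."""
    hits = 0
    for results, answer in zip(ranked_results, correct_answers):
        if answer in results[:k]:
            hits += 1
    return hits / len(correct_answers)


# Hypothetical ranked outputs for four queries (best guess first).
ranked = [
    ["paris", "lyon", "nice"],
    ["lyon", "paris", "nice"],
    ["nice", "lyon", "paris"],
    ["rome", "milan", "turin"],
]
truth = ["paris", "paris", "paris", "venice"]

voice_style = hit_rate_at_k(ranked, truth, k=1)  # one spoken answer
page_style = hit_rate_at_k(ranked, truth, k=3)   # a page of results
print(voice_style, page_style)
```

With the same underlying ranking, the page of results fails far less often than the single spoken answer, which is why the bar for "good enough" differs between the two interfaces.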

The Challenge for Traditional Design Process

The Double Diamond is a common design process diagram that illustrates the diverging and converging stages of the design process. But when it comes to designing with AI, the process often falls apart.

  • User Research: Don’t recognize resources or opportunities in the field
  • Synthesis: Don’t call out insights that are AI actionable
  • Ideation: Have not internalized capabilities; cannot envision the future.
  • Evaluation: Can’t reduce risk by prototyping; can’t simulate the performance

Human-Centered Design does not work

  • No guarantee that if you select a target user (starting point of UCD), that the research will reveal a need for AI
  • No guarantee that if you find a user need for AI, then there will already be data to train the system

Week 1–2

You don’t need to know the chemistry of glue to use it. The core question is how much, and what, we need to learn about AI in order to design with it. Most of the time, people work better with an analogy than with an algorithm. Furthermore, do we care more about accuracy or explainability?

5 Tribes of Machine Learning

  • Symbolists — decision trees
  • Connectionists — neural networks
  • Evolutionaries — genetic algorithms
  • Bayesians — probabilistic inference (prediction)
  • Analogizers — clusters, support vectors (detect pattern)

This reading explains the capabilities of machine learning quite well. The author further points out that better machine learning algorithms often combine more than one tribe. However, is what he means by ‘better’ necessarily valuable to humans?

John also brought up the example of the cassette tape. The people who invented the cassette tape did not see the Walkman coming, or the change it would bring to the music industry. That is the analogy for machine learning.

Machine Learning Report from The Royal Society, UK

In this report, the part I found most useful is about deployment:

  • Online: Learning algorithms continue to be applied to the trained model after deployment. This means that the performance of the system ‘in the wild’ can continue to improve in response to real-world data. It also means that there is no opportunity for human checking of the consequences of updates to the model.
  • Offline: Systems are trained and tested in an offline setting, and the trained models are then ‘frozen’ before being deployed. It gives an opportunity for human verification of the system before the system interacts with any user.
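
The difference between the two deployment modes can be sketched with a toy model. This is a simplified illustration, not how the report describes any real system: `MeanPredictor`, the training values, and the live values are all hypothetical.

```python
class MeanPredictor:
    """Toy model: predicts the mean of the values it has been trained on."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Incremental mean update; this is the "learning" step.
        self.count += 1
        self.mean += (value - self.mean) / self.count

    def predict(self):
        return self.mean


training_data = [10.0, 12.0, 11.0]  # collected before launch
live_data = [20.0, 22.0, 21.0]      # real-world data after launch

# Offline: train, freeze, deploy. Live data never changes the model,
# so a human can verify it once before release.
offline = MeanPredictor()
for x in training_data:
    offline.update(x)

# Online: same start, but the model keeps learning after deployment.
online = MeanPredictor()
for x in training_data:
    online.update(x)
for x in live_data:
    online.predict()  # serve a prediction ...
    online.update(x)  # ... then learn from what actually happened

print(offline.predict(), online.predict())
```

The offline model stays at the value that was checked before launch, while the online model drifts toward the live data without anyone reviewing the intermediate versions.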

Which industries could see the most impact

  • Where there is sufficient data available to enable machine learning methods to be developed and put to use
  • Where this data is used effectively
  • Where there is access to sufficient computing power

An oversimplified way to think about it

For designers, it helps to think of what ML does as one of two things:

  • Classifier
  • Optimizer

Google Flu Trends

Then we talked about Google Flu Trends. Although some reports say that it is inaccurate, what concerns us is:

  • Who can benefit from such prediction?
  • What’s the value of being more accurate?

From Google’s perspective, this is unintended value drawn from an existing dataset (monetizing the data).

Matchmaking (Tech-centered design?)

  1. Identify technology’s capabilities
  2. Map tech capability to human activities
  3. Ideate domains that might benefit
  4. Select a good target and give form to product/service
Example of GPS

Week 2–2

My Favorite UX Design Process Diagrams (link)

The Analysis-Synthesis Bridge Model (link)

This is how Netflix’s top-secret recommendation system works (link)

More than 80 percent of the TV shows people watch on Netflix are discovered through the platform’s recommendation system.

There is a trap in this statement. Everything you see on the front page is recommended by Netflix. The statement simply indicates that about 20% of the time people actively search for specific content, and the other 80% of the time they are simply browsing.

…one in eight people who watch one of Netflix’s Marvel shows are completely new to comic book-based stuff on Netflix.

In fact, these people might have seen Marvel movies in the theater, which Netflix can’t know. As a result, they might not be new to Marvel content at all.

  • What is a false positive on Netflix? What is a false negative on Netflix? (e.g., if users do not click on something, that signals neither a like nor a dislike.)
  • How does it impact the user experience of the service?
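
One way to make these two questions concrete is to count the cells of a confusion matrix. The sketch below assumes a false positive means "recommended, but the user did not like it" and a false negative means "not recommended, but the user would have liked it"; that reading, the function name `confusion_counts`, and the data are all hypothetical. Titles with no feedback are kept separate, since not clicking is neither a like nor a dislike.

```python
def confusion_counts(recommended, liked):
    """Count confusion-matrix cells, skipping titles with no feedback (None)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0, "unknown": 0}
    for rec, like in zip(recommended, liked):
        if like is None:
            counts["unknown"] += 1  # no click: we cannot tell either way
        elif rec and like:
            counts["tp"] += 1       # recommended and liked
        elif rec and not like:
            counts["fp"] += 1       # recommended but disliked
        elif not rec and like:
            counts["fn"] += 1       # liked but never recommended
        else:
            counts["tn"] += 1       # not recommended and disliked
    return counts


# Hypothetical data for six titles.
recommended = [True, True, True, False, False, False]
liked = [True, False, None, True, False, None]

counts = confusion_counts(recommended, liked)
print(counts)
```

In practice the "unknown" bucket is likely to be the largest one, which is part of what makes evaluating a recommender hard.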

Furthermore, watching video is very different from listening to music. You watch most videos only once.

Data Scientist vs. UX Designer

The professor then talked about the tension between data scientists and designers. Data scientists tend to build models from a dataset to explain what has already happened. They claim to be the user experts because they understand users through data. Designers, on the other hand, are also user experts because we know how to conduct user research. Since both sides claim to be the user experts, there is tension between the two disciplines.

Business vs. User Experience

How do we think about the intersection of business, AI, and users? Take Netflix as an example. Bandwidth is expensive, so Netflix would prefer that you subscribe and then not watch at all. While a good recommendation seems to have great value for users, how does it benefit the business? What else can be done with Netflix’s data besides recommendation?

How can AI contribute to User Experience?

“Over the past decades computers have broadly automated tasks that programmers could describe with clear rules and algorithms. Modern machine learning techniques now allow us to do the same for tasks where describing the precise rules is much harder.” — Jeff Bezos

What UX value can ML offer?

  • ML is something that can personalize for a lot of people.
  • ML enables higher level abstraction in user instructions.
  • Users foresee the relationship (with the product) improving, where the relationship is the recommendations we are giving them.

What interactions can ML power?

Individual Users

  • Identify individuals
  • Collect personal knowledge
  • Recognize individual’s instantaneous activity
  • Infer meaning of the individual’s activity
  • Infer individual’s instantaneous internal status
  • Identify patterns of the individual’s activity
  • Characterize an individual

Collective Users

  • Map social connections
  • Detect and track collective activities
  • Collect opinions


  • Detect context
  • Detect resources

What ML techniques are out there?

Topics based on clusters of HCI/ML research

Week 3–1

DESIGNING AI by Elizabeth F. Churchill, Philip van Allen, and Mike Kuniavsky. Interactions 25, no. 6 (2018): 35–37

This week we read a very recent publication about designing AI. While the content is rich and inspiring, the professor is still critical of it. First of all, it feels like it scratches the surface without explicitly talking about HOW (the cybernetics piece). Secondly, the pathologist piece moves from machine learning to machine teaching. Do we really need humans to teach the machine how to do that? Or could we simply gather all the cancer images and let deep learning learn without human interference? What is the benefit of teaching by humans if the machine will eventually exceed human accuracy?

The professor mentioned that the fundamental thing HCI does is replace humans with machines. (To be more ethical, engineers at least focus on the bad or dangerous jobs.) The cybernetics piece says that AI and cybernetics could learn a lot from each other. I think the tricky part is this: if a machine is designed to have a goal, can it produce an unexpected outcome without human involvement? Can a machine act beyond its designers’ intention? I think it is interesting to think about.

Week 3–2

Class canceled due to the polar vortex.

Week 4–1 P1 Presentation

Today, we presented the outcomes of Project 1. This project is about finding applicable use cases starting from one of the following:

  • Dataset (from Uber): It includes things like data on ride requests, food requests, and driver movements during and between rides.
  • Technical Capability: A depth camera that can recognize gestures; intentional and unintentional movements and poses people make.
  • Platform (Technical Capability Bounded by Product Form): The new iPhone and iPad have face recognition capability (Face ID). How can you leverage that for your 3rd party app?

We needed to brainstorm ideas and evaluate them against the metrics below.

  • Need of the target customers (how much are they underserved by current offerings)
  • Ease of developing the system
  • Risk of errors (How a false positive or negative will impact user value)
  • Size of the potential market.
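
A simple way to compare ideas against these four metrics is a scoring table. The sketch below is only an illustration: the idea names, the 1 to 5 ratings, and the equal weighting are assumptions, not something prescribed in class.

```python
# Hypothetical 1-5 ratings; higher is better for every column except risk.
ideas = {
    "idea_a": {"need": 4, "ease": 3, "risk": 2, "market": 5},
    "idea_b": {"need": 5, "ease": 2, "risk": 4, "market": 3},
    "idea_c": {"need": 2, "ease": 5, "risk": 1, "market": 2},
}


def score(ratings):
    """Sum the four ratings, flipping risk so that low risk scores high."""
    return ratings["need"] + ratings["ease"] + (6 - ratings["risk"]) + ratings["market"]


# Rank ideas from highest to lowest total score.
ranking = sorted(ideas, key=lambda name: score(ideas[name]), reverse=True)
print(ranking)
```

Equal weights are the simplest choice; a team that cares more about error risk than market size could multiply each term by its own weight.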

In the end, I chose the idea of Uber Shuttle.

Overall Feedback about P1


For the Uber dataset, John thinks there is a social entropy in the data that could be applied to a different domain. The ideal idea is something like Uber Eats, which leverages existing resources to build a totally different business.

However, many ideas still dealt only with location, or with something that is simply not technologically feasible. Take selling ride-to-hospital data to an insurance company, for example. Unless you know the specific person’s name, the data is not useful at all.


Many ideas were about detecting people engaged in suspicious activity. This is hard: how do you get the initial dataset to train such a model? Many postures might simply be too subtle to label. This exercise is about innovating within the known mechanism to make it valuable.

FaceID for 3RD PARTY

Most people misunderstood the prompt. Face ID is not face recognition but a you / not-you binary classifier. An idea like face recognition for music festival entry is not the same as using Face ID to log in.

John said that the fundamental challenge of designing with AI is: how can we consistently think of things that could be built? And what is the least we could do to maximize the value?

Week 4–2

We discussed the idea of human computation. The first reading is VizWiz, which uses humans to help blind people identify objects in their photos in real time. The second reading is the emergency crowd, which is about using humans as sensors to report the situation during emergencies such as hurricanes or earthquakes.

When we talk about AI as a black box that processes tasks, we ask: what is the benefit of making it transparent? What would people do if they understood 100% of how the algorithm works? It seems ethically correct that people have the right to understand. But what would happen if they did?

We also talked about the boundary between AI and human computation. John made a very interesting point. He thinks that if a service is performed by a human, you somehow have a connection with that person. If you switch to a different service provider, you feel guilty. There is a sense of loyalty there that does not exist with a machine or an algorithm.

Why is so much money invested in machine learning? The economic incentive is all about reducing cost or replacing human labor with machines. For example, taking care of elderly people is expensive. Using machines to take care of them could potentially save money. But is that good for the elderly, or is it simply a cost consideration? There is a value judgment happening.

There are plenty of areas where human computation still has the advantage. Take memes, for example: because they are so emergent, current ML algorithms cannot deal with them. Whenever a new meme comes out, the machine is not capable of understanding it, not to mention seasonal fashions or trends.

Week 5–1 Contribution from Online Communities

  • Should we use shame/obligation to drive behavior? What is the implication of using negative factors?
  • Do we need quality from a few or participation from many?
  • Is FB human computation?

Week 5–2 Interactive Machine Learning

The recommendation system from Netflix is interactive machine learning while the facial detection algorithm in Nikon’s camera is not.

Instead of starting with a large body of data, interactive machine learning works with small quantities and changes incrementally. The question is whether we should focus on explicit signals or implicit behavior as input.

Everyone asks for transparency in AI. But is that really what they want? Do they want to know exactly how the machine works, or do they simply want to know how to get a better result?

ML could optimize the recommendations you get from Netflix, but it would never come to the conclusion that Netflix should produce its own shows.

What is a good problem for ML?

  • Focus on problems that would be difficult to solve with traditional programming.
  • Know the Problem Before Focusing on the Data
  • Focus on Interaction Decisions, not Predictions
Framing ML problem

Hard ML problems

  • Clustering
  • Causation (Can’t find cause-effect)
  • No existing data
  • From data to predictive power (considering signals and noise)

Project 2 Human as AI Proxy (Week 6)

For Project 2, we combined our readings about human computation, encouraging community contribution, and interactive machine learning. We needed to think about services that could benefit from a crowd as an AI proxy.

Why do we need to think about it?

  1. Innovators find themselves in situations where they want to deploy an intelligent AI system to solve a particular problem; however, there is no data set available to train the system. Or, the problem could be that ML approaches are not very good at addressing the specific challenge. VizWiz, a system we will read about, offers one example of this innovation path.
  2. Innovators recognize that people are engaged in an activity that could be recaptured to build data for an AI system. Luis von Ahn’s ESP Game and reCAPTCHA both provide an example of this second innovation path. People play the role of an AI system through the work they are already doing, and this produces valuable training data to develop such a system.

We brainstormed ideas and evaluated them against three criteria:

  • Is there an existing product in the market?
  • What is the motivation for people to engage the service (so they are willing to contribute data)?
  • Can it be transformed into an AI-powered service in the future?

As we narrowed down to several ideas, Professor Zimmerman challenged us to think about the value of having the service provided by AI. Is this tedious, dangerous work that needs machine replacement? Is this repetitive work that could benefit from automation? We should not only ask whether AI could do it but also why we need AI to do it. Who might want this? In what scenario?

New Ideas — Voice Actor for Youtuber

As a result, we turned to the idea of dialect translation, which would probably be a good service in China. However, we had a hard time figuring out a context that could generate a positive value flow. After discussion, we turned to a voice-over translation service for streaming video. Unlike professional translation, streaming video does not require high accuracy. Current approaches mainly focus on using ML to generate captions, which might not suit certain user groups. For people like YouTubers who want to reach a larger audience, voice-over translation might be an interesting opportunity to explore.

We drew the value flow diagram for our service and decided to come up with wireframes for the next meeting.

We also think about possible scenarios that could use this service.


Today, we finished the value flow and discussed the wireframes. From the reading about encouraging online communities to contribute, we applied some of the principles to our wireframes.

Encouraging Contributions to Online Communities

Before drawing the individual pages, we needed to define the function flow, which leverages the motivation principles as well. Finally, we landed on three major categories: Youtube, Youtube Studio and Say-Youtube.

Week 7–1

Adaptive UI

Contextual Search


If you were to redesign Clippy, how would you make it better?

A human-like image seems to raise people’s expectations of what it is capable of. Furthermore, it affects how people communicate with it. Features like autocorrect or spam filters are all forms of AI, but people treat them in very different ways.

Adaptive UI

Planning Adaptive Mobile Experiences When Wireframing

Adaptive UI works well with patterns, routines, and repeated tasks, where it tries to minimize the user’s navigation effort. It fails when the user is new to the system or when users often do not stick to a pattern.

Between pattern and change of pattern, how do you negotiate the balance?

Contextual Search

Perfect search is not enough. The same word might have different meanings in different contexts. Search should be contextualized in order to be helpful.

Week 7–2

Being accurate is not enough

The article discusses a very interesting topic: sometimes people do not need total accuracy. They need some kind of surprise. For example, if you buy a toilet seat on Amazon, you don’t need to buy another one any time soon. It would be annoying if Amazon kept promoting deals on toilet seats.

  • Can UX fix a bad recommender?
  • Can a good recommender overcome bad UX?

While the quality of recommendations on an e-commerce site affects revenue, what is the strategy for increasing profit? Furthermore, for subscription services like Netflix, what is the goal of the recommender?

The paper introduces the ideas of serendipity and ratability, but how do you measure serendipity?

  • People seek the familiar for comfort because they are stressed.
  • People seek stimulation because they are bored.

AI Design Principles

According to my classmate, most guidelines read like customer service guidelines. However, do we want to replace customer service with machines? Think about why you call customer service: the machine probably could not fulfill your needs, and that’s why you call.

Furthermore, most of these guidelines apply at the final evaluative stage; they still don’t say what AI could contribute to research, synthesis, or ideation, which are the most challenging parts. Adaptive UI is only for the final stage, but there is still rich space for AI to add value.

Someone said he finds it problematic to turn observations into guidelines by induction.

Mixed-initiative principles

  • Should the user see the system taking the action of adapting?
  • If yes, does the user want to see it more than 3 times?
  • When to show? How to show?

P3 Adaptive Mobile UI (Week 8–9)

Step 1

Identify 10+ familiar mobile apps that can improve their UX via adaptation.

  • Automate mundane, repetitive tasks; look for long sections of navigation and selection
  • Personalization (individual users, group users)
  • Make the UI context-aware
  • Other opportunities based on other signals that the app can potentially capture

Here are some reflections from Step 1.

  • It is really hard to optimize an app from a big company. The user flow is already well considered. Take Amazon, for example: checkout takes several steps, but it is hard to find ways to optimize the flow (every step seems to be either unavoidable or not worth automating).
  • Many ideas fall into the context-aware category.
  • Many ideas require thinking thoroughly about users’ behavior. Sometimes the user jumps between different parts of the app to use its functions, or there is only a limited set of search keywords. We need to take that into consideration so that the design decisions are well reasoned.
  • For example, if you want to optimize the experience on Venmo, you need to consider how people pay. For rent, the amount is exactly the same each month. For utility bills, the amount differs each month. When eating out with friends, why do people pay afterward rather than in the moment?

Step 2

Select 4 to explore further based on the following criteria:

  • Size of value experienced by the target users
    (How many people use this app and how often do they navigate the transaction you want to adapt?)
  • Value service provider gains by improving the user experience
    (Will users generate more value for the service if the service invests in the adaptation?)
  • Ease of ML development
    (How likely can the adaptation be correctly triggered and inferred?)
  • Risk of errors
    (How much will false positive or negative impact user value?)
  • Proactive vs. reactive adaptation, intelligent vs. if-then adaptation

In Step 2, we narrowed down to 4 apps: Venmo, Starbucks, Bumble and Vivino. Then we decided to investigate Bumble and Vivino more deeply. For Bumble, we thought about several ways for the user to talk to a person directly, skipping the 4 steps to the chat page.

For Vivino, we thought about users’ behavior at the liquor store and how to make the experience seamless.

P.S. One thing we found is that many popular apps might not be adaptive but already have a streamlined user experience, which makes them hard to adapt.

Step 3

Select a single app, and then design new interaction flows that show both the current interaction and how this will collapse as the system learns. Consider:

  • How will the app collect signals needed to make the UX adaptive?
  • How can it motivate people to provide these signals?
  • How confident does the system need to be to trigger the adaptation?
  • How will a user recover from errors?
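
The confidence question can be explored with a small rule: only collapse the flow when one action clearly dominates the user's recent history. This is a minimal sketch under stated assumptions; the function name `suggest_shortcut`, the 0.8 threshold, the minimum of 5 events, and the order history are all hypothetical, and a real app would tune them against the cost of a wrong guess.

```python
from collections import Counter


def suggest_shortcut(history, threshold=0.8, min_events=5):
    """Return the action to surface as a shortcut, or None to keep the default UI."""
    if len(history) < min_events:
        return None  # too little signal, e.g. a new user
    action, count = Counter(history).most_common(1)[0]
    confidence = count / len(history)
    return action if confidence >= threshold else None


# Hypothetical order histories for three users.
routine = ["latte"] * 9 + ["mocha"]
varied = ["latte", "mocha", "tea", "latte", "espresso", "tea"]
newcomer = ["latte"] * 3

print(suggest_shortcut(routine))
print(suggest_shortcut(varied))
print(suggest_shortcut(newcomer))
```

Returning `None` keeps the full flow available, which also serves as the recovery path when the shortcut turns out to be wrong.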

Week 10

4/5 Meeting with Professor

For Project 3 NLP, our team came up with 20+ ideas and voted to land on 5 final ones.

  1. Use Amazon reviews to generate a summary report for specific types of products.
  2. Use news headlines and content to illustrate the standpoint of a given news provider and offer a counter perspective to encourage civic engagement.
  3. Determine what makes a good ad by looking at the text of the ad and how many people click it.
  4. Analyze social media posts and groups to determine whether they are child-friendly.
  5. Analyze the lyrics of songs users love as a guideline for artists when composing.
  6. A new product could collect a user’s previous posts as a digital legacy and learn their tone to build a virtual agent of the user, a memorial space that the living can communicate with after the user passes away.

The general feedback we received was:

  • Think about who wants to pay for the service. Is it worth the effort for the company to generate the extra revenue? Who pays for that? Most often, B2B is a better space than B2C.
  • Innovation is taking one totally doable idea and putting it into a different context where it succeeds. NLP has limited capabilities and can do simple things really well. Do not try to push the boundary of the technology.
  • For the 3rd idea, maybe we can think about what makes a good tweet. A popular tweet is public information while a popular ad is not. Think about what form could be applied: a guidebook? A Twitter suggestion bot? …etc.
  • For the 4th idea, John came up with another idea: analyzing corporate email to better evaluate employees’ performance, or even to give out instructions.
  • The lyrics idea is more of a research question: do lyrics affect people’s music preferences? It is not an application question yet.
  • The final one, digital legacy, seems to ask too much of the technology. Making a Harry Potter bot probably makes more sense.




Product Designer @ iCHEF Taiwan

Jeffrey Chou
