How Do You Design With Magic?

Obstacles, insights & lessons learned designing a computer vision feature for MyFitnessPal

Josh Clement
The Startup
7 min read · Feb 4, 2021


In 2019, I was a designer at MyFitnessPal. Over a few months, I worked closely with a small team to discover and design a solution for scanning food into a diary. Here’s a peek into my process.

The problem

For those determined enough to count their calories, a simple plate of food poses a few problems.

Imagine: a bright yellow cob of corn, smothered in butter. A pile of green beans. In the middle of the plate is a red hot dog in a brioche bun, with a dollop of mustard, ketchup and what looks like relish.

Before apps, the food on this plate would need to be tracked manually, with pen and paper, or looked up in a little reference book.

We have apps now, like MyFitnessPal, but the process is still laborious. For each food, you need to figure out what it is, type in something like ‘tomato’ and press search.

Searching and logging a food to your diary

With over 170 million users all adding their own entries, common items ended up with thousands of duplicates and wildly varying names, serving sizes and nutritional information.

The database was a jungle. It was described as a “messy filing cabinet”, a “jumbled mess”. Searching for foods was “confusing” and “tedious”. Our users were spending far too much time searching, and not enough time tracking.

The banana problem…

The idea

What if, with a wave of your phone, you could identify and match the food on your plate to the database? No searching, sifting, reviewing. A quick scan and done.

Outside of our headquarters, people were happily using apps with similar technology to translate foreign languages, assist people with blindness, identify clothing brands, or identify species of birds or mushrooms.

The product team believed image recognition technology was mature enough to do this, and could help automate some of the many steps in the food logging process.

Our users loved our barcode scanner feature too, which had a similar ‘scan’ interaction – and importantly, users who found and used it retained at a higher rate.

Over time, we’d also learnt that tracking calories didn’t need to be exact to be helpful. Most people are tracking common foods and just want roughly the right nutrition values.

Food scanning might not be as accurate as finding the exact correct item in the database, but it had the potential to be easier, faster, and hopefully build new habits. It was time to make it a reality.

Looking

A single item like a strawberry isn’t too tricky to scan. Unfortunately, research, and our gut, told us most of our users didn’t eat like this. I realized I rarely ate ‘single’ foods, even while snacking.

A big single scan button. One food at a time. This was a dead-end.

It wasn’t feasible to support every type of food combination, but the team agreed if this feature was to simplify and speed up logging, it needed to feel fast, not tiresome.

Snapping and logging a single food at a time wasn’t going to cut it. We would need to let users scan multiple foods, fast, in one session.

“Adding to cart” was a metaphor we used to make sense of storing multiple items in one continuous scan.
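To make the metaphor concrete, here’s a minimal sketch of how a scanning session could hold multiple results. The types and names (ScanSession, ScannedItem) are hypothetical, not our production code:

```swift
import Foundation

// Hypothetical sketch of the "add to cart" scanning session.
struct ScannedItem: Identifiable {
    let id = UUID()
    let name: String        // e.g. "Corn on the cob"
    let calories: Int       // rough per-serving estimate
}

// The session behaves like a cart: results accumulate while the camera keeps scanning.
final class ScanSession {
    private(set) var items: [ScannedItem] = []

    func add(_ item: ScannedItem) {
        items.insert(item, at: 0)   // newest results sit at the top, pushing older ones down
    }
}
```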

Thinking

Depending on the lighting, angle, distance, camera, and a bunch of other factors, detecting an object like a handful of goldfish crackers could take a few seconds. The model was doing a lot of heavy lifting, but since this wasn’t apparent to the user until they received a result, the experience felt broken.

After roughly mapping out the ideal scanning flow, it was clear we were missing two critical states: “looking” and “thinking”.

Mapping helped me identify missing states.
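As a rough illustration, the flow boils down to a small state machine. The names below are my own shorthand, not the app’s implementation:

```swift
// Sketch of the scanner's high-level states; names are illustrative only.
enum ScannerState {
    case looking                     // the camera is live and searching for food in the frame
    case thinking                    // a frame has been captured and the model is still working
    case result(foodName: String)    // the model has a guess ready to surface to the user
}
```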

To communicate these states, we experimented with live tracking, color changes, dots, pulses and ‘snapshot’ effects.

The viewfinder could give feedback to the user: Hold still, I see a food!

We settled on a square viewfinder that we hoped would encourage users to frame foods – improving the accuracy of the results.

Aside from the viewfinder, we explored guidance messages to explain how to use the scanner correctly.

Reviewing

The next step was to display the results. One approach was to show all the guesses, like a long stream of consciousness, floating in as the user was scanning.

The user would need to sift through and pluck out what they needed. I imagined users swiping to keep the correct results. In testing, nobody knew to swipe.

Maybe users would swipe to reveal actions… They didn’t.

Another concept, one that got more yawns from the design team but that I kept coming back to, was a closed tray at the bottom of the screen.

Sending results to a tray meant they wouldn’t obscure the live camera, and the tray itself was an affordance for accessing all the results at once.

By swiping up, the tray would fill the screen. We decided once the tray was up, we should treat it like ‘taking a breath’ and pause the scanner in the background.

Note: The most recently identified foods appear at the top of the tray, pushing older results down.
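A rough sketch of that ‘taking a breath’ behaviour, with a hypothetical CameraScanner protocol standing in for the real camera pipeline:

```swift
// Hypothetical interface for the live camera/detection pipeline.
protocol CameraScanner {
    func pause()
    func resume()
}

final class ScanCoordinator {
    private let scanner: CameraScanner
    private(set) var isTrayExpanded = false

    init(scanner: CameraScanner) {
        self.scanner = scanner
    }

    // Pause detection while the user reviews results; resume when the tray closes.
    func setTrayExpanded(_ expanded: Bool) {
        isTrayExpanded = expanded
        expanded ? scanner.pause() : scanner.resume()
    }
}
```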

Swapping & Refining

We were dogfooding the scanner every day, and we realized the object detection model was prone to little mistakes, like thinking a peach was a nectarine.

Luckily, the model returned something called alternatives. For every commonly logged food item, we had a small list of similar-looking foods we could display.

Initially, we displayed alternatives on a secondary review screen, after you finished scanning.

During design critiques we realized that if the initial guess wasn’t correct, nobody would tap into it. They would never see the alternatives and would be left frustrated by the false positive.

So we included the alternatives within the result card and let you tap to swap if needed. In the future, these could be ranked, and personalized based on your logging history.
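Conceptually, each result carries a primary guess plus its alternatives, and tapping an alternative swaps it in. The types below are a sketch under that assumption, not the shipped data model:

```swift
// Hypothetical model for a detection result with swappable alternatives.
struct FoodGuess {
    let name: String
    let confidence: Double
}

struct DetectionResult {
    var primary: FoodGuess            // what the model thinks it saw, e.g. "Nectarine"
    var alternatives: [FoodGuess]     // similar-looking foods, e.g. "Peach", "Apricot"

    // Swap the primary guess with the alternative the user tapped.
    mutating func swap(to alternative: FoodGuess) {
        guard let index = alternatives.firstIndex(where: { $0.name == alternative.name }) else { return }
        alternatives[index] = primary
        primary = alternative
    }
}
```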

Removing

While the alternatives were great for quickly swapping or refining an item, sometimes the model returned something really wrong. You’d be pointing the camera at a bowl of rice and it would see marshmallows. It was painful.

The obvious idea was to include a small ‘x’ to delete items. But during testing, users either struggled to accurately hit the tap target, or preferred to select foods that were correct instead.

So we switched to a checkbox, reversing the interaction. The incorrect results could simply be ignored and would not be logged to your diary.
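In effect, logging became a filter over whatever the user left checked. A minimal sketch, with a hypothetical logToDiary closure standing in for the real diary API:

```swift
// Hypothetical sketch of the "log only what's checked" step.
struct ScanResult {
    let name: String
    var isSelected: Bool    // driven by the checkbox on each result card
}

func logSelectedFoods(_ results: [ScanResult], logToDiary: (ScanResult) -> Void) {
    // Incorrect guesses stay unchecked and never reach the diary.
    results.filter { $0.isSelected }.forEach(logToDiary)
}
```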

Final thoughts

The first iteration of meal scan shipped (to subscribers) in December 2020 as a new way to track your nutrition and calories without spending so much time searching for foods.

Members can now point a camera at food and get real-time suggestions for the food it sees. You can tap to add items directly to your diary — a single food or a whole meal.

This was one of the most unique products I helped create at MyFitnessPal. Here are a few things I enjoyed and learnt along the way.

  • The power of a polished prototype: I learnt and lived in Principle, shared work early and often, and worked with video, motion, new gestures, touch interactions and timing, all crucial to making the experience feel right.
  • Known unknowns: My team held a lot of assumptions that we had to constantly rank, challenge, test and validate. With new interactions, technology and use-cases, we were flying blind, but our team became fluent in talking about risk and what we did and didn’t know.
  • Bet big: I felt uneasy about developing and building out something fairly unproven in the market and with our users. We constantly asked ourselves, will people find this valuable? But big bets can light a fire — encouraging creativity and strengthening relationships across the organization. We developed a robust practice of onsite usability testing, and lightning fast mobile dev builds. There’s a healthy medium to be found between iterating and fixing bugs and doing something bold.

Over the next few years, product designers will face exciting challenges and opportunities enabled by machine learning. The technology already feels like magic, but it will be our job to make it usable, beautiful, and useful for everyone.


Product designer interested in people, psychology and fitness. www.joshclement.com