Lessons Learned from Google Mobile: Apps Need Instrumentation

Yuna Wang
@ Promoted
6 min read · Aug 11, 2022

Many apps launch without instrumentation in the form of usage stats and alarms. Often these features aren’t prioritized, either because leadership believes they’re less important than other features or because they are hard to implement. This can be a huge problem for reasons that I’ll get into below.

Usage stats and experimentation are crucial in understanding user behavior, and by extension they are key to growing your market as quickly as possible and avoiding expensive pitfalls along the way. Furthermore, usage stats are required for alarms in production. If you don’t have these practices in place, then your team is potentially leaving a massive chunk of revenue on the table.

Not all instrumentation comes in the form of logged stats, though. I’m also going to talk about user studies, which fall under the instrumentation umbrella but are often overlooked entirely in the design and development process. User studies are a critical tool for understanding how your users see your app, and they can unlock insights with minimal investment on your part.

Case Study

Although I hail from Maps, here’s a cautionary example from another big mobile team I worked with at Google. Details and numbers are changed, but the spirit of the lesson remains intact. Back in 2016, this mobile team updated their entire UI to Material Design. The update was developed behind a build-time configuration switch, meaning a single binary could contain either the old UI or the new one, but never both.

After months of internal dogfooding, Google launched Material Design across the board with great fanfare. As part of the launch, this mobile team released a new version of their app on the App Store alongside many other highly-visible Google properties, with Material in the spotlight. All went (seemingly) as planned, and the mobile team had cupcakes to celebrate.

Almost two years later, a backend engineer working on the project noticed that servers were over-provisioned for the amount of traffic the app was generating in production. What was going on? The backend team did some archaeology and found that capacity estimates were based on traffic when the app first launched, several years ago. But that didn’t make sense — current traffic was almost 40% less than those initial numbers, even though daily active users had more than doubled. How could that be?

It turned out that although DAU had doubled, per-user usage of that backend feature had dropped nearly 75%. Frantic meetings were called, eventually looping in the mobile team. The mobile team quickly confirmed that usage of that feature in their app had taken a nosedive compared to earlier numbers.

A senior PM convened a series of user studies around the app and discovered a likely culprit for the drop in feature usage: the Material Design update had replaced the labeled button that served as the feature’s entry point with a Floating Action Button (FAB) containing only an inscrutable icon. Some users in the study didn’t know what the icon meant and were afraid to tap the FAB. Others didn’t notice the FAB at all, despite its colorful prominence. Only a minority of users even tried to use it. For the most part, users in the study assumed that Google had dropped the feature from the app entirely!

It seemed likely that users in production would behave the same way. The PM arranged for a test. By this time in the app’s lifecycle, there was a robust experimentation framework in place, and user interaction logging had been a practice for years. The team launched an experiment in which they put a text label on the FAB — contrary to Material guidelines, but necessary for the test.

The results were conclusive. Users who saw the text label flocked to the feature, showing nearly 30% more usage than the control group. The result was undeniably statistically significant, and the team acted on it. A Material Design refresh in 2019 created better affordances for the feature, including omnipresent text labels, and usage improved remarkably.

Unfortunately for the team, and for Google, usage of the feature as a percentage of the overall user base never recovered to pre-Material levels. It turned out that most power users, believing the feature was gone, had already switched to competing apps with strong ecosystems and were loath to switch back.

The postmortem was interesting.

What Went Right: Logging and Instrumentation

The mobile team had implemented detailed user interaction logging years earlier, so it was easy to compare numbers pre- and post-Material. This made the discrepancy easy to spot once the team knew what to look for.

Logging usage metrics establishes a critical baseline for new changes and features. Without it, it’s almost impossible to be certain that changes to the app aren’t tanking your bottom line. Even changes that look overwhelmingly positive on the drawing board, such as Material Design, can shift user behavior in ways that are nearly impossible to predict. Never launch without measurement, especially when the change touches part of your revenue flow.
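To make this concrete, here’s a minimal sketch of what interaction logging might look like on iOS. The `InteractionEvent` shape and the `AnalyticsLogger` type are hypothetical stand-ins for whatever analytics pipeline you use; the point is simply that every meaningful interaction becomes a named, timestamped event you can aggregate later.

```swift
import Foundation

// Hypothetical event shape: one record per meaningful user interaction.
struct InteractionEvent: Codable {
    let name: String                  // e.g. "directions_fab_tapped"
    let screen: String                // where the interaction happened
    let timestamp: Date
    let properties: [String: String]  // app version, experiment arm, etc.
}

// Hypothetical logger; a real one would batch events and upload them.
final class AnalyticsLogger {
    static let shared = AnalyticsLogger()
    private var pending: [InteractionEvent] = []

    func log(_ name: String, screen: String, properties: [String: String] = [:]) {
        pending.append(InteractionEvent(name: name,
                                        screen: screen,
                                        timestamp: Date(),
                                        properties: properties))
        // Flush `pending` to your analytics backend on a timer or when the app backgrounds.
    }
}

// Call sites are cheap, so log every entry point you care about:
AnalyticsLogger.shared.log("directions_fab_tapped", screen: "home")
```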

What Went Wrong: Lack of Experimentation

In their eagerness to release, the team had decided to forgo a runtime feature flag in favor of a build-time flag. This saved engineering time, but made it impossible to run an experiment or roll the change back in production. That lack of experimentation caused the team to miss a crucial drop in usage before the 100% launch.
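For illustration, here’s a sketch of the difference, assuming a UIKit app. `LegacyHomeViewController`, `MaterialHomeViewController`, and the `FeatureFlags` wrapper are hypothetical; the point is that the build-time switch bakes the decision into the binary, while the runtime flag leaves it controllable (and experimentable) from the server after release.

```swift
import UIKit

// Hypothetical view controllers standing in for the old and new UIs.
final class LegacyHomeViewController: UIViewController {}
final class MaterialHomeViewController: UIViewController {}

// Hypothetical wrapper around a remote-config or experiment service.
enum FeatureFlags {
    static func isEnabled(_ flag: String) -> Bool {
        // A real implementation would read a server-assigned, locally cached
        // value; defaulting to false makes rollback the safe state.
        return UserDefaults.standard.bool(forKey: "flag_\(flag)")
    }
}

// Build-time switch (what the team shipped): the choice is compiled into the
// binary, so there's no way to experiment or roll back without a new release.
func makeHomeScreenBuildTime() -> UIViewController {
    #if USE_MATERIAL_UI
    return MaterialHomeViewController()
    #else
    return LegacyHomeViewController()
    #endif
}

// Runtime flag: the same binary carries both UIs, and a server-controlled
// value decides per user, which is also what makes a controlled experiment possible.
func makeHomeScreenRuntime() -> UIViewController {
    if FeatureFlags.isEnabled("material_home_ui") {
        return MaterialHomeViewController()
    } else {
        return LegacyHomeViewController()
    }
}
```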

Once experimentation became a regular practice on the mobile team, it was simple to launch an experiment to confirm the PM’s hypothesis. This allowed the team to move forward on a single solution backed by hard evidence, instead of guessing at the effectiveness of multiple solutions and having to implement them all.

Experiments are not only useful for preventing losses; they can also help you grow your marketplace more quickly. You can launch multiple branches of a UI treatment under experiment, for example, and find out which one performs best. A longitudinal study with a holdback population can confirm that app usage isn’t regressing over time as new features are introduced. Experiments are the key to fast growth, and personally, I’d never plan a major launch without one.
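As a sketch of the core mechanic, here’s one way arm assignment can work on the client, assuming the simplest possible setup: hash a stable user ID into a bucket so the same user always sees the same treatment, and keep a small holdback. The arm names, percentages, and `assignArm` helper are illustrative, not how Google’s framework works.

```swift
import Foundation

// Illustrative experiment arms, including a long-term holdback.
enum Arm: String {
    case control       // existing UI
    case fabWithLabel  // FAB plus a text label
    case holdback      // never gets the treatment; used for longitudinal comparison
}

// Stable FNV-1a hash, so the same user lands in the same bucket across app
// launches. (Swift's built-in Hasher is seeded per process, so it won't do.)
func stableHash(_ s: String) -> UInt64 {
    var hash: UInt64 = 0xcbf29ce484222325
    for byte in s.utf8 {
        hash ^= UInt64(byte)
        hash = hash &* 0x100000001b3
    }
    return hash
}

// Deterministic bucketing: hash user + experiment name so arms from different
// experiments don't correlate with each other.
func assignArm(userID: String, experiment: String) -> Arm {
    let bucket = stableHash("\(experiment):\(userID)") % 100
    switch bucket {
    case 0..<80:  return .control       // 80% control
    case 80..<95: return .fabWithLabel  // 15% treatment
    default:      return .holdback      // 5% holdback
    }
}

// Branch the UI on the arm, and attach it to every logged event so usage can
// be compared per arm.
let arm = assignArm(userID: "user-123", experiment: "fab_text_label")
print("assigned arm:", arm.rawValue)
```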

What Went Wrong: Lack of External Testing

There were no external user studies done prior to the Material launch. The team had assumed that internal dogfooding would be enough to catch any serious issues. But here again was a flaw in their logic: Internal users at Google had already been testing Material Design for months, and had been privy to countless UX presentations around the beauty of FABs. They already knew FABs inside and out, and as such, they were a poor representation of the behavior of external users.

In my experience, user studies have always revealed new and fascinating insights into how people interact with your product. Sometimes users have a mental model that differs completely from the way UX and engineering think about the product. Other times they base their entire workflow around some embarrassing clunkiness in the UI. In any case, user studies provide immediate, actionable feedback on the state of your product, and they can reveal huge growth opportunities.

I have also seen user studies used as a prelude to full experimentation. Studies can sometimes be run with only UX mocks, saving engineering time. A study with good representation can spare you weeks of building and launching a UI or an experiment, only to redo the work after finding a UX design issue. As the old woodworking saying goes: “Study twice, build once.”

What Went Wrong: Lack of Alarms

Alarms are like unit tests: it’s worth implementing them on seemingly mundane or stable systems to ensure that future work doesn’t break those systems. It should not have taken two years for Google to notice a change in feature usage. If alarms had been in place when Material first launched, then perhaps the team could have pushed a fix more quickly and retained some of the users it lost.
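A usage alarm doesn’t need to be sophisticated to be useful. Here’s a minimal sketch, assuming a daily job that compares a feature’s usage metric against its trailing baseline; the metric name, threshold, and `UsageAlarm` type are made up for illustration.

```swift
import Foundation

// Illustrative alarm: fire when a usage metric drops too far below its
// trailing baseline. In practice this would run server-side against your
// metrics store and page the on-call instead of printing.
struct UsageAlarm {
    let metric: String         // e.g. "feature_daily_active_users"
    let dropThreshold: Double  // alert if usage falls by more than this fraction

    func check(today: Double, trailing28DayAverage baseline: Double) -> Bool {
        guard baseline > 0 else { return false }
        let drop = (baseline - today) / baseline
        if drop > dropThreshold {
            print("ALERT: \(metric) is down \(Int(drop * 100))% vs its 28-day baseline")
            return true
        }
        return false
    }
}

// With a 20% threshold, the drop in this story would have fired shortly after
// the Material launch instead of surfacing two years later.
let alarm = UsageAlarm(metric: "feature_daily_active_users", dropThreshold: 0.20)
_ = alarm.check(today: 2_500, trailing28DayAverage: 10_000)
```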

Summary

It was an expensive lesson for Google. Don’t let it be an expensive lesson for your marketplace. An investment in instrumentation and alarms at the outset can have a big payoff. There’s no shortcut to growth, but these come pretty darn close.

Posts in the Series

  1. A Game of Phones
  2. Mobile Multiplatform
  3. Lessons Learned from Google Mobile: Apps Need Instrumentation


Yuna Wang
@ Promoted

Mistress of bad puns and mobile engineering at Promoted.ai. Former tech lead/manager at Google Maps for iOS.