Breadth and Depth in Startup Metrics

Geoff Ralston’s (b * d) / c is only as good as the metrics you put into it


tl;dr: Breadth = % of users using, while Depth = key usage per user

Geoff Ralston recently wrote a post on his brilliant (b * d) / c formula for prioritizing product development features. You should definitely read his post if you haven’t. I call it Ralston’s Unified Theorem of Product Development.

Basically, the formula takes how many users will benefit from a feature (breadth, aka b), multiplies it by how much it will improve their experience (depth, aka d), and divides by how long it will take to build (cost, aka c). This formula is awesome and Geoff has used it more than once in the Imagine K12 office hours our team at Trinket has had with him.
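To make the arithmetic concrete, here’s a minimal Python sketch of the formula. The feature names, numbers, and units below are invented for illustration; they aren’t Trinket data, and Geoff’s post doesn’t prescribe any particular units.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    breadth: float  # fraction of users expected to benefit, between 0 and 1
    depth: float    # expected improvement per user, in units of your chosen metric
    cost: float     # estimated build time, e.g. in weeks (assumed unit)


def score(feature: Feature) -> float:
    """Return (b * d) / c for one candidate feature."""
    if feature.cost <= 0:
        raise ValueError("cost must be positive")
    return feature.breadth * feature.depth / feature.cost


# Hypothetical candidates, not real product data.
features = [
    Feature("feature_a", breadth=0.8, depth=2.0, cost=4.0),
    Feature("feature_b", breadth=0.2, depth=5.0, cost=1.0),
    Feature("feature_c", breadth=0.5, depth=1.0, cost=2.0),
]

# Highest score first: this is the order you'd build things in.
ranked = sorted(features, key=score, reverse=True)
for feature in ranked:
    print(f"{feature.name}: {score(feature):.2f}")
```

Notice that the narrow but deep and cheap feature_b comes out on top here, ahead of the broad but expensive feature_a.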

As he notes in his post, though, the key to success with this formula is picking the right metric to use with it. Are there any more specific guidelines we can use to pick these metrics? I think so.

Metrics are People too

Eric Ries’ famous line “metrics are people too” reminds us that product metrics should always relate back to users’ experience. There are two main ways to do this:

  • look at the ratio of users that do something, such as a conversion or utilization rate (i.e. users using / total users)
  • look at the degree of usage, usually through events per user or time on site per user

The former is perfectly suited to measuring breadth, while the latter is perfect for depth. Geoff is right that there’s a lot of going by your gut in early stage startups. But Ries forces us to make our assumptions and hypotheses explicit so that we can verify their truth. If you combine Ries’ explicit hypotheses with Geoff’s formula you have a perfect setup for validating your hunches while moving quickly and trusting your gut.
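Here’s a rough sketch of both kinds of metric computed from the same raw event log. The log format (pairs of user id and event name), the event names, and the user count are assumptions made up for this example, not a real analytics export.

```python
from collections import Counter

# Hypothetical event log: (user_id, event_name) pairs.
events = [
    ("u1", "run_code"), ("u1", "run_code"), ("u1", "edit_code"),
    ("u2", "run_code"),
    ("u3", "edit_code"), ("u3", "edit_code"),
]
total_users = 4  # assumed; includes one user (u4) who triggered no events


def breadth(events, event_name, total_users):
    """Utilization rate: users who triggered the event / total users."""
    users = {user for user, name in events if name == event_name}
    return len(users) / total_users


def depth(events, event_name):
    """Events per user, counting only users who triggered the event."""
    counts = Counter(user for user, name in events if name == event_name)
    if not counts:
        return 0.0
    return sum(counts.values()) / len(counts)


print(breadth(events, "run_code", total_users))  # share of users who ran code
print(depth(events, "run_code"))                 # runs per user who ran code
```

One design choice worth flagging: depth here is averaged over users who actually used the feature, not over all users. Dividing by all users would mix breadth back into the depth number.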

Breadth Measurement by Usage Rates

Anyone who’s built a product from scratch has run into a situation where they’ve built a feature or set of features expecting it to take off with a majority of users, only to find out it was in fact used by very few.

At Trinket this was the case with our HTML trinket. My hypothesis was that HTML was used much more widely in classes around the world than Python is, so the new feature would soon dwarf our Python usage. Hypothesis: after a few weeks, the utilization rate of the HTML trinket would be greater than that of the Python trinket.

Well, I was very wrong about this. Python continues to be our most popular language and is outgrowing all other trinket types. The utilization rates told this tale easily and we haven’t seen any difference in depth of usage, which means that improvements to our Python trinket should outrank work on HTML. This focus has helped us deliver new features most of our users will use.

A nice feature of breadth measurement, by the way, is the ease of setup. You don’t need a full-on split test to gauge usage as long as the feature is presented side by side with an alternative, as is the case in our New Trinket dialog.

With the clarity of this new data, I realized that one of the problems we solve, quick and easy setup, was not experienced strongly by HTML teachers. So they were not searching for a solution as actively as Python teachers. Using Geoff’s formula, we realized that the breadth of the feature was not as great as we had predicted, so we decided to focus on improving our support for Python.

Thinking about breadth can also influence your product architecture. We now try to share as much code between Trinket types as possible so that feature improvements touch all users regardless of the language they’re coding in.

Measuring Depth with Events or Time

Depth is how much a feature matters to users. We can measure it, roughly, by usage per user. In most cases this boils down to how often they use the feature and how its presence affects their time on site.

Measuring depth of engagement is more difficult than breadth because it most often requires a split test setup (aka A/B test). If you’re not set up to do split tests via feature flags, custom flows, or a service like Optimizely you should drop everything and get set up to do so. Without the ability to split test it’s almost impossible to produce the validated learning that startups live and die by.

Google Analytics events are a surprisingly robust and easy tool for this kind of measurement. At Trinket we use them to understand how many times users run code, edit code, and interact with the code’s output (e.g. play with games). Each new feature ships with its own analytics event so we can track how many times users are using it.

Some features are important to users but happen rarely, like accepting comments from other users on a blog post (one of the many reasons I love Medium). In these cases, the real value to the user may need to be measured by how the presence of the feature affects the users’ time on site or number of sessions overall. To do this reliably, a split test is likely needed.
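As a sketch of what that comparison looks like, here’s the relative lift in time on site between a control group (feature hidden) and a treatment group (feature shown). The numbers are invented, and this only compares averages. It says nothing about statistical significance, which you’d want to check with a proper test or your split testing tool before trusting the result.

```python
from statistics import mean

# Hypothetical minutes on site per user in each arm of a split test.
control = [4.0, 6.0, 5.0, 5.0]    # feature hidden
treatment = [6.0, 7.0, 5.0, 6.0]  # feature shown


def lift(control, treatment):
    """Relative change in the mean of the metric, treatment vs. control."""
    baseline = mean(control)
    if baseline == 0:
        raise ValueError("control mean is zero; relative lift is undefined")
    return (mean(treatment) - baseline) / baseline


print(f"lift in time on site: {lift(control, treatment):.0%}")
```

The lift is one reasonable stand-in for d when a feature is rarely used directly but still changes how users behave overall.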

Picking one metric for (b * d) / c

So far we’ve talked about characteristics of metrics you should use for b and d. Geoff’s formula is designed to optimize one particular metric over time. So which one metric should you pick?

b is just the share of users who use a feature, so it’s a ratio with no units attached. That means d carries the units, which makes it your key metric. Your choice of interactions, signups, sessions per user or time on site per user will be the key determinant of how the formula ‘grades’ your product development options. Your product will go through phases when different metrics assume different levels of importance. Just make sure to pick a metric consciously and stick with it long enough to move the needle. Shifting your target metric will shift your priorities entirely.
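To see how much the choice of d matters, here’s a small sketch that ranks the same two options under two different depth metrics. The option names, metric names, and every number are hypothetical.

```python
# Hypothetical options, each with a depth estimate under two candidate metrics.
options = {
    "option_a": {
        "breadth": 0.6,
        "cost": 2.0,
        "depth": {"sessions_per_user": 1.0, "minutes_on_site": 4.0},
    },
    "option_b": {
        "breadth": 0.3,
        "cost": 1.0,
        "depth": {"sessions_per_user": 3.0, "minutes_on_site": 2.0},
    },
}


def rank(options, metric):
    """Order option names by (b * d) / c, using `metric` as d."""
    def score(name):
        option = options[name]
        return option["breadth"] * option["depth"][metric] / option["cost"]
    return sorted(options, key=score, reverse=True)


print(rank(options, "sessions_per_user"))  # option_b first
print(rank(options, "minutes_on_site"))    # option_a first
```

Same options, same breadth and cost, opposite priorities. Scores are only comparable within one metric, since the units of d differ between them.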

Hope this has been helpful! Tweet at me with suggestions or comments!