Metrics every design team should be tracking.

Ricky Johnston
UXR @ Microsoft
10 min read · Dec 13, 2022

Welcome back to our series on taking a product from zero to data science. This post is a bit of a detour from the planned continuation of the series: I thought a deep dive into the bare-bones, fundamental metrics every design team should be tracking was necessary before we discuss causal inference and experimentation, because we cannot do either without metrics in place.

Why should Design be paying attention to metrics? Isn’t that a focus for Engineering or Business?

In short, designers need feedback on designs. Every designer wants to improve the customer experience. True improvements in the customer experience result in behavior changes, which show up in product usage. Measuring this impact gives designers useful feedback on what works and what does not. Sharing results between designers helps future designs and inspires more focused creativity. With experimentation in place, designs can also be tested before full releases so that only those that improve the user experience ship (a great benefit for reducing user pain from fruitless product changes).

How can I discuss metrics with non-data-oriented people?

Everyone in the workplace has their specializations and weaknesses. Designers tend not to be deep data people (and I don’t try to be a designer). Even without this specialized skill set, a designer, PM, or other product decision maker should be able to clearly define, in a simple sentence, how a design (or feature) improves the customer experience. With a proper set of metrics in place, this written design intent can be defined (or proxied) as a combination of the user metrics it intends to impact. A data analyst can partner with a designer to translate this into meaningful metrics. After a few walk-throughs, designers will pick up the rhythm of how this works and start identifying metrics themselves.

For example, if a new design intends to make creating a new bot easier in Power Virtual Agents, we ought to be able to measure the improvement through an increase in the number of bots created, a decrease in time to create, fewer visits to help content, a reduction in support tickets filed, or some combination of similar metrics. At an even higher level, this design should (in the long term) contribute to an overall better product, resulting in higher retention, higher satisfaction, and more revenue. Initially, the design impacts may be too small to measure, but in theory even the smallest changes should have an incremental impact, which metrics can help evaluate.
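To make this concrete, here is a minimal sketch of how a few of those proxy metrics might be computed from a raw event log with pandas. The event names, columns, and sample data are hypothetical and only for illustration; they are not Power Virtual Agents telemetry.

```python
import pandas as pd

# Hypothetical event log; event and column names are illustrative only.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "event": ["bot_create_start", "help_opened", "bot_create_finish",
              "bot_create_start", "bot_create_finish"],
    "timestamp": pd.to_datetime([
        "2022-12-01 10:00", "2022-12-01 10:05", "2022-12-01 10:12",
        "2022-12-02 09:00", "2022-12-02 09:25",
    ]),
})

# Proxy 1: number of bots created.
bots_created = (events["event"] == "bot_create_finish").sum()

# Proxy 2: average time to create a bot (first start to first finish, per user).
starts = events[events["event"] == "bot_create_start"].groupby("user_id")["timestamp"].min()
finishes = events[events["event"] == "bot_create_finish"].groupby("user_id")["timestamp"].min()
avg_minutes_to_create = (finishes - starts).dt.total_seconds().mean() / 60

# Proxy 3: how often users reached for help during the period.
help_visits = (events["event"] == "help_opened").sum()

print(bots_created, avg_minutes_to_create, help_visits)  # 2, 18.5, 1
```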

Alright let’s get started.

This blog post is going to lay out some fundamental metrics; think of this list as the essentials for design teams. Depending on your product’s stage, many of these will already be covered by the product team. Don’t reinvent the wheel: check with your product team to see what they already have in place before building. Building these metrics will require interacting with PM/Dev anyway.

Below is my suggested list of essential metrics every design team should be tracking, followed by a description and deeper-dive discussion of each. None of these are new or groundbreaking, nor is this list comprehensive. That's why these are fundamentals. 😊

Metrics every Design team should track

High level metrics for all products:

· North Star metric

· MAU (Monthly active users) with audience cuts

· NPS, NSAT or CSAT (at least one)

· Revenue

User level metrics:

· New User / Acquisition

· Retention (Churn)

Task completion metrics/funnel:

· Task completion funnel

· First Run Experience funnel metrics

Usage level metrics:

· Feature engagement

Now that you have seen the list, let's dive into each of these in more detail.

Deep Dive

High level metrics for all products

Product teams typically track and report certain metrics for all products. These are usually the metrics established as OKRs/KPIs, and your larger organization probably already has them covered. A common question is “How did my design change impact <insert KPI>?” The problem is that most of these metrics are too high level for an individual feature or design change to be tied to; they exist for senior leadership and goal setting, not for small incremental changes. So why are they on the list? First, everyone in the organization needs to align to the high-level goals. Second, all metrics should (at least theoretically) tie back to driving a higher-level OKR. Ideally, causal data science projects should be done to show the connection of all metrics back to the OKRs (if you have the time and adequate data).

Most high-level metrics tend to be aggregated lagging indicators: by the time you observe a change, the event that caused it happened a while ago. Aggregation often occurs at a monthly level, where it can take months before a change is large enough to show as statistically significant. Proactive product designers need leading and real-time indicators to track their releases against, which calls for lower-level metrics. Before we get to lower-level metrics, let’s discuss the most common high-level metrics.

Monthly active users (MAU) — This is the standard metric used at the leadership level for target setting, and it should already be in place for your product. Defining what counts as an active user is a tricky, product-dependent endeavor. Most products count a user as active after as little as one action, but that choice affects how well the metric reflects ‘true’ product usage. These metrics are often reported in quarterly earnings.
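As a rough illustration of the definition question, here is a sketch that counts MAU as “distinct users with at least one action in the calendar month.” The event log and column names are made up for the example.

```python
import pandas as pd

# Toy event log: one row per user action.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 1],
    "timestamp": pd.to_datetime(["2022-10-03", "2022-10-20", "2022-10-07",
                                 "2022-11-02", "2022-11-15", "2022-11-28"]),
})

# "Active" here means at least one action in the month: the simple but coarse
# definition discussed above; stricter definitions change the numbers.
events["month"] = events["timestamp"].dt.to_period("M")
mau = events.groupby("month")["user_id"].nunique()
print(mau)  # 2022-10 -> 2 users, 2022-11 -> 2 users
```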

User Satisfaction metric — A way to measure how happy, satisfied, or delighted customers are with the product. Typically this is measured by asking the user directly, often via a pop-up in the product or on the web page. Two of the most common question formats are Net Promoter Score and Net Satisfaction.

NPS — Net Promoter Score. The question: “How likely are you to recommend <product>?” Responses fall on a 0–10 scale, where 9 and 10 are promoters, 7 and 8 are passives, and 6 or lower are detractors. The score is calculated by subtracting the percentage of detractors from the percentage of promoters. The idea behind this metric is to measure customer loyalty. One challenge is that some products do not lend themselves to being “recommended to others”; users then feel unsure how to answer or feel the question is irrelevant to them.
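A minimal sketch of the NPS arithmetic on a handful of synthetic responses:

```python
# Synthetic 0-10 responses to "How likely are you to recommend <product>?"
responses = [10, 9, 9, 8, 7, 6, 3, 10]

promoters = sum(r >= 9 for r in responses)    # 9s and 10s
detractors = sum(r <= 6 for r in responses)   # 6 and below
nps = 100 * (promoters - detractors) / len(responses)
print(round(nps, 1))  # 25.0 (% promoters minus % detractors, range -100 to +100)
```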

NSAT — Net Satisfaction. The question: “Thinking about your experience in the last 3 months, rate your satisfaction with <product>.” Response options are on a 4- or 5-point scale: very satisfied, somewhat satisfied, (neutral), somewhat dissatisfied, very dissatisfied. The score is calculated by taking the top box, subtracting the bottom two boxes, and dividing by the total number of respondents. When choosing between 4 and 5 options, the metric maker must decide whether to force a positive or negative opinion or allow a neutral response.
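And the NSAT scoring exactly as described above (top box minus the bottom two boxes, over total respondents), again on synthetic data. Other teams score NSAT differently, so treat this as a sketch of this post's definition only.

```python
# Synthetic responses on a 5-point scale; scoring follows the definition in this post.
responses = ["very satisfied", "very satisfied", "somewhat satisfied",
             "neutral", "somewhat dissatisfied", "very dissatisfied"]

top_box = responses.count("very satisfied")
bottom_boxes = responses.count("somewhat dissatisfied") + responses.count("very dissatisfied")
nsat = 100 * (top_box - bottom_boxes) / len(responses)
print(round(nsat, 1))  # (2 - 2) / 6 -> 0.0
```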

Revenue — Revenue generated from paying customers’ licenses. Depending on the life-cycle stage and role of your product, this might not be a relevant metric; instead, there might be a metric capturing how your product contributes to the key metrics or revenue of some revenue-generating product at your company. For design, revenue is a lower priority and not a focus of this blog.

North Star metric — All products should have a metric measuring the intended use of the product. It is possible your product’s North Star is MAU, but most likely that is not a strong enough indicator of the product’s true purpose. This might be a metric your organization has not yet defined, and you might need to help them out. One way to define it is to ask a simple question: “What does this product help the user accomplish?” or “What is the purpose of this product?” A one-sentence explanation of what the product does is a great starting point for identifying a good candidate for what you want to measure. Then you need to figure out which metric best proxies for it.

For example, Power Virtual Agents helps people create bots so that others can find solutions to their problems. Suggested North Star metrics could be the number of bots published or the number of successful bot sessions. Examples from a variety of other products can be found in the sources below:

Sources: https://www.growth-academy.com/north-star-metric-examples; https://growwithward.com/north-star-metric/#north-star-metric-examples

User Level metrics

User level metrics are a natural outflow of monthly active users (MAU). If we break down what comprises MAU, we see where the change in users is coming from.

MAU = New Users + Retained Users + Returning Users
MAU growth = New Users + Returning Users − Newly Inactive Users

Why split this out? So we can see what is moving MAU. For example, MAU might look like it’s growing, but in reality the product has a leaky bucket: the number of new users added is simply greater than the number going inactive/churning. Even though MAU looks good, the product is in trouble from heavy churn, which will show dramatically once new users stop joining.

Returning and new users need to be defined by your team: how long must a user be ‘gone’ before they count as churned, and then as a new user again? Some products might require an inactive user to be gone for multiple months before counting as new again (or might never count them as new again).

The churned user does not show directly in this equation because it depends on how your product defines “churn.” Some products will define it as one month away, others will require multiple months away to be churned. It really depends on the purpose of and frequency of use of your product.
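Here is a rough sketch of the breakdown above, classifying each of this month's active users as new, retained, or returning, and counting last month's users who went inactive. The one-month windows and the toy activity table are illustrative; your team's churn and "new again" definitions may differ, as discussed above.

```python
import pandas as pd

# Toy month-level activity table (one row per user per active month).
activity = pd.DataFrame({
    "user_id": [1, 2, 3, 1, 3, 4, 2],
    "month": pd.PeriodIndex(["2022-09", "2022-09", "2022-10", "2022-10",
                             "2022-11", "2022-11", "2022-11"], freq="M"),
}).drop_duplicates()

this_month = pd.Period("2022-11", freq="M")
last_month = this_month - 1

current = set(activity.loc[activity["month"] == this_month, "user_id"])
previous = set(activity.loc[activity["month"] == last_month, "user_id"])
ever_before = set(activity.loc[activity["month"] < this_month, "user_id"])

new_users = current - ever_before                      # never active before this month
retained_users = current & previous                    # active last month and this month
returning_users = (current & ever_before) - previous   # back after at least a month away
newly_inactive = previous - current                    # lost from last month's actives

print(len(new_users), len(retained_users), len(returning_users), len(newly_inactive))  # 1 1 1 1
```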

MAU is a lagging indicator. Once a user has churned, it takes at least a month before you notice, and a trending decline in MAU takes multiple months to become visible. Additionally, MAU has the challenge of defining ‘active user’: does visiting once, for a single second, count? Many products eventually define a separate engaged-user metric for more clarity around tracking long-time users.

Task completion metrics/funnel

In most cases, the task completion funnel flows naturally from the north star metric. What steps does a user need to take to complete the main task of the product? With the north star defined, we break the task down into steps and set these up as specifically tracked metrics in a funnel dashboard view. The more steps you can break the task into, the better you will be able to spot problem areas and target each with a specific design change. Probably the biggest challenge here is building a clean, flowing funnel, since many steps are optional or can loop.
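A minimal sketch of a funnel view: count distinct users reaching each step and compute conversion rates. The step and event names are hypothetical stand-ins for your product's main task.

```python
import pandas as pd

# Ordered steps toward the main task; names are illustrative only.
steps = ["opened_editor", "added_topic", "tested_bot", "published_bot"]

events = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2, 2, 3],
    "event": ["opened_editor", "added_topic", "tested_bot", "published_bot",
              "opened_editor", "added_topic", "opened_editor"],
})

users_per_step = [events.loc[events["event"] == s, "user_id"].nunique() for s in steps]
funnel = pd.DataFrame({"step": steps, "users": users_per_step})
funnel["pct_of_start"] = 100 * funnel["users"] / funnel["users"].iloc[0]
funnel["step_conversion"] = 100 * funnel["users"] / funnel["users"].shift(1)
print(funnel)
```

Looping or optional steps usually need an explicit rule (for example, count a user at a step if they ever reached it during the window) before a funnel like this reads cleanly.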

New User funnel

Setting up a specific funnel for tracking new users is particularly helpful for finding problem areas in attracting and retaining new users. These first run experiences (FRE) contribute to the count of new monthly users, and retention of those new users impacts MAU numbers. A first run experience funnel is great for near-real-time measurement: bad first run experiences show up quickly as failed task completions or lost users, whereas MAU will only show new-user loss more than a month later.

Building a funnel tracking the first run experience and activity completion is a great way to see where your product struggles in retaining new users. This funnel might be the same as your task completion funnel (in which case you want a new-users-only filter on it), but you likely also want to include other steps. Before the first project can begin, other activities need to occur, such as creating an account, logging in, and setting up; these are part of the first run experience funnel and key for design to track for making the best first impression. First run experiences are great for building and testing intuitive, consumer-focused products: how quickly the user feels comfortable with the product is much easier to observe on first usage.
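A sketch of the same idea restricted to the first-run window: keep only each user's first few days of events before counting funnel steps. The 7-day window, step names, and data are assumptions for illustration.

```python
import pandas as pd

# Toy event log covering account creation through the first main task.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 1],
    "event": ["account_created", "logged_in", "setup_done",
              "account_created", "logged_in", "first_bot_created"],
    "timestamp": pd.to_datetime(["2022-12-01", "2022-12-01", "2022-12-02",
                                 "2022-12-03", "2022-12-03", "2022-12-20"]),
})

# Keep only each user's first 7 days so the funnel reflects the first-run window.
first_seen = events.groupby("user_id")["timestamp"].transform("min")
fre = events[events["timestamp"] <= first_seen + pd.Timedelta(days=7)]

steps = ["account_created", "logged_in", "setup_done", "first_bot_created"]
counts = {s: fre.loc[fre["event"] == s, "user_id"].nunique() for s in steps}
print(counts)  # {'account_created': 2, 'logged_in': 2, 'setup_done': 1, 'first_bot_created': 0}
```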

Usage (feature) level metrics

Some features of a product do not fit into a clean funnel flow, but we still believe they are pivotal to the product. These features should still be tracked in a dashboard/scorecard, and each should have a justification of how it ties to driving other key metrics that, in the long run, drive our north star metric and our SLT metrics. An example of this can be seen in how Spotify ties ‘input’ metrics, which are feature engagement metrics, to its larger north star metric of time spent listening to music. When outlining feature metrics, there should be a write-up (or nice visual) of how they connect to the higher-level metrics.
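One common shape for a feature engagement metric is "share of this month's active users who touched the feature"; the justification tying each feature back to the north star lives in the write-up, not the query. The feature names and data below are invented for the sketch.

```python
import pandas as pd

# Toy feature-usage log for a single month; feature names are illustrative.
usage = pd.DataFrame({
    "user_id": [1, 2, 3, 1, 2],
    "feature": ["editor", "editor", "analytics", "analytics", "templates"],
})

mau = usage["user_id"].nunique()                                   # active users this month
adoption_pct = 100 * usage.groupby("feature")["user_id"].nunique() / mau
print(adoption_pct.round(1))  # editor 66.7, analytics 66.7, templates 33.3
```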

Image source: https://www.reforge.com/blog/north-star-metric-growth

Concluding thoughts

Once these metrics are set up, the door opens to many follow-up questions and metrics for further actionable insight and customer-experience improvement. What is causing my metric to move? Who is causing the metric to move? If we do this, will it impact these metrics? How did our feature/design change improve the metric? These are all questions of causality, which means we need experimentation or causal inference methods to answer them. In addition to generating questions, these metrics become data sources for experimentation, causal inference, and other advanced methods. In my next post, I will cover causal inference methods that build on top of this.


Ricky Johnston
UXR @ Microsoft

A PhD tech data scientist who also does health research on the side.