The Art and Science Behind Effective Product Goal Setting

Paige DeRaedt · Strava Engineering · May 17, 2021

As analysts at Strava, we work across many subject areas, ranging from defining how we track the usage of features our teams build and supporting experiments, to sizing opportunities for new projects and investing in forward-thinking research. One of the most exciting areas where analysts support our teams, however, is choosing metrics and setting goals.

Like the title of this post suggests, choosing the right metrics and goals is both an art and a science. At the most basic level, we want our metrics to be easily understood by our teams and sensitive enough to be affected by our team’s work, but also directly connected to larger company initiatives. We want our goals to map to the intent of what we are building and to strike a healthy balance between aggressive and achievable. How do these come together? We identify metrics that reflect the impact of our work, and then set goals for how we’d like to see those metrics move. Let’s dig into four principles for doing this effectively:

  1. Ease of understanding
  2. Balance of stability and sensitivity
  3. Connection to the big picture
  4. Mapping to intent

Ease of understanding

If our metrics closely measure what we are building, they will be easy for people to comprehend and more motivating for those who have the ability to make an impact. There is, however, a delicate balance here. If our metrics are too specific to a project, they can become more complicated and harder to connect to company and business goals. On the flip side, if a metric is too high level, it can feel abstract and less motivating for the team. Implementing events is a prerequisite for calculating any metric, and a metric that is easy for many to understand is often also easier to implement tracking for.

Stability vs sensitivity

In addition to being easy to understand, a metric must strike a healthy balance: movable, but not too noisy. Teams need to see the metric change in response to the work that they do in order to be motivated by it. If a metric is so stable that it hasn’t varied by more than 2% in the past year, it might not be sensitive enough for a team to see movement when a change is made. However, if a metric varies by 5% each day, it might be too noisy to detect changes. Experimentation is a lever we can pull to understand the incremental impact of any change when all other factors are controlled.
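As a rough illustration of that balance, here is a minimal sketch (not Strava’s actual tooling; the function and column names are hypothetical) that compares a candidate metric’s routine day-over-day movement against the size of lift a team realistically expects to drive:

```python
import pandas as pd

def noise_check(daily_values: pd.Series, expected_lift: float) -> dict:
    """daily_values: one metric value per day; expected_lift: relative change we hope to cause (e.g. 0.03)."""
    day_over_day = daily_values.pct_change().dropna()
    typical_swing = day_over_day.abs().median()  # typical routine daily movement
    return {
        "typical_daily_swing": typical_swing,
        "expected_lift": expected_lift,
        # If routine noise dwarfs the lift we expect, the metric is probably too
        # noisy to read from a trend line alone, and an experiment is warranted.
        "detectable_from_trend": expected_lift > typical_swing,
    }
```

If the expected lift is smaller than the metric’s everyday wobble, we either pick a more sensitive metric or lean on experimentation to isolate the effect.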

Connection to the big picture

Whatever teams decide to measure and work on must also ladder up to larger company goals. To do this while still managing ease of understanding and the balance of stability and sensitivity, we use an input → output metric framework to bring everything together. Input metrics track early indicators that are closely connected to user behavior and represent what we need to get right to generate the outcomes we want. Output metrics often lag quite a bit and represent a combination of many different actions. This framework helps decompose abstract, higher-level success metrics into independent secondary success metrics that are functions of different components of the product.
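To make the framework concrete, here is a toy illustration (not Strava’s actual tooling) of how a team might pair an input metric it can move directly with the output metric it should ultimately ladder up to:

```python
from dataclasses import dataclass

@dataclass
class MetricPair:
    input_metric: str   # early indicator, close to the user behavior the team is changing
    output_metric: str  # lagging, company-level outcome the input should ladder up to

# Hypothetical pairing for a performance-focused team
performance = MetricPair(
    input_metric="% of feed loads under a threshold time",
    output_metric="short-term athlete retention",
)
```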

Mapping goal to intent

Now that we’ve gone through the work of choosing metrics, we must set goals for them. Even with the “perfect” metric, setting the wrong goal can be detrimental. An overly aggressive goal can lead to waning motivation as success starts to seem unattainable, while a goal that is reached too easily can lead to missed potential and opportunity. Goals that map directly to the intent of what we want to achieve from a user or business perspective end up being the most successful.

In a recent round of metric and goal setting I was reminded of my favorite example of finding a set of metrics and goals to fit an amorphous problem: Strava’s mobile app performance. After a rigorous product planning process, Strava decided to focus on the theme of performance, or more specifically, our mobile app load times. Because we ask athletes from all over the world to venture outside and try an activity that may take them well outside of the bands of LTE connectivity, we committed to making sure that our app performed even with poor connectivity.

There are plenty of places in our app where we could have focused, but we decided to zero in on the very first moment athletes experience Strava: the time from when an athlete taps the Strava app on their phone to when they see content on the landing page, our feed. To embark on work in this new space, the team started by implementing events to track how long it took from tapping the Strava app icon to being able to see and interact with content in the feed. We began referring to this as Strava’s “time to something useful,” or TTSU. This chart shows the distribution of load times over one month of athletes loading the Strava app, before we started working on improving our performance.

Distribution of feed load times over one month
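As a rough sketch of how a metric like TTSU could be derived from instrumentation (the event names and columns here are hypothetical, not Strava’s actual schema), one could pair each app open with the first moment feed content becomes visible in the same session:

```python
import pandas as pd

def compute_ttsu(events: pd.DataFrame) -> pd.Series:
    """events: one row per tracked event, with columns session_id, event_name, timestamp."""
    opens = (events[events.event_name == "app_open"]
             .groupby("session_id").timestamp.min())
    feed_ready = (events[events.event_name == "feed_content_visible"]
                  .groupby("session_id").timestamp.min())
    # One TTSU value per app launch, in seconds; launches missing either event are dropped.
    return (feed_ready - opens).dt.total_seconds().dropna()
```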

After researching industry standards for app load times across the world, and looking at where we were starting from, we wanted to target a specific TTSU for as many athletes as possible. We knew that the TTSU we chose wasn’t state of the art, but that it would bring a good chunk of our athletes from what we considered acceptable to good performance. And so our input metric was born: % of feed loads under a threshold time. This chart shows the % of feed loads under a threshold time over a couple of weeks after we started work on improving performance. An early update we made on the iOS platform produced a step-change increase in the % of feed loads under our time threshold.

*Previous week overlay is grey line
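Computing that input metric from the TTSU values is straightforward; here is a minimal sketch with an assumed placeholder threshold (the actual target time isn’t stated in this post):

```python
import pandas as pd

THRESHOLD_SECONDS = 3.0  # placeholder value; Strava's actual target isn't given here

def pct_loads_under_threshold(loads: pd.DataFrame) -> pd.Series:
    """loads: one row per feed load, with columns date and ttsu_seconds."""
    under = loads.ttsu_seconds < THRESHOLD_SECONDS
    return loads.assign(under=under).groupby("date").under.mean() * 100  # daily % under target
```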

This metric was very closely related to the types of changes the team was making in the product, and sensitive enough to see movement after an early test improvement that the team made. However, we had no concept of how this might contribute to higher level company goals. It’s definitely nice if our app loads faster, but will this make athletes use our app more frequently or become subscribers more often?

Now we begin the quest for our output metric. Given that improving app load time performance affects all athletes on Strava, and that we hope that this ultimately results in athletes using the app more to record activities, see their progress, and engage with their community, we decided that a retention metric might be a good output metric candidate. While we do want athletes to find value in using Strava, our end goal is not to optimize for time in app. We want to encourage people to be active and hope that making our app more accessible via faster performance can help facilitate how they celebrate their efforts using our app as their tool. Our core focus is on creating more value for our athletes, but retention is also strongly correlated with free athletes subscribing and subscribed athletes staying subscribed. Not only does it measure people finding value in coming back to Strava, it helps athletes discover the value of our subscription, which is a win for Strava’s business as well. Strava defines retention as the % of athletes that were active at t0 (starting time) that are still active at tn (some amount of time later).
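As a sketch of that definition (the column names and the one-week activity windows are assumptions for illustration, not Strava’s exact implementation), retention for a cohort can be computed like this:

```python
import pandas as pd

def retention_rate(activity: pd.DataFrame, t0: pd.Timestamp, weeks_later: int) -> float:
    """activity: one row per athlete per active day, with columns athlete_id and activity_date."""
    window = pd.Timedelta(weeks=1)
    cohort = set(activity.loc[
        activity.activity_date.between(t0, t0 + window), "athlete_id"])
    later = t0 + weeks_later * window
    still_active = set(activity.loc[
        activity.activity_date.between(later, later + window), "athlete_id"])
    # Share of athletes active at t0 who are still active n weeks later
    return len(cohort & still_active) / len(cohort) if cohort else float("nan")
```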

Ideally, we want to optimize for the longest-term retention possible for athletes, but we also want to be able to move and learn quickly. Which tn retention timeframe we should pick was a question ripe for further investigation. We started by visualizing a number of different retention timeframes for both new and existing athletes:

Retention rate windows over time for new and existing athletes

We learned that shorter-term retention trended pretty closely with longer-term retention (noting that we need to account for seasonality), so we chose to track short-term retention as our output metric for both new and existing athletes, since it would allow us to learn more quickly.
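That kind of agreement check can be as simple as correlating the two retention series over time; a minimal sketch with hypothetical column names:

```python
import pandas as pd

def window_agreement(retention_by_week: pd.DataFrame) -> float:
    """retention_by_week: columns week, short_window_retention, long_window_retention."""
    # High correlation suggests the faster-moving short window is a reasonable proxy
    # for the long one (seasonality should still be considered separately).
    return retention_by_week.short_window_retention.corr(
        retention_by_week.long_window_retention)
```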

Now that we had settled our input and output metrics, we needed to set meaningful goals for them. Because the two mobile platforms (iOS and Android) were tracked separately, started from different baselines, and had different limitations, we decided to set a different goal for each. We were starting from pretty low baselines, but we knew we wanted more than half of athletes on each platform to experience close to industry-standard load times. We mapped our goals directly to our intent for our athletes and set out to improve the % of feed loads under a threshold from 5% to 66% on Android and from 3% to 75% on iOS.

Through incredible collaboration between server and mobile engineers, the % of feed loads under a threshold input metric fueled a healthy competition between platforms toward their goals. After six months of work consolidating, speeding up, and parallelizing processes throughout our app startup, the team surpassed our input metric goals and has maintained them two years later.

You might be wondering: but what about the output metric?! Through a series of experiments, we were able to establish a causal relationship between improvements in feed load time performance and our output metric of short-term retention. We attributed a 4% increase in retention to this work, which we projected to hundreds of thousands more athletes retained over the course of a year, plus an increase in subscription bookings from those retained athletes. Speeding up our load time made Strava more accessible to athletes in countries with poor connectivity and let other athletes reach meaningful content faster. Because of this, athletes are coming back to Strava even more to connect with their network and accomplish their goals. The video below shows the difference in athlete experience between where we started (left) and where we ended up after this effort (right):

Performance work, before (left) and after (right)
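As a back-of-envelope illustration of how a retention lift like the one above gets projected into athletes retained (every number below is a placeholder, not Strava data):

```python
# Hypothetical inputs purely for illustration
athletes_in_cohort = 10_000_000   # athletes active over the year (placeholder)
baseline_retention = 0.60         # baseline share retained over the period (placeholder)
relative_lift = 0.04              # the ~4% relative improvement attributed to the work

additional_retained = athletes_in_cohort * baseline_retention * relative_lift
print(f"~{additional_retained:,.0f} additional athletes retained")  # ~240,000 with these inputs
```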

While Strava worked on this performance initiative over a year ago, this problem space felt like a big inflection point for collaboration between analytics, product, and engineering, and set the tone for how we are choosing metrics and setting goals today.
