4 steps to better goals and metrics
Marty Weiner | Pinterest engineering manager, BlackOps
“Strategy without tactics is the slowest route to victory. Tactics without strategy is the noise before defeat.” — Sun Tzu
I’ve found over and over again that many organizations suffer from the same problem: goal setting. It’s not always clear how goals are set or how to set them. This is especially true of startups. I had to learn this process the hard way by making lots of mistakes and banging my head against walls. So you can spare your own head and a few innocent walls, I’m sharing my brain dump on formulating goals and metrics.
Goals are a combination of what you’re trying to accomplish in a defined amount of time and how you’re measuring progress against those accomplishments. They can be aspirational (goals your team hopes to achieve) and/or commitments (promises to others outside your team).
Goals are useful for several reasons. Here are two of my favorites:
- They’re great for picking your head up out of the trenches of engineering/product warfare to think hard about the direction you’re going and what you can reasonably achieve in a certain amount of time.
- Goals are an effective means of gaining alignment on where you’re headed while instilling a sense of urgency.
Everybody responsible for setting goals must know how goal lines are chosen (step 1) and the importance of hitting goals (step 2). In steps 3 and 4, I’ll discuss how to actually set a good goal. WARNING: Skipping past the first two steps will cause you headaches, loss of appetite, thinning of the hair and occasional nausea.
Step 1. Communicate how the goal line is chosen
Everybody has to know how the goal line is set in the company. The purpose is to ensure alignment about where on the football field we’re trying to run. Here are a few options to seed the discussion:
- Set goals that are incredibly big stretches
- Set goals you’re 70 percent likely to hit
- Set goals you’re likely to hit
- Set goals you’re easily going to hit
Each option has interesting ramifications, and you should choose the one that works best for your culture. At Pinterest, we have a culture of setting goals that we can likely hit 70 percent of the time, and we push really hard to hit them. If we miss our goal, we discuss how we could improve on our strategy and/or tactics or our ability to set goals.
If it’s not clear which style the whole org uses, communication will break down, likely in subtle ways (you should read “subtle” as a curse word in this context). For example, if Alice assumes option (2) is how the company operates and Bob is setting a goal under option (3), Alice could think Bob is a sandbagging *expletive*. If Alice and Bob aren’t aware they’re working from different basic assumptions, their communications will likely break down and they won’t even know it.
Step 2. Communicate the importance of hitting a goal
Your company needs a well communicated philosophy of how goal setting and meeting/not meeting goals is treated. A few options:
- You MUST hit your goal!
- Strive hard to meet your goal. Big kudos if you do. Discuss what could have been better if you don’t.
- Goals are just guidelines. No big deal.
Again, it’s important to choose which one best matches your company culture. Even more importantly, ensure the choice is well communicated. The impact of poor communication could be that Alice assumed (1) and Bob assumed (2), and the goal wasn’t met. The result could be that Alice is upset the goal wasn’t met, and Bob is confused because he feels that he did nothing wrong. Likely, Bob will be defensive, a sure sign effective communication has ceased.
So which do you choose?
Different companies and departments approach setting their goal lines differently. If you’re not sure which to choose, get the appropriate folks together, gather sentiment and choose one combination to start with.
After you’ve selected one “goal line” choice and one “importance of goals” choice, communicate it. Communicate it over and over again until it’s adopted in the DNA of the company. Communicate it at each goal meeting. Communicate it over beer. Then drink lots of beer so more communication happens.
Once you’ve had enough beer, it’s time to choose a goal.
Step 3. Choose a goal
Strive for a metric-driven goal, but not to the point of losing the human element. A great way to define goals is with OKRs. If you’re already familiar with OKRs, skip the next paragraph. If not, read on.
An OKR is an Objective and a Key Result. The objective is what you’re trying to achieve with this goal. It should be qualitative and inspirational (e.g. grow our user base to the moon). The key result is the metric you’ll be using to monitor your progress (e.g. grow to 10 million active users or version 1.3 shipped). A key result should be quantitative and specify a measurement window. Let’s improve the first example to “grow to 10 million monthly active users,” meaning we’re measuring ourselves against all active users in the last month. Here’s more about OKRs. Eat this stuff up. It’s good for you and low in sodium.
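To make the shape of an OKR concrete, here is a minimal sketch of one as a data structure. The class names, fields, and the current user count are hypothetical and chosen only for illustration. The point is that the objective stays qualitative while each key result carries a number and a measurement window.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KeyResult:
    description: str        # what is being measured
    target: float           # the quantitative goal line
    window_days: int        # measurement window, e.g. 30 for "monthly"
    current: float = 0.0    # latest measured value

    def progress(self) -> float:
        """Fraction of the target reached so far, capped at 1.0."""
        if self.target <= 0:
            return 0.0
        return min(self.current / self.target, 1.0)


@dataclass
class OKR:
    objective: str  # qualitative and inspirational
    key_results: List[KeyResult] = field(default_factory=list)


growth = OKR(
    objective="Grow our user base to the moon",
    key_results=[
        KeyResult(
            description="Monthly active users",
            target=10_000_000,
            window_days=30,
            current=7_500_000,  # made-up number for illustration
        )
    ],
)

for kr in growth.key_results:
    print(f"{kr.description}: {kr.progress():.0%} of target")
```

A key result without a `window_days` value would be the “10 million active users” version of the goal: a number with no statement of when or over what period it is counted.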
Furthermore, goals should cascade. At the highest level, you might have a goal around growth. Supporting that goal could be sub-goals other teams maintain for improving performance and reliability of site load time. Underlying that goal could be improving new server deployment speed. And so on.
With this in mind, let’s discuss choosing a solid metric. I hate acronyms, but I’m gonna use one anyhow. A metric should be Meaningful, Measurable, Operational, and Motivational, otherwise known as MMOM. (Why does “Operational” have to start with “O”? Ruined a potentially great acronym!)
Your metric needs to measure or contribute to your business objective in some fairly obvious way (or at least in a way everybody can agree on). Growing the number of active users to 10 million is a pretty good way to gauge your progress toward increasing the user base. On the other hand, using the number of episodes of Star Trek you can quote as a way to measure revenue is not so good.
The times you’ll struggle with meaningfulness will usually be when you have a metric that defines your objective pretty well, but not perfectly. For instance, does “number of times content is flagged” meaningfully measure bad experiences? Perhaps, but you’d prefer to know “number of times somebody has a bad experience” (which can be impossible to measure). You’ll have to make some tough trade-offs between metrics and constantly strive to improve.
You should be able to measure progress on a regular basis. For instance, if you want to improve growth, you could measure how many people visited your site in the last seven days, which is pretty easy to do with a simple map-reduce job.
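As a rough illustration of that seven-day measurement, here is a small in-memory sketch rather than an actual map-reduce job. It assumes visits are available as (visitor ID, timestamp) pairs. That record shape, the function name, and the sample data are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def visitors_last_n_days(
    visits: Iterable[Tuple[str, datetime]],
    now: datetime,
    days: int = 7,
) -> int:
    """Count distinct visitor IDs seen in the `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    return len({user_id for user_id, ts in visits if cutoff < ts <= now})


now = datetime(2024, 1, 15, 12, 0)
visits = [
    ("alice", datetime(2024, 1, 14, 9, 0)),
    ("alice", datetime(2024, 1, 15, 8, 0)),   # repeat visit, counted once
    ("bob", datetime(2024, 1, 10, 17, 30)),
    ("carol", datetime(2024, 1, 2, 11, 0)),   # older than seven days
]
print(visitors_last_n_days(visits, now))
```

At real traffic volumes the same logic would run as a batch job over logs, but the definition of the metric (distinct visitors inside a fixed window) stays the same.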
But sometimes measurability can be damn near impossible. For instance, how do you measure how much money a spammer is making from your site? No matter how many times I thought a metric would be impossible to measure, we’ve found a way. It may not have been perfect, but starting with something, anything, will help push progress forward. For the spam metric above, we started with measuring how much traffic we sent to spammers’ sites in the last month. This at least gives us some approximation of their revenue on which we can start operating.
A highly operational metric is one that your team can affect and see the effect quickly. You need to be able to move the needle on your metric. And the faster your metric responds to a change in the system, the faster you can iterate.
Measuring “how many people were on your site in the last 10 seconds” is very operational. You could change the color of the home screen and immediately see if it has an impact. You could change the color 20+ times in the next hour, if you want.
On the other end of the stick, you might measure how many people return to your site after 14 days. Iterating now becomes much slower because you may have to wait 14 days, but sometimes that’s a necessary tradeoff so that you measure what you actually want to measure (more meaningful to the company goals).
You could also consider having a few goals, one highly operational one for your team, and, separately, one less operational one that’s more meaningful to the rest of the company. They should be closely related.
Don’t forget motivation! We’re dealing with people here. If people aren’t motivated, getting up in the morning to conquer this metric kinda sucks. Sometimes metrics themselves are motivational, such as increasing growth or increasing the snack to person ratio. The amount you push a metric can also have a heavy influence on motivation (discussed next in Step 4).
An important but naturally unmotivational metric can usually be remedied by discussing the impact or linking it to something more interesting. For instance, increasing light bulb brightness by 0.00003 percent sounds boring. Instead, how about stating the impact will increase our revenue by $3 million? Wow!
Others are motivated by mastery of the field. Relating this metric to how they’ll be the engineers of the best light bulb in the world can be quite compelling to some.
Some are motivated by the challenge. But if it’s a damn near impossible goal, some will feel it’s worthless or out of touch. If the goal is too easy, you lose others.
And still others are motivated in other ways: parties, money, recognition, beer, donuts, bacon. In reality, people are motivated by several of these at once. You must deeply understand what drives your team, not just for setting goals, but also for being an effective inspirational leader. Consider learning more about human motivation, starting with watching this great talk by Dan Pink.
Step 4. Push the liiiine!
Once we know what we’re measuring and how we’re measuring it, the next step is to figure out how far you can push the metric on what timeline, and why.
One very common timeline to operate on is quarters. Set up monthly check-ins with higher ups to build trust with them and provide relevant updates.
First, what is the time range to which your metric holds you accountable?
Some metrics should cover the whole quarter, such as maintenance metrics (e.g., don’t lose ground on availability). Some metrics, especially covering areas of fast improvement, could be unmotivational if the window is too long. For instance, measuring the team’s performance before critical infrastructure has had a chance to be built doesn’t make a whole lot of sense. If you’re at 99 percent availability and want to push to 99.9 percent by the end of the quarter, you’re going to need to ship several key optimizations that may not be ready until halfway through. In this instance, perhaps it’s better to only measure against the last two weeks.
As a rule of thumb, I feel a metric should never cover a window shorter than two weeks. There’s generally way too much noise.
Second, how far can you move the metric?
This is where things get tougher. Sometimes a gut feel is a sufficient answer, but propping your answer up with data can give you a far better guess of where you’ll be by the end of the quarter and build trust with everybody else involved. Use whatever data you have available and gather new data to understand what leverage you have.
Progress Over Perfection
When choosing a goal for the first time, it can be very hard to discover one that meets all three M’s and one O. Sometimes there are too many metrics to choose from, sometimes there seem to be none. Just remember, you’re better off choosing a less than perfect goal to begin with rather than nothing at all. In some cases, you may find there isn’t a perfect metric or even that a metric isn’t appropriate, but give it a good hard try.
Here are two examples of this model applied to different situations: site performance and amount of spam.
Example 1 — Site performance
Let’s set a goal around site performance. Say you have a young site that’s never measured site performance. You first need a baseline, an awareness of what levers to use to push the metric in the right direction, as well as a goal.
First, choose your strategy for setting goal lines and tell everybody in the goal setting meeting. For example, set a goal line that you can hit with a 70 percent chance, and, if you don’t hit it, study why and get better at setting the goal. Also, let’s assume you’re setting a goal that you’d like to achieve by the end of the quarter that’s about to begin.
After doing some profiling to determine why the site is slow and sometimes not responding, you find that the databases are causing major availability and latency issues and that performing some key optimizations can improve latency and availability:
Objective: Improve customer facing site performance
Key Result 1: Increase availability measured over the last two weeks of the quarter from 98.5 percent to 99 percent
Key Result 2: Decrease 99.9th percentile latency from 200ms to 100ms measured over the last two weeks of the quarter
When discussing performance, it’s always a good idea to include an availability metric (a measure of how often you return an answer to your client without timing out or erroring) and a performance metric (a measure of how fast the site loads for a reasonable portion of your users). (By the way, if you think you understand latency, think again.)
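Here is a minimal sketch of how the two key results could be computed from raw request data. It assumes each request is recorded with a success flag and a latency in milliseconds. The nearest-rank percentile used here is one common definition picked for simplicity, not necessarily what your stats package reports, and the sample data is made up.

```python
import math
from typing import Sequence


def availability(outcomes: Sequence[bool]) -> float:
    """Share of requests answered without timing out or erroring."""
    if not outcomes:
        raise ValueError("no requests in the measurement window")
    return sum(outcomes) / len(outcomes)


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples in the measurement window")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up sample: 1,000 requests, 15 of which failed.
outcomes = [True] * 985 + [False] * 15
latencies_ms = [float(ms) for ms in range(1, 1001)]

print(f"availability: {availability(outcomes):.1%}")
print(f"p99.9 latency: {percentile(latencies_ms, 99.9):.0f} ms")
```

Restricting `outcomes` and `latencies_ms` to requests from the last two weeks of the quarter gives the numbers the key results are judged against.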
KR1 and KR2 combine to give a pretty great MMOM story. First, these meaningfully measure your customer’s experience. If you improve the metric, the (hopefully small) leap of faith you’re making is that user satisfaction will go up (you can measure that as another higher OKR).
These are easily measured. You could set up StatsD and get this data now. You could set up alerting to know when you might be risking violating your goal.
These metrics are measured over two weeks, so operationally the window is a little too long. But you can measure the most recent five minutes and one hour of availability and latency and report that to your engineers. That represents the wider goal pretty well, and the engineers know within a few minutes whether an optimization has an impact.
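The short-window view can be sketched as the same availability calculation applied to a sliding time window. The function name, the (timestamp, succeeded) record shape, and the sample requests below are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def window_availability(
    requests: Iterable[Tuple[datetime, bool]],
    now: datetime,
    window: timedelta,
) -> Optional[float]:
    """Availability over the `window` ending at `now`.

    Returns None when no requests fall inside the window.
    """
    recent = [ok for ts, ok in requests if now - window < ts <= now]
    if not recent:
        return None
    return sum(recent) / len(recent)


now = datetime(2024, 1, 15, 12, 0)
requests = [
    (now - timedelta(minutes=50), False),  # an error earlier in the hour
    (now - timedelta(minutes=30), True),
    (now - timedelta(minutes=4), True),
    (now - timedelta(minutes=1), True),
]

print(window_availability(requests, now, timedelta(minutes=5)))
print(window_availability(requests, now, timedelta(hours=1)))
```

The five-minute number reacts to a deploy almost immediately, while the one-hour number smooths out some of the noise. Neither replaces the two-week figure the goal is judged on.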
Finally, these metrics, in my opinion, are very motivational. Nothing gets my engineering gears fired up like making the site faster and more reliable! Plus, we’re talking about a fairly large jump. Also, you have until the last two weeks to push really hard on some of those projects so the majority of the team can focus.
You should probably designate an (implicit or explicit) goal around maintenance, as well as some resourcing to that end. It’d be bad to go from 98 percent last quarter to 93 percent for most of this quarter and then back up to 99 percent during the measurement window.
Example 2 — Amount of spam
Measuring the success of reducing spam is surprisingly difficult. This is because there’s a paradox afoot. If I knew something was spam, I’d get rid of it. I’d like to know how much spam is left, but how do I measure the stuff I don’t know about?
At Pinterest we went through several rounds of refining our spam metric, and we continue refining to this day. As mentioned earlier, we assume we’re shooting for a 70 percent likelihood of hitting a goal. One metric we used to use was the following:
Objective: Reduce negative experiences of Pinners due to spam
Key Result: Decrease pin reports by 30 percent this quarter over last
Pin reports are a count of the number of times somebody flagged a Pin on Pinterest as spam. This metric is super easy to measure — just count the number of reports that have come in with any standard stats package. Operationally, looking at a range of three months could be hard to react to quickly, but this metric is simply a count of all reports. We can keep a minute-by-minute graph that allows us to observe and react to attacks quickly and see if our rules and models are effective within minutes. Therefore, this metric is still very operational.
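The key result itself reduces to a percentage change between two quarterly counts. A minimal sketch follows, with hypothetical function names and made-up report counts.

```python
def percent_reduction(previous: int, current: int) -> float:
    """Percentage drop from `previous` to `current` (negative if it rose)."""
    if previous <= 0:
        raise ValueError("previous count must be positive")
    return (previous - current) * 100 / previous


def met_reduction_goal(previous: int, current: int, goal_pct: float = 30.0) -> bool:
    """True when the count fell by at least `goal_pct` percent."""
    return percent_reduction(previous, current) >= goal_pct


# Made-up report counts for illustration only.
last_quarter_reports = 12_000
this_quarter_reports = 8_100

print(f"{percent_reduction(last_quarter_reports, this_quarter_reports):.1f}% reduction")
print(met_reduction_goal(last_quarter_reports, this_quarter_reports))
```

The same two functions work on a partial quarter compared against the same number of days last quarter, which gives an early read on whether the goal is on track.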
This metric was very motivational. We saw occasional daily spikes which could push us in the wrong direction, and low level spam attacks that made the daily average higher. We tore into the data, and while we felt that a 30 percent reduction would be tough, we had a strong plan of attack. We could also try lots of different approaches and react quickly.
Meaningful is where this metric is interesting. We want to measure negative experiences as a result of spam. Sometimes Pin reports are false positives (e.g., the Pinner flagged something they didn’t like, but it wasn’t necessarily spam). Additionally, Pin reports don’t really tell us how bad of an experience somebody had. Sometimes people don’t report Pins because they don’t know that they can. And, ideally, it’d be nice to know how successful spammers are (though this isn’t explicitly called out in the objective). However, Pin reports did show us when big attacks were hitting us, and they correlated with helpdesk tickets Pinners were sending in during big attacks.
As we’ve gotten better at fighting spam, Pin reports are so low that they’re largely noise. In response, we’ve now swapped to measuring how many MAUs (monthly active users) click on spam each day. This metric is more meaningful and maps directly to our spam fighting strategy (but was a bit harder to measure and operationalize at first).
Thanks to Jimmy Sopko and Chris Walters for sharing this goal setting, head banging adventure with me!
Thanks to Philip Ogden-Fisher and Sriram Sankar for their substantial feedback and insights!
Marty is a manager on the BlackOps team.