The Google PM Analytical Interview — The Most Common Mistakes

Julia Winn
16 min read · Aug 9, 2022



This post will take you through a sample response and common mistakes to an analytical interview question I used for years as a PM interviewer at Google (I left Google in April of 2022 for a role at Shopify). Candidates typically spent all 45 minutes on this question and were evaluated on multiple analytical skills.

This post assumes familiarity with:

  • Fermi estimation
  • Statistical significance

*Knowledge of both is not always expected for Associate Product Manager (APM) interviews, but I still recommend spending a little time familiarizing yourself with the basics of each.

The question

You are responsible for launching the offline downloads feature for the Netflix mobile app. What does it mean for this feature to be successful? What metric do you care about most to measure success and why?

If we had time, I also asked:

What magnitude of change should Netflix be targeting with this feature? (ex: +1%, 10X, etc.) Success here can be anything from “help more users accomplish a task” to “grow revenue,” etc. Explain your reasoning.

Why this question

I asked about Netflix because it’s a widely used service that most people know. The goal of an interview should never be to trip someone up by asking about an app they don’t know! Also, if someone had just been asked a question about offline downloads or Netflix by another interviewer, I could replace Netflix with another app and “offline downloads” with another feature. The important thing was picking a feature that adds value but probably wouldn’t make or break the business. For evaluation purposes, we don’t want the answer to be too obvious, like “success = more revenue.”

Interview Structure

There are many ways this question can be answered well. This post walks you through a good sample response along with common mistakes. I also explain what the interviewer assesses in each step to show what matters and what doesn’t.

Ideally, the full answer will touch on most of the following:

  1. [optional] Who is this for?
  2. How big is this group?
  3. What is the behavior change we want to see?
  4. How do we measure this change?
  5. How much change makes this investment worthwhile? Justify your answer.

For what it’s worth, most conversations do not follow such an organized outline, and this is not expected. But as the interviewer, I try to have the candidate weigh in on all of the above.

[OPTIONAL] Part 1: What is our goal and/or who is this for? (<5 minutes)

Identifying the high-level goal is a great first step in almost any PM interview question, not just the analytical ones. To be fair, thinking through specific use cases taps into traditional “product” skills more than “analytical” skills. However, I’ve found it helps most candidates think through the relevant use cases, which can set a foundation for defining behavior change. Still, plenty of successful answers skip this step altogether.

A common [good] answer is:

Offline downloads should help users with limited or no connectivity still use Netflix during these times.

Ideally, a candidate will spend a couple of minutes exploring the solution space. To simplify things, I told more recent candidates to assume we were looking at pre-COVID behavior.

Any subset of the following would be good use cases:

  • Travelers — users who experience no connectivity during situations like air travel, camping, or visiting other destinations with no internet.
  • Commuters — users who have low or no connectivity on public transit
  • Slow or unreliable wifi at home — this could include everything from poorly wired homes in the US to users in countries where connections are generally slow.
  • Pricey data — countries where users pay per GB of mobile data used

Common mistakes:

The good news is I don’t see many mistakes here! Just be careful not to spend too much time on this. If you aren’t sure, ask the interviewer if you should keep brainstorming, go deeper on some, or move on.

Part 2: How big is this group of target users? (5 minutes)

By this point, many candidates have already identified one use case as the most important. If they haven’t, I will ask them to pick one for the opportunity sizing. There are many good answers here, so just make sure there’s a reason behind the choice (ex: one group is a subset of another, the pain point is more acute for one group, etc.).

Let’s say the interviewer asked to focus on commuters. How many Netflix users will this be?

This is where Fermi estimation comes in! Explaining your thinking is more important than coming up with the perfect final numbers.

In a case like this, I usually give the following instructions:

  • Focus only on the US (or Europe if the candidate is there, or any sub-group)
  • Assume Netflix has 100 million users in this area
  • Don’t worry about the distinction between users and subscribers, just think about individual people logging in and using Netflix

Depending on how the candidate is thinking about the target user group definition, they may need to make estimates about:

  • What % of subscribers (or the US population) work (or some other proxy for how many commute)
  • How many of them can watch TV during the commute? (What transportation methods matter here? It’s probably safe to assume commuters using the train/bus matter more than drivers, cyclists, walkers, etc.)
  • How many users have a long enough commute to consider watching Netflix?
  • Bonus — how many are already able to stream Netflix?

There are lots of great resources on Fermi estimation techniques, so I won’t go into too much detail here, except to reiterate that simplifications are necessary so you don’t spend too much time on this step (ideally just a few minutes, tops). If you aren’t sure whether a specific simplification is okay, you can always check with your interviewer.

Let’s say someone’s Fermi estimation came up with:

  • 50% of subscribers have a daily commute, so ~20 days a month
  • 10% of these commuters use public transit
  • To consider watching Netflix on the commute, the one-way commute should be at least 20 minutes (you could argue for many different cutoffs here)
  • If the average commute is 60 minutes round trip, let’s estimate that 80% of these public transit commuters have 20-minute stretches

Remember, specific numbers don’t matter as much as a clear breakdown of your methodology!

Target users = [monthly active users] * 50% * 10% * 80% = 4% of 100 million, or 4 million Netflix users.

There’s your opportunity size!
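
If it helps to see the arithmetic in one place, here’s a minimal sketch of the same estimate in Python; the percentages are the illustrative assumptions above, not real Netflix data.

```python
# Fermi estimate of the commuter opportunity, using the illustrative
# assumptions above (not real Netflix data).
monthly_active_users = 100_000_000   # given in the interview setup

share_with_daily_commute = 0.50      # half of users commute ~20 days/month
share_on_public_transit = 0.10       # of commuters, those on bus/train
share_with_20_min_stretches = 0.80   # long enough to watch something

target_users = (
    monthly_active_users
    * share_with_daily_commute
    * share_on_public_transit
    * share_with_20_min_stretches
)
print(f"Estimated target users: {target_users:,.0f}")  # 4,000,000
```

Laying it out this way also makes it easy to see which assumption moves the final number most if the interviewer pushes back on one of them.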

Most common mistakes

  • Candidate is uncomfortable making an estimate with limited information
  • Coming up with a surprisingly large or small number and not going back to double-check the math (ex: if you somehow calculated that the offline downloads feature would add a trillion dollars to Netflix’s revenue and said this was reasonable)
  • Spending too much time on a part of the estimate that won’t really affect the final number (ex: how should we account for commute delays in expected watch time? How often do commute delays happen? Are strikes of transit workers more common in some countries? You only have 45 minutes, don’t go there.)

Part 3: What is the behavior change we want to see? (5–10 minutes)

A common answer is:

Candidate: Netflix cares most about subscribers to sustain the business, so the success metric for all features, including this one, should be increasing subscriber growth or decreasing churn.

This is a reasonable answer, but candidates who have worked at large companies with mature products already know that metrics like churn are almost impossible to move overnight with the launch of a single feature like this. Often the impact of a specific feature on churn can only be established after months or even years of data. But candidates shouldn’t be penalized for not knowing this! If someone suggests churn, I give them this additional context.

Interviewer: Let’s say that historically no individual feature changes had ever been shown to directly impact churn in the initial months post-launch. On mature products, it’s more likely that the impact from a cascade of changes will compound over a year or two. After some time has passed, you might find a relationship with churn for some users in a longitudinal analysis. Is there any other change you could look for right after launch that would indicate the impact was positive?

Some good answers

Depending on the group chosen earlier:

  • Significant time spent on offline viewing from target users (ex: commuters) — this is a simple approach, but what is the threshold for “significant”? Also, be prepared to justify why it’s still a good outcome if the extra offline viewing comes only from people in well-connected countries.
  • More viewing from users with bad connectivity (users with poor connections, low connectivity countries) — ex: this is critical to broadening Netflix’s reach in countries where it has more growth potential
  • Watching new times of day (commute) — should include an argument to convince me this is better than just offline viewing, ex: a new time of day indicates people are forming a new habit, which makes Netflix stickier. “Oh no, my commute would be awful without Bridgerton!”
  • Offline viewing results in a greater variety of content viewed per user — this approach also requires a way to measure content diversity. You also need a hypothesis as to why this matters (ex: if people spend more time re-watching the same episodes of The Office, is your feature a failure? Convince me!)

Less good answers:

  • More Netflix watching in general, online or offline — we do hope this is the long-term outcome, but it’s going to be very hard to know what extra viewing was from this feature or something else (say, organic growth or the release of a new hit TV show)
  • Lots of downloads — this could still be a huge number even if no one ever watched anything. This could also mean a very small percentage of users downloaded a lot of content, but ultimately the feature had low penetration.
  • More downloads at airports in preparation for flights — since we are starting with zero downloads, this will be positive no matter what.

Most common mistakes

  • Jumping into the metric details too quickly: To be fair, defining the larger behavior change isn’t required for a good response. However, candidates who define the behavior change early are more likely to pick a good metric on the first try. Candidates who jump straight into the number of downloads are more likely to get distracted by details like “do we care more about TV or movie downloads” and “what percentage of a program needs to be viewed for it to count?”
  • Not being able to refocus: “Well, if this isn’t directly impacting churn, then we should find something else to work on.” The challenge is, with mature products, most changes that directly impact the bottom line have already been made.

Part 4: How do we measure this behavior change? (5–15 minutes)

Here’s where you get specific with the exact definition of your metric(s). I ask candidates to focus on the first 30 days after the launch.

*I want to acknowledge that this section can have a lot of overlap with Part 3, and depending on how you answer part 3, you may already have covered this content. However, candidates who dive right into this section with an answer like “more monthly active users” without defining the broader behavior change are more likely to pick a less helpful metric. They usually realize this and change course to something better but not before they’ve already used up the first 10 minutes of the interview. Defining higher-level goals at the outset can help you avoid doing this.

The most common issue I see here is that candidates struggle to express a metric in measurable terms.

The book Superforecasting does a great job defining what it means for a prediction to be “measurable.” Without a precise definition of the behavior, you can’t measure it, which means you don’t have a metric. This article covers some highlights if you don’t have time to read (or listen!) to the book.

Common not-so-great metrics

People use the feature

  • This is a good start, but by this definition, if two users each downloaded one movie and never used the feature again, you can still say, “people used the feature.” Even saying “30-day active users of the feature should be high” is not specific enough without a definition of “use” and “high.”

30-day active users of Netflix will increase

  • See part 3; similar to “decrease churn,” the active user count is unlikely to be moved by a feature like this within the first 30 days.

Users watch at least half of the content they download

  • On the plus side, this is technically measurable! But that doesn’t automatically make it a great answer.
  • This might be a metric of interest, but it’s not the clearest signal of the value of a feature. If users downloaded a lot of content to watch later and only watched ⅓ of it, but they were still watching a lot more content overall, how is the feature a failure?

Users watch at least one movie offline or 2 episodes of a TV show

  • At least this is also technically measurable!
  • However, getting this specific isn’t helpful and makes things very complicated going forward. What if someone starts two movies and doesn’t finish either? Candidates who go in this direction usually spend way too much time thinking about specifics like “does it count if they finished ⅔ of the movie, or if they watched the first 10 minutes of a lot of TV shows?” If total viewing is what you care about, just go with some variation on watch time (see more below).

Better metrics:

Average offline viewing hours per user per month

  • Calculated by — [total hours viewed offline] / [monthly active users]
  • In the next section, you might say, this should be at least X minutes
  • This is pretty much guaranteed to go up, so this metric only becomes meaningful once you attach a specific number/goal for the increase in the next section.

X% of Netflix monthly active users watch at least Y hours of offline content per month

  • If you go this route, you would pull in your numbers from the opportunity sizing exercise and the extra watch time these users (ex: commuters) might have to devote to Netflix.
  • This one may also be a little easier to estimate, since you don’t need to know Netflix’s average monthly watch time before the feature launched.

Absolute or % increase in monthly total viewing hours contributed by offline viewing.

  • If you chose this one, be prepared to explain why more offline viewing but no increase in total time is a success or failure.

Contribution to “active days on Netflix” in a month

  • To use this metric, you might need to establish a threshold for an “active day,” ex: viewing content for at least 2 minutes.
  • To see the impact of offline viewing, you’d want to look at the share of days in which offline viewing was responsible for the additional active day.

For the sample answer, let’s go with X% of Netflix monthly active users watch at least Y hours of offline content per month.
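
If you’re curious what this metric looks like as a computation, here’s a minimal sketch over hypothetical per-user viewing totals; the users, hours, and the 2-hour threshold are all invented for illustration.

```python
# Hypothetical offline viewing totals for one month, in hours per user.
# In practice these would come from aggregated playback logs.
offline_hours_by_user = {
    "user_1": 0.0,  # never used the feature
    "user_2": 3.5,  # regular commuter
    "user_3": 0.5,  # tried it once
    "user_4": 6.0,  # frequent traveler
}

monthly_active_users = 4   # users active on Netflix this month
min_offline_hours = 2.0    # the "Y hours" threshold in the metric

qualifying = sum(
    1 for hours in offline_hours_by_user.values()
    if hours >= min_offline_hours
)
pct = 100 * qualifying / monthly_active_users
print(f"{pct:.0f}% of MAU watched >= {min_offline_hours:.0f} offline hours")  # 50%
```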

Part 5: How much of a change makes this investment worthwhile? (5–10 minutes)

This is often the part of analytical interviews candidates struggle with the most — picking a specific number that indicates this change was meaningful.

Continuing with the commuter sample answer, where we’ve calculated each commuter has ~20 hours a month (roughly 20 commuting days with a 60-minute round trip) that can now be spent watching Netflix, what is our threshold for meaningful use? (measured in the first month since launch)

There are usually two specific numbers I ask for in this question:

  1. What % of target users do we expect to use the feature during the first month?
  2. How much will they use it? (how many monthly offline watch hours?)

Part 1 — what % of target users will use the feature in the first month?

There are many possible good answers here. Just be sure to give some kind of justification. As an interviewer, I want to know that you are comfortable estimating numbers with limited data. At an actual company, you would have access to a lot more data, but it’s rare for a PM to have ALL the data they’d ever want, and you can still expect to use estimation periodically.

Remember from before, we calculated that 4 million of the 100 million Netflix users were potential target users.

Example estimates:

  • 50% — I’m going to estimate that about half of target users have longer commutes over 30 minutes. These will be most of our early adopters. The other 50% with shorter commutes or frequent transfers might prefer to check email or play games so they don’t accidentally miss their stop.
  • 10% — It’s going to take time for people to start thinking of Netflix on their commutes. I think it will take a while for everyone to remember to try it, even with in-app promotions, because it’s such a new behavior. I also don’t think everyone who commutes will want to watch TV on the commute; some people are expected to be working online during that time.

TIP: In practice, at larger companies, adoption of new features unrelated to the core use case tends to be smaller at launch (ex: less than 10%, sometimes a lot less). Candidates are often influenced by past product experiences, so I’ve seen a range of numbers here. Be sure to share your thought process so the interviewer can see what’s informing your calculations. This might look like:

  • At my previous company, we only considered a feature successful if we saw at least 20% uplift
  • On my current team, anything above 0.5% is seen as a win

Part 2 — how much use = success?

The most common not-so-great answers:

  • Any increase in offline watch hours
  • Any increase in total watch hours [from candidates who chose general watch hours]

Of course, we want watch hours (offline or regular) to increase! But we’re starting from zero offline hours, so even minimal usage will cause an increase. I suspect the request for real numbers might be uncommon outside of Google since candidates were often surprised when asked for a specific value (ex: 2% increase, 1 hour a month per user, etc.)

The goal of this part of the question is to pick a threshold for “meaningful use”, and make a case for this number. Again, there are many ways to come up with a good number. What matters is that you explain your reasoning and argue why this is the right way to think about the problem.

Sample answers and justifications

  • A minimum of 5 minutes a month — while this is a small number, as long as the value is statistically significant, we are increasing engagement, which is likely good for retention.
  • A minimum of 10 hours a month, spread out over at least 10 days of the month — if users aren’t using the feature at least twice a week, this isn’t enough to form a sticky habit that will prevent churn in the long run.
  • 2 hours a month — this averages out to about 30 minutes a week, essentially an extra TV episode. One new action a week could be enough to establish a new routine (ex: watching your show every Friday morning on your train ride).

Again, there are no right or wrong answers here. Just pick something and come up with an argument. It’s always okay to change your answer later if the interviewer gives you new information.

Common mistakes

  • Candidates use “minimum offline watch time” and “average watch time across all users” interchangeably. Both can work in this context, but be consistent and remember the implications of your choice (ex: averages can be skewed by a few huge numbers, while using a minimum means there may be lots of uncounted users just below your threshold). The sketch below shows how the two can diverge.
  • Not being able to come up with a specific number. Ex: “Isn’t any improvement good?” Unfortunately, coming up with estimates from limited data is often an important part of a PM’s role. If it helps, know that no one will hold you accountable for these made-up numbers!
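
To make the first mistake concrete, here’s a minimal sketch on invented data showing how an average can look healthy while almost no one clears a minimum threshold:

```python
# Invented monthly offline watch hours for ten users: one heavy user
# skews the average while most users barely touch the feature.
offline_hours = [0, 0, 0, 0, 0, 0, 0, 0, 1, 39]

average = sum(offline_hours) / len(offline_hours)
share_above_min = sum(1 for h in offline_hours if h >= 2) / len(offline_hours)

print(f"Average: {average:.1f} hours/user/month")           # 4.0 -- looks healthy
print(f"Share watching >= 2 hours: {share_above_min:.0%}")  # 10% -- one user!
```

A skewed distribution like this is exactly why the choice between the two framings changes the story your metric tells.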

Summary

So to summarize what an answer might look like, let’s pick a couple of the values from above.

Key success metric — % of Netflix users who watch at least 2 hours of offline content a month

At least 10% of target users (commuters, estimated to be 4 million monthly active users) watch at least 2 hours a month of offline content.

10% of 4 million users = 400K users will watch at least 2 hours a month of offline content.

Note: In retrospect, this looks too high to me; I would probably lower the minimum offline watch time from 2 hours to 30 or 60 minutes. However, in the context of an analytical interview, I think it’s a reasonable answer.
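
Putting the final numbers together, here’s a quick sanity check of the bar this answer sets; all inputs are the illustrative estimates from earlier in the post.

```python
# Sanity check of the final success bar, using the estimates above.
target_users = 4_000_000   # commuters, from the opportunity sizing
adoption_rate = 0.10       # share of target users active in month one
min_offline_hours = 2      # per qualifying user per month

qualifying_users = target_users * adoption_rate
floor_on_offline_hours = qualifying_users * min_offline_hours
print(f"{qualifying_users:,.0f} users watching >= {min_offline_hours} hours "
      f"= at least {floor_on_offline_hours:,.0f} offline hours/month")
# 400,000 users watching >= 2 hours = at least 800,000 offline hours/month
```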

If you were launching this as a real feature, you’d definitely have other metrics to track (ex: the search, download, and install funnel), but the goal of this exercise is for the candidate to pick the one metric that best represents the impact of the feature.

[if there is time] What if the numbers look nothing like your prediction? [3–5 minutes]

If there is time, once a candidate has given sample numbers, I come back with some very different numbers and ask what might be happening.

Candidate: 50% of users watch at least 2 hours per month

Interviewer: What if you saw 2% of users were watching an average of 30 minutes a month?

Again, there are lots of possible good answers here. I just want to see that you can revisit your assumptions and troubleshoot a surprise.

Good answers tend to include at least a couple of these:

  • Is there an issue with discoverability? How many users view a screen with this action button? How many users who download something go back to view it? Do we have any indication users are looking for downloaded content but cannot find it?
  • What percentage of content is downloadable? What content is most popular generally? How much of the popular content can be downloaded?
  • How much of the downloadable content is commute-appropriate? If many users already watch movies in spurts, then both movies and TV shows are relevant here. If most users watch content only from start to finish, then only shorter TV shows are suitable for most commute windows.
  • [related to discoverability, but a little broader] What is happening in our usage funnel? How many people initiate a download, start to view downloaded content, or stop viewing the content? What kinds of users are dropping off where? How much time passes between steps? (see the sketch after this list)
  • Are there any reliability or performance issues after downloading when they try to view?
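
For the funnel bullet above, here’s a minimal sketch of what a drop-off analysis might look like; the step names and counts are entirely hypothetical.

```python
# Hypothetical weekly counts at each step of the downloads funnel,
# to show where users drop off between discovery and offline viewing.
funnel = [
    ("saw the download button", 1_000_000),
    ("started a download", 120_000),
    ("completed a download", 100_000),
    ("opened the downloads tab", 40_000),
    ("watched >= 2 minutes offline", 25_000),
]

for (step, count), (next_step, next_count) in zip(funnel, funnel[1:]):
    print(f"{step} -> {next_step}: {next_count / count:.0%} convert")
```

A table like this quickly points to the weakest step (here, the invented drop from completing a download to opening the downloads tab would suggest a discoverability problem).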

You may also want to revisit previous assumptions and estimates (if everything mentioned above looks fine), such as:

  • Fewer users want to watch Netflix on their commute
  • The number of commuters may have been overestimated
  • Commuters may have been the wrong target group (ex: they can already watch streaming content and aren’t worried about data)

A not-so-great answer

  • Let’s send push notifications to everyone so they know about this feature!

Liberal use of push notifications might be fine at other companies but might raise eyebrows at Google. During my tenure, employees were very cautious about how push notifications and other outreach communications were used. Typically for something like this, a team would start with an in-app promo (assuming users were still regularly opening the app).

To conclude

A note on the structure

I used an outline (Part 1, 2, 3, etc.) to make it easier to browse this post and organize the categories of common mistakes. This structure is by no means an industry standard, and is not expected from candidates. However, defining goals/success early on in the interview is a best practice for all types of product management interviews.

My answer

The sample answer here is far from perfect, and I believe the best PM analytical questions have many great answers.

With an analytical question like this one, you could easily spend an entire day digging into the many nuances and considerations. There’s only so much you can cover in 45 minutes, so there are bound to be simplifications and forgotten edge cases. That’s okay. It’s much better to skim some of the details and reach a conclusion than to spend so much time on the details that you never reach one.

Further practice

Test yourself by thinking about success metrics with other combinations of launching X feature in Y app. Write out your response and some real numbers.

I also recommend checking out my post General Tips for PM Interviews. If you have any other questions about PM life and the PM job hunt feel free to message me on LinkedIn. Best of luck!


Julia Winn

AI + Ads PM at Shopify, ex-Google, former startup founder/CEO. Views are my own and not of my employer. https://www.linkedin.com/in/juliacwinn/