Crash Course: Intro to Machine Learning for Product Managers

Ken Kehoe
Published in The Startup · Mar 16, 2020 · 16 min read

My first brush with machine learning came in 2013 while I was working at an app marketing start-up as a Product Analyst. The team was composed of subject matter experts for each of our biggest media partners: Facebook, Twitter, and Google Adwords. Facebook was the powerhouse, Twitter was the hot new network that everyone wanted to try, and Adwords was… well, Adwords. Google had acquired a mobile ad network called “AdMob” a few years earlier and it was my responsibility to determine how to leverage it for paid app installs.

I had my work cut out for me — internally it was referred to as “Badwords” due to limited, generic targeting options, ineffective lookalike modeling, and optimization algorithms that drove zero traffic. The keyword-based targeting that made Adwords a juggernaut on web just didn’t translate to app installs. In short, Adwords was a dog, and my job was a thankless endeavor for the better part of a year. Then one day I glanced at my Adwords Tableau dashboard and saw something — conversion rates had more than doubled, and CPMs had actually decreased. I started looking at trends client-by-client, wondering if an outlier was throwing off the numbers, but nope — they were consistent. I sounded the alert to the campaign management teams — “Adwords doesn’t suck anymore! buy, buy, buy!” Then I reached out to my account rep at Google and asked what had happened. His answer? Google had thrown a couple dozen machine learning engineers at the problem for a few months, and they’d optimized the hell out of their ad delivery.

That’s when I became intrigued with machine learning. It’s one thing to read about it, but it’s another to see it harnessed to deliver real results, even from a distance. I took some data science courses, got my hands dirty with Python and scikit-learn, and a few years later I found myself working alongside a group of world-class ML engineers as a Product Manager at TripAdvisor. Over the past 3 years I’ve learned how teams can leverage ML to create better consumer experiences and drive business outcomes.

As Product Managers we’re problem solvers first and foremost, so I’ve structured this as a hypothetical case study that addresses a series of realistic problem statements. Through the lens of the case study we’ll explore foundational concepts like supervised vs. unsupervised learning, precision / recall, classification, clustering, etc. The goal is to keep you (the PM) grounded in the practical application of an approach, rather than simply describing abstract concepts.

You’ll notice that most of the provided references are short articles and videos, rather than literature. This is intentional given the intended audience. Be aware that the footnotes of technical ML articles typically take the form of published literature, and as you gain experience you should get more familiar with literature review.

The PM’s role on a machine learning-driven project

When you first start working on a machine learning project, you may wonder how involved you should be in the technical execution of the roadmap. There’s no “right” answer to this and I’ve found that my role varies from project to project depending on the personalities and skill sets of the team I’m working with. That being said, I see the PM as being the steward of the what and the why (business case, problem statement, user story, etc.), while our peers in Engineering and Design should drive decision-making for the how (technical approach, UX design, etc.). If you have a strong team, delegation is key. With this in mind, here’s how a PM might fit into a typical ML workflow:

Step 1: Identifying the problem — Identify the business and product objectives and criteria for success.

  • PM Obligation: High. This is your key responsibility — make sure that the problem you’re trying to solve (problem statement), the reason you’re trying to solve it (business case) and the measures for success (KPIs) are crystal clear to the team.

Step 2: Gathering & cleaning the data — Identify the right data sets, verify their quality, and format / clean / combine as necessary.

  • PM Obligation: Medium. In order to identify the right data sets you’ll need to brainstorm predictive features for your target outcome. This is typically guided by domain knowledge, human insight, and common sense. PM and Eng benefit from partnering on this, but the engineer will do the heavy lifting.

Step 3: Feature engineering — Create necessary derived columns from the data and identify trends / outliers.

  • PM Obligation: Medium / Low. Related to the above, a model’s predictive accuracy can be improved by performing transformations on the data (deriving ratios, raising to a power, etc.). Think critically about whether these types of transformations make sense for your data. An experienced data scientist will ideally take the lead here, with your input.

Step 4: Building the data model — Select an appropriate model, train it on the sample data, and fine-tune for out-of-sample accuracy.

  • PM Obligation: Low. Model selection and optimization should be handled by the data scientist — ask questions if you’re curious.

Step 5: Testing and QA — Observe the model output / accuracy on out-of-sample data and refine as needed.

  • PM Obligation: High. As PM, you’re likely accountable for the success / failure of the project, so you must determine whether you can ship a model based on its behavior. Ask for a sample output sheet that demonstrates the model’s behavior in a variety of scenarios. If it isn’t doing what you want it to, sit down with the data scientist and highlight examples where the behavior isn’t acceptable, and ask them to explain how they will address these issues on the next pass.

Step 6: Launching & testing — Productionize the model and see if it actually works.

  • PM Obligation: High / Medium. Unless you have a counterpart in Analytics, it’s likely your responsibility to design a test plan, ensure proper tracking is in place, and analyze the results.

A few words of advice before we dive in:

Keep it simple!

Don’t let yourself become fixated on hip things like deep learning, neural networks, gradient boosted trees, etc. Odds are that something much simpler will work for what you’re trying to do (unless you’re trying to build something like a self-driving car). Keep it as simple as possible and don’t be surprised when your engineers do the same; many times less sexy approaches like logistic regression are the best option.

Machine learning isn’t always the right tool for the job

You need a lot of clean data to drive predictions. If you’re trying to optimize a page that gets little traffic or where the target behavior (click, purchase, etc.) is sparse, you’re going to have a tough time implementing a model-driven solution. In those cases, just A/B test some ideas and optimize.

SQL and data proficiency can help

There are many successful PMs who can’t write SQL queries but I’ve always found it immensely useful. In addition to enabling you to pull your own data, learning some basic SQL will help you understand data types and structures. Being able to conceptualize how data is transformed, organized, and stored will help you visualize the data pipelines that account for major components of the ML back-end. Learning SQL is a straightforward way to build this muscle.
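
As a low-stakes way to build that muscle, here’s a hedged sketch using Python’s built-in sqlite3 module — the `page_views` table, its columns, and the rows are all invented for illustration, but the query is the kind of pull a PM might run to check conversion rates by category:

```python
import sqlite3

# In-memory demo database with a hypothetical page_views table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page_views (user_id INT, category TEXT, purchased INT);
INSERT INTO page_views VALUES
  (1, 'boots', 1), (2, 'boots', 0), (3, 'boots', 0),
  (4, 'sneakers', 1), (5, 'sneakers', 1);
""")

# Conversion rate per category: AVG over a 0/1 column is the conversion rate.
rows = conn.execute("""
SELECT category,
       ROUND(AVG(purchased), 2) AS conversion_rate,
       COUNT(*) AS visits
FROM page_views
GROUP BY category
ORDER BY conversion_rate DESC
""").fetchall()

for category, rate, visits in rows:
    print(category, rate, visits)
```

Even this toy query exercises the core ideas — grouping, aggregation, and data types — that carry over to real data pipelines.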

Stats 101 is a good starting point

Probability and statistics form the foundation of many machine learning techniques and they’re also critical to running and interpreting A/B tests, so they’re required learning for most PMs. Being conversant on topics like the central limit theorem and causal inference will go a long way towards helping you talk shop with the engineers.

Resources: There are a ton of free intro to stats classes out there. Carnegie Mellon’s Probability & Statistics course is a great place to start.

Case study: Slingin’ boots and makin’ loot

So let’s say you’ve got an eye for trendy shoes and you think you can make some money by curating a few cool suppliers and setting up a little drop-ship marketplace. You build a website featuring a few shoes and your sales start to take off, so you expand your offering to a few thousand products in several categories — awesome! But your site is still relatively simple, featuring a static homepage, navigation bar for different product categories, list pages for each category, and detail pages for each product.

You’ve done some A/B testing to optimize CTR and conversion rates, but you’re hitting diminishing returns. You want to start implementing more sophisticated optimization and personalization techniques, but you’re not sure where to start. Let’s take a look at a few approaches…

Optimizing product detail pages

Problem statement: You’re getting a lot of traffic to your product detail pages through SEO and SEM, but you’re not converting enough traffic into purchasers.

Cross-sells

You might hypothesize that a high volume of users are visiting your site, deciding that a specific pair of shoes isn’t right for them, then bouncing back to Google — frustrating! They’re in the market for a new pair of shoes and you’re losing that customer. If only you could offer that user some more options they might be interested in, you could get them browsing and you’d have a better shot at converting them to a paying customer.

What you need here is a recommender that can intelligently suggest related items in order to cross-sell the user. One well-known way to do this is an approach called collaborative filtering. Netflix, Spotify, Amazon, and Tripadvisor all use collaborative filtering to drive some of their recommendations.

Collaborative filtering comes in a few varieties, but one method that’s worked well in my experience is item-based collaborative filtering, which is essentially a “people who liked this also liked that” recommender. This requires a data set of behaviors that overlap multiple products for the same users (e.g., views, saves, purchases). The higher-intent the behavior, the more trustworthy the recommendation — e.g., we’d expect a recommendation based on purchases to be more effective than one based on page views. The trade-off here is sparsity; if you don’t have enough purchase data, you won’t have great coverage. If you have a behavior that’s higher-intent than page views but higher-volume than purchases (such as a “save to shopping list” feature), that can be a good compromise.
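
To make the mechanics concrete, here’s a minimal, hedged sketch of item-based collaborative filtering in plain Python; the products and purchase baskets are invented for illustration, and a real system would use sparse matrices over millions of users:

```python
from math import sqrt

# Hypothetical purchase history: user -> set of purchased product IDs.
purchases = {
    "u1": {"chukka", "rain_boot"},
    "u2": {"chukka", "rain_boot", "wool_sock"},
    "u3": {"chukka", "rain_boot"},
    "u4": {"sandal"},
}

def item_vector(item):
    """Binary vector over users: 1 if the user bought the item."""
    return [1 if item in basket else 0 for basket in purchases.values()]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend(item, k=2):
    """'People who bought this also bought': top-k most similar items."""
    items = {i for basket in purchases.values() for i in basket} - {item}
    ranked = sorted(items,
                    key=lambda other: cosine(item_vector(item), item_vector(other)),
                    reverse=True)
    return ranked[:k]

print(recommend("chukka"))
```

Swapping the purchase sets for page-view or save data changes the intent level of the signal without changing the algorithm.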

Personalized product detail page treatments

Let’s say you estimate that 5% of users who visit the page have purchase intent, but your static page (“Treatment A”) only yields a 4% conversion rate. You hypothesize that by showing a different treatment (“Treatment B” — fewer cross-sells, bigger “Add to cart” button, etc.) to users who are highly likely to purchase the product, you can capture that extra 1% of purchases. One approach would be to develop a classifier that can identify users who are likely to buy.

There are a wide variety of approaches to classification, but any binary classification model can be evaluated based on precision and recall:

  • Precision is a measure of how accurate your positive predictions are, defined as (true positives) / (true positives + false positives)
  • Recall is a measure of how many of the actual positives you’re capturing, defined as (true positives) / (true positives + false negatives)
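
These two definitions take only a few lines of Python; the 20-visitor sample below mirrors the hypothetical 5% purchase-intent scenario:

```python
def precision_recall(y_true, y_pred):
    """y_true / y_pred are 0/1 labels (1 = purchaser / predicted purchaser)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else None  # undefined with no positive predictions
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall

# 20 visitors, 1 true purchaser (5% intent).
y_true = [1] + [0] * 19
# "Show Treatment B to everyone" predicts purchaser for all:
print(precision_recall(y_true, [1] * 20))  # (0.05, 1.0)
# "Show Treatment A to everyone" makes no positive predictions:
print(precision_recall(y_true, [0] * 20))  # (None, 0.0)
```
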

In our example, if you simply showed Treatment B to all users who visited the page, your recall would be 100% and your precision would be 5%. If you showed Treatment A to everyone instead, your recall would be 0% and your precision would be undefined, since you’d be making no positive predictions at all (your accuracy, on the other hand, would be 95%).

If you plot the model’s true positive rate on the y-axis against the false positive rate on the x-axis, you get what’s known as an ROC curve. The data scientist’s job is to maximize the area under the curve (AUC), but it’s the PM’s responsibility to make sure the data scientist understands the tradeoffs between false positives and false negatives.
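
Here’s a hedged sketch of how those ROC points are produced: sweep a threshold over the model’s scores, compute the (FPR, TPR) pair at each cut, and approximate the area under the curve with the trapezoid rule. The scores and labels below are made up:

```python
def roc_points(scores, labels):
    """Sweep thresholds over model scores; return (fpr, tpr) pairs."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thresh in sorted(set(scores), reverse=True):
        preds = [s >= thresh for s in scores]
        tp = sum(1 for p, l in zip(preds, labels) if p and l)
        fp = sum(1 for p, l in zip(preds, labels) if p and not l)
        points.append((fp / neg, tp / pos))
    return points

# Hypothetical purchase-intent scores for 6 visitors (label 1 = purchased).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)

# Area under the curve via the trapezoid rule.
auc = sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
print(round(auc, 3))  # 0.889
```

An AUC of 0.5 corresponds to random guessing; 1.0 is a perfect ranking of purchasers above non-purchasers.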

Let’s do a quick thought experiment to demonstrate these concepts: imagine that your website consists of a single niche shoe and it’s the only product you sell. In this scenario, the 95% of users who visit the page without purchase intent for the shoe don’t generate any revenue, so you don’t care about precision — you only care about maximizing recall. If that’s the case, you don’t need a predictive model at all, because you stand to gain nothing by showing a different treatment to those 95% of users — you can simply show Treatment B to everyone and you’ll capture 100% of the users you’re after without losing out on any revenue.

Now let’s say you’re further along and your site has evolved into a robust e-commerce platform that sells thousands of shoes and related products (socks, bags, belts, etc.). Suppose the 95% of “non-purchasing” traffic on any given shoe page goes on to generate some revenue after engaging with the cross-sells we described earlier, averaging about $10 per user on related products. If your 5% cohort of shoe-buyers generates $25 per user, then you have something to gain from teasing these cohorts apart.

Depending on the shape of the curve, there will be a point that maximizes net revenue. You and your data scientist can identify this point together using these inputs.
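
As a sketch of how you might find that point together, the snippet below scores a few hypothetical (TPR, FPR) operating points. The $5 expected gain per correctly-targeted buyer and $3 cross-sell loss per misclassified browser are illustrative assumptions, not figures from the example above:

```python
# Assumed scenario: 10,000 visits, 5% purchase intent.
BUYERS, BROWSERS = 500, 9_500
GAIN_PER_BUYER, LOSS_PER_BROWSER = 5.0, 3.0  # illustrative dollar values

def incremental_revenue(tpr, fpr):
    """Net revenue vs. the baseline of showing Treatment A to everyone."""
    return tpr * BUYERS * GAIN_PER_BUYER - fpr * BROWSERS * LOSS_PER_BROWSER

# A few hypothetical (TPR, FPR) points along the classifier's ROC curve:
curve = [(0.0, 0.0), (0.5, 0.02), (0.8, 0.10), (0.95, 0.30), (1.0, 1.0)]
best = max(curve, key=lambda point: incremental_revenue(*point))
print(best)
```

Under these assumptions an interior point wins — neither “show B to no one” nor “show B to everyone” maximizes net revenue, which is exactly why the operating point is a business decision, not just a modeling one.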

Optimizing list pages

Problem statement: So now your detail pages are converting more traffic, but suppose you’re getting even more traffic to your list pages and you’ve got a 50%+ bounce rate on those pages.

Ranking items on list pages

You might hypothesize that your lists aren’t putting the most relevant results at the top. Maybe you’re using a basic heuristic like page views or revenue for ranking, but you’re worried that this becomes a self-fulfilling prophecy because the items at the top are getting a huge boost to traffic (“position bias”).

This is a ranking problem and can be addressed by learning-to-rank (LTR) techniques. Rather than making a prediction for a single item (like classification or regression), an LTR approach aims to establish a relative ordering for a list of items. Popular LTR approaches include pointwise, pairwise, and listwise methods; I’ve seen pairwise models work well in practice. A pairwise LTR model examines pairs of items and uses classification or regression to determine the optimal order of each pair, then orders the list such that the number of “inversions” (instances where two items are out of order according to their pairwise prediction) is minimized.

These models are commonly trained on click data. To account for position bias, clicks that occur lower on the page are effectively “worth” more than clicks that occur higher on the page: a user who scrolled past several higher-ranked items to click a lower one is sending a stronger relevance signal.
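
To make the inversion-minimizing idea concrete, here’s a toy sketch in plain Python. The pairwise preference scores stand in for a trained pairwise model’s predictions (the items and scores are invented), and the ranking is built by counting pairwise “wins”:

```python
from itertools import combinations

# prefer[(a, b)] = model's confidence that item a should rank above item b.
prefer = {
    ("chukka", "sandal"): 0.9,
    ("chukka", "loafer"): 0.7,
    ("loafer", "sandal"): 0.8,
}

def pref(a, b):
    """Preference score for a over b, inferring the reverse pair if needed."""
    return prefer.get((a, b), 1 - prefer.get((b, a), 0.5))

def wins(item, items):
    """How many pairwise matchups does this item win?"""
    return sum(pref(item, other) > 0.5 for other in items if other != item)

def count_inversions(ranking):
    """Pairs listed out of order relative to the pairwise predictions."""
    return sum(pref(ranking[i], ranking[j]) < 0.5
               for i, j in combinations(range(len(ranking)), 2))

items = ["sandal", "loafer", "chukka"]
ranking = sorted(items, key=lambda i: wins(i, items), reverse=True)
print(ranking, count_inversions(ranking))
```

Production LTR systems optimize this objective with gradient-based methods rather than exhaustive pair counting, but the target — an ordering consistent with the pairwise predictions — is the same.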

Better tagging and filters

Another hypothesis for the high bounce rate could be that users aren’t finding intuitive ways to filter the result set. Maybe someone is looking for a waterproof chukka, but you don’t have a “waterproof” filter on your page. This is another classification problem.

In order to create the “waterproof” tag you’ll need to establish a truth set for your classifier. One option could be to prompt users who purchased the product to answer whether it’s waterproof or not. Another could be to take incomplete supplier data and use it to expand tag coverage. You may aggregate these truth sets depending on how trustworthy you believe them to be.

Once you have a truth set you can use a number of different classification techniques to extend the tag to other products, but you’ll need to think critically about what features are likely to be predictive of the tag in question.

For example, waterproof shoes likely have a higher incidence of the words “water,” “rain,” “snow,” and “water-proof” in their review content. Your data scientist can use NLP techniques to engineer this feature as an input to the classifier. It’s also possible that certain brands are much more likely to have waterproof boots than others, so manufacturer might be a predictive feature for the classifier. Your classifier will only be effective if you find the right predictive features, so jump in and help brainstorm these.
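
For instance, here’s a hedged sketch of two such hand-engineered features, a water-term incidence rate over review text and a brand prior. The brand names and reviews are invented, and a real pipeline would use proper tokenization and many more features:

```python
# Terms whose presence in reviews may signal a waterproof product.
WATER_TERMS = {"water", "rain", "snow", "waterproof", "water-proof"}

def water_term_rate(reviews):
    """Share of reviews mentioning at least one water-related term."""
    hits = sum(any(t in r.lower().split() for t in WATER_TERMS) for r in reviews)
    return hits / len(reviews)

def features(product):
    return {
        "water_term_rate": water_term_rate(product["reviews"]),
        # Assumed set of outdoor-focused brands (a stand-in for a brand prior):
        "brand_is_outdoor": int(product["brand"] in {"TrailCo", "PeakGear"}),
    }

product = {
    "brand": "TrailCo",
    "reviews": ["Kept my feet dry in heavy rain",
                "Great for snow hikes",
                "Stylish"],
}
print(features(product))
```

These feature values would then be fed, alongside the truth set, into whatever classifier your data scientist selects.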

Optimizing product imagery

Problem statement: So you’ve made big improvements to your list and detail pages, but now you’re realizing that the thumbnail images for many products just aren’t that great and you notice there are often better photos buried in the photo albums. You might hypothesize that if you could choose the right photo for each product, then you’d improve CTR on list pages and conversion rates on detail pages.

This is a machine vision problem, and until the last few years the solutions weren’t straightforward. Thanks to open-source pretrained neural networks like ResNet-50, however, image classification is now relatively accessible and can be used to create rich feature representations for images at scale. Features might include things like “contains human T/F,” “high-resolution image T/F,” “outdoor vs. indoor,” etc.

Like any supervised learning approach, machine vision requires a truth set, and there are a handful of options to consider. One is Amazon Mechanical Turk, where you can hire human workers to label pairs of images by preference. You may need to spin up a web UI for the “turkers” to make their pairwise preference selections. You can then train a model on the labeling data to learn which features tend to be predictive of image preference and run that model on all of your site images. This raises the question of how to segment the labeling, since the features that are predictive of favorable photos for men’s boots might not be the same as those for, say, women’s high heels. You might need to execute two completely separate labeling exercises for these two product categories. Use your best judgment when making these segmentation decisions.

Once you train your model(s) you can run them on all of your site images to produce static quality scores and use these to select thumbnail photos and sort photo albums. I’ve seen strong results using this method in the past.

One of the challenges of this approach is that it introduces some bias into your training data; if the turker really likes outdoor action shots, then that’s what your model will favor.

You could take this approach a step further and implement something like a multi-armed bandit that trains directly on your target outcome, rather than on human-sourced labels. In our example, we’re looking to maximize clicks from the list page to the detail page, so a click is the target outcome. A multi-armed bandit functions like an automated A/B test, where different users are shown different photos and the highest-performing variant is automatically allocated more impressions.
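
Here’s a hedged sketch of one of the simplest bandit policies, epsilon-greedy: mostly show the photo with the best observed CTR, but show a random photo a small fraction of the time so every variant keeps getting evaluated. The per-photo click probabilities below are simulated assumptions, not real data:

```python
import random

def epsilon_greedy(click_probs, n_impressions=50_000, epsilon=0.1, seed=7):
    """Each 'arm' is a candidate thumbnail photo; a reward is a click."""
    rng = random.Random(seed)
    clicks = [0] * len(click_probs)
    shows = [0] * len(click_probs)
    for _ in range(n_impressions):
        if rng.random() < epsilon:
            arm = rng.randrange(len(click_probs))       # explore: random photo
        else:
            rates = [c / s if s else 0.0 for c, s in zip(clicks, shows)]
            arm = rates.index(max(rates))               # exploit: best CTR so far
        shows[arm] += 1
        clicks[arm] += rng.random() < click_probs[arm]  # simulated click
    return shows

# The middle photo has the highest true CTR, so over time it should
# absorb most of the impressions.
shows = epsilon_greedy([0.02, 0.05, 0.03])
print(shows)
```

Production bandits typically use more sample-efficient policies (e.g., Thompson sampling), but the explore/exploit trade-off is the same.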

The multi-armed bandit approach requires a much higher level of investment in order to select the appropriate variants, track the target outcome and automatically allocate more impressions. If your site frequently receives updated images from users or suppliers, then you might consider setting up an ongoing bandit approach to make sure new photos have a chance to be evaluated and selected. Due to the complexity of such projects, it’s wise to run low-cost A/B tests to validate the opportunity of improving your site imagery before building a bandit-based machine vision solution.

Clustering & cohort analysis

Problem statement: So now you’ve grown your site to millions of users and you want to craft CRM and product strategies targeted to specific cohorts, but you don’t know how to group users together. Maybe you’ve done some qualitative research and attitudinal profiling to develop user segments, but you want to confirm these with real site data. Unfortunately, usage patterns are so varied that they defy easy categorization through descriptive statistics.

Machine learning can aid in cohort discovery and analysis through exploratory analytics that leverage unsupervised learning methods (as opposed to the scalable supervised approaches described above). Whereas supervised learning uses a “truth set” to train a model to make predictions, unsupervised learning is generally used to discover patterns in data that aren’t easily visible.

For this example you might try clustering your users to “discover” the segments of your population via the behaviors and attributes you feel are important. For example, you might develop a feature set that includes things like “visit frequency,” “number of purchases,” “logged-in T/F,” “visited domain-direct T/F” etc. Once these features are in place you can use a variety of clustering techniques to tease out cohorts.
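
As a sketch of one common technique, here’s a tiny k-means implementation run over a handful of made-up, roughly normalized user feature vectors (visit frequency, number of purchases). A real project would use a library implementation, far more users, and more features:

```python
def kmeans(points, k=2, iters=10):
    """Toy k-means: alternate assigning points to the nearest center
    and moving each center to the mean of its assigned points."""
    centers = [points[0], points[-1]]  # simple deterministic init for this sketch
    groups = []
    for _ in range(iters):
        # Assignment step: each user joins the nearest center's cohort.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its cohort.
        for i, g in enumerate(groups):
            if g:
                centers[i] = tuple(sum(dim) / len(g) for dim in zip(*g))
    return groups

users = [(0.1, 0.0), (0.2, 0.1), (0.15, 0.0),   # casual visitors
         (0.9, 0.8), (0.85, 0.9), (0.95, 0.7)]  # loyal repeat purchasers
groups = kmeans(users)
print([len(g) for g in groups])
```

The two recovered cohorts map cleanly onto “casual browsers” and “loyal purchasers” here because the toy data is well separated; real usage data is messier, which is why choosing interpretable features matters so much.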

It’s important to keep in mind that these cohorts are only valuable if they’re actionable — if you can’t clearly conceive of how you would tailor messaging strategies or product offerings to the different cohorts, then they’re essentially useless. Keep this in mind when developing the feature set and partner closely with your data scientist to ensure the results are practical.

And much more!

We’ve only scratched the surface of potential product applications for machine learning in this article. There’s an entire domain of Natural Language Processing that we barely touched upon (for example, how might you build a search function on your site?), and any of the topics we covered can be explored in much greater depth.

Take this as a starting point for understanding how machine learning can help accomplish your product goals and how you as a PM can fit into machine learning workflows. Remember, the best way to learn is through experience, so work with your team to test some approaches. Not every test is a winner, but with time and experience you can gain a sense of when to tackle a problem via machine learning.


Product @Klaviyo, formerly Product @Tripadvisor