Data Science For Product Managers (DS4PM)
Success of a data science project has as much to do with product management as with data science. I see this in my work. It comes up in conversations with other practitioners. Yet most articles and talks about data science leave the product management topics out.
First, Kara spoke to a panel of data scientists (Zahra Ferdowsi of Snapchat, Noelle Sio of Pivotal Labs, and Rachel Wang of TrueCar) and Diana Villanueva, a product manager who works with data scientists at Pivotal Labs. Then I gave my talk, Data Science for Product Managers.
This post is a mix of ideas from my talk and questions that came up during the panel and the conversation that followed.
The audience was product managers (mostly for mobile products) whose companies are not primarily data science/machine learning/AI companies, but who see opportunities for improving their products using data.
A lot of attention is on ‘AI-first’ companies, where data science IS THE product, PhDs are abundant, challenges are huge, and the promises are to remake, disrupt, and revolutionize some area of human activity.
Some of these promises will come true, some will flop, but in the meantime, data science will (much more quietly) make its way into thousands of products and make them slightly less dumb.
For this to happen, data science must graduate from “dark arts” to just another tool and discipline.
Product managers have a mental model and a shared language when it comes to various types of projects, such as developing a mobile app, updating look and feel of a website, revamping on-boarding experience, scaling infrastructure, etc. This is not yet true for data science projects.
It is useful to talk separately about two kinds of data science: A (for “analyze”) and B (for “build”).
“A” helps to inform the process of product development with data. Product managers adopted this practice widely and developed a structured way to think about it and a shared language to talk about it. Instrumenting products for analytics, running A/B tests, analyzing churn, doing cohort and funnel analysis became part of the product management culture.
“B” is about building smarts into the product itself. This is the kind of project where product managers could use more of a playbook. There is no playbook for data science, but at least a mental model would be welcome: how do you start a project, what are the steps, when are you done, how long should it reasonably take to achieve something, what is reasonable to expect, what dangers to look for, and how to explain the project and set the right expectations with the CEO, partners, and customers.
A framework for making products incrementally smarter is finding the ‘Don’t you know me?’ moments of user frustration and converting them to ‘You get me!’ moments. Smart products get smarter with every interaction. Even if a product doesn’t know the user’s exact next step, it should reduce the search space.
Implementing some of these smarts requires machine learning, and it is important for product managers to understand the level of complexity of the techniques that apply to their products. However, when machine learning is discussed, too much emphasis is put on the algorithms. Just as a product manager does not have to be a developer to manage a software product, she does not have to be a mathematician or a data scientist to manage a data product, but she does need to understand some core concepts.
When products get smarter, they “go probabilistic”: their behavior changes based on the context and on what they think the user needs in that context. This creates new challenges. Thinking through how a smart product gains humans’ trust and makes them feel good about using it is exactly the product manager’s job.
To make the issues more concrete, I used two data products to demonstrate a few concepts that keep coming up. One project used common food pairings to make meal logging easier in the Jawbone UP app, and was led by Emi Nomura. The other was Predictive Routing at Directly, a system that chooses which customer support tickets can be resolved successfully by expert users.
Low Hanging Fruits
Product managers are used to asking which goals of a project can be achieved with relatively simple techniques, the “low hanging fruit”. Which fruits hang low in data science? Monica Rogati’s quote “My favorite data science is division” is a great example of a mental model: try a simple division-like algorithm first before going for a much more advanced one. Simple can get you pretty far.
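To show what a "division-like" first attempt might look like, here is a minimal sketch of food-pairing suggestions computed as one count divided by another. It is not the method used in the Jawbone UP project; the meal log, the item names, and the helpers `pairing_score` and `suggest` are all made up for illustration.

```python
from collections import Counter
from itertools import permutations

# Hypothetical meal log: each meal is the set of items a user logged together.
meals = [
    {"coffee", "bagel"},
    {"coffee", "bagel", "banana"},
    {"coffee", "banana"},
    {"eggs", "toast"},
    {"eggs", "toast", "coffee"},
]

item_counts = Counter()   # number of meals containing each item
pair_counts = Counter()   # number of meals containing each ordered pair of items
for meal in meals:
    item_counts.update(meal)
    pair_counts.update(permutations(sorted(meal), 2))


def pairing_score(first, second):
    """Share of meals containing `first` that also contain `second`."""
    if item_counts[first] == 0:
        return 0.0
    return pair_counts[(first, second)] / item_counts[first]


def suggest(first, top_n=2):
    """Rank other items by pairing score (ties broken alphabetically)."""
    candidates = [
        (pairing_score(first, other), other)
        for other in item_counts
        if other != first
    ]
    ranked = sorted(candidates, key=lambda c: (-c[0], c[1]))
    return [name for score, name in ranked[:top_n] if score > 0]


print(suggest("eggs"))    # ['toast', 'coffee']
print(suggest("coffee"))  # ['bagel', 'banana']
```

A baseline like this takes minutes to build and gives the team something concrete to compare any fancier model against.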
Returns you get from cleaning your data can beat those you get from selecting a better algorithm.
The plot of the movie Big Short can be summarized as “guys clean a dataset, get rich”.
Not paying attention to where the data you are using have been leads to cascading problems that can sink the whole project. Cleaning your data requires a good understanding of the domain you are working with. Product managers and data scientists must collaborate on this early in a project. Which properties of your data you do and don’t use is to a significant degree a product management decision.
Evaluating Data Products
When “normal software” breaks, it breaks with high visibility. An issue with machine learning is that it will always give you an answer — as long as some numbers go in, some numbers will come out.
How do we know if a model is good? How do we compare models? What metrics do we use?
An obvious first choice is accuracy — percentage of decisions that the algorithm gets right. However, depending on your data, this can be a very bad metric.
This depends on how balanced or unbalanced the classes that you are predicting are.
Fraud detection is a classic example. If 0.1% of transactions are fraudulent, you can create a predictive model with near-perfect accuracy of 99.9% (and zero utility) without much work. When asked “Is this transaction fraudulent?”, it will always say “no”.
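The arithmetic behind this example can be checked in a few lines. The sketch below uses a made-up dataset of 1,000 transactions with exactly one fraudulent case (0.1%) and a "model" that always answers "no"; the `evaluate` helper is hypothetical. Recall, the share of actual fraud cases that were caught, is shown next to accuracy because it exposes the problem accuracy hides.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, precision, and recall for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate, passed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical data: 1,000 transactions, one of them fraudulent (0.1%).
y_true = [1] + [0] * 999

# The "always say no" model from the text.
always_no = [0] * len(y_true)

baseline = evaluate(y_true, always_no)
print(baseline)  # accuracy is 0.999, recall is 0.0
```

The model is right 999 times out of 1,000 and still catches none of the fraud.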
Deciding which metrics to use is, yet again, the product manager’s job. The choice depends not only on the properties of the dataset, but also on the goal of the project, e.g., the relative cost of false positive vs. false negative errors.
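One way to make the cost trade-off concrete is to attach a price to each kind of error and compare models by total cost instead of by error count. This is only a sketch: the two models, their error counts, and the dollar figures are invented, and `total_error_cost` is a hypothetical helper.

```python
def total_error_cost(false_positives, false_negatives, cost_fp, cost_fn):
    """Total cost of a model's mistakes, given a price per error type."""
    return false_positives * cost_fp + false_negatives * cost_fn


# Hypothetical error counts for two candidate models on the same test set.
model_a = {"false_positives": 50, "false_negatives": 2}   # 52 errors in total
model_b = {"false_positives": 5, "false_negatives": 10}   # 15 errors in total

# Scenario 1 (assumed prices): a missed case is 100x worse than a false alarm.
cost_a = total_error_cost(**model_a, cost_fp=5, cost_fn=500)
cost_b = total_error_cost(**model_b, cost_fp=5, cost_fn=500)
print(cost_a, cost_b)  # 1250 5025 -> model A is cheaper despite more errors

# Scenario 2 (assumed prices): both kinds of error cost the same.
even_a = total_error_cost(**model_a, cost_fp=100, cost_fn=100)
even_b = total_error_cost(**model_b, cost_fp=100, cost_fn=100)
print(even_a, even_b)  # 5200 1500 -> now model B is cheaper
```

The models did not change between the two scenarios; only the prices did. Setting those prices is a product decision, not a modeling one.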
Knowing how responsibilities are shared between product managers, data scientists and engineers helps product managers focus on the right areas when they learn data science. In this case they do not have to understand the internal details of various models, but can focus on the evaluation metrics.
“Why did you show me this picture?”, “Why did you suggest this restaurant to me?”, “Why did you decide that this transaction is fraudulent?”, “Why did you decide that this customer support ticket is resolvable?” These are the kinds of questions users and stakeholders ask about data products. Sometimes these questions are possible to answer, sometimes they are not. In general, the simpler the model, the more interpretable it is.
When a model’s output cannot be easily interpreted, but the model performs well, it is up to the product manager to manage expectations.
Machine Learning in a Box
The last topic I had time to cover in the talk comes up a lot when product managers are planning to deploy data products in production. Many companies, both established ones and startups, are offering MLaaS (Machine Learning as a Service) as an alternative to implementing your own data infrastructure and logic. The promise of these offerings is very tempting, but deciding whether to use them can be a minefield for product managers. This is where understanding of the previously discussed topics helps. What parts (if any) of my data product can be represented in a generic way and outsourced to this kind of machine learning in a box?
Clearly, data science for product managers cannot be covered in a short talk. Here are my main impressions from the audience’s questions and my conversations after the talk:
- Product managers are extremely interested in making their products smarter using data.
- Product managers want a mental model for the process of developing data products and for working with data scientists and engineers.
- A simple playbook is impossible, but there are some guidelines that product managers can follow.
- If you are a data scientist, help the product managers you are working with to develop a mental model and a shared language for working with you.
- If you are a product manager, talk to your data scientists, ask for help, invest in understanding the process of developing data products.
— — — — — — —
Eugene leads Data Science at Directly. If you are a product manager or data scientist and want to talk about the topics in this post, ping him on Twitter @eugmandel — he loves this stuff! :)