Rise of the Data Product Manager

Over the next several years, a new kind of product manager will be in demand: the data product manager. I have previously argued that good data scientists make good product managers, but that argument is incomplete. Data is at the core of product development, not only in the post hoc analysis of usage metrics and A/B tests. The continual intake and exhaust of data determines how products behave and what new classes of products are possible. Machine learning models automatically adapt products to users’ preferences, recommend next actions, and suggest future features and products. Data product managers understand this and incorporate it into their products.

Working with data at the core of a product requires a working understanding of data modeling, data infrastructure, statistics, and machine learning. It goes beyond understanding the results of experiments and reading dashboards: it requires a deep appreciation for what is possible, and what will soon be possible, by taking full advantage of the flow of data. If the traditional PM operates at the intersection of business, engineering, and user experience, the data PM must also have domain knowledge of data and data science.

[Insert terrible six-circle Venn diagram here that marries Drew Conway’s data science Venn diagram and the traditional PM diagram.]

The data PM understands that building products with data requires a data strategy: what is your plan for how data will be generated, collected, and consumed, and how does this uniquely position you to win in your market? It’s not enough to collect data and stash it in a data warehouse for post hoc analysis. The data PM has a plan for how the data generated by the product will be used to improve the product over time, algorithmically or otherwise, and can explain why this creates a defensible moat that increases the product’s chances of long-term success. In other words, the data PM makes product decisions that get the flywheel moving with data.

The data PM understands, at a technical level, the infrastructure involved in building products. What kind of infrastructure is needed to support the product? Do machine learning models need to be scored in real time, or can they be prescored offline? What is the plan for retraining models on new data? How will the model’s success be evaluated over time? What is the complexity cost of implementing the model in production? Yes, data scientists will be answering these questions as well, but the data PM needs to be an active participant in these discussions as part of the inevitable tradeoffs involved in product development.

The data PM understands that collecting data and using data are two different parts of building with data, each with its own tradeoffs and often involving different parts of the engineering team. They help drive the product development process so that the handoff between the two is seamless and both sides can succeed at their jobs.

The data PM understands that machine learning is useful for a lot of problems, but knows when a heuristic model may be more appropriate. When it’s not clear, they timebox exploration to see which approach may be more useful. They also know that there may come a time when a switch from a heuristic model to a machine learning model will be appropriate, and they plan ahead for this contingency.
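
One way to plan ahead for that switch is to put the heuristic behind the same interface a learned model would later implement. The following is only a minimal sketch under my own assumptions: the names Recommender, PopularityRecommender, and render_home_feed are hypothetical, and the "most popular items you haven't seen" rule stands in for whatever heuristic fits the product.

```python
from collections import Counter
from typing import Protocol, Sequence


class Recommender(Protocol):
    """Hypothetical interface that both a heuristic and an ML model can satisfy."""

    def recommend(self, user_id: str, k: int) -> list[str]: ...


class PopularityRecommender:
    """Heuristic: suggest the globally most-viewed items the user has not seen."""

    def __init__(self, events: Sequence[tuple[str, str]]) -> None:
        # events are (user_id, item_id) pairs; the shape is an assumption.
        self._counts = Counter(item for _, item in events)
        self._seen: dict[str, set[str]] = {}
        for user, item in events:
            self._seen.setdefault(user, set()).add(item)

    def recommend(self, user_id: str, k: int) -> list[str]:
        seen = self._seen.get(user_id, set())
        ranked = [item for item, _ in self._counts.most_common() if item not in seen]
        return ranked[:k]


def render_home_feed(recommender: Recommender, user_id: str) -> list[str]:
    # The product code depends only on the interface, so a trained model
    # exposing recommend() could replace the heuristic without changes here.
    return recommender.recommend(user_id, k=3)


events = [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u2", "b"), ("u3", "b"), ("u3", "c")]
feed = render_home_feed(PopularityRecommender(events), "u1")
```

Because the feed code only knows about recommend(), the heuristic can ship first and the machine learning model can be swapped in later if the timeboxed exploration justifies it.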

The data PM can perform their own analyses: they can write their own SQL, build their own dashboards, and interpret their own experiments. They are skeptical of anyone who claims “the data speaks for itself” since they know this is never true. People speak for data, and knowing how the analysis was conducted is just as important as having the results. Data PMs aren’t slavishly “data driven” decision-makers who blindly make calls based on single numbers. They’re appropriately skeptical producers and consumers of data.
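
To make the "single numbers" point concrete, here is a small sketch of a two-proportion z-test using only the Python standard library. The function name two_proportion_test and the conversion counts are invented for illustration, and a real analysis would also consider test duration, multiple comparisons, and how users were assigned. The sketch shows that the same relative lift can be either noise or signal depending on sample size.

```python
import math


def two_proportion_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> dict:
    """Compare conversion rates of control (a) and variant (b)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error under the null hypothesis of equal rates.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled

    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))

    # Approximate 95% confidence interval for the absolute difference.
    se_unpooled = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    ci = (diff - 1.96 * se_unpooled, diff + 1.96 * se_unpooled)

    return {
        "relative_lift": diff / p_a,
        "z": z,
        "p_value": p_value,
        "ci_95": ci,
    }


# Hypothetical experiments: both show a 30% relative lift in conversion.
small = two_proportion_test(conv_a=10, n_a=100, conv_b=13, n_b=100)
large = two_proportion_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
```

Both results report the same headline lift, but in the small experiment the confidence interval comfortably includes zero, while in the large one it does not. The headline number alone cannot tell you which situation you are in.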

The data PM can translate requirements between data scientists, engineers, designers, marketers, and other PMs. They build product instrumentation and data storage into their acceptance criteria while collaborating with data scientists to ensure that data will be accessible and usable for analysis and modeling as soon as possible. They don’t leave it to engineers who are not data scientists to make assumptions about what kinds of data will be valuable for data scientists.
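
As one illustration of instrumentation as an acceptance criterion, a feature might not count as done until its events pass a check like the one below. This is a sketch under my own assumptions: the field names in REQUIRED_FIELDS and the validate_event helper are hypothetical, and the actual contract should be whatever the PM and the data scientists agree on.

```python
from datetime import datetime

# Hypothetical event contract agreed on with the data science team.
REQUIRED_FIELDS = {
    "event_name": str,
    "user_id": str,
    "timestamp": str,
    "properties": dict,
}


def validate_event(event: dict) -> list[str]:
    """Return a list of problems; an empty list means the event is acceptable."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )

    # Timestamps must be parseable, or downstream analysis cannot order events.
    timestamp = event.get("timestamp")
    if isinstance(timestamp, str):
        try:
            datetime.fromisoformat(timestamp)
        except ValueError:
            problems.append("timestamp is not parseable by datetime.fromisoformat")
    return problems


good_event = {
    "event_name": "checkout_completed",
    "user_id": "u_123",
    "timestamp": "2024-01-15T09:30:00+00:00",
    "properties": {"cart_value": 42.5},
}
bad_event = {"event_name": "checkout_completed", "timestamp": "yesterday"}
```

A check like this can run in the feature's test suite, which turns "the data will be usable for analysis" from a hope into something the team can verify before launch.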

Finally, the data PM knows that data, models, and outputs aren’t enough. They still have to be product managers and bring these components back to the business model and their organization’s strategy. Machine learning models that don’t align with the business model not only waste time and money, they also undermine an organization’s trust in machine learning. This is especially true in companies that are late to data science, are skeptical about the power of data science, or have very qualitative leadership.

For these reasons, I still believe that good data scientists make good product managers, but it seems clear that a new kind of product manager is just on the horizon.