The Data Product Manager and his Experiments

Shivalik Sen
Rapido Labs
Jun 11, 2020 · 9 min read

If you’ve followed my earlier post (from a long time ago), you will know how Vahanalytics got acquired by Rapido and how we started setting up the Data team there. It’s been a year since then, and I have had some time to collect my thoughts on what I have learnt in this one year: specifically, on how to be a Data Science Product Manager.

My transition into Data Product Management was not smooth at first. When we joined the Rapido tech team, we were tasked with setting up a Data Science team. The two of us founders from Vahanalytics split this up into Data Engineering (Someshwar) and Data Product (myself). The Product team at Rapido already comprised a bunch of brilliant PMs and Designers, and I was added as the de-facto Data PM. I was the sole Data PM for the first 8 months, until the second member of the Data Product group joined me.

Those initial months were the hardest, as all data requirements were fired at us, along with expectations of seeing some datamagik, while from a Data lifecycle point of view we were still crawling: we did not have a big data platform that anyone could query to retrieve stored data. Imagine the frustration of your internal users (Business managers, PMs, analysts) when they cannot query a week’s transactional data without crashing the DBs! From there to the state we are at now, with a state-of-the-art platform capable of ingesting and cleaning all of our intense data streams as well as transactions in real time, powering our hubs of Jupyter Notebooks and interactive Metabase dashboards, we have really come a long way. And in that process, I have learned what it really means to be a Data Science Product Manager.

So let me tell you who a Data Science Product Manager is and what they do.

What is different about a Data Science Product Manager?

Before we delve into the details, let us try to understand how a Data Science Product Manager stands out from a regular Product Manager.

For starters, the regular Product Manager focuses on the product experience with the goal of shipping products that delight users. To achieve this, a regular Product Manager strives to be very empathetic to their users. Data insights are usually a secondary parameter: they primarily use data only to support product decisions and verify insights. As such, a Product Manager is not technically required to have data skills, so long as they can interpret graphs and tables. A Product Manager is supposed to “be the voice of the User”: they understand the pains or the needs of the User, marry them with Business goals and priorities, and come up with a product (or a process) that can achieve this vision.

A Data Product Manager sits at the nexus of data, engineering and business

A Data Science Product Manager, on the other hand, predominantly makes decisions based on the data available through experiments. They have to go beyond understanding the results of experiments and reading dashboards: the role requires a deep appreciation for what is possible, and what will soon be possible, by taking full advantage of the flow of data. This could take a lot of different forms, such as:

  • Serving Data — identifying a common data request from business teams and building that as a dashboard
  • Optimising an existing process — realising that the manual process of verifying personal ID cards can be automated using existing Image Processing libraries
  • Introducing a new feature — understanding that users want an accurate delivery time on their orders and forecasting delivery times from historical data before the user places the order (a toy sketch follows right after this list)
  • Introducing a new internal metric — creating a metric to capture the volatility of supply personnel, which lets Operations teams take better on-ground decisions
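As a toy illustration of the forecasting item above: a first-cut delivery-time estimate could come from a simple regression over historical order features. The dataset, feature names, and model choice below are all assumptions made for the sketch, not Rapido’s actual pipeline.

```python
# A toy sketch of delivery-time forecasting from historical orders.
# The CSV source, feature names, and model choice are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

orders = pd.read_csv("historical_orders.csv")  # hypothetical extract
features = orders[["distance_km", "hour_of_day", "rider_availability"]]
target = orders["delivery_minutes"]

X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42
)
model = GradientBoostingRegressor().fit(X_train, y_train)
print(f"Holdout R^2: {model.score(X_test, y_test):.2f}")
```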

And many more. All of these fall under the typical responsibilities of a Data PM.

Speaking of typical responsibilities, a Data Product Manager, like any other PM, is still expected to build a roadmap, compile a backlog, present release plans, develop business cases, and act as an interface between the team and internal and external stakeholders.

If a traditional Product Manager operates at the intersection of business, engineering, and user experience, a Data Product Manager sits at the nexus of data, engineering and business.

Key areas of engagement

To sum it up: To be a Data Product Manager, you need to be more than just Data Literate, but not necessarily a full-blown Data Expert.

From Experiments to Products

A Data Science Product Manager should, by dint of being involved in ‘Science’, focus on conceiving, setting-up, running and analysing as many experiments as possible.

However, this is often only an ideal dream, while reality is a lot greyer. Real-world experiments are very often expensive and fraught with peril. They have to be extremely carefully thought out and implemented, and all possible assumptions have to be detailed and accounted for before starting. Any unforeseen and irreversible interference, whether external (like Covid-19) or internal (like human error leading to a random intervention), can make an experiment fail, leading to a significant loss of effort, investment and time.

Ideally, a question has to be posed first and the data has to answer it. If this leads to, or theoretically supports, a hypothesis, experiments have to be conducted to prove or disprove it. Very often, a single well-thought-out question can lead to multiple hypotheses, each forming the basis of an experiment and yielding very rich insights.
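To make the experiment step concrete, here is a minimal sketch of checking an A/B result for statistical significance with a two-proportion z-test. The variant sizes, conversion counts, and significance level are illustrative assumptions, not numbers from any real Rapido experiment.

```python
# A minimal sketch: did the treatment change the conversion rate?
# All counts below are made-up illustrative numbers.
from statsmodels.stats.proportion import proportions_ztest

conversions = [310, 262]  # converted users in [treatment, control]
exposed = [4000, 4000]    # users exposed to each variant

stat, p_value = proportions_ztest(count=conversions, nobs=exposed)

alpha = 0.05  # significance level, ideally fixed before the experiment starts
if p_value < alpha:
    print(f"p = {p_value:.4f}: reject the null; the variants differ.")
else:
    print(f"p = {p_value:.4f}: cannot reject the null; no detectable effect.")
```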

Beyond running the experiments, both historical data and experiment data have to be analysed. Based on the findings from historical data and experimental A/B tests, algorithms and models have to be built and engineered into systems.

These systems are then integrated with front-facing products (whether internal or external). Additionally, the data collection pipelines feeding the models, the output of the models, the various system integrations, and the functioning of the end product need to be tested. Only after all of this is in place can the release be shipped. Congratulations: it all started with a question, and now you have an actual product in your hands!


But that’s not all. In addition to the traditional post-deploy PM responsibilities, there are a few things that need to be factored in. There needs to be a plan to continuously test model performance and accuracy (pro tip: in the real world, all models deteriorate with time and new data). Future iteration plans need to take this into account. You cannot keep packing in features while your model accuracy slips!
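One common way to keep testing a deployed model when fresh labels are slow to arrive is to watch its input or output distributions for drift. Below is a minimal sketch using a population stability index (PSI) between a training-time baseline and a recent production window; the bin count and the 0.2 alert threshold are common rules of thumb, assumed here rather than taken from this post.

```python
# A sketch of drift monitoring via the Population Stability Index (PSI).
# baseline = model scores at training time; recent = scores in production.
import numpy as np

def psi(baseline: np.ndarray, recent: np.ndarray, bins: int = 10) -> float:
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range values
    base_frac = np.histogram(baseline, edges)[0] / len(baseline)
    rec_frac = np.histogram(recent, edges)[0] / len(recent)
    base_frac = np.clip(base_frac, 1e-6, None)  # avoid log(0)
    rec_frac = np.clip(rec_frac, 1e-6, None)
    return float(np.sum((rec_frac - base_frac) * np.log(rec_frac / base_frac)))

# Rule of thumb (assumed): PSI > 0.2 suggests significant drift; investigate/retrain.
```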

There are a lot of steps to traverse, and not all of them are necessarily sequential.

For example, it is not advisable to jump into experimentation without first asking questions and seeing what the current data says. Ideally, experiments should give rise to more questions and more answers, some of which can be used to start building the models. These steps necessarily have to be sequential.

However, if the final product/feature is a must-have, there is an explicit need/ask for it, and there is a clear idea of what it should look like, it might not be a bad idea to start thinking backward from a first-version release.

A good starting point is to sketch out a lean MVP (minimum viable product) first, then work backward, treating each complicated dependency as a black box. Instead of trying to figure out the engineering of the system or the algorithms of the model upfront, mark it temporarily as a big black box named MODEL and move on.

This helps you get started on the non-blocking pathways and components almost right away.
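In code, the black box can literally be a stub that honours an agreed interface, so every non-blocking component can integrate against it before the real model exists. The function name, inputs, and default range below are hypothetical, used only to illustrate the idea.

```python
# A hypothetical stub for the MODEL black box. Only the interface matters;
# the body returns a safe default until the real model replaces it.
from typing import Tuple

def recommend_discount(user_id: str) -> Tuple[float, float]:
    """Placeholder for the real discount model.

    Returns a (min, max) discount range in percent, so dashboards and
    services can be built and tested before the model is trained.
    """
    return (0.0, 5.0)  # conservative default while the model is a black box
```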

Of course, this is subject to enough resources being available (which, according to us Product Managers, is an absolute myth anyway!).


A typical roadmap for building a Data Product

Putting things into perspective, the timeline for building a Data product for User Retention (a sample use case) would look like this:

Sprint 1 — Analysis work to answer three fundamental questions about the nature of the product’s users:

  1. Are they discount-sensitive?
  2. Are they price-sensitive?
  3. Are power users more discount-sensitive than normal users?

These are answered by individual analysis threads that can spawn more child questions and keep digging till self-evident truths are found.
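One such analysis thread might, for instance, compare conversion uplift under discounts across user segments. The table, column names, and pandas approach below are assumptions for the sketch.

```python
# A sketch of one Sprint 1 analysis thread, assuming an orders table with a
# user segment, a discount flag, and a conversion outcome per order.
import pandas as pd

orders = pd.read_csv("orders_sample.csv")  # hypothetical extract

# Mean conversion per (segment, discount shown) cell...
rates = (
    orders.groupby(["segment", "discount_shown"])["converted"]
          .mean()
          .unstack("discount_shown")
)
# ...and the uplift a discount produces in each segment.
rates["uplift"] = rates[True] - rates[False]
print(rates)  # e.g. is the uplift larger for power users than normal users?
```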

Simultaneously, a Product thread to draw up a Product Doc (whatever format or process you follow) and work with designers on the discount automation dashboard.

Sprint 2 — An Experiment thread runs an A/B test on actual users to verify the insights from the previous Analysis thread (considering that the results indicate that only Power Users are discount-sensitive, and that they are not price-sensitive). The Data Engineering team starts putting in place required datasets, like the discount sensitivity of customers. Systems monitoring, which will be needed soon, is also put in place. The front-end team starts working on the discount recommendation dashboard.
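For the A/B test itself, a deterministic hash-based assignment keeps each user in a stable bucket across sessions. This is a generic technique sketched under assumed names and a 50/50 split, not a description of Rapido’s experimentation platform.

```python
# A generic sketch of deterministic A/B bucketing. The salt, split ratio,
# and variant names are illustrative assumptions.
import hashlib

def assign_variant(user_id: str, salt: str = "discount_exp_v1") -> str:
    """Hash the user ID into a stable bucket so assignment never flips."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # map the hash onto 0-99
    return "treatment" if bucket < 50 else "control"  # 50/50 split

print(assign_variant("user_42"))  # the same user always gets the same variant
```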

Sprint 3 — The Experiment thread continues with an A/B/N test on actual users to build on the previous experiment’s results (considering that those results indicate that not all Power Users were discount-sensitive, even though all of them were price-insensitive). The Data Scientists’ insight is that the Power Users who are not adequately discount-sensitive are the ones who were acquired through referrals.

To segment this better, the Power Users are split by their channel of acquisition and checked for discount sensitivity. The hypothesis here is that Power Users from all channels except Referrals should be discount-sensitive. The Data Engineering team builds endpoint APIs which the front-end team will integrate with their dashboards. Datasets are filled with dummy data to enable testing.
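That segmentation check might look like the following, assuming the experiment results land in a table with an acquisition channel and a per-user discount-sensitivity score (both column names are hypothetical).

```python
# A sketch of the channel split, assuming experiment results with a
# per-user discount-sensitivity score. Column names are assumptions.
import pandas as pd

results = pd.read_csv("experiment_results.csv")  # hypothetical extract
power_users = results[results["is_power_user"]]

by_channel = (
    power_users.groupby("acquisition_channel")["discount_sensitivity"].mean()
)
print(by_channel.sort_values())
# Hypothesis check: every channel except 'referral' should score high.
```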

Sprint 4 — A Data Science thread where the experiment results are analysed and the insights fed into the User Discount Sensitivity model, which buckets Users according to the amount of discount they should be shown. Backtests show that this model has an F1 score of 75%, which is acceptable, and a rollout is okayed.
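The backtest gate can be as simple as scoring the model on a held-out historical window. In the sketch below, only the 75% floor mirrors the example above; the weighted F1 averaging and the function shape are assumptions.

```python
# A sketch of the pre-rollout backtest gate. The averaging mode and the
# exact gate logic are illustrative assumptions.
from sklearn.metrics import f1_score

def rollout_gate(y_true, y_pred, floor: float = 0.75) -> bool:
    """Okay the rollout only if backtest F1 meets the agreed floor."""
    score = f1_score(y_true, y_pred, average="weighted")  # bucketed classes
    print(f"Backtest F1 = {score:.2f} (floor = {floor})")
    return score >= floor
```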

Data Engineering and DS teams put the model in production as a service/cron job that updates the datasets with the Users, their Acquisition Channels, and the ideal Discount ranges. The Discount Automation dashboard shows, for each product line and each type of User (Power or not, channel of acquisition), the recommended discount range. Based on this, marketing teams can see the recommendations and choose the final discount amount to show.

Sprint 5 — From here onwards, the Data Science and Data Engineering teams run regular checks on model performance, data refresh rates, and other sanity metrics.

Parallelise independent tasks wherever possible by identifying them from your roadmap

Of course, this was just a happy path. Things can get unexpectedly complicated if one or more experiment threads fail due to unforeseen circumstances or some error. However, if everything works as expected, a lot of the complex tasks required to build a Data Science product from end to end can be parallelised to save time.

To sum everything up: yes, you can be a Data Product Manager as long as you have (or can pick up) some understanding of Data Science/Statistics. You might find yourself in a new dimension, as Data becomes your new best friend, albeit a very challenging one to work with. You might get to do real-world experiments in the truest sense of Data Science, but even if you don’t, there are more ways to gain insights and build products. Building Data products is no less challenging than building the usual kind; it is just that the challenges are domain-specific. A lot of tasks can be parallelised, and you should strive to unblock as many simultaneous threads as possible.

And remember: Always be experimenting, always be learning

If you liked reading about the life and times of a Data Product Manager, and want to learn more about what it means to be a Data Product Manager at Rapido, feel free to reach out to me on LinkedIn. And if you are intrigued enough to explore something out of the box, check out this link for open roles in our Data Science & Engineering department: https://bit.ly/2V08LNc
