6 Reasons Why Product Managers Should Learn Data Science

Ruben Hambardzumyan
Published in productspace
6 min read · Nov 26, 2019

In recent years, data science and machine learning have become inseparable from software development, and this entanglement grows stronger every day. Data science opens the gates to harnessing insights and valuable knowledge from data, which help shape the product and its services to better satisfy customer needs. In this post, I want to encourage aspiring and experienced Product Managers to invest in learning data science (and, yes, even coding). Below are the reasons why you, as a Product Manager, should do so. Though each of these topics could be a separate post, I’ve tried to sum up here what I think are the most important reasons to start learning data science now.

#1 Talking data with a Data Analyst

Have you, as a Product Manager, ever been in a situation where you couldn’t express your thoughts to a Data Analyst? Misunderstanding when and how data is produced and collected in your product can lead to mistakes in communicating the business needs to the Analyst. As a result, the Analyst is unable to collect the proper data and deliver the analysis you need. This leads to back-and-forth that consumes valuable time.

Speaking the language of data with a Data Analyst is a significant skill, and mastering it will save you lots of time. If you’re a Product Manager in a data-intensive company and you actually know how data is collected, it becomes much easier to set proper business tasks for the Data Analyst. For example, if you need the number of daily registered users over the last month, and you know that the result will most probably be a time series, your request would sound like this:

I need the time series of registered users over the last month with the moving average.
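
To make that request concrete, here is a minimal sketch of what "a time series with a moving average" means. The daily counts, the dates, and the helper name `moving_average` are made up for illustration; a real analysis would read the counts from your own data.

```python
from datetime import date, timedelta


def moving_average(values, window):
    """Trailing moving average; the first window-1 points have no value (None)."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result


# Hypothetical daily registration counts for one week (illustrative numbers).
start = date(2019, 11, 1)
daily_registrations = [120, 135, 128, 90, 85, 140, 150]
days = [start + timedelta(days=i) for i in range(len(daily_registrations))]

smoothed = moving_average(daily_registrations, window=3)
for day, count, avg in zip(days, daily_registrations, smoothed):
    avg_text = "n/a" if avg is None else f"{avg:.1f}"
    print(day.isoformat(), count, avg_text)
```

With Pandas, the same smoothing is typically written as `Series.rolling(window=3).mean()`; the plain-Python version above only spells out what that call computes.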

#2 Getting your hands on data

In data-intensive companies, there’s always a need to analyze data to learn more about your product and users. Unfortunately, a Data Analyst is not always available when you need one, and you either have to wait or negotiate with the data team to prioritize your request. If you’re a PM skilled in data analysis, however, all you need to do is request access to the data and query it yourself. Depending on how data is stored at your organization, you would need to use the appropriate query language. The most widespread one is SQL (Structured Query Language), which is used to query relational databases.

If you know SQL, then getting the data you need is a matter of minutes. All you have to do is formulate the question you need answered and translate it into a query. For example, suppose data on your customers’ purchases is stored in one table and their personal information and location in another (both tables are in a single relational database and are related to each other by the customer ID). You need to get the list of orders together with the customer information. Your SQL query would look like this:
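
As a minimal sketch, assume two hypothetical tables, `orders` and `customers`, linked by a `customer_id` column (the table names, column names, and sample rows are illustrative and not taken from a real schema). The join can then be tried from Python against an in-memory SQLite database:

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Alice', 'Berlin'), (2, 'Bob', 'Yerevan');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 1, 15.5);
""")

# Orders enriched with customer information, matched on the customer ID.
query = """
    SELECT o.order_id, o.amount, c.name, c.city
    FROM orders AS o
    INNER JOIN customers AS c ON o.customer_id = c.customer_id
    ORDER BY o.order_id;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

An `INNER JOIN` keeps only the orders whose customer ID has a match in `customers`; if you also wanted orders with no matching customer, a `LEFT JOIN` would be the usual choice.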

The query above would return a table with the columns that you’ve requested containing all the rows that match your given conditions (matching customer IDs). SQL is fun!

#3 Designing product analytics

When building a new feature or improving an existing one, you, as a Product Manager, need not only to define success but also to understand what you want to track, and how, in order to tell whether the feature was indeed a success. You would need to design the architecture of events, since you’re the one who knows the business domain of the product and therefore what needs to be measured to understand how it performs.

Designing events and their triggering conditions requires an understanding of how events work and what data they generate. Inexperienced Product Managers often overload their products with unnecessary events, causing fireworks in the clickstream (the stream of events). That leads to garbage being collected, and when you analyze the data from such a clickstream, what you get is also garbage (garbage in, garbage out). It is vital to design events that scale with the product. Otherwise, you risk losing data because your event architecture has no backward compatibility.
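
One way to think about backward compatibility is to give every event an explicit schema version. The sketch below is only an illustration under that assumption: the event name `checkout_completed`, its properties, and the `EVENT_SCHEMAS` registry are hypothetical and not tied to any particular analytics tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical registry: event name -> required property names per schema version.
EVENT_SCHEMAS = {
    "checkout_completed": {
        1: {"order_id", "amount"},
        2: {"order_id", "amount", "currency"},  # v2 adds a field; v1 stays valid
    },
}


@dataclass
class Event:
    name: str
    schema_version: int
    properties: dict
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def validate(event):
    """Return a list of problems; an empty list means the event is well-formed."""
    versions = EVENT_SCHEMAS.get(event.name)
    if versions is None:
        return [f"unknown event: {event.name}"]
    required = versions.get(event.schema_version)
    if required is None:
        return [f"unknown schema version: {event.schema_version}"]
    missing = required - set(event.properties)
    return [f"missing property: {p}" for p in sorted(missing)]


old_event = Event("checkout_completed", 1, {"order_id": 1, "amount": 9.9})
print(validate(old_event))
```

Because old versions stay in the registry, events emitted by older app releases remain valid after the schema grows, and events that nobody defined are rejected instead of polluting the clickstream.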

#4 Understanding biases and issues in data

There are biases in every dataset, and the bigger the dataset, the more biases it is likely to contain. There are several types of biases that you, as a Product Manager, should be aware of:

  • Product biases during data collection: intentionally or unintentionally omitting a flow from the clickstream
  • Technical biases during data collection: the clickstream contains buggy events
  • Selection biases during experimentation: the chosen sample is not representative of the whole cohort
  • Modeling biases during analysis: biased data trains biased models (garbage in, garbage out)

To understand that an analysis is biased or simply wrong, you have to know how to read data. For example, if you request (or, better, produce on your own) the distribution of some attribute of your product’s users and see a skewed distribution with lots of observations of a value that you wouldn’t expect to see there, you should be skeptical about the analysis and, together with the Data Analyst, start investigating the whole process that produced those results, since it probably contains some sort of bias.
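
A quick sanity check of this kind can be sketched in a few lines. The example assumes a hypothetical `ages` field in which `0` was stored whenever the age was missing; the helper name `dominant_value` and the 50% threshold are arbitrary choices for illustration.

```python
from collections import Counter


def dominant_value(values, threshold=0.5):
    """Return (value, share) if one value exceeds `threshold` of all observations, else None."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    share = count / len(values)
    return (value, share) if share > threshold else None


# Hypothetical "age" field from user profiles; 0 is likely a default for missing data.
ages = [0, 0, 0, 0, 0, 0, 23, 31, 27, 45]
suspicious = dominant_value(ages)
print(suspicious)
```

A single value that accounts for most of the observations is not proof of a bug, but it is a good reason to ask how that field is collected before trusting any average computed from it.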

#5 Understanding machine learning and how to incorporate it in your product

This point alone requires dozens of posts, yet I decided to include it here, since machine learning is now an inseparable part of software development. Hence, Product Managers need to know how it works and where to use it, without getting lost in the media buzz around AI. Machine learning, deep learning, neural networks: you probably couldn’t tell the difference if you’re not into data science. Yet these are the techniques Data Scientists use to understand data on a deeper level.

For example, if you’re a Product Manager of a consumer-facing product, at some point you’d need to understand your users. That leads to the need to segment users by some parameters. But what if you don’t want to pick those parameters yourself? What if you want to learn them from the data that you have? Well, that is clustering, an unsupervised learning technique that groups similar users without predefined labels.
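
To give a feel for what clustering does, here is a deliberately tiny k-means sketch on a single made-up feature (sessions per week). It is a teaching toy under simplifying assumptions (one dimension, a fixed number of iterations, a deterministic start); a Data Scientist would normally reach for a tested library implementation.

```python
def kmeans_1d(values, k, iterations=20):
    """Very small k-means for 1-D data; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread initial centroids evenly across the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its points.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels


# Hypothetical sessions per week for ten users.
sessions_per_week = [1, 2, 1, 3, 2, 14, 15, 13, 16, 14]
centroids, labels = kmeans_1d(sessions_per_week, k=2)
print(centroids, labels)
```

Nobody told the algorithm what "casual" or "power" users are; the two groups emerge from the data, and it is then up to you to interpret and name them.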

And what if you want to predict what share of your users will return next month, for example whether it will reach 80%? That is a prediction task (a regression task, specifically, since you want to predict a numeric value). There are lots of ways to predict the number of users that will return or, even better, to predict customer churn.
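
The simplest possible regression is a straight line fitted to past values. The sketch below assumes five months of made-up retention figures and extrapolates one month ahead; the function name `fit_line` and the data are illustrative only, and real retention forecasting would use far richer features and models.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical share of users who returned in each of the last five months.
months = [1, 2, 3, 4, 5]
retention = [0.70, 0.72, 0.74, 0.76, 0.78]

slope, intercept = fit_line(months, retention)
forecast = slope * 6 + intercept
print(f"Forecast for month 6: {forecast:.2f}")
```

On this toy data the line climbs by two percentage points a month, so the forecast for month 6 comes out at roughly 0.80. Whether such a trend would actually continue is exactly the kind of question to discuss with your Data Scientist.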

Naturally, these are tasks that only an experienced Data Scientist should perform. That being said, if you’re a Product Manager who works with such a Data Scientist, you have to ensure that she has all the information she needs about the product (domain expertise), the business, and the users.

#6 Visualizing needed data

I consider data visualization a form of art. Charts, images, plots, choropleths, chords, and dozens of other ways of presenting information are tools that a data-driven Product Manager would use to communicate with the team. Learning how to present lots of information in a single visual is a crucial skill for a Product Manager. I strongly believe that knowing your data (and how it is collected) is the only way to know what you want to present to your team, and how.

The Data Visualisation Catalogue is one of the many websites that provide information about the different types of data visualization, and how and why to use them.

Skills you would need to learn

It may look hard at first, but the time you invest in learning data science will make you a better data-driven Product Manager. Below is the minimum skill set that you need, along with sources where you can learn each skill:

  • Python: one of the most widely used (and easiest to learn) programming languages for working with data
  • SQL: Structured Query Language, designed for querying data from relational databases
  • JupyterLab: an interactive development environment for writing and running Python code
  • Pandas: the most essential Python library to know when working with data
  • Brush up on your high school math and statistics
  • Machine learning course by Andrew Ng

Ruben Hambardzumyan

Ph.D, Entrepreneur, Product Manager, and Data Scientist focusing on AI-driven products and platforms. Co-founder and CEO of cerebrus.ai