Duet AI generated this image of a research lab where product scientists work.

How to Science the Product — Part 2

Emad Khazraee
7 min read · Jan 25, 2024


In the previous part of this series, I made a case for adopting science as a way to foster sustainable product development and innovation. In this part, I look more closely at how data science supports this process, the two primary flavors of data science, the operating model that works best for product data science teams, and how to make a good product data science hire.¹

Product Data Science as a Discipline

When we talk about “science” as a verb, it’s essential to understand who practices it in the realm of products. I call that practitioner the Product Data Scientist. Product Data Science, at its core, is a craft: a discipline that employs the scientific method, particularly statistical techniques, to make sense of observed data and draw conclusions. These conclusions are vital for validating or debunking hypotheses and for guiding decision-making within the product development cycle.

Now, why do we specifically call the discipline “data science”? The name is a bit of a misnomer because the focus isn’t solely on studying the data itself. Rather, it’s about systematically examining the evidence and observations gathered from the domain of interest (i.e., the product). These observations, termed “data,” serve as the raw material to which data scientists apply their scientific methods of analysis, drawing meaningful insights and conclusions. A data scientist’s expertise lies in deciphering these observations and employing statistical tools and methodologies to make informed decisions that propel product development and innovation. Product data scientists serve as the bridge between raw data and actionable insights, driving the continuous improvement and evolution of products and services within an organization.

Two Primary Flavors of Data Science

In 2012, Harvard Business Review provocatively dubbed “Data Scientist” as “The Sexiest Job of the 21st Century.” This marked the beginning of a decade-long period filled with hype, buzzwords, and an evolving data science and analytics landscape.

As the field matured, it branched into specialized domains, including Fullstack, Product Analytics, AI/ML, ML Engineering, and more. Job descriptions predominantly gravitate toward two primary flavors: Product Data Science (often titled Data Science, Product Analytics) on the one hand, and Data Science, AI/ML on the other. ML Engineering, often seen as a sibling or sub-field, concentrates more on the engineering aspects of developing and deploying ML models into production environments.

This diversification illustrates the multidimensional nature of data science. Let’s delve into the detailed facets of these two primary flavors of data science: Product Data Science (Data Science, Product Analytics) and Data Science, AI/ML.

Product Data Science embodies a day-to-day focus on several key areas, predominantly revolving around close collaboration with cross-functional partners in the product and engineering domains. This collaboration extends to designing statistical analyses to address specific questions or test hypotheses within the product landscape. Given the prevalence of vast, digitized datasets, proficiency in data wrangling and processing is essential to prepare the data for analysis. These professionals also collaborate on building predictive models, typically using shallow learning techniques and proof-of-concept ML models that inform broader deployment strategies.

Conversely, Data Science, AI/ML, leans more toward a concentrated focus on machine learning, often extending to deep learning methodologies. The core competencies here revolve around crafting versatile and efficient pipelines for model deployment, with an acute consideration for latency constraints — where every millisecond holds significance, particularly in online model contexts. Unlike their Product-focused counterparts, professionals in this domain may interact less with the day-to-day aspects of product and business functionalities, channeling their efforts more into the intricate world of machine learning and AI frameworks.

While both flavors share foundational elements such as statistical analysis, coding, and an understanding of machine learning concepts, their divergence lies in their primary focus areas and the intensity of their involvement in product development, engineering, and the application of predictive models. The graph below is a simplified view of how each area’s primary focus and expertise define these two flavors.

Primary Focus and Expertise of the Two Flavors of Data Science

Outline of a Product Data Science Job

Product data scientists are champions of a culture steeped in scientific, data-driven decision-making. Their roles span the entire product life cycle, from ideation and development to launch, evaluation, and ongoing monitoring. To navigate this landscape effectively, they need a diverse analytical toolkit covering several crucial areas, including:

  • Opportunity Analysis: Identifying the most impactful business opportunities based on collected evidence (e.g., client churn reduction).
  • Inferential and Causal Analysis: Establishing relationships between factors impacting outcomes and discerning causality (e.g., understanding factors leading to client churn).
  • Predictive Modeling: Leveraging insights from prior analyses to predict future scenarios (e.g., predicting clients at risk of churn and devising mitigation strategies; see the sketch after this list).
  • Experimentation: Testing different ideas to enhance outcomes (e.g., testing different interventions to prevent client churn).
  • Diagnostic Analysis: Identifying the root causes behind anomalies or shifts in key performance indicators (e.g., detecting sudden increases in client churn and identifying the root cause).
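To make the predictive modeling item above concrete, here is a minimal sketch of a churn-risk model in Python. It is illustrative only: the file clients.csv, the feature columns, and the churned label are hypothetical placeholders, and a real model would involve far more careful feature engineering, validation, and calibration.

```python
# Minimal churn-risk sketch (illustrative; file name and columns are hypothetical).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

df = pd.read_csv("clients.csv")  # hypothetical export of client features and labels
features = ["monthly_usage", "support_tickets", "tenure_months"]
X, y = df[features], df["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
risk = model.predict_proba(X_test)[:, 1]  # estimated churn probability per client
print(f"Holdout AUC: {roc_auc_score(y_test, risk):.3f}")
```

The output of such a model, a ranked list of at-risk clients, is what feeds the mitigation strategies mentioned above.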

Regarding the operational framework for data science teams, two main models exist:

1. Central Service Unit Model: This setup involves a team responding to data-related requests from different organizational segments. While it centralizes expertise, it can limit the development of deep business acumen within the data team, much as in traditional Business Intelligence units.

2. Embedded Model: In this matrix organization structure, each team operates within the broader data science organization but is embedded within specific product areas. This model fosters deep engagement with cross-functional partners (XFN) and nurtures robust business acumen within each product domain.

In my experience, the embedded model proves more effective. While central service units operate reactively and transactionally — responding to requests without sustained engagement — embedded teams establish a proactive, involved partnership. Data scientists in this setup are integral team members invested in the product’s success. They proactively contribute to decisions, owning the product alongside other team members. This approach enables data scientists to become true partners in the product’s journey. It ensures they live and breathe the product, significantly enhancing their effectiveness in driving data-informed decisions and outcomes. The caveat is that the embedded model usually works in more mature or larger organizations that can afford dedicated scientists in each product pod.

How to Recruit Product Data Scientists?

Recruiting for product data science roles can be a challenging yet rewarding endeavor. These positions demand a multifaceted skill set, making a well-rounded product data scientist a rare and highly sought-after professional in today’s competitive landscape.

To excel in this role, candidates should possess a blend of skills in two categories: technical core and cross-functional competencies. Cross-functional competencies include effective communication, collaboration, and the ability to distill information for XFN consumption. Technical core competencies encompass:

1. Strong Product Sense: An intuitive understanding of product dynamics and the ability to align data insights with overarching product goals.

2. Deep Statistical Knowledge: Proficiency in statistical methodologies to derive meaningful insights and make informed decisions from data.

3. Solid Computational Skills: The capability to handle and manipulate large volumes of data efficiently, employing programming and computational tools effectively.

4. Understanding of Predictive Modeling and Machine Learning: Proficiency in developing and applying predictive models and machine learning algorithms.

Given the rarity of finding individuals with expertise across all these domains, a strategic approach to recruitment involves identifying candidates with a couple of solid core competencies (not necessarily all of the required skills), high motivation, and a propensity for continuous learning. Once they are onboarded, upskilling the team to bridge the remaining skill gaps becomes crucial. For example, focusing on candidates with strong problem-solving abilities, decent product sense, computational proficiency, and a solid grasp of basic statistics provides a sound foundation for building a competent team that can then develop further through training and upskilling initiatives. See my other post on a Curriculum for Product Data Science for upskilling.

How to Practice Product Data Science?

Practicing product data science isn’t merely about utilizing tools or creating visually appealing dashboards; it’s about adopting the rigor and principles inherent in scientific practices. Adherence to these best practices becomes imperative to embody the essence of data science.

  1. Reproducibility: Reproducibility stands as a cornerstone. Adopting a script-based approach to data science ensures that processes are replicable and outcomes reproducible, which mitigates discrepancies and fosters confidence in the reliability of results (I do not call point-and-click dashboard building and Excel jujitsu Data Science). A small sketch of such a script follows this list.
  2. Peer Review: Embracing peer review amplifies the integrity of analyses. Code reviews, analysis reviews, and experiment reviews serve as crucial checkpoints, allowing for thorough scrutiny and validation of methodologies and findings. This collective scrutiny helps identify blind spots, errors, or biases, elevating the overall quality and trustworthiness of the work.
  3. Falsification and Establishing Causality: Ideas should be testable, and in pursuing scientific integrity, establishing causality and fostering trustworthy experimentation take center stage. This involves rigorously testing hypotheses and, importantly, being prepared to falsify them. Robust experimental design, free of shortcuts like p-hacking, ensures the credibility and robustness of conclusions drawn from analyses.
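As a concrete illustration of points 1 and 3, here is a minimal sketch of a script-based experiment readout: a hand-rolled two-sided, two-proportion z-test comparing churn between a control and a treatment group. The counts are illustrative placeholders rather than real results; the point is that the hypothesis, the inputs, and the test live in a version-controlled script that anyone can re-run and review.

```python
# Reproducible experiment readout (illustrative counts; two-sided two-proportion z-test).
from math import sqrt
from scipy.stats import norm

def two_proportion_ztest(churned_a, n_a, churned_b, n_b):
    """Two-sided z-test for the difference between two churn rates."""
    p_a, p_b = churned_a / n_a, churned_b / n_b
    p_pool = (churned_a + churned_b) / (n_a + n_b)     # pooled churn rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return z, 2 * norm.sf(abs(z))                      # two-sided p-value

# Hypothesis, stated before looking at the data: the retention intervention reduces churn.
z, p_value = two_proportion_ztest(churned_a=120, n_a=1000,  # control group
                                  churned_b=95, n_b=1000)   # treatment group
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Because the entire readout is a script, a reviewer can rerun it end to end, and the pre-stated hypothesis leaves less room for post hoc p-hacking.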

Practicing product data science isn’t merely about creating visualizations or leveraging data tools; it’s about upholding the fundamental principles of science. It’s about ensuring that every step — from data processing to experimentation and analysis — adheres to these scientific best practices, thereby warranting the name “science.”

Notes:

[1] I used an LLM for copyediting to improve the grammar and readability of the post.

Emad Khazraee

Data Scientist, Sociotechnical Researcher, and Ex-Architect