Role of a Data Scientist

Sequoia
Sequoia Capital Publication
Apr 17, 2019

Building a data-informed product requires a high-functioning data organization, which in turn requires highly effective data professionals. These are the people who will build and shape the company’s data culture, which informs product strategy and shapes key decisions across the company.

The composition of a data organization varies based on the maturity of the company and data team’s mandate, but it generally includes some combination of data engineers, infrastructure architects, machine-learning engineers, product analysts, and data scientists.

In this post, we will cover the different roles that data scientists fill and the skills required to do each job well. We will also cover some of the common pitfalls and myths associated with hiring data scientists. We will cover other roles, such as data engineers, and the skill sets they require in future posts.

THE FIVE CORE SKILLS

There are five core skills and abilities that all good data scientists need. They should be able to deconstruct complex business problems and identify their components. They should have the technical skills to extract and manipulate data. They must have the analytical ability to extract value from that data. They must be able to clearly synthesize the results of their analysis. Finally, they must be able to communicate those results in a way that influences decisions. Here’s how these core skills are connected:

Let’s take a closer look at each skill.

  1. Problem formulation. Data scientists must be able to formulate and structure problems. This requires deconstructing complex business problems into their constituent pieces by asking the right business questions. Asking good questions takes curiosity, which in turn leads to hypotheses. These business questions can then be posed as a set of technical problems.
  2. Technical ability. Once business questions have been posed as technical problems, skills such as coding, statistics, and quantitative reasoning are needed to extract the data. This process may be iterative, because some questions turn out to be unanswerable for reasons such as unavailable data.
  3. Analytical ability. Once all the data is in place, data scientists need analytical skills to manipulate data sets and to extract value from the data in the form of tables, charts, etc.
  4. Synthesis. Although the outputs of data analyses are numbers and figures, data scientists need to connect everything their analysis has produced back to the questions from their original problem formulation. They need to interpret the results, simplify, and synthesize. The output at this stage is the smallest set of images, tables, and numbers that conveys the findings.
  5. Influence. Creating impact requires connecting business problems to specific actionable insights (decisions) and influencing those decisions through storytelling. A compelling story can be told orally, in writing, or both. Important skills include telling the entire story succinctly (a one-pager), noting what really matters (an executive summary), and clearly articulating outcomes rather than inputs.

The skills required of a generalist will suffice for most types of problems. However, some specific types of analytical problems may require specialization. Even for specialists, the role and responsibilities differ only slightly from those of the generalist; in most cases, a different emphasis on some skills over others is all that is needed.

DATA SCIENTIST ROLES AND RESPONSIBILITIES

The role of a data scientist depends on the type and maturity of the product they work on. In the early stages of product development, all data scientists have similar functions, and they are primarily focused on setting up the computational and analytical infrastructure. As the product evolves, data scientists’ roles change depending on the needs of the product team.

Generally speaking, data scientists fall into six categories:

  1. Product generalists, who are general problem solvers working across the range of product issues you may encounter
  2. Early product analysts, who determine product market fit for a nascent product
  3. Growth analysts, who move a metric
  4. Core marketplace analysts, who ensure healthy liquidity on your platform
  5. Ecosystem analysts, who identify competitive threats and strategic opportunities
  6. Machine-learning analysts, who ensure healthy operation of the algorithms that power your product

Product Generalist

Unsurprisingly, product generalists are the most frequently hired data scientists because their broad skill sets enable them to take on a wide range of functions and problems. The primary focus of product generalists is to inform, influence, support, and execute product decisions. At a high level, they help set goals, roadmap, and strategy for products, and execute on product operations. More specifically, product generalists do the following:

Define product success by determining and evaluating key metrics and goals for the product team.

Monitor product health by building dashboards and reports, understanding root causes of changes in metrics, and proposing courses of action. This includes:

  • Ensuring that the right metrics are tracked (e.g., measurement of users, messages, posts, purchases).
  • Building a robust infrastructure to support data analysis.
  • Ensuring data integrity by verifying that raw and derived fields are accurate so that metrics are correctly counted.
  • Monitoring key performance indicators (KPIs) via dashboards or other reporting tools.
  • Diagnosing issues and proposing solutions. This includes setting targets, forecasting, investigating anomalies, and identifying the underlying causes of metric changes. (Are they behavioral changes, mix shift, data issues, product changes, or related to seasonality?)
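
One way to tell a mix shift apart from a behavioral change is to split the change in a blended rate into a mix effect and a rate effect. The following is a minimal sketch, not a method this post prescribes: the segment names and numbers are hypothetical, both periods are assumed to contain the same segments, and the midpoint weighting is just one common convention for making the two effects add up to the total change.

```python
def blended_rate(segments):
    """Overall rate for a period: sum of share * rate over all segments."""
    return sum(share * rate for share, rate in segments.values())


def decompose_rate_change(before, after):
    """Split the change in a blended rate into (mix_effect, rate_effect).

    `before` and `after` map segment name -> (share_of_users, rate).
    Assumes both periods have the same segments and shares sum to 1.
    """
    mix_effect = 0.0
    rate_effect = 0.0
    for name in before:
        share_0, rate_0 = before[name]
        share_1, rate_1 = after[name]
        # Midpoint weights make the two effects sum exactly to the total change.
        mix_effect += (share_1 - share_0) * (rate_0 + rate_1) / 2
        rate_effect += (rate_1 - rate_0) * (share_0 + share_1) / 2
    return mix_effect, rate_effect


# Hypothetical data: conversion rates are unchanged, but traffic moved to mobile.
before = {"mobile": (0.5, 0.10), "desktop": (0.5, 0.20)}
after = {"mobile": (0.7, 0.10), "desktop": (0.3, 0.20)}

mix_effect, rate_effect = decompose_rate_change(before, after)
total_change = blended_rate(after) - blended_rate(before)
print(f"total={total_change:+.3f} mix={mix_effect:+.3f} rate={rate_effect:+.3f}")
```

In this hypothetical example the blended conversion rate falls from 15% to 13% even though neither segment converts any worse, so the whole drop is attributed to mix shift rather than to a change in user behavior.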

Design, evaluate, and ship experiments and products.

  • Work closely with marketing, design, product, and engineering teams to design the right experiments and quantify the impact of existing product features and future changes (A/B testing), and then make recommendations based on the findings.
  • Work with the data engineering team to develop and implement new analytical tools and modules, and to scale analytics. Help build product roadmaps in partnership with the data engineering and data infrastructure teams.
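
To make "quantify the impact" concrete, here is a minimal sketch of evaluating a simple A/B test on a conversion metric with a pooled two-proportion z-test. The function name and the counts are hypothetical, and this test is only one of several reasonable choices; it assumes independent users and samples large enough for the normal approximation to hold.

```python
import math


def two_proportion_z_test(conversions_a, users_a, conversions_b, users_b):
    """Return (absolute_lift, z, two_sided_p_value) for B versus A."""
    rate_a = conversions_a / users_a
    rate_b = conversions_b / users_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conversions_a + conversions_b) / (users_a + users_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_b - rate_a, z, p_value


# Hypothetical experiment: control converts 100/1000, treatment 150/1000.
lift, z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"lift={lift:+.3f} z={z:.2f} p={p_value:.4f}")
```

A small p-value suggests the observed lift is unlikely to be noise, but the recommendation should still weigh the size of the lift and any guardrail metrics.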

Set product roadmaps and strategy.

  • Build key data sets to empower exploratory analysis that helps set product roadmaps.
  • Run exploratory analyses (analyze and interpret trends or patterns to develop a thorough understanding of products, users, and acquisition channels) to uncover issues and new areas of opportunity, generate hypotheses, and prioritize product changes and improvements.
  • Influence product teams by presenting data-based recommendations.

Early Product Analyst

The primary focus of early product analysts is on identifying whether there is product market fit and, if so, what characterizes the users who love the product. The key to leveraging the expertise of early product analysts is to build the right infrastructure so that they can answer these questions in a scalable way.

Below are the roles and responsibilities of an early product analyst. Many overlap with those of product generalists. The key difference is that their focus is on defining and tracking the right metrics and ensuring data integrity rather than on setting goals and experimentation, which come at later stages of product development.

  • Monitor product health.
  • Define product success by setting the right KPIs for the product/business.
  • Identify whether there is early product market fit through exploratory data analysis.
  • Help drive the early product roadmap by building a persona of the ideal product user for the product team. Deeply understand their characteristics through behavioral analysis. Generally, much of the analysis at this stage is bottom-up rather than top-down since there is far less data to perform top-down user segmentation.

An early product analyst should have enough technical proficiency to understand the basics of data pipelines, storage, and software engineering. Some also strive to automate their analyses and data pipelines, creating enduring value from their work. The impact of even the most technically proficient early product analyst is blunted, however, without certain non-technical skills, including the ability to ask the right questions in the context of the product, and the ability to tie analytical results to actions — delivering not just interesting but useful insights. This is where the early product analyst’s skills converge with those of the product generalist.

Growth Analyst

The primary focus of growth analysts is to move metrics. These metrics may concern users, developers, payers, advertisers, content creators, or anything else valuable to the business. Ultimately, this is done by deeply understanding the phenomena behind a metric, uncovering issues and opportunities in the problem space, identifying the key drivers of those issues, and recommending improvements. Specifically, a growth analyst needs to:

  • Define product success and monitor the health of a product, including identifying and tracking the right metrics as well as building growth accounting funnels to understand conversion and opportunities.
  • Set product goals and roadmap, and optimize the product in line with both.
  • Build a data-informed culture of growth within the company.
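
The post does not define growth accounting, so the following is a sketch under a commonly used definition: active users in the current period are split into new, retained, and resurrected users, and users who were active in the previous period but not in the current one count as churned. The function name and user IDs are hypothetical.

```python
def growth_accounting(seen_before, active_prev, active_curr):
    """Classify users for one period.

    seen_before: set of all users active in any earlier period
                 (assumed to include everyone in active_prev).
    active_prev: set of users active in the previous period.
    active_curr: set of users active in the current period.
    """
    new = active_curr - seen_before
    retained = active_curr & active_prev
    resurrected = (active_curr & seen_before) - active_prev
    churned = active_prev - active_curr
    return {
        "new": len(new),
        "retained": len(retained),
        "resurrected": len(resurrected),
        "churned": len(churned),
    }


# Hypothetical user IDs.
seen_before = {"a", "b", "c", "d"}
active_prev = {"a", "b", "c"}
active_curr = {"a", "b", "d", "e"}

counts = growth_accounting(seen_before, active_prev, active_curr)
print(counts)
```

Under these definitions, the current active count always equals new plus retained plus resurrected, which makes it easy to see whether a change in the metric comes from acquisition, retention, or reactivation.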

A growth analyst must be highly quantitative and keep a finger on the pulse of the business by deeply understanding what drives changes in it. They also need a strong growth marketing mindset, the ability to run exploratory analysis and identify roadmaps, and an iterative approach that allows them to continuously make small improvements that compound over time. A strong growth analyst should also have knowledge of statistics and experimental design, since good growth teams have a strong test-and-learn philosophy.

Machine-Learning Analyst

The primary role of a machine-learning analyst is to identify opportunities to improve products through machine learning. Their job is not to build models, but to monitor the models’ health and recommend improvements. They do this by identifying root causes and suggesting areas for improvement, including improving data quality, adding new features, improving algorithms, and determining the right objective functions and tradeoffs. Specifically, a machine-learning analyst needs to:

  • Define success by proposing the right objective function. With the wrong objective function, the product cannot truly succeed.
  • Monitor the health of a model by identifying and tracking the right metrics, and building frameworks to conduct a root cause analysis. A mutually exclusive, collectively exhaustive (MECE) framework for conducting gap analysis on model performance (measuring the difference between reality and model expectations) is valuable. The drivers of changes in model performance can then be determined and connected to their root causes, which generally fall under data quality, operational efficiency, algorithms, and feature engineering.
  • Set goals and the product roadmap, often by identifying opportunities from the root cause analysis framework.
  • Improve decisions by building explainability and determining the right tradeoffs. Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is valuable for transparency, for improving the predictions, and for making business decisions. Many of those business decisions are tradeoffs, typically between exploring and exploiting or between precision and recall. Machine-learning analysts must take a principled approach and build explainability to determine which tradeoffs are necessary to scale a product/company.
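
The tradeoff between precision and recall can be illustrated by moving the decision threshold of a binary classifier. This is a minimal sketch with hypothetical scores and labels; the function name is illustrative, and returning None when a quantity is undefined is a choice made here for clarity.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    true_pos = false_pos = false_neg = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            true_pos += 1
        elif predicted_positive and label == 0:
            false_pos += 1
        elif not predicted_positive and label == 1:
            false_neg += 1
    # Either quantity is undefined when its denominator is zero.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else None
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else None
    return precision, recall


# Hypothetical model scores and ground-truth labels.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]

strict = precision_recall(scores, labels, threshold=0.7)
lenient = precision_recall(scores, labels, threshold=0.3)
print("strict:", strict)
print("lenient:", lenient)
```

With these numbers the stricter threshold yields perfect precision but misses one positive, while the more lenient threshold catches every positive at the cost of one false alarm. Which side to favor is a business decision, not a purely statistical one.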

In addition to all of the skills required for a product generalist, a machine-learning analyst needs to be good at statistics, machine learning, coming up with frameworks, root cause analysis, and optimization.

Marketplace Analyst

The primary role of marketplace analysts is to maximize the value of the marketplace by improving its efficiency. Many consumer technology companies can be thought of as two-sided marketplaces. These create value primarily by enabling direct interactions between two (or more) distinct types of affiliated customers. Many products, including PayPal, eBay, Uber, and YouTube, are two-sided marketplaces.

The marketplace problem can be simplified to three parts: supply, demand, and liquidity. Marketplace analysts focus on identifying opportunities by understanding the drivers of each. A marketplace team requires three types of analysts:

  • Growth analyst: Supply and demand can each be posed as a growth problem. For example, there is a metric on the supply side (say, number of drivers) and another on the demand side (number of riders) that a growth analyst would try to move.
  • Machine-learning analyst: The marketplace team would also need to optimize routing using matching algorithms, so it would also include machine-learning analysts.
  • Core marketplace analyst: The marketplace team would also need a core marketplace analyst, whose role is to understand the interactions between the supply and demand sides of the marketplace and to improve the liquidity and efficiency of the marketplace overall.

Core marketplace analysts must be adept at the following:

  • Monitoring product health and defining product success by identifying and tracking the right metric that connects supply and demand (for example, the sell-through metric) and setting the right goals.
  • Setting product roadmaps by understanding the utilization of supply and demand and determining areas where one or both are constrained or under-optimized.

On top of all of the skills that product generalists should have, a core marketplace analyst needs a deep understanding of economics (especially supply and demand), optimization, network effects, and marketplace dynamics.

Ecosystem Analyst

Ecosystem analysts help drive business and product strategy by analyzing market trends and educating product leaders on their product’s market landscapes. These market trends can be internal (e.g., how users of an internal product are embracing mobile more) or external (e.g., the effects of a competitor). Specifically, an ecosystem analyst:

  • Sets product roadmaps by providing key insights on market trends, customer behaviors, and competitive moves.
  • Drives product strategy by: 1) Building business cases for specific product initiatives based on a deep understanding of how different parts of the ecosystem interact. (For example, knowing that content production ultimately drives active usage could lead to a recommendation to focus on content production.) 2) Performing competitive monitoring and analysis, and identifying key opportunities through market research, competitive landscape analysis, and relevant benchmarks.
  • Drives business strategy by building the case for new business initiatives, with a focus on market, opportunity sizing, and product synergy. (For example, knowing that mobile usage in teens is increasing exponentially could lead to recommending new mobile products for teens.)
  • Identifies potential M&A targets by constantly monitoring competition and areas of strategic interest.

Good ecosystem analysts have a deep understanding of the domain, an ability to communicate their insights effectively to multiple cross-functional partners, and the skills to perform market and competitive research.

TAKEAWAYS

  • The role of a data scientist is to leverage insights from data analysis to help drive product decisions.
  • There are six types of data scientists: product generalists, early product analysts, growth analysts, core marketplace analysts, ecosystem analysts, and machine-learning analysts.
  • These data scientists have five core skills: problem formulation, technical ability, analytical ability, synthesis, and influence.

This work is a product of Sequoia Capital’s Data Science team. Chandra Narayanan, Hem Wadhar and Ahry Jeon wrote this post. See the full data science series here. Please email data-science@sequoiacap.com with questions, comments and other feedback.

This story is published in The Startup, Medium’s largest entrepreneurship publication followed by +443,678 people.

