#4: Roles, Skills and Org Structure for Machine Learning Product Teams

Jul 25, 2017

This is part 4 of the 6-part tutorial, The Step-By-Step PM Guide to Building Machine Learning Based Products.

We discussed how to build a model from start to finish — now let’s talk about who actually does it and how the team structure can work.

One Big Happy Team

A “traditional” product team consists of designers, engineers and product managers, with a data analyst sometimes integrated into it but more often shared across multiple teams as a scarce resource. In a company where data science is becoming part of the DNA, it is essential to make data scientists an integral part of the product team, rather than treating them as a separate entity. Developing models that have business impact requires designers, PMs and engineers to work with data scientists on a continuous, daily basis, the same way designers, PMs and engineers work together on the creation of “traditional” products.

Roles and Responsibilities in Model Development

We previously discussed the ML development process. Here we’ll focus on the makeup of the team and the responsibilities of the different functions in the process.

Ideation phase: At this stage you need experts with a deep understanding of the problem space, who can articulate the different factors that may influence choices / outcomes. For example, if you’re building home value estimates, you need real estate experts who know how to value a home and what factors affect it. Even if your data scientists happen to have experience in the space, it’s always a good idea to pull in (usually non-technical) business experts from other parts of the organization to get new ideas and sanity check your thinking.
Data preparation: This is often led by data scientists, with help from engineers to collect data, build scrapers, integrate APIs etc. Product / business people need to be closely involved, for example by helping with external data acquisition and with data pulls through existing relationships.
Prototyping and testing: This is largely the work of the data scientists. Product / business people need to follow this phase closely, review the results and help determine whether they make business sense or whether further iterations are required.
Productization: This is a combination of data science and engineering work. Tasks to support data collection at scale vary greatly depending on your data needs and sources. If you use external data, you may need to collect it by building scrapers (which requires front end knowledge), calling various APIs, or ingesting data from various feeds and partners (the latter two are largely backend tasks). There is also a need to productize and scale data cleanup and processing, which is likewise largely a backend task. Engineers also work jointly with data scientists to make sure the models scale and to verify that the quality of results in production meets the requirements.
Overall system architecture: Making sure the overall system for your data and models supports your business needs requires engineers with experience in architecting and scaling complex distributed systems. The level of complexity obviously varies depending on what you’re trying to accomplish.

The Makeup of Data Science Teams

Data Science is a relatively new field — it cobbles together various existing fields in a new way. Until recently there hasn’t been a “data science” degree (even now it is far from ubiquitous), hence people gravitate to the space from a wide variety of related disciplines and backgrounds.

The key ones are statistics, computer science (with or without a focus on artificial intelligence), and economics or econometrics.

A combination of different backgrounds and skills could be very useful on a team — they each bring something different to the table. Especially if you’re in an emerging area where a lot of new thinking is required, combining diverse backgrounds often results in different approaches to the same problem that generate more innovative solutions.

Data scientists vary in how much they depend on the engineering team. Former engineers are often able to work end-to-end and get a model from prototype to deployment in production with no support, while others need more help from the engineering team. Depending on the availability and makeup of your engineering team, you may need data scientists who are more or less independent. Another consideration is your problem space: for example, a background in econometrics may be more critical for a stock-picking app than for a self-driving car.

A Data Science Reporting Structure That Makes Sense

There are close ties between engineering, product and data science. Traditionally, the tendency has been to have data science report to engineering. However, as the role of data science in organizations evolves, new structures are emerging. I’ve seen three main structures work well, each with its own benefits and drawbacks.

Option 1 — Data Science Reports to Engineering

Having data science report to engineering creates full alignment between the disciplines and doesn’t require a clear delineation between data science and engineering skills. Many engineers who work with data scientists become very curious about the discipline and look to develop their skills further. I’ve seen engineers become interested in areas that are more puzzle-like, such as natural language processing, and others take ML classes to one day become full-fledged data scientists. Fewer boundaries between the teams can help develop data scientists / engineers who can work end to end, both building models and productizing their code. It is also a bit easier to support career development and to delay questions of reporting, performance evaluation etc. until after people have tried things out and decided whether they want to make a transition.

This reporting structure also helps streamline the overall system, from the ML frameworks used by data science for prototyping, to the productized systems and architecture supported by the engineering team. It also helps ensure that the ML framework and architecture get the attention they deserve.

Option 2 — Data Science Reports to Product

Since product needs should be the drivers for data science projects, having data science report to product creates full alignment on goals and deliverables. Essentially, the head of product has reporting-level visibility into all data science projects and activities and can help prioritize them and ensure they drive business outcomes. This also helps foster a tight ongoing collaboration between product and data science, which is essential. The prerequisite is a product person who understands how data science and product should work together and is committed to developing not just the product but also the underlying data science infrastructure.

Option 3 — Data Science Separate from Product and Engineering

Keeping data science as a separate function has the benefit of giving visibility to the data science team and making it more accessible to the entire organization. It allows the head of the data science team to gain more direct insight into the high-level strategic decisions in the organization and ensure that the opinions and needs of all business stakeholders are taken into account.

There isn’t a “right answer”; it all depends on your organization, your goals and the strengths of your teams and team leads. As a rule of thumb, joint reporting usually results in better alignment between teams, given a single decision maker at the top. Think about which areas are more prone to communication and collaboration breakdowns in your organization and consider having those teams report to the same executive.

Now that you have a super effective team, it’s time to go back to talking about the most important thing: Your users and their experience with your product.
