Data science team sizing and allocation strategy
There are many ways to organize a data science team within a company. One of the most effective is the hybrid model, as I explain at length in a post and briefly in a thread.
Central management of a team is not without challenges, the most prominent being that of sizing and allocation. In this post, I propose an easy-to-follow procedure as a solution. For this solution to be effective, here are some conditions that we assume hold for every organization.
Assumption 1. Long-term ownership is valuable.
Data scientists are engineers, and just like engineers, they can produce high-quality, impactful results only when they have long-term ownership over the product and their work. Good products can only be created with care.
Assumption 2. The engineering teams and their size determine the company strategy and objectives.
The engineering teams are created in such a way as to be able to tackle the various strategic bets and existing value areas of the company. If this is not the case, the existing engineering teams are the unannounced strategic bets and value areas of the company.
Assumption 3. More engineers on a team lead to more moving parts, more experiments, and more meetings.
The bigger an engineering team, the more time must be allocated to meetings and emails for coordination and planning. In addition, more engineers result in more projects and more experiments.
Assumption 4. Teams with more mature machine learning capabilities have better instrumentation, better quality data, and more data sources that are useful to the data science team.
Machine learning models are highly sensitive to data shortcomings. And so teams with mature ML models tend to have higher quality data and aggregate data sets as compared to other teams. If this is not the case, stop what you’re doing and fix that tire fire.
Assumption 5. Teams with a user-facing aspect run additional client-facing experiments.
These experiments sometimes double the number of experiments on an engineering team.
Assumption 6. Having just one data scientist significantly improves the quality of data (leading to less buggy data products) and the speed of decision making (leading to faster product iterations).
Based on the assumptions above, my proposal is to assign a point to every engineer on every team (client + backend). The sum of a team's points represents the relative amount of data science work that team needs.
Sum the points for each team and sort the teams in descending order of their totals. Break ties by each team’s data practices maturity: the higher the maturity, the lower on the list.
Assign one data scientist to every engineering team on the list, starting from the top. Note that this is not per project but rather per engineering team (PM + EM + designer + user researcher + group of engineers). This will help get all teams from 0 to 1.
At this point, if you still have data scientists left to assign, take 5 points off every team (we take one data scientist per 5 engineers as the ideal ratio to start with). Then assign another data scientist, starting at the top, and repeat.
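As a rough illustration, the procedure above could be sketched in Python as follows. This is only one reading of the steps, not a definitive implementation. The `Team` class, the `allocate` function, the numeric `maturity` score, and the example teams and headcounts are all hypothetical. The sketch also assumes that, after the first round, a team with no points left is skipped, which the steps above do not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Team:
    name: str
    engineers: int  # client + backend engineers, one point each
    maturity: int   # hypothetical score; higher = more mature data practices


def allocate(teams, available, ratio=5):
    """Return {team name: number of data scientists assigned}."""
    # Most points first; on a tie, the less mature team ranks higher.
    ranked = sorted(teams, key=lambda t: (-t.engineers, t.maturity))
    points = {t.name: t.engineers for t in ranked}
    assigned = {t.name: 0 for t in ranked}
    first_round = True
    while available > 0:
        # Round one covers every team (0 to 1). Later rounds only cover
        # teams that still have points left (an assumption of this sketch).
        eligible = [t for t in ranked if first_round or points[t.name] > 0]
        if not eligible:
            break
        for team in eligible:
            if available == 0:
                break
            assigned[team.name] += 1
            available -= 1
        # Take `ratio` points off every team before the next round.
        for name in points:
            points[name] -= ratio
        first_round = False
    return assigned


# Made-up example: four teams, six data scientists to place.
teams = [
    Team("search", engineers=12, maturity=3),
    Team("payments", engineers=6, maturity=1),
    Team("growth", engineers=6, maturity=2),
    Team("infra", engineers=3, maturity=2),
]
result = allocate(teams, available=6)
print(result)  # {'search': 2, 'payments': 2, 'growth': 1, 'infra': 1}
```

In this made-up example, every team gets one data scientist in the first round. The remaining two go to the top of the list in the second round, with the payments team ranked ahead of the growth team because its data practices are less mature.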
The approach presented above gives a clear, fair, safe, and effective strategy for allocating data scientists to product teams.
Please let me know your thoughts in the comments or the Tweets.
Update. Here is some feedback from the Tweets: