#2 — Skills diversity: Building the right machine learning team
This is story #2 of the series Flight checks for any (big) machine learning project.
Great! You have clear KPIs and a more robust understanding of the problem.
As in any successful project, it is important to ensure that the team that carries it out has the necessary skillset.
In machine learning projects, the required profiles are new enough that they can be confusing. Even IT professionals may not fully understand the differences between a data scientist, a business analyst, a machine learning engineer, a data engineer, a backend engineer, and so on.
Regardless of labels (which are mostly useful for simplifying job searches), it’s simpler to think about what tasks a relatively large machine learning project usually involves, and whether or not the team has the skills to carry them out.
Let’s also think about roles and not people. Our project may be carried out by a single person and that is fine, but that person should cover all the necessary roles.
We are always going to need the roles; if there are fewer people than roles, team members will likely switch hats throughout the development life cycle.
Don’t get me wrong: even if there are more people than roles, it is perfectly valid for each person to hold more than one role.
Let’s start with the roles (those keywords that appear on LinkedIn) that are most typical of machine learning projects, to understand them a little better. These come in addition to the already known backend and frontend profiles, which will be important especially in the integration stages:
The Business Analyst (BA):
I will address the business needs to make sure that we are solving the right business problem.
The Data Scientist (DS):
I will translate the business needs into a mathematical and machine learning model.
The Machine Learning Engineer (MLE):
I will implement the model development workflow and monitoring, trying to automate the steps in order to reduce the operational overhead.
The Data Engineer (DE):
I will be responsible for building the data pipelines that make the data available, joinable and usable both offline and online.
A successful machine learning project team usually needs to have at least all these skills to a greater or lesser extent.
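The idea of checking roles rather than people can be sketched in a few lines of Python. This is only an illustration: the `Member` class, the `missing_roles` helper and the example names are hypothetical, and the role abbreviations are the ones introduced above. It collects every hat worn by anyone on the team and reports the roles nobody covers.

```python
from dataclasses import dataclass, field

# Roles taken from the list above; the abbreviations are the article's.
REQUIRED_ROLES = {"BA", "DS", "MLE", "DE"}


@dataclass
class Member:
    """A hypothetical team member who may wear several hats."""
    name: str
    roles: set[str] = field(default_factory=set)


def missing_roles(team: list[Member], required: set[str] = REQUIRED_ROLES) -> set[str]:
    """Return the required roles that nobody on the team covers."""
    covered: set[str] = set()
    for member in team:
        covered |= member.roles
    return required - covered


# Example: two people sharing three of the four roles (names are made up).
team = [
    Member("Ana", {"BA", "DS"}),
    Member("Luis", {"MLE"}),
]
print(sorted(missing_roles(team)))  # ['DE']
```

A single person listed with all four roles would pass this check too, which matches the point that the roles matter more than the headcount. The check says nothing about how deep each skill is; that still requires judgment.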
There are different configurations to achieve this and there is no single recipe for success; it depends mostly on the size and timing of each project.
As a side note, the scenarios below assume that a management layer already exists and can fill the role of Project Manager. That role leads communication with sponsors and keeps them updated, and guides collaborators in their career plans, objectives and feedback. It also sets up the work dynamics that help prioritize and assign tasks correctly, detect dependencies and bottlenecks, and provide the other support the project needs.
Some working scenarios I’ve seen:
Small project or Proof of Concept (POC):
- 1 DS or Backend Engineer or MLE or BA with Machine Learning courses (the data unicorn).
- They do whatever it takes to build a proof of concept and bring it into production, even if it is “wired” together (that is, jury-rigged).
- If the project is successful and grows, stronger engineering profiles are added.
- At that point, it may be that the organizationally “natural” thing to do is for the DS to take on leadership of the project. Yet, that may not always be aligned with their career objectives. Maybe they are interested in moving across projects setting up POCs and training those who will later be in charge, but not directly leading the project.
Small or medium project:
- 1 BA + 1 DS + 1 MLE or Backend Engineer
- The DS defines the problem, models, generates tests and shares metrics. They work jointly with a BA who has a better understanding of the problem.
- The MLE or Backend Engineer plugs in everything needed to run in production.
Medium or large project:
- 1 BA + 1 DS + 1 MLE + 1 DE
- The DS defines the problem, models, generates tests and shares metrics. They work jointly with a BA who has a better understanding of the problem.
- The DE focuses on achieving sustainable governance of the data needed to solve the problem and advises the DS on tradeoffs.
- The MLE helps the DS automate the retraining of the model and ensure its real-time monitoring, and also plugs in everything needed to run in production.
Large project:
- 1 BA + 1 DS + 1 MLE + 2 DE
- As the project grows, the bottleneck is usually more related to the data pipeline than to modeling. Adding people to that part of the problem is usually a good decision.
Some configurations I have seen fail:
Small or medium project:
- 1 DS + 1 Backend Engineer (without any BA to support and help).
- In general, and especially in the early stages, having a BA fully assigned to the initiative is necessary to ensure that you are solving the correct problem.
Medium or large project:
- 2 BA + 4 DS + 1 MLE + 1 DE
- Adding more DSs quickly creates a bottleneck in the ability to productize their new tests and ideas. The DSs then either resign themselves to not seeing quick impact, or step out of their desired role to build a backend.
Large project:
- 1 BA + 2 DS + 2 MLE
- Not including experts who work on the data pipeline will generate sustainability problems in the medium term.
Let’s illustrate this point. I have been involved in credit card fraud prevention projects where rapid impact was a high priority because of the project’s changing dynamics and its importance for business metrics. As a result, many data scientists worked on different models, making headway on many fronts in parallel. This exposed shortcomings in the data pipeline, which became a bottleneck for all projects. The trend was reversed by adding more DE and MLE effort, which ultimately empowered the data scientists and accelerated their impact on the business.