How we are building our data science team at Clustree 2/3 — The composition

Cyril Le Mat
8 min read · Apr 16, 2019



This post is the second in a series of articles about the data science team at Clustree.

The skills framework we use when considering data scientists

This is a skills cloud for our own team of data scientists, using our technology.

“Data scientist” is in fact a pretty vague label. There is a large variety of data scientists on the market and a broad range of expertise can be found.

Below is our understanding of the different types of competencies which need to be considered and assessed when looking for data scientists.

Data modelling and mining skills

First, there are the data modelling and mining skills, the competencies that naturally come to mind when thinking of data scientists.

  • The most generic know-how is what we can call “traditional machine learning”. It is the ability to master standard algorithms (linear regression and classification, trees, support vector machines and clustering) to build quick, efficient solutions to traditional business problems.
  • Another area of expertise is “Deep Learning”, which is applied to Natural Language Processing (NLP), image, audio and reinforcement learning problems. In that field, some very specific expertise is required to apply machine learning to complex, usually huge, sets of data.
  • Next, we have “statistical expertise”, which is about applying advanced mathematics and hypothesis-driven thinking to any problem. Statisticians will do a great job when complex probabilities are involved.
  • Finally, there is the “research expertise”. It is possible to find former researchers in the market who have done modeling without using machine learning (in Physics for instance). Their focus on validation and exploitation of models can be extremely useful.

Software engineering skills

Secondly, a data scientist is first and foremost a software builder and thus needs “software engineering skills”.

  • “Traditional software engineering skills” — Depending on your organization, you may need either an advanced level in traditional software engineering (comparable to that of other software engineers in your company) or a DIY / experimental / good enough level for your data scientists to have autonomy and understanding of the functioning of an engineering team.
  • “Data-related / data-specific software engineering skills” — Data science does have some specific technical requirements: data engineering skills for parallel data processing and storage, and machine learning engineering skills for putting high-quality models into production.

Soft skills

Depending on the type of organization and their actual role, data scientists may need to interact with various internal teams and potentially with external counterparts (much more than developers do). Therefore, data scientists also need to develop a certain type of “soft skills”.

  • “Communication” — Building a powerful model is great. Being able to explain it, along with its philosophy, its pros and cons, etc. is also extremely valuable.
  • “Visualization” — Every business with lots of data needs its data scientists to be good at displaying that data through visualization techniques.
  • “Business sense” — The more data scientists understand the ins and outs of their industry, their customers and their company’s value proposition, the more relevant and efficient they will be in their scientific work. To be a good data scientist, you need to innovate, create and think, not only execute, so you need to understand the context to know what can make a difference and is worth your time.

The specific data science skills needed at Clustree

We need deep learning skills. Our core competency is our ability to understand and process career data, which is mainly textual, through NLP. In contrast, we don’t really need specific mathematician or statistician skills. Finally, our traditional machine learning needs are also quite limited.

As we are building AI software and have terabytes of data, we are quite demanding on software engineering skills. Our data scientists devote a significant part of their time to writing code (which is quite similar in complexity to what our developers write, with real challenges around stability and scalability) and need to work on technologies related to processing and storing data.

We are also demanding on the “soft skills” front, especially on “Communication” and “Visualization”. At Clustree, we sell HR software to large corporates (Carrefour, Sanofi, Solvay, etc.). We have customers (executives, HR professionals, employees, etc.), partners (consulting firms, software vendors, etc.), a product team, a sales team, a customer team, etc., and our data scientists need to be able to interact with ALL those counterparts, who have very different levels of understanding of their field.

All our data scientists are nice, warm people, and they all enjoy spending time with the customers who interact directly with the AI. These customers often need reassurance, and love visualizing their career data to get new insights every time we meet them (see below a data visualization of the main career paths at one of our customers).

Finally, as already explained, given the complexity of the data we process and the limited development of machine learning in our field, we need senior data scientists who are open-minded, self-confident and not afraid to experiment.

The composition of our data team is based on 3 guiding principles

Based on our context and our specific requirements, we built our data team with the following convictions:

  1. We want all the very rare data science skills that we need (at expert level) AS A TEAM, not in every individual.
  2. Our data scientists are generalists, not specialists.
  3. We need a team leader with business acumen, who is pragmatic and a great manager.

We cover our need for data science skills at the team level, not at the individual level

Of course, most recruiters say that the data science skills we need at Clustree are very difficult to find, at a sufficient level of expertise, in any single candidate. That alone is a very good reason to cover those skills as a team rather than expect to find them in every data scientist we hire.

But above all, thinking as a team lets us bring in diverse profiles (and not clones), which is critical to increasing creativity in our collaborative work culture.

We believe that one distinctive feature of the data science field is that brainstorming sessions have a high impact on productivity, because defining a problem and deciding on the best way to tackle it are crucial to your ability to get results (as opposed to implementing known solutions or spending time optimizing an initial solution).

In such a context, people are more important than methodology.

We believe that a data team is more efficient when it is made up of people with very different attitudes towards risk-taking, diverse training backgrounds, various cultures and alternative sets of modelling, software engineering and soft skills.

Confronting the points of view of a researcher and a product-minded person, pitting a very rigorous engineer against an instinctive one, orchestrating a debate between a big data architect and a senior software developer, mixing the inputs of a statistician and a deep learning expert, etc. creates a lot of value.

“Diversity & brainstorming increase the productivity of a data team whose goal is step-change impact (vs. incremental improvements)”

Therefore, we decided to build a complementary team and to maximize the differences between our data scientists.

We hire generalists, not specialists, to encourage learning and initiative within our team

We assembled a complementary, diverse team but we nevertheless want our data scientists to be well-rounded. We want generalists because we don’t specialize our data scientists by function internally at Clustree.

They can have different “majors” in their skillset, but they must be “full stack data scientists” and have minimum capabilities in all the dimensions of a data scientist’s work so they can 1) have a holistic view of a problem, 2) understand what their ideas imply in terms of execution and 3) have enough autonomy to work on any given project without depending on anyone.

Many companies organize their data science teams by function. Some people source the data, while others model it, implement it, measure it, etc. Teams of experts with data engineers, research scientists, machine learning engineers, etc. are thus created. This type of organization is purported to create productivity gains along the value chain as people specialize in their respective fields. Unfortunately, we believe it is not really suited to data science, because data science is not about execution, nor to Clustree, because we are looking for step-change impact and not incremental improvements.

This is the other distinctive feature of a data team’s work. It needs to build things and test them before implementing them, and then improve them repeatedly. Consequently, a data team should be assembled to learn through experimentation and iteration.

When you have full stack data scientists, you encourage learning and iteration through autonomy. Our data scientists have broad responsibilities and perform diverse functions: from conception, to modeling, to implementation, to measurement.

Our generalists may not be as expert as specialized data scientists in a given function. But again, we are not pursuing functional excellence or small, incremental optimization on a problem that has already been largely tackled. We want to learn and discover breakthrough solutions to business problems. With full context for the holistic solution, a full stack generalist can come up with ideas that a specialist working in a silo won’t even consider. They will have more ideas and try more things, be bolder, fail more. However, in data science at Clustree, the cost of failure is low, and the benefits of learning and coming up with new ideas are high.

Hiring a team leader with technical and business skills

Finding the right person to manage your data science team is obviously critical. In a company such as Clustree, hiring someone who lacks either strong business capabilities or technical skills can be dangerous.

Data is crucial to the business, so the Head of Data needs strong business acumen, as she/he will be involved in product and strategic decisions and will regularly present updates at board meetings. Moreover, she/he needs communication skills, as she/he is one of the company’s public figures and will regularly speak to customers.

On the technical front, the Head of Data needs to set the technical vision for the team, improve technical methodologies and keep the team’s R&D in line with the company’s business priorities. It is also critical to have someone who is passionate about his team’s work and, more importantly, down to earth and technically productive. Getting his hands dirty helps him implement his vision and keep up with the workload, and also earns him the team’s respect. However, the manager does not need to be the best at modelling and coding, and the more the team grows, the less he will use his technical skills.

Next post: the third and last article in this series about the data science team at Clustree.


Cyril Le Mat

Head of data and software @Sunrise, Ex-Head of Data Science @Cornerstone on Demand @Clustree @Hostnfly @Cheerzto