How we are building our data science team at Clustree 1/3 — Our Context

Cyril Le Mat
5 min read · Apr 16, 2019


Executive Summary

Clustree is a venture-backed, Paris-based, Artificial Intelligence (AI) startup operating in the field of Human Resources (HR).

We leverage algorithms and deep learning to understand, enrich and maintain the complex, incomplete and unstructured career data (skills, skill levels, job titles, wishes, training courses, education, etc.) of a company’s talent pool (employees, candidates, alumni, freelancers, etc.) and to deliver, through a SaaS model, tailor-made recommendations for various use cases (internal mobility, talent sourcing, project staffing, strategic workforce planning, etc.).

Our B2B startup has a specific context: 1) AI is the actual product, which makes the data science team critical, and 2) we operate in a relatively new field of application for machine learning, which pushes us to deliver step-change impact in our AI effort (vs. incremental optimization). In that context, we built our team of data scientists on the following principles:

  • Hire a diverse, complementary team (in terms of key strengths, backgrounds, personalities, etc.) to increase brainstorming efficiency in problem-solving.
  • Hire “full stack data scientists” and avoid functional specialization to favor autonomy, trial and error and learning within the team (to make it more agile).
  • Pick a team leader who is business-oriented, down-to-earth and a good manager.
  • Build a small team, step by step, with a lot of pragmatism regarding your actual needs.
  • Assess the modelling & research capabilities of candidates through real-life case studies (vs. theoretical tests) during the recruitment process.

We wrote this post together with Guillaume Durao, Chief Operating Officer at Clustree, hoping it could benefit other AI startups.

Building a data science team is a goldsmith’s work

This is the data science team at Clustree (Mikael, Cyril, Thomas, Victor and Antoine).

This is the team that got the company featured in this article by Josh Bersin (the main influencer in the HR industry) as the only French startup that matters in the field of AI applied to HR (AI in HR: A Real Killer App, by Josh Bersin).

They are pretty much all French, white males in their thirties, and they all come from the most prestigious engineering schools. That is kind of a problem for a startup such as Clustree, which is supposed to foster diversity in HR through bias-free decision-making (we are working on this for our next hire!). Still, we are very proud and certainly grateful to have them with us.

Assembling such a data team has been a challenge and quite a long process (and we are obviously not finished yet, as we are looking for a sixth data scientist right now), given the war for talent taking place in Paris at the moment.

French startups attracted over €600 million in funding in January and February 2019 alone (!!), with massive funding rounds from companies such as Alan, ContentSquare, Malt, Virtuo, Wynd, etc., and most of that money will go towards paying the salaries of technical talent (and office space and furniture!).

“In France, each month, we estimate that there are some 400 new openings for data scientists”.

Moreover, compared to other European startup hubs, Paris is blessed with an above-average number of interesting, well-funded AI startups such as Cardiologs, Dataiku, Deepomatic, Doctrine, Shift Technologies, etc.

Here is the French AI ecosystem according to France is AI.

This forced us to think carefully about the size and seniority of our data team members, the way we wanted to assemble that team and to make it work together, while we also had to articulate a compelling pitch in order to attract candidates who are in very high demand.

Our context: a pure AI startup in a relatively new field for machine learning

Artificial intelligence is one of the most misused terms in technology today. As a matter of fact, a few weeks ago, a provocative report from British venture capital firm MMC Ventures claimed that 40% of European AI startups are… not using AI at all.

On the other hand, almost every startup now has a team of data scientists working on some algorithms. Most of the time, data scientists either 1) support other internal teams in decision-making (allocation of marketing budgets, product development decisions, pricing mechanisms, etc.) or 2) work on a specific challenge (logistics optimization, identification of seasonal trends, recommendation engines, etc.) to improve a product or a service.

For instance, at Netflix, data science is used to make suggestions on the movies you should be watching. But Netflix is not selling recommendations. It is selling content.

Things are a bit different at Clustree, because recommendations are our actual product.

“At Clustree, AI is not a way to optimize our product. AI is our product”.

The data pipelines and algorithms that we use to collect, enrich and maintain our customers’ valuable but chaotic data are at the core of everything we do. “Productization” is then mainly about gathering extra data from employees (on top of what can be detected automatically elsewhere) and displaying, explaining, sorting and searching that data and those recommendations in various contexts (internal mobility, external recruitment, staffing on projects, choice of training courses, etc.). This is what can be seen in the screenshot below, which shows one part of our product.

Another specificity of our context is the originality of our field of application for AI.

In the collective imagination, machine learning is traditionally associated with environments such as Finance, Transportation, Marketing, CRM, etc., which abound with clean, quantitative data. In those fields, data scientists mostly work on incremental improvements (better hotel yield, better transportation routing, better CRM segmentation, etc.).

With career data, there is virtually no available technology yet, except for keyword searches.

“Career data is a largely unexplored field of application for machine learning”

Therefore, assembling a great data team and making it efficient and creative is critical for Clustree, both because AI is our product and because we need to explore a new field to win our big bet of reaching a deep understanding of career data.

To date, Clustree has invested over 4,000 days in pure data R&D and has built a product relying on dozens of algorithms, including 5 deep learning networks.

We are in the business of helping HR professionals rely on data-driven, skills-based decision-making regarding their pool of talent, so we had to invest in having our own framework for data science skills.

Following posts:

This post is part of a series of articles about the data science team at Clustree:


Cyril Le Mat

Head of data and software @Sunrise , Ex-Head of Data Science @Cornerstone on Demand @Clustree @Hostnfly @Cheerzto