Lymia — From ideation to an AI-powered recruitment product

Alexandre Tadros
Jalgos — A.I. Builders
7 min read · Mar 1, 2020


Lymia, a tool that leverages data science to respond to one of the HR industry’s main challenges, is an interesting use case that shows how communication between experts in different areas allows isolated AI experimentation to become a deployed product. Let us demystify the way AI-powered products are built and appreciate how a good product means much more than an algorithm that gives amazing results.

There is a justified intuition that HR processes, as in many other industries, can be optimized using data science. A lot of businesses try to hack the HR industry’s challenges, but the user is rarely satisfied. The data is available, algorithms have been created, and there is a business need. So what’s missing?

Don’t invent the need for your algorithm — Understand the need from the needy

One of the most common data science problems in HR is matching job offers with corresponding resumes. Downloading a dataset of job offers and matching resumes, training an algorithm for the best matching performance on that dataset, and then building a product around it is possible, but it is a backwards way of doing things.

The first thing we did was partner up with a big shot in recruitment who knows the industry inside and out: its processes, its reasons, its limitations. He had tested numerous tools on the market but remained unsatisfied. Reducing a candidate to a matching percentage, without an explanation, is not an acceptable way to recruit.

He knows the problem because he experiences the problem: identifying, acquiring, categorizing, and selecting the right candidates is a long and manually intensive process that requires a lot of human intuition accumulated throughout a career.

So how do we hack this sector and target an exact need, whether that need has been articulated or not? Not even he knows exactly what is needed. It was the dialogue between experts, designers, and data scientists that helped us fashion a useful solution for recruiters.

By exchanging ideas with an industry expert, the data scientist identified the data needed and the way to use it. Showing figures, graphs, and insights from the data that the expert did not know about opened his eyes to the potential of leveraging data to solve this crucial issue within the HR industry.

By asking for more data insights and in-depth studies, and by testing hypotheses about the value data could bring to his job, he then started connecting the dots.

“Wow, you can do that? So can you do this then? Just use this over that to compute this, it might be better”

Meanwhile, the designer captured the HR expert’s and potential users’ feelings and, as the need became more defined, was able to create an interface with processes and features that enhance the recruitment workflow.

And voilà, the solution to our problem: a visual way to represent and easily search through a pool of candidate profiles, enabling the recruiter to identify a relevant shortlist in seconds. The current way of talent picking, using a keyword search and/or knowing candidates and careers by heart, was too laborious and inefficient. When looking for candidates for a given job, a recruiter often has a real person in mind whom he thinks would be ideal, based on his understanding of the candidate’s specific experiences. A recruiter has an expert understanding of the context surrounding a position title that helps him make judgement calls.

Our goal was to bring structure to a raw list of profiles by creating a grouping mechanism that associates similar candidate profiles. We thus needed a similarity measure between profiles, which would become the cornerstone of the “AI” used in the product-to-be.

Putting together the algorithm

The most exciting part of data science begins. How do I go from textually described sequences of career experiences to a similarity measure between them (one profile being represented by one sequence of several experiences)? There are many factors that a recruiter takes into consideration to assess the similarity between candidate profiles: the similarity between job positions, the nature of companies, career trajectories, years of experience, and more. These parameters had to be included in our model in different ways that might or might not satisfy our recruiting expert, and he helped us find the right way to grasp how they should be considered in the model. Just as designers conceive user experiences and interfaces by exchanging with users, data scientists design their models with domain experts. When working on any data science project, the greatest outcomes come from the dialogue between different worlds. Any data scientist sees the value of communicating with an expert on the data they manipulate.
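To make the idea concrete, here is a minimal, purely illustrative sketch of combining several recruiter-relevant factors into one score. None of these names, weights, or heuristics are Lymia’s actual model; they are assumptions chosen to show the shape of such a measure:

```python
# Illustrative profile similarity: each profile is a list of experiences,
# each experience a dict with a job title and a duration in years.
# Weights and heuristics below are hypothetical, not Lymia's model.

def experience_similarity(exp_a: dict, exp_b: dict) -> float:
    """Compare two single experiences on title word overlap and duration."""
    title_a = set(exp_a["title"].lower().split())
    title_b = set(exp_b["title"].lower().split())
    title_sim = len(title_a & title_b) / len(title_a | title_b)  # Jaccard index
    duration_sim = 1.0 / (1.0 + abs(exp_a["years"] - exp_b["years"]))
    return 0.7 * title_sim + 0.3 * duration_sim  # arbitrary example weights

def profile_similarity(profile_a: list, profile_b: list) -> float:
    """For each experience in the shorter profile, take its best match
    in the longer one, then average the scores."""
    short, long_ = sorted((profile_a, profile_b), key=len)
    scores = [max(experience_similarity(e, f) for f in long_) for e in short]
    return sum(scores) / len(scores)
```

Each factor the expert named (positions, trajectories, years of experience) becomes one term that can be reweighted or replaced after his feedback, which is exactly where the dialogue happens.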

Once a model was in place, it drew a lot of feedback that the data scientist needed to take into account. The goal is for the product to be useful and to always use the best model, not to hold on to the algorithm we initially crafted. “Don’t fall in love with your algorithm,” they say.

This means that the previous modeling choices were perhaps not the best and that some areas, or the entire model, needed to change. Our way of easily accommodating proposed changes was to build the model from small data processing and machine learning components, organizing the code into small detachable bricks that can be replaced or modified almost independently. The modularity of the model makes it easier to master and refine.
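The “detachable bricks” idea can be sketched as steps that all share one call signature, so any brick can be swapped without touching the rest of the pipeline. All names here are hypothetical:

```python
# Each brick takes a profile dict and returns a (possibly enriched) profile
# dict, so bricks compose freely and can be replaced independently.
from typing import Callable, List

Step = Callable[[dict], dict]

def build_pipeline(steps: List[Step]) -> Step:
    """Compose small processing bricks into one model pipeline."""
    def run(profile: dict) -> dict:
        for step in steps:
            profile = step(profile)
        return profile
    return run

def normalize_titles(profile: dict) -> dict:
    profile["titles"] = [t.strip().lower() for t in profile["titles"]]
    return profile

def add_total_experience(profile: dict) -> dict:
    profile["total_years"] = sum(profile["years_per_job"])
    return profile

# Swapping a brick means changing one entry in this list.
pipeline = build_pipeline([normalize_titles, add_total_experience])
```

When the expert’s feedback invalidates one modeling choice, only the corresponding brick is rewritten; the rest of the pipeline stays untouched.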

Make the model usable

The algorithm is not the product. The user does not know, and does not need to know, the details of the implementation, the methods used, or the code design. Things that matter during research and ideation are no longer relevant. Implementation details started to multiply and became crucial for production. The algorithm consists of lines of code that need to be made available to the backend (through an API, for instance) and to work with the other components of the product. The AI part of the product then needed to be resilient to edge cases and to run on data other than the sample it had already seen.
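A hedged sketch of that API boundary, with assumed field names and a placeholder model: the backend sends JSON and gets JSON back, and malformed input returns an error instead of crashing the algorithm:

```python
# JSON in, JSON out: the shape a backend-facing algorithm endpoint might
# take. Field names and the placeholder model are illustrative assumptions.
import json

def dummy_similarity(a: list, b: list) -> float:
    """Stand-in for the real similarity model."""
    return 1.0 if a == b else 0.5

def handle_request(body: str) -> str:
    """Resilient to the edge cases production traffic actually sends."""
    try:
        payload = json.loads(body)
        a, b = payload["profile_a"], payload["profile_b"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return json.dumps({"error": "expected profile_a and profile_b"})
    if not a or not b:  # empty profiles: an edge case the notebook never saw
        return json.dumps({"error": "profiles must be non-empty"})
    return json.dumps({"similarity": dummy_similarity(a, b)})
```

Validating input at the boundary is what separates a notebook experiment, which only ever sees its own sample, from a component other services can rely on.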

When building the other product components, such as databases, data sources, and the backend, the data pipelines became structured, and the algorithm therefore needed to connect with data sources the right way. You can no longer just load data from local CSV files, change the data locally in your session to make something work in your algorithm, deliver graphs or reports, and forget about the code afterwards. The data manipulation pipeline needs to work every time the system restarts, not only once to generate graphs or performance figures before the session is killed. These requirements made the raison d’être for automated tests, dockerization, and continuous integration obvious.
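One illustrative way to make the loading step source-agnostic (function and column names are assumptions, not Lymia’s code): the loader takes any file-like object, so the same code runs against a local file, a database export, or an in-memory fixture inside an automated test:

```python
# The loader is a function of its source rather than a hard-coded local
# CSV path, so it restarts cleanly and is trivially testable.
import csv
import io

def load_profiles(source) -> list:
    """Read profiles from any file-like source (file, DB export, stream)."""
    reader = csv.DictReader(source)
    return [{"name": row["name"], "title": row["title"]} for row in reader]

# The same loader verified with an in-memory source, as an automated test
# would do on every build rather than once by hand.
sample = io.StringIO("name,title\nAda,Data Scientist\n")
profiles = load_profiles(sample)
```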

There was then another kind of problem we had to deal with. Not the kind you’d expect when hearing about data science, machine learning, and AI, but the kind that naturally arises when building a product, i.e. code that will be usable by users other than yourself. Which Docker image should we use? Which libraries does the project need, and which versions of them? How can we interact with the production database? Shouldn’t those variables be made into environment variables? Which parameters actually have to be made settable by the backend?
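A minimal sketch of answering the configuration questions above, with hypothetical variable names: settings that differ between a laptop and production are read from environment variables, with safe local defaults:

```python
# Hypothetical configuration loader: anything that varies between
# environments comes from the environment, never from the code.
import os

def load_config() -> dict:
    return {
        "db_url": os.environ.get("DATABASE_URL", "postgresql://localhost/dev"),
        "model_version": os.environ.get("MODEL_VERSION", "latest"),
        "top_k": int(os.environ.get("SHORTLIST_SIZE", "10")),
    }
```

The same image can then run unchanged in development, CI, and production; only the environment differs.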

Making an algorithm production grade consists of solving these issues one by one in order to have a standard, functioning product on which to iterate for improvements. The help of a strong full-stack developer was necessary to provide guidance for these solutions. Prioritizing and encouraging the team to work together required a product manager who juggled backend, frontend, data science, and design issues that needed to be organized.

Team work makes the dream work

Building an industry-specific product that uses AI requires far more than applying an algorithm to a specific dataset and having it work. Identifying the problem to be solved, defining the way to solve it, and actually solving it demands a collaborative squad of data scientists, domain experts, designers, and developers who are willing to communicate with one another, led by a product manager. Working on a data science proof of concept on a specific dataset is like one individual testing the temperature of the lake water one day, while building a product is like regularly training to win the rowing event at the Olympics, in the quadruple sculls category.

lymia.eu

https://lymia.eu/
