Introducing CV2Vec: A Neural Model For Candidate Similarity

Rob May
Talla
May 12, 2016 · 3 min read

Talla builds intelligent assistants that take on some of your more monotonous work tasks. Our first assistant is for HR and handles simple things like scheduling interviews, answering basic HR questions, and helping manage the onboarding process. But one thing we heard from our beta testers time and time again was that they wanted help with something Talla didn’t yet do: sourcing candidates.

As we thought about this, we wanted to find a novel approach. We use word vectors at Talla and, while discussing this great post by Chris Moody, we wondered if you could do the same thing for resumes. In other words, could you map resumes into a vector space and use the vectors to determine candidate similarity in a more holistic way, rather than relying on a specific keyword search?

We had no idea if it would work, but as it turns out, it works surprisingly well. The key idea is to take natural language descriptions of job duties, positions, and other information a candidate provides and map them into a dense vector representation. By doing this, we are able to perform candidate searches on more than just keywords. We can find the candidates most similar to a reference person or to the job ad itself, cluster people together and visualize how CVs align with each other, and even predict what someone’s next job will be using an LSTM. We aren’t aware of anyone else currently trying this kind of approach.
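
To make the similarity search concrete, here is a minimal sketch in Python. It is not the CV2Vec model itself: the toy word vectors, the averaging step, and the names `embed`, `cosine`, and `most_similar` are all assumptions made for illustration. It averages per-word vectors into one dense vector per resume and ranks candidates by cosine similarity to a reference description.

```python
import math

# Toy 3-dimensional word vectors, made up for illustration only.
# A real system would use vectors learned from a large corpus.
WORD_VECTORS = {
    "python": [0.9, 0.1, 0.0],
    "java": [0.8, 0.2, 0.0],
    "engineer": [0.7, 0.3, 0.1],
    "recruiting": [0.0, 0.9, 0.2],
    "onboarding": [0.1, 0.8, 0.3],
    "sales": [0.0, 0.2, 0.9],
}


def embed(text):
    """Average the vectors of known words; unknown words are skipped."""
    vecs = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(reference, candidates, top_n=2):
    """Rank candidate texts by similarity to a reference text."""
    ref = embed(reference)
    scored = [(name, cosine(ref, embed(text))) for name, text in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical candidates, described by a few words each.
candidates = {
    "alice": "Python engineer",
    "bob": "recruiting and onboarding",
    "carol": "sales",
}

ranking = most_similar("Java engineer", candidates, top_n=3)
print(ranking)
```

Note that "Java engineer" ranks the Python engineer first even though the two descriptions share only one keyword, which is the kind of match a plain keyword search would miss.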

Here is an interesting visualization of the space of CVs:

[Figure: visualization of the CV vector space]

The other thing we can do with it is search by analogy. We can take one profile, subtract another profile, add a different profile, and look at which candidates sit closest to the result. It’s very useful if you interview a candidate who has a lot of great experience but isn’t quite the right match. If you’ve ever had that discussion of “we want to hire someone with her background, but just different in this one little way,” well, now you can actually run that search.
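
The profile arithmetic can be sketched in a few lines of Python. Everything here is hypothetical: the 3-dimensional profile vectors, their hand-picked values, and the `analogy` helper are assumptions for illustration, not the vectors or code Talla uses. The sketch computes `start - minus + plus` and returns the remaining profile closest to that point by cosine similarity.

```python
import math

# Hypothetical profile vectors; the axes loosely mean
# (engineering, management, sales). Real CV vectors would be learned.
PROFILES = {
    "senior_engineer": [0.9, 0.1, 0.0],
    "engineering_manager": [0.7, 0.7, 0.0],
    "sales_rep": [0.0, 0.1, 0.9],
    "sales_manager": [0.0, 0.7, 0.8],
    "designer": [0.3, 0.0, 0.2],
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def analogy(start, minus, plus, profiles):
    """Return the (name, score) of the profile nearest to start - minus + plus."""
    target = [
        s - m + p
        for s, m, p in zip(profiles[start], profiles[minus], profiles[plus])
    ]
    # The three input profiles are excluded so the search returns someone new.
    exclude = {start, minus, plus}
    scored = [
        (name, cosine(target, vec))
        for name, vec in profiles.items()
        if name not in exclude
    ]
    return max(scored, key=lambda pair: pair[1])


# "Someone like the engineering manager, but in sales instead of engineering."
best = analogy("engineering_manager", "senior_engineer", "sales_rep", PROFILES)
print(best)
```

With these toy vectors the search lands on the sales manager, since subtracting the engineer profile removes the engineering component while keeping the management one.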

There are some problems with this approach. The main ones are that there is no good objective function to optimize against, and that tuning the hyperparameters is hard.

We have filed a provisional patent on CV2Vec (I know, I know, cue the boos from the Hacker News crowd), but once we better understand our various licensing options, we plan to open source most of it so that it becomes a community asset. We filed the patent because, if all we did was open source the model, the existing players with great data sets would just use it, and that would reinforce their position of market strength. And really, user data like this should be owned by the users who submit it. So we are trying to find a licensing route that only allows you to use CV2Vec for free if you contribute some of your data back to the project, and that protects the data rights of individuals who may submit their own data. If you are interested in the announcements around that, you can fill out this form and we will be in touch once you can download the model.

If you would like to try it out for candidate recommendations, just send us an email. Let us know if you have any questions, and of course, if you find this stuff interesting, Talla is hiring in Boston and San Francisco.

Rob May is CTO/Founder at Dianthus, author of a Machine Intelligence newsletter at inside.com/ai, and former CEO at Talla and Backupify.