Word2Vec for Talent Acquisition

Sep 13, 2018

“Deep Learning”, “AI” and “Cognitive” are buzzwords that appear constantly in our daily feeds. Yet the real value of data science in AI lies in the impact we can make on the world. As a data scientist in the HR domain, I’d like to use this space to explain how these technologies are changing the world of talent acquisition.

Identifying skills of relevance to an organization

With so many emerging technologies and so much disruption in the market, organizations struggle to find the right talent quickly enough to meet business demands. New roles such as “Blockchain developer”, “UX expert” and “Machine Learning engineer” are cropping up, yet talent in these critical areas is scarce.

It is faster and more cost-effective for businesses to source talent in-house. So the question is: how can we find people with the potential to fill roles in these critical areas when few candidates have the exact knowledge the job requires? Here is a possible way forward:

Identify people within your organization who have the closest set of related skills.
Tailor personalized learning programmes that enable people to build the skills needed to perform in new roles.

In this post, I will focus on the first point: how to identify people with related skills. A follow-up article on recommendation engines for tailoring HR learning programmes could give more insight into how to deliver personalized recommendations to users once they have been segmented into groups.

Selecting candidates when skills do not exactly match employer needs

The concept of “transferable” skills crops up quite often. But can we quantify how “transferable” a skill is with a mathematical formula? How can we mine skills from a profile and measure how “transferable” those skill sets are?

Recruiters and hiring managers are often overwhelmed by the number of profiles they have to screen

Word Embedding to understand profile relatedness

The image below illustrates the output of Word2Vec. The Word2Vec algorithm has been widely applied in text mining to identify semantic relationships between words. Trained on a corpus of millions of documents (such as Wikipedia), it learns a vector for each word from the words that surround it, so that words which commonly appear in the same contexts end up close together. The most famous example is the analogy vector(“King”) − vector(“Man”) + vector(“Woman”) ≈ vector(“Queen”).

Word2Vec: Words that commonly appear together are closer together
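To make the analogy concrete, here is a minimal sketch of the vector arithmetic behind it. The 3-dimensional vectors are hypothetical, invented purely for illustration (real Word2Vec vectors are learned from a corpus and usually have hundreds of dimensions). The sketch computes King − Man + Woman and returns the vocabulary word with the highest cosine similarity to the result, excluding the three query words, which is a common convention for analogy queries.

```python
import math

# Hypothetical toy vectors, invented for illustration only (not trained).
# The dimensions loosely read as (royalty, maleness, femaleness).
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "java":  [0.0, 0.3, 0.3],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c, vocab):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vocab[w], target))

print(analogy("king", "man", "woman", vectors))  # queen
```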

Semantic relationships between words can be used in the same way to measure how close and relevant skills are to each other, and to rank people’s profiles against those skills. For example, “java”, “javascript”, “python” and “perl” are semantically related as programming languages. If a candidate cannot code in “python” but is a “javascript” expert, the effort of learning python would be lower than for a person with no prior knowledge of programming languages.
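One way to turn this intuition into a number, sketched here under stated assumptions: score a candidate against a required skill by the cosine similarity of their closest known skill. The skill vectors below are hypothetical 2-D values chosen for illustration; in practice they would come from a Word2Vec model trained on CVs or job postings, and the “closest skill” scoring rule is one illustrative choice, not the only one.

```python
import math

# Hypothetical 2-D skill embeddings, invented for illustration only.
skill_vectors = {
    "python":      (0.90, 0.80),
    "javascript":  (0.80, 0.90),
    "java":        (0.85, 0.70),
    "perl":        (0.70, 0.75),
    "negotiation": (0.10, -0.90),
    "accounting":  (-0.20, -0.80),
}

def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def closest_skill(required, candidate_skills):
    """Return (skill, similarity) for the candidate skill nearest to the required one."""
    scored = [(s, cosine(skill_vectors[required], skill_vectors[s])) for s in candidate_skills]
    return max(scored, key=lambda pair: pair[1])

# A javascript expert scores far higher for a python role than someone
# whose skills lie outside programming.
print(closest_skill("python", ["javascript", "negotiation"]))
print(closest_skill("python", ["accounting", "negotiation"]))
```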

Bag of words explained

A person’s CV can be converted into a bag of words: word order is ignored, and each word is represented as a series of zeros and ones.

Let’s take two examples of position titles:

“Java Programmer at Amazon”

“Consultant at Bain”
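A minimal bag-of-words sketch for these two titles, assuming simple lowercase whitespace tokenization and a vocabulary built from the two titles alone (a real pipeline would build the vocabulary from all CVs and handle punctuation, stop words and so on). Each title becomes a vector of word counts over the vocabulary, and word order is discarded.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def build_vocabulary(documents):
    """Assign each distinct token an index, in order of first appearance."""
    vocab = {}
    for doc in documents:
        for token in tokenize(doc):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def bag_of_words(text, vocab):
    """Count how often each vocabulary word occurs in the text."""
    vector = [0] * len(vocab)
    for token in tokenize(text):
        if token in vocab:
            vector[vocab[token]] += 1
    return vector

titles = ["Java Programmer at Amazon", "Consultant at Bain"]
vocab = build_vocabulary(titles)  # java, programmer, at, amazon, consultant, bain
for title in titles:
    print(title, bag_of_words(title, vocab))
# Java Programmer at Amazon [1, 1, 1, 1, 0, 0]
# Consultant at Bain [0, 0, 1, 0, 1, 1]
```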

Assume each of these words is encoded as a one-hot vector, with a single 1 at the word’s position in the vocabulary and 0 everywhere else:

Java: 1000000

Programmer: 0100000

Amazon: 0010000

… And so on
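A short sketch of one-hot encoding that also shows why it carries no notion of relatedness. The 7-word vocabulary is hypothetical: only the first five words come from the example titles, the rest are illustrative filler. Any two distinct one-hot vectors have a dot product of 0, and every pair is the same Euclidean distance (√2) apart.

```python
import math

def one_hot(word, vocab):
    """Return a vector with a single 1 at the word's index in the vocabulary."""
    vector = [0] * len(vocab)
    vector[vocab.index(word)] = 1
    return vector

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical vocabulary; "python" and "perl" are illustrative filler.
vocab = ["java", "programmer", "amazon", "consultant", "bain", "python", "perl"]
java, programmer, bain = (one_hot(w, vocab) for w in ("java", "programmer", "bain"))

print(java)                                                  # [1, 0, 0, 0, 0, 0, 0]
print(dot(java, programmer))                                 # 0: orthogonal
print(euclidean(java, programmer) == euclidean(java, bain))  # True: all pairs are sqrt(2) apart
```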

Comparing the relatedness of words using one-hot vectors does not give any useful measure of distance, because all one-hot vectors are perpendicular to one another: every pair of distinct words is equally far apart. With Word2Vec, however, words that appear together in the same contexts tend to end up closer together in vector space. Therefore, in the example above, after training a skip-gram Word2Vec model with 2 nodes in the hidden layer (that is, 2-dimensional word vectors), we may find that words like “Java”, “Programmer” and “Amazon” sit closer to each other than to words like “Consultant” and “Bain”.
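Skip-gram learns its vectors by predicting, for each word in a text, the words around it within a fixed window. The sketch below only generates the (centre word, context word) training pairs, assuming stop words are already removed; it does not train the network itself, which libraries such as gensim implement. The function name `skipgram_pairs` is hypothetical.

```python
def skipgram_pairs(tokens, window=1):
    """Return (centre, context) pairs for every word within `window` positions."""
    pairs = []
    for i, centre in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((centre, tokens[j]))
    return pairs

print(skipgram_pairs(["java", "programmer", "amazon"]))
# [('java', 'programmer'), ('programmer', 'java'), ('programmer', 'amazon'), ('amazon', 'programmer')]
```

Pairs like these are what the network is trained on; words that repeatedly share contexts across many CVs end up with similar vectors.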

For example, the resulting 2-dimensional vectors might be:

Java (0,1)

Programmer (1,1)

Amazon (1,2)

Consultant (4,2)

Bain (4,3)

Using this approach, we can quantify how close two words are with a metric such as the Euclidean distance or the cosine similarity between their vectors. For example, compare the Euclidean distance between “Java” and “Programmer” with the distance between “Consultant” and “Programmer”:

Distance(“Java”, “Programmer”) = √[(0 − 1)² + (1 − 1)²] = 1

Distance(“Consultant”, “Programmer”) = √[(4 − 1)² + (2 − 1)²] = √10 ≈ 3.16

Therefore, in this very generic example, “Java” is closer to “Programmer” than “Consultant” is. (This is an oversimplification, since consultants can also be programmers.)
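The two distances above can be checked with a few lines of Python, using the toy 2-D vectors from the example (illustrative, not trained). The sketch also computes cosine similarity on the same vectors for comparison.

```python
import math

# The toy 2-D embeddings from the example above (illustrative, not trained).
embeddings = {
    "java": (0, 1),
    "programmer": (1, 1),
    "amazon": (1, 2),
    "consultant": (4, 2),
    "bain": (4, 3),
}

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

java, programmer, consultant = (embeddings[w] for w in ("java", "programmer", "consultant"))

# Euclidean distance (math.dist needs Python 3.8+).
print(math.dist(java, programmer))                  # 1.0
print(round(math.dist(consultant, programmer), 2))  # 3.16

# Cosine similarity on the same vectors.
print(round(cosine_similarity(java, programmer), 2))        # 0.71
print(round(cosine_similarity(consultant, programmer), 2))  # 0.95
```

On these toy vectors the two metrics disagree: cosine similarity ignores vector length, and the direction of “Consultant” (4,2) is closer to that of “Programmer” (1,1) than the direction of “Java” (0,1) is. The choice of metric is therefore worth checking against real data; for trained Word2Vec vectors, cosine similarity is the more common choice.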

Using the Word2Vec approach to encode CVs

So taking this idea forward, how do we encode a person’s full profile? Assuming that we are only interested in extracting keywords from CVs, and that all words within a CV have been converted into embedded vectors, what is the next step?

Take “Java Programmer at Amazon”, now encoded as:

[(0,1), (1,1), (1,2)] (the stop word “at” has been removed when encoding the position titles)

There are multiple approaches to sentence embedding in the literature. A simple technique I found in this Kaggle tutorial is to take the average of the individual word vectors:

Bag of Words Meets Bags of Popcorn: Use Google's Word2Vec for movie reviews (www.kaggle.com)

So in this example, “Java Programmer at Amazon” becomes (0.67, 1.33) and “Consultant at Bain” becomes (4, 2.5). To compare people, we can again apply cosine similarity or Euclidean distance to these profile vectors to assess how close or far apart they are.
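A minimal sketch of this averaging step, reusing the toy vectors from the example. The one-word stop-word list is an assumption made just for this example, and words without an embedding are simply skipped (if none remain, the function returns an empty tuple).

```python
import math

# Toy 2-D embeddings from the example (illustrative, not trained).
embeddings = {
    "java": (0, 1), "programmer": (1, 1), "amazon": (1, 2),
    "consultant": (4, 2), "bain": (4, 3),
}
STOP_WORDS = {"at"}  # assumption: a tiny stop-word list for this example only

def profile_vector(text):
    """Average the embeddings of the non-stop-words that have an embedding.

    Returns an empty tuple if no word in the text has an embedding.
    """
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))

amazon_title = profile_vector("Java Programmer at Amazon")
bain_title = profile_vector("Consultant at Bain")
print([round(x, 2) for x in amazon_title])        # [0.67, 1.33]
print(bain_title)                                 # (4.0, 2.5)
print(round(math.dist(amazon_title, bain_title), 2))  # 3.53
```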

Back to talent acquisition…

So far we have covered the maths and a distance metric for finding the profiles that relate most closely to the skills the organization needs… but how does this link back to the problem at hand?

A talent acquisition professional has many CVs at hand and positions that need to be filled urgently. There is no straightforward way for a recruiter to scan through hundreds of CVs. Even a basic keyword-matching search engine looks only for exact matches, so it is easy to miss individuals whose skills are closely related to the opening. This article therefore presents a simple distance formula that quantifies how “transferable” or “nearby” a person’s skillset is. With this approach, a talent acquisition professional can scan through thousands of profiles and identify the most relevant ones without missing potential candidates whose skills can easily be transferred to meet the demands of the role.
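Putting the pieces together, here is a minimal end-to-end sketch: average each candidate’s keyword vectors into a profile vector, do the same for the role, and sort candidates by Euclidean distance to the role. The candidate names, keywords and 2-D vectors are all hypothetical; a real system would use vectors from a model trained on a large corpus and keywords extracted from CVs.

```python
import math

# Hypothetical 2-D embeddings, invented for illustration only.
embeddings = {
    "python": (1.0, 1.2), "javascript": (1.1, 1.0), "java": (0.0, 1.0),
    "programmer": (1.0, 1.0), "developer": (1.2, 1.1),
    "consultant": (4.0, 2.0), "strategy": (4.2, 2.6),
}

def profile_vector(keywords):
    """Average the embeddings of the keywords that have one (unknown words are skipped)."""
    vectors = [embeddings[k] for k in keywords if k in embeddings]
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))

def rank_candidates(role_keywords, candidates):
    """Sort candidate names by Euclidean distance to the role profile, nearest first."""
    role = profile_vector(role_keywords)
    return sorted(candidates, key=lambda name: math.dist(role, profile_vector(candidates[name])))

# Hypothetical candidates and their CV keywords.
candidates = {
    "A": ["javascript", "developer"],
    "B": ["consultant", "strategy"],
    "C": ["java", "programmer"],
}
print(rank_candidates(["python", "developer"], candidates))  # ['A', 'C', 'B']
```

Candidate A never mentions “python”, so an exact keyword match on that skill would miss them, yet they rank first because their skills sit nearby in vector space.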

However, the limitation of this approach is that it relies only on keywords from CVs. An individual’s digital and social footprint carries far more information about a candidate, which can give a business very useful insights into other competencies and personality traits. For example, IBM has a Personality Insights service that mines Big 5 personality traits from a person’s digital and social footprint.

In addition, triangulating multiple datasets (structured and unstructured) can provide an extra layer of verification. Candidates can tweak keywords to their advantage, biasing selection algorithms towards people with the “right kind of words” in their profiles. Therefore, innovative approaches that verify and endorse an individual’s skillset through other evidence or alternative data sources will add tremendous value in the recruitment space.

Though this article relates Word2Vec to candidate selection and job matching, the opportunities to make an impact with AI in HR are exploding. People analytics has traditionally aimed to deliver data-driven insights for better business decisions. However, the true value will arrive once those insights can be translated into products at scale.

Next time…

This is my first blog post, so any constructive feedback, or comments on which aspects of AI in HR you would like to learn more about, would be much appreciated! Many thanks!

Here are some useful links for further reading:

Bag of words

https://www.kaggle.com/c/word2vec-nlp-tutorial#part-1-for-beginners-bag-of-words

One Hot Encoding and Categorical Variable Encoding

https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/

Word2Vec: another interesting blog post on Medium

https://medium.com/explore-artificial-intelligence/word2vec-a-baby-step-in-deep-learning-but-a-giant-leap-towards-natural-language-processing-40fe4e8602ba