Job Recommendation based on Extracted Skill Embeddings

Atakan Kara · Kariyer.net Tech · Feb 7, 2022

With the increasing popularity of online recruiting platforms, most employers now use these platforms to connect with potential candidates for open positions. Job recommendation systems can significantly help both employers and job seekers speed up this process and find the best matches. Using skill phrases extracted from unformatted and unstructured CVs and Job Descriptions, we implement two approaches with different similarity metrics, namely Word Mover’s Distance [1] and Cosine Similarity. We selected TF-IDF with Cosine Similarity as a baseline and evaluated our approaches on real data from Kariyer.net, an employment-oriented online service based in Turkey.

Datasets

We collected 500,000 Turkish and English Job Descriptions (JDs) related to Information Technologies (IT) posted on Kariyer.net. They are divided into two parts: 450,000 are used for training a word2vec model and the remaining 50,000 are used for experiments and evaluation. To represent job seekers, we gathered 7,700 CVs, all belonging to people working in the IT industry.

Over the years, Kariyer.net has built a Skill Dictionary from the input of human resources experts and from users’ feedback; it consists only of English keywords. This dictionary is used in the skill extraction module to find existing skill keywords. We narrowed our focus and only gathered skills related to IT.

Methodology

We propose two different approaches. The first one uses Word Mover’s Distance [1], a distance metric between text documents, and the second one uses cosine similarity. Aside from the choice of ranking algorithm, both approaches take the same input and share the same core phases:

  • Skill Extraction
  • Word2Vec Model
  • Feature Selection
  • Ranking Algorithm
General Workflow

Skill Extraction

The skill extraction module aims to find all skills, whether single-word or multi-word phrases. To benefit from the knowledge accumulated over the years at Kariyer.net and to be able to work with unstructured CVs and JDs, the module uses the Skill Dictionary to both identify and extract skills. As shown below, the process takes raw text as input: a JD or a job-seeker’s CV.

Workflow of the Skill Extraction Process

It first checks for skill phrases that exist in the Skill Dictionary. For every skill phrase found in the raw text, every white space, punctuation mark, and stop-word is replaced with a fixed character. For example, the white space in “machine learning” and the dot in “asp.net” are replaced with that character. This ensures those characters are not removed in the cleaning phase, and it prevents multi-word skill phrases separated by white space (e.g., “machine learning”) from being tokenized into multiple tokens. The modified text is then tokenized on white space and n-grams (for n up to 4) are generated. All the grams that exist in the Skill Dictionary are saved as extracted skills.
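To make this concrete, here is a minimal Python sketch of how such dictionary-based extraction could work. The contents of `SKILL_DICTIONARY`, the underscore used as the fixed character, and the function names are our own illustration, not Kariyer.net’s production code.

```python
import re
from typing import Set

# Toy stand-in for Kariyer.net's Skill Dictionary (English keywords only).
SKILL_DICTIONARY = {"machine learning", "asp.net", "react", "node.js",
                    "customer support", "website"}

GLUE = "_"  # fixed character replacing white space, punctuation, and stop-words

# Map each skill to its "glued" form, e.g. "machine learning" -> "machine_learning".
GLUED = {skill: re.sub(r"[\s.\-]", GLUE, skill) for skill in SKILL_DICTIONARY}
GLUED_TO_SKILL = {glued: skill for skill, glued in GLUED.items()}


def protect_skills(text: str) -> str:
    """Replace separators inside known skill phrases with the fixed character
    so the phrases survive cleaning and are tokenized as single tokens."""
    lowered = text.lower()
    for skill, glued in GLUED.items():
        lowered = lowered.replace(skill, glued)
    return lowered


def extract_skills(text: str, max_n: int = 4) -> Set[str]:
    """Tokenize on white space, build n-grams (n <= max_n), and keep the grams
    that correspond to Skill Dictionary entries."""
    tokens = re.findall(r"\w+", protect_skills(text))
    found = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = GLUE.join(tokens[i:i + n])
            if gram in GLUED_TO_SKILL:
                found.add(GLUED_TO_SKILL[gram])
    return found


print(extract_skills("I developed a customer support website using React and Node.js."))
# {'customer support', 'website', 'react', 'node.js'}
```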

Word Embeddings

We trained a word2vec model on 450,000 Turkish and English JDs. Because the CVs contain noisy text, we did not include them in the training. Even though this meant less training data, the subjectivity in how individuals prepare their CVs hurt the model’s ability to capture the semantic relationships between words, so we decided to train on JDs only.
Before feeding the JDs to the model for training, we applied the same trick as in the skill extraction phase: for every skill phrase found in the raw text, every white space, punctuation mark, and stop-word is replaced with a fixed character. This way, the word2vec model treats a multi-word phrase as if it were a single word, e.g., “machine_learning” instead of “machine learning”.
After some quick experiments, we decided to use the CBOW model with a vector size of 300. This model converts every skill found in the skill extraction step into a 300-dimensional vector, which lets us associate every raw text (i.e., CV or JD) with a set of vectors. We also observed that short texts with very few extracted skills do not give promising results; we interpreted this as a lack of information to represent a person’s skill set or the skills required for a position. Thus, we filter out CVs and JDs with fewer than four extracted skills.
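As an illustration, a CBOW model with 300-dimensional vectors can be trained with gensim roughly as follows. The toy sentences and the hyper-parameters other than the architecture and vector size (window, min_count, workers) are placeholders, not values reported in this post.

```python
from gensim.models import Word2Vec

# JDs preprocessed as described above: skill phrases glued with the fixed
# character, then tokenized on white space.
tokenized_jds = [
    ["we", "are", "looking", "for", "machine_learning", "engineers"],
    ["experience", "with", "react", "node_js", "and", "asp_net"],
    # ... 450,000 JDs in the real setting
]

# sg=0 selects the CBOW architecture; vector_size=300 matches the post.
model = Word2Vec(sentences=tokenized_jds, vector_size=300, sg=0,
                 window=5, min_count=1, workers=4)

# Every extracted skill can now be mapped to a 300-dimensional vector.
print(model.wv["machine_learning"].shape)  # (300,)
```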

Feature Selection

Among the skills extracted from CVs and JDs, some keywords should not be regarded as skills. For example, if we feed the sentence “I developed a customer support website using React and Node.js.” to the skill extractor, four skills are extracted: “customer support”, “website”, “react”, and “node.js”. However, “customer support” is irrelevant for this specific JD and does not represent a skill. To handle such cases, the mean of all skill vectors of each CV and JD is calculated by summing them index-wise and dividing by the number of extracted skills. Then, the cosine similarity between this mean and every skill embedding extracted from that document is calculated. Any skill embedding whose cosine similarity to the mean falls below a threshold is removed from the extracted skill set.

Average vector formula: $\bar{v} = \frac{1}{n}\sum_{i=1}^{n} w_i$, where $w_i$ is the embedding of the $i$-th extracted skill and $n$ is the number of extracted skills.
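A minimal sketch of this filtering step is shown below. The threshold of 0.3 is only a placeholder, since the post does not report the value used in production, and the function names are our own.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_skills(skill_vectors: dict, threshold: float = 0.3) -> dict:
    """Drop skills whose embedding is too dissimilar from the document's
    mean skill vector."""
    mean = np.mean(np.stack(list(skill_vectors.values())), axis=0)
    return {skill: vec for skill, vec in skill_vectors.items()
            if cosine(vec, mean) >= threshold}

# Usage:
# skill_vectors = {s: model.wv[s] for s in extract_skills(raw_text)}
# kept = filter_skills(skill_vectors)
```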

Ranking

The goal of the Ranking Algorithm is to assign a “similarity score” to each job description and return the top “n” results. It assigns scores based on how well-suited a person is to a particular position. For each CV and JD, a subset of the extracted skill vectors is passed to the ranking algorithm.
The two approaches differ in this step: the first finds the nBOW representations and assigns a “similarity score” using WMD; the second aggregates the skill vectors of the JDs and the CV, then uses Cosine Similarity to calculate the “similarity score”s. Details are explained below.

Word Mover’s Distance Approach: Word Mover’s Distance is a hyper-parameter-free distance metric between text documents. It leverages the relationships captured by the word embeddings by computing the minimum cumulative distance required to ‘travel’ from the word embeddings of one document to the word embeddings of the other.
To rank the JDs, we first calculate the normalized bag-of-words (nBOW) vectors represented by:

nBOW formula: $d_i = \frac{c_i}{\sum_{j=1}^{n} c_j}$

where $c_i$ is the number of occurrences of word $i$ in the document, $d_i$ is the nBOW weight of that word, and $n$ is the size of the vocabulary. This normalization step is required by WMD. Then, we calculate the WMD between the CV and all the JDs and return the top “n” JDs with the smallest distance values.
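One possible implementation uses gensim’s built-in `wmdistance`, which constructs the nBOW weights and solves the transport problem internally (it requires the POT package). The function and variable names below are ours, a sketch rather than the production code.

```python
def rank_jds_wmd(cv_skills, jd_skill_sets, keyed_vectors, top_n=10):
    """Rank JDs by Word Mover's Distance to a CV.

    cv_skills:     list of skill tokens extracted from the CV
    jd_skill_sets: dict mapping a JD id to its list of extracted skill tokens
    keyed_vectors: the trained word2vec vectors, e.g. model.wv
    """
    distances = {jd_id: keyed_vectors.wmdistance(cv_skills, jd_skills)
                 for jd_id, jd_skills in jd_skill_sets.items()}
    # Smaller distance means a better match, so sort ascending and keep top n.
    return sorted(distances.items(), key=lambda item: item[1])[:top_n]
```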

Cosine Similarity Approach: A methodology was needed to aggregate skill embeddings that satisfies two requirements: 1) it should represent a raw text with a single vector (rather than many skill vectors) so that the cosine similarity function can be used to rank JDs, and 2) it should capture the hierarchy of importance between skills so that some skills are prioritized over others. To address the second requirement, the IDF (inverse document frequency) value of every skill across the CVs and JDs is calculated and each skill vector is multiplied by its IDF value. Then, for each document (i.e., JD or CV), a single vector is constructed by summing the weighted skill vectors index-wise and dividing by the sum of the IDF values of the skills:

Weighted average formula: $V = \frac{\sum_i \mathrm{idf}_i \, w_i}{\mathrm{idf}_{tot}}$

where $w_i$ is the word embedding of a skill, $\mathrm{idf}_i$ is the IDF value of that skill, $\mathrm{idf}_{tot}$ is the sum of the IDF values of all the skills in the document, and $V$ is the vector representation of a particular raw text. The candidate’s CV and every JD are now each represented by a single 300-dimensional vector. Then, the Cosine Similarity between the CV vector and all JD vectors is calculated; these similarities are assigned as “similarity score”s to the job descriptions, and the top “n” JDs with the highest scores are returned.
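The same ranking could be sketched in Python as follows, with `idf` a dict mapping each skill to its IDF value. Again, the names are illustrative rather than the production implementation.

```python
import numpy as np

def document_vector(skills, keyed_vectors, idf):
    """IDF-weighted average of a document's skill embeddings (the V above)."""
    weighted = [idf[s] * keyed_vectors[s] for s in skills]
    return np.sum(weighted, axis=0) / sum(idf[s] for s in skills)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_jds_cosine(cv_skills, jd_skill_sets, keyed_vectors, idf, top_n=10):
    """Rank JDs by cosine similarity between the CV vector and each JD vector."""
    cv_vec = document_vector(cv_skills, keyed_vectors, idf)
    scores = {jd_id: cosine(cv_vec, document_vector(jd_skills, keyed_vectors, idf))
              for jd_id, jd_skills in jd_skill_sets.items()}
    # Higher similarity means a better match, so sort descending.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]
```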

Evaluation

To evaluate our two approaches, we selected TF-IDF vectors with Cosine Similarity as the baseline. We labeled each CV-JD pair as one or zero based on whether it is an appropriate recommendation, considering both the skills required by the employer and the job seeker’s skills. Given the nature of the recommendation task, we report precision at k (for k values of 1, 3, 5, 10) and consider only the top 10 JDs for each CV. In total, we labeled 680 CV-JD pairs; the results are shown below.
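For reference, precision at k for a single CV can be computed as in the sketch below; `relevant_jd_ids` stands for the JDs labeled as appropriate for that CV, and the per-CV values are averaged over all labeled CVs.

```python
def precision_at_k(recommended_jd_ids, relevant_jd_ids, k):
    """Fraction of the top-k recommended JDs that were labeled as appropriate."""
    top_k = recommended_jd_ids[:k]
    return sum(1 for jd_id in top_k if jd_id in relevant_jd_ids) / k

# e.g. report the mean of precision_at_k(..., k) over all CVs for k in (1, 3, 5, 10)
```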

Accuracy of job recommendation systems

References

  1. M. Kusner, Y. Sun, N. Kolkin, and K. Weinberger, “From Word Embeddings To Document Distances,” in Proceedings of the 32nd International Conference on Machine Learning (ICML), PMLR 37, 2015. Accessed: Jan. 22, 2022. [Online]. Available: http://proceedings.mlr.press/v37/kusnerb15.pdf.
