[1 /2] Job & Resume Matching — Obtaining similarity score using Doc2Vec

Jun 22, 2023

This series on job and resume matching looks at how recruitment companies filter candidates before passing them to their Hiring Managers for interviews and further recruitment procedures. It consists of 2 parts:

1. Matching a CV against a job description to obtain a similarity score using Doc2Vec, then
2. Assigning candidates to employers based on those scores, the candidates' preferences, and the employers' available slots, using the Gale-Shapley algorithm.

Current Recruitment Procedure of Employers:

1. Filter candidates by matching score to retain around 30 applications
2. Pass those 30 applications to Hiring Managers, who select 5 candidates
3. Invite the 5 selected candidates to interviews
4. Conduct further interviews and make offers

To pass the first round, in which I believe more than 90% of employers use an Applicant Tracking System (ATS) to filter candidates, we should tailor our CVs to match the target job.

This part will demonstrate how the score is calculated so that you can apply this technique to modify your CV.

Data

The author has trained the model on a dataset that contains current job postings available on the City of New York’s official jobs site in 2020. You can follow this link to download it: New York Job Posting Dataset

Introduction

CV Job Matching using Doc2Vec is a technique that aims to match job descriptions with resumes by representing them as numerical vectors using the Doc2Vec model. This approach allows for efficient comparison and similarity calculation between textual documents.

In the field of machine learning, representing text documents numerically is a challenging task. However, it is essential for various applications, such as document retrieval, web search, spam filtering, and topic modeling. Doc2Vec, an extension of the Word2Vec algorithm, provides a solution by learning a fixed-length vector for an entire document rather than only for individual words.

Word2Vec learns word vectors using one of two architectures: Continuous Bag-of-Words (CBOW) or Skip-Gram. CBOW predicts the current word based on the surrounding words in a sliding window context. Each word is mapped to a feature vector, and these vectors become the word vectors after training. Skip-Gram, on the other hand, predicts the surrounding words given the current word. It is slower than CBOW but is known for its accuracy with infrequent words. Doc2Vec builds on both by adding a document-level vector that is trained alongside the word vectors.
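To make the difference between the two architectures concrete, here is a minimal sketch that only builds the training examples each one would see for a short sentence. It does not train anything; the function names and the example sentence are made up for illustration.

```python
def context_windows(tokens, window=2):
    """Yield (current_word, surrounding_words) for every position in tokens."""
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield center, left + right


def cbow_examples(tokens, window=2):
    # CBOW: the surrounding words are the input, the current word is the target.
    return [(context, center) for center, context in context_windows(tokens, window)]


def skipgram_examples(tokens, window=2):
    # Skip-Gram: the current word is the input, each surrounding word is a target.
    return [
        (center, word)
        for center, context in context_windows(tokens, window)
        for word in context
    ]


tokens = "bank teller handles customer deposits".split()

for context, target in cbow_examples(tokens, window=1):
    print("CBOW     ", context, "->", target)

for center, target in skipgram_examples(tokens, window=1):
    print("Skip-Gram", center, "->", target)
```

CBOW produces one example per word, while Skip-Gram produces one example per (word, neighbour) pair, which is one reason it takes longer to train.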

Implementation

To implement CV Job Matching using Doc2Vec, we start by importing the necessary libraries and loading the job data from a CSV file. We preprocess the data, keeping only the relevant columns, and merge them into a new column called ‘data’. Then, we tokenize the text in the ‘data’ column and tag each document with a unique identifier using the TaggedDocument class.

Next, we initialize the Doc2Vec model with specific parameters, such as the vector size, minimum count, and number of epochs. We build the vocabulary by feeding the tagged data to the model and then train the model on the tagged data.

After training, we save the model for future use. To match a resume with a job description, we load the saved model and preprocess the resume and job description text. We convert them to lowercase and remove punctuation and numerical values.
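The text cleaning described above can be sketched with the standard library alone. The function name `preprocess` is hypothetical, and this version assumes English text: it drops every character that is not a lowercase ASCII letter or whitespace, so accented letters would be removed as well.

```python
import re


def preprocess(text):
    """Lowercase the text and remove punctuation and numerical values."""
    text = text.lower()
    # Replace anything that is not a letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse the runs of whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


print(preprocess("Bank Teller (Full-Time), 3+ years' experience!"))
# prints: bank teller full time years experience
```

The same function should be applied to both the resume and the job description so that the two texts are compared on equal terms.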

Using the trained model, we infer the document vectors for the resume and job description. Then, we calculate the cosine similarity between the two vectors to determine the match between the resume and the job description. The cosine similarity score ranges from -1 to 1: 1 means the two vectors point in the same direction (a perfect match), 0 means they are orthogonal (no similarity), and -1 means they point in opposite directions.
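For reference, cosine similarity is the dot product of the two vectors divided by the product of their lengths. The sketch below implements it in plain Python on small hand-written vectors; in the actual pipeline the inputs would be the vectors returned by the model's `infer_vector` method for the two preprocessed texts. The function name is chosen for illustration.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))   # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))   # orthogonal
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Because the result depends only on the angle between the vectors, scaling either vector does not change the score.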

By employing Doc2Vec and cosine similarity, this approach enables efficient and effective matching between job descriptions and resumes, helping to streamline the job application process and enhance the chances of finding the right candidates for specific positions.

Finally, the author also uses a Plotly gauge chart to display the matching percentage, along with a threshold below which users may want to revise their CV so that it passes the Applicant Tracking System (ATS) used by the majority of employers.

The full code is available here: https://github.com/kirudang/CV-Job-matching/tree/main

Please refer to the Matching_Algorithm.ipynb notebook.

Testing

I used the trained model to test my friend’s CV against a Bank Teller job posting, the one she wanted to apply for, and the outcome was 52.3%, which is a moderate match.

In the chart below, a notification also pops up to remind the candidate to improve her CV further.

Please comment below if you have any queries!

Continue to read part 2 of this series at: https://github.com/kirudang/CV_Job_matching_2/tree/main

References

B, N. (2022, December 14). Build accurate job resume matching algorithm using Doc2Vec. Analytics Vidhya. https://www.analyticsvidhya.com/blog/2022/12/build-accurate-job-resume-matching-algorithm-using-doc2vec/

Le, Q., & Mikolov, T. (2014). Distributed representations of sentences and documents. arXiv.org. https://arxiv.org/abs/1405.4053

Jindal, S. (n.d.). Shailja-jindal/bidirectional-job-Resume-Recommender-System. GitHub. https://github.com/Shailja-Jindal/Bidirectional-Job-Resume-Recommender-System/tree/master

NYC jobs — dataset by city-of-ny. (2018, July 30). The Data Catalog Platform | data.world. https://data.world/city-of-ny/kpav-sd4t

R&D, L. J. (2022, April 28). AI-based automatic resume Analysis. Medium. https://lajavaness.medium.com/ai-based-automatic-resume-analysis-795e2f91cdf9

Shperber, G. (2019, November 5). A gentle introduction to Doc2Vec. Medium. https://medium.com/wisio/a-gentle-introduction-to-doc2vec-db3e8c0cce5e