[2/2] Job & Resume Matching — Matching candidates and vacancies using the modified Gale-Shapley.

6 min readJun 24, 2023

This post is the second part of the Job — Resume matching. In the first session, we learned about how to acquire similarity scores, please visit that article here: https://medium.com/@kirudang/job-resume-matching-part-1-2-obtaining-similarity-score-using-doc2vec-a6d07fe3b355

Definition of Gale-Shapley Algorithm

The Gale-Shapley algorithm, also known as the Stable Marriage algorithm, is a mathematical algorithm that solves the problem of matching two sets of participants based on their preferences. In simple terms, it helps to pair members from two different groups in a way that maximizes their overall satisfaction.

Here’s an example of how the Gale-Shapley algorithm can be applied to the job search scenario:

Imagine there are four job seekers (Alice, Bob, Claire, and David) and four companies (Company A, Company B, Company C, and Company D). Each job seeker has their preferences for the companies, and each company has its preferences for the job seekers.

Preferences of job seekers:

Alice: Company A, Company B, Company C, Company D
Bob: Company B, Company C, Company A, Company D
Claire: Company C, Company D, Company A, Company B
David: Company D, Company A, Company B, Company C

Preferences of companies (based on the score we have calculated in the first part)

Company A: David, Alice, Bob, Claire
Company B: Bob, Claire, Alice, David
Company C: Claire, David, Bob, Alice
Company D: Alice, David, Claire, Bob

Using the Gale-Shapley algorithm, the job seekers and companies will be matched based on their preferences. The algorithm proceeds as follows:

Initially, all job seekers and companies are free and unengaged.
Each job seeker proposes to their most preferred company. In this example, Alice proposes to Company A, Bob proposes to Company B, Claire proposes to Company C, and David proposes to Company D.
Each company reviews the proposals and tentatively accepts the most preferred job seeker. In this example, Company A accepts David, Company B accepts Bob, Company C accepts Claire, and Company D accepts Alice.
If a job seeker is rejected by a company, they move on to propose to their next preferred company. Bob, who was rejected by his first choice (Company B), proposes to his second choice (Company C).
The process continues with job seekers proposing to their preferred companies and companies tentatively accepting or rejecting the proposals based on their preferences.
The algorithm repeats the proposal and acceptance/rejection process until all job seekers are matched with a company.

In this example, the final matches are:

Alice is matched with Company D
Bob is matched with Company C
Claire is matched with Company B
David is matched with Company A

The Gale-Shapley algorithm ensures that the matches are stable, meaning that there are no two job seekers and companies who would both prefer each other over their current matches. It provides a fair and efficient way of matching job seekers with companies based on their preferences, increasing the chances of successful job placements.

For more intuitive research and example, please check out this short video: https://www.youtube.com/watch?v=0m_YW1zVs-Q&ab_channel=SamuelWhitehouse

Apply to our use case

What if a company named KD wants to recruit Data Analyst (Job 1), Data Scientist (Job 2), and Machine Learning Engineer (Job 3), which share many same skillsets and there are many candidates who applied for at least 2 out of 3 jobs above? Or we can say that our use-case is to rank candidates with the jobs of their preferences in case they are interested in more than 1 job post.

Implementation

Assumption: The original Gale-Shapley algorithm assumes an equal number of participants on both sides of the matching problem (like 4 jobs and 4 candidates only), but this assumption rarely holds true in the actual problem.

Therefore, we need to modify our algorithm to handle this imbalanced situation.

Business rules:

01. 1 job has many CVs applied to
02. 1 CV can apply to many jobs
03. The number of Jobs is smaller than the number of CVs
04. 1 job we want to match with k out of all CVs based on the scores produced from Similarity using Doc2Vec, assuming that we already had the scores of each CV to a Job. This is also called Job_preference
05. Each CV has its preference list of jobs as well

Inputs:

A list of jobs:

jobs = ['Job A', 'Job B']

A list of all available CVs in the database:

cvs = ['CV 1', 'CV 2', 'CV 3', 'CV 4', 'CV 5']

A dictionary of Jobs and relevant CVs with descending similarity scores acquired from the first part.

job_preferences = {
    'Job A': ['CV 1', 'CV 3', 'CV 4'],
    'Job B': ['CV 2','CV 5', 'CV 3']
}

In this example, Job A has 3 candidates who have applied for the job, and among 3, CV 1 is the best one with the highest score, then following by CV 3 and CV4.

A dictionary of CVs and prioritized Jobs that candidates have applied:

cv_preferences = {
    'CV 1': ['Job B', 'Job A'],
    'CV 2': ['Job B'],
    'CV 3': ['Job A', 'Job B'],
    'CV 4': ['Job A'],
    'CV 5': ['Job B', 'Job A']
}

An integer represents a number of filtered candidates:k

k=2

Let's assume that for each job, the employer wants to select the 2 best candidates for Hiring Managers to review.

Code

def gale_shapley_matching(jobs, #A list of Job ID / Job Name
                          cvs, # A list of CVs
                          job_pref, # A dictionary of Jobs and according CVs with descending similarity scores acqured from the first part
                          cv_preferences, # A dictionary of CVs and Jobs in priority that candidates have applied
                          k): # A number of candidates that employer will sort out at the end of this phase
    # Create dummy jobs
    # Because we have fewer jobs than CVs (rule 03), we need to create dummy jobs to balance the data
    dummy_jobs = ['Dummy Job {}'.format(i) for i in range(len(cvs) - len(jobs))]  # Generate dummy job identifiers
    jobs += dummy_jobs  # Append dummy jobs to the list of jobs

    # Initialize all participants as free and unchosen
    free_jobs = set(jobs)  # Create a set of all jobs, initially all are free
    # The set() function in Python is used to create an unordered collection of unique elements
    # Dictionary to store the final job-CV engagements
    job_engagements = {job: [] for job in jobs}
    # Number of available slots for each job
    job_slots = {job: k for job in jobs}

    while free_jobs:
        job = free_jobs.pop()
        # Skip dummy jobs
        if job.startswith('Dummy Job'):
            continue
        # Get the preferenced CV list of the current job
        job_preferences = job_pref[job]
        # Filter preferences to exclude unavailable CVs
        job_preferences = [cv for cv in job_preferences if cv in cvs]

        for cv in job_preferences:
            if cv in cvs:
                if job_slots[job] > 0:
                    # Assign the CV to the job
                    job_engagements[job].append(cv)
                    # Remove the CV from the list of available CVs
                    cvs.remove(cv)
                    # Reduce the available slot count for the job
                    job_slots[job] -= 1
            elif cv in dummy_jobs:
                # Assign the dummy CV to the job
                job_engagements[job].append(cv)

    # Remove dummy job engagements
    job_engagements = {job: cves for job, cves in job_engagements.items() if not job.startswith('Dummy Job')}

    return job_engagements

Output

As we can observe, Job A will pick CV 1 and CV 3, while Job B will do CV 2 and CV 5. CV 4 is the one eliminated from this round.

Please find the code here on my Github: https://github.com/kirudang/CV_Job_matching_2

Thank you and let me know if you need to discuss anything :)