Generating Job Recommendations for Jobseekers on MyCareersFuture

Contributors: Fred Teo (GovTech), Clarence Ng (GovTech), Elizabeth Lim (GovTech), Lua Jiong Wei

In our previous post, we shared how the Jumpstart platform aims to tackle diverse problems in the job ecosystem through data science solutions deployed in a microservice architecture. In this post, we will do a deeper dive into a specific use case: how we use data science to improve the job matching process on the MyCareersFuture job portal. As MyCareersFuture already has native search functionality, our aim is to provide job recommendations to jobseekers on the platform — specifically, jobs that the users might not explicitly search for but might be suitable for.

In this post, we will cover our challenges and considerations in the areas of:

  1. Data sources
  2. Modelling approach
  3. Deep dive into each of the models: Skills-matching, Views-based, Applications-based
  4. Model evaluation and testing

In the next post of the series, we will discuss the engineering involved in deploying services like this job recommender.

Not all data are created equal

It will be helpful to first run through the kinds of data sources we have access to for modelling jobseekers’ preferences (i.e. the jobs that they are interested in and are a suitable match for).

Let us consider this from the perspective of a new user of the MyCareersFuture platform. When the user first enters the website and makes a search query, they are prompted to select at least 5 skills they possess from a provided list of skills. They may also later add or refine their selected skills while browsing the website.

Screenshot of skills tagging system on MyCareersFuture

As users browse the website, we log clickstream data about their interactions with the platform — for example, what search terms they use and which job postings they choose to view and apply for. These job postings are rich sources of information, capturing data such as job titles, job responsibilities, salaries and required skill sets. Users might also upload their resumes onto the platform, providing us additional content about their job and education history.

While there are many data sources available, not all of them are created equal. In particular:

  1. Content data might be inaccurate or imprecise: Job postings and user profiles are manually keyed in by users, which may result in inaccuracies.
    E.g. employers sometimes include industry buzzwords (e.g. “Knowledge of Big Data”) which are not necessarily requisite skills for the actual job (e.g. Business Analyst)
    E.g. A single skill term may have a wide range of substantive meanings across different industries (e.g. Management).
  2. Specific challenges with clickstream data: Clickstream data tends to be highly sparse (i.e. many jobs may never be viewed or applied for before), exhibit a long-tailed distribution (i.e. a few superstar, popular jobs may have thousands of views and applications while some cold jobs might not receive any applications), and may be entirely incoherent (e.g. views by web crawlers).
  3. Varying ability to signal jobseeker preferences: An application for a job is considered a stronger indication of interest as compared to simply viewing it.
  4. Varying coverage of data sources: While most users are likely to have made a search or viewed a job, not all might have applied or uploaded profile information of their past employment or education. This will limit the size of our dataset and the number of users we are able to make recommendations for.

A hybrid approach (Different models for different folks)

As we’ve explored in the previous section, the data sources we have access to differ in their intensity, quality, type and coverage. This has implications for our approach to generating recommendations. Limitations in data source coverage mean that a model requiring all available variables as input would only apply to a limited subset of users. Also, different types of data might be best handled with different modelling approaches. Hence, limiting ourselves to a single model for generating recommendations would constrain what we could do.

MyCareersFuture’s Job Recommendations are generated using a hybrid approach, where multiple models are built and then mixed to generate the eventual recommendations for the user. For example, if a user has added skills to their profile, and has viewed some jobs, but made no applications, we generate recommendations based on a skills-based model and a views-based model. The outputs from both models are then combined based on the historical performance of each model to generate the final recommendation. We found that the views-based model had better performance in practice, so we show more recommendations from that model. In this way, the hybrid approach allows us to address gaps in the availability of data while still prioritising better-performing models.
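As a rough illustration, the sketch below shows one way such a blend could work, allocating recommendation slots to each model in proportion to a performance weight. The function name, weights and job IDs are hypothetical and do not reflect our production code.

```python
# A minimal sketch of mixing ranked outputs from two models into one list.
# Weights, names and job IDs are illustrative assumptions only.

def mix_recommendations(skills_ranked, views_ranked, weights=(0.4, 0.6), k=10):
    """Blend two ranked job lists, allocating slots by model weight.

    skills_ranked, views_ranked: job IDs ordered from best to worst.
    weights: relative share of slots for each model (e.g. derived from
             historical click-through rates of each model).
    """
    n_skills = round(k * weights[0] / sum(weights))
    n_views = k - n_skills

    mixed, seen = [], set()
    # Views-based recommendations go first since that model performs better
    for job in views_ranked[:n_views] + skills_ranked[:n_skills]:
        if job not in seen:          # de-duplicate jobs surfaced by both models
            mixed.append(job)
            seen.add(job)
    return mixed[:k]


# Example: a user with tagged skills and some job views, but no applications
print(mix_recommendations(["J1", "J2", "J3"], ["J4", "J1", "J5"]))
```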

Combining the outputs of different models allows us to cover all jobs & most users

Model 1: Skills-Matching Model (Content-based filtering)

What’s a prerequisite for getting the job? Apart from passion and interest, quite often it also involves having the right experience and skills. Given that the MyCareersFuture job portal is organised around the concept of using overlapping skills to find suitable jobs, we created our first model with skills data. In a nutshell, we generate skills-based representations of jobs and users, and compare their similarities to find the best matches.

This model is an example of a content-based filtering system, because we aim to generate recommendations by directly comparing the features (i.e. skills) provided by users and jobs. As this model does not require that the user have any past job views or applications, it has the benefit of higher coverage.

To begin, let us look at some examples of how skills are tagged to jobs. In the table below, we have highlighted selected skills for three roles.

By looking at this example, we observe several types of skills:

  1. Specialized and domain-specific skills e.g. DCF, SEO, Big Data
  2. General and common skills (underlined) e.g. Communication, Excel
  3. Transferable skills: e.g. Data Modelling & Analysis

The third category is the sweet spot: skills that are meaningfully transferable across a wide range of job roles and industries can be used to suggest roles that expand the jobseeker’s search. These are jobs that the jobseeker might be suitable for (due to skill overlaps) but may not think of searching for. We also want to avoid relying too heavily on skills in the second category, as they might not be specific enough.

In technical terms, the model works by computing a mathematical measure of similarity between the user’s skills and the skills from the job postings, returning those jobs with the highest degree of similarity. To compute measures of similarity, we have to convert the user-provided and job-postings skills into a vector and matrix representation respectively, which would allow us to calculate the similarity between a single user and the inventory of jobs on the portal.

In the jobs matrix, each column corresponds to a skill and each row represents the skill requirements of a job. Entries are non-zero when a job requires a particular skill. To create the jobs matrix, we:

  1. Process the skills: Every job posting has a job description and employer-tagged skills. We parse through every job, perform the necessary pre-processing (e.g. removal of stopwords), and extract skills from the job description and those tagged by the employer. This extraction is done with reference to a canonical vocabulary of skills.
  2. Generate a document-term matrix: In our context, a document is a job posting, and a term is a unique skill. The matrix counts the number of times each skill appears in each document.
  3. Perform term-frequency inverse-document-frequency (TF-IDF) weighting of our matrix: As explored earlier, niche skills are more likely to help jobseekers stand out, whereas generic skills provide less differentiation. The TF-IDF weighting considers the relative frequency of skills across the entire inventory of jobs, weighting niche skills higher and generic skills lower. This helps to create a more accurate representation of the importance of specific skills for the job.

Each user can be represented by a row vector based on the skills they provide through a similar process: creating an indicator vector of the skills (in the same dimension and order as the jobs matrix) and applying the TF-IDF weights generated in step 3.

Finally, to make a set of recommendations, we compute the cosine similarity between the user vector and the jobs matrix, and return the jobs with the highest degree of similarity.
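To make the pipeline concrete, here is a minimal sketch of the same idea using scikit-learn’s TfidfVectorizer and cosine_similarity. The skill strings are illustrative, and in practice the skills are first extracted against the canonical vocabulary described above; this is a simplified sketch, not our production implementation.

```python
# A minimal sketch of the skills-matching model: TF-IDF weighted job-skill
# matrix, user skill vector, and cosine-similarity ranking.
# The skill strings below are illustrative assumptions.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Steps 1-2: each "document" is the set of skills associated with a job posting
job_skills = [
    "financial modelling dcf excel communication",
    "seo digital marketing communication",
    "big data data modelling analysis excel",
]

# Step 3: build a TF-IDF weighted document-term (job-skill) matrix
vectorizer = TfidfVectorizer()
jobs_matrix = vectorizer.fit_transform(job_skills)

# Represent the user in the same vocabulary, applying the learned IDF weights
user_skills = "data modelling analysis excel"
user_vector = vectorizer.transform([user_skills])

# Rank jobs by cosine similarity to the user's skill vector
scores = cosine_similarity(user_vector, jobs_matrix).ravel()
ranked_jobs = scores.argsort()[::-1]
print(ranked_jobs, scores[ranked_jobs])
```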

Model 2: Views-based Model (Collaborative Filtering with Singular Value Decomposition)

Relying only on skills as signals for job matching is, nonetheless, imperfect: Jobseekers and employers may not always provide a complete or entirely accurate skill profile on the MCF platform. More importantly, in determining a job match, there may be other relevant factors beyond skills. For example, some jobseekers have preferences for roles at certain types of companies (start-ups versus large multinational corporations), or salary expectations, which skills alone cannot model.

We could expand our content-based model and add these new factors in, but collaborative filtering methods provide a simpler approach. In particular, we developed a views-based, collaborative filtering model, which generates recommendations for users based on their job-views as a signal of their preferences.

The intuition behind the model is that jobseekers who view the same set of jobs are likely to have similar job preferences. But how exactly do we model and represent these preferences? We could start with a views matrix, where the entry at row i and column j is 1 if jobseeker i has viewed job j. To generate recommendations for jobseeker i, we can then find other rows (other jobseekers) in the matrix that have high cosine similarity to row i, and recommend jobs viewed by those other jobseekers to jobseeker i.

However, due to the high dimensionality of the matrix (number of users multiplied by number of jobs), generating recommendations from the original views matrix would be too computationally intensive to complete within a reasonable time. We get around this problem by using singular value decomposition (SVD) to generate a low-rank approximation of the views matrix. This approach rests on the assumption that most of the variance in the views matrix can be explained by a smaller number of latent factors (which in practice could correspond to things like skills, industry, qualification requirements, working hours, etc.). We use SVD to approximate the original views matrix in a latent, lower-dimensional space, and run computations on this smaller representation instead to reduce the time required for generating recommendations.
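As a rough sketch of the idea, the snippet below factorises a sparse views matrix with truncated SVD (scipy’s svds) and scores jobs for one user. The matrix contents, dimensions and number of latent factors are illustrative assumptions rather than our production settings.

```python
# A minimal sketch of approximating a sparse user-job views matrix with
# truncated SVD and scoring jobs for one user. All numbers are illustrative.

import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

n_users, n_jobs, n_factors = 1000, 500, 20

# Sparse views matrix: entry (i, j) is 1 if user i viewed job j
views = sparse_random(n_users, n_jobs, density=0.01, format="csr")
views.data[:] = 1.0  # binarise: we only care whether a view occurred

# Low-rank approximation: views ~ U * diag(s) * Vt
U, s, Vt = svds(views, k=n_factors)

# Latent representations for users and jobs
user_factors = U * s            # shape (n_users, n_factors)
job_factors = Vt.T              # shape (n_jobs, n_factors)

# Score all jobs for one user, mask already-viewed jobs, take the top 10
user_id = 42
scores = user_factors[user_id] @ job_factors.T
scores[views[user_id].indices] = -np.inf
top_jobs = np.argsort(scores)[::-1][:10]
print(top_jobs)
```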

Illustration of how a views interaction matrix is modelled as a product of a set of user and product latent vectors (image taken from Google’s Developer Documentation)

However, when conducting sanity checks on a sample of recommendations, we found that the model tended to generate nonsensical recommendations for users with too few view events. For example, for a user who viewed only one job titled “Dentist (General)”, the model generated recommendations such as software engineering, business analyst and human resources associate. On the other end, the model also performed poorly on users who had viewed many jobs that were seemingly random and unrelated. We suspect that these users could have been recruiters or web crawlers.

To overcome this problem, we imposed two additional constraints for our recommendations:

  1. Minimum and maximum number of views required before a recommendation is made: The SVD model was only trained on users who had between 5 and 50 view events in the preceding 7 days.
  2. Minimum level of similarity between previously viewed job titles & recommended job titles: We used the Levenshtein Distance, which computes the minimum edit distance between two strings. However, we were also mindful that if the similarity threshold was set too high, it could reduce the coverage of the model and harm the diversity of job recommendations eventually served.

The threshold choices for the two constraints above can be thought of as hyper-parameters of our model that can be tuned over time or through A/B testing.
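For illustration, a minimal sketch of how these two post-filters could be applied is shown below. The thresholds and the plain-Python Levenshtein implementation are assumptions for the example, not our production values.

```python
# A minimal sketch of the two post-filters on the views-based model.
# Thresholds and titles are illustrative assumptions.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

MIN_VIEWS, MAX_VIEWS, MAX_EDIT_DISTANCE = 5, 50, 15  # hypothetical thresholds

def filter_recommendations(viewed_titles, recommended_titles):
    # Constraint 1: only recommend for users with a sensible number of views
    if not (MIN_VIEWS <= len(viewed_titles) <= MAX_VIEWS):
        return []
    # Constraint 2: a recommended title must be close to some viewed title
    return [rec for rec in recommended_titles
            if any(levenshtein(rec.lower(), v.lower()) <= MAX_EDIT_DISTANCE
                   for v in viewed_titles)]
```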

Model 3: Application-based Model (Collaborative Filtering)

The key idea behind our application-based model is that job applications also implicitly reveal preferences and skills — users who apply for the same job are likely to have a degree of similarity.

However, applications are likely to be a much stronger signal than views. Some jobseekers may be unsure of their preferences in the first place, and may browse widely to first understand the market. Others may realise that they are not interested or do not meet the requirements only after reading the details. A view is “low-effort” relative to an application: viewing simply requires a click of the mouse, while an application may require tailoring one’s resume, answering supplemental questions or even writing a cover letter. A collaborative filtering model based on past applications could hence generate sharper recommendations for users.

The downside of this application-based model is its limited coverage — it can only generate recommendations for users who have recently applied for a job. However, because the size of the problem is considerably smaller, we can compute recommendations without any dimensionality reduction. In practice, this model outperforms the views-based SVD model, possibly because application signals are more truthful and the computation involves no approximation.

A simple illustration: Jules is assumed to be similar to Emma because he applied for 2 of the 3 jobs which Emma applied for, and thus could be recommended to apply for Job 3.
(Photo credits: Welcome to the jungle)

To make predictions on any given day, we look at the past 2 weeks of applications. This gives us a good balance between sufficient sample size and recency. From this data, we create a user-job application matrix, where element (i,j) is 1 if user i has applied for job j, and 0 otherwise. We then apply a user-based collaborative filtering model, which:

  1. Computes the similarity between users based on their job application history (user-similarity scores)
  2. Uses the user-similarity scores to weight the user-job application matrix to create an estimated score of user i for job j
  3. For each user, finds the jobs with the highest estimated score which the user has not already applied for and recommends them
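To illustrate these steps, here is a minimal sketch of user-based collaborative filtering on a toy application matrix mirroring the Jules and Emma example above. The matrix and scoring are simplified assumptions, not our production pipeline.

```python
# A minimal sketch of user-based collaborative filtering on a binary
# user-job application matrix. The toy matrix below is illustrative.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Rows = users, columns = jobs; 1 means the user applied for the job
applications = np.array([
    [1, 1, 1, 0],   # Emma applied for Jobs 1, 2 and 3
    [1, 1, 0, 0],   # Jules applied for Jobs 1 and 2
    [0, 0, 0, 1],   # another user
])

# Step 1: user-user similarity from application histories
user_sim = cosine_similarity(applications)
np.fill_diagonal(user_sim, 0)              # ignore self-similarity

# Step 2: estimated score of user i for job j = similarity-weighted sum
scores = user_sim @ applications

# Step 3: mask jobs already applied for, then recommend the highest-scoring ones
scores = scores.astype(float)
scores[applications == 1] = -np.inf
top_job_for_each_user = scores.argmax(axis=1)
print(top_job_for_each_user)   # Jules (row 1) is recommended job index 2 (Job 3)
```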

Challenges in Model Evaluation and Testing

In testing adjustments to our models, we conduct randomised online A/B tests. This means that one group of (randomly assigned) users will be exposed to the existing model while the other group will be exposed to the new model.

Conducting such tests raises non-trivial questions requiring consensus with other stakeholders: What sort of risks are we willing to take in testing models when a “bad” model may compromise a jobseeker’s employment outcomes? Should we slow down our rate of A/B testing given the economic impact of COVID-19? What sort of offline testing should we do to be confident that our new model is an improvement? As a data science team working on services used by other stakeholders, model testing cannot be conducted in a silo.

In addition, the choice of evaluation metric isn’t straightforward. For Netflix, evaluating their algorithms’ performance might mean measuring the number of shows started, or time spent on the platform. For Amazon, performance might be measured by the average order value or revenue per user. Such metrics are typically measurable in real-time.

For our jobs recommendation service, our desired performance measure is the number of actual job placements (i.e. jobseeker gets a job). However, due to natural lags in the hiring process and inherent difficulty in getting employers to close the feedback loop, the pace of feedback from tracking employment outcomes is far too slow relative to the pace at which we have to test and push improvements.

Thus, for model development and A/B testing, some of the intermediate metrics we monitor are the click-through rate (# clicks / # recommendations) and application rate (# applications / # recommendations). Nevertheless, we regularly review the placement rate to ensure that the recommender addresses the initial goal of helping jobseekers get matched.

Conclusion

Recommendation systems are a wide and exciting field of data science, with many algorithms and solutions (content-based, collaborative-filtering-based and deep-learning-based, among many others). They are also inherently difficult to build and test due to the unsupervised nature of the problem. For us, this has been compounded by the extreme time lag and the difficulty of unearthing true job matching outcomes.

We have found that a hybrid approach works best to balance coverage and accuracy (user experience). To work and iterate fast, we have developed reasonable intermediate metrics to guide our development and A/B testing.

The development of a good model is our bread and butter, but the successful deployment of the model is crucial in ensuring that it creates real impact for users. In the next post, we will discuss how our software engineers help to deploy these models effectively in a low-latency production environment.

We hope you’ve enjoyed learning about our work. If you have any feedback or suggestions, let us know what you think! We’d love to hear from you.

P.S. Our team is currently looking to hire across all roles! If this work sounds interesting to you, check out our job postings here or reach out at recommender@dsaid.gov.sg
