Deploying Job Recommender Systems in Production

Joel Foo
DSAID GovTech
Dec 8, 2021 · 7 min read

Contributors: Joel Foo (GovTech), Jack Ong (former GovTech)

In the first two posts of this series, we shared the problem scope and product design of JumpStart, an AI platform that aims to tackle diverse problems in the job ecosystem. We also went into detail about how we used content-based and collaborative-filtering methods to build a job recommender system.

In this final article, we explore the use case of a job recommender system and discuss how the models are deployed in production.

A simple interface for our users

“Using” data science is not that easy. There’s a lot of work involved in:

  • Building up data ingestion and engineering pipelines
  • Setting up analytical environments and specialised infrastructure like cluster computing
  • Building high-quality recommender systems
  • Monitoring and improving these ML models
  • Designing deployment systems

As an AI platform, JumpStart abstracts away all this work, allowing teams to add data science capabilities to their products easily. All they have to do is call our API.

But to ensure that other teams get the best experience using our platform, we have to ensure that our services remain reliable, resilient, and user-friendly, no matter how complicated things become under the hood.

Designing a good production system

In an ideal world, a good production system would satisfy several goals concurrently.

  1. Ability to accommodate models of varying complexity: State-of-the-art techniques may perform better, but they are often more complicated and harder to deploy.
  2. Low latency: The time taken from receiving a request to returning the recommendations should be sufficiently short (< 300ms), especially when JumpStart’s users are other systems. Low latency ensures that end users, e.g. jobseekers on MyCareersFuture, do not experience bottlenecks due to JumpStart’s recommendation engine.
  3. Ease of deployment/update: Recommendation models can degrade over time as user behaviour changes, or new data is made available. Thus, the architecture must be able to support easy retraining and updating of models.
  4. Ability to A/B test models: When a new model is developed, it can be rigorously tested offline using historical data, but actual performance might still differ. Thus it is important to be able to A/B test the different models.

In reality, these goals often conflict and create trade-offs that must be managed. In this post, we discuss how we handle such trade-offs in a case study of our job recommendation service.

To provide a high-level overview: although we simplify the interface so that our users see only one service endpoint, our job recommendations are actually served by a hybrid model (read more in our data science post) consisting of 3 different models. Each model, in turn, has its own flavour and characteristics that require a different deployment setup.

Service endpoint served by a hybrid model consisting of 3 different models.

A brief summary of the 3 models that make up our hybrid recommender system:

  • Application-based / Views-based: Modelled using collaborative-filtering techniques that take a user’s historical activities as input, look for similar users, and recommend jobs that these similar users have applied to or viewed.
  • Skill-matching: Modelled by representing users and jobs as vectors of their skills and calculating the cosine similarity between them.

Application/View-based models

A challenge with collaborative filtering models is that they are computationally intensive. To illustrate, let’s consider how a live application-based model would work in an API server:

  1. The user-item (or in our case, user-job) matrix has to be constructed, and this matrix can be huge. For example, a user-job matrix based on 14 days of application events is about 40K rows and 38K columns in size.
  2. Upon receiving the input (a user’s past applied jobs), the API server needs to search for similar users by calculating a similarity score between the input and every row in the matrix, then sorting the rows to find the most similar users. This is often the bottleneck, as a lot of computation is required per request, e.g. 40K similarity scores are computed, then sorted (sketched after this list).
  3. Furthermore, the computation requires the whole matrix to be held in memory, which demands better hardware specifications for the API server, e.g. more RAM. This incurs more cost, and updating/uploading a new matrix is likely to require a server restart. An alternative would be to keep the matrix in external storage, but that compromises the overall speed of generating recommendations.
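
To make the cost concrete, here is a minimal illustrative sketch (not our production code) of what a naive live-inference path would have to do on every request, using the matrix dimensions mentioned above:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative dimensions from the 14-day window described above.
n_users, n_jobs = 40_000, 38_000

# Sparse user-job interaction matrix (a non-zero entry means the user applied to the job).
user_job = sparse_random(n_users, n_jobs, density=0.0005, format="csr")

def recommend_live(applied_job_ids, k_users=50):
    """Naive live inference: score the input against every user row on each request."""
    # Build a 1 x n_jobs query vector from the user's past applications.
    query = np.zeros((1, n_jobs))
    query[0, applied_job_ids] = 1.0

    # Bottleneck: ~40K similarity scores computed per request, then sorted.
    scores = cosine_similarity(query, user_job).ravel()
    similar_users = np.argsort(scores)[::-1][:k_users]
    return similar_users  # the jobs these users applied to/viewed would then be aggregated
```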

The above makes it impractical to retrieve recommendations from the model whilst ensuring low response times. Therefore in JumpStart, recommendations are generated offline for every user in the matrix, and then stored as key-value (user to recommended jobs) pairs in a database for the API server to access.
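
As a rough sketch of the “write” side of this design, the offline batch job could push its output into Redis like so (the connection details and key-naming scheme here are assumptions for illustration):

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)  # connection details are illustrative

def store_recommendations(all_recommendations, ttl_seconds=2 * 24 * 3600):
    """all_recommendations: {user_id: [job_id, ...]} produced by the offline batch job."""
    pipe = r.pipeline()
    for user_id, job_ids in all_recommendations.items():
        # One key per user; the daily retrain simply overwrites these keys.
        pipe.set(f"reco:applications:{user_id}", json.dumps(job_ids), ex=ttl_seconds)
    pipe.execute()
```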

Such a deployment model has the following advantages:

  1. The expensive pre-computation and offline generation of job recommendations can be done in a cluster. As the number of users and activities increases, more nodes can be added to the cluster, making it scalable.
  2. When the key-value pairs are stored in in-memory databases such as Redis, retrieval is very fast and the API server only needs to know the user’s ID in order to retrieve and return job recommendations (see the retrieval sketch after the diagram below).

High-level overview of the architecture for deploying our application/view-based models.
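
The corresponding read path on the API server then reduces to a single key lookup (again, connection details and key names are illustrative):

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def get_recommendations(user_id):
    # A single O(1) lookup; no similarity computation happens at request time.
    cached = r.get(f"reco:applications:{user_id}")
    return json.loads(cached) if cached else []  # empty list (or a fallback model) on a cache miss
```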

However, the key-value pairs could very quickly become outdated as new users/jobs come in or as user preferences change, resulting in degradation of model performance over time. To mitigate this, we retrain the model daily to generate fresh job recommendations, then overwrite the existing ones in the Redis database. It is also possible to increase the frequency to, for example, twice a day, or even hourly, but there would be diminishing returns as user preferences are unlikely to change frequently or abruptly.

Skill-matching model

The skill-matching model takes in a user-provided list of skills as input and looks for jobs with the highest similarity in terms of skills to recommend. As there are thousands of skills and a user could select an arbitrary number of skills, the number of possible combinations of skills is too high for recommendations to be generated offline.

Thus, this model has to be deployed as a live inference model. To achieve this, we leverage the SageMaker service provided by AWS. The components of this model are as follows:

  • A cleaning pipeline: To clean the user-provided list of skills to conform to the skills vocabulary of the model.
  • A TF-IDF vectoriser: To convert user skills and job skills into vectors so that mathematical computation of a similarity score is possible.
  • A search index: After a job has been vectorised, it needs to be stored in a search index that is optimised for the computation of similarity scores. This is also where recommendations are retrieved from (see the sketch after this list).
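
As a minimal sketch of how these components could fit together, here is an illustration using scikit-learn (the job data and skill tokens below are made up, and the production components may differ):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Toy job postings represented as space-separated skill tokens (illustrative only).
job_ids = ["job_1", "job_2", "job_3"]
job_skills = ["python sql machine_learning", "java spring sql", "python data_engineering airflow"]

# TF-IDF vectoriser: turns skill lists into vectors.
vectoriser = TfidfVectorizer()
job_vectors = vectoriser.fit_transform(job_skills)

# Search index optimised for similarity lookups; cosine distance matches the model description.
index = NearestNeighbors(metric="cosine").fit(job_vectors)

# At inference time: vectorise the (cleaned) user skills and query the index.
user_vector = vectoriser.transform(["python sql"])
distances, positions = index.kneighbors(user_vector, n_neighbors=2)
recommended = [job_ids[i] for i in positions[0]]
```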

After training, the model is deployed on SageMaker using tools such as MLflow. Our API server can then access the model endpoint to retrieve predictions:

A) When the API server receives a user-provided list of skills at inference time, the list of skills is passed to the SageMaker endpoint hosting the model.

B) Within the SageMaker container:

  1. The model will clean the list of skills using the cleaning pipeline.
  2. The cleaned list of skills is vectorised.
  3. The vector is then used to search for recommended jobs in the search index.
  4. The jobs with the highest scores are returned.

C) After receiving the predictions, the API server will send a response to the client.
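
For illustration, step A boils down to a single SageMaker runtime call from the API server. The endpoint name and payload shape below are assumptions, not our actual contract:

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")  # region/credentials come from the environment

def get_skill_recommendations(user_skills, endpoint_name="skill-matching-model"):
    # Step A: forward the user-provided skills to the SageMaker endpoint hosting the model.
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,                # hypothetical endpoint name
        ContentType="application/json",
        Body=json.dumps({"skills": user_skills}),  # payload shape is an assumption
    )
    # Step C: parse the predictions so the API server can respond to the client.
    return json.loads(response["Body"].read())
```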

However, the model laid out above was too slow in generating predictions, with the cleaning pipeline as the bottleneck. The cleaning pipeline performs several steps, such as stop-word removal, lemmatisation and n-gram generation, which take a relatively long time. Knowing that the universe of skills is rather static (new skills don’t get added to MyCareersFuture’s taxonomy at a high velocity), we tackled this by generating a “raw skill”-to-“cleaned skill” lookup table at training time and replacing the cleaning pipeline with this lookup table. With this approach, the user-provided skills can be “cleaned” very quickly by simply retrieving the cleaned version from the lookup table.
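
In essence, the idea looks like this (with a trivial placeholder standing in for the real cleaning pipeline):

```python
def full_cleaning_pipeline(skill: str) -> str:
    """Placeholder for the real pipeline (stop-word removal, lemmatisation, n-grams, ...)."""
    return skill.strip().lower()

skills_vocabulary = ["Python", "Machine Learning", "SQL"]  # illustrative vocabulary

# Training time: pay the cleaning cost once, over the whole (fairly static) vocabulary.
skill_lookup = {raw: full_cleaning_pipeline(raw) for raw in skills_vocabulary}

# Inference time: "cleaning" becomes a dictionary lookup, which is effectively free.
def clean_user_skills(user_skills):
    return [skill_lookup[s] for s in user_skills if s in skill_lookup]
```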

Another complicated component is the search index. Typically, one would use storage that is optimised for search, such as Elasticsearch, to store the vectors of the jobs. For our job recommender, however, we decided to store the search index in the model itself as a property, to reduce the overhead of maintaining an Elasticsearch cluster.

To summarise the above, and to better illustrate how the different components are consolidated into the model, here is a short code snippet of how the model’s structure might look:
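
(A minimal sketch, assuming MLflow’s mlflow.pyfunc.PythonModel interface; the class, attribute names and input format below are illustrative rather than our exact production code.)

```python
import mlflow.pyfunc

class SkillMatchingModel(mlflow.pyfunc.PythonModel):
    """All components are packaged as properties of a single deployable model."""

    def __init__(self, skill_lookup, vectoriser, index, job_ids):
        self.skill_lookup = skill_lookup  # "raw skill" -> "cleaned skill" lookup table
        self.vectoriser = vectoriser      # fitted TF-IDF vectoriser
        self.index = index                # search index stored on the model itself as a property
        self.job_ids = job_ids            # maps index positions back to job IDs

    def predict(self, context, model_input):
        # model_input is assumed to carry the user-provided list of skills.
        user_skills = model_input["skills"]

        # 1. "Clean" the skills via the lookup table (fast dictionary lookups).
        cleaned = [self.skill_lookup[s] for s in user_skills if s in self.skill_lookup]

        # 2. Vectorise the cleaned skills.
        user_vector = self.vectoriser.transform([" ".join(cleaned)])

        # 3. Search the index for the most similar jobs.
        _, positions = self.index.kneighbors(user_vector, n_neighbors=10)

        # 4. Return the jobs with the highest similarity scores.
        return [self.job_ids[i] for i in positions[0]]
```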

Other than using SageMaker, it is also possible to spin up another API server, or to load the model into the existing API server. However, we decided to use SageMaker as it is a native AWS service built specifically to host ML models. Its compatibility with the open-source MLflow library has also made it much easier to maintain and update than a bespoke API server would be.
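
To illustrate the packaging step, a pyfunc model like the sketch above can be logged with MLflow and then pushed to a SageMaker endpoint using MLflow’s SageMaker deployment tooling; the exact commands depend on the MLflow version in use, so the snippet below is only indicative:

```python
import mlflow

# `SkillMatchingModel` and its components refer to the sketch above (illustrative names).
with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="skill_matching_model",
        python_model=SkillMatchingModel(skill_lookup, vectoriser, index, job_ids),
    )

# The logged model can then be deployed to a SageMaker endpoint with MLflow's SageMaker
# tooling (e.g. the `mlflow sagemaker` CLI), which builds the serving container and
# creates/updates the endpoint that the API server calls.
```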

High-level overview of the architecture for deploying our live skill-matching model.

Conclusion

The JumpStart platform aims to provide ML services across diverse use cases to help address problems within the job ecosystem. But even a single service may be served by several data science models. As each model has its own complexity, different deployment setups are required to maximise the intended outcome. Thankfully, we abstract away these intricacies for our users by maintaining only one endpoint per service.

In designing our architecture, we have also sought to balance the different engineering goals, such as the ability to host different kinds of models, low latency, low maintenance overhead, etc. through creative and optimised use of cloud native services. Nevertheless, we acknowledge that this architecture is likely to evolve and improve as our product matures, and that there is much to be learnt and done.

We hope you’ve enjoyed reading about our work, and if you have any feedback or suggestions, let us know what you think! We’d love to hear from you.

P.S. Our team is currently looking to hire across all roles! If this work sounds interesting to you, check out our job postings here or reach out at recommender@dsaid.gov.sg
