The moat between notebooks and real-time prediction: Our experience building production machine learning at Zenjob

Rehan Ahmed
Zenjob Technology Blog
8 min read · Nov 29, 2023
Close up of a person looking at code on a computer screen.
Photo by charlesdeluvio on Unsplash

Whether or not we’re aware of it, machine learning has become a part of our digital lives — from our phones trying to find which app we might want to use at a given time, to social media predicting which content we might interact with the most.

So what is machine learning? In a meta move, I asked ChatGPT, a machine learning algorithm, to describe what it is. Here’s what it came up with:

Machine Learning is like giving a computer the superpower of intuition — it’s the art of training a digital mind to see patterns in data, make predictions, and learn from its mistakes, all without ever needing a pair of human eyes.

Our mission and machine learning

At Zenjob, our mission is to enable part-time jobbers to decide when, where, and how to work with just a few clicks. We're always looking for ways to improve on that promise, and for the past couple of years we've been experimenting with machine learning (ML) to automate and improve our internal processes as well as drastically improve our B2B and B2C experiences. I, along with other Data Scientists at Zenjob, have led the effort to introduce ML into our production workflows, working together with some amazing people from the Backend, Platform, and Data teams to make this magic possible.

Last year, we started the work of applying ML techniques to our job-recommendation algorithm. With hundreds of thousands of job offers seen by tens of thousands of users every month (whom we refer to as Talents), we had enough scale and data to train our models and figure out what our users like the most.

That said, we still had some big challenges, including a huge class imbalance in our training data: users apply to only a small fraction of the jobs they see. On top of that, the job market has strong seasonal swings, outside economic factors affect the types of jobs available, and more. All of these challenges relate to the actual model-building part. For me, however, the biggest hurdle was bringing our solutions to production.

So in this blog post, I’ll give a little behind-the-scenes look at how we brought job recommendations to life as one of our first production ML use cases. The goal was to harness the predictive power of computers to learn from the past behavior of our Talents and serve them jobs that they like the most or are most suited for.

Background

I have a Master’s in Data Science, so I came into my current role with a decent amount of experience in feature engineering and model building — from linear classifiers for predicting price and churn, to neural networks for image classification.

Yet, during my academic life, I never had to deploy a model to an environment even close to production. Most of the work we did in the classroom lived inside Jupyter Notebooks. For people new to Data Science, Jupyter Notebooks are interactive coding web applications that can contain live code, equations, visualizations, and narrative text.

A Jupyter Notebook cell showing how the process is much more complicated than just a deploy command.

For a production environment, the model is just one part of the equation. Behind it, we need a whole system that fetches the required features, makes them readily available in a preprocessed format, and lets the model use them to serve predictions instantaneously: a resilient, low-latency system capable of scaling to thousands, and even millions, of users. This is MLOps.

MLOps or ML Ops is a paradigm that aims to deploy and maintain machine learning models in production reliably and efficiently.

- Wikipedia

Looking back now, I can see that building a model within the safe confines of a Jupyter Notebook cell is just the first step, and one of many.

Goal

Our goal was to create a model that predicts which jobs our Talents like the most and serves them as a separate section in the feed.

An example job feed showing recommendations
Image: Zenjob Design Team

Most platforms today already have recommendations of some sort, either driven by an ML model or by some ranking algorithm that tries to find the best content for you. The idea behind it is to help the user find the content that they like, easily and quickly.

Simple, right?

System design and tooling

Note: In a production scenario, we need a model-caller (the service that calls a model for predictions) and a model-server (the endpoint that serves the model). From here on, we'll refer to the model-caller as the backend and to the model-server as the model endpoint.

Training the baseline model

To tackle the severe class imbalance, we tried a couple of different approaches, from linear classifiers to neural networks. In the end, we settled on XGBoost, a highly efficient, flexible, and portable gradient-boosting library, to classify jobs into two categories: conversions and non-conversions.

Then, based on the model’s result, we ranked jobs and decided which jobs to show to Talents. This approach gave us a good balance in predicting which jobs our Talents would apply to while simultaneously weeding out unlikely job matches.

A Jupyter Notebook cell showing an abstracted view of how model training looks.
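
To give a rough idea of what such a training cell might contain, here's a minimal sketch using a synthetic stand-in dataset; the feature names, hyperparameters, and imbalance handling shown here are illustrative assumptions, not our production configuration:

```python
# A minimal sketch of the training step on synthetic stand-in data;
# feature names and hyperparameters are illustrative assumptions.
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Hypothetical snapshot: one row per (Talent, job) impression,
# "converted" = 1 if the Talent applied to the job.
rng = np.random.default_rng(42)
n = 10_000
impressions = pd.DataFrame({
    "job_category_id": rng.integers(0, 20, n),
    "hourly_wage": rng.uniform(12.0, 25.0, n),
    "distance_km": rng.uniform(0.0, 30.0, n),
    "past_applications": rng.poisson(3, n),
    "converted": rng.binomial(1, 0.05, n),  # heavy class imbalance
})

X = impressions.drop(columns=["converted"])
y = impressions["converted"]
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# scale_pos_weight counteracts the imbalance between conversions and non-conversions.
model = xgb.XGBClassifier(
    n_estimators=300,
    max_depth=6,
    learning_rate=0.1,
    scale_pos_weight=(y_train == 0).sum() / (y_train == 1).sum(),
    eval_metric="aucpr",
)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)

# Rank jobs by the predicted probability of applying.
val_scores = model.predict_proba(X_val)[:, 1]
```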

Deploying and serving the model

We started simple by setting up a server with Flask, a micro web framework written in Python, to serve the model and expose a REST API, allowing any other service to query it directly with the required model features and get a prediction back. In our case, the prediction is the probability that a user will apply to a given job. We then containerized the server with Docker and deployed it inside our Kubernetes cluster.

In subsequent iterations, we redesigned this process by hosting the model as a SageMaker endpoint for improved model delivery. This also resulted in a significant improvement in response time.

A Jupyter Notebook cell depiction of how serving a model would look.
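
As a rough, hedged sketch of that first iteration, a Flask model server along these lines could look like the snippet below; the route, payload format, and model artifact name are assumptions for illustration, not Zenjob's actual API:

```python
# A minimal sketch of a Flask model server; the route, payload format,
# and model artifact name are assumptions.
import xgboost as xgb
from flask import Flask, jsonify, request

app = Flask(__name__)

model = xgb.XGBClassifier()
model.load_model("job_recommender.json")  # hypothetical model artifact

@app.route("/predict", methods=["POST"])
def predict():
    # Expects {"features": [[...], [...]]}: one preprocessed feature vector per job.
    features = request.get_json()["features"]
    # Probability that the Talent applies to each job.
    scores = model.predict_proba(features)[:, 1].tolist()
    return jsonify({"scores": scores})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

Containerizing a server like this and deploying it is then regular platform work; the SageMaker endpoint we moved to in later iterations replaces this self-managed layer.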

Getting the features

Now comes the most complicated part — getting the latest features to the model at the time it receives a request. When designing and training an offline model, it’s a pretty no-fuss process — you have access to a database; you query the features; you do some preprocessing; you split it into train, test, and validation sets; and you’re done.
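
For context, that offline workflow might look roughly like the sketch below; the connection string, table, and column names are hypothetical placeholders:

```python
# A minimal sketch of the offline workflow; the connection string, table,
# and column names are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@warehouse/analytics")  # placeholder DSN

# 1. Query the features.
df = pd.read_sql("SELECT * FROM job_impressions", con=engine)

# 2. Do some preprocessing.
df["job_datetime"] = pd.to_datetime(df["job_datetime"])
df = pd.get_dummies(df, columns=["job_category_id"])

# 3. Split into train, validation, and test sets.
train_df, rest_df = train_test_split(df, test_size=0.3, random_state=42)
val_df, test_df = train_test_split(rest_df, test_size=0.5, random_state=42)
```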

But in a real-time scenario, we had a couple of choices, each with its own pros and cons, outlined below.

Scenario 1

  • How does it work?
    The backend sends the already-preprocessed features in the correct format, the model-server receives them, scores them using the model, and poof, it’s done! 🪄
  • Pros
    This is the easy, no-nonsense solution that shifts all the responsibility to the backend service.
  • Cons
    This is inflexible. As Data Scientists, we like to experiment and iterate frequently, changing the features and/or the preprocessing techniques to continuously improve a model. This scenario slows down or severely hinders our ability to do that, especially in our case, where the backend is written in Java and isn’t suited to the Python-friendly preprocessing techniques and libraries we rely on.

Scenario 2

  • How does it work?
    The backend sends the identifiers or the raw features available at runtime. For example, in our case, for each available job, we can easily get its date and time, the ID of the category it belongs to, the ID of the user we want to serve recommendations to, and so on.
    The model-server uses these identifiers to prepare the relevant features for that particular job category, job company, user, etc.; preprocesses them in a format that the model requires; gets the prediction scores; and then sends them back.
  • Pros
    Very flexible approach, allowing us to quickly add/remove features, as well as change the preprocessing techniques without a significant engineering effort on the backend. Also makes it easy to test different models with different features/preprocessing techniques.
  • Cons
    Extra work for the model-server, including implementation of a feature-fetching system on the model side.

Note that, in both cases, we still need a process that runs on a schedule, preprocesses or calculates the features, and stores them somewhere for our backend/model-server to fetch with low latency.

We went with Scenario 2, opting for the more flexible yet more-difficult-to-implement approach. Here’s how we got the features to the model:

  1. A daily job fetches features from the database, processes them, and aggregates them on a user level.
  2. The processed features are then fed into SageMaker Feature Store, a fully managed, purpose-built repository for storing, sharing, and managing features for ML models. We utilize the online storage functionality for real-time inference.

A Jupyter Notebook cell depicting an abstracted version of what fetching a feature looks like.
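
As a rough sketch, reading a Talent's latest features from the Feature Store's online storage could look something like this; the feature group name and record identifier are hypothetical placeholders:

```python
# A minimal sketch of a runtime feature fetch from SageMaker Feature Store's
# online store; the feature group and identifier names are assumptions.
import boto3

featurestore_runtime = boto3.client("sagemaker-featurestore-runtime")

def fetch_user_features(user_id: str) -> dict:
    """Fetch the latest preprocessed features for a Talent."""
    response = featurestore_runtime.get_record(
        FeatureGroupName="talent-features",  # hypothetical feature group
        RecordIdentifierValueAsString=user_id,
    )
    # The online store returns a list of {"FeatureName": ..., "ValueAsString": ...}.
    return {f["FeatureName"]: f["ValueAsString"] for f in response.get("Record", [])}
```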

Prediction

In terms of the end goal — predicting which jobs a user would like to apply for — how does it work?

Well, when a user opens the app, we fetch all the available jobs for them, prepare their features, score them with the model, and then show them the results: the jobs they’re most likely to apply to.

A Jupyter Notebook cell depicting what Recommended Jobs would look like in a cell.
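
Abstracted into code, that request-time flow might look roughly like the sketch below; the endpoint name, payload format, and the build_feature_vector helper are assumptions for illustration, and fetch_user_features refers to the Feature Store sketch above:

```python
# A minimal sketch of the request-time flow; the endpoint name, payload
# format, and build_feature_vector helper are hypothetical.
import json
import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")

def recommend_jobs(user_id: str, available_jobs: list[dict], top_k: int = 5) -> list[dict]:
    user_features = fetch_user_features(user_id)  # see the Feature Store sketch
    # One feature vector per (user, job) pair, in the order the model expects.
    payload = [build_feature_vector(user_features, job) for job in available_jobs]

    response = sagemaker_runtime.invoke_endpoint(
        EndpointName="job-recommender",  # hypothetical endpoint name
        ContentType="application/json",
        Body=json.dumps({"instances": payload}),
    )
    scores = json.loads(response["Body"].read())["predictions"]

    # Rank jobs by predicted probability of applying and keep the top K.
    ranked = sorted(zip(available_jobs, scores), key=lambda pair: pair[1], reverse=True)
    return [job for job, _ in ranked[:top_k]]
```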

The whole process needs to be quick to make sure the user doesn’t face any lag. Right now, it takes around 100–150 ms, with a lot of room for improvement. One key area where we can cut this time down further is eliminating the middle layer (the model-server) and shifting all the data fetching and processing to the model inside SageMaker.

Conclusion

During the implementation process, we learned a lot and gathered great insights, and we saw that Talents interacted with jobs recommended by the model much more than with other jobs. More specifically, Talents who opened these offers were twice as likely to apply. We’re still experimenting with this feature — from tweaking the model with different feature-engineering and model-training mechanisms, to changing the number of recommendations shown in the feed to figure out the best fit — so it might not be available to everyone yet.

Last but not least, we now have a blueprint and a system in place for implementing ML in production, which we plan to keep iterating on and to apply to other use cases as well. Hopefully, you’ll soon see another blog post from us Data Science folks here at Zenjob about the next challenge we decide to tackle.
