Deploying our first data science model for the job feed at scale at apna

Jun 28, 2022

By Deval Sethi, Himanshu Solanki, Raviteja Meesala

At apna, we empower the rising workforce of India by connecting them to life-changing opportunities through our job marketplace and community. Customer obsession is one of our core values, and that drives everyone at apna to work hard to deliver an experience that delights our candidates and employers on the app. In this blog, we will share how we leverage ML to bring our mission to life.

The Problem Statement:

The problem of relevancy: Candidates come to apna looking for a job opportunity through which they can earn their livelihood. Likewise, employers want a pool of candidates for their job postings that meet their needs. As a technology organization, our goal is to increase the relevance on both sides, i.e. helping candidates find relevant jobs that meet their interests and skills, while at the same time helping employers find the most relevant candidates faster.

The problem of scale: With the exponential growth of apna’s active users (>22M) and active jobs (>200K) on the platform, discovering the right opportunities becomes a challenge. Serving a job feed that is personalized and relevant to every user at this scale becomes a necessity.

Approach:

To address the above problem statement, we built the “Impression to Lead” model, which predicts user conversion from users’ past interactions with jobs as well as job and user attributes at city and category levels. This enabled us to solve the problem of relevancy.

In the earlier version of the job feed, we used to compute relevant and eligible jobs for the user on the fly, but this approach was not scalable. In the new version, we shifted to a pre-computed feed: we precompute the job predictions through the model and serve them as soon as the user requests jobs on the app. This enabled us to serve jobs at scale and solve the problem of scale.

Challenges we faced:

  1. Determining the right granularity of the features to use in the model.
  2. Scaling our inference service.
  3. Choosing the right data store for the model predictions.

A high-level overview of the Impression to Lead model:

The ‘Impression to Lead’ model is an AI model that predicts how likely a candidate is to apply to a job. In technical terms, we predict the probability of an apply action by the candidate on any given job.

Mathematically, we estimate:

P(apply | user, job)

Labelling of actions for the model:

  • Viewing only: an impression alone, i.e. the candidate views the job but does NOT apply to it, implies negative behaviour.
  • Lead: when a candidate applies for a job, they become a lead for that job; this implies positive behaviour for the model.
User behaviour for a job affects model scores.
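As an illustration, positive and negative labels of this kind could be derived in BigQuery with a single join of impression and application events. This is a minimal sketch: the project, table, and column names are assumptions, not our actual schema.

```python
# Hypothetical sketch: build (user, job) training labels in BigQuery.
# Table and column names are illustrative, not apna's actual schema.
from google.cloud import bigquery

client = bigquery.Client()

LABEL_QUERY = """
SELECT
  i.user_id,
  i.job_id,
  -- Positive label if the impression was followed by an application (a lead),
  -- negative label if the candidate only viewed the job.
  IF(a.user_id IS NOT NULL, 1, 0) AS label
FROM `project.dataset.impressions` AS i
LEFT JOIN `project.dataset.applications` AS a
  USING (user_id, job_id)
"""

labels = client.query(LABEL_QUERY).result()  # iterator of labelled (user, job) rows
```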

The model:

We built a classification model that uses the user’s features along with the job’s features and historical activity to predict the likelihood of the user applying to the job and getting selected for an interview.
In other words: “Out of all the jobs that the user sees, what are the types of jobs the user will apply to?”
We use BigQuery as our data warehouse to generate our training data, and we trained the model on these massive datasets with BigQuery ML’s XGBoost classifier.
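For context, BigQuery ML’s boosted-tree classifier (built on XGBoost) is trained with a single SQL statement. Below is a minimal sketch of such a statement run through the Python client; the dataset, table, and feature names are illustrative.

```python
# Hypothetical sketch of training an XGBoost classifier with BigQuery ML.
# Dataset, table, and feature names are illustrative.
from google.cloud import bigquery

client = bigquery.Client()

TRAIN_QUERY = """
CREATE OR REPLACE MODEL `project.dataset.impression_to_lead`
OPTIONS (
  model_type = 'BOOSTED_TREE_CLASSIFIER',  -- BigQuery ML's XGBoost-based model
  input_label_cols = ['label']
) AS
SELECT
  job_category, no_of_openings, salary,    -- job attributes
  impression_to_applied, view_to_applied,  -- job activity features
  gender, experience, education,           -- user attributes
  distance,                                -- user x job attribute
  label
FROM `project.dataset.training_data`
"""

client.query(TRAIN_QUERY).result()  # blocks until the model is trained
```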

User vs User Cohort:

Instead of going for user-level predictions, we decided to start our v0 with user-cohort-level predictions, i.e. treating a group of similar users as one unit. This lowered the storage size and processing power needed by ~750x without significantly impacting the model outcomes. It also helped us solve the “cold-start” problem: a new user can be mapped to a cohort, and we can recommend jobs based on that cohort.
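As a toy illustration of the idea (the attributes and buckets below are assumptions, not our actual cohort definition), a cohort key could be derived from a few coarse user attributes so that every user, including a brand-new one, maps into a known bucket:

```python
# Hypothetical sketch: map a user to a cohort key from coarse attributes.
# The attribute choices and bucket boundaries are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class User:
    city: str
    preferred_category: str
    experience_years: int

def cohort_key(user: User) -> str:
    # Bucket experience so the total number of cohorts stays small.
    if user.experience_years == 0:
        exp_bucket = "fresher"
    elif user.experience_years <= 3:
        exp_bucket = "junior"
    else:
        exp_bucket = "senior"
    return f"{user.city}|{user.preferred_category}|{exp_bucket}"

# A brand-new user still maps to a cohort, which mitigates cold start.
print(cohort_key(User("Bengaluru", "Delivery", 0)))  # Bengaluru|Delivery|fresher
```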

Offline Evaluation:

We took a week’s data and passed it to the model for predictions. We then compared the results against our existing production logic and several other baselines (popularity, CTR, etc.). On recommendation metrics (Top@k and MAP@k), the model showed a 50% improvement over the other variants.
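For reference, MAP@k rewards placing the jobs a user actually applied to near the top of the recommended list. A minimal sketch of the metric (not our evaluation code):

```python
# Minimal sketch of MAP@k for offline evaluation; not our production code.
def average_precision_at_k(recommended: list, relevant: set, k: int) -> float:
    """AP@k for one user: mean precision at each hit within the top k."""
    hits, score = 0, 0.0
    for rank, job in enumerate(recommended[:k], start=1):
        if job in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(len(relevant), k) if relevant else 0.0

def map_at_k(per_user, k=10):
    """MAP@k across users; per_user holds (ranked jobs, applied jobs) pairs."""
    return sum(average_precision_at_k(r, rel, k) for r, rel in per_user) / len(per_user)

# Example: one user applied to job "j2", which was ranked second.
print(map_at_k([(["j1", "j2", "j3"], {"j2"})], k=3))  # 0.5
```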

Features:

We have used 4 types of features for our model (a toy example follows the list):

  1. Job Attributes like job_category, no_of_openings, salary, etc.
  2. Activity features for the job like impression_to_applied, view_to_applied, etc.
  3. User Attributes like Gender, Experience, Education, etc.
  4. User x Job Attributes: features related to both the user and the job, e.g. the distance between the user and the job location.
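To make the four families concrete, a single (user, job) row can be pictured as a flat map over these features. All values below are invented:

```python
# Illustrative (user, job) feature row; all values are made up.
feature_row = {
    # 1. Job attributes
    "job_category": "Delivery",
    "no_of_openings": 12,
    "salary": 18000,
    # 2. Job activity features
    "impression_to_applied": 0.042,  # historical conversion rate
    "view_to_applied": 0.110,
    # 3. User attributes
    "gender": "M",
    "experience": 2,
    "education": "Graduate",
    # 4. User x Job attributes
    "distance": 4.3,  # km between user and job location
}
```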

The launch!

System design and architecture of the precomputed feed:

Once the machine learning model was ready, the challenge was to deploy the precomputed feed and the model in production. Here is the approach we took:

HLD for Job Feed service @ Apna.

Data preparation:

Every day, our Airflow job (sketched after the list):

  1. Triggers queries on BigQuery to create intermediate view tables for faster data access.
  2. Prepares and prepopulates our feature store, which is used by the inference service.
  3. Creates a Dataproc Spark cluster and launches the spark-submit job.
  4. Runs data quality checks once the whole job is complete.
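A minimal sketch of what such a DAG could look like with the Airflow Google providers; the project, region, cluster, query, and file names are all illustrative assumptions:

```python
# Hypothetical Airflow DAG mirroring the four daily steps; names are illustrative.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocCreateClusterOperator,
    DataprocSubmitJobOperator,
)

with DAG("job_feed_daily", start_date=datetime(2022, 6, 1),
         schedule_interval="@daily", catchup=False) as dag:

    create_views = BigQueryInsertJobOperator(
        task_id="create_intermediate_views",
        configuration={"query": {"query": "CALL `project.dataset.build_views`()",  # placeholder SQL
                                 "useLegacySql": False}},
    )

    populate_feature_store = PythonOperator(
        task_id="populate_feature_store",
        python_callable=lambda: None,  # placeholder: write features to Redis
    )

    create_cluster = DataprocCreateClusterOperator(
        task_id="create_spark_cluster",
        project_id="project", region="asia-south1", cluster_name="feed-cluster",
        cluster_config={"worker_config": {"num_instances": 2}},  # minimal placeholder spec
    )

    submit_spark = DataprocSubmitJobOperator(
        task_id="spark_submit_feed_job",
        project_id="project", region="asia-south1",
        job={"placement": {"cluster_name": "feed-cluster"},
             "pyspark_job": {"main_python_file_uri": "gs://bucket/feed_job.py"}},
    )

    quality_checks = PythonOperator(
        task_id="data_quality_checks",
        python_callable=lambda: None,  # placeholder: validate output tables
    )

    create_views >> populate_feature_store >> create_cluster >> submit_spark >> quality_checks
```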

Serving model (Inference-service):

  • Approach: We used an online Python-based inference service (microservice) that acts as an ML engine for our various products. Different downstream services call this service to get predictions.
  • Model hosting: Because our ML models are lightweight, we keep them in memory inside the service for faster predictions.
  • Feature store: We chose Redis as the data store for the features required by the model, as it is an excellent key-value database with very low-latency reads.
  • Deployment: We used tenant-based deployment to host the different models in this service, avoiding the “noisy neighbour” problem and letting us provision the right infrastructure for each model. This ensures reliability across use cases even if one model deployment fails.
  • Scale: We pre-scale the inference service to cater to the barrage of requests from our Spark application. The service supports querying for inferences in batch mode to cut network overhead. Currently, we hit the service at 200 batched RPS, each batch carrying 5,000 (user, job) combinations (roughly a million scorings per second), constantly for 10 minutes to meet our service level agreements (SLAs), with each request completing in around 500 ms (P95). A minimal sketch of such a service follows.
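Here is a minimal sketch of what a batched, Redis-backed inference endpoint could look like. The framework (FastAPI), Redis key layout, and model file are assumptions for illustration, not our actual service:

```python
# Hypothetical batched inference endpoint; framework, Redis key layout,
# and model path are illustrative assumptions, not apna's actual service.
import pickle
from typing import Dict, List, Tuple

import redis
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
feature_store = redis.Redis(host="localhost", port=6379)

# The model is held in memory for low-latency scoring.
with open("impression_to_lead.pkl", "rb") as f:
    model = pickle.load(f)

class BatchRequest(BaseModel):
    pairs: List[Tuple[str, str]]  # up to ~5000 (user_cohort, job_id) pairs

@app.post("/predict")
def predict(req: BatchRequest) -> Dict[str, float]:
    scores = {}
    # Pipeline the Redis reads so the whole batch costs one network round trip.
    pipe = feature_store.pipeline()
    for user, job in req.pairs:
        pipe.hgetall(f"features:{user}:{job}")
    for (user, job), raw in zip(req.pairs, pipe.execute()):
        # Assumes an sklearn-style model and a fixed, consistent feature order.
        features = [float(v) for _, v in sorted(raw.items())]
        scores[f"{user}:{job}"] = float(model.predict_proba([features])[0][1])
    return scores
```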

Storing the model results:

Requirements: The following are the non-functional requirements (NFRs) we wanted from our choice of datastore:

  1. Latency: Extremely low latency in the face of high traffic: writes < 5 ms (P95), reads < 10 ms (P95).
  2. Storage: Should support a huge number of rows and columns: 22B rows, 6 TB of disk storage.
  3. Peak load: Should support 600K writes per second and 4K reads per second at peak.

Our choice: After running POCs on several databases, we chose ScyllaDB because it supports low-latency concurrent reads and writes while maintaining that performance consistently under load.
Our research:

Comparison of datastores for job feed @ Apna

Approach: A recommendation service serves these results from ScyllaDB to consumers such as the job feed, which use the predictions. An illustrative table layout and read path are sketched below.
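Since ScyllaDB is CQL/Cassandra-compatible, an illustrative (assumed, not actual) table layout and read path could look like this, with one partition per user cohort so that a feed page is a single low-latency partition read:

```python
# Hypothetical ScyllaDB table and read path for precomputed recommendations;
# keyspace, table, and column names are illustrative.
from cassandra.cluster import Cluster  # ScyllaDB speaks the Cassandra protocol

session = Cluster(["scylla-host"]).connect("recommendations")

session.execute("""
CREATE TABLE IF NOT EXISTS feed (
    user_cohort text,
    score       double,
    job_id      text,
    PRIMARY KEY (user_cohort, score, job_id)
) WITH CLUSTERING ORDER BY (score DESC, job_id ASC)
""")

# Rows within a cohort's partition are pre-sorted by score, so fetching a
# feed page is one partition read.
rows = session.execute(
    "SELECT job_id, score FROM feed WHERE user_cohort = %s LIMIT 50",
    ("Bengaluru|Delivery|fresher",),
)
for row in rows:
    print(row.job_id, row.score)
```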

Spark Job components:

Data Flow diagram — Spark, Airflow
  1. Job types: A streaming job processes new data (writes and updates) and refreshes the pre-computed results, whereas a batch job runs nightly to process the whole data.
  2. Entity collection: User and job entities are collected from different sources (Kafka for the streaming job, BigQuery for the batch job).
  3. Enrichment: Data is enriched and cleaned so that all the required fields and attributes are present in the data frames.
  4. Candidate selection: Eligible jobs are fetched for a particular user cohort.
  5. Ranking: The set of eligible jobs for a cohort is ranked by the scores fetched from the inference service.
  6. Post-processing: Steps such as weighted-factor sorting, filtration, etc. are carried out in this component.
  7. Scylla sink: Finally, the output is written to ScyllaDB as a sink. (A condensed PySpark sketch of the batch flow follows.)
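A condensed, hypothetical PySpark version of the batch flow, assuming the spark-bigquery and spark-cassandra connectors and invented table names; the real job calls the inference service for scores, which is shown here only as a placeholder column:

```python
# Hypothetical PySpark batch job mirroring the components above; connector
# options, table names, and the scoring step are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("precomputed-feed-batch").getOrCreate()

# 2. Entity collection: batch entities come from BigQuery.
users = spark.read.format("bigquery").option("table", "project.dataset.user_cohorts").load()
jobs = spark.read.format("bigquery").option("table", "project.dataset.active_jobs").load()

# 3.-4. Enrichment + candidate selection: eligible jobs per cohort
# (shown naively here as jobs in the cohort's city).
candidates = users.join(jobs, on="city").select("user_cohort", "job_id", "city")

# 5. Ranking: in production the scores come from the inference service;
# a random column stands in for that batched HTTP call here.
ranked = candidates.withColumn("score", F.rand())

# 6. Post-processing: keep jobs ordered by score.
top = ranked.orderBy(F.col("score").desc())

# 7. Scylla sink via the spark-cassandra connector.
(top.write.format("org.apache.spark.sql.cassandra")
    .options(keyspace="recommendations", table="feed")
    .mode("append")
    .save())
```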

Applications and Impact:

Here are the various applications where we use our Impression to Lead model:

  1. Job feed: The ‘Impression to Lead’ model directly powers our job feed, through which we show relevant jobs to our candidates at scale. This integration resulted in a double-digit improvement in applications per user.

  2. Push notifications: We send millions of push notifications to our users daily across various campaigns. The I2L model powers these campaigns, boosting our CTRs compared to conventional campaigns.

The way forward:

We plan to do the following in our next steps:

  1. ML platform: We will invest in an in-house ML platform that unifies and automates our whole ML lifecycle, from training to deployment to monitoring. This will let us leverage MLOps for a more robust and faster path to productionizing our models.
  2. Enhanced personalization: Moving to a user-level recommendation model to serve more relevant jobs to each user.
  3. R&D of new features: Keeping the model’s features updated with newly launched features on the platform, e.g. employer-candidate conversations.

Acknowledgements:

To begin with, if you have been reading this blog till now, we would like to acknowledge you. Kudos to you! You are a champ, and dedicated too! We look forward to connecting with folks like you, so please reach out to careers@apna.co.
Next, we would like to acknowledge the people behind this:

Data: Himanshu Solanki, Deval Sethi, Sarfaraz Hussain
Engineering: Ravi Singh, Raviteja Meesala, Bhargav Reddy Kolla, Sunil Kumar Chaurasia, Pratichi Sahoo, Gurinder Singh, Samiksha Ojha, Aswin Mullenchira, Prinson Thomas, Harish Srinivas
Product managers: Sanket Purohit, Yogeshwari Chandrawat
Design: Neha Soni
Leaders: Puneet Kala, Ronak Shah, Shantanu Preetam

Special thanks to Vaishakh N R for his immense contribution in leading and managing this entire project & team end-to-end.

Let me know if you have any questions or suggestions,
Until Next time,
Deval Sethi
Email: devalsethi@gmail.com
