How Moovup recommends jobs with Amazon Personalize
Moovup is an online platform for front-line jobs, such as retail, restaurant, and logistics jobs. A personalized job search experience is key to showing users jobs that match their interests. For example, if a part-time waiter searches for jobs in Moovup, we recommend food and beverage jobs to them. These personalized jobs are shown in:
- The “Recommended jobs” (推介好工) section on the homepage
- Job searching page
To implement this feature, we integrated Amazon Personalize into our backend. Amazon Personalize is a machine learning service that helps developers add personalized recommendations to a website without requiring machine learning expertise.
In this article, we share how we developed a personalized job recommendation feature with Amazon Personalize. Showing job recommendations to our users takes two main steps:
- Create a training model in Amazon Personalize.
- Build a data pipeline to connect with Amazon Personalize.
The data pipeline needs to continuously:
- Send users' interaction data to our training model in Amazon Personalize.
- Fetch each user's personalized job recommendations and show them on the front end.
Part 1: Create a training model in Amazon Personalize
Let’s begin with creating a training model. The diagram above shows the steps of creating a training model in Amazon Personalize. In Amazon Personalize, a training model is called a “solution version”. Here are the details for each step:
1. Create a dataset group. It is a container for all of our resources, such as datasets, solutions and campaigns.
2. Create three datasets: items, users and interactions. They hold the item, user and interaction data used to train our model.
3. Upload our job (item), user and click (interaction) data in CSV format to an Amazon S3 bucket. Amazon Personalize fetches the data from Amazon S3 later to train the model.
4. Create dataset import jobs to import those CSV files, specifying the Amazon S3 location of the files in the console. Meanwhile, create schemas that fit our datasets.
For the schema of a dataset, fields should match all column headers in our CSV file. Taking the items dataset as an example, we have fields such as JOB_NAME and JOB_TYPE.
Additionally, we set either the categorical or the textual property of these fields to true so that Amazon Personalize trains the model on them. JOB_NAME is unstructured free text with an unbounded set of values, whereas JOB_TYPE has a fixed set of values: part-time or full-time. That is why we set textual for the former and categorical for the latter.
{
  "fields": [
    {
      "name": "JOB_TYPE",
      "type": [
        "null",
        "string"
      ],
      "categorical": true
    },
    {
      "name": "JOB_NAME",
      "type": [
        "null",
        "string"
      ],
      "textual": true
    },
    ...
  ]
}
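Because the schema fields have to line up with the CSV column headers, it can help to check them before starting an import job. The following is a minimal sketch of such a check, not part of our pipeline as described; the helper name check_headers, the sample CSV and the ITEM_ID field are assumptions added for illustration.

```python
import csv
import io
import json


def check_headers(schema_json: str, csv_text: str) -> dict:
    """Compare the schema's field names with the CSV header row."""
    schema_fields = {field["name"] for field in json.loads(schema_json)["fields"]}
    header_row = next(csv.reader(io.StringIO(csv_text)))
    csv_columns = set(header_row)
    return {
        "missing_in_csv": sorted(schema_fields - csv_columns),
        "missing_in_schema": sorted(csv_columns - schema_fields),
    }


# Hypothetical sample data: only JOB_TYPE and JOB_NAME come from the article.
SCHEMA = json.dumps({"fields": [
    {"name": "ITEM_ID", "type": "string"},
    {"name": "JOB_TYPE", "type": ["null", "string"], "categorical": True},
    {"name": "JOB_NAME", "type": ["null", "string"], "textual": True},
]})
CSV_TEXT = "ITEM_ID,JOB_TYPE,JOB_TITLE\n1,part-time,Waiter\n"

result = check_headers(SCHEMA, CSV_TEXT)
print(result)
```

In this sample the CSV uses JOB_TITLE where the schema expects JOB_NAME, so both lists come back non-empty and the mismatch is visible before any import job runs.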
5. Create a solution, which means training a model. We use the User-Personalization recipe for the training. To optimize the training model, we can enable HPO (hyperparameter optimization) by adding hpoConfig to the solution config. Amazon Personalize then runs multiple training jobs with different hyperparameter values within the ranges we specify. The trade-off is a longer training time and therefore a higher cost.
After the training is finished, the performance of the training model is shown in solution version metrics. A higher score (closer to 1) means better performance.
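For readers who create the solution through the API rather than the console, the sketch below shows roughly what a request with HPO enabled could look like. It only builds the request dictionary; the function name, the job limits and the hidden_dimension range are illustrative assumptions, not the values we used.

```python
def build_solution_request(name, dataset_group_arn,
                           max_training_jobs=10, max_parallel_jobs=2):
    """Build keyword arguments for a create_solution call with HPO enabled."""
    return {
        "name": name,
        "datasetGroupArn": dataset_group_arn,
        "recipeArn": "arn:aws:personalize:::recipe/aws-user-personalization",
        "performHPO": True,
        "solutionConfig": {
            "hpoConfig": {
                # The resource limits are passed as strings.
                "hpoResourceConfig": {
                    "maxNumberOfTrainingJobs": str(max_training_jobs),
                    "maxParallelTrainingJobs": str(max_parallel_jobs),
                },
                # Assumed example range; pick ranges that suit your data.
                "algorithmHyperParameterRanges": {
                    "integerHyperParameterRanges": [
                        {"name": "hidden_dimension", "minValue": 32, "maxValue": 256},
                    ],
                },
            },
        },
    }


# Placeholder ARN for illustration only.
request = build_solution_request(
    "job-recommendation",
    "arn:aws:personalize:us-east-1:111122223333:dataset-group/example",
)
# With AWS credentials configured, the request could be sent with:
#   boto3.client("personalize").create_solution(**request)
print(request["solutionConfig"]["hpoConfig"]["hpoResourceConfig"])
```

More training jobs widen the search but add to the training time and cost mentioned above, so the limits are worth setting deliberately.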
6. Create a campaign to deploy the training model, by selecting the solution version created in the previous step.
7. Apply a filter to remove unwanted items from the recommendations. For instance, we can return only promoted jobs rather than normal jobs by including the condition Items.JOB_PROMOTION_STATE in ("Posted"), as in the filter expression below:
Include ItemId WHERE Items.STATE IN ("Posted") AND
Items.IS_JOB_SUSPENDED IN ("false") AND
Items.IS_COMPANY_SUSPENDED IN ("false") AND
Items.JOB_PROMOTION_STATE in ("Posted")
| Exclude ItemId WHERE Interactions.event_type IN ("apply_job_complete")
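The same filter can also be registered through the API instead of the console. As a rough sketch, the helper below collapses the multi-line expression into a single line and builds the arguments for a create_filter call; the helper name, the filter name and the ARN are hypothetical.

```python
def build_filter_request(name, dataset_group_arn, expression):
    """Build keyword arguments for a create_filter call."""
    # Collapse line breaks and repeated spaces into single spaces.
    single_line = " ".join(expression.split())
    return {
        "name": name,
        "datasetGroupArn": dataset_group_arn,
        "filterExpression": single_line,
    }


# Expression copied from the article.
EXPRESSION = """
Include ItemId WHERE Items.STATE IN ("Posted") AND
Items.IS_JOB_SUSPENDED IN ("false") AND
Items.IS_COMPANY_SUSPENDED IN ("false") AND
Items.JOB_PROMOTION_STATE in ("Posted")| Exclude ItemId
WHERE Interactions.event_type IN ("apply_job_complete")
"""

# Placeholder name and ARN for illustration only.
filter_request = build_filter_request(
    "promoted-jobs-only",
    "arn:aws:personalize:us-east-1:111122223333:dataset-group/example",
    EXPRESSION,
)
# With AWS credentials configured, the request could be sent with:
#   boto3.client("personalize").create_filter(**filter_request)
print(filter_request["filterExpression"])
```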
Now our training model is ready to give job recommendations for different users. Calling Amazon Personalize's GetRecommendations API with a user ID returns a list of job IDs. We can test the campaign by entering a random user ID, such as 123, in the console:
It returns a recommendation ID and a list of item ids with their scores.
Part 2: Build a data pipeline to connect with Amazon Personalize
Next, we build a data pipeline that fetches users' job recommendations and sends their data to our model at the same time. As mentioned at the beginning of the article, our goal is to recommend jobs that fit users' interests in near real-time. Hence, we need two features:
- Collect users' data and send it to Amazon Personalize to update our training model. Interaction data tells the training model what kind of jobs a user is interested in.
- Fetch a user's job recommendations from Amazon Personalize and show them when the user searches for jobs in Moovup.
Here are the steps we have for the data pipeline:
- When a user clicks or applies for a job, our website or mobile application sends an event to Firebase, which is linked with BigQuery.
- Create a scheduled job with Amazon EventBridge. Every 10 minutes, it fetches user, job, and interaction data from PostgreSQL and BigQuery. Then, we send the data in batches (10 records per batch) to Amazon Personalize by calling the PutEvents, PutItems and PutUsers APIs.
- Every two hours, Amazon Personalize automatically updates the latest model to include the new data we sent earlier.
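The batching step can be sketched as follows. This is an illustration under assumptions, not our production job: the row field names, the tracking ID and the "click" event type are hypothetical, and a small recording stub stands in for boto3.client("personalize-events") so the example runs without AWS access. Since a PutEvents call carries events for one user and one session, the sketch groups rows by user and session before splitting them into batches of 10.

```python
from collections import defaultdict
from datetime import datetime, timezone

BATCH_SIZE = 10  # batch size stated in the article


def chunked(records, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def send_interactions(events_client, tracking_id, interactions):
    """Send interactions in batches; return the number of put_events calls."""
    grouped = defaultdict(list)
    for row in interactions:
        grouped[(row["user_id"], row["session_id"])].append(row)

    calls = 0
    for (user_id, session_id), rows in grouped.items():
        for batch in chunked(rows):
            events_client.put_events(
                trackingId=tracking_id,
                userId=str(user_id),
                sessionId=str(session_id),
                eventList=[
                    {
                        "eventType": row["event_type"],
                        "itemId": str(row["job_id"]),
                        "sentAt": row["sent_at"],
                    }
                    for row in batch
                ],
            )
            calls += 1
    return calls


class RecordingClient:
    """Offline stand-in that records the size of each batch it receives."""

    def __init__(self):
        self.batch_sizes = []

    def put_events(self, **kwargs):
        self.batch_sizes.append(len(kwargs["eventList"]))


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
rows = [
    {"user_id": 123, "session_id": "s1", "job_id": n,
     "event_type": "click", "sent_at": now}
    for n in range(25)
]
client = RecordingClient()
total_calls = send_interactions(client, "hypothetical-tracking-id", rows)
print(total_calls, client.batch_sizes)
```

Here 25 interactions from one session are sent as three calls of 10, 10 and 5 events. The same chunked helper could feed PutItems and PutUsers calls.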
After a while, the user's interaction data is added to our model in Amazon Personalize. When they search for jobs in Moovup, we go through the steps below:
- Our server sends a request with the user ID through the GetRecommendations API to fetch a list of job recommendations. Amazon Personalize then returns a list of job IDs to our server.
- Our server loops through the IDs and gets the job details for each one.
- Our server sends the list of jobs with their details to the client. Now the user can see the jobs recommended by Amazon Personalize.
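The serving steps above can be sketched in a few lines. This is a simplified illustration rather than our server code: the function name, the fetch_job lookup, the in-memory job table and the campaign ARN are hypothetical, and a stub replaces boto3.client("personalize-runtime") so the example runs offline.

```python
def recommended_jobs(runtime_client, campaign_arn, user_id, fetch_job, num_results=10):
    """Fetch recommended job IDs, then look up the details of each job."""
    response = runtime_client.get_recommendations(
        campaignArn=campaign_arn,
        userId=str(user_id),
        numResults=num_results,
    )
    jobs = []
    for item in response["itemList"]:  # keep the ranking order returned
        job = fetch_job(item["itemId"])
        if job is not None:  # skip IDs that no longer exist in our database
            jobs.append(job)
    return jobs


class StubRuntime:
    """Offline stand-in that returns a fixed recommendation response."""

    def get_recommendations(self, **kwargs):
        return {
            "recommendationId": "stub-recommendation",
            "itemList": [
                {"itemId": "j2", "score": 0.9},
                {"itemId": "j9", "score": 0.5},
                {"itemId": "j1", "score": 0.1},
            ],
        }


# Hypothetical job table standing in for the database lookup.
JOB_TABLE = {
    "j1": {"id": "j1", "title": "Waiter"},
    "j2": {"id": "j2", "title": "Barista"},
}

jobs = recommended_jobs(
    StubRuntime(),
    "arn:aws:personalize:us-east-1:111122223333:campaign/example",
    123,
    JOB_TABLE.get,
)
print([job["title"] for job in jobs])
```

Skipping unknown IDs is a design choice in this sketch: a job may be deleted between model updates, and dropping it keeps the response usable.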
What if a new user accesses our application for the first time? They have not clicked or applied for any job in the application yet, which means the training model does not have their interaction data. In this case, Amazon Personalize will return a list of popular jobs instead of personalized job recommendations, as mentioned in the documentation:
For new users without interactions data, recommendations are initially for only popular items
Summary
The aim of integrating Amazon Personalize into our backend is to recommend jobs that match users' interests in near real-time. We achieved this by creating a training model in Amazon Personalize and building a data pipeline that gets job recommendations from Amazon Personalize for different users. As a result, we improved our users' job search experience and employers' recruitment efficiency in Moovup.