A Step-By-Step Guide to Building a Recommender System using AWS Personalize

Furkan Karadas
Published in Onyx Labs
5 min read · Dec 26, 2021

Customer satisfaction has always been a major concern for companies developing products. Recommender systems are widely used by music streaming platforms such as Apple Music and Spotify and by video services such as Prime Video and Netflix. They emerged to solve the problem of choice that arises when users are exposed to millions of items, and they aim to improve both user experience and customer satisfaction. For this reason, companies invest heavily in them. In 2014, Netflix employed 300 people to develop and improve its content recommendation systems and spent a total of 150 million dollars on them, according to ex-Chief Product Officer Neil Hunt, speaking at the 8th ACM Conference on Recommender Systems.

This article introduces AWS Personalize, which enables developers to build applications that use Machine Learning (ML) to deliver real-time personalized recommendations. We will also develop a movie recommendation system using the AWS SDK for Python.

What is AWS Personalize?

Amazon Personalize allows developers to create apps that use Machine Learning (ML) to provide real-time personalized recommendations, without requiring prior ML knowledge.

Amazon Personalize makes it simple for developers to create apps that provide a variety of personalization experiences, such as personalized product recommendations, personalized product re-ranking, and customized direct marketing. Amazon Personalize is a fully managed Machine Learning service that trains, tunes, and deploys custom ML models to deliver highly personalized recommendations.

Amazon Personalize sets up the necessary infrastructure and manages the entire machine learning pipeline, including data preparation, processing, identifying features, implementing the best algorithms, training, optimizing, and hosting the models.

How does it work?

AWS Personalize

The implementation part covers the details, so this section gives only a brief overview.

First, we prepare and preprocess our dataset. We then upload the dataset to an S3 bucket and configure AWS Personalize.

Example Movie Recommendation System with Python

1. Data Preparation

First, we must download the dataset that we will use in this example (download ml-latest-small.zip). This dataset has been collected from the MovieLens website. After downloading this data, open the ratings.csv and follow these steps:

  • Delete the rating column
  • Replace the header row with the following: USER_ID, ITEM_ID, TIMESTAMP
  • Save the ratings.csv file

Code:

import pandas as pd

# Paths are relative to the notebook location.
file_path = "../data/ml-latest-small/ratings.csv"
output_file_path = "../data/ratings.csv"
filename = "ratings.csv"

# Read only the columns we need; this drops the rating column.
df = pd.read_csv(file_path, usecols=["userId", "movieId", "timestamp"])
df = df.rename(columns={"userId": "USER_ID", "movieId": "ITEM_ID", "timestamp": "TIMESTAMP"})
df.to_csv(output_file_path, index=False)
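Before uploading, it can be worth sanity-checking the prepared file. The sketch below uses only the standard library to verify the header and count interactions per user. The thresholds (at least 1,000 interactions and at least 25 users with two or more interactions each) are assumptions based on the minimums described in the Amazon Personalize documentation and should be verified against the current docs. The function name summarize_interactions and the sample rows are illustrative only.

```python
import csv
import io
from collections import Counter

# Assumed minimums; check the current Amazon Personalize documentation.
MIN_INTERACTIONS = 1000
MIN_USERS_WITH_TWO = 25


def summarize_interactions(csv_file):
    """Count rows and users with two or more interactions in a prepared file."""
    reader = csv.DictReader(csv_file)
    expected = ["USER_ID", "ITEM_ID", "TIMESTAMP"]
    if reader.fieldnames != expected:
        raise ValueError("Unexpected header: {}".format(reader.fieldnames))
    per_user = Counter(row["USER_ID"] for row in reader)
    total = sum(per_user.values())
    active_users = sum(1 for n in per_user.values() if n >= 2)
    return {
        "interactions": total,
        "users_with_two_or_more": active_users,
        "meets_minimums": total >= MIN_INTERACTIONS
        and active_users >= MIN_USERS_WITH_TWO,
    }


# Tiny made-up sample; in practice pass open(output_file_path, newline="").
sample = io.StringIO("USER_ID,ITEM_ID,TIMESTAMP\n1,10,100\n1,11,101\n2,10,102\n")
summary = summarize_interactions(sample)
print(summary)
```

The three-row sample is far below the assumed minimums, so meets_minimums is False here; the full ml-latest-small ratings file is much larger.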

2. Create S3 Bucket and upload the dataset file

Note: We assume that you have installed the boto3 library and configured your AWS credentials. If you have not, you can do so by following this page.

We create an S3 bucket with the code block below:

import os
import boto3
from botocore.exceptions import ClientError

s3_bucket_name = "personalize-movie"
s3 = boto3.resource('s3')

try:
    s3.create_bucket(Bucket=s3_bucket_name)
    print("{} created.".format(s3_bucket_name))
except ClientError as error:
    print(error)
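One detail worth knowing: the call above passes only the bucket name, which works when the configured region is us-east-1. In other regions, S3 expects a CreateBucketConfiguration with a LocationConstraint. A minimal sketch of building the arguments follows; the helper name bucket_kwargs and the example region are assumptions for illustration.

```python
def bucket_kwargs(bucket_name, region):
    """Build keyword arguments for s3.create_bucket for a given region."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be sent as a constraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


print(bucket_kwargs("personalize-movie", "eu-west-1"))

# Possible usage with the resource created earlier:
# s3.create_bucket(**bucket_kwargs(s3_bucket_name, "eu-west-1"))
```

Bucket names are also globally unique, so "personalize-movie" may already be taken and you may need to choose a different name.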

Congratulations! 🥳 We created our first S3 bucket. Now we need to upload the dataset that we prepared earlier:

try:
    # The object key is the file name, so the file lands at s3://<bucket>/ratings.csv.
    s3.meta.client.upload_file(output_file_path, s3_bucket_name, os.path.basename(output_file_path))
    print("File uploaded")
except ClientError as error:
    print(error)

After uploading the dataset, we need to authorize Amazon Personalize to read the data in the S3 bucket.

import json

# Grant read access to the Amazon Personalize service only.
# A "*" principal would expose the bucket to everyone.
bucket_policy = {
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Principal": {"Service": "personalize.amazonaws.com"},
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::personalize-movie",
                "arn:aws:s3:::personalize-movie/*"
            ]
        }
    ]
}

try:
    put_bucket_policy_info = s3.meta.client.put_bucket_policy(Bucket=s3_bucket_name, Policy=json.dumps(bucket_policy))
    print(put_bucket_policy_info)
except ClientError as error:
    print(error)
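If you chose a different bucket name, the hard-coded ARNs in the policy need to change too. The following sketch derives the resource ARNs from the bucket name and names the Amazon Personalize service as the principal, so that only the service is granted read access. The helper name build_bucket_policy is an assumption for illustration.

```python
import json


def build_bucket_policy(bucket_name):
    """Return a bucket policy that lets Amazon Personalize read one bucket."""
    return {
        "Version": "2012-10-17",
        "Id": "PersonalizeS3BucketAccessPolicy",
        "Statement": [
            {
                "Sid": "PersonalizeS3BucketAccessPolicy",
                "Effect": "Allow",
                "Principal": {"Service": "personalize.amazonaws.com"},
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    "arn:aws:s3:::{}".format(bucket_name),
                    "arn:aws:s3:::{}/*".format(bucket_name),
                ],
            }
        ],
    }


policy_json = json.dumps(build_bucket_policy("personalize-movie"))
print(policy_json)
```

The resulting string can be passed as the Policy argument of put_bucket_policy.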

3. Create a role and put role policy

To use Amazon Personalize, we must create an AWS Identity and Access Management (IAM) service role and attach permission policies to it. Let’s do this:

role_name = 'personalize_user'
policy_name = 'AwsPersonalizePolicy'

# Trust policy: allows the Amazon Personalize service to assume this role.
role_policy = {
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Principal": {"Service": "personalize.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }
}

iam = boto3.client('iam')
response_role = iam.create_role(
    RoleName=role_name,
    AssumeRolePolicyDocument=json.dumps(role_policy)
)
print(response_role)

We should note the ‘Arn’ value in the printed response for later use. (We are using a Jupyter Notebook, so the ‘response_role’ variable remains available in later cells.)
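The code above creates the role and defines policy_name but does not show a permission policy being attached. One possible way to do that, sketched under the assumption that the role only needs read access to the dataset bucket, is an inline policy added with the IAM put_role_policy call. The helper name attach_s3_read_policy and the exact permission set are assumptions, not the author's confirmed setup.

```python
import json


def attach_s3_read_policy(iam_client, role_name, policy_name, bucket_name):
    """Attach an inline policy allowing the role to read one S3 bucket."""
    # Assumed permission set: read objects and list the dataset bucket.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    "arn:aws:s3:::{}".format(bucket_name),
                    "arn:aws:s3:::{}/*".format(bucket_name),
                ],
            }
        ],
    }
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy),
    )
    return policy


# Possible usage with the variables defined earlier:
# attach_s3_read_policy(iam, role_name, policy_name, s3_bucket_name)
```

Passing the client in as a parameter keeps the helper easy to try out with a stand-in object before running it against a real account.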

4. Importing training data

The initial step is to build a dataset schema. The schema allows Amazon Personalize to parse the training dataset.

dataset_schema = {
    "type": "record",
    "name": "Interactions",
    # The Avro key is "namespace" (singular).
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "TIMESTAMP",
            "type": "long"
        }
    ],
    "version": "1.0"
}

personalize = boto3.client('personalize')
create_schema_response = personalize.create_schema(name='personalize-ratings-schema', schema=json.dumps(dataset_schema))
print(create_schema_response)

Attention: The field names in the schema must match the column headers in the CSV file.
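A quick way to check this before importing is to compare the schema's field names with the CSV header row. The sketch below does so with the standard library; the helper name missing_schema_fields and the in-memory samples are illustrative assumptions.

```python
import csv
import io


def missing_schema_fields(schema, csv_file):
    """Return schema field names that are absent from the CSV header row."""
    header = next(csv.reader(csv_file))
    return [f["name"] for f in schema["fields"] if f["name"] not in header]


example_schema = {
    "fields": [{"name": "USER_ID"}, {"name": "ITEM_ID"}, {"name": "TIMESTAMP"}]
}
good = io.StringIO("USER_ID,ITEM_ID,TIMESTAMP\n1,10,100\n")
bad = io.StringIO("userId,movieId,timestamp\n1,10,100\n")

print(missing_schema_fields(example_schema, good))  # prints []
print(missing_schema_fields(example_schema, bad))   # prints ['USER_ID', 'ITEM_ID', 'TIMESTAMP']
```

With the real files, you would pass dataset_schema and open(output_file_path, newline="") instead of the samples.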

Next, we create a dataset group, which contains one or more datasets that Amazon Personalize can use for training.

response_dataset_group = personalize.create_dataset_group(name='personalize-dataset-group')
print(response_dataset_group)

🕐 Wait until the status is ACTIVE. To check the current status, run this code:

response_dataset_group_status = personalize.describe_dataset_group(
    datasetGroupArn=response_dataset_group['datasetGroupArn']
)['datasetGroup']['status']
print(response_dataset_group_status)
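Rather than re-running the status cell by hand, the check can be wrapped in a small polling loop. The sketch below is one possible helper, not part of the Personalize API: the name wait_until_active, the polling interval, and the rule that any status containing "FAILED" is treated as an error are assumptions. It takes a function that returns the current status string, so the same helper can be reused for every resource that has to become ACTIVE.

```python
import time


def wait_until_active(get_status, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll get_status() until it returns ACTIVE; raise on failure or timeout."""
    for _ in range(max_polls):
        status = get_status()
        if status == "ACTIVE":
            return status
        if "FAILED" in status:
            raise RuntimeError("Resource entered status {}".format(status))
        sleep(poll_seconds)
    raise TimeoutError("Resource did not become ACTIVE in time")


# Demonstration with made-up statuses and no real waiting.
statuses = iter(["CREATE PENDING", "CREATE IN_PROGRESS", "ACTIVE"])
print(wait_until_active(lambda: next(statuses), sleep=lambda seconds: None))

# Possible usage for the dataset group:
# wait_until_active(lambda: personalize.describe_dataset_group(
#     datasetGroupArn=response_dataset_group['datasetGroupArn']
# )['datasetGroup']['status'])
```

The sleep function is injectable only so the loop can be exercised quickly; in a notebook the default time.sleep is fine.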

We create an empty dataset and add it to the specified dataset group:

response_dataset = personalize.create_dataset(
    name='personalize-dataset',
    schemaArn=create_schema_response['schemaArn'],
    datasetGroupArn=response_dataset_group['datasetGroupArn'],
    datasetType='Interactions'
)
print(response_dataset)

We create a job that imports the training data from our data source (the Amazon S3 bucket) into the Amazon Personalize dataset.

response_dataset_job = personalize.create_dataset_import_job(
    jobName='PersonalizeJob',
    datasetArn=response_dataset['datasetArn'],
    dataSource={'dataLocation': 's3://{}/{}'.format(s3_bucket_name, filename)},
    roleArn=response_role['Role']['Arn']
)
print(response_dataset_job)

🕐 Wait until the status is ACTIVE. To check the current status, run this code:

response_dataset_import_job_status = personalize.describe_dataset_import_job(
    datasetImportJobArn=response_dataset_job['datasetImportJobArn']
)['datasetImportJob']['status']
print(response_dataset_import_job_status)

5. Create a solution

After you import your data, create a solution and solution version. The solution contains the configurations to train a model. A solution version is a trained model.

We should pay attention to the recipe when creating a solution. You can find recipe information at this link.

create_solution_response = personalize.create_solution(
    name='PersonalizeSolution',
    recipeArn='arn:aws:personalize:::recipe/aws-user-personalization',
    datasetGroupArn=response_dataset_group['datasetGroupArn']
)
print(create_solution_response)

🕐 Wait until the status is ACTIVE. To check the current status, run this code:

response_solution_status = personalize.describe_solution(
    solutionArn=create_solution_response['solutionArn']
)['solution']['status']
print(response_solution_status)

We have set up the training configuration. Next, we create a solution version, which trains the model:

create_solution_version = personalize.create_solution_version(
    solutionArn=create_solution_response['solutionArn'],
)
print(create_solution_version)

🕐 Wait until the status is ACTIVE. To check the current status, run this code:

response_solution_version_status = personalize.describe_solution_version(
    solutionVersionArn=create_solution_version['solutionVersionArn']
)['solutionVersion']['status']
print(response_solution_version_status)
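Once the solution version is ACTIVE, its offline metrics can be inspected before deployment. The sketch below assumes the get_solution_metrics call of the boto3 Personalize client, which returns a 'metrics' dictionary of metric names and values; the helper names fetch_solution_metrics and print_metrics are illustrative. Which metrics appear depends on the recipe.

```python
def fetch_solution_metrics(personalize_client, solution_version_arn):
    """Return (name, value) pairs of offline metrics, sorted by metric name."""
    response = personalize_client.get_solution_metrics(
        solutionVersionArn=solution_version_arn
    )
    return sorted(response["metrics"].items())


def print_metrics(metric_pairs):
    """Print one metric per line with four decimal places."""
    for name, value in metric_pairs:
        print("{}: {:.4f}".format(name, value))


# Possible usage with the variables defined earlier:
# print_metrics(fetch_solution_metrics(
#     personalize, create_solution_version['solutionVersionArn']))
```

Comparing these values across solution versions is one way to judge whether a retrained model is worth deploying.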

We trained a custom recommendation model for our dataset. Let’s deploy this solution.

6. Deployment

We can deploy it using a campaign after we train and evaluate our solution version. A campaign is an endpoint used to host a solution version and make recommendations to users.

response_campaing = personalize.create_campaign(
    name='PersonalizeCampaign',
    solutionVersionArn=create_solution_version['solutionVersionArn'],
    minProvisionedTPS=1
)
print(response_campaing)

🕐 Wait until the status is ACTIVE. To check the current status, run this code:

response_campaign_status = personalize.describe_campaign(
    campaignArn=response_campaing['campaignArn']
)['campaign']['status']
print(response_campaign_status)

The whole system is ready. Now let’s get recommendations for users from our system.

7. Get Recommendations

After creating a campaign, we can use this code to get recommendations:

user_id = 2
num_item = 5

personalize_runtime = boto3.client('personalize-runtime')
response_get_recommendation = personalize_runtime.get_recommendations(
    campaignArn=response_campaing['campaignArn'],
    userId=str(user_id),
    numResults=num_item
)

# Print each recommended item ID with its score.
for item in response_get_recommendation['itemList']:
    print(item['itemId'], "-", item['score'])

user_id: The ID of the user for whom we want recommendations.

num_item: The number of recommendations we want the model to return.

A list of recommended items and scores for the user is displayed in the console.
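The output contains only item IDs, which are MovieLens movie IDs. To make the list readable, the IDs can be joined with the titles in the movies.csv file that ships in ml-latest-small (columns movieId, title, genres). A minimal sketch follows; the helper names load_titles and with_titles, the sample row, and the sample scores are illustrative assumptions.

```python
import csv
import io


def load_titles(movies_csv):
    """Map movieId (as a string) to title from a MovieLens movies.csv file."""
    return {row["movieId"]: row["title"] for row in csv.DictReader(movies_csv)}


def with_titles(item_list, titles):
    """Pair each recommended item with its title, if known."""
    return [
        (item["itemId"], titles.get(item["itemId"], "<unknown>"), item.get("score"))
        for item in item_list
    ]


# Made-up stand-ins; in practice open ../data/ml-latest-small/movies.csv
# and pass response_get_recommendation['itemList'].
movies = io.StringIO("movieId,title,genres\n1,Toy Story (1995),Adventure|Animation\n")
items = [{"itemId": "1", "score": 0.5}, {"itemId": "999", "score": 0.1}]
for item_id, title, score in with_titles(items, load_titles(movies)):
    print(item_id, "-", title, "-", score)
```

Item IDs come back from the campaign as strings, which is why the lookup keys are kept as strings rather than converted to integers.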

Summary

In this article, we briefly introduced AWS Personalize and developed a movie recommendation system. If you want to learn more about AWS Personalize and explore different use cases, you can find detailed information on the AWS Personalize documentation website.

Follow ONYX Labs Medium Blog for more content like this!
