Story recommendation with AWS Personalize

Mahfudh Junaryanto
Published in cloudstory
3 min read · Jun 26, 2019

CloudStory has finally reached the fun part: Machine Learning (yay!). After weeks of streamlining story ingestion, tagging, and cleansing, we are now reasonably confident that our data quality is good enough for Machine Learning. In case you need a little intro, https://blogs.cloudstory.dev is a tech news aggregator built entirely on the Serverless framework and managed databases from Google (Firebase). Quick backstory here: https://stackshare.io/cloudstory/decisions/10226439048561471

First things first: the key decisions.

Google Recommendation AI vs AWS Personalize

After spending a couple of minutes reviewing the documentation, it was clear that Google's solution is designed for e-commerce. The narrative, terms, and schema are pretty much locked in to an e-commerce system. On top of that, it is still in beta, so there is a high chance it will change in the future. AWS Personalize, meanwhile, is out of beta and generally available. Furthermore, the documentation is fantastic, as is typical for AWS. In one night I understood it well enough to design the whole solution, including the Lambda code the design requires.

Daily data ingestions

We can easily adjust our scheduler, but daily seems good enough. We do not need real-time recommendations: we prefer quality over frequency, and ML needs time to learn.

Ingestion by S3 rather than API call

We feel that an asynchronous, file-based approach is more scalable and operationally more manageable when failures occur. CloudWatch and SNS have been good friends to our operations team so far.

Monthly ML training

With our limited understanding of ML, we decided to retrain monthly and fine-tune our recipes (if necessary) to see whether the new version improves on the previous one. We can then decide whether to use the new version for the Campaign. According to the documentation, Personalize can learn from new event data automatically, so we may be able to skip this step entirely after a couple of months.

The Solution

In a nutshell, here is what the data flow and tooling look like:

From the client's perspective, the only change required is to send click data to the backend. Since the client already uses the Firebase SDK, we might as well use it to send click events to the Firebase database. The click data covers opening, reading, commenting on, liking, and saving a story. The rest of the data (Stories and Users) is generated automatically on the backend via a Firestore database trigger.
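To make the click tracking concrete, here is a minimal sketch of turning one click event into an Amazon Personalize interactions record. The event field names (`uid`, `storyId`, `action`, `ts`) are assumptions for illustration; they are not from our actual client code.

```python
# Personalize's interactions dataset expects USER_ID, ITEM_ID, TIMESTAMP
# (and optionally EVENT_TYPE), with the timestamp in Unix epoch seconds.
INTERACTIONS_HEADER = ["USER_ID", "ITEM_ID", "EVENT_TYPE", "TIMESTAMP"]

# The five click types the app tracks.
TRACKED_EVENTS = {"open", "read", "comment", "like", "save"}

def to_interaction_row(event):
    """Convert one Firebase click-event document into a CSV row,
    or return None if the event type is not one we track."""
    if event.get("action") not in TRACKED_EVENTS:
        return None
    return [
        str(event["uid"]),
        str(event["storyId"]),
        event["action"],
        str(int(event["ts"])),  # epoch seconds
    ]
```

Keeping this mapping as a pure function makes it easy to unit-test before any cloud plumbing is involved.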

Every day, the ML Data Job (a Firebase function) is triggered by the Pub/Sub scheduler. The job consolidates the data, exports it to CSV files, and uploads them to S3 and GCS. The GCS upload is just a backup and does not affect the solution.
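The export step can be sketched as follows: render the consolidated rows into a CSV in memory, then hand it to S3. The bucket and key names are placeholders, and the upload helper assumes a boto3 client; both are assumptions, not values from our actual job.

```python
import csv
import io

def build_interactions_csv(rows):
    """Render interaction rows into the CSV format Personalize imports."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["USER_ID", "ITEM_ID", "EVENT_TYPE", "TIMESTAMP"])
    writer.writerows(rows)
    return buf.getvalue()

def upload_to_s3(csv_body, bucket="cloudstory-personalize",
                 key="interactions/daily.csv"):
    # boto3 is imported here (not at module level) so the pure CSV logic
    # above stays testable without AWS credentials. Bucket/key are
    # placeholder names.
    import boto3
    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=key, Body=csv_body.encode("utf-8"))
```

The same CSV body can be written to GCS for the backup copy.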

The S3 file upload triggers a Lambda function that creates a new dataset import and loads it into the existing Dataset Group.
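A minimal sketch of that Lambda, assuming an S3 put-event trigger. The dataset ARN, role ARN, and job-name pattern are placeholders (assumptions), not values from our actual deployment.

```python
import time

def import_job_name(key, now=None):
    """Derive a unique, readable job name from the uploaded object key."""
    stamp = time.strftime("%Y%m%d", time.gmtime(now))
    return "cloudstory-import-{}-{}".format(
        key.replace("/", "-").replace(".csv", ""), stamp)

def lambda_handler(event, context):
    # boto3 imported lazily so the helper above is testable offline.
    import boto3
    personalize = boto3.client("personalize")
    record = event["Records"][0]["s3"]
    bucket, key = record["bucket"]["name"], record["object"]["key"]
    # Start an import job that loads the freshly uploaded CSV into the
    # interactions dataset of the existing Dataset Group.
    personalize.create_dataset_import_job(
        jobName=import_job_name(key),
        datasetArn="arn:aws:personalize:...:dataset/cloudstory/INTERACTIONS",  # placeholder
        dataSource={"dataLocation": "s3://{}/{}".format(bucket, key)},
        roleArn="arn:aws:iam::...:role/PersonalizeS3Access",  # placeholder
    )
```

The IAM role must grant Personalize read access to the bucket, which is worth setting up before the first trigger fires.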

A CloudWatch scheduled event triggers the training job every month, so we can learn from and act on the quality of the model it produces.
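The monthly retrain-and-compare step could look roughly like this: create a new solution version, then compare its offline metrics against the current one before promoting it to the campaign. The metric name and the ARNs are assumptions for illustration.

```python
def is_improvement(new_metrics, old_metrics,
                   metric="normalized_discounted_cumulative_gain_at_25"):
    """Promote the new version only if the chosen offline metric improved.
    The metric key is an assumption; pick whichever Personalize metric
    matters most for your use case."""
    return new_metrics.get(metric, 0.0) > old_metrics.get(metric, 0.0)

def retrain(solution_arn):
    # boto3 imported lazily so is_improvement() stays testable offline.
    import boto3
    personalize = boto3.client("personalize")
    version = personalize.create_solution_version(solutionArn=solution_arn)
    return version["solutionVersionArn"]
```

Training runs asynchronously, so the metrics comparison has to happen once the new solution version reaches ACTIVE status.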

On the consumption side, the Recommender is a Google Cloud Function responsible for populating user recommendations. The function calls the AWS Personalize API and writes the recommended stories into each user's feed. The feed is subsequently displayed as recommended stories on the CloudStory homepage.
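A sketch of the Recommender's core: fetch recommendations from the Personalize runtime and turn them into feed entries. The campaign ARN and the feed-document shape are assumptions, not from our actual function (which would also write the entries to Firestore).

```python
def to_feed_entries(response, source="personalize"):
    """Extract recommended story IDs from a GetRecommendations response
    and shape them as feed documents (shape is a placeholder)."""
    return [
        {"storyId": item["itemId"], "source": source}
        for item in response.get("itemList", [])
    ]

def recommend_for_user(user_id, campaign_arn):
    # boto3 imported lazily so to_feed_entries() stays testable offline.
    import boto3
    runtime = boto3.client("personalize-runtime")
    response = runtime.get_recommendations(
        campaignArn=campaign_arn, userId=user_id, numResults=10
    )
    return to_feed_entries(response)
```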

We are new to ML, but AWS Personalize allowed us to put together a solution for our experiments. We welcome all feedback.

To end this story, we have a very important announcement. We intend to make the dataset public so everyone can use it in their own ML experiments. But before that can happen, we need at least 1,000 users with some interactions in the app. So start using CloudStory today, and invite your friends. Thank you!
