E-Commerce Use Case, Batch Data Import
Online content is growing faster than ever. Because the time an average user can spend engaging with an application is limited, recommendation engines are becoming a must-have for businesses such as video streaming platforms, e-commerce sites, online food delivery applications, and many others.
In essence, recommendation is a type of pre-filtering (the Netflix homepage) or ranking (ordering search results by relevance) of content. Pre-filtering makes the content fit in a single view; ranking puts what the user is searching for as close to the top as possible, so that less scrolling is needed. For a long time, the field was dominated by collaborative filtering (often implemented with matrix factorization) and content-based filtering. Modern recommenders are model-based or hybrid architectures (e.g. Neural Collaborative Filtering) that can leverage both the collaboration of users (the information carried across users in the interaction data) and other features describing items (type of item, the context of the event, etc.) and users (demographic data, profile status).
AWS Personalize is a fully managed service provided by Amazon for deploying recommendation systems without having to design and implement the whole model development, deployment, and MLOps cycle from the ground up. Even with no ML experience, you can quickly create a use-case-optimized or custom recommendation system that ingests historical batch data and real-time data and, once online, provides batch and real-time recommendations, while the whole service is continuously monitored and maintained for you. AWS Personalize also handles provisioning and maintaining the infrastructure, so you need only minimal cloud infrastructure skills.
AWS Personalize does not only provide hybrid, model-based recommendation systems with AutoML; it also offers a diverse set of features that can benefit any online application that wants to personalize its users’ experience:
- User Segmentation
- Targeted Campaigns
- Rule Based Recommendations like: Customers who viewed X also viewed (e-commerce use case) / Frequently bought together (e-commerce use case) / Because you watched X (video streaming use case)
Using AWS Personalize
The first step in using this service is deciding which of the two types of dataset groups fits your application.
Domain Optimized Dataset Groups:
This is the faster of the two ways to use AWS Personalize, and the one with the least overhead. You only match each of your use cases to one of the predefined use cases within the two main business/platform types:
VIDEO_ON_DEMAND and ECOMMERCE
With this option, AWS Personalize manages the resources, and you only define the correct schema and upload data (bulk, batch, and stream).
Custom Dataset Group
Here you handle:
- Provisioning a minimum TPS (transactions per second) to sustain real-time recommendations (you are charged more if traffic surpasses the provisioned amount)
- Choosing a suitable recipe from the AWS Personalize custom recipes
- Estimating and setting up the cycles for retraining models and updating data, so that models use the freshest data
But you still don't manage infrastructure provisioning and maintenance.
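As a rough illustration of the first item, the minimum provisioned TPS is set when a campaign is created for a custom dataset group. The sketch below assumes a boto3 `personalize` client is passed in and that a trained solution version already exists; the helper name and its default value are hypothetical.

```python
from typing import Any


def create_campaign_with_min_tps(
    personalize: Any, name: str, solution_version_arn: str, min_tps: int = 1
) -> str:
    """Create a campaign with a minimum provisioned TPS and return its ARN.

    `personalize` is expected to behave like boto3.client("personalize").
    It is passed in so the function can also be exercised with a stub.
    """
    if min_tps < 1:
        raise ValueError("min_tps must be at least 1")
    response = personalize.create_campaign(
        name=name,
        solutionVersionArn=solution_version_arn,
        minProvisionedTPS=min_tps,
    )
    return response["campaignArn"]
```

With boto3 installed and credentials configured, the call would look like `create_campaign_with_min_tps(boto3.client("personalize"), "my-campaign", solution_version_arn)`, where both the campaign name and the ARN are placeholders.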
Creating an E-commerce Use Case Optimized Recommender & Importing Historical Batch Data
For e-commerce businesses, you can create an e-commerce optimized dataset group from the AWS console.
Click continue and define the dataset schema. The required fields are straightforward but differ slightly for each use case.
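The same dataset group can also be created programmatically. The following is a minimal sketch, assuming a boto3 `personalize` client is passed in; the helper name is hypothetical, and `"ECOMMERCE"` is the domain value named earlier in this post.

```python
from typing import Any

ECOMMERCE_DOMAIN = "ECOMMERCE"  # the other domain value is "VIDEO_ON_DEMAND"


def create_ecommerce_dataset_group(personalize: Any, name: str) -> str:
    """Create a domain dataset group for e-commerce and return its ARN.

    `personalize` is expected to behave like boto3.client("personalize").
    """
    response = personalize.create_dataset_group(name=name, domain=ECOMMERCE_DOMAIN)
    return response["datasetGroupArn"]
```

Passing the client in rather than creating it inside the function keeps the helper easy to test without AWS credentials.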
The default way to feed data into AWS Personalize is to:
- Import a large batch of bulk data to ingest historical data when you create the recommender for the first time.
- Import more batch data incrementally over time (programmatically or manually from the console), though probably in much smaller batches, to keep a fresh picture of what’s happening on the platform.
- Stream real-time data into Personalize to capture the latest user activity and optimize recommendations with respect to these events.
After creating a group, you import three datasets: interactions, users, and items (Personalize itself only requires the interactions dataset, but the metadata in the other two gives the models more to work with).
The interactions dataset is most probably the largest (longest, i.e. with the most rows) dataset you will let AWS Personalize ingest. It contains every event describing users’ interactions that you have recorded in the past, and that you will keep recording and providing to AWS Personalize in order to produce accurate recommendations. From the console, select the dataset group and click “Import interaction data”.
Note: When you import batch data from an S3 bucket, AWS Personalize needs read access to that bucket. Grant the necessary read permissions both to the IAM role you create for AWS Personalize and in the permission policy of the S3 bucket itself.
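A bucket policy along these lines is a common way to grant that read access. This is only a sketch: the bucket name is a placeholder, the helper name is hypothetical, and the policy should be checked against your own security requirements before use.

```python
import json


def personalize_bucket_policy(bucket_name: str) -> str:
    """Return a JSON bucket policy that lets Personalize read from the bucket."""
    policy = {
        "Version": "2012-10-17",
        "Id": "PersonalizeS3BucketAccessPolicy",
        "Statement": [
            {
                "Sid": "PersonalizeS3BucketAccessPolicy",
                "Effect": "Allow",
                # The Personalize service principal is the reader.
                "Principal": {"Service": "personalize.amazonaws.com"},
                # Read-only: list the bucket and fetch objects.
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

The resulting string can be pasted into the bucket's permissions tab in the S3 console or applied with the S3 `put_bucket_policy` call.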
For the e-commerce interactions dataset, the required fields are:
- USER_ID: The ID of the user
- ITEM_ID: The ID of the item
- TIMESTAMP: The time of the recorded event
- EVENT_TYPE*: The type of the event (e.g. view, watch, click, like, purchase)
*You can configure AWS Personalize to train models using only certain types of events, or to weight certain events higher than others. By default, Personalize trains models with equal weights on all types.
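To make the required fields concrete, here is a small sketch that turns recorded events into a CSV file ready for upload to S3. It assumes events are dictionaries with the hypothetical keys `user_id`, `item_id`, `occurred_at` (a timezone-aware `datetime`), and `event_type`; TIMESTAMP is written as Unix epoch seconds, which is the format Personalize expects.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Any, Iterable, Mapping

REQUIRED_COLUMNS = ["USER_ID", "ITEM_ID", "TIMESTAMP", "EVENT_TYPE"]

# Hypothetical sample event, used only for illustration.
SAMPLE_EVENTS = [
    {
        "user_id": "u1",
        "item_id": "i42",
        "occurred_at": datetime(2023, 1, 1, tzinfo=timezone.utc),
        "event_type": "view",
    }
]


def interactions_to_csv(events: Iterable[Mapping[str, Any]]) -> str:
    """Serialize events to CSV with the four required interaction columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REQUIRED_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for event in events:
        occurred_at = event["occurred_at"]
        if occurred_at.tzinfo is None:
            # Naive datetimes would be interpreted in local time.
            raise ValueError("occurred_at must be timezone-aware")
        writer.writerow(
            {
                "USER_ID": event["user_id"],
                "ITEM_ID": event["item_id"],
                "TIMESTAMP": int(occurred_at.timestamp()),  # epoch seconds
                "EVENT_TYPE": event["event_type"],
            }
        )
    return buffer.getvalue()
```

Calling `interactions_to_csv(SAMPLE_EVENTS)` yields a header row followed by `u1,i42,1672531200,view`.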
The reserved fields (the meaning of these fields is predefined by AWS):
- EVENT_VALUE*: The strength or significance of the event, represented by a scalar value. When the event type is ‘watch’, for example, this is the percentage of the video watched.
*Just as with EVENT_TYPE, you can configure AWS Personalize to train models using only events with an EVENT_VALUE above a threshold.
- IMPRESSION: The items that were shown to the user when the event occurred, which gives Personalize feedback on its own recommendations. Personalize uses this to decide how to balance exploration* against recommending items it already knows to be relevant.
*Exploration is an important aspect of recommendation systems, and AWS Personalize anticipates and manages it for you. Basically, it is the need to show users items that haven’t received much interaction yet, so that these items also get a window of convenient availability (which is the main motivation of recommendation systems). This is necessary so that the whole system (the application and its users, taken together) does not get stuck in the loop of ‘users interact only with popular items, so only popular items get recommended’.
Finally, you can also add contextual fields, which you capture when recording the events and which you believe contain information about the users’ experience, such as:
- DEVICE_TYPE: Type of the device user is interacting from
- LOCATION: Location of the user (not in a spatial format, only strings like “Berlin”)
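Put together, the fields above could be expressed as an Avro-style schema like the one below. This is a sketch under assumptions: the nullable unions and the `categorical` flags on the contextual fields should be checked against the current schema requirements for the ECOMMERCE domain, and the helper names are hypothetical. In bulk files, IMPRESSION values are item IDs joined with a vertical bar, e.g. `i1|i2|i3`.

```python
import json
from typing import Any


def ecommerce_interactions_schema() -> str:
    """Return a JSON schema string covering the fields discussed above."""
    schema = {
        "type": "record",
        "name": "Interactions",
        "namespace": "com.amazonaws.personalize.schema",
        "fields": [
            # Required fields
            {"name": "USER_ID", "type": "string"},
            {"name": "ITEM_ID", "type": "string"},
            {"name": "TIMESTAMP", "type": "long"},
            {"name": "EVENT_TYPE", "type": "string"},
            # Reserved fields (nullable here by assumption)
            {"name": "EVENT_VALUE", "type": ["null", "float"]},
            {"name": "IMPRESSION", "type": ["null", "string"]},
            # Contextual fields (flagged categorical by assumption)
            {"name": "DEVICE_TYPE", "type": ["null", "string"], "categorical": True},
            {"name": "LOCATION", "type": ["null", "string"], "categorical": True},
        ],
        "version": "1.0",
    }
    return json.dumps(schema)


def register_schema(personalize: Any, name: str, schema_json: str) -> str:
    """Register the schema for the ECOMMERCE domain and return its ARN.

    `personalize` is expected to behave like boto3.client("personalize").
    """
    response = personalize.create_schema(
        name=name, schema=schema_json, domain="ECOMMERCE"
    )
    return response["schemaArn"]
```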
In addition to the interactions data, you provide two more datasets, users and items.
The users dataset is a table of users indexed by a unique user_id, with the respective demographic or otherwise fairly static data.
- USER_ID: A unique ID per user
- At least 1 metadata field (there needs to be at least one categorical feature describing users)
You can also add more fields.
The items dataset holds any numerical or categorical data you can provide that describes each item. Fields can be like:
- GENRE (in case of music or video streaming)
- PRODUCT_TYPE (music instrument, home appliance)
- PRODUCT_DESCRIPTION: AWS Personalize can mine unstructured text and derive meaningful features from it. To enable this, create a ‘string’ field like this one and set its ‘textual’ attribute to true.
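The users and items schemas might be sketched as follows. The field `MEMBERSHIP_STATUS` is a hypothetical example of a categorical user feature, the item fields reuse the names from the list above, and the ECOMMERCE domain may require further fields, so treat this as a starting point to compare against the domain schema requirements.

```python
import json

NAMESPACE = "com.amazonaws.personalize.schema"


def users_schema() -> str:
    """Users: a unique ID plus at least one metadata field."""
    schema = {
        "type": "record",
        "name": "Users",
        "namespace": NAMESPACE,
        "fields": [
            {"name": "USER_ID", "type": "string"},
            # Hypothetical categorical feature describing the user.
            {"name": "MEMBERSHIP_STATUS", "type": "string", "categorical": True},
        ],
        "version": "1.0",
    }
    return json.dumps(schema)


def items_schema() -> str:
    """Items: a unique ID, a categorical field, and a free-text field."""
    schema = {
        "type": "record",
        "name": "Items",
        "namespace": NAMESPACE,
        "fields": [
            {"name": "ITEM_ID", "type": "string"},
            {"name": "PRODUCT_TYPE", "type": "string", "categorical": True},
            # "textual": true asks Personalize to extract features from the text.
            {
                "name": "PRODUCT_DESCRIPTION",
                "type": ["null", "string"],
                "textual": True,
            },
        ],
        "version": "1.0",
    }
    return json.dumps(schema)
```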
After you create the schemas correctly and point the dataset import job to the correct locations of the data within your S3 buckets, AWS Personalize will ingest the datasets, train the models, and be ready for:
- Getting recommendations through REST API calls such as GetRecommendations
- Ingesting new data incrementally with PutEvents API calls, or in bulk with CreateDatasetImportJob API calls.
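The three API calls named above map onto boto3 roughly as sketched here. The clients are passed in (they would come from `boto3.client("personalize-runtime")`, `boto3.client("personalize")`, and `boto3.client("personalize-events")` respectively), all ARNs and IDs are placeholders, the helper names are hypothetical, and `tracking_id` is assumed to come from an event tracker created beforehand.

```python
from datetime import datetime, timezone
from typing import Any, List


def recommend_item_ids(
    runtime: Any, recommender_arn: str, user_id: str, num_results: int = 10
) -> List[str]:
    """GetRecommendations: return recommended item IDs for a user."""
    response = runtime.get_recommendations(
        recommenderArn=recommender_arn, userId=user_id, numResults=num_results
    )
    return [item["itemId"] for item in response["itemList"]]


def start_bulk_import(
    personalize: Any, job_name: str, dataset_arn: str, s3_uri: str, role_arn: str
) -> str:
    """CreateDatasetImportJob: start a bulk import from S3, return the job ARN."""
    response = personalize.create_dataset_import_job(
        jobName=job_name,
        datasetArn=dataset_arn,
        dataSource={"dataLocation": s3_uri},
        roleArn=role_arn,
    )
    return response["datasetImportJobArn"]


def record_event(
    events: Any,
    tracking_id: str,
    user_id: str,
    session_id: str,
    item_id: str,
    event_type: str,
) -> None:
    """PutEvents: send a single interaction event."""
    events.put_events(
        trackingId=tracking_id,
        userId=user_id,
        sessionId=session_id,
        eventList=[
            {
                "eventType": event_type,
                "itemId": item_id,
                "sentAt": datetime.now(timezone.utc),
            }
        ],
    )
```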
For real-time data ingestion, a complementary piece of infrastructure, such as a Kinesis producer or a Kafka service, needs to stream events into AWS Personalize; we will not go into this in this blog post.