How to build and implement Personalised sorting

AnkitG
Jul 19, 2017


In my last post, I shared a quick and easy approach to improving sort order.

As promised, in this post I will explain V3: the implementation of personalised sort at the individual-user (n=1) level, at scale.

First, here is the process in short:

For each defined category of items (e.g. shoes, sports news), map all your items/products into MECE (mutually exclusive and collectively exhaustive) groups. Sort the items in each group by popularity (define popularity yourself, or take inspiration from my previous post).
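A quick way to sanity-check the grouping step is to verify that the groups really are MECE (every item in the category appears in exactly one group) before sorting each group by popularity. The sketch below assumes groups are plain dicts of item IDs and that popularity is a precomputed score per item; the function names `is_mece` and `sort_groups_by_popularity` are hypothetical.

```python
from typing import Dict, Iterable, List


def is_mece(category_items: Iterable[str], groups: Dict[str, List[str]]) -> bool:
    """True if every category item appears in exactly one group and no group
    contains an item from outside the category."""
    seen = set()
    for members in groups.values():
        for item in members:
            if item in seen:  # same item listed twice: not mutually exclusive
                return False
            seen.add(item)
    # equal sets: collectively exhaustive, with nothing foreign
    return seen == set(category_items)


def sort_groups_by_popularity(groups: Dict[str, List[str]],
                              popularity: Dict[str, float]) -> Dict[str, List[str]]:
    """Sort each group's items by descending popularity (unknown items score 0)."""
    return {
        group_id: sorted(members, key=lambda item: popularity.get(item, 0.0), reverse=True)
        for group_id, members in groups.items()
    }


items = ["s1", "s2", "s3", "s4"]
groups = {"running": ["s1", "s3"], "formal": ["s2", "s4"]}
popularity = {"s1": 0.2, "s2": 0.5, "s3": 0.9, "s4": 0.1}
print(is_mece(items, groups))                         # True
print(sort_groups_by_popularity(groups, popularity))  # {'running': ['s3', 's1'], 'formal': ['s2', 's4']}
```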

When a user visits a product category (e.g. sports news or shoes), display items from the product groups the user has an affinity for, with each group's items in the default popularity sort order.
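A minimal sketch of that page assembly, assuming each group is already in popularity order and that the user's affinity for each group is a numeric score (both inputs and the function name `build_category_page` are hypothetical). Groups are ordered by descending affinity, and groups with no affinity keep their default order at the end.

```python
from typing import Dict, List


def build_category_page(sorted_groups: Dict[str, List[str]],
                        group_affinity: Dict[str, float]) -> List[str]:
    """Concatenate product groups, highest user affinity first.

    Items keep their popularity order inside each group. Groups the user has no
    affinity for score 0 and, because sorted() is stable (also with
    reverse=True), keep their default relative order.
    """
    ranked = sorted(sorted_groups, key=lambda g: group_affinity.get(g, 0.0), reverse=True)
    page: List[str] = []
    for group_id in ranked:
        page.extend(sorted_groups[group_id])
    return page


sorted_groups = {"running": ["s3", "s1"], "formal": ["s2", "s4"], "sandals": ["s5"]}
print(build_category_page(sorted_groups, {"formal": 12.0, "sandals": 3.0}))
# ['s2', 's4', 's5', 's3', 's1']
```

With an empty affinity map the function falls back to the plain default page, so users without personalisation data can go through the same code path.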

Now let me explain the whole process in detail.

What is an item sort order? (An item can be a news article or a shoe.)

It is the logical arrangement of items on a landing page, which gives users easy access to the items that interest them. For the best user experience, a website should showcase the most preferred products at the top of the page to drive higher conversion (whatever conversion means for you: a purchase, a read, a call to action, etc.).

It is worth reading about soft-bounded and hard-bounded sort orders and how they differ.

Typical sort orders are:

1. Order defined by the site owner (platforms offer limited flexibility and customisation).
2. Date added.
3. Price (low to high, high to low).
4. Name (A to Z, Z to A).

Why is sort order important?

Every website owner wants to show the most profitable items to users first, without hurting the conversion rate. I could write a whole post on the advantages, but in short, a good sort order helps with:

  1. Managing visibility
  2. Improving conversion
  3. Improving profitability
  4. Controlling sales
  5. Building the brand

How to create a personalised sort order

Please note that some of the points are specific to eCommerce, but they can be adapted to other domains as well.

Tech Stack

We used AWS to store clickstream logs, and the refined user-interaction maps were stored in MongoDB.

Technical details and flow

There are two main data streams:

  1. Product (or item) stream: product details (inventory type, inventory levels, and product attributes such as image, description, category, etc.). A typical model refresh cycle is once every two hours.

The first output of this stream is product groups: groups of similar items. Similarity techniques such as word2vec, cosine similarity and collaborative filtering are used to identify products that users perceive as equivalent. Each such set of products is called a product group (typically 25–40 products).
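The post does not say how the similarity scores are turned into groups. One simple option, shown here as an assumption rather than the method used in this project, is single-pass leader clustering over item vectors (for example word2vec embeddings of product descriptions): each item joins the first group whose seed item has cosine similarity at or above a threshold, otherwise it starts a new group. The `threshold` and `max_size` values are hypothetical; `max_size` mirrors the 25–40 product group size.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_by_similarity(vectors: Dict[str, Sequence[float]],
                        threshold: float = 0.8,
                        max_size: int = 40) -> List[List[str]]:
    """Single-pass leader clustering: each item joins the first non-full group
    whose seed (first member) is similar enough, otherwise it seeds a new group."""
    groups: List[List[str]] = []
    seeds: List[Sequence[float]] = []
    for item, vec in vectors.items():
        for idx, seed in enumerate(seeds):
            if len(groups[idx]) < max_size and cosine(vec, seed) >= threshold:
                groups[idx].append(item)
                break
        else:  # no group matched: this item seeds a new group
            groups.append([item])
            seeds.append(vec)
    return groups


vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.95]}
print(group_by_similarity(vectors))  # [['a', 'b'], ['c', 'd']]
```

Because each item is placed in exactly one group, the output is MECE by construction; the trade-off is that results depend on item order, which a proper clustering algorithm would avoid.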

The second output of this stream is product scores: a popularity score for each product. An exploit (CTR, sales, orders, ...) and explore (introduction_date, inventory_type, ...) mechanism is used to identify popular products. Please read this to understand more about the explore/exploit mechanism (we used a multi-armed bandit). The products in every product group are then sorted by this popularity score.
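The exact bandit variant is not specified in the post. As an illustration, the sketch below substitutes UCB1, a standard multi-armed bandit rule, with CTR as the exploit term and an exploration bonus that shrinks as a product collects impressions; it does not model the explore features named above (introduction_date, inventory_type). The `(clicks, impressions)` input format and the function names are assumptions.

```python
import math
from typing import Dict, List, Tuple


def ucb1_scores(stats: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """UCB1 score per product from (clicks, impressions).

    Exploit: observed CTR. Explore: sqrt(2 * ln(total impressions) / impressions),
    which shrinks as the product is shown more. Never-shown products get +inf so
    they are surfaced at least once.
    """
    total = sum(impressions for _, impressions in stats.values())
    scores: Dict[str, float] = {}
    for product_id, (clicks, impressions) in stats.items():
        if impressions == 0:
            scores[product_id] = math.inf
        else:
            ctr = clicks / impressions
            bonus = math.sqrt(2 * math.log(total) / impressions)
            scores[product_id] = ctr + bonus
    return scores


def sort_group(group: List[str], scores: Dict[str, float]) -> List[str]:
    """Order one product group by descending popularity score."""
    return sorted(group, key=lambda pid: scores.get(pid, 0.0), reverse=True)


stats = {"p1": (50, 1000), "p2": (5, 20), "p3": (0, 0)}
print(sort_group(["p1", "p2", "p3"], ucb1_scores(stats)))  # ['p3', 'p2', 'p1']
```

Here p2 outranks p1 despite far fewer clicks because its small sample earns a large exploration bonus; as impressions accumulate, the ranking converges towards observed CTR.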

2. User stream: user interactions with the website and its products. Specific events can be weighted accordingly (e.g. x for a product page visit, 2x for add to wishlist, 3x for save for later, 5x for add to cart, and 10x for a transaction).
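A sketch of the weighting, assuming events arrive as `(event_type, product_id)` pairs and that a product-to-group lookup comes from the product stream. The event names and `group_affinity` are hypothetical; the weights are the x/2x/3x/5x/10x example with x = 1. Summing weighted events per product group gives a per-user affinity score.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Relative weights from the example above, with x = 1 (event names are hypothetical).
EVENT_WEIGHTS: Dict[str, float] = {
    "product_view": 1,
    "add_to_wishlist": 2,
    "save_for_later": 3,
    "add_to_cart": 5,
    "transaction": 10,
}


def group_affinity(events: Iterable[Tuple[str, str]],
                   product_to_group: Dict[str, str]) -> Dict[str, float]:
    """Sum weighted events per product group for one user; unknown events or
    products are ignored."""
    affinity: Dict[str, float] = defaultdict(float)
    for event_type, product_id in events:
        group = product_to_group.get(product_id)
        weight = EVENT_WEIGHTS.get(event_type, 0)
        if group is not None and weight:
            affinity[group] += weight
    return dict(affinity)


events = [("product_view", "s1"), ("add_to_cart", "s1"),
          ("product_view", "s2"), ("transaction", "s4")]
product_to_group = {"s1": "running", "s2": "formal", "s4": "formal"}
print(group_affinity(events, product_to_group))  # {'running': 6.0, 'formal': 11.0}
```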

The first output of this stream is customer events: users' behavioural data, kept for storage and future analysis.

The second output of this stream is filtered users: users who have provided enough data to be shown personalised products (e.g. more than 3 transactions, more than 30 product page views, etc.).
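The filter can be sketched as below. Whether the thresholds combine with "or" or "and" is not stated in the post; this sketch assumes "or", and the per-user count dictionary and the `filter_users` name are hypothetical.

```python
from typing import Dict, List


def filter_users(event_counts: Dict[str, Dict[str, int]],
                 min_transactions: int = 3,
                 min_product_views: int = 30) -> List[str]:
    """Keep users with more than `min_transactions` transactions OR more than
    `min_product_views` product page views (thresholds from the example above)."""
    return [
        user_id
        for user_id, counts in event_counts.items()
        if counts.get("transaction", 0) > min_transactions
        or counts.get("product_view", 0) > min_product_views
    ]


event_counts = {
    "u1": {"transaction": 4},
    "u2": {"product_view": 31},
    "u3": {"transaction": 3, "product_view": 30},
}
print(filter_users(event_counts))  # ['u1', 'u2']
```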

Refresh the model (i.e. the mapping of users to product groups) at least once a day, but execute the recommendations in real time: from the user's current journey, identify their affinity towards a product group and display the rest of the products from that group. As a result, the same user may see a different set of products each time they visit the same landing page (category page). I will explain real-time intent judgement and content display in a separate post.
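Real-time intent judgement is left for a separate post, so the following is only a rough sketch under simple assumptions: the dominant product group is taken to be the one the user viewed most in the current session, its not-yet-viewed products (already in popularity order) are placed first, and the rest of the default page follows without duplicates. `realtime_page` and its inputs are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def realtime_page(session_views: List[str],
                  product_to_group: Dict[str, str],
                  sorted_groups: Dict[str, List[str]],
                  default_page: List[str]) -> List[str]:
    """Lead with unviewed products from the session's most-viewed group.

    Ties go to the group viewed first (Counter.most_common keeps first-seen order).
    With no usable session signal, the default page is returned unchanged.
    """
    session_groups = [product_to_group[p] for p in session_views if p in product_to_group]
    if not session_groups:
        return list(default_page)
    top_group = Counter(session_groups).most_common(1)[0][0]
    viewed = set(session_views)
    head = [p for p in sorted_groups.get(top_group, []) if p not in viewed]
    in_head = set(head)
    return head + [p for p in default_page if p not in in_head]


product_to_group = {"s1": "running", "s2": "formal", "s3": "running",
                    "s4": "formal", "s5": "sandals"}
sorted_groups = {"running": ["s3", "s1"], "formal": ["s2", "s4"], "sandals": ["s5"]}
default_page = ["s3", "s1", "s2", "s4", "s5"]
print(realtime_page(["s2"], product_to_group, sorted_groups, default_page))
# ['s4', 's3', 's1', 's2', 's5']
```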

The selected users and their category-wise products (each category may have several product groups, and each product group has several products) are loaded into the controlling system.

The controlling system performs the following functions:

  1. Define test and control groups for A/B tests
  2. Control what % of users see the default vs the personalised sort
  3. Stop serving personalised content when required (e.g. during a very big sale event, when most content is cached and not personalised)
  4. Provide infrastructure for load balancing across distributed servers
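Functions 1 to 3 are commonly implemented with deterministic hash-based bucketing. The sketch below is an assumption, not the system described here: hashing the experiment name with the user ID keeps each user in the same bucket across requests and servers, `personalised_pct` sets the share of users who see the personalised sort, and a kill switch sends everyone to the default sort. `assign_variant` and its parameters are hypothetical, and load balancing (function 4) is out of scope.

```python
import hashlib


def assign_variant(user_id: str, experiment: str,
                   personalised_pct: float, kill_switch: bool = False) -> str:
    """Deterministically bucket a user into 'personalised' or 'default'.

    personalised_pct is 0-100. The kill switch forces the default sort for
    everyone (e.g. during a big, mostly cached sale event).
    """
    if kill_switch:
        return "default"
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest()[:8], 16) % 10000  # 0..9999
    return "personalised" if bucket < personalised_pct * 100 else "default"


print(assign_variant("user-123", "sort_v3", personalised_pct=20))
```

Salting the hash with the experiment name means a new experiment reshuffles users instead of reusing the previous test group.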

The output of the controlling system is fed to an endpoint serving system (such as Solr or Elasticsearch).

Evaluation & results of the personalised sort order

Evaluate the performance of the controlled experiment (A/B test). Suggested KPIs:

  1. CTR across categories
  2. Contribution of sales assisted by the personalised sort
  3. Impact on ARPU (average revenue per user), net of returns
(Figure: sort order performance evaluation)
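The CTR and ARPU KPIs, and the relative lift of the test arm over control, reduce to simple ratios. A sketch with a hypothetical per-arm summary record (`ArmStats`); how returns are attributed to each arm is an assumption left to your data model, and the numbers in the usage lines are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ArmStats:
    """Aggregated results for one experiment arm (hypothetical schema)."""
    users: int
    impressions: int
    clicks: int
    revenue: float
    returns: float  # value of returned orders attributed to this arm


def ctr(s: ArmStats) -> float:
    return s.clicks / s.impressions if s.impressions else 0.0


def arpu_net_of_returns(s: ArmStats) -> float:
    return (s.revenue - s.returns) / s.users if s.users else 0.0


def relative_lift(treatment: float, control: float) -> float:
    """(treatment - control) / control; e.g. 0.03 means a 3% improvement.
    The control value must be non-zero."""
    return (treatment - control) / control


# Illustrative numbers only.
control = ArmStats(users=1000, impressions=20000, clicks=400, revenue=50000.0, returns=5000.0)
treatment = ArmStats(users=1000, impressions=20000, clicks=500, revenue=51000.0, returns=4800.0)
print(relative_lift(ctr(treatment), ctr(control)))
print(relative_lift(arpu_net_of_returns(treatment), arpu_net_of_returns(control)))
```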

The personalised sort order showed a ~5% improvement in sales over the default sort order.


AnkitG

Rookie photographer & traveller. Professionally I enjoy building data products which lead to improvement in customer engagement & retention.