An easy and effective way to improve a sort order algorithm

For any online e-commerce or content company, one of the biggest challenges is managing visibility on list pages. Space is limited and there is a problem of plenty. What should the default sort order be, and which items should be shown at the top?

At one of my previous companies, my team built a product ranking algorithm in limited time (3 weeks) and with limited resources (data and tech bandwidth), yet it gave us astonishing metric improvements.

To create the algorithm we needed:

  • Transaction data (MySQL DB) and
  • Behavioral data (PHP logs)

Building a system to mine each of these two data types would have taken significant time and effort, because the team first had to learn how to mine the event stream.

Instead, we used readily available Google Analytics data to build a simple ranking algorithm that could be configured from a spreadsheet.

Old sorting

Before the popularity sort, the following sorting options were available on list pages:

Best Match: Solr-driven values with a boost for introduction dates

New Arrivals: recency (introduction date DESC)

Hot Deals: % discount DESC

Price Low to High: price ASC

Price High to Low: price DESC

None of these sorting options made the most of each SKU's potential. A category landing page has a limited number of top slots (with immense visibility), so the results shown there should be chosen carefully to surface the items most likely to convert.

Solution: Popularity sort

Each SKU should have a popularity score associated with it, an indicator of its sale-ability potential. The score was a linear weighted sum of variables that can affect user interest in a SKU. The weights of the variables were configurable for each category, and the score computation was fast and robust enough to be updated within minutes.

Variables Considered

The most important variables considered when computing the popularity score for a SKU were:

  • Sales (orders) of the product
  • Recency of the product
  • Page Views of the product

The model is a modified version of the RFM (recency, frequency, monetary) model. Recency denotes the age of the product in the system, frequency denotes the number of page views, and sales (orders) represent the monetary benefit from product sales.

Each variable was further split across configurable time windows, and each window had a weight associated with it. This allowed a) different contributions from different time windows and b) different settings for different categories, so that more recent user behavior could contribute more to the overall score. It also gave the business the opportunity to tweak the model per category (fashion is different from mobiles).
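To make the weighted sum concrete, here is a minimal Python sketch. The window names (last_7d, last_30d), the weight values, and the function name popularity_score are all assumptions made for illustration; they are not the configuration we actually ran. Recency is left out to keep the example short.

```python
from typing import Dict

# Hypothetical per-category weights: variable -> time window -> weight.
# In practice these values would come from the configuration spreadsheet.
WEIGHTS: Dict[str, Dict[str, Dict[str, float]]] = {
    "fashion": {
        "orders": {"last_7d": 3.0, "last_30d": 1.0},
        "page_views": {"last_7d": 0.25, "last_30d": 0.125},
    },
    "mobiles": {
        "orders": {"last_7d": 2.0, "last_30d": 2.0},
        "page_views": {"last_7d": 0.125, "last_30d": 0.125},
    },
}


def popularity_score(category: str, metrics: Dict[str, Dict[str, float]]) -> float:
    """Linear weighted sum over every (variable, time window) pair."""
    score = 0.0
    for variable, windows in WEIGHTS[category].items():
        for window, weight in windows.items():
            # A missing metric counts as zero rather than raising an error.
            score += weight * metrics.get(variable, {}).get(window, 0.0)
    return score


# Illustrative counts for one SKU (made-up numbers).
sku_metrics = {
    "orders": {"last_7d": 4, "last_30d": 10},
    "page_views": {"last_7d": 100, "last_30d": 400},
}

fashion_score = popularity_score("fashion", sku_metrics)
mobiles_score = popularity_score("mobiles", sku_metrics)
```

The same SKU metrics produce different scores per category because each category weights the recent window differently, which is the lever the business team could adjust.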


Calculations explained

The calculated scores for each product were stored in an external file and fed to the product serving system, Solr. Solr would check that the external file was available and then order results based on its contents.
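As a rough sketch of this step, the snippet below writes scores as one key=value pair per line, which is the format Solr's ExternalFileField reads from a file named external_<fieldname>. That we used exactly this mechanism, the field name popularity, and the helper name write_external_scores are assumptions made for the example.

```python
import os
import tempfile
from typing import Dict


def write_external_scores(
    scores: Dict[str, float], directory: str, field_name: str = "popularity"
) -> str:
    """Write SKU scores as key=value lines and return the final file path."""
    final_path = os.path.join(directory, f"external_{field_name}")
    # Write to a temporary file first so a reader never sees a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp_scores_")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        for sku_id in sorted(scores):
            handle.write(f"{sku_id}={scores[sku_id]:.4f}\n")
    # Replace the old file in one step once the new one is complete.
    os.replace(tmp_path, final_path)
    return final_path
```

Calling write_external_scores({"SKU1": 97.0, "SKU2": 1.5}, some_directory) produces a file with the lines SKU1=97.0000 and SKU2=1.5000. Writing to a temporary file and then renaming keeps the serving side from ever reading a partial update.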

Hacks applied

  1. Used GA instead of click-stream logs to get visibility data for each SKU
  2. Used GA to get sales data for each SKU
  3. Used an inverse log function to give higher weight to new SKUs, giving them an inorganic boost for their first few days
  4. Instead of sales value, the number of orders was considered, to avoid penalizing popular but low-cost items
  5. KISS (keep it simple): a Google spreadsheet to control the weights
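The inverse log boost for new SKUs (hack 3) can be sketched as below. The exact functional form we used is not given here, so the formula scale / ln(age_days + e), the scale parameter, and the name recency_boost are assumptions chosen to show the shape of the curve: highest on day 0 and decaying slowly afterwards.

```python
import math


def recency_boost(age_days: float, scale: float = 1.0) -> float:
    """Inverse-log boost: equals `scale` at age 0 and decays as the SKU ages."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    # Adding e keeps the logarithm at 1 or above, so the boost never exceeds `scale`.
    return scale / math.log(age_days + math.e)


# Boost for a SKU at a few illustrative ages (in days).
boosts = {age: recency_boost(age) for age in (0, 1, 7, 30)}
```

Because the decay is logarithmic, a new SKU gets a clear lift in its first few days, and the boost then flattens out instead of dropping to zero, leaving orders and page views to dominate the score.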

Results (before vs. after, with no significant change in traffic or discounts)

  1. Bounce rate from category list pages fell by up to 30% within 3 weeks.
  2. Entrances to the cart funnel from the list pages improved by 70% in 3 weeks.
  3. Per-visit customer value improved by ~100%.

This sort order is the same for all users; you could say it's personalized at the n=n level.

For a product guy, the biggest challenge is to deliver results in a limited time frame and to demonstrate them with an MVP. In this case, with a team of 2 developers and 1 PM, we were able to deliver significant value to the organization.

Since this project was a success, the following was implemented in V2:

  1. Dropped PV (page views) and sales (orders) data and used the CTR (from click stream) of each SKU instead, as this one variable is a proxy for both orders and PVs. (GA Enhanced Ecommerce makes it possible to get SKU-level CTR, which can be used instead of clickstream.)
  2. Added new variables such as brand, inventory type, inventory_count, and vendor
  3. Included a new variable that can be used to display any SKU at the top position (an artificial boost for any monetized SKU)
  4. Introduction date (age of SKU)
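A simplified sketch of the V2 idea follows: rank by CTR, with a manual override that pushes selected SKUs to the top. The Sku class, its field names, and the pinned flag are hypothetical, and the other V2 variables (brand, inventory, vendor, introduction date) are left out to keep the example small.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sku:
    sku_id: str
    impressions: int      # times the SKU was shown on a list page
    clicks: int           # times it was clicked from a list page
    pinned: bool = False  # hypothetical manual-boost flag for monetized SKUs


def ctr(sku: Sku) -> float:
    """Click-through rate, treating SKUs with no impressions as zero."""
    return sku.clicks / sku.impressions if sku.impressions else 0.0


def rank(skus: List[Sku]) -> List[Sku]:
    """Pinned SKUs come first; within each group, higher CTR ranks higher."""
    return sorted(skus, key=lambda s: (s.pinned, ctr(s)), reverse=True)


# Made-up catalogue: C has the lowest CTR but is pinned to the top.
catalogue = [
    Sku("A", impressions=1000, clicks=50),
    Sku("B", impressions=200, clicks=30),
    Sku("C", impressions=500, clicks=5, pinned=True),
]
ordered_ids = [s.sku_id for s in rank(catalogue)]
```

Using a tuple as the sort key keeps the override separate from the organic score, so removing a pin returns the SKU to its CTR-based position.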

This gave us the flexibility to display certain types of products at the top and thus improve their sales.

It also added flexibility to the system to monetize certain brands or vendors by boosting their visibility on list pages.

In V3 we built an even more complex version of the sort order algorithm: a personalized sort order (product ranking) at the n=1 level. I will explain this in the next post.