
Product Ranking at Constructor

Increasing Business Metrics by Reordering Items

Arshak Mkhoyan
Constructor Engineering
10 min read · Aug 25, 2022


Any e-commerce shopper knows the experience of scrolling through irrelevant and unattractive products looking for the right item. This experience leads to two possibilities: finally finding what you’re looking for, or leaving the site. Either way, it’s not a good experience for the shopper — and it can impact an e-commerce retailer’s ability to drive profit and build a long-term relationship.

Working with some of the biggest retailers in the world, Constructor is continually developing new solutions to help e-commerce teams run successful businesses and improve the customer experience.

One such solution has been effective product ranking to satisfy the end user by helping them find ideal products more quickly. The culmination of seven years of R&D, our ML-powered product ranking system satisfies users’ needs and drives retail company goals by surfacing relevant, attractive products higher up in search results.

In this article, we introduce a high-level view of the ranking system, including why we developed it, how the system works, and the results we’ve seen in A/B tests against other solutions. This is the first in a series of upcoming articles which will build on this one and provide more technical details.

Table of Contents

  1. A Quick Intro to Constructor’s Search Engine
  2. The Retrieval System and Its Limitations
  3. How Constructor’s Product Ranking System Works
  4. Why a Separate Service for Ranking?
  5. A/B Test Results
  6. On the Product Ranking Roadmap
  7. Conclusion

A Quick Intro to Constructor’s Search Engine

In 2015, Constructor was founded explicitly to help e-commerce companies improve their search and discovery experiences through AI and machine learning. We launched with an autosuggest service, and then in 2019 released search, browse, and recommendations, creating a holistic e-commerce ecosystem built from scratch on a foundation of data science.

Before we dive into the ranking system specifically, it’s important to review the overall architecture of Constructor’s search engine. It consists of three parts:

Overview of Constructor’s Search Engine
  1. Retrieval System
    Given a search query, retrieve all relevant/attractive products (candidates). The recall metric measures the percentage of all relevant items that were retrieved. The higher the recall, the fewer relevant items are missed.
  2. Ranking System
    Given a set of products, rank them in the most effective way to satisfy business needs. The precision metric describes the percentage of attractive items in the retrieved set. We particularly optimize the precision@K metric, where K is the number of items shown at the top of search results that are considered when computing precision. In other words, we want to rank attractive items at the top positions. Higher precision@K indicates more interactions (e.g., clicks, add-to-carts, purchases) between users and the products shown. (A small worked example of recall and precision@K follows this list.)
  3. Business/Merchant Rules
    Apply e-commerce business rules defined by the customer (optimizing for profit margin, revenue per visit, conversion rate, inventory balancing, etc.) before returning the final ranked set of items.
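
To make these two metrics concrete, here is a minimal sketch of how recall and precision@K can be computed for a single query; the item IDs and relevance sets are made up for illustration.

```python
def recall(retrieved, relevant):
    """Fraction of all relevant items that made it into the retrieved set."""
    return len(set(retrieved) & set(relevant)) / len(relevant)

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-K ranked items that are relevant/attractive."""
    return len(set(ranked[:k]) & set(relevant)) / k

# Hypothetical example: 10 relevant items exist in the catalog for this query.
relevant_items = {f"item_{i}" for i in range(10)}
ranked_results = [f"item_{i}" for i in range(8)] + ["item_42", "item_77"]

print(recall(ranked_results, relevant_items))             # 0.8  (retrieval quality)
print(precision_at_k(ranked_results, relevant_items, 5))  # 1.0  (ranking quality at the top)
```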

In this article, we’ll focus mainly on the second part of the search engine: the ranking system. But first, let’s briefly discuss retrieval.

The Retrieval System and Its Limitations

Constructor has two main product retrieval systems: an Inverted Index (II), which was built in-house, and vector space search, which we’ll call by the name of our proprietary implementation, Cognitive Embeddings Search (CES).

Constructor’s II is e-commerce-first and outperforms other popular variants in the field like Solr, Lucene, and Elasticsearch. The index data structure is built from searchable fields, including the product name, size, color, brand, AI-generated keywords, etc. Customers can also send us any item metadata in a separate field to include it in the II. In production, the index is used to quickly fetch items relevant to a given query by matching the words (tokens) in the query against the words (tokens) in the searchable fields.

Constructor’s CES uses unsupervised and supervised (DSSM) models to learn vector representations of items and queries. By computing query-item distances (cosine distances between the two vectors), we return the Top-K items, where K is decided through rounds of A/B tests for each customer. CES greatly improves the recall metric and provides awesome lifts for conversion and purchase metrics based on internal A/B tests!

Constructor’s Cognitive Embeddings Search improves search performance by greatly reducing zero-results rates with contextual awareness through vector/spatial representation of a catalog.
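
As a rough illustration of the retrieval step in CES (not Constructor’s actual implementation), here is how the Top-K nearest items can be found with cosine similarity over precomputed embeddings; the toy vectors and item IDs are invented:

```python
import numpy as np

def top_k_items(query_vec, item_vecs, item_ids, k):
    """Return the K item IDs closest to the query by cosine similarity."""
    # Normalizing both sides makes a dot product equal to cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    items = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    sims = items @ q                      # one cosine similarity per item
    best = np.argsort(-sims)[:k]          # indices of the K highest similarities
    return [(item_ids[i], float(sims[i])) for i in best]

# Toy catalog: 4 items embedded in a 3-dimensional space.
ids = ["shirt", "dress", "shoes", "hat"]
vecs = np.array([[0.9, 0.1, 0.0],
                 [0.8, 0.2, 0.1],
                 [0.0, 1.0, 0.2],
                 [0.1, 0.0, 1.0]])
print(top_k_items(np.array([1.0, 0.0, 0.0]), vecs, ids, k=2))
```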

Both of these systems have some notion of ranking already: the II uses token-item scores (as opposed to more traditional algorithms built into Solr and Elasticsearch, like TF-IDF or BM25), and CES uses the distances it produces as scores.

These scores are then combined in a weighted sum with several other ranking factors based on clickstream data, such as query/user and item interactions, a personalization score based on each user’s unique history, and item attractiveness, to calculate the final score used to rank the items. After this base ranking is computed, Business/Merchant Rules are applied to adjust the order accordingly.
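
A simplified sketch of that blending step might look like the following; the factor names and weights are hypothetical, not Constructor’s production values.

```python
# Hypothetical per-item ranking factors, each pre-scaled to a comparable range.
WEIGHTS = {
    "retrieval_score": 0.4,  # token-item score from the II, or CES distance-based score
    "query_item_ctr": 0.3,   # clickstream: how often this item is clicked for this query
    "personalization": 0.2,  # score derived from this user's unique history
    "attractiveness": 0.1,   # query-independent item popularity/quality
}

def final_score(item: dict) -> float:
    """Weighted sum of ranking factors; a missing factor contributes zero."""
    return sum(w * item.get(name, 0.0) for name, w in WEIGHTS.items())

items = [
    {"id": "A", "retrieval_score": 0.9, "query_item_ctr": 0.2, "personalization": 0.1},
    {"id": "B", "retrieval_score": 0.6, "query_item_ctr": 0.8, "personalization": 0.7},
]
ranked = sorted(items, key=final_score, reverse=True)
print([it["id"] for it in ranked])  # ['B', 'A']: B wins despite a lower retrieval score
```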

Ranking using CES combined with the II has proven to be a great technology, winning all external customer A/B tests comparing our system both to other search vendors and to search engines built in-house on technology like Solr and Elasticsearch. However, there was still room for improvement based on the following limitations:

  1. Time-consuming delivery of new features/signals. The logic that applies ranking factors was scattered throughout the backend, leaving no isolated place where data scientists could easily make changes. The lack of a unified pipeline created other limitations: a slow experimentation cycle, difficulty in adding and scaling new signals, and difficulty implementing changes for all customers at once.
  2. Difficulty in delivering “complex” features/signals. Ranking applied within the retrieval system was done for all items returned by the system. Bounded by inference time constraints, we could only add simple signals that were “cheap” enough to calculate for potentially tens of thousands of products.
  3. Lack of flexibility in customer-specific configuration. Ranking logic was hard to configure on a customer basis, especially when we wanted to use different sets of signals for different customers. We could only ingest “all features” into production, many of which were unused. This could potentially overload the system.
  4. Difficulty in adding customers’ external ranking factors. We have customers who want to send us features to be considered by our ranking algorithm. These features needed to be custom added for each business.

In order to surpass these limitations and also add more business value (which will be discussed later in the article), the Constructor team decided to create a separate service aimed at even better product ranking.

How Constructor’s Product Ranking System Works

Constructor’s ranking system is a real-time personalized ranking service, powered by machine-learning models with efficient feature delivery and experiment workflow.

Below you can find a diagram of the simplified product ranking process. Generally, the ranking service works on top of the retrieval system: it takes the Top-K candidates returned by retrieval and passes them through the ML model to get predicted scores, which are then used to rank the items.

Constructor’s Products Retrieval and Ranking Infrastructure Overview

Overall, the system consists of two parts: Offline and Online.

Offline

To be able to process requests to the Ranking Service online, we run dataset creation and model training/ingestion on a daily basis for each customer. To do so, we have a job that, when triggered, creates a DAG (Directed Acyclic Graph) to compute, save, and collect all the necessary data (mostly using Spark via Databricks) and then train and save the ML model.
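
The article doesn’t name the orchestrator, so purely as an illustration, here is what such a daily per-customer pipeline could look like when expressed as an Airflow DAG; the task names and callables are hypothetical stand-ins.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical task bodies; in practice these would kick off Spark jobs
# on Databricks and a model training/ingestion job.
def create_dataset(**context): ...
def compute_features(**context): ...
def train_and_ingest_model(**context): ...

with DAG(
    dag_id="ranking_pipeline_customer_x",  # one DAG per customer
    schedule_interval="@daily",
    start_date=datetime(2022, 1, 1),
    catchup=False,
) as dag:
    dataset = PythonOperator(task_id="create_dataset", python_callable=create_dataset)
    features = PythonOperator(task_id="compute_features", python_callable=compute_features)
    train = PythonOperator(task_id="train_and_ingest_model", python_callable=train_and_ingest_model)

    # Collect the dataset and features first, then train and save the model.
    [dataset, features] >> train
```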

Dataset Creation

To construct the data for training the ML model, we use behavioral and search-results logs to collect data about each user’s search requests, the items shown, and interactions (clicks, add-to-carts, purchases, etc.). We also run a Feature Computation Job and use the Offline Feature Store to store four types of collected features: item metadata, user metadata, user history, and interactions. The collected features are saved to the Online Feature Store on a daily basis to be used in real-time inference.
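
As a hedged sketch of what the interaction part of the Feature Computation Job might look like in Spark, here is a PySpark aggregation over hypothetical behavioral-log columns; the schema and storage paths are invented for illustration.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical schema: one behavioral-log row per (user, query, item, event).
logs = spark.read.parquet("s3://logs-bucket/behavioral/dt=2022-08-24/")

# Interaction features per (query, item) pair: impressions, clicks,
# add-to-carts, purchases, and a derived click-through rate.
interactions = (
    logs.groupBy("query", "item_id")
        .agg(
            F.sum(F.when(F.col("event") == "impression", 1).otherwise(0)).alias("impressions"),
            F.sum(F.when(F.col("event") == "click", 1).otherwise(0)).alias("clicks"),
            F.sum(F.when(F.col("event") == "add_to_cart", 1).otherwise(0)).alias("add_to_carts"),
            F.sum(F.when(F.col("event") == "purchase", 1).otherwise(0)).alias("purchases"),
        )
        .withColumn("ctr", F.col("clicks") / F.col("impressions"))
)

# Write to the offline store; a separate daily job syncs these rows
# to the online store for real-time inference.
interactions.write.mode("overwrite").parquet("s3://feature-store-bucket/offline/interactions/")
```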

Model Training/Ingestion

After the dataset and features are collected, we run a job to train and evaluate our ML model offline. All logs generated from training and validating the model (paths to the model and data, metric values, statistics, etc.) are saved to Neptune.ai for experiment management, while the model and Model Artifacts are saved to object storage and the Model Metadata is ingested into a database.
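
The post doesn’t name the model family, but learning-to-rank problems like this are often solved with gradient-boosted trees; here is a minimal, hypothetical training sketch using LightGBM’s LambdaRank objective on synthetic data:

```python
import numpy as np
from lightgbm import LGBMRanker

# Synthetic stand-in for the real dataset: one row per (search request, item).
# y holds graded relevance labels derived from the logs, e.g.
# 0 = shown only, 1 = click, 2 = add-to-cart, 3 = purchase.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))       # feature matrix from the feature store
y = rng.integers(0, 4, size=1000)    # relevance labels
groups = [20] * 50                   # 50 search requests, 20 candidates each, in row order

model = LGBMRanker(objective="lambdarank", n_estimators=100)
model.fit(X, y, group=groups)

# At inference time, higher predicted scores rank higher.
scores = model.predict(X[:20])
ranked_positions = np.argsort(-scores)
```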

Online

When a user types a search query, the request is sent to the Backend Service, which makes all the needed calls to other services to return a ranked set of items. The process happens in the following order (a condensed code sketch follows the list):

  1. Receive Personalization & Real-Time Features based on the user’s metadata and history, to be used later in ranking the results.
  2. Retrieve candidates for ranking:
    - The II receives the query and finds and returns a set of matching items, applying spelling correction and query expansion automatically.
    - In parallel, CES is triggered to enrich the result set. It uses vector representations of items, and thus, has much higher (100%) coverage compared to the II. If results from the II are not available, only results from CES are used. Otherwise, results from the two systems are combined to get all relevant/attractive products.
  3. The retrieved set of items (from the II and/or CES) is passed to the Ranking Service. It retrieves features from the Online Feature Store, computes some “complex” real-time features, and fetches the appropriate model with its artifacts via the Model Registry. By passing the features to the model, we produce the final item scores used to rank the results.
  4. The last step in ranking is applying Business/Merchant Rules, which is done in the Backend Service.
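
Putting the four steps together, here is a condensed sketch of the online flow; the injected callables are hypothetical stand-ins for Constructor’s internal services, not real APIs.

```python
from typing import Callable

def handle_search_request(
    query: str,
    user_features: dict,                    # step 1: personalization & real-time features
    ii_search: Callable[[str], list],       # inverted-index lookup
    ces_search: Callable[[str], list],      # embeddings search (~100% coverage)
    predict: Callable[[list, dict], list],  # ranking model: candidates -> scores
    apply_rules: Callable[[list], list],    # business/merchant rules
    k: int = 200,
):
    # Step 2: retrieve candidates; if the II returns nothing, CES alone is used.
    seen, candidates = set(), []
    for item in ii_search(query) + ces_search(query):  # combine and deduplicate
        if item not in seen:
            seen.add(item)
            candidates.append(item)
    candidates = candidates[:k]

    # Step 3: score the Top-K candidates with the ML model and sort descending.
    scores = predict(candidates, user_features)
    ranked = [c for _, c in sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)]

    # Step 4: business/merchant rules are applied last.
    return apply_rules(ranked)

# Toy usage with trivial stand-ins:
print(handle_search_request(
    "red dress", {"user_id": "u1"},
    ii_search=lambda q: ["dress_a", "dress_b"],
    ces_search=lambda q: ["dress_b", "dress_c"],
    predict=lambda items, feats: [0.2, 0.9, 0.5],
    apply_rules=lambda items: items,
))  # ['dress_b', 'dress_c', 'dress_a']
```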

Why a Separate Service for Ranking?

A separate ranking service has many benefits over ranking done within retrieval systems. It specifically overcomes the limitations mentioned previously:

  1. Fast delivery of new features/signals. Where it used to take two months to add a new feature to be considered in the rankings, it now takes less than two weeks. This was achieved by having an ML model learn the ranking and by storing features separately from the backend. Other benefits include a faster experimentation cycle, scalable addition of new signals, and the ability to roll out changes to all customers at once.
  2. Easy delivery of “complex” features/signals. As ranking is applied only to the Top-K candidates, we can now easily add “complex” features that are computed for a small set of items without hitting inference-time constraints.
  3. Flexibility in customer-specific configuration. Ranking logic is easy to configure on a customer basis, as we can set different (model, feature, dataset) parameters for each customer in the configuration file alone.
  4. Easy addition of customers’ external ranking factors. Customers can send us their ranking factors in a predefined format to be included in our ML model.
  5. Optimal results for merchant tasks. As a bonus, merchandiser rules can now be integrated into the ranking system by providing the respective set of features (promotions/sales data and current items’ stock data for promotions and inventory optimization, for example) to include in the ML model to reflect business logic.

A/B Test Results

Constructor embraces an experimentation culture, so A/B tests are our best friends to validate the changes made in the ranking service: new features, new dataset creation logic, code optimization, etc. We constantly conduct A/B tests to prove the value we provide to our customers and decide on the general path of future work.

We have already had many rounds of A/B tests for the ranking system, each of which helped us to improve the algorithm and the service.

The tests mostly concentrated on improving product ranking while keeping response times fast enough that the overall effect was a lift in conversions and revenue. Altogether, we were able to reach positive, statistically significant incremental lifts for most customers on top of the results they’d already attained by switching to Constructor:

  • +2% average incremental lift in purchase rate
  • +2% average incremental lift in conversion rate
  • +3% average incremental lift in click rate

On the Product Ranking Roadmap

The ranking system is still rolling out to all customers. We’re doing it gradually so we can optimize the respective ML model for every customer and ensure maximum lifts. While it’s already proving its value by increasing our customers’ revenue, conversions, and profit, we’re not stopping there! We have several plans for how we can further develop and optimize the ranking service:

  • Applying the separate ranking service to the browse experience. In browse, users reach products by going to a category page from the website’s navigation panel instead of typing a search query, and Constructor ranks the items presented there as well. Driving incremental improvement for browse with the separate ranking service is next on our list. We have seen great offline lifts and are hoping for the same results in upcoming A/B tests.
  • Better usage of customer-provided data to improve ranking. We can make ML ranking even more personalized by applying data from the third-party vendors our customers collaborate with, or from a customer’s own data science team.

Conclusion

Constructor’s ranking system is a promising service that makes e-commerce businesses more successful by finding an additional way to leverage user data to power the ML model. We’re constantly working on improving the system, and specifically the ranking algorithm, to make end users happy. When shoppers see results they find more appealing, they buy more and retailers see more conversions. It’s a win-win.

Arshak Mkhoyan is a machine learning engineer at Constructor, where our data science and engineering teams collaborate on e-commerce search and discovery and serve billions of requests each year for retail’s top companies.

Interested in joining the future of search? We’re hiring.
