GCP Retail Search Onboarding : Best Practices for User Events (Part 2/4)

Shrish marnad
Google Cloud - Community
5 min read · Aug 17, 2023

This blog is part 2 of a 4-part series on GCP Retail Search onboarding best practices. It is highly recommended to start with Part 1.

In part 2 we cover best practices for user events in Retail Search for optimal performance.
(The basics of Retail Search user events are not covered in this blog.)

User event importance.

User events are used to derive buy-ability signals, to understand buying patterns and trends, and for personalisation.

There are 4 main events: SEARCH, DETAIL-PAGE-VIEW, ADD-TO-CART and PURCHASE. Based on these events we can determine which product was clicked, added to cart and purchased. This helps train the AI model: as a product gets more interactions (clicks) and conversions (purchases), the model can rank it better in the search results for optimised revenue uplift. Apart from this, user events are also the basis for KPI measurement such as Revenue Per User (RPU), CTR and CVR.

User events first need to be backfilled from existing historical events. The existing historical events may need to be transformed to the user event schema prescribed for Retail Search. This backfill is needed to train the AI model for revenue optimisation. After that, events need to be sent continuously to Retail Search (via the collect API, the write API or bulk import).
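The transformation step for a backfill can be sketched roughly as below. This is only an illustration: the legacy record layout (`action`, `cookie_id`, `ts_epoch_seconds`, `sku_list`, `query`) and the action names are hypothetical, and the output field names and event type strings follow the Retail user event JSON format as we understand it (note that the PURCHASE event is named `purchase-complete` there). Please verify them against the current schema before importing.

```python
from datetime import datetime, timezone

# Hypothetical mapping from legacy action names to Retail event types.
LEGACY_TO_RETAIL_EVENT_TYPE = {
    "site_search": "search",
    "product_view": "detail-page-view",
    "cart_add": "add-to-cart",
    "order_placed": "purchase-complete",
}


def to_user_event(legacy: dict) -> dict:
    """Convert one legacy record (hypothetical layout) into a user event dict."""
    event_time = datetime.fromtimestamp(legacy["ts_epoch_seconds"], tz=timezone.utc)
    event = {
        "eventType": LEGACY_TO_RETAIL_EVENT_TYPE[legacy["action"]],
        "visitorId": legacy["cookie_id"],
        # RFC 3339 timestamp in UTC.
        "eventTime": event_time.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    # Products shown (search) or interacted with (view, cart, purchase).
    if legacy.get("sku_list"):
        event["productDetails"] = [{"product": {"id": sku}} for sku in legacy["sku_list"]]
    if legacy.get("query"):
        event["searchQuery"] = legacy["query"]
    return event
```

A real backfill would also carry over fields such as quantities and purchase transaction details, which are left out here to keep the sketch short.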

User events continuity.

The sequence of user events describes how a user performed activities on the website. In the ideal flow, the user performs a “Search” on a query, then does “Page Views” on the products of interest, “Adds to Cart” the ones they intend to buy, and then does a “Purchase” for the products in the cart. We expect the user events for a given visitor id to follow the same pattern. This means that, on a timeline, there cannot be an ADD-TO-CART event after the PURCHASE event for the purchased product, for a given visitor id within the session timeframe. There are scenarios where a website allows customers to add products to the cart directly from the search results; in those cases there will be no detail page views.

Flow for events

  • Path 1: SEARCH event -> DETAIL-PAGE-VIEW event -> ADD-TO-CART event -> PURCHASE event
  • Path 2: SEARCH event -> ADD-TO-CART event -> PURCHASE event

If the events are out of order (i.e. the event timestamps are jumbled), they will be discarded by the AI model during model training, which will in turn affect search performance.
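Before importing, it can help to run a simple ordering check over the events. The sketch below covers only the rule described above (no ADD-TO-CART after a PURCHASE, and no PURCHASE without an earlier ADD-TO-CART, for the same visitor and product). The flattened event layout (`visitor_id`, `product_id`, `event_type`, `ts`) is an assumption made for illustration and is not the Retail schema; it also treats all events of a visitor as one session.

```python
from collections import defaultdict


def find_out_of_order(events):
    """Return (visitor_id, product_id, reason) tuples for suspicious sequences.

    Each event is a dict with hypothetical keys: visitor_id, product_id,
    event_type and ts (any sortable timestamp, e.g. epoch seconds).
    """
    by_key = defaultdict(list)
    for e in events:
        by_key[(e["visitor_id"], e["product_id"])].append(e)

    problems = []
    for (visitor_id, product_id), group in by_key.items():
        carted = False
        purchased = False
        # Walk the events of this visitor/product pair in time order.
        for e in sorted(group, key=lambda e: e["ts"]):
            if e["event_type"] == "add-to-cart":
                if purchased:
                    problems.append((visitor_id, product_id, "add-to-cart after purchase"))
                carted = True
            elif e["event_type"] == "purchase-complete":
                if not carted:
                    problems.append((visitor_id, product_id, "purchase without prior add-to-cart"))
                purchased = True
    return problems
```

Any pair that shows up in the result is worth inspecting in the source data (clock skew, wrong time zone conversion or mixed-up visitor ids are common causes) before the events are ingested.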

Product Impressions check:

When ingesting events into Retail Search, whether historical events via bulk import or live events via real-time streaming, there are a few thresholds on event counts that must be met for the AI model to be trained. It is important to note that the minimum required volume is not a blanket number, i.e. if the limit says 150k DETAIL-PAGE-VIEW events are required, it should not be read as 150k random DETAIL-PAGE-VIEW events. The events need to be related to other events such as SEARCH or ADD-TO-CART.

For CTR optimisation, the AI model trains on SEARCH events and the DETAIL-PAGE-VIEW events that follow them, i.e. every DETAIL-PAGE-VIEW event must be traceable to the product-id list of a SEARCH event. This is what counts as “search impressions”. The same applies to ADD-TO-CART and PURCHASE events. That is to say, if we draw a timeline of events for a particular visitor id using the event timestamps, we should be able to infer a search leading to a click / buy behaviour.

So if the AI model detects a random DETAIL-PAGE-VIEW event that is not associated with any SEARCH event, that event cannot be used for model training. Care needs to be taken to ingest events that are linked back to a search event, which means the timestamps, visitor-id and product-id details must be accurate for the AI model to be able to train on them.
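A rough way to estimate how many DETAIL-PAGE-VIEW events actually count towards search impressions is to replay the events in time order and check each view against the products that earlier SEARCH events showed to the same visitor. The sketch below assumes a simplified, hypothetical event layout (`visitor_id`, `event_type`, `product_ids`, `ts`); it illustrates the traceability idea only and is not how Retail Search itself evaluates events.

```python
def untraceable_detail_views(events):
    """Return detail-page-view events with no earlier matching search impression.

    Each event is a dict with hypothetical keys: visitor_id, event_type,
    product_ids (list of product ids) and ts (a sortable timestamp).
    """
    shown_by_visitor = {}  # visitor_id -> product ids seen in earlier search results
    orphans = []
    for e in sorted(events, key=lambda e: e["ts"]):
        shown = shown_by_visitor.setdefault(e["visitor_id"], set())
        if e["event_type"] == "search":
            # The search event carries the list of products returned to the visitor.
            shown.update(e["product_ids"])
        elif e["event_type"] == "detail-page-view":
            if not any(pid in shown for pid in e["product_ids"]):
                orphans.append(e)
    return orphans
```

Comparing the number of orphans with the total number of DETAIL-PAGE-VIEW events gives a quick indication of whether the ingested volume is likely to be usable against the thresholds.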

Detecting and Handling Bot traffic.

Whether desired or not, it is quite common to have bots on e-commerce sites. These bots sometimes make search calls to get the prices of multiple products (for price tracking). This incurs search API charges, and these search calls almost never lead to any conversions. One way to optimise the costs in such a situation is to cache the search API response with a tested TTL. A few things to note:

  • Cache only the responses of non-logged-in users. Bots do not have login info when making a search API call, which means the search request has an empty user-id. It is strongly recommended not to cache the responses of logged-in users, as they could be personalised for that user/visitor.
  • Bot traffic can also be detected by visitor-id: if a lot of SEARCH or DETAIL-PAGE-VIEW events come from a small group of visitor ids, it could be due to bot traffic. It is good practice to keep your visitor-ids in check.
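As a starting point for the visitor-id check, a simple count of SEARCH and DETAIL-PAGE-VIEW events per visitor id can surface candidates. The sketch below is an assumption-heavy illustration: the event layout (`visitor_id`, `event_type`) is hypothetical, and the default threshold of 500 is an arbitrary placeholder that should be tuned against your own traffic.

```python
from collections import Counter


def suspected_bot_visitors(events, max_events_per_visitor=500):
    """Return {visitor_id: event_count} for visitors above the threshold.

    Only search and detail-page-view events are counted. The threshold is
    an arbitrary placeholder, not a recommended value.
    """
    counts = Counter(
        e["visitor_id"]
        for e in events
        if e["event_type"] in ("search", "detail-page-view")
    )
    return {v: n for v, n in counts.items() if n > max_events_per_visitor}
```

A high count alone does not prove bot traffic, so the flagged visitor ids are best treated as candidates for a closer look (for example, checking whether they ever convert).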

Handling cached search responses.

Retail Search does not natively provide any support for search response caching. Search response caching needs to be handled outside of Retail Search, as needed.

When serving a cached search response to a search API call (for the reasons mentioned in the previous section), care needs to be taken to ensure that the proper attribution token is sent in the SEARCH event.
In a non-cached search API call, the event flow is as follows:

Non-cached search response user events flow
  • Each visitor is served a unique search response, and the respective attribution token follows through that visitor's events.
  • So there is a clear distinction between the events attributed to visitor-id1 and visitor-id2, as the attribution tokens are different.

When search result caching is enabled, the following flow of events is recommended:

Cache-enabled search response user events flow

Main things to note in the cached response are:

  • The flow of events remains the same.
  • The only thing that changes is that the attribution token and the search impressions (i.e. the product-id list) are obtained from the cached response.
  • Visitor 2 will not be aware of whether the search response was cached or not, so the visitor ID, user ID and other information in visitor 2's events will be visitor 2's own.
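The cached flow can be sketched as below. This is a minimal in-memory illustration, not a production cache: `SearchResponseCache`, `search_with_cache` and the `call_search_api` callable are hypothetical names, and the response is reduced to an assumed `{"attributionToken": ..., "productIds": [...]}` layout instead of the full search response. The event field names follow the Retail user event JSON format as we understand it.

```python
import time


class SearchResponseCache:
    """Tiny in-memory TTL cache keyed by the search query (illustrative only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # query -> (expires_at, response)

    def get(self, query):
        entry = self._entries.get(query)
        if entry is None:
            return None
        expires_at, response = entry
        if self._clock() >= expires_at:
            del self._entries[query]  # entry is older than the TTL
            return None
        return response

    def put(self, query, response):
        self._entries[query] = (self._clock() + self._ttl, response)


def search_with_cache(cache, query, visitor_id, user_id, call_search_api):
    """Return (response, search_event); call_search_api is a hypothetical callable."""
    cacheable = not user_id  # cache only for non-logged-in visitors
    response = cache.get(query) if cacheable else None
    if response is None:
        response = call_search_api(query, visitor_id)
        if cacheable:
            cache.put(query, response)
    # The token and impressions come from the (possibly cached) response,
    # while the visitor and user ids are always the current visitor's own.
    event = {
        "eventType": "search",
        "visitorId": visitor_id,
        "searchQuery": query,
        "attributionToken": response["attributionToken"],
        "productDetails": [{"product": {"id": pid}} for pid in response["productIds"]],
    }
    if user_id:
        event["userInfo"] = {"userId": user_id}
    return response, event
```

With this shape, a second anonymous visitor searching for the same query within the TTL gets the cached product list and sends a SEARCH event that carries the cached attribution token together with their own visitor id, while logged-in users always go to the search API.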

In conclusion, user events play an important role in ranking and AI model training. Just like the catalog data, it is imperative to have correct user event data for optimal search performance.
