Part 1: Streamlining Data Aggregation for E-Commerce at Loblaw Digital

Colin Barber

Published in Loblaw Digital, Aug 17, 2023

Co-authored by: Colin Barber, Vincent Zi and Indrani Gorti

Introduction

In the ever-evolving landscape of e-commerce, delivering exceptional customer experiences is a top priority for businesses. Data Engineering supports this goal by providing the right data at the right time to backend services. One essential aspect of Data Engineering is the aggregation of product metadata. This article explores how Loblaw Digital uses cloud technologies to simplify product metadata aggregation, enabling personalized recommendations on web and mobile applications.

Understanding Product Metadata

Product metadata encompasses information such as product descriptions, pricing, and inventory status. Consolidating and organizing this data benefits both Business Intelligence and Analytics teams, who gain insights for decision-making, and Data Science teams, who can unlock new use cases for machine learning services.

Data Domains and Data Contracts

At Loblaw Digital, we have a number of different teams all using and generating different types of data. When sharing data between teams we establish a Data Contract, which specifies the data model being shared. The data contract specifies all the data fields and types, and whether each field is required or nullable. This model can be used for Schema Validation, and any Schema Validation failures can automatically alert both teams involved in the Data Contract.
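The pairing of a contract with automatic alerting can be sketched in a few lines of Python. This is a minimal illustration, not Loblaw Digital's implementation: `DataContract`, the team names, and the `notify` callback are hypothetical, and the validator is assumed to return a list of error messages (empty when the record is valid).

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical alert hook: (team, message) -> None. In practice this could page
# an on-call rotation or post to a team channel.
Notifier = Callable[[str, str], None]


@dataclass
class DataContract:
    name: str
    producer_team: str
    consumer_team: str
    # Returns human-readable validation errors; an empty list means the record is valid.
    validator: Callable[[dict[str, Any]], list[str]]

    def check(self, record: dict[str, Any], notify: Notifier) -> bool:
        """Validate one record and alert both teams on failure."""
        errors = self.validator(record)
        if errors:
            message = f"{self.name} contract violation: {'; '.join(errors)}"
            # Both sides of the contract are alerted, not just the consumer.
            for team in (self.producer_team, self.consumer_team):
                notify(team, message)
        return not errors


# Usage with a toy validator that only checks for a "code" field.
promotion_contract = DataContract(
    name="Promotion",
    producer_team="promotions",
    consumer_team="data-engineering",
    validator=lambda r: [] if "code" in r else ["missing required field 'code'"],
)
promotion_contract.check({"reward_type": "$_OFF"}, lambda team, msg: print(team, "<-", msg))
```

Alerting the producer as well as the consumer is the point of the design: the team that changed the data is usually the one that can fix it fastest.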

Data Contract Example

Here are some JSON examples of a data model called “Promotion,” which we might use to describe the different types of discounts we could offer our customers.

[
  {
    "code": "promo1",
    "reward_type": "$_OFF",
    "rewards": [{"value": "100"}]
  },
  {
    "code": "promo2",
    "reward_type": "$_FIXED",
    "rewards": [{"value": "100", "restrictions": null}]
  },
  {
    "code": "promo3",
    "reward_type": "$_OFF",
    "rewards": [{"value": "100", "restrictions": {"spend": {"minimum": 100}}}]
  }
]

Here is how we might describe this data model using a Data Contract:

As we can see, each field in the data model can be a simple data type such as String or Integer, or it can be a new type that we can describe using the Data Contract.

In other words, the Promotion type data model contains a Reward type field, which contains a Restrictions type field, and so on. Every field has the following attributes specified in the contract:

Field Name
Field Type (Primitive types such as String, Integer, Decimal, or Composite types such as “Customer,” “Order,” “Event,” “Disjunction[String, Integer],” etc.)
Field Mode (Required: must exist and must not be null; Optional: may be present, null, or missing; Repeated: zero or more instances)
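To make the three field modes concrete, here is a minimal Python sketch of the Promotion contract with a recursive validator. The type names (`PROMOTION`, `REWARD`, `RESTRICTIONS`, and the hypothetical `SPEND`), the Python types chosen for each field, and which fields are Required versus Optional are assumptions inferred from the JSON examples above, not the actual contract. It also assumes that a missing or null Repeated field counts as zero instances.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Union


class Mode(Enum):
    REQUIRED = "required"  # must exist and must not be null
    OPTIONAL = "optional"  # may be present, null, or missing
    REPEATED = "repeated"  # zero or more instances, carried as a list


@dataclass
class FieldSpec:
    name: str
    kind: Union[type, "Record"]  # a primitive Python type or a nested composite type
    mode: Mode


@dataclass
class Record:
    name: str
    fields: list[FieldSpec]


def validate(value: Any, record: Record, path: str = "") -> list[str]:
    """Return a list of contract violations for `value` (empty if valid)."""
    if not isinstance(value, dict):
        return [f"{path or record.name}: expected a {record.name} object"]
    errors: list[str] = []
    for spec in record.fields:
        where = f"{path}.{spec.name}" if path else spec.name
        item = value.get(spec.name)  # None if the key is missing or null
        if item is None:
            if spec.mode is Mode.REQUIRED:
                errors.append(f"{where}: required field is missing or null")
            # Assumption: missing/null OPTIONAL is fine; REPEATED means zero instances.
            continue
        if spec.mode is Mode.REPEATED:
            if not isinstance(item, list):
                errors.append(f"{where}: repeated field must be a list")
                continue
            elements = [(f"{where}[{i}]", e) for i, e in enumerate(item)]
        else:
            elements = [(where, item)]
        for loc, element in elements:
            if isinstance(spec.kind, Record):
                errors.extend(validate(element, spec.kind, loc))  # recurse into composite type
            elif not isinstance(element, spec.kind):
                errors.append(f"{loc}: expected {spec.kind.__name__}")
    return errors


# Hypothetical contract inferred from the Promotion examples.
SPEND = Record("Spend", [FieldSpec("minimum", int, Mode.REQUIRED)])
RESTRICTIONS = Record("Restrictions", [FieldSpec("spend", SPEND, Mode.OPTIONAL)])
REWARD = Record("Reward", [
    FieldSpec("value", str, Mode.REQUIRED),
    FieldSpec("restrictions", RESTRICTIONS, Mode.OPTIONAL),
])
PROMOTION = Record("Promotion", [
    FieldSpec("code", str, Mode.REQUIRED),
    FieldSpec("reward_type", str, Mode.REQUIRED),
    FieldSpec("rewards", REWARD, Mode.REPEATED),
])

print(validate({"code": "promo4", "reward_type": "$_OFF", "rewards": [{"value": 100}]}, PROMOTION))
# → ['rewards[0].value: expected str']
```

Reporting the full path of each violation (for example `rewards[0].restrictions.spend.minimum`) makes an alert actionable for both teams without anyone having to dig through the raw message.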

By making each of these attributes explicit in the Data Contract, we can verify that incoming data matches the agreed model, and any change to the data model that breaks the contract is caught by schema validation before it can cause problems for downstream consumers.

Aggregating Product Domains for Real-time Recommendations

Product metadata can be generated by different teams. Sometimes our data consumers need all the data in one place, and if latency between data generation and consumption is not an issue, we can ingest each domain’s data models into our BigQuery Data Repository, then join the relevant tables and serve them as views using dbt. However, our Helios Recommendations Engine needs the latest metadata to be accessible as close to real time as possible.

Introducing the Product Domain Aggregator (PDA)

To streamline the product metadata aggregation process, Loblaw Digital’s Data Engineering team developed the Product Domain Aggregator (PDA). Acting as a central hub, the PDA collects and processes all product-related data in real time from various sources, including the Product Catalog Service (PCS), the Pricing and Promotions Engine (PPE), and the Helios Inventory Service. We will look at the structure and inner workings of the PDA in Part 3.
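The PDA's real design is covered in Part 3. As a hypothetical illustration of the "central hub" idea only, the sketch below keeps the most recent update from each source domain for every product and merges them into a single record. `DomainUpdate`, `ProductDomainAggregator`, the domain labels, and the last-write-wins rule on event timestamps are all assumptions, not details from the PDA itself.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class DomainUpdate:
    product_id: str
    domain: str              # hypothetical labels, e.g. "catalog", "pricing", "inventory"
    timestamp: float         # event time from the source; newer events win
    payload: dict[str, Any]  # domain-specific fields, assumed already schema-validated


class ProductDomainAggregator:
    """Keeps the latest update per (product, domain) and merges them into one view."""

    def __init__(self) -> None:
        # product_id -> domain -> latest DomainUpdate seen for that domain
        self._latest: dict[str, dict[str, DomainUpdate]] = {}

    def ingest(self, update: DomainUpdate) -> dict[str, Any]:
        """Apply one update and return the refreshed view of that product."""
        domains = self._latest.setdefault(update.product_id, {})
        current = domains.get(update.domain)
        # Drop out-of-order events that are older than the update already held.
        if current is None or update.timestamp >= current.timestamp:
            domains[update.domain] = update
        return self.view(update.product_id)

    def view(self, product_id: str) -> dict[str, Any]:
        """Merge the latest payload from each domain into a single record."""
        merged: dict[str, Any] = {"product_id": product_id}
        for domain, update in sorted(self._latest.get(product_id, {}).items()):
            # Namespacing by domain avoids collisions between same-named fields.
            merged[domain] = update.payload
        return merged


pda = ProductDomainAggregator()
pda.ingest(DomainUpdate("p1", "catalog", 1.0, {"description": "Gala apples"}))
pda.ingest(DomainUpdate("p1", "pricing", 2.0, {"price": "3.99"}))
print(pda.ingest(DomainUpdate("p1", "pricing", 1.5, {"price": "4.49"})))  # stale event; price stays 3.99
```

Because each incoming event refreshes only its own product, a consumer such as a recommendations engine can read an up-to-date combined record without waiting for a batch join.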

Delivering the Data: Data as a Service

To ingest and deliver data to the PDA efficiently and robustly, the Data Engineering team created a unified framework based on GCP services. This framework has the following features:

Schema Validation
Reusable Conversion to and from Pub/Sub, BigQuery, Bigtable, Firestore, Cloud Storage
Dead-letter Reingestion
Monitoring and Alerting
Real-time and Batch Delivery
+ more!
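Schema validation and dead-letter reingestion fit together naturally: messages that fail validation are set aside with a reason rather than dropped, and can be replayed once the cause is fixed. The sketch below uses in-memory lists as stand-ins for a delivery sink and a dead-letter topic. `deliver`, `reingest`, and the `repair` hook are hypothetical names chosen for illustration, not taken from the Data as a Service framework.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable

Validator = Callable[[dict[str, Any]], list[str]]


@dataclass
class DeadLetter:
    raw: str     # original message body, kept verbatim so it can be replayed
    reason: str  # why delivery failed, useful for monitoring and alerting


def deliver(messages: list[str], validate: Validator,
            sink: list[dict[str, Any]], dead_letters: list[DeadLetter]) -> None:
    """Parse and validate each message; route failures to the dead-letter list."""
    for raw in messages:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            dead_letters.append(DeadLetter(raw, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            dead_letters.append(DeadLetter(raw, "expected a JSON object"))
            continue
        errors = validate(record)
        if errors:
            dead_letters.append(DeadLetter(raw, "; ".join(errors)))
        else:
            sink.append(record)


def reingest(dead_letters: list[DeadLetter], validate: Validator,
             sink: list[dict[str, Any]],
             repair: Callable[[str], str] = lambda raw: raw) -> list[DeadLetter]:
    """Replay dead letters (optionally repaired) and return those that still fail."""
    still_failing: list[DeadLetter] = []
    deliver([repair(letter.raw) for letter in dead_letters], validate, sink, still_failing)
    return still_failing


def require_code(record: dict[str, Any]) -> list[str]:
    return [] if "code" in record else ["missing required field 'code'"]


sink: list[dict[str, Any]] = []
dead: list[DeadLetter] = []
deliver(['{"code": "promo1"}', '{"reward_type": "$_OFF"}', "not json"], require_code, sink, dead)
# Hypothetical repair step: backfill a missing code before replaying.
backfill = lambda raw: raw if '"code"' in raw else raw.replace("{", '{"code": "promo4", ', 1)
dead = reingest(dead, require_code, sink, repair=backfill)
```

Keeping the raw message body, rather than a parsed copy, means a replay goes through exactly the same parsing and validation path as the original delivery.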

Stay tuned for Part 2 where we will take an in-depth look at Data as a Service.
