THG’s Ingenuity and Google Cloud

Shaun Hall
Published in THG Tech Blog
Oct 31, 2018
THG’s tech platform, Ingenuity, powers some big brands.

The brands running on Ingenuity’s e-commerce platform attracted 500 million customers and generated £736m revenue last year. Automated marketing and data analytics are key to this success, and here’s how we use GCP (Google Cloud Platform) to power this part of our platform.

We capture both frontend and backend customer interactions to analyse how customers use site features and to check platform health.

The raw data from the customer isn’t quite enough to be useful on its own. We need to add information about FX rates at the time and detect bots: they skew A/B test analyses, and we don’t want to waste resources trying to convince them to buy. This is done by the Data Enrichment Application, which we run in GKE (Google Kubernetes Engine).

Snippet from the Data Enrichment Application

When there is a transient issue, the enrichment app issues a NACK so the message is redelivered and reprocessed; when there is a more serious problem with the message itself, it ACKs the message and raises an alert, to avoid an infinite reprocessing loop.
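As a rough illustration, the ACK/NACK handling might look something like this with the Google Cloud Pub/Sub Java client (a minimal sketch; the class names, the enrichment step and the alerting helper are illustrative, not our actual code):

```java
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;

public class EnrichmentSubscriber {

  public static void main(String[] args) {
    // illustrative project/subscription names
    String subscription = ProjectSubscriptionName.format("my-project", "customer-events");

    MessageReceiver receiver = (PubsubMessage message, AckReplyConsumer consumer) -> {
      try {
        enrich(message);     // add FX rates, flag bots, etc.
        consumer.ack();      // success: remove the message from the subscription
      } catch (TransientEnrichmentException e) {
        consumer.nack();     // transient failure: Pub/Sub will redeliver the message
      } catch (Exception e) {
        alert(message, e);   // serious problem with the message: raise an alert...
        consumer.ack();      // ...and ACK so we don't loop on a poison message
      }
    };

    Subscriber subscriber = Subscriber.newBuilder(subscription, receiver).build();
    subscriber.startAsync().awaitRunning();
    subscriber.awaitTerminated();
  }

  static void enrich(PubsubMessage m) throws TransientEnrichmentException { /* ... */ }

  static void alert(PubsubMessage m, Exception e) { /* ... */ }

  static class TransientEnrichmentException extends Exception {}
}
```

The key point is that only failures we expect to resolve themselves result in a NACK; anything else is acknowledged and surfaced to a human.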

Here’s how the enrichment app performs during a Black Friday load simulation

We’re using CPU-based horizontal pod autoscaling (HPA) in Kubernetes to scale the enrichment app when it gets busy. We also have a variable number of nodes in our Kubernetes cluster; you can see new nodes being started at 10:25 above. A small backlog builds up during the test because the CPU scaling metric is just an approximation of the metric we really want: Pub/Sub backlog size and growth rate (which we could target with custom metrics).

We use a combination of Pub/Sub and BigQuery to support both real-time and batch applications

Some applications need to run in real time, such as site health monitors, and others run on a batch basis, like product recommendations. We cater for both of these models with an adapter layer that streams from Pub/Sub into BigQuery. We’ve experimented with running the adapter on GKE as well as on Dataflow, and the Dataflow scaling is much more effective, although the programming model is more restrictive (it’s declarative, based on the Apache Beam SDK):

Snippet from our Pub/Sub to BigQuery adapter Dataflow implementation

Here, we’re batching events into one-minute windows and writing them into the relevant date-partitioned table. There are a few subtleties around how we determine event time (we take it from the event body rather than from arrival time) and how we handle late arrivals (we keep the window open for 30 minutes so we can tolerate a backlog).
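For a flavour of what that looks like in the Beam SDK, here is a minimal sketch of such a pipeline. The project, subscription and table names, the event_time attribute and the JSON parsing are illustrative assumptions rather than our production code:

```java
import java.nio.charset.StandardCharsets;

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.transforms.windowing.AfterPane;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.ValueInSingleWindow;
import org.joda.time.Duration;
import org.joda.time.format.DateTimeFormat;

public class PubSubToBigQuery {

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply("ReadEvents", PubsubIO.readMessagesWithAttributes()
            .fromSubscription("projects/my-project/subscriptions/customer-events")
            // event time comes from the message, not from arrival time
            // (assumed here to be copied into an attribute by the producer)
            .withTimestampAttribute("event_time"))
        .apply("ToTableRow", ParDo.of(new MessageToRow()))
        .apply("OneMinuteWindows",
            Window.<TableRow>into(FixedWindows.of(Duration.standardMinutes(1)))
                // fire at the end of each window, then again for any late events
                .triggering(AfterWatermark.pastEndOfWindow()
                    .withLateFirings(AfterPane.elementCountAtLeast(1)))
                // keep the window open for 30 minutes so a backlog can be tolerated
                .withAllowedLateness(Duration.standardMinutes(30))
                .discardingFiredPanes())
        .apply("WriteToBigQuery", BigQueryIO.writeTableRows()
            .to(new SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination>() {
              @Override
              public TableDestination apply(ValueInSingleWindow<TableRow> value) {
                // route each window into the matching date partition (table$YYYYMMDD)
                String day = DateTimeFormat.forPattern("yyyyMMdd").withZoneUTC()
                    .print(((IntervalWindow) value.getWindow()).start());
                return new TableDestination("my-project:analytics.events$" + day, null);
              }
            })
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER));

    p.run();
  }

  /** Parses a Pub/Sub message into a BigQuery row; the real schema is omitted here. */
  static class MessageToRow extends DoFn<PubsubMessage, TableRow> {
    @ProcessElement
    public void processElement(ProcessContext c) {
      c.output(new TableRow()
          .set("payload", new String(c.element().getPayload(), StandardCharsets.UTF_8)));
    }
  }
}
```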

Business Applications

Product recommendations are responsible for between 1% and 4% of all revenue on our sites. Some are personalised and calculated in real time, and others look at overall customer patterns and are calculated as batch jobs.

Most of our platform runs in THG’s data centres, and the 20ms RTT between those and GCP is too high to put in the path of latency-critical, user-facing requests, so we pull the relevant data for our algorithms from GCP into our DC.

Green lines are calls on a latency-critical path.

We have a number of realtime dashboards around the office visualising technical and business metrics.

The “cashboard” (left) and global speed checker (right)

When the cashboard lights up on Black Friday, it’s time to party!
