Enabling Personalized Marketing with Braze Connected Content at foodpanda

Joshua Tam
Published in foodpanda.data · Feb 13, 2023

What is Marketing Personalization?

Marketing personalization, also known as personalized marketing or one-to-one marketing, is the practice of using data to deliver brand messages targeted to an individual prospect; in other words, it is the practice of creating a unique experience tailored to each customer. Done right, personalization enhances customers’ lives and increases engagement and loyalty by delivering messages that are tuned to — and even anticipate — what customers really want.

In this article, we will break down an example of how we use personalized marketing to run digital campaigns, the scaling challenges we faced along the way, our optimisation efforts, and the results.

The Problem Statement

Here at foodpanda, we utilise a Customer Engagement Platform called Braze for our marketing campaigns. With Braze, communications such as Electronic Direct Mail (EDM) and Push Notifications can be sent out to targeted groups of customers based on predefined Custom Attributes, also known as the customer’s unique traits.

Example CRM Workflow Diagram

These Custom Attributes are created in our Data Warehouse (BigQuery) and are then pushed to Braze daily for our entire customer base, across 11 markets in APAC. This has proven to be:

  1. Slow. We are limited by the API rate limits of sending data over to Braze. Our pipelines to update Custom Attributes in Braze can take anywhere from 2 to 20 hours daily, depending on the size of that day’s delta. Further optimizations could drive down this time, but the bottleneck remains that each POST request can only handle up to 75 data point updates (a sketch of such a call is shown after this list).
  2. Attributes vs Content. Custom Attributes are meant to be filters on customers based on their traits, not meant to hold “content” (e.g. URLs, JPEGs, GIFs). Ideally, we should use an alternative solution to store “content” for each user, which would then allow us to add “personalization” into our Campaigns.
  3. Costly. Currently we are charged based on the number of data points we update in Braze, and this has proven to be costly for us.
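To make the bottleneck in (1) concrete, here is a rough sketch of the kind of call an attribute-update pipeline has to make repeatedly. The Braze REST endpoint is the standard /users/track API, but the instance URL, API key and attribute payload below are placeholders, not our production code:

    import requests

    BRAZE_REST_URL = "https://rest.iad-01.braze.com"   # varies per Braze cluster; placeholder
    BRAZE_API_KEY = "..."                              # placeholder

    def push_custom_attributes(attribute_rows: list[dict]) -> None:
        # The /users/track endpoint accepts at most 75 attribute objects per request,
        # so refreshing attributes for millions of users means tens of thousands of calls.
        for i in range(0, len(attribute_rows), 75):
            response = requests.post(
                f"{BRAZE_REST_URL}/users/track",
                headers={"Authorization": f"Bearer {BRAZE_API_KEY}"},
                json={"attributes": attribute_rows[i : i + 75]},
                timeout=30,
            )
            response.raise_for_status()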

Researching possible solutions led us to Braze Connected Content (BCC).

Braze Connected Content (BCC)

What is Braze Connected Content? BCC is a feature that expands on marketing personalization to boost customer engagement and conversions. It allows you to insert any information accessible via API directly into the messages you send to users. Braze uses Liquid templating to configure these messages, and Connected Content allows content to be pulled either directly from your web server or from publicly accessible APIs.

For example:

Example of Liquid templating with Connected Content to retrieve data from external APIs
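In its simplest form, a Connected Content call in a Braze Liquid template looks roughly like the snippet below; the URL and the fields of the returned JSON are purely illustrative:

    {% connected_content https://api.example.com/recommendations/{{${user_id}}} :save reco %}
    Hi {{${first_name}}}, we picked {{reco.products[0].name}} just for you!

Braze fetches the URL at send time for each user, stores the JSON response under the reco variable, and any field of that response can then be referenced in the message body.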

As we were already integrated with the Braze platform, we wanted to test this feature out as it seemed to be exactly what we needed for Marketing Personalization. This would help us solve the above problems:

(1) Faster Pushes. We can avoid the long automation run-times imposed by Braze’s API rate limits on updating Custom Attributes.

(2) Personalized Content. We can now add personalized content into our EDMs or push notifications, as Braze Connected Content is meant for this use case.

(3) Manage Cost Better. If we use an external bucket (AWS S3 in our case) to store the “content” used in Braze Campaigns, the storage and data ingress/egress costs are lower compared to the daily updates of data points in Braze. Cost comparisons will be covered in a future post.

Other Considerations & Tools Required

In order to build a pipeline around BCC, we needed an automation tool that could run on a specified schedule, as well as a storage endpoint that could store our data and serve it via API requests.

After some consideration, we ended up with the following:

1. Automation: Apache Airflow

We decided to use Apache Airflow for automation. By chaining together various tasks (run by Airflow Operators), we were able to set up a pipeline that runs automatically based on our specified schedule to upload our BigQuery data to an endpoint, where BCC could then make requests to retrieve the data.
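A minimal sketch of what such a scheduled DAG could look like is shown below; the DAG id, schedule and task callables are illustrative placeholders rather than our production pipeline:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Placeholder callables for the real BigQuery-to-S3 logic described later in this article.
    def export_bigquery_table(**context):
        ...  # query BigQuery and stage rows locally as JSON files

    def upload_to_s3(**context):
        ...  # push the staged JSON files to the S3 bucket

    with DAG(
        dag_id="bcc_content_push",       # hypothetical DAG name
        schedule_interval="0 2 * * *",   # e.g. run daily at 02:00
        start_date=datetime(2023, 1, 1),
        catchup=False,
    ) as dag:
        export = PythonOperator(task_id="export_bigquery_table", python_callable=export_bigquery_table)
        upload = PythonOperator(task_id="upload_to_s3", python_callable=upload_to_s3)
        export >> upload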

2. Storage Endpoint & API Server: AWS S3 & API Gateway

After much consideration and testing, we decided to use an AWS S3 bucket to store the personalized content for our customers, and AWS API Gateway to allow requests to be made to the storage bucket.

The following considerations were made:

  • Needs to support a GET API/URL which accepts 3 parameters: campaign, country and user ID
GET /:campaign_id/:country/:user_id
  • Needs to be secured: no one should be able to access the user data without credentials, so proper authentication is required.
  • Braze delivers messages at a very fast rate, so the server needs to be able to handle thousands of concurrent connections without getting overloaded.
  • Braze requires that server response time is less than 2 seconds for performance reasons.

We ended up with a pipeline like this:

High Level Overview of Braze Connected Content Pipeline

The Implementation

With Airflow in place, our main challenge was converting a BigQuery table into millions of JSON files and transferring them to the AWS S3 bucket. At a high level, the tasks required are:

  1. Download the table from BigQuery into a pandas dataframe
  2. Extract the rows one by one from the dataframe, and store the data as individual JSON files in memory
  3. Send these in-memory JSON files to the AWS S3 bucket in multiple batches

The real challenge here is that, since data is passed between steps (1), (2) and (3), all three steps need to run within the same Kubernetes pod in order to share local memory.
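A condensed sketch of those three steps running inside one worker is shown below; the table, column, bucket and prefix names are placeholders, and our production code batches the uploads as described in the next sections:

    import json

    import boto3
    from google.cloud import bigquery

    def export_table_to_s3(table: str, bucket: str, prefix: str) -> None:
        # (1) Download the BigQuery table into a pandas DataFrame
        client = bigquery.Client()
        df = client.query(f"SELECT * FROM `{table}`").to_dataframe()

        # (2) Convert each row into an individual JSON payload, kept in memory
        payloads = {
            f"{prefix}/{row.user_id}.json": json.dumps(row._asdict(), default=str)
            for row in df.itertuples(index=False)
        }

        # (3) Upload the JSON files to the S3 bucket
        s3 = boto3.client("s3")
        for key, body in payloads.items():
            s3.put_object(Bucket=bucket, Key=key, Body=body)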

Other Challenges: Long Runtimes

Initially, we built the pipeline to send updates to AWS S3 sequentially (i.e. updating one customer per API call). From load testing the pipeline, we estimated that it would take over 20 hours to complete around 1 million requests to our bucket.

This cost us in two ways: compute and reliability. While compute resources can be increased, the real problem was a high failure rate caused by making the requests sequentially.

Optimisation

Thus began our optimisation efforts and we restructured the Airflow DAG to increase concurrency by doing the following:

  1. Sending in batches instead of sequentially (e.g. from 1,000,000 individual JSON files to 5,000 files per AWS connection)
  2. Paginating the BigQuery table into smaller segments to be sent, using SQL ORDER BY, LIMIT & OFFSET clauses to ensure the pipeline is idempotent
  3. Generating multiple concurrent DAG tasks for each page using Airflow’s TaskGroup
  4. Allocating more CPUs and memory to each concurrent task (running on a Kubernetes Pod)
  5. Using a dedicated Airflow DAG pool
  6. Using AWS S3 Sync instead of the AWS Boto3 SDK

(2) Pagination SQL using ORDER BY, LIMIT and OFFSET

The SQL above, together with Airflow’s TaskGroup, allows us to concurrently download data in segments from our BigQuery table, load it into individual JSON files in local memory, and push them to S3.

(3) Snippet of our DAG for Pagination by splitting into multiple TaskGroups
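A rough, self-contained sketch of how (2) and (3) fit together is shown below; the page size, number of pages, table name, task ids and pool name are all placeholders rather than our actual DAG:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.utils.task_group import TaskGroup

    PAGE_SIZE = 50_000   # hypothetical page size
    NUM_PAGES = 20       # hypothetical number of segments

    def load_page_to_s3(page: int, **context) -> None:
        # ORDER BY with LIMIT/OFFSET keeps each page deterministic, so reruns stay idempotent
        sql = f"""
            SELECT *
            FROM `project.dataset.bcc_content`   -- hypothetical table
            ORDER BY user_id
            LIMIT {PAGE_SIZE} OFFSET {page * PAGE_SIZE}
        """
        ...  # run the query, write this page's JSON files locally, sync them to S3

    with DAG(dag_id="bcc_content_push", start_date=datetime(2023, 1, 1),
             schedule_interval="0 2 * * *", catchup=False) as dag:
        with TaskGroup(group_id="load_to_s3") as load_to_s3:
            for page in range(NUM_PAGES):
                PythonOperator(
                    task_id=f"load_page_{page}",
                    python_callable=load_page_to_s3,
                    op_kwargs={"page": page},
                    pool="bcc_s3_pool",   # dedicated Airflow pool (optimisation 5)
                )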

For the push to S3, the optimisation in step (6), switching to the aws s3 sync command, proved very effective in reducing runtime.

Initially, we used only the AWS SDK boto3, but we faced difficulty customising the concurrency and size of our pushes, so we decided to use the AWS CLI instead. Even though the aws s3 sync command uses boto3 itself, it uses threading to copy multiple files simultaneously, so the copy operation takes less elapsed time. Shown below is a snippet of our custom operator, where we configured the AWS CLI within our worker pods and ran the aws s3 sync command from the pod.

(6) Snippet where aws s3 sync command is called inside of Kubernetes Pod
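A stripped-down illustration of that step, running the CLI from inside the task’s pod, might look like the following; the local directory, bucket and prefix are placeholders:

    import subprocess

    def sync_local_json_to_s3(local_dir: str, bucket: str, prefix: str) -> None:
        # aws s3 sync uploads many files concurrently using the CLI's built-in threading,
        # which made it much faster for us than uploading objects one by one with boto3.
        # Concurrency can also be tuned, e.g. `aws configure set default.s3.max_concurrent_requests 20`.
        subprocess.run(
            ["aws", "s3", "sync", local_dir, f"s3://{bucket}/{prefix}"],
            check=True,
        )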

With all the optimisations in place, we ended up with a DAG like the one below, where the load_to_s3 task is an Airflow TaskGroup split into multiple mutually exclusive segments, each sending its JSON files to our S3 bucket.

Visual Graph Representation of our Final DAG (ignore latest_ & patch_)
Pagination for loading the json files to s3

The final result of our optimisation?

Over 10x faster run times.

Our current pipeline has proven to scale well. For example, a push covering over 10 million customers takes slightly over 2 hours to complete. This is a huge improvement over the initial 24 hours for only 1 million customers.

Current DAG run-time pushing over 10 Million json files to S3

After the millions of individual JSON files have been populated in our S3 bucket, the last step, which cannot be automated, is for our Customer Relationship Management (CRM) team to pull this data into their Campaigns’ EDM templates in Braze, using a Braze Liquid template to make an API call to the AWS S3 bucket.

The Individual json files synced to S3 Bucket
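For illustration, such a template call might look like the snippet below, where the host name, campaign segment, credential name and JSON field names are placeholders and the exact template is owned by the CRM team:

    {% connected_content https://content.example.com/pandamart_reco/{{${country}}}/{{${user_id}}} :basic_auth bcc_credentials :save reco %}
    Today’s picks for you: {{reco.products[0].name}} and {{reco.products[1].name}}

The :basic_auth option refers to Connected Content credentials configured in the Braze dashboard, which covers the authentication requirement mentioned earlier.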

The Results Themselves 🎉

We successfully utilised the Braze Connected Content pipeline to send each customer the output of a machine learning model that generated 6 personalised quick-commerce product recommendations!

The recommendations, generated in a table in BigQuery, were piped all the way into the EDMs that customers receive on their digital devices.

EDMs showing recommended pandamart Products that are Personalized for each Customer

By leveraging personalization in our marketing campaigns, we are able to use what we know about our customers and personalize their experience, ultimately driving engagement up.

Moving forward, foodpanda is better able to run omnichannel campaigns with personalized customer content, and to identify and leverage the ones that work best. This also enables our Data Scientists to evolve our marketing strategies and processes, by ensuring that the output of their models can be seamlessly integrated and piped into Braze for marketing.

In essence, a general guideline for using the two Braze functionalities:

  • For pushing unique attributes of customers (e.g. Segment Types, Personal Details for targeting) — use Braze Custom Attributes.
  • For more personalized content/advertising (e.g. Product Recommendations, Campaign Information, etc.) — use Braze Connected Content.

We are now in the process of rolling out this Campaign across all 11 APAC markets and experimenting with more use cases for this pipeline.

Credits

Special thanks to Wei Li and Zheheng for the great business partnership, Garen and Victor for engineering support, Mujahid and Tara for the ML recommendations, Dragos for support with Braze Liquid template, Junrong for the pilot tests, and everyone else for their dedication and help with this project.

This article describes by no means the perfect pipeline, nor the only way to work with Braze Connected Content; rather, it serves as an example of BCC’s personalization capabilities and as documentation of what we have experimented with and what works for us. Hopefully, this helps!
