Customer Data Platform: The Hero behind User Engagement at Tokopedia

Yunita Ekawati Salim
Published in Tokopedia Data
5 min read · Jun 26, 2019

As one of the biggest Indonesian technology companies, Tokopedia has hundreds of millions of users who fulfill their daily needs through the various services it offers. One of the values in Tokopedia’s DNA, embodied in our daily work as Nakama*, is to Focus on Consumer, as each of them is special and cherished. However, with a user base of this magnitude, the challenges we face in engaging and embracing them are enormous.

Consider the hypothetical case of John from the Tokopedia Internet Marketing team, who is in charge of building millennial user retention and, at the same time, promoting trending products from our merchants. John’s problem can be formulated as questions such as:

“Who should receive emails about the newly arrived Yeezy shoes?”

“Which users should I notify when there are good deals on a particular Marvel comic book?”

To solve these problems, and in turn reinforce a data-driven culture, we need a tool that helps us understand our customers better. Customers here are not limited to buyers; they can also be merchants or any partners within the Tokopedia ecosystem. The tool must be capable of organizing the data of millions of Tokopedia customers efficiently while supporting multiple data retrieval use cases.

Illustration of customer segmentation (source tellius.com)

Here comes the Customer Data Platform.

A Customer Data Platform (CDP) collects and processes data from multiple sources and unifies it in a single data platform.

This collection of customer profiles is made accessible to other systems, supporting multiple use cases from company-wide stakeholders.

An example of its utilization would be as a data source for the Data Scientist team to build predictive models based on the user’s individual preferences and their spending patterns.

John’s internet marketing use case above can take advantage of the segmentation service, which is one of the features of the CDP. The purpose of this service is to support marketing decisions based on customer criteria, which draw on the following three types of data:

  1. User Data: email address, home and delivery addresses, etc.
  2. Transaction Data: loyalty points, payment status, etc.
  3. Behavioral Data: search and click history, wishlist, etc.

Illustration of customer segmentation (source tellius.com)

To show what the service does, let us get back to John, who plans to create a campaign that targets “urban millennials who are inactive for the last three months”. He needs the list of users who belong to that segment so that he can send them special deals on products that may interest them. John enters the age (User Data) and activity (Behavioral Data) criteria in the segmentation dashboard, which connects to the Segmentation Service. The service handles the request, and John is notified once the process is complete and the results are available.

The rest of this post explains the technical details of the process that takes place in the background while John waits for the segmentation result.

Illustration of the system

In general, the CDP Segmentation Service is divided into the following parts:

Data transformation

Data from multiple origins is collected, cleaned, and transformed into a single customer profile database. We use the existing Tokopedia data ingestion platform, run the data processing jobs on the data lake, and maintain separate storage in Google BigQuery, in normalized form, for segmentation purposes. Cleansed and normalized data is appended daily using Apache Airflow.
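The idea of unifying several sources into one profile per customer can be sketched in a few lines of plain Python. This is not the BigQuery and Airflow pipeline described above; the source names, fields, and the merge rule (later sources win when they carry a non-empty value) are all assumptions chosen for illustration.

```python
from collections import defaultdict


def unify_profiles(*sources):
    """Merge records from several sources into one profile per user_id.

    Each source is a list of dicts that share a `user_id` key. A later
    source overwrites an earlier value only when its own value is non-empty.
    """
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            profile = profiles[record["user_id"]]
            for key, value in record.items():
                # Basic cleaning: trim strings and skip empty values.
                if isinstance(value, str):
                    value = value.strip()
                if value is None or value == "":
                    continue
                profile[key] = value
    return dict(profiles)


# Hypothetical extracts from three upstream systems.
user_data = [{"user_id": 1, "email": " john@example.com "}]
transaction_data = [{"user_id": 1, "loyalty_points": 120}]
behavioral_data = [{"user_id": 1, "last_search": "yeezy", "email": ""}]

unified = unify_profiles(user_data, transaction_data, behavioral_data)
print(unified[1])
```

A production job would express the same merge as a scheduled query or batch job, but the shape of the result is the same: one cleaned record per customer.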

Security layer

The service is intended to accommodate a broad range of use cases from different teams across the company. Consequently, users may have different roles and therefore different levels of access to the data. To ensure that each user is authorized, auditing must be enforced at the finest granularity. Access to our customers’ data is regulated down to the column level.
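Column-level access control can be illustrated with a small sketch: before a request is executed, the columns it touches are compared with the columns granted to the requester’s role. The roles, column names, and the `authorize_columns` helper below are hypothetical and only show the shape of the check.

```python
# Hypothetical role-to-column grants; real roles and columns will differ.
ROLE_COLUMNS = {
    "marketing": {"user_id", "age", "last_active"},
    "data_science": {"user_id", "age", "last_active", "loyalty_points"},
}


class AccessDenied(Exception):
    """Raised when a role requests a column it has not been granted."""


def authorize_columns(role, requested_columns):
    """Return the requested columns if the role may read all of them."""
    allowed = ROLE_COLUMNS.get(role, set())
    denied = sorted(set(requested_columns) - allowed)
    if denied:
        raise AccessDenied(f"role {role!r} may not read: {', '.join(denied)}")
    return list(requested_columns)


print(authorize_columns("marketing", ["user_id", "age"]))

try:
    authorize_columns("marketing", ["user_id", "loyalty_points"])
except AccessDenied as err:
    print(err)
```

Rejecting the whole request when any column is denied, rather than silently dropping the column, keeps the outcome easy to audit.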

Control layer

The mechanics of the main segmentation system are abstracted in this part, including the normalization of user input and its translation into the database-specific language. We use a typical Golang, PostgreSQL, and Redis stack. This module also logs and monitors all segmentation activity using Prometheus and Grafana.
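The translation step can be pictured as turning a list of user-supplied criteria into a parameterized query. The sketch below is written in Python for brevity, although the service itself is built in Golang; the table name, column names, supported operators, and the `%s` placeholder style are assumptions, and the right placeholder syntax depends on the database driver in use.

```python
# Whitelists keep user input from reaching the SQL text directly.
# Column and operator names here are hypothetical.
ALLOWED_COLUMNS = {"age", "days_since_last_active", "city_type"}
ALLOWED_OPERATORS = {"=", "<", "<=", ">", ">=", "between"}


def build_segment_query(criteria, table="customer_profile"):
    """Translate a list of (column, operator, value) criteria into SQL.

    Returns the query text with `%s` placeholders and the parameter list,
    so that values are passed to the database driver separately.
    """
    clauses, params = [], []
    for column, operator, value in criteria:
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column}")
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator}")
        if operator == "between":
            low, high = value
            clauses.append(f"{column} BETWEEN %s AND %s")
            params.extend([low, high])
        else:
            clauses.append(f"{column} {operator} %s")
            params.append(value)
    where = " AND ".join(clauses) if clauses else "TRUE"
    return f"SELECT user_id FROM {table} WHERE {where}", params


sql, params = build_segment_query([
    ("age", "between", (23, 38)),
    ("days_since_last_active", ">=", 90),
])
print(sql)
print(params)
```

Only whitelisted column and operator names are interpolated into the query text, while values travel as parameters; the `table` argument is meant to be set by the service, not by the user.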

Executor

Acting as the last layer of the service, the executor abstracts the data storage from the logical layer and performs the data retrieval job defined by the user-specified criteria. The tools capable of handling big data live in this layer. At the end of the job, a notification is sent to report the status of the task.

Because each of the components above is decoupled from the others, this setup makes the system scalable. Changes in one layer are isolated, and each layer comes with an extensible set of tools it supports. The executor layer, for example, supports multiple ways and tools to retrieve data from our customer-profile database. You can think of this as the strategy pattern applied at the system level.
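For readers unfamiliar with the strategy pattern, the following sketch shows the idea at a small scale: the caller selects a retrieval strategy by name, and every strategy exposes the same interface. The backend names and the stand-in functions are hypothetical; they return fixed data and make no real database calls.

```python
from typing import Callable, Dict, List

# Each strategy takes a segment query and returns matching user IDs.
Executor = Callable[[str], List[int]]


def run_on_warehouse(query: str) -> List[int]:
    """Stand-in for a warehouse-backed executor (no real call is made)."""
    return [1, 2, 3]


def run_on_cache(query: str) -> List[int]:
    """Stand-in for a cache-backed executor (no real call is made)."""
    return [1, 2]


# Hypothetical registry; adding a backend means adding one entry here.
EXECUTORS: Dict[str, Executor] = {
    "warehouse": run_on_warehouse,
    "cache": run_on_cache,
}


def execute_segment(query: str, backend: str) -> List[int]:
    """Pick a retrieval strategy by name and run the query with it."""
    try:
        executor = EXECUTORS[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend}") from None
    return executor(query)


print(execute_segment("SELECT user_id FROM customer_profile", "warehouse"))
```

Because the control layer only depends on the shared interface, a new retrieval tool can be added without touching the code that builds or authorizes the request.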

In the blink of an eye, John has the list of users he plans to engage and can send them promotional emails that they might be interested in. He empowers Tokopedia merchants and, at the same time, creates engagement with potential buyers. The Segmentation Service, as part of the Customer Data Platform, is an essential tool in a customer-obsessed culture and a data-driven organization. Provide the team with the right set of technology solutions and they will help to retain both buyers and sellers.

The Tokopedia Data Team democratizes access to knowledge. We build cool stuff that empowers our customers. If you wanna make an impact the way we do: yes, we are hiring!

*Nakama = Employee of Tokopedia
