Eats data platform: Services for machine learning and audience segmentation

DaaS at Coupang Eats for efficient feature engineering and audience segmentation to support business expansion — Part 2

Coupang Engineering
Coupang Engineering Blog
Sep 2, 2022


By Fred Fu

This post is also available in Korean.

As discussed in our previous post, the Coupang Eats food delivery business is moving toward automated data intelligence systems that accelerate business growth. We use machine learning (ML) models to predict optimal order assignments for our Eats delivery partners (EDP), and our business analysts (BA) use data to offer customers customized promotions.

Figure 1 shows the Eats data platform architecture, with its four main stages and three data processing pipelines that serve business scenarios with different time sensitivities.

Figure 1. The overall architecture of the Eats data platform

In this post, we discuss some specific uses of data in the business and how we developed Data-as-a-service (DaaS) to support them quickly and cost-efficiently. The most prominent use cases for our data include feature engineering, audience segmentation, user tagging, and business indicator calculation. We illustrate how our data platform improved the efficiency of our data services with two examples: feature engineering and audience segmentation.

Table of contents

· Feature service
Slow and inefficient
Solution
· Audience segmentation service
Offline and inefficient
Solution
Business benefits
· Conclusion

Feature service

Feature engineering is the process of extracting attributes (features) from a dataset for ML training. Good features improve a model's performance and can compensate when training data is not readily available. At Eats, a lack of data is not the problem. In fact, we have so much data, and so many features to calculate in real time, near real time, and offline, that computing them efficiently is the real challenge.
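
As a minimal illustration of offline feature extraction, the sketch below aggregates raw order records into per-merchant features. The `Order` record, its fields, and the feature names are hypothetical stand-ins, not the actual Eats schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Order:
    # Hypothetical raw order record; not the actual Eats schema.
    merchant_id: str
    prep_minutes: float
    cancelled: bool


def extract_merchant_features(orders: list[Order]) -> dict[str, dict[str, float]]:
    """Aggregate raw orders into per-merchant features for model training."""
    by_merchant: dict[str, list[Order]] = defaultdict(list)
    for order in orders:
        by_merchant[order.merchant_id].append(order)

    features: dict[str, dict[str, float]] = {}
    for merchant_id, rows in by_merchant.items():
        features[merchant_id] = {
            "order_count": float(len(rows)),
            "avg_prep_minutes": mean(r.prep_minutes for r in rows),
            # True counts as 1, so the sum is the number of cancellations.
            "cancel_rate": sum(r.cancelled for r in rows) / len(rows),
        }
    return features


orders = [
    Order("m1", 12.0, False),
    Order("m1", 18.0, True),
    Order("m2", 9.0, False),
]
print(extract_merchant_features(orders))
```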

Slow and inefficient

Before our data platform, feature engineering was a long and laborious process. Without a centralized system, data engineers, data scientists, and BAs spent long hours on feature engineering.

First, data engineers analyzed the feature requirements written by data scientists. Then, an engineer had to build the ingestion pipeline on demand and perform the data transformations needed to extract the features. Along the way, the engineer had to deal with numerous fragmented data systems and applications. Developing a single feature engineering pipeline required collaboration between multiple teams and entirely ad hoc engineering. Preparing one new feature for production took one to two weeks.

Besides being slow, this process led multiple teams or domain owners to build separate pipelines for the exact same feature. The duplication cost us not only engineering effort but also computing resources.

We were beginning to apply powerful ML models to more areas of the business, but feature engineering had become a major bottleneck.

Solution

To calculate the massive number of features served to models in real time, near real time, and offline, we built the feature service as an integral part of our Eats data platform (in fact, the data processing pipelines were initially built just for the feature service).

The Eats feature service is a one-stop platform for ML feature discovery, processing, production, and serving. It includes feature metadata management as well as data governance management. The system is built on top of our data processing pipelines and supports features computed by non-real-time, near real-time, and pure real-time calculation engines.
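
A central metadata catalog is one way a feature service can support discovery and governance while discouraging duplicate pipelines for the same feature. The following is a minimal sketch assuming an in-memory registry; `FeatureRegistry`, `FeatureMeta`, and the `owner` field are hypothetical, since the post does not describe how Eats actually stores feature metadata.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Engine(Enum):
    # The three calculation tiers named in the post.
    OFFLINE = "non-real-time"
    NEAR_REAL_TIME = "near-real-time"
    REAL_TIME = "real-time"


@dataclass(frozen=True)
class FeatureMeta:
    name: str
    owner: str  # team accountable for the feature (governance)
    engine: Engine
    description: str


class FeatureRegistry:
    """Hypothetical in-memory catalog holding one definition per feature name."""

    def __init__(self) -> None:
        self._features: dict[str, FeatureMeta] = {}

    def register(self, meta: FeatureMeta) -> None:
        # Reject duplicates so teams reuse an existing feature
        # instead of building a second pipeline for it.
        existing = self._features.get(meta.name)
        if existing is not None:
            raise ValueError(f"{meta.name!r} is already owned by {existing.owner}")
        self._features[meta.name] = meta

    def discover(self, keyword: str, engine: Optional[Engine] = None) -> list[FeatureMeta]:
        # Case-insensitive keyword search over names and descriptions,
        # optionally restricted to one calculation tier.
        keyword = keyword.lower()
        return [
            m
            for m in self._features.values()
            if (keyword in m.name.lower() or keyword in m.description.lower())
            and (engine is None or m.engine is engine)
        ]


registry = FeatureRegistry()
registry.register(FeatureMeta(
    "merchant_avg_prep_minutes", "eats-ml", Engine.OFFLINE,
    "Average food preparation time per merchant"))
registry.register(FeatureMeta(
    "area_online_edp_count", "eats-dispatch", Engine.NEAR_REAL_TIME,
    "Number of online EDPs in a delivery area"))
print([m.name for m in registry.discover("prep")])
```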

Figure 2. Feature service architecture

The feature service offers feature groups, through which a team can manage a defined set of features for a specific business scenario. Even when one feature group needs data about the customer, EDP, or merchant computed at different time sensitivities (non-real-time, near real-time, and pure real-time), the feature service handles each request and returns the aggregated results to the user.
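
To illustrate how a feature group can span tiers, here is a minimal sketch, assuming each calculation engine exposes a lookup function (the hypothetical `Fetcher` type). The group's features are bucketed by tier, each tier is queried once, and the partial results are merged into a single response. The dictionary-backed stores and feature names are illustrative only.

```python
from typing import Callable, Mapping

# Hypothetical per-tier lookup: (entity id, feature names) -> {feature: value}.
Fetcher = Callable[[str, list[str]], dict[str, float]]


def fetch_feature_group(
    entity_id: str,
    group: Mapping[str, str],         # feature name -> tier that computes it
    fetchers: Mapping[str, Fetcher],  # tier name -> lookup for that tier
) -> dict[str, float]:
    # Bucket the group's features by tier.
    by_tier: dict[str, list[str]] = {}
    for feature, tier in group.items():
        by_tier.setdefault(tier, []).append(feature)

    # Issue one request per tier, then merge the partial results.
    result: dict[str, float] = {}
    for tier, features in by_tier.items():
        result.update(fetchers[tier](entity_id, features))
    return result


def dict_fetcher(store: dict[tuple[str, str], float]) -> Fetcher:
    # Dictionary-backed stand-in for a real store; missing keys are skipped.
    return lambda eid, names: {n: store[(eid, n)] for n in names if (eid, n) in store}


fetchers = {
    "offline": dict_fetcher({("m1", "avg_prep_minutes"): 15.0}),
    "near_real_time": dict_fetcher({("m1", "orders_last_10m"): 7.0}),
    "real_time": dict_fetcher({("m1", "is_open"): 1.0}),
}
group = {
    "avg_prep_minutes": "offline",
    "orders_last_10m": "near_real_time",
    "is_open": "real_time",
}
print(fetch_feature_group("m1", group, fetchers))
```

Batching per tier keeps the number of store round trips equal to the number of tiers in the group rather than the number of features.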

As an example, let’s walk through creating a near real-time (NRT) feature group on our service. First, users define how the NRT features are generated in production by writing SQL queries for the OLAP engine. Next, they map each SQL output column to a feature name so that values are written to the correct feature group. Finally, the feature service does the rest and outputs the features to the feature store.
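
The three steps above can be sketched as follows, using Python's built-in `sqlite3` as a stand-in for the OLAP engine and a plain dictionary as the feature store. The `orders` table, the `:now` parameter, and the feature name are hypothetical; in practice such a query would presumably run repeatedly as new data arrives.

```python
import sqlite3

# sqlite3 stands in for the OLAP engine; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (merchant_id TEXT, created_min INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("m1", 95), ("m1", 98), ("m2", 99), ("m1", 60)],
)

# Step 1: the feature group's logic, written as SQL.
NRT_SQL = """
    SELECT merchant_id, COUNT(*) AS cnt_10m
    FROM orders
    WHERE created_min > :now - 10
    GROUP BY merchant_id
"""

# Step 2: map SQL output columns to feature names.
COLUMN_TO_FEATURE = {"cnt_10m": "merchant_orders_last_10m"}
KEY_COLUMN = "merchant_id"


def run_feature_group(conn, sql, params, key_column, mapping):
    """Run the group's SQL and return {entity_id: {feature_name: value}}."""
    cursor = conn.execute(sql, params)
    columns = [d[0] for d in cursor.description]
    store = {}
    for row in cursor:
        record = dict(zip(columns, row))
        store[record[key_column]] = {
            feature: record[column] for column, feature in mapping.items()
        }
    return store


# Step 3: run the query and write the results to the (dict) feature store.
feature_store = run_feature_group(conn, NRT_SQL, {"now": 100}, KEY_COLUMN, COLUMN_TO_FEATURE)
print(feature_store)
```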

Figure 3. The feature service interface. Users first define the feature group as SQL queries (left) and then map the appropriate output columns to features (right).

Beyond feature generation, the feature service also provides online feature serving for model predictions, freeing our backend engineers, data engineers, and data scientists from the complicated and inefficient process of feature engineering.
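
One detail of online serving is that a model expects its inputs in the same order it was trained on. Here is a minimal sketch of assembling such a vector from an online store, assuming hypothetical per-feature default values for entities or features that are missing at request time.

```python
from typing import Mapping, Sequence


def build_feature_vector(
    online_store: Mapping[str, Mapping[str, float]],
    entity_id: str,
    feature_order: Sequence[str],
    defaults: Mapping[str, float],
) -> list[float]:
    """Return features in the order the model was trained on.

    Missing entities or features fall back to per-feature defaults
    (0.0 if no default is configured) so a prediction can still be served.
    """
    row = online_store.get(entity_id, {})
    return [row.get(name, defaults.get(name, 0.0)) for name in feature_order]


online_store = {"edp42": {"distance_km": 1.8, "active_orders": 2.0}}
order = ["distance_km", "active_orders", "acceptance_rate"]
defaults = {"acceptance_rate": 0.5}
print(build_feature_vector(online_store, "edp42", order, defaults))
print(build_feature_vector(online_store, "unknown", order, defaults))
```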

Audience segmentation service

Suppose we want to show targeted ads to certain customer segments, or run A/B tests in a way that has minimal impact on the business. In such cases, the target audience of the task must be segmented carefully.

Eats uses segmentation for several business scenarios, including:

  • finding inactive users in certain areas of Seoul for re-engagement through discount coupons
  • encouraging offline EDPs during peak hours to go online and alleviate the EDP supply shortage
  • designing optimal control and test groups during A/B testing for EDP missions to measure the efficiency of various EDP mission bonuses without disrupting business

Offline and inefficient

All the scenarios mentioned above require sophisticated analysis of target audiences and segmentation based on predefined criteria.

Before our data platform, the typical flow of the segmentation process was as follows:

  1. The BA sets the target audience segmentation according to the business requirement.
  2. The BA writes an SQL query to find target users and asks business operations (Ops) to download the SQL results as an Excel spreadsheet.
  3. Ops uploads the Excel spreadsheet to a separate business operation system in Eats to create the necessary task (customer coupon, EDP mission, etc.).
  4. On top of that, if Ops wants to run A/B experiments within the task, the target audience must be segmented multiple times in the Excel sheet and uploaded repeatedly.

The workflow involved many offline steps, such as manually uploading Excel sheets and holding coordination meetings between multiple teams, which slowed down the process and held back business growth.

Solution

The audience segmentation service of our data platform is a closed-loop, one-stop system that helps our business operators find the right target audience to support business decisions or conduct A/B experiments.

Figure 4. Segmentation service architecture

Like the feature service, the segmentation service is built on top of the general data processing pipelines. Business operators select filtering criteria based on profile tags pre-calculated on the OLAP engine, and the service generates the segments automatically whenever necessary.
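
A minimal sketch of tag-based segmentation, evaluated in memory rather than as a query on an OLAP engine. The profile tags (`district`, `days_since_last_order`) and the helper combinators are hypothetical; the example mirrors the scenario of finding inactive users in certain areas of Seoul by combining filter conditions with AND and OR.

```python
from typing import Callable, Mapping

Profile = Mapping[str, object]          # pre-calculated tags for one user
Predicate = Callable[[Profile], bool]


def tag_equals(tag: str, value: object) -> Predicate:
    return lambda profile: profile.get(tag) == value


def tag_at_least(tag: str, threshold: float) -> Predicate:
    # Users without a numeric value for the tag never match.
    def check(profile: Profile) -> bool:
        value = profile.get(tag)
        return isinstance(value, (int, float)) and value >= threshold
    return check


def all_of(*predicates: Predicate) -> Predicate:
    return lambda profile: all(p(profile) for p in predicates)


def any_of(*predicates: Predicate) -> Predicate:
    return lambda profile: any(p(profile) for p in predicates)


def build_segment(profiles: Mapping[str, Profile], criteria: Predicate) -> list[str]:
    """Return the sorted IDs of users whose tags satisfy the criteria."""
    return sorted(uid for uid, profile in profiles.items() if criteria(profile))


profiles = {
    "u1": {"district": "Gangnam", "days_since_last_order": 45},
    "u2": {"district": "Gangnam", "days_since_last_order": 3},
    "u3": {"district": "Mapo", "days_since_last_order": 60},
    "u4": {"district": "Jongno", "days_since_last_order": 90},
}
# Inactive (30+ days) users in Gangnam or Mapo.
criteria = all_of(
    any_of(tag_equals("district", "Gangnam"), tag_equals("district", "Mapo")),
    tag_at_least("days_since_last_order", 30),
)
print(build_segment(profiles, criteria))
```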

Business benefits

With the segmentation service, business operators can now perform audience segmentation for all tasks on a self-service basis. For example, BAs who want to create a new EDP mission can filter target users in the segmentation portal by flexibly combining a wide range of filter conditions, which creates a new segment.

If BAs want to create an A/B test for the missions, they enable the “Split Segment” option to create child segments under the parent segment. Then, on the mission creation page, they simply refer to the segment ID.
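
The post does not say how “Split Segment” assigns users to child segments. One common approach, shown here as a hypothetical sketch, is deterministic hashing: each user is hashed together with the parent segment ID to a point in [0, 1], and cumulative split ratios decide which child segment that point falls into. The function name and the child ID format are assumptions.

```python
import hashlib


def split_segment(parent_id: str, user_ids: list[str], ratios: list[float]) -> dict[str, list[str]]:
    """Split a parent segment into child segments by deterministic hashing."""
    if not ratios or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be non-empty and sum to 1")
    children: dict[str, list[str]] = {f"{parent_id}-{i}": [] for i in range(len(ratios))}
    for uid in user_ids:
        # Salting with the parent ID keeps splits of different segments independent.
        digest = hashlib.sha256(f"{parent_id}:{uid}".encode()).digest()
        point = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1]
        cumulative = 0.0
        for i, ratio in enumerate(ratios):
            cumulative += ratio
            # The last child also catches floating-point leftovers near 1.0.
            if point < cumulative or i == len(ratios) - 1:
                children[f"{parent_id}-{i}"].append(uid)
                break
    return children


users = [f"u{i}" for i in range(1000)]
groups = split_segment("seg7", users, [0.5, 0.5])
print({child: len(members) for child, members in groups.items()})
```

Because assignment depends only on the parent ID and user ID, rerunning the split yields the same control and test groups, which matters when a mission runs for several days.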

Figure 5. Creating an audience segment on our segmentation service. To split the audience for A/B testing, users simply enable “Split segment”.

By integrating this segmentation service, the operations team saves substantial time designing and creating tasks. The average time spent creating EDP missions dropped from 510 minutes to 260 minutes for individuals and from 8,775 minutes to 4,550 minutes for teams, a reduction of about 49% and 48%, respectively.
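
For reference, the relative reductions implied by the average times above are:

```latex
\text{reduction} = \frac{t_{\text{before}} - t_{\text{after}}}{t_{\text{before}}},
\qquad
\frac{510 - 260}{510} = \frac{250}{510} \approx 49.0\% \;(\text{individuals}),
\qquad
\frac{8775 - 4550}{8775} = \frac{4225}{8775} \approx 48.1\% \;(\text{teams})
```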

Beyond EDP missions, the segmentation service is also currently used by customer relationship management (CRM) and promotion teams to build an intelligent promotions system for customer loyalty and growth.

Conclusion

At Eats, the priority is the customer. We are continually upgrading our technology to ensure our customers receive their orders in a timely manner no matter what. The DaaS outlined in this post is just the beginning of our journey toward an intelligent system that reduces human engineering costs.

Do you want to join a team innovating food delivery services using data? If so, view our open positions.

Coupang Engineering is also on Twitter. Follow us for the latest updates!
