ClickHouse at LifeStreet: Performance Marketing is as Strong as Your Data Platform
EDITOR’S NOTE: This is a repost of “Performance marketing is as strong as your data platform” from the LifeStreet blog, originally written by Alexander Zaitsev, Altinity CTO.
We live in a rapidly changing world. The ability to discover and apply business-critical insights from petabyte datasets in real time is now a key factor in many businesses. Digital marketing is no exception. In fact, digital marketing is now one of the major sources of Big Data. Real-time bidding (RTB) networks encourage intense competition between advertisers, and that competition requires a lot of technology and innovation. There used to be a saying: “Advertising is the engine of trade”. Now we can say that “Technology is the engine of advertising”.
In this article, we will explain how ClickHouse is used by the digital marketing company LifeStreet. LifeStreet was one of the first US companies to discover ClickHouse, and that allowed it to build robust and scalable analytics for performance advertising.
Data in Performance Marketing
Today, accurately estimating the value of an ad impression requires acquiring and structuring 100–1,000 times more data compared to a decade ago.
That’s because ten years ago the amount of data used to predict an impression value was limited to traffic from the publisher’s websites running the digital ad campaign or from the first generation of publisher networks. Since then, real-time bidding (RTB) networks have transformed audience targeting by connecting publishers and advertisers all around the world in a programmatic RTB auction. Now, thousands of advertisers bid for impressions from dozens of different publishers in real time, and performance marketers have more data than ever before to make reliable estimates of an ad impression’s value. Even ‘lost’ impression opportunities help advertisers understand the market and better estimate the cost of an impression.
So, how much data are we talking about?
In the US alone there are more than 250M active Internet users, and many of them use multiple Internet devices. If we consider the world as a whole, this number is close to 3–4 billion. Certainly, not all users are eligible for all marketing campaigns, but for performance marketing to be effective, advertisers need the ability to discover and apply business-critical insights from petabyte datasets in real time to determine which impressions are most likely to lead to a positive response to an ad. The Data Management Platform (DMP) is the system responsible for collecting and storing multiple anonymized user attributes, such as country and language, in order to distinguish between user patterns.
This data is usually provided by an ad network, but there are also dedicated external systems that supply additional information associated with a user, for example, purchaser profiles from online web stores. Another way DMPs collect (a lot of) data is from the marketing campaigns themselves. At LifeStreet, the amount of data processed and stored has tripled every 18–24 months over the last several years, reaching 300 billion records per day at its high-water mark!
Why is data processing so hard?
So let’s break it down: 300 billion records per day means approximately 3.5 million records per second. This is a lot of data. Every bid request generates a record with hundreds of different attributes provided by the RTB network and DMP system. The record size may be several kilobytes. Those records need to be stored in a database. Simple math gives us gigabytes per second and tens of terabytes of data every day. To put it into perspective, a typical laptop has less than one terabyte of disk space and could hold less than an hour’s worth of data.
The challenge is not only to store this data quickly and reliably but also to make it available for further analysis by humans and machine learning algorithms. Data is useless if it is not actionable. Performance marketing requires us to look back on weeks or even months of data in order to find user patterns, build performance profiles, make predictions and estimate impression value. That means several hundred terabytes or even petabytes of data need to be readily available for market analysts, dashboards and machine learning algorithms.
Think of a typical marketing data report. Most reports use far fewer dimensions than we store, because each one needs only its own subset of the available data attributes. So from the giant volume of data being stored, only a fraction is needed for a particular report. But there could be many reports with different ‘views’ of the data, and different ways to navigate between those views. This is a common pattern in analytics, and database systems have learned to handle it efficiently with a technology called a column store.
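The per-second figures above can be checked with a few lines of Python. This is only a sketch: the 300 billion records per day comes from the text, while the record sizes are assumed values standing in for "several kilobytes", and the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def records_per_second(records_per_day):
    """Average ingest rate implied by a daily record count."""
    return records_per_day / SECONDS_PER_DAY


def gigabytes_per_second(records_per_day, record_size_bytes):
    """Raw (uncompressed) ingest bandwidth in GB/s for a given record size."""
    return records_per_second(records_per_day) * record_size_bytes / 1e9


RECORDS_PER_DAY = 300_000_000_000  # figure quoted in the article

# Assumption: hypothetical record sizes; the article only says "several kilobytes".
ASSUMED_RECORD_SIZES_BYTES = (1_000, 2_000, 4_000)

rate = records_per_second(RECORDS_PER_DAY)
print(f"{rate / 1e6:.2f} million records per second")

for size in ASSUMED_RECORD_SIZES_BYTES:
    gbps = gigabytes_per_second(RECORDS_PER_DAY, size)
    print(f"{size} bytes per record: {gbps:.1f} GB per second (raw)")
```

These are raw, uncompressed rates under the assumed record sizes. The volume that ends up on disk depends on how the data is filtered, encoded and compressed.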
In traditional databases, data is stored by rows, and when data is queried, full rows need to be accessed. This works for small datasets but becomes prohibitively expensive and slow for reports over billions of rows. Columnar databases, by contrast, store data by columns, so only the columns required for a particular report are retrieved from storage. Storing each column separately also enables much better data compression. Together, these can reduce the amount of data a single report needs to process by a factor of 10,000 to 1,000,000.
Columnar databases first appeared in 1998 but remained quite marginal during the next decade, because the era of Big Data hadn’t yet arrived. In 2010, when LifeStreet first started using commercial columnar databases, we were one of the first ad tech companies to use this technology. But prohibitive licensing fees and real-time performance shortcomings limited our ability to execute real-time analytics at scale until we discovered ClickHouse in 2016.
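The difference between the two layouts can be illustrated with a toy example in Python. This is a minimal sketch, not how ClickHouse or any real database stores data: the field names, the sample records and the "values read" counter are all hypothetical, chosen only to show that a report touching two attributes reads far less from a columnar layout.

```python
# Hypothetical bid records; a real record would carry hundreds of attributes.
FIELDS = ("country", "device", "os_version", "bid_price", "won")

RECORDS = [
    ("US", "ios", "14.4", 1.20, 1),
    ("US", "android", "11", 0.80, 0),
    ("DE", "ios", "14.2", 1.50, 1),
    ("DE", "android", "10", 0.50, 0),
]


def to_column_store(records, fields):
    """Pivot row tuples into one list per attribute."""
    return {name: [rec[i] for rec in records] for i, name in enumerate(fields)}


def bids_by_country_rows(records):
    """Row layout: every attribute of every record is read."""
    totals, values_read = {}, 0
    for record in records:
        values_read += len(record)  # the whole row comes off storage
        country, bid_price = record[0], record[3]
        totals[country] = totals.get(country, 0.0) + bid_price
    return totals, values_read


def bids_by_country_columns(columns):
    """Column layout: only the two columns the report needs are read."""
    needed = ("country", "bid_price")
    values_read = sum(len(columns[name]) for name in needed)
    totals = {}
    for country, bid_price in zip(columns["country"], columns["bid_price"]):
        totals[country] = totals.get(country, 0.0) + bid_price
    return totals, values_read


COLUMNS = to_column_store(RECORDS, FIELDS)
row_totals, row_reads = bids_by_country_rows(RECORDS)
col_totals, col_reads = bids_by_country_columns(COLUMNS)
print(f"row layout read {row_reads} values, column layout read {col_reads}")
```

Both functions return the same totals, but the columnar version touches only the attributes the report asks for. With hundreds of attributes per record instead of five, the gap grows accordingly, and values within one column tend to be similar to each other, which is what makes them compress well.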
Why ClickHouse is LifeStreet’s Data Warehouse Solution
ClickHouse was the first database management system that met all of LifeStreet’s requirements:
- Real-time data ingestion from hundreds of ad servers.
- Very fast ad-hoc reports. It far outperformed other technologies.
- Scalability. We quickly scaled the cluster to 5PB of data.
- Ease of maintenance. One DBA can support multi-region highly available clusters.
- Cost of ownership. ClickHouse is open source, so you only pay for the hardware, and ClickHouse’s hardware demands are modest.
ClickHouse was originally developed by Yandex, a Russian company, often considered to be the Russian Google. Yandex developed ClickHouse in order to power an analytic application for its own huge ad network. They decided to make it an open source platform following the trend established by Yahoo, Google, Facebook and other Internet giants who continue to release many open source products.
Probably the most important feature of ClickHouse is its query speed compared to other databases. Queries on tables with billions of records can complete in under a second, which gives a tremendous advantage to the business itself. The faster your queries run, the more data insights you can find. With previous database solutions, LifeStreet’s marketing analysts often suffered from “not-worth-trying-disease.” If something took more than 15 minutes to query, they gave up on it. With ClickHouse, ad hoc data exploration became possible and easy.
Today at LifeStreet, ClickHouse is used in a variety of ways. Primarily it’s used for storing RTB data: several trillion rows in a huge geo-distributed cluster. This cluster serves millions of queries every day and powers reports and dashboards, machine learning algorithms, and ad-hoc exploratory analysis. LifeStreet reporting is recognized by our users as one of the best in the market, thanks to ClickHouse. ClickHouse is also used for storing real-time campaign budgets, DMP data and raw logs. Using one database technology for multiple tasks reduces infrastructure and team overhead. With Apple’s iOS 14 privacy changes, the value and use of data increase even further.
Invest in Innovation
At the time, ClickHouse was a brand new technology that some may have considered risky to adopt. But LifeStreet carefully weighed the risk, ran a lot of experiments, and tested thoroughly before deciding to invest in it. Over the years, ClickHouse has evolved, grown and matured to become the database management system of choice in ad tech, telecom, finance, and other industries where massive amounts of machine-generated data require fast analysis.
LifeStreet was an early adopter of ClickHouse, and by investing in an innovative data warehouse, it was able to build a robust reporting system that enables its business analysts to generate and learn from the detailed reporting required to optimize performance.
LifeStreet builds solutions to help mobile app developers find and grow their audiences. Its leading programmatic marketing platform empowers mobile marketers to take full advantage of programmatic advertising by offering transparency, granular controls, and high-performing ad experiences.
LifeStreet’s proprietary deep learning models drive more effective bidding, providing higher user quality, and delivering 50% more ROI than traditional machine learning models. By adjusting bids based on each prospective user’s unique value to the advertiser, LifeStreet is able to maximize ROAS and optimize to any post-install KPI, including retention, purchase value, propensity to act, and more.
Deeply trusted by app developers all over the world, LifeStreet was founded in 2009 with the vision of becoming the most intelligent, transparent, and accessible programmatic platform for mobile app developers. The company is headquartered in San Francisco, California.
Alexander Zaitsev joined LifeStreet in 2005 as the Director of Engineering and was responsible for the company’s analytical infrastructure and overseeing the development of big data technologies. In 2017 he launched Altinity.
Originally published on the Altinity Blog on July 27, 2021.