Prebid Integration @ 10ms: Part 1 — Latency
In this series of blog posts, we are going to discuss the process of integrating Verity, GumGum’s Contextual Intelligence and Brand Safety service, with a Demand-Side Platform (DSP) used for matching ads to content. In this first installment of the series, we will examine the topic of Real-Time Bidding (RTB), and explore how data providers such as Verity need to handle high volumes of traffic at extremely low latencies.
Verity is GumGum’s brand safety and contextual intelligence solution. It is a service used by publishers to intelligently index content, and by advertisers to ensure that highly relevant ads are matched with content. Verity is built on the AWS cloud platform and currently resides in the AWS Virginia data centers (aka us-east-1).
Verity uses custom Computer Vision (CV) and Natural Language Processing (NLP) models, built on state-of-the-art Deep Learning, to analyze content. The service is capable of analyzing content in the form of text, images, audio, and video. Currently, Verity supports the following endpoints:
- page — given a URL, return a response based on the URL’s contents (text, images, audio, and video)
- text — given a block of text, return a response
- image — given an image, return a response
- video — given a video, return a response
The Verity response consists of several components, including:
- IAB category — the categories of the content (e.g., “Cooking” or “Tennis”)
- Threat levels — Amount of “threat” in various categories such as nudity, drugs and alcohol, violence and gore, etc.
- Keywords — relevant keywords from the article (e.g., “White House”, “Joe Biden”)
- Sentiment — the tone of the content: positive, neutral, or negative
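To make these components concrete, a response might look like the following sketch. The field names and values here are illustrative assumptions, not Verity's actual schema:

```python
# Hypothetical Verity-style response; field names are illustrative only.
verity_response = {
    "iab_categories": ["Sports", "Tennis"],
    "threat_levels": {"nudity": "LOW", "violence": "LOW", "drugs_alcohol": "LOW"},
    "keywords": ["Wimbledon", "tennis"],
    "sentiment": "positive",
}

# A downstream consumer (e.g., a DSP) would read the categories
# to derive contextual segments for ad matching.
print(verity_response["iab_categories"])  # ['Sports', 'Tennis']
```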
Verity plays the role of a data provider in real-time bidding scenarios. One variant of real-time bidding is known as client-side header bidding and is illustrated by the diagram and steps below.
- Advertisers register contextual segments of interest with a DSP, indicating the target of an ad campaign. For example, a segment could be COOKING or TENNIS.
- A Viewer enters a URL into a browser to request content. For example, John Doe requests an article on a recent tennis match from the New York Times website.
- The browser requests a page from a Publisher’s server.
- The Publisher’s server serves the page to the Browser.
- The page is displayed to the user. The page contains one or more empty boxes where ads will be placed. These boxes are called Ad Slots.
- Each Ad Slot makes a call to the DSP to request an Ad to fill the slot. The payload of the request contains (among other things) the URL of the parent page.
- The DSP makes calls to all the data providers that are associated with the ad campaign. Verity is one such data provider.
- Verity returns contextual segments that represent the page’s contents (e.g., TENNIS).
- If there is a match between an advertiser’s registered segments and the segments returned by Verity, the DSP returns a bid to the browser, along with an associated ad creative.
- The browser holds an auction among the bids it has received, and the ad associated with the winning bidder is displayed to the Viewer.
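The DSP's matching step in this sequence can be sketched as a simple set intersection. This is a minimal illustration with a hypothetical helper, not the DSP's actual implementation:

```python
# Minimal sketch of the DSP's segment-matching step.
# `should_bid` is a hypothetical helper, not a real DSP API.
def should_bid(registered_segments: set[str], verity_segments: set[str]) -> bool:
    """Bid only if at least one advertiser segment matches a segment Verity returned."""
    return bool(registered_segments & verity_segments)

# An advertiser targeting TENNIS matches a page Verity labeled TENNIS.
print(should_bid({"COOKING", "TENNIS"}, {"TENNIS"}))  # True
# No overlap: the DSP declines to bid.
print(should_bid({"COOKING"}, {"TENNIS"}))            # False
```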
The DSP that Verity is integrating with has the following requirements:
- 2 DSP locations — Los Angeles and New York City
- 100K Requests Per Second (RPS) per data center
- 10 milliseconds (ms) round-trip response time (aka latency). This constrains step 7 in the RTB sequence described above.
When we began the integration, Verity’s Service Level Objective (SLO) was 100ms, well above the 10ms required by the DSP. We attempted to pare down latency, optimizing the system by identifying and eliminating bottlenecks.
The first bottleneck that we identified was related to data access. Verity required several reads to DynamoDB to determine whether Verity had previously processed a page. By consolidating and denormalizing multiple DynamoDB tables, we were able to reduce the number of reads, resulting in savings in the 10s of milliseconds.
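As a simplified illustration of the denormalization (plain dictionaries standing in for DynamoDB tables; this is not our actual schema), merging normalized tables into one item keyed by URL turns several reads into one:

```python
# Before: page metadata split across three normalized "tables" -> three reads.
pages = {"url1": {"page_id": "p1"}}
categories = {"p1": ["Tennis"]}
keywords = {"p1": ["Wimbledon"]}

def lookup_normalized(url):
    page = pages[url]                    # read 1
    cats = categories[page["page_id"]]   # read 2
    kws = keywords[page["page_id"]]      # read 3
    return cats, kws

# After: one denormalized item keyed directly by URL -> a single read.
pages_denormalized = {
    "url1": {"categories": ["Tennis"], "keywords": ["Wimbledon"]}
}

def lookup_denormalized(url):
    item = pages_denormalized[url]       # single read
    return item["categories"], item["keywords"]

# Same answer, one round trip instead of three.
print(lookup_normalized("url1") == lookup_denormalized("url1"))  # True
```

With DynamoDB, each eliminated read removes a full network round trip, which is where the tens of milliseconds of savings came from.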
The second bottleneck we identified was the Java Virtual Machine (JVM). Since JVMs rely on garbage collection, significant latencies can appear during stop-the-world garbage collection pauses. We explored optimized JVMs such as Azul's, but our experiments did not reveal a dramatic improvement on such platforms.
Overall, we were able to pare down our response times to around 30ms. This was still well above the required SLO of 10ms.
At one point in the effort, we took a step back and asked ourselves whether it was even possible to achieve the 10ms SLO. The obvious answer is "certainly," since the DSP has other data providers that are presumably meeting the same SLO. However, it is worth considering the effect that geographic distance might have on response times. Recall that the DSP is located in Los Angeles and New York, while Verity is located in Virginia.
The minimum theoretical time a message could take to travel from the DSP to Verity and back is limited by the speed of light. Imagine an experiment composed of a laser, a mirror, a light detector, and a timer. The laser and detector are co-located at the DSP's location, and the mirror is at Verity's location. If we switch on the laser and measure the time elapsed between turning it on and detecting the reflected light, we have the fastest possible round-trip time.
We can calculate the lower bound on response times using the formula time = distance/rate. This is equal to two times the distance between the DSP and Verity divided by the speed of light (299,792,458 m/s). This produces:
New York City → Virginia → New York City: 2.7 ms
Los Angeles → Virginia → Los Angeles: 27.1 ms
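The calculation can be reproduced directly. The distances below are approximate straight-line figures (roughly 405 km from New York City to the us-east-1 region and 4,060 km from Los Angeles), which are assumptions rather than surveyed measurements:

```python
C = 299_792_458  # speed of light in m/s

def min_round_trip_ms(distance_m: float) -> float:
    """Lower bound on round-trip time: light must travel the distance twice."""
    return 2 * distance_m / C * 1000

# Approximate straight-line distances (assumptions, not surveyed figures).
print(round(min_round_trip_ms(405_000), 1))    # 2.7  (NYC <-> Virginia)
print(round(min_round_trip_ms(4_060_000), 1))  # 27.1 (LA <-> Virginia)
```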
Consider the case of New York City → Virginia → New York City. If the internet operated at the speed of light, and Verity could generate a response in under ~7 ms, we would be compliant with our 10ms SLO.
Now consider the case of Los Angeles → Virginia → Los Angeles. The speed-of-light roundtrip time is 27.1ms, well above 10ms. This means it is impossible to meet our SLO with the existing geographical configuration.
This thought experiment raises an interesting question, namely, what is the actual speed of the internet? To answer this question, we performed a series of empirical experiments to measure internet speed.
Experiment 1: Nginx
In this experiment, we fixed the location of a client and varied the location of the server. The client was a simple script that hit the server every second and measured the roundtrip latency. The server was a simple Nginx server, deployed with no custom configuration. We ran the script for ~10 minutes for each location of the Nginx server. The client was run in a Los Angeles AWS Local Zone, and the server was placed at the following locations:
- localhost — on the same machine as the client
- same local zone — in the same data center as the client, but on a different machine
- us-west-1 — in Northern California
- us-west-2 — in Oregon
- us-east-1 — in Virginia
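The client was a script along these lines. This is a minimal sketch using only Python's standard library, with a nearest-rank percentile helper; it is not our actual measurement script, and the final example runs on synthetic numbers rather than a live server:

```python
import time
import urllib.request

def measure_latencies(url: str, samples: int, interval_s: float = 1.0):
    """Hit the server once per interval and record round-trip latency in ms."""
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        urllib.request.urlopen(url).read()
        latencies.append((time.perf_counter() - start) * 1000)
        time.sleep(interval_s)
    return latencies

def percentile(values, p):
    """Nearest-rank percentile (p is an integer in 0-100)."""
    ordered = sorted(values)
    rank = (p * len(ordered) + 99) // 100  # ceil(p * n / 100), integer math
    return ordered[max(rank - 1, 0)]

# Example p90 over synthetic measurements (no network access needed).
fake_latencies_ms = [2.0, 2.1, 2.3, 2.2, 4.5, 2.4, 2.0, 2.1, 2.6, 9.9]
print(percentile(fake_latencies_ms, 90))  # 4.5
```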
The following table shows the latency measurements. The leftmost columns represent percentile latencies, and the 3 rightmost columns are the minimum, maximum, and mean latencies.
The figure below is a box plot of the latency data.
Let’s focus on the p90 column, representing the 90th percentile response time. When the server is on the same host as the client, the response time is 2.0ms, well within the 10ms SLO. The same is true when the server is in the same data center, or a different data center in the same city, resulting in latencies of 2.3ms and 4.5ms respectively.
When the server is placed in a different city in the same state, the latency shoots up dramatically. Between Los Angeles and Northern California, the latency goes up to 21.2ms. And from Los Angeles to Oregon, the latency is 55.1ms. When going across the country from Los Angeles to Virginia, the latency shoots up to 134.7ms.
Consider the Los Angeles to Virginia latency of 134.7ms. This is about 5 times the theoretical lower bound of 27.1ms. Using this ratio between speed-of-light time and internet time as a multiplier, we can estimate the response time between New York City and Virginia: 2.7ms x 5 = 13.5ms. If this estimate is accurate, it would not be possible to hit the 10ms SLO even from the New York City DSP client.
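The back-of-the-envelope estimate is simply:

```python
# Ratio of measured internet latency to the speed-of-light
# lower bound for LA <-> Virginia.
multiplier = 134.7 / 27.1
print(round(multiplier, 1))  # 5.0

# Applying the same multiplier to the NYC <-> Virginia lower bound of 2.7 ms.
estimated_nyc_ms = 2.7 * 5
print(round(estimated_nyc_ms, 1))  # 13.5 -> above the 10 ms SLO
```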
Experiment 2: Pingdom
We further explored internet times by taking measurements from Pingdom, a service that measures uptime and latency by periodically pinging services from various locations around the world.
Focusing on the p90 latencies, we see a clear relationship between distance and latency. The shortest distance, from Philadelphia to Virginia, results in a p90 latency of 285ms. The farthest client, in Thessaloniki, Greece, shows a p90 latency of 768ms, more than three-quarters of a second. All latencies reported by Pingdom are well above the 10ms SLO.
Our experiments demonstrate a clear relationship between distance and round-trip response times (aka latencies). This data provides solid evidence that meeting a 10ms SLO with a purely cloud-based solution is impossible given the existing geographical relationship between the DSP clients and the Verity server. To meet the 10ms SLO, the Verity server must be co-located with the DSP client.
Stay tuned for the next part of the series where we describe co-locating Verity with the DSP.
The engineering work for this effort was performed by Florian Dambrine, Edwin Galdamez, and David Williams.