Interactive Bid Requests w/ Apache Druid
Online advertising is the process of app or website publishers offering advertising opportunities to buyers at a massive scale. Advertisers receive hundreds of thousands of requests per second from publishers (usually from an exchange) and must evaluate each bidding opportunity within 100ms.
During this process, a bid request is sent to a DSP’s bidder to determine whether there’s an ad that can be shown for that incoming request. If the bidder determines that the incoming request matches a given set of targeting criteria, a bid is returned along with the bid price.
Targeting is a critical part of the process. Audience availability can vary with many things, such as the desired geographical region, device vendor, publisher, and creative attributes. In the process of pitching a campaign to a brand, marketers must be able to determine how quickly a budget can be spent given the brand’s targeting criteria, which means they need to predict the audience availability for the specialized campaigns that they pitch.
Bidder operators can give their sellers, marketers, and users the ability to specify targeting criteria, find the historical bid requests that match those criteria, and see how well certain campaigns are expected to perform. By tapping into Druid to analyze bid requests, and Superset to build specialized dashboards, we’re going to show you how to provide these reporting tools to your team and users.
A perfect example of this kind of functionality is Facebook Advertising’s platform. Within their platform, a user uploads a creative, and as they are setting up the campaign’s targeting criteria, they can visualize the size of the audience they’ll be able to reach.
In this post we’ll set up some basic tools and data schemas that give us the foundation to build powerful features such as Facebook’s campaign planner.
Frameworks & Tools Required
- PostgreSQL v9.6.3
- Druid v0.10.0
- Ruby / Sinatra
- Superset v0.15.5.0
Building a Basic Bidder
The first thing we’ll define is a set of raw bid request examples. These are easily available from the OpenRTB specifications. Within a bid request there are sets of publisher-defined attributes such as the user’s device id, geographical location, application name, domain, etc.
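To make those attributes concrete, here is a minimal sketch of pulling them out of a raw request. The sample payload is made up for illustration and heavily abbreviated; it uses OpenRTB 2.x field names (`app`, `device.ifa`, `device.geo`, `user.id`), and the `extract_attributes` helper and its output keys are hypothetical, not taken from the simple bidder's source.

```ruby
require 'json'

# Abbreviated, made-up bid request using OpenRTB 2.x field names.
RAW_BID_REQUEST = <<~JSON
  {
    "id": "req-1",
    "app": {
      "id": "app-42",
      "name": "Example App",
      "publisher": { "id": "pub-7", "name": "Example Publisher" }
    },
    "device": {
      "ifa": "device-abc",
      "make": "Apple",
      "geo": { "country": "USA", "region": "CA", "city": "Los Angeles" }
    },
    "user": { "id": "user-123" }
  }
JSON

# Pull out the attributes this post cares about; missing keys become nil.
def extract_attributes(raw_json)
  request = JSON.parse(raw_json)
  {
    app_external_id:       request.dig('app', 'id'),
    app_name:              request.dig('app', 'name'),
    publisher_external_id: request.dig('app', 'publisher', 'id'),
    device_external_id:    request.dig('device', 'ifa'),
    device_make:           request.dig('device', 'make'),
    country:               request.dig('device', 'geo', 'country'),
    region:                request.dig('device', 'geo', 'region'),
    city:                  request.dig('device', 'geo', 'city'),
    user_external_id:      request.dig('user', 'id')
  }
end

BID_REQUEST_ATTRIBUTES = extract_attributes(RAW_BID_REQUEST)
```

Using `dig` keeps the parser tolerant of requests that omit optional objects, which is common across exchanges.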
We’ll start off by building a simple bidder that doesn’t actually bid on anything. The bidder itself is built in a low-performance dynamic language to allow readers to most easily read the code.
The bidder starts off by parsing the incoming request, and finds or creates each resource in the master database. This is necessary to store mutable bid request information such as the mapping of a publisher’s name to their id. We define an id provided in the bid request as an external id and an id stored in our database as a canonical id. This allows our bidder to handle multiple exchanges and publishers that may have overlapping external ids.
Our master database contains a mapping of canonical ids to external ids along with mutable data such as a name.
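The external-to-canonical mapping can be sketched with an in-memory stand-in for the master database. This is only an illustration of the find-or-create idea: the `ResourceRegistry` class, its `exchange` key, and its method names are assumptions for this example, and the real bidder persists this mapping in PostgreSQL.

```ruby
# In-memory stand-in for the master database (illustration only).
class ResourceRegistry
  Resource = Struct.new(:canonical_id, :exchange, :external_id, :name)

  def initialize
    @by_external = {} # [exchange, external_id] => Resource
    @next_id = 1
  end

  # Find the resource for this exchange/external id pair, or create it.
  # The mutable name is refreshed whenever a new value is supplied.
  def find_or_create(exchange:, external_id:, name: nil)
    key = [exchange, external_id]
    resource = @by_external[key]
    if resource.nil?
      resource = Resource.new(@next_id, exchange, external_id, name)
      @by_external[key] = resource
      @next_id += 1
    elsif name
      resource.name = name
    end
    resource
  end
end

# Usage: the same external id on two exchanges maps to two canonical ids.
#   registry = ResourceRegistry.new
#   registry.find_or_create(exchange: 'exchange-a', external_id: '100', name: 'Publisher One')
#   registry.find_or_create(exchange: 'exchange-b', external_id: '100', name: 'Publisher Two')
```

Keying on the pair of exchange and external id is what lets overlapping external ids coexist.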
It is important to note that low cardinality ids can be defined as dimensions to allow for groupBy queries. High cardinality ids such as device ids should not be defined as dimensions. This is because Druid pre-aggregates rows based on unique combinations of dimension values, so a high cardinality dimension creates a large number of rows and severely hurts performance.
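The effect of cardinality on pre-aggregation can be illustrated with a small simulation. This is a simplified model of rollup written for this post, not Druid's implementation: events that share a time bucket and identical dimension values collapse into a single row carrying a count. The event fields and the `rollup` helper are hypothetical.

```ruby
# Simplified model of rollup: one stored row per unique
# (time bucket, dimension values) combination, with a count metric.
def rollup(events, dimensions)
  events
    .group_by { |e| [e[:hour]] + dimensions.map { |d| e[d] } }
    .map { |key, group| { key: key, bid_requests: group.size } }
end

# 1,000 synthetic events in one hour: 3 countries, 1,000 distinct devices.
SIMULATED_EVENTS = Array.new(1_000) do |i|
  { hour: 0, country: %w[USA CAN MEX][i % 3], device_id: "device-#{i}" }
end

# Only a low cardinality dimension: the events collapse to 3 rows.
LOW_CARDINALITY_ROWS = rollup(SIMULATED_EVENTS, [:country]).size

# Adding device_id as a dimension: nothing collapses, 1,000 rows remain.
HIGH_CARDINALITY_ROWS = rollup(SIMULATED_EVENTS, [:country, :device_id]).size
```

With the device id kept out of the dimension list, the stored row count is bounded by the number of countries rather than the number of devices.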
Our simple bidder will then serialize the internal representation of the bid request, which includes canonical ids, and send it to Druid via Tranquility.
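As a rough sketch of that last step, the snippet below builds and posts one JSON event to a Tranquility Server HTTP endpoint using only Ruby's standard library. The helper names are hypothetical and the simple bidder's actual client code may look different; the URL in the usage comment is the one used later in this post.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build (but do not send) the HTTP request carrying one event.
def build_tranquility_request(uri, event)
  request = Net::HTTP::Post.new(uri.request_uri, 'Content-Type' => 'application/json')
  request.body = JSON.generate(event)
  request
end

# Send the event and return the Net::HTTPResponse.
def send_to_tranquility(uri, event)
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(build_tranquility_request(uri, event))
  end
end

# Usage (requires a running Tranquility Server):
#   uri = URI(ENV.fetch('TRANQUILITY_URI'))  # e.g. http://192.168.1.2:8200/v1/post/events-v1
#   send_to_tranquility(uri, materialized_bid_request)
```

Separating request construction from sending makes the serialization easy to test without a live endpoint.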
For the purposes of this tutorial, we will use a device id and user id to build HyperLogLog-backed counters. The operator can then view the number of unique users and unique devices for any given targeting criteria.
For the full simple bidder source code, specifications, and example data see:
The main service class is lib/services/bid_request_loader.rb. This class calls all of the resource loaders, each of which upserts its resource to PostgreSQL. Below you will also find the simple device_loader.rb, which is in charge of upserting devices to our PostgreSQL database.
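For readers without the source at hand, the following is a hypothetical stand-in showing what a device upsert could look like; it is not the original device_loader.rb. It assumes a `devices` table with `exchange`, `external_id`, and `make` columns and a unique constraint on `(exchange, external_id)`, and it relies on PostgreSQL's `INSERT ... ON CONFLICT` (available since 9.5). The connection can be any object that responds to `exec_params(sql, params)`, such as a `PG::Connection` from the pg gem.

```ruby
# Hypothetical sketch of a device loader (table and column names are assumptions).
class DeviceLoader
  UPSERT_SQL = <<~SQL
    INSERT INTO devices (exchange, external_id, make)
    VALUES ($1, $2, $3)
    ON CONFLICT (exchange, external_id)
    DO UPDATE SET make = EXCLUDED.make
    RETURNING id
  SQL

  # connection: responds to exec_params(sql, params), e.g. a PG::Connection.
  def initialize(connection)
    @connection = connection
  end

  # Insert the device or update its mutable attributes,
  # then return its canonical id as an Integer.
  def upsert(exchange:, external_id:, make:)
    result = @connection.exec_params(UPSERT_SQL, [exchange, external_id, make])
    result.first['id'].to_i
  end
end
```

Doing the insert and update in one statement avoids a race between two bid requests that introduce the same device at the same time.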
Basic Bidder: Architecture
The best way to understand what happens during the processing of a bid request is to look at the source code. Below is an overview of how all resources are loaded and persisted in PostgreSQL. Once all resources have been extracted/updated from PostgreSQL, a materialized bid request is created which is just a JSON message sent to Druid for analysis.
Configuring Druid: Defining a Schema
Now we need to define a schema for Tranquility so that it knows how to index the incoming data from our simple bidder. Below is an example of a Materialized Bid Request:
Ids such as site_id, device_id, and user_id are the primary ids in the PostgreSQL tables.
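As an illustration only, a materialized bid request along these lines could be assembled as shown here. Apart from site_id, device_id, and user_id, the field names (`timestamp`, `country`, `region`, `city`) and the `materialize` helper are assumptions made for this sketch, not the simple bidder's actual message format.

```ruby
require 'json'
require 'time'

# Combine canonical ids with low cardinality attributes into one flat event.
# Field names other than site_id, device_id, and user_id are assumptions.
def materialize(timestamp:, site_id:, device_id:, user_id:, country:, region:, city:)
  {
    timestamp: timestamp.utc.iso8601,
    site_id: site_id,
    device_id: device_id,
    user_id: user_id,
    country: country,
    region: region,
    city: city
  }
end

SAMPLE_MATERIALIZED_REQUEST = materialize(
  timestamp: Time.utc(2017, 6, 1, 12, 0, 0),
  site_id: 1, device_id: 2, user_id: 3,
  country: 'USA', region: 'CA', city: 'Los Angeles'
)

SAMPLE_MATERIALIZED_JSON = JSON.generate(SAMPLE_MATERIALIZED_REQUEST)
```

Keeping the event flat, with one key per dimension or metric input, makes it straightforward to describe in an ingestion spec.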
We’ll then translate this schema into a configuration file with an ingestion spec to describe our dimensions and metrics.
If you’re using a Hadoop indexing task or a middle manager, the ingestion spec would be the same. You may have to tweak other properties depending on your setup.
Along with all of our dimension definitions, we’ll count bid_requests, unique user ids, and unique device ids. Unique counts are backed by Druid’s HyperLogLog implementation.
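The dimensions and metrics described above might translate into a `dataSchema` fragment like the one generated below. This is a sketch under assumptions, not the post's actual configuration: the dimension list, the `timestamp` column, and the granularities are illustrative, and the rest of the Tranquility configuration (ioConfig, tuningConfig, properties) is omitted. The `count` and `hyperUnique` aggregator types are standard Druid aggregators.

```ruby
require 'json'

# Sketch of the dataSchema portion of an ingestion spec.
# Dimension names, timestamp column, and granularities are assumptions.
BID_REQUEST_DATA_SCHEMA = {
  dataSource: 'events-v1',
  parser: {
    type: 'string',
    parseSpec: {
      format: 'json',
      timestampSpec: { column: 'timestamp', format: 'auto' },
      # Only low cardinality fields are dimensions; device_id and user_id
      # feed the HyperLogLog metrics below instead.
      dimensionsSpec: { dimensions: %w[site_id country region city] }
    }
  },
  metricsSpec: [
    { type: 'count', name: 'bid_requests' },
    { type: 'hyperUnique', name: 'unique_user_ids', fieldName: 'user_id' },
    { type: 'hyperUnique', name: 'unique_device_ids', fieldName: 'device_id' }
  ],
  granularitySpec: {
    type: 'uniform',
    segmentGranularity: 'hour',
    queryGranularity: 'minute'
  }
}.freeze

puts JSON.pretty_generate(BID_REQUEST_DATA_SCHEMA)
```

Because bid_requests is a `count` at ingestion time, it has to be summed (for example with `longSum`) at query time to get the number of original requests.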
Simulating Bid Requests
Now that we’ve got a simple bidder live and Druid ready to ingest the data, we can simulate bid requests to our bidder.
If you’ve configured the simple-bidder locally, you can run the spec and specify a Tranquility URL.
export TRANQUILITY_URI=http://192.168.1.2:8200/v1/post/events-v1
bundle exec rspec
You should now have some test data in Druid ready for analysis and visualization.
Setting Up Superset
First, set up the events data source in Superset. The process of installing and configuring a Druid data source in Superset is beyond the scope of this post. Usually a data source is automatically added by Superset when using the automatic importer.
If the automatic importer doesn’t properly detect your metrics and dimensions, you may have to set them up manually in the Edit Data Source -> List Druid Column page.
Below is a screenshot of all the metrics set up in Superset.
Once you’ve defined all your columns in Superset, we’ll go ahead and define our metrics. The JSON for the bid_requests metric is: {"type": "longSum", "name": "bid_requests", "fieldName": "bid_requests"}.
We’ll also need to define metrics for unique_device_ids and unique_user_ids. The JSON for the unique_device_ids metric is {"type": "hyperUnique", "name": "unique_device_ids", "fieldName": "unique_device_ids"}. Notice how these JSON definitions are derived from Druid’s query specification.
Now let’s create a simple report for visualizing the number of bid requests. We’ll create a very simple report to show the number of requests grouped by country, region, and city.
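For reference, a report like this corresponds roughly to a native Druid groupBy query. The sketch below builds one as a Ruby hash, reusing the metric definitions given earlier in this section; the `geo_bid_requests_query` helper, the data source name, and the interval are assumptions, and the exact query Superset generates may differ.

```ruby
require 'json'

# Build a native groupBy query for bid requests by geography.
# interval is an ISO 8601 interval string, e.g. '2017-06-01/2017-06-08'.
def geo_bid_requests_query(interval)
  {
    queryType: 'groupBy',
    dataSource: 'events-v1',
    granularity: 'all',
    dimensions: %w[country region city],
    aggregations: [
      { type: 'longSum', name: 'bid_requests', fieldName: 'bid_requests' },
      { type: 'hyperUnique', name: 'unique_device_ids', fieldName: 'unique_device_ids' }
    ],
    intervals: [interval]
  }
end

GEO_QUERY_JSON = JSON.pretty_generate(geo_bid_requests_query('2017-06-01/2017-06-08'))
```

A query in this shape can be posted as JSON to a Druid broker's `/druid/v2/` endpoint to check the numbers behind a slice.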
Once your report configuration is ready, you can go ahead and render the visualization by hitting query. We’ll now save our report as a slice and name it geo-bid-requests. Slices can be used to build dashboards composed of various reports and visualizations.
Once we’ve built a few basic slices we can now use them to build a customized dashboard that we can then share with our team.
Now that you’ve built a powerful dashboard with no development time, you can share your creation with the rest of your team using a URL like http://54.191.226.252:8088/superset/dashboard/bidrequests7days/.
Superset is a powerful visualization tool in conjunction with Druid. There are plenty of fields we didn’t even touch that can be subjects of future posts, such as age, gender, blocked_attributes, dimension, etc.
In Conclusion
We’ve learned what programmatic advertising is, how to build a basic bidder that doesn’t actually bid on anything, how to store bid request data in persistent storage, and how to analyze it with Apache Druid and Superset.
This is the foundation to a set of posts in a series featuring Druid for AdTech. Please support our products and services if you wish to see more of these types of posts. Don’t forget to leave your comments below.
Shaman: Cloud Hosted Druid
If you’d like to tap into your bid request streams in real time and build powerful dashboards for your team without breaking your wallet, then Shaman is for you.
Shaman is a self-serve platform for deploying single node and multi-node clusters to the cloud.
Avoid the pain and time of setting up Druid yourself, stop wasting developers’ time building internal reporting dashboards, and stop paying for solutions that simply cost way too much.
Shaman is in private testing and is scheduled to go live to the public soon. If you’d like to schedule a demo, please contact us at: contact@zero-x.co.
About
Miguel Morales, Founder and CEO of ZeroX.
ZeroX provides Hosted Druid Clusters, Data Services, and AdTech Consultancy Services.
Contact us at contact@zero-x.co and let’s find out if there are any projects we can take off your hands.
Also, find out more about me on LinkedIn and don’t hesitate to connect!