At Teads, we distribute ads to over 1.5 billion people every month within professionally produced content.
One of the main components of our platform is responsible for handling bid requests (opportunities to display an ad) and for sending back bid responses (the ad to display and the associated price).
An advertising campaign can be set up with delivery constraints.
This post is the second episode of the “Spark from the trenches” article series. In the previous post, we covered best practices and optimization tips.
We will continue to dig into some real-world situations that we have dealt with and focus on two topics:
Spark is the core component of Teads’ Machine Learning stack. We use it for many ML applications, from ad performance prediction to user look-alike modeling. We also use Spark for processing-intensive jobs, such as cross-device segment extension or Parquet-to-SSTables transformation for loading data into Cassandra.
Working with Spark, we regularly reach the limits of our clusters’ resources in terms of memory, disk, or CPU. Scaling out only pushes the issue back, so we have to get our hands dirty.
Here is a collection of best practices and optimization tips for Spark 2.2.0…