Spark from the trenches — Part III

In this third article of our Apache Spark series (see Part I, Part II and Part IV), we focus on a real-life use case in which we tried several implementations of an aggregation job.

Business Context

At Teads, we distribute ads to over 1.5bn people every month within professionally-produced content.

One of the main components of our platform is responsible for handling bid requests (an opportunity to display an ad) and for sending back a bid response (the ad to display and the associated price).

An advertising campaign can be set up with delivery constraints:

  • target specific users, depending on their geolocation,
  • target specific devices, OS…


Part II — Tricks and external data source management

This post is the second episode of the “Spark from the trenches” article series. In the previous post, we covered best practices and optimization tips.

We will continue to dig into some real-world situations that we have dealt with and focus on two topics:

  • First, we will look at some operational tricks we actively use for troubleshooting. At Teads, we embrace the following motto: You build it, you run it. We had to make sure that we had the right tools to monitor our system’s health and understand what’s going on.
  • Then, we will talk about best practices for using external data sources in your workflows with JDBC. …


Spark is the core component of Teads’s Machine Learning stack. We use it for many ML applications, from ad performance predictions to user Look-alike Modeling. We also use Spark for processing-intensive jobs like cross-device segment extension or Parquet to SSTables transformation for loading data into Cassandra.

Working with Spark, we regularly reach the limits of our clusters’ resources in terms of memory, disk or CPU. Scaling out only postpones the issue, so we have to get our hands dirty.

Here is a collection of best practices and optimization tips for Spark 2.2.0.
