Photo by Ali Moharami on Unsplash

Under the hood of Spark performance, or why query compilation matters

Victor Zaytsev
Sep 8, 2020

Criteo is a data-driven company. Every day we digest dozens of terabytes of new data to train recommendation models that serve requests at the scale of the internet. Spark is our tool of choice for processing big data. It is a powerful and flexible instrument, but it has a steep learning curve, and using it effectively often requires reading the source code of the…