Under the hood of Spark performance, or why query compilation matters
7 min read · Sep 8, 2020
Criteo is a data-driven company. Every day we digest dozens of terabytes of new data to train recommendation models that serve requests at internet scale. Spark is our tool of choice for processing big data. It is a powerful and flexible instrument, but it has a steep learning curve, and using it effectively often requires reading the source code of the…