GumGum speaks at Spark + AI Summit 2020

Published in GumGum Tech Blog, Jul 27, 2020

GumGum receives around 30 billion programmatic inventory impressions each day, amounting to 25 TB of data. An inventory impression is the real estate on a publisher page where an ad could be shown. By generating near-real-time inventory forecasts based on campaign-specific targeting rules, we enable account managers to set up successful future campaigns.

Jatinder Assi and I presented the talk Real-Time Forecasting at Scale using Delta Lake and Delta Caching at Spark + AI Summit 2020. It highlights the data pipelines and architecture that help us achieve a forecast response time of less than 30 seconds at this scale. Spark jobs efficiently sample the inventory impressions using AMIND sampling and write the samples to Delta Lake. The talk covers AMIND sampling, Delta Lake, Databricks Delta caching, and how we enable time series forecasting with zero downtime for end users, using auto ARIMA and sinusoids that capture the trends in the inventory data.
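The post names sinusoids as one ingredient of the forecast but does not show the model itself. As a rough illustration of the idea only, here is a minimal sketch that fits a single sine/cosine pair plus a constant to an evenly spaced series and extrapolates it. The function names `fit_sinusoid` and `forecast`, the 24-hour period, and the synthetic data are all assumptions made for this example; this is not GumGum's implementation, and it leaves out the auto ARIMA part entirely.

```python
import math


def fit_sinusoid(series, period):
    """Least-squares fit of y[t] ~ c + a*sin(w*t) + b*cos(w*t), w = 2*pi/period.

    Assumes the series covers a whole number of periods (and period >= 3),
    so the three regressors are orthogonal and each coefficient reduces to
    a simple projection instead of a full linear solve.
    """
    n = len(series)
    if period < 3 or n == 0 or n % period != 0:
        raise ValueError("need period >= 3 and a whole number of periods")
    w = 2 * math.pi / period
    c = sum(series) / n
    a = 2 / n * sum(y * math.sin(w * t) for t, y in enumerate(series))
    b = 2 / n * sum(y * math.cos(w * t) for t, y in enumerate(series))
    return c, a, b


def forecast(coeffs, period, start, steps):
    """Evaluate the fitted curve at time indices start .. start + steps - 1."""
    c, a, b = coeffs
    w = 2 * math.pi / period
    return [
        c + a * math.sin(w * t) + b * math.cos(w * t)
        for t in range(start, start + steps)
    ]


if __name__ == "__main__":
    # Synthetic hourly impression counts with a daily cycle (hypothetical data).
    w = 2 * math.pi / 24
    history = [1000 + 200 * math.sin(w * t) + 50 * math.cos(w * t) for t in range(7 * 24)]
    coeffs = fit_sinusoid(history, period=24)
    next_day = forecast(coeffs, period=24, start=len(history), steps=24)
    print(coeffs)
    print(next_day[:3])
```

A real inventory series would have several overlapping cycles (daily, weekly) and noise, so a production model would likely need more sinusoid terms and a component such as ARIMA for whatever the sinusoids do not explain.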

Spark + AI Summit 2020 — Real-Time Forecasting at Scale using Delta Lake and Delta Caching

We also discuss the details of this solution in two tech blog posts:

Time Series Forecasting at Scale

Forecasting is a common data science task at many organizations to help with sales forecast, inventory forecast…

Enhance Spark performance using Delta Lake and Delta Caching