Data warehousing solutions have existed for decades and have been the backbone of the reporting and analytics needs of both small and large enterprises. Even in today’s world of Big Data, Data Lakes, and NoSQL databases, SQL remains the most powerful query language, and the combination of data warehouses and SQL continues to dominate even the most modern data applications.


At GumGum, Amazon Redshift has been the primary warehousing solution for years. Redshift is a fully managed, petabyte-scale cloud data warehouse that has worked very well for our needs. However, our data footprint has…

GumGum receives around 30 billion programmatic inventory impressions each day, amounting to 25 TB of data. An inventory impression is the real estate available for showing potential ads on a publisher page. By generating near-real-time inventory forecasts based on campaign-specific targeting rules, we enable account managers to set up successful future campaigns. This talk, Real-Time Forecasting at Scale using Delta Lake and Delta Caching, which Jatinder Assi and I presented at Spark + AI Summit 2020, highlights the data pipelines and architecture that help us achieve a forecast response time of less than 30 seconds at this scale. Spark jobs efficiently…


When organizations scale and data explodes, it becomes vital to have a scalable data architecture. This post revisits the problem statement discussed here, but at an entirely different scale. To give a quick recap, the goal is to forecast the inventory impressions per day, given a set of targeting rules and sample data. This time, the inventory being forecasted is programmatic inventory. In part one of the blog post, Jatinder Assi discussed in detail the data architecture and distributed sampling on the programmatic inventory. …

Rashmina Menon

