Our advertising data engineering team at GumGum uses Spark Streaming and Apache Druid to provide business stakeholders with near-real-time analytics for analyzing and measuring advertising performance.
Our biggest dataset is RTB (real-time bidding) auction logs, which reach ~350,000 messages/sec during daily peak hours. At that volume, it is crucial for the data team to leverage distributed systems like Apache Kafka, Spark Streaming, and Apache Druid to ingest the data, apply business-logic transformations and aggregations, and store the results to power real-time analytics.
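The core of this kind of pipeline is rolling raw auction events up into pre-aggregated records before they land in a store like Druid. The sketch below illustrates that rollup step in plain Python under hypothetical assumptions — the event fields (`ts`, `ad_unit`, `bids`) are illustrative, not GumGum's actual schema, and a real pipeline would do this with Spark Streaming over Kafka rather than an in-memory list.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical RTB auction log events; field names are illustrative only.
events = [
    {"ts": "2023-05-01T12:00:05", "ad_unit": "banner", "bids": 4},
    {"ts": "2023-05-01T12:00:42", "ad_unit": "video",  "bids": 2},
    {"ts": "2023-05-01T12:01:10", "ad_unit": "banner", "bids": 7},
]

def aggregate_by_minute(events):
    """Roll raw auction events up into per-minute, per-ad-unit totals --
    the kind of pre-aggregated record a store like Druid ingests."""
    totals = defaultdict(lambda: {"auctions": 0, "bids": 0})
    for e in events:
        # Truncate the timestamp to minute granularity to form the rollup key.
        minute = datetime.fromisoformat(e["ts"]).strftime("%Y-%m-%dT%H:%M")
        key = (minute, e["ad_unit"])
        totals[key]["auctions"] += 1
        totals[key]["bids"] += e["bids"]
    return dict(totals)

rollup = aggregate_by_minute(events)
```

In production the same shape of computation runs continuously: Spark Streaming consumes from Kafka, performs this rollup per micro-batch, and writes the aggregates downstream.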
Forecasting is a common data science task at many organizations, supporting sales forecasting, inventory forecasting, anomaly detection, and many other applications. For GumGum’s advertising division, it is critical that our sales team can forecast available ad inventory in order to set up successful ad campaigns.
Our Data Engineering team already uses time series forecasting for directly sold ad inventory (400+ million impressions/day). As the advertising industry evolves, programmatic ad buying has grown exponentially, and as a result GumGum now sees 30+ billion programmatic inventory impressions/day. Producing high-quality forecasts is not an easy problem for GumGum’s programmatic advertising…