In part one, we described our Analytics data ingestion pipeline, with BigQuery sitting at its core as our data warehouse. However, having our analytics events in BigQuery is not enough: most importantly, this data needs to be served to our end users.
In this article, we will detail how we do just that.
Back to where we stopped in the previous article: our data is stored in BigQuery, in both raw and aggregated formats. Our data analysts can query it directly for exploration or debugging purposes, but SQL has its limits. That’s where some Scala code takes over in our architecture, in a component we call the Analytics Service. …
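To make this more concrete, here is a minimal sketch of what the query layer of such a service could look like, using the Google Cloud BigQuery client for Java from Scala. The table and column names (`analytics.events_aggregated`, `campaign_id`, and so on) are illustrative assumptions, not our actual schema.

```scala
import com.google.cloud.bigquery.{BigQueryOptions, QueryJobConfiguration, QueryParameterValue}
import scala.jdk.CollectionConverters._

object AnalyticsServiceSketch {

  // Hypothetical daily rollup over the aggregated events table.
  private val DailyRollup =
    """SELECT hour, campaign_id, SUM(impressions) AS impressions
      |FROM `analytics.events_aggregated`
      |WHERE day = @day
      |GROUP BY hour, campaign_id""".stripMargin

  def main(args: Array[String]): Unit = {
    // Builds a BigQuery client from the application-default credentials.
    val bigquery = BigQueryOptions.getDefaultInstance.getService

    val config = QueryJobConfiguration.newBuilder(DailyRollup)
      .setUseLegacySql(false) // named parameters require standard SQL
      .addNamedParameter("day", QueryParameterValue.date("2018-06-01"))
      .build()

    // Runs the query synchronously and streams back the result rows.
    bigquery.query(config).iterateAll().asScala.foreach { row =>
      val hour        = row.get("hour").getLongValue
      val campaignId  = row.get("campaign_id").getStringValue
      val impressions = row.get("impressions").getLongValue
      println(s"$hour $campaignId $impressions")
    }
  }
}
```

From there, the service is free to apply whatever business logic is awkward to express in SQL before handing the results over to the serving layer.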
To recap, part one described how we orchestrate Kafka, Dataflow, and BigQuery together to ingest and transform a large stream of events: once you add scale and latency constraints, reconciling and reordering them becomes a challenge, and we explained how we tackle it.
This second part focuses on how we use and abuse Redshift to serve this data to our end users.
In digital advertising, day-to-day operations generate a lot of events that we need to track in order to transparently report on campaign performance. These events come from: