How we use Apache Druid’s real-time analytics to power kidtech at SuperAwesome

Natasha Mulla
SuperAwesome Engineering
6 min read · Sep 15, 2020

Co-authored with Saydul Bashar

Here at SuperAwesome, our mission is to make the internet safer for kids; to help accomplish this goal, our products power over 12 billion kid-safe digital transactions every month.

Digital transactions come in many forms and could be:

  • An ad served to a kid’s device through AwesomeAds
  • A video view on our new kids’ video gaming platform, Rukkaz
  • A like, comment, post or re-jam on PopJam

Every digital transaction is processed to be instantly available for real-time analytics.

In kidtech, kid-safety and privacy protection are paramount, and a traditional approach to analytics and data engineering wouldn’t necessarily be COPPA and GDPR-K compliant.

What makes a digital transaction kid-safe is the complete absence of personally identifiable information (PII), which is the foundation of our zero-data approach. This is the main characteristic that makes our real-time analytics kid-safe.
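To make the idea concrete, here is a minimal sketch of one way an event could be reduced to non-personal fields before it reaches any analytics store. It is an illustration only, not a description of our production pipeline: the field names, the `ALLOWED_FIELDS` allowlist and the `to_kid_safe_event` helper are all hypothetical.

```python
# Hypothetical allowlist of non-personal fields (an assumption for this sketch,
# not an actual production schema).
ALLOWED_FIELDS = frozenset({"timestamp", "event_type", "product", "country"})


def to_kid_safe_event(raw_event: dict) -> dict:
    """Return a copy of raw_event that keeps only allowlisted fields."""
    return {key: value for key, value in raw_event.items() if key in ALLOWED_FIELDS}


# Example raw event; the identifier-like fields below are made up.
raw = {
    "timestamp": "2020-09-15T10:00:00Z",
    "event_type": "ad_impression",
    "product": "AwesomeAds",
    "country": "GB",
    "ip_address": "203.0.113.7",  # not on the allowlist, so it is dropped
    "device_id": "abc-123",       # not on the allowlist, so it is dropped
}

safe = to_kid_safe_event(raw)
```

An allowlist is used here rather than a blocklist so that any new, unreviewed field is excluded by default.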

Our kid-safe real-time analytics allow us to make the best and quickest decisions for our products, services and customers, and enable us to work and iterate in a data-driven way.

Aside from helping us make product and customer decisions, real-time analytics is also used to power some of our products. This is the case for AwesomeAds, where we use this data to drive real-time decision making.

When it comes to collecting, processing and storing this mammoth number of transactions with efficiency and durability, we found Apache Druid to be the perfect database for the job.

What is Apache Druid?

Apache Druid is an open source distributed data store. Druid’s core design combines ideas from data warehouses, time series databases, and search systems to create a unified system for real-time analytics for a broad range of use cases. Druid merges key characteristics of each of the 3 systems into its ingestion layer, storage format, querying layer, and core architecture. —