Designing a Dynamic Automated Reporting Pipeline — 3

Alperen Yüksek
Published in MagicLab
3 min read · Mar 16, 2022

The previous post of this series can be found here.

A data-driven gaming company has many different needs: high-level KPI reports for everyone involved, detailed UA reports for marketing teams, monetization reports for product teams, in-game economy and game-specific feature reports for design teams, error and performance reports for developer teams… and the list goes on, even before we start talking about ML and AI applications like LTV prediction, churn prediction, user segmentation, live-ops optimization, etc. But for any of that to happen, we need to have the data in the first place. It would be good if said data were structured, categorized and ordered. It would be even better if we collected it that way to begin with.

Doing so would not only drastically reduce the time and resources spent on data cleansing but also open the door to programmatically scaling up and expanding our ETL processes.

Reaching such a wonderfully ideal state might not be possible in the dark, chaotic world we live in, but that doesn't deter us from trying; we have enough coffee to fuel us through the journey (at least for now).

First, we have to determine what type of information we need from players and when we need it. We need to identify which parts of our games we want to monitor and which pieces of information we need at those specific points. Some of these vary according to a game's core mechanic, but most remain the same regardless of genre or type. For our purposes, we found that we will always need to know when a new install happens, when a new game session starts, when a player watches an ad, when a player makes an in-app purchase, when a player's soft currency changes, etc. Let's call these common events.
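To give an idea, a minimal catalogue of such common events could look something like the sketch below (the identifiers are illustrative, not our exact production names):

```python
# Hypothetical identifiers for the common events every game reports.
COMMON_EVENTS = [
    "new_install",           # first launch after installation
    "session_start",         # a new game session begins
    "ad_impression",         # the player watches an ad
    "iap_purchase",          # the player makes an in-app purchase
    "soft_currency_change",  # the player's soft currency changes
]
```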

Even if you have the experience to list the types of events you need in a heartbeat, I would still suggest looking at the online resources of analytics tools. I found Firebase's documentation on events (https://support.google.com/analytics/answer/9322688?hl=en&ref_topic=9756175) especially inspiring.

Second, you need to figure out which events you will need for monitoring a specific game beyond the common events. These vary greatly with the core mechanic of the game in question; you may need just one or five, depending on the game. These are our game-specific events.

As the third step on our road, we need to figure out what sort of information we need on each event. For an ad impression event we need to know the type of the ad, while for a new install event we may want to know the install source of our app. And of course, our game-specific events will all have their own unique parameters.
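As a rough sketch, a mapping of events to their parameters could start out like this (all field names here are hypothetical):

```python
# Hypothetical per-event parameters; real projects will differ.
EVENT_PARAMETERS = {
    "ad_impression": ["ad_type", "ad_network", "placement"],
    "new_install":   ["install_source", "campaign"],
    "iap_purchase":  ["product_id", "price_usd"],
    # Game-specific events carry their own unique parameters, e.g.:
    "merge_completed": ["item_tier", "board_slot"],
}
```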

When we started to write all of these down, we realized that we need some information regardless of the game or event type, and some information for every event of a specific game. For example, we needed to know the timestamp of each event regardless of the game, but we needed to know a player's energy amount (a soft currency in one of our games) for every event of that particular game.

While researching options for making our data schemas programmatically reproducible across all of our planned games, we decided to use JSON Schema. Its easily extendable architecture, straightforward validation tooling and local referencing abilities checked most of our boxes.
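To show how cheap validation becomes, here is a minimal sketch using the Python jsonschema package (both the schema and the event below are hypothetical):

```python
import jsonschema  # pip install jsonschema

# A tiny, hypothetical schema: every event must carry a timestamp and a player id.
schema = {
    "type": "object",
    "properties": {
        "timestamp": {"type": "string", "format": "date-time"},
        "player_id": {"type": "string"},
    },
    "required": ["timestamp", "player_id"],
}

event = {"timestamp": "2022-03-16T12:00:00Z", "player_id": "abc-123"}

# Raises jsonschema.ValidationError if the event does not match the schema.
jsonschema.validate(instance=event, schema=schema)
```

A single validate() call either passes silently or raises a ValidationError pinpointing the offending field, which is exactly the kind of early feedback we want before an event enters the pipeline.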

So, we decided to create three-tiered JSON Schemas for our events. The first tier would contain the parameters we want for all of our events, the second would contain the parameters we want for every event of a specific game, and the third tier would contain the parameters of the event itself.

Image by author, using app.diagrams.net
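To make the tiers concrete, here is a hypothetical sketch of how they could be composed with allOf and local $ref pointers inside a single schema document (the field names are illustrative; we will walk through our actual structure in the next post):

```python
import jsonschema

ad_impression_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "definitions": {
        # Tier 1: parameters shared by all events of all games.
        "common": {
            "type": "object",
            "properties": {
                "timestamp": {"type": "string", "format": "date-time"},
                "player_id": {"type": "string"},
            },
            "required": ["timestamp", "player_id"],
        },
        # Tier 2: parameters shared by every event of this specific game.
        "game": {
            "type": "object",
            "properties": {"energy": {"type": "integer", "minimum": 0}},
            "required": ["energy"],
        },
    },
    # Tier 3: the event's own parameters, combined with the two tiers above.
    "allOf": [
        {"$ref": "#/definitions/common"},
        {"$ref": "#/definitions/game"},
        {
            "type": "object",
            "properties": {
                "ad_type": {"enum": ["rewarded", "interstitial", "banner"]},
            },
            "required": ["ad_type"],
        },
    ],
}

event = {
    "timestamp": "2022-03-16T12:00:00Z",
    "player_id": "abc-123",
    "energy": 42,
    "ad_type": "rewarded",
}
jsonschema.validate(instance=event, schema=ad_impression_schema)
```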

With the referencing abilities of JSON Schema, we can connect these tiers to each other and create validation schemas for all of our games and events. We could even make use of JSON Schema's "oneOf" keyword to version our schemas for future purposes.
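Again as a hypothetical sketch, a versioned schema could accept an event if it matches exactly one published version, discriminated here by a schema_version field:

```python
import jsonschema

# An event is valid if it matches exactly one published version of the schema.
versioned_schema = {
    "oneOf": [
        {"$ref": "#/definitions/ad_impression_v1"},
        {"$ref": "#/definitions/ad_impression_v2"},
    ],
    "definitions": {
        "ad_impression_v1": {
            "type": "object",
            "properties": {
                "schema_version": {"const": 1},
                "ad_type": {"type": "string"},
            },
            "required": ["schema_version", "ad_type"],
        },
        # v2 adds a placement parameter on top of v1's fields.
        "ad_impression_v2": {
            "type": "object",
            "properties": {
                "schema_version": {"const": 2},
                "ad_type": {"type": "string"},
                "placement": {"type": "string"},
            },
            "required": ["schema_version", "ad_type", "placement"],
        },
    },
}

v2_event = {"schema_version": 2, "ad_type": "rewarded", "placement": "level_end"}
jsonschema.validate(instance=v2_event, schema=versioned_schema)  # matches only v2
```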

Now, time to start coding.

In our next post, we will create event schemas from the ground up using the three-tiered structure mentioned above, with samples from our architecture.

The next post of this series can be found here.
