Designing a Dynamic Automated Reporting Pipeline — 1

Alperen Yüksek
MagicLab
3 min read · Mar 2, 2022

When we were only creating hyper-casual games, we seldom needed in-depth data analysis. The simple tools provided by our MMPs and mediation partners, along with the slightly more complex but still generic tools of analytics SDKs (there is a veritable ocean of them), were more often than not sufficient for our immediate purposes. For more comprehensive work such as UA optimization and LTV prediction, which needs data as raw as possible, we simply exported raw data from those partners and worked with it.

We had access to all of the high-level KPIs, and even some low-level ones. Sometimes we couldn’t reach the data we needed, like “how much time do players spend in that part of the game?” or “what is the level distribution for day-2 users?” To answer those and many similar questions, we would have needed to extract, process and analyze an immense amount of data from our millions of players.

All of that was perfectly adequate for hyper-casual games with short burn-out times. Even if we had created an extensive ETL process and tried to extract insights from it, we couldn’t have gone much farther than the old method, for the simple fact that there isn’t any more meaningful data to begin with.

Surely some of the data we managed to collect and process would have given us insights that led to improvements along the way; but designing, building and maintaining such a process in an environment that evolves and changes at a stunning speed simply was not a good decision resource-wise.

So, what changed?

We decided to go into the casual genre. That means longer user journeys, more extensive game content and a plethora of in-game features. To create well-loved games, we had to understand what made our players tick. We had to be reasonably sure about which parts of the games they liked and which parts they hated. We needed to know which type of live-ops strategy works best for a sustainable game experience. For these and many other similar reasons, we decided to create our own analytics stack.

Now, designing and implementing an ETL process for a couple of games is not rocket science. But we had also decided that we would make an abundance of games with various core mechanics. All of these games would need both the same basic KPI reports and game-specific reports for their particular core mechanics. That alone created the need for our analytics stack to be easily scalable across different types of games, and because we are devoted believers in the DRY principle, we planned to make it as automated as possible. But it also had to be flexible enough to address the different needs of different mechanics without making us want to tear our hair out.

But, how?

We started planning.

We needed normalized data to create programmatic reports; for that, we needed systematically processed and cleansed raw data; and for that, we needed to extract and collect our player data in a unified manner.

So, our design steps became this:

1- Create data packages in clients with programmatic schemas.

2- Collect the data packages with an easily scalable solution.

3- Validate the data.

4- Categorize and store it in an easily accessible manner.

5- Process and aggregate it for analytical purposes.

6- Create reports at various levels for different teams.
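To make steps 1 and 3 concrete, here is a minimal sketch of what a programmatic event schema with validation could look like. The event name, field names and the `validate` helper are illustrative assumptions, not our production code:

```python
# Hypothetical schema for a "level_complete" event: field name -> expected type.
LEVEL_COMPLETE_SCHEMA = {
    "player_id": str,
    "level": int,
    "duration_sec": float,
}

def validate(event: dict, schema: dict) -> list:
    """Return a list of validation errors; an empty list means the event is valid."""
    errors = []
    for field, expected in schema.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            errors.append(f"bad type for {field}: {type(event[field]).__name__}")
    # Reject unknown fields so every collected event matches its declared schema.
    errors.extend(f"unknown field: {f}" for f in event.keys() - schema.keys())
    return errors

good = {"player_id": "p-42", "level": 7, "duration_sec": 31.5}
bad = {"player_id": "p-42", "level": "7"}

print(validate(good, LEVEL_COMPLETE_SCHEMA))  # []
print(validate(bad, LEVEL_COMPLETE_SCHEMA))
```

Because the schemas are plain data rather than hand-written checks, the same validation code can serve every game: adding a new game-specific event only means declaring a new schema.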

Of course there are also other steps, such as enriching the data from other sources to gain as many insights as possible and building a database for our ML applications, but they are beyond the scope of this series.

In our upcoming posts, we will go through this list step by step to explain the tools, methods and technologies we used for our analytics stack.

The next post in this series can be found here.
