We strive to build great mobile games, but until we release a game we do not know whether we will be successful. Our initial designs rely solely on our intuition and previous experience as gamemakers; once we release a game, however, we look to objective measurements to tell us where we can improve. This combination of art and science drives us to continually improve the games we make.
Objective measurements come from the data we collect while people play our games. We instrument our games to record many actions players take and use those measurements to inform us how people play.
At N3TWORK we expect everyone to continually seek and provide context. We support this by making all data available to every employee so they can seek the context they need to be successful. Because it’s not possible to anticipate every question in advance, we need to structure our data flexibly enough to support questions from groups as varied as game designers, engineers, user acquisition, and management.
We solve this using our analytics pipeline, internally named Poole. Poole meets several requirements for us:
- Provides baseline feedback for all of our core KPIs without specific game instrumentation.
- Allows each game to measure its own dimensions that the team wants to track.
- Makes data accessible using standard SQL.
- Provides tools to visualize and explore data to answer ad hoc questions.
- Provides all data in near real time, with end-to-end latency of under 15 minutes.
Measuring Base KPIs
We use a common set of key performance indicators across all our games to understand their high level performance. These KPIs include:
- Daily Active Users (DAU): Count of users who played on a given day.
- Monetizing Daily Active Users (mDAU): Count of daily active users who have ever spent money in the game.
- Sessions per user: Average number of play sessions per user, per day.
- Minutes per session: Average length of a play session, in minutes.
- Average revenue per DAU: Revenue for the day divided by DAU.
- Average revenue per mDAU: Revenue for the day divided by mDAU.
These metrics tell us at the topmost level how a game is performing. We further break these metrics down by many dimensions, including geography, acquisition channel, device and operating system, and join date.
Poole tracks the data required to measure these metrics without specific game instrumentation so we can quickly understand and compare each game’s performance.
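As a sketch of what these roll-ups involve, the base KPIs for a single day can be computed from a list of event rows in a few lines of Python. The field names here (user_id, session_id, session_minutes, revenue, ever_spent) are illustrative, not Poole's actual schema:

```python
from collections import defaultdict

def base_kpis(events):
    """Compute one day's base KPIs from a list of event dicts.

    Field names are illustrative assumptions, not Poole's schema:
    user_id, session_id, session_minutes, revenue, and ever_spent
    (a lifetime-spender flag used to identify mDAU).
    """
    users = set()
    spenders = set()
    sessions = defaultdict(set)   # user_id -> set of session ids
    session_minutes = {}          # session_id -> minutes (deduped per session)
    revenue = 0.0

    for e in events:
        users.add(e["user_id"])
        if e["ever_spent"]:
            spenders.add(e["user_id"])
        sessions[e["user_id"]].add(e["session_id"])
        session_minutes[e["session_id"]] = e["session_minutes"]
        revenue += e["revenue"]

    dau = len(users)
    mdau = len(spenders)
    total_sessions = sum(len(s) for s in sessions.values())
    return {
        "dau": dau,
        "mdau": mdau,
        "sessions_per_user": total_sessions / dau if dau else 0.0,
        "minutes_per_session": (
            sum(session_minutes.values()) / total_sessions if total_sessions else 0.0
        ),
        "arpdau": revenue / dau if dau else 0.0,
        "arpmdau": revenue / mdau if mdau else 0.0,
    }
```

In practice these aggregations run as SQL over the event tables; the sketch just shows which fields each KPI depends on.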
Game Specific Dimensions
Most of the interesting questions focus on game specific features. Poole allows each game to define up to 10,000 dimensions that can be recorded with each analytics event. Game teams measure the dimensions they need to better understand player behavior.
Each game provides a schema for the dimensions it will track. Poole then uses the schema to validate data as it is received and to manage the SQL tables that store each event.
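A minimal sketch of schema-driven validation, assuming a simple in-memory schema format; Poole's actual schema format and dimension names are not shown here:

```python
# Illustrative schema: dimension name -> expected type and whether it is
# required on every event. This format is an assumption, not Poole's.
SCHEMA = {
    "event":  {"type": str, "required": True},
    "screen": {"type": str, "required": False},
    "rank":   {"type": int, "required": False},
}

def validate(event, schema=SCHEMA):
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    for name, spec in schema.items():
        if name not in event:
            if spec["required"]:
                errors.append(f"missing required dimension: {name}")
            continue
        if not isinstance(event[name], spec["type"]):
            errors.append(f"wrong type for {name}: {type(event[name]).__name__}")
    for name in event:
        if name not in schema:
            errors.append(f"unknown dimension: {name}")
    return errors
```

Validating at ingest time keeps bad data out of the tables, and the same schema can drive the column definitions of the BigQuery tables that store each event.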
Accessing Data With SQL
Poole stores all event data in Google BigQuery and each game has its own BigQuery dataset that contains daily tables for every deployed environment of the game. This ensures development versions of the game do not mix their data with production releases of the game.
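One way to picture that layout is a small naming helper; the exact dataset and table naming convention below is an assumption for illustration, not Poole's actual one:

```python
import datetime

def daily_table(game, environment, day):
    """Illustrative BigQuery naming: one dataset per game, with daily
    tables per deployed environment so development data never mixes
    with production data. The convention is assumed, not Poole's."""
    return f"{game}.{environment}_events_{day:%Y%m%d}"
```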
Poole stores a unique row in BigQuery for each event with a column for each schema dimension. When writing code to instrument the game, it is frequently not known which dimension values are needed to answer all future questions. One solution would be to send all dimension values with every event, but this is not practical for games that have hundreds or thousands of tracked dimensions. Another solution is to query the database to find rows with specific dimension values when working to answer a specific question, but BigQuery does not support table indices, so such queries would result in large table scans to find the correct value.
To solve this, Poole implements a feature called “forward propagation.” Forward propagation can be enabled for any dimension in the Poole schema and guarantees that each row in BigQuery will contain the latest known value for every propagated dimension, even when the received event did not explicitly provide a value. This means events only need to contain values for dimensions that changed, but the rows in BigQuery will contain all known state for each event.
For example, imagine a sequence of events that might occur as a player enters, plays in a battle, and ranks up as a result of playing the battle. The game only sends the rank dimension when it changes (values below are illustrative):

    event           screen          rank
    1               Battle Start    5
    2               Battle Playing
    3               Battle Win
    4               Rank Up         6

Without forward propagation the resulting rows in BigQuery would look like this:

    screen          rank
    Battle Start    5
    Battle Playing  NULL
    Battle Win      NULL
    Rank Up         6

If forward propagation is enabled for the rank dimension, the resulting rows in BigQuery would instead look like this:

    screen          rank
    Battle Start    5
    Battle Playing  5
    Battle Win      5
    Rank Up         6
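The propagation rule can be sketched in a few lines of Python. This illustrates the semantics only; Poole applies it inside the pipeline before rows reach BigQuery, not at query time:

```python
def forward_propagate(rows, propagated):
    """Fill each row with the latest known value for every dimension in
    `propagated`. Rows are dicts in event order; a dimension absent from
    a row means the event did not send it. Sketch of the semantics only.
    """
    last = {}   # dimension -> most recent explicit value seen so far
    out = []
    for row in rows:
        filled = dict(row)
        for dim in propagated:
            if dim in filled and filled[dim] is not None:
                last[dim] = filled[dim]      # event set a new value
            elif dim in last:
                filled[dim] = last[dim]      # carry the last known value
        out.append(filled)
    return out
```

Running it over the battle sequence above fills rank into the Battle Playing and Battle Win rows while leaving the explicit values untouched.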
This significantly simplifies the required SQL to answer questions like “what is the distribution of player rank on the Battle Win screen?”
SELECT rank, COUNT(*) AS count FROM events WHERE screen = 'Battle Win' GROUP BY rank ORDER BY count DESC;

Here events stands in for the game's daily event table.
Events are continually collected by Poole, processed, and streamed to BigQuery. Poole works to make events available for use within 15 minutes, which allows for buffering time on the client, validation and pipeline processing. In practice, data is typically available in under 10 minutes.
Working from the daily tables generated by Poole, summary tables are periodically computed to roll up answers to common questions. Often data from these daily tables is joined with external tables such as game configuration state or user acquisition data. Summary table updates are driven by Python or R scripts scheduled to run as needed.
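As an illustration of such a roll-up, here is a minimal Python sketch that joins daily event rows with an external user-acquisition table to summarize revenue by channel. In production this join runs as SQL against BigQuery; all names and shapes here are illustrative:

```python
from collections import defaultdict

def revenue_by_channel(daily_rows, acquisition):
    """Roll daily event rows up into revenue per acquisition channel.

    `daily_rows` stands in for a Poole daily table and `acquisition`
    for an external user-acquisition table (user_id -> channel).
    Users not present in the acquisition table are counted as organic.
    """
    totals = defaultdict(float)
    for row in daily_rows:
        channel = acquisition.get(row["user_id"], "organic")
        totals[channel] += row["revenue"]
    return dict(totals)
```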
Not every consumer of data knows SQL, and some analysis requires further computation, so data is also commonly exposed through a few other methods.
Spreadsheets are a very familiar method to work with data. We use the OWOX BI Add On to load data from BigQuery into Google Sheets for people to analyze. Using a general purpose tool like Google Sheets rather than a specific tool for data exploration reduces the learning curve for many users and lets people easily integrate data from additional sources.
Many reports require more presentation and structure than Google Sheets can provide. We use Tableau to build and manage our standard reports and to present analysis that will be reused frequently.
We use a combination of Python with Pandas, and R, for exploratory and ad hoc data analysis. These scripts are often built and run by the data science team. For scripts that need to run periodically, we schedule them on our Jenkins infrastructure.
Game analytics at N3TWORK continually drives our decisions. By making data available across the company, each individual can seek the answers they need to make an impact and grow our games and business.
In a future article we’ll dive into some of the technical details and describe how we operate this system to process hundreds of millions of events each day.