Automating Data Transformation from Firebase

How we saved 90% of the effort invested in data integration

Yinon Eitan
Plarium-engineering
4 min read · Jul 29, 2024

--

Plarium is a leading gaming company that creates some of the best mobile games in the industry with multiple studios worldwide.

The data engineering department develops the company’s centralized data platform, which provides data solutions and services.

Our high-scale, cloud-based platform has a complex data model that stores and organizes data from all the games, marketing systems, and external sources.

The Challenge

One of our challenges is consolidating data from a diverse range of games into the centralized data platform. Achieving this consolidation requires each studio to integrate its data with the platform to adhere to a unified data model, facilitating the creation of generic data services and enabling cross-game analysis. We strive to make this integration as smooth as possible and reduce the time spent on the task.

The Solution

Plarium’s games leverage Firebase as an app development platform and export native and custom events to BigQuery (BQ). A few of our studios already maintain a small internal database based on this export.

Leveraging Firebase’s fixed export schema in BQ, we created a tool that automates the transformation of Firebase data into the company’s database conventions, using a SQL query (DDL) generated from a source-to-target mapping. This way, we simply point the tool at the data and eliminate the need for studios to perform redundant data integration tasks.

Before jumping in to see what’s under the hood, let’s briefly overview the pros and cons of implementing such a tool:

The pros here are clear: there is no need to develop the same data integration twice, we create a single source of truth, we can partition the data differently from the default daily partitioning of the Firebase export, and we can even flatten the data instead of working with Firebase’s array fields.

Another significant advantage is that we move the workload internally to our data platform and free up valuable time for the studio developers. Internally, the data owner (usually the data product manager) can integrate the data by themselves.

The main drawback is the one-day latency we incur because we avoid using the intraday table. But more on this later.

Technical Implementation

The process has several steps:

  • Creating a BigQuery view, based on the source-to-target mapping, that contains the column transformations and enrichments. These can be one-to-one column mappings (e.g., geo.country to db.country_id) or extract specific keys from a complex field (to flatten the complex data type).
  • Reading the last three days (to account for late-arriving data) from the events_YYYYMMDD tables, transforming the data, and storing it in a staging table.
  • Deduplicating the data using a given key.
  • Loading the transformed data into a Kafka topic for continuous processing within the platform.

This process, written in Go, is deployed on Cloud Run and scheduled with Cloud Scheduler. Monitoring is a key part: we constantly compare the count of events, distinct users, and duplications between source and target.
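The monitoring comparison can be sketched as a simple per-day check between the Firebase export and the transformed target. The struct fields and messages below are illustrative assumptions, not the production monitor.

```go
package main

import "fmt"

// Counts holds the per-day metrics compared between the Firebase export
// and the transformed target table. Field names are illustrative.
type Counts struct {
	Events        int64
	DistinctUsers int64
	Duplicates    int64
}

// check compares source and target counts and returns a list of
// human-readable discrepancies (an empty list means the day is healthy).
func check(source, target Counts) []string {
	var problems []string
	if source.Events != target.Events {
		problems = append(problems, fmt.Sprintf("event count mismatch: source=%d target=%d", source.Events, target.Events))
	}
	if source.DistinctUsers != target.DistinctUsers {
		problems = append(problems, fmt.Sprintf("distinct user mismatch: source=%d target=%d", source.DistinctUsers, target.DistinctUsers))
	}
	if target.Duplicates > 0 {
		problems = append(problems, fmt.Sprintf("found %d duplicates in target", target.Duplicates))
	}
	return problems
}

func main() {
	problems := check(
		Counts{Events: 100, DistinctUsers: 10},
		Counts{Events: 99, DistinctUsers: 10, Duplicates: 1},
	)
	for _, p := range problems {
		fmt.Println("ALERT:", p)
	}
}
```

A scheduled job like this can raise alerts as soon as the transformed data drifts from the export, instead of waiting for analysts to notice.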

The next phase will focus on supporting real-time events integration using the Firebase intraday table. However, in the past, we encountered the same issue described here on Reddit.

Conclusion

In an era defined by data-driven decision-making and doing “more with less”, the ability to integrate systems seamlessly is paramount. This tool bridges the gap between Firebase and our internal data systems, simplifying data handling and reducing the burden on developers.

Disclaimer: This article is provided by the data engineering department for informational purposes only. The article doesn’t refer to private data. Data privacy regulations must be met on both ends. The views and processes described reflect the authors’ experiences and are not exhaustive. The company makes no warranties about the accuracy or completeness of the information. Readers are responsible for ensuring their activities comply with applicable data privacy laws and other relevant regulations. The company assumes no liability for any errors, omissions, or actions taken based on this information.
