Replatforming of Performance Management System: Behind The Scenes

Merve Gonencayoglu
Published in Trendyol Tech · 5 min read · Dec 13, 2023

At Trendyol, one of the main goals of the DWH Fulfillment Team is to create meaningful near real-time (NRT) experiences through operational reporting on the various processes in our warehouse operations. To measure and increase efficiency, the operations teams need performance monitoring systems for each workspace and warehouse so that they can achieve their daily goals.

In this article, we introduce the newest version of one of our oldest warehouse operations products, the performance management system, together with its new data model. We also explain how we overcame technical obstacles in two re-platforming projects: the migration from Tableau to React and the migration from API calls to Kafka.

Performance Management (PM) System — An Introduction

At Trendyol, we designed two different PM structures to monitor operational performance and efficiency. These can be summarized as PM NRT Dashboards and PM Reports.

PM NRT Dashboards (PM screens) are 18 separate dashboards covering the processes conducted by the operations team in three different warehouses. These screens run in NRT, so they needed custom-developed queries that can be refreshed on the shortest possible schedule. They are shift-based dashboards: they show only the cumulative data for the latest eight hours.
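To make the eight-hour cumulative view concrete, here is a minimal Python sketch. It assumes a rolling eight-hour window and a hypothetical event shape of `(timestamp, worker_id, units)`; neither the event shape nor the function names come from our production code, and a fixed shift boundary would only change how the window start is computed.

```python
from datetime import datetime, timedelta

SHIFT_HOURS = 8  # the dashboards show the latest eight hours of data


def shift_window(now: datetime, hours: int = SHIFT_HOURS):
    """Return the half-open window [start, end) covering the latest `hours` hours."""
    return now - timedelta(hours=hours), now


def cumulative_units(events, now: datetime) -> dict:
    """Sum units per worker for events that fall inside the current window.

    `events` is an iterable of (timestamp, worker_id, units) tuples.
    This shape is an assumption made for illustration only.
    """
    start, end = shift_window(now)
    totals = {}
    for ts, worker_id, units in events:
        if start <= ts < end:  # older events drop out of the dashboard
            totals[worker_id] = totals.get(worker_id, 0) + units
    return totals
```

Calling `cumulative_units(events, datetime.now())` on every refresh yields the per-worker totals a shift screen would display; anything older than eight hours is simply ignored.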

Our second PM structure is PM Reports. In addition to dashboards, our customers needed data with at least six months of retention to calculate their monthly performance KPIs and workspace efficiency. We designed various table reports in Looker according to the operations team's needs. Because they contain long-term data, these reports allow our customers to perform more detailed analysis than the dashboards do.

In short, the dashboards serve motivational and short-term decision-making purposes in the warehouse, while the Looker reports provide insights over a horizon of at least six months.

Technical Structure of Performance Management System and Re-platforming Challenges

We conducted two re-platforming projects on the PM structure, in which multiple Trendyol Technology teams participated. The first project migrated the PM dashboard infrastructure from Tableau to React. The need for this change can be summarized as follows:

  1. Value freezing in Tableau. The values on the dashboard needed to be updated in NRT to show current performance at all times, but they sometimes froze. This increased the customer support effort in our team. Most importantly, it caused the operations team to work with wrong performance values until they realized the values had not been updated for a while.
  2. As in most applications, logged-in Tableau users are automatically logged off after a period of time. To work around this, the IT support team in the warehouse had to log in to the application manually.

With the new architecture, the freezing problem could easily be solved. However, once we decided to move on from Tableau, our first challenge was to provide the KPI values without a business intelligence tool. We wrote each PM dashboard's KPI rules as PostgreSQL queries. These queries are executed by Python code, and the results are sent to a React UI developed by the DWH Analytic Development Team. Currently, the 18 dashboards run on React, Python, and custom-written SQL for each dashboard. Because we no longer depend on a BI tool, this approach also gives us more flexibility in the calculations. For the second problem, we created a dedicated Trendyol user with the required permissions to log in to the new screens.
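The SQL-to-Python-to-UI flow can be sketched roughly as follows. This is an illustration under stated assumptions, not our actual service: the `pm_events` table, the `picking` dashboard key, and the function names are hypothetical, and an in-memory SQLite database stands in for PostgreSQL so the example runs on its own. Any DB-API driver exposes the same `cursor()`, `execute()`, `description`, and `fetchall()` calls.

```python
import json
import sqlite3

# Hypothetical registry: one custom-written SQL query per dashboard.
KPI_QUERIES = {
    "picking": """
        SELECT worker_id, SUM(units) AS total_units
        FROM pm_events
        WHERE process = 'picking'
        GROUP BY worker_id
        ORDER BY total_units DESC
    """,
}


def dashboard_payload(conn, dashboard: str) -> str:
    """Run the dashboard's KPI query and return a JSON string for the UI."""
    cur = conn.cursor()
    cur.execute(KPI_QUERIES[dashboard])
    columns = [col[0] for col in cur.description]
    rows = [dict(zip(columns, row)) for row in cur.fetchall()]
    return json.dumps({"dashboard": dashboard, "rows": rows})


def demo_connection():
    """Build an in-memory SQLite stand-in with a few sample rows (assumed data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pm_events (worker_id TEXT, process TEXT, units INTEGER)"
    )
    conn.executemany(
        "INSERT INTO pm_events VALUES (?, ?, ?)",
        [("w1", "picking", 10), ("w2", "picking", 25),
         ("w1", "picking", 5), ("w3", "packing", 7)],
    )
    return conn
```

A front end could poll an endpoint that returns `dashboard_payload(conn, "picking")` and render the rows. Keeping the KPI rule in plain SQL means a calculation change is a query edit rather than a BI tool reconfiguration.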

Dashboard Architecture

The second re-platforming project migrated all PM models from the API-call-sourced system to the Kafka-sourced system. The need for this re-platforming project can be summarized as follows:

  1. The previous source system in the model was based on API calls. Although data ownership belonged to the source teams, the API belonged to the DWH team. Each source team feeding the PM model sent requests to DWH's IPs. During busy ETL periods, the system might not respond, and the source teams received errors. On the front-end side, this meant we could not provide correct NRT data to our customers in PM dashboards and reports. We needed a source system structure in which source teams would not send requests to our APIs; instead, we would connect to their systems.
  2. Our other target as the DWH team was to migrate from Vertica to Google BigQuery (BQ). The API-call source was located on the Vertica machine, which created a dependency. This was another motivation to choose a source system format that would be compatible with BQ.
  3. The t-1 model runs in BQ with slot management in DWH. During busy ETL periods, especially at night, all resources are directed to the BQ tables because finishing ETL is the primary objective. This caused delays in the PM dashboards, which ran on the older system.

The source system teams are organized around a microservice architecture. In the API-call format, DWH gathered PM data in a single source table in the ODS layer. In contrast to the previous architecture, each source team designed its Kafka events separately, with different data models according to its domain's standards. As a result, DWH consumes 10 different Kafka events from 5 source domains in the ODS layer in BQ.
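Because every domain shapes its events differently, the consumer side has to reconcile them somewhere downstream. The sketch below shows one possible way to map two differently shaped events onto a common record in Python. The topic names, field names, and the `PmRecord` type are all hypothetical and chosen only for illustration; the real events follow each domain's own standards.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PmRecord:
    """Common shape for PM data, regardless of the source domain (assumed)."""
    domain: str
    worker_id: str
    units: int
    event_time: datetime


def from_picking(event: dict) -> PmRecord:
    # Hypothetical picking event: flat fields, ISO-8601 timestamp.
    return PmRecord(
        domain="picking",
        worker_id=event["pickerId"],
        units=event["pickedQuantity"],
        event_time=datetime.fromisoformat(event["pickedAt"]),
    )


def from_packing(event: dict) -> PmRecord:
    # Hypothetical packing event: nested user, item list, epoch milliseconds.
    return PmRecord(
        domain="packing",
        worker_id=event["user"]["id"],
        units=len(event["items"]),
        event_time=datetime.fromtimestamp(event["ts_ms"] / 1000, tz=timezone.utc),
    )


# One normalizer per topic; adding a source domain means adding one entry.
NORMALIZERS = {
    "picking.item-picked": from_picking,
    "packing.parcel-packed": from_packing,
}


def normalize(topic: str, event: dict) -> PmRecord:
    """Dispatch an already-deserialized event to its domain normalizer."""
    return NORMALIZERS[topic](event)
```

The dispatch table keeps each domain's quirks in one small function, so a schema change in one source domain touches only its own normalizer.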

API-Call and Kafka Architectures

After the ODS layer, we started migrating the EDW layer from Vertica to BQ. Because the PM dashboards are one of our oldest products, many business rules had changed since they were first designed. The old system had 54 physical tables in the EDW layer. As the DWH Fulfillment Team, we analyzed each table against current business rules, DWH naming standards, modeling standards, and SQL performance considerations. As a result of the re-platforming, the new system has 33 physical tables. In addition, by migrating away from Vertica, we will free up 340 GB of storage in the old system.

All in all, the final PM system follows current DWH standards with a centralized data model, performs better than the old system thanks to slot management, no longer causes errors for other teams because source teams no longer send requests to DWH, and takes up less physical table space as a result of the updated business rules and processes.

About Us

We’re building a team of the brightest minds in our industry. Interested in joining us? Visit the pages below to learn more about our open positions.
