Synergy Part-0: Introduction to our in-house data warehouse solution

Ujjwal Gupta
SquadStack Engineering
4 min readAug 11, 2023

This blog post series highlights how we built our in-house data warehousing solution. We’ll also share our challenges while looking for the optimal solution.

Why Synergy?

“Synergy” refers to the combined effect or interaction of multiple layers, components, systems, or technologies that produce a result greater than the sum of their individual effects. So, we decided to call it “Synergy” because it perfectly captures the strong collaborative spirit of our team.

Here are some topics that we’ll talk about in this blog post:

  1. About Data Warehouse Solution
  2. An Overview of Synergy
  3. What problems we were facing before Synergy?
  4. Why do we choose to build a data warehouse solution?
  5. SLAs that Synergy will help us achieve

About Data Warehousing Solution

A data warehouse solution is a centralized system that stores and integrates large volumes of data from various sources. It consists of data sources, processes for extracting, transforming, and loading data, the data warehouse, data modeling, business intelligence tools for analysis and reporting, metadata management, and security measures.

This solution enables organizations to efficiently store, analyze, and gain insights from data to support decision-making.

Synergy: High-Level Diagram

An Overview of Synergy

  • Synergy is a data warehousing solution that allows us to draw analytics to gain quicker real-time insights.
  • Business users get instant access to crucial data from various sources on a single platform, they can rapidly make informed decisions on key initiatives. They won’t waste precious time retrieving data from multiple sources.
  • The outcome we want to draw from synergy is to boost communication and speed up decision-making.
  • From Synergy, large amounts of historical data are kept in a data warehouse so you can study various periods and trends to foretell the future.

What problems were we facing before Synergy?

As we are a fast-growing startup so the biggest challenge was to handle fast generative data. We listed some of the problems below:

1. Slow Insights

Slow insights lead to unhurried decision-making, so we must move to faster insights to improve decision-making. It was not possible to gain insights from a massive amount of data using a transactional database.

2. Handling PII and Data Security

Handling PII is one of the most challenging problems we were trying to solve recently. Before Synergy, it was difficult to restrict internal users to access PII.

3. Customer-Facing Analytics

Customer-facing analytics was one of the major issues that we faced recently, so we must address this issue to improve the customer experience.

Why do we choose to build an in-house data warehouse solution?

Building an in-house solution was the best option for our use case due to the following reasons:

  1. Cost Effectiveness — Most of the tools in the market were too expensive and didn’t align with our budget.
  2. Immature Tools — Some tools didn’t deliver the desired output and had inadequate support.
  3. PII Handling — This could be due to a variety of reasons such as a lack of customization options, inadequate support, or limitations in the tools’ functionalities. This is why we decided to build an in-house solution to address the issue of PII Data Scrubbing and ensure that our company’s data privacy policies are met.

SLAs that we planned to achieve from Synergy

To overcome the challenges we faced while evaluating tools, we decided to build our data warehousing solution. To ensure the success of the project, we defined the following Service Level Agreements (SLAs):

  1. Service Availability: Ensuring that the service is available and functioning properly during the designated working hours. This is important to ensure that the users can access the data and perform their required operations without any disruptions. We started with an availability SLA of 97.5%. We now have > 99.5% availability, and we plan to hit > 99.8% in the future.
  2. Replication Lag: This SLA defines the acceptable time delay between the source and destination systems. A replication lag of no more than 90 minutes is expected, meaning that the data should be updated and available within 90 minutes of the original change for end users.
  3. Cost Effective: The solution should be cost-effective and not exceed the budget of ~$2500/month.
  4. Hold Historical Data: The solution must be capable of storing 2 years of raw data and 1 year of transformed data. This is important for keeping a record of historical data for analysis and reporting purposes.
  5. Fast Analytics: The solution must be able to execute queries on data older than 6 months within 5 seconds. This is critical for ensuring that the users can quickly analyze the data and make informed decisions.

Stay tuned for our next blog post, where we’ll reveal the planning and evaluation process behind building Synergy. If you enjoyed this post, don’t forget to show some support by giving it a few claps. 🤗

--

--