Spending all your analytics bandwidth on fulfilling data requirements…?

Anjali Arya
Building Aasaanjobs (An Olx Group Company)
5 min read · May 28, 2020

Answering day-to-day data queries hogs analytics bandwidth and also increases the turnaround time (TAT) for those queries.

Take a step back and ask: why do business teams rely on analytics for basic data requirements?

Any data request, whether small or complex, requires knowledge of three basic things:

  • A data query language such as SQL (in whichever dialect, e.g. Oracle or Postgres), or a scripting language such as Python
  • The database structure, i.e., which table holds which data and what each column stores
  • An understanding of column abbreviations, so the extracted data is actually readable

What if data extraction were as simple as creating a pivot table in Excel?

It would help both the analytics team, by saving their bandwidth, and the business teams, by improving the TAT of data extraction.

Now this involves 4 major steps:

  1. Understanding business requirements
  2. Building data pipelines
  3. Converting data into a readable format
  4. Providing an Excel-like interface for business teams to visualise the data

The most difficult step here is building data pipelines: it requires a team of data engineers to write transformation code and load the results into the warehouse. And even a simple change in structure means going back to the transformation layer and starting all over again…

What if building a data pipeline were as easy as writing SQL?

Enter dbt (data build tool)… It is a command-line tool that enables data analysts to transform data in their warehouse simply by writing SQL queries.

This is what a conventional ETL pipeline looks like:

dbt, however, works on the ELT principle: the data is loaded into the warehouse first and transformed there afterwards, which makes the transformation step much easier in this kind of pipeline.

dbt does the T in ELT (Extract, Load, Transform): it doesn't extract or load data, but it's extremely good at transforming data that's already been loaded into your warehouse. It also has a cloud offering that plugs directly into your version control system (GitHub, GitLab, Bitbucket, etc.) to pull the latest version of your dbt project and lets you run scheduled jobs.
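As a rough illustration (the table and column names here are hypothetical), a dbt model is nothing more than a SELECT statement saved as a .sql file; dbt wraps it in the DDL needed to build a view or table when it runs:

-- models/example/applications_daily.sql
-- Hypothetical model: dbt turns this SELECT into a view (or table) in the warehouse.
-- Later sections reference core models instead of raw tables directly.
select
    date_trunc('day', created_at) as application_date,
    count(*)                      as applications
from raw.job_applications          -- hypothetical table loaded by your extract/load tool
group by 1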

Now that we have solved the transformation problem, the next step is to define your data warehouse structure…

Defining the data warehouse structure

Data structuring may vary from business to business, but it is generally preferred to name your schemas after business units, such as core, marketing, finance, hr, operations…
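If you want to create these schemas up front in the warehouse itself, a minimal sketch (generic Postgres/Redshift-style DDL, hypothetical names) could look like this; dbt's custom schemas, covered below, can also create them for you:

-- Business-unit schemas, one per team plus a shared core
create schema if not exists core;        -- cleaned base tables shared by everyone
create schema if not exists marketing;
create schema if not exists finance;
create schema if not exists hr;
create schema if not exists operations;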

This is what it looked like for us:

Now that we have the structure in place, the next question to answer is what kind of data the different business units are looking for…

To understand this, we took the list of dashboards they were currently using and also listed the data points they were looking at, or might look at in the future.

With the list ready, it's time to create views. Every view or table in dbt is defined as a model; the models then run at a regular interval to create transformed views and tables in your data warehouse.
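For instance, whether a model materialises as a view or a table can be declared inside the model itself; everything named below (model, source) is hypothetical:

-- models/core/core_candidates.sql
-- Materialised as a table so downstream queries stay fast; each scheduled
-- `dbt run` (via dbt Cloud or cron) rebuilds it.
{{ config(materialized='table') }}

select
    id         as candidate_id,
    full_name,
    created_at
from {{ source('raw', 'candidates') }}   -- hypothetical source declared in a sources .yml file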

Before starting to create views, there are a few guidelines we should follow for best results:

  1. Never reference tables from your base schema directly in your models — create core models for all the base tables and reference these core models in your final models (see the sketch after this list). This makes sure that a small change in the table architecture doesn’t impact your final model.
  2. Break complex models into smaller pieces — complex models often include multiple Common Table Expressions (CTEs). Breaking a CTE out into a separate model (this is what builds the DAG architecture) allows you to reference it from any number of downstream models, reducing duplicated code.
  3. Group your models into directories — for example, we organised our models into directories named after business teams.
  4. Use custom schema names — use custom schemas to separate relations into logical groupings. We named these after business teams as well, so, for instance, all finance views are grouped under the “finance” schema (also shown in the sketch below). This also lets you set user permissions at the schema level.
A Typical DAG Architecture
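Putting guidelines 1, 2 and 4 together, here is a hedged sketch of a downstream model (all model and column names are hypothetical): it lives in the finance directory, lands in the finance schema, and only references core models via ref():

-- models/finance/finance_collections_monthly.sql
-- References core models only, never raw tables (guideline 1), and uses a
-- custom schema (guideline 4). If payments_per_invoice grows complex, guideline 2
-- suggests promoting it to its own model and referencing it with ref().
-- Note: by default dbt prefixes custom schemas with the target schema
-- (e.g. analytics_finance) unless generate_schema_name is overridden.
{{ config(materialized='view', schema='finance') }}

with invoices as (
    select * from {{ ref('core_invoices') }}        -- hypothetical core model
),

payments_per_invoice as (
    select
        invoice_id,
        sum(amount) as collected_amount
    from {{ ref('core_payments') }}                 -- hypothetical core model
    group by 1
)

select
    date_trunc('month', invoices.issued_at)    as invoice_month,
    sum(invoices.amount)                       as invoiced_amount,
    sum(payments_per_invoice.collected_amount) as collected_amount
from invoices
left join payments_per_invoice
    on payments_per_invoice.invoice_id = invoices.invoice_id
group by 1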

With the schema set up, here comes the most important part:

Having an easy-to-use tool for data extraction and pushing its adoption across business teams

You can choose any tool you want; we went ahead with Google Data Studio as it’s free and quite easy to use. However, training the team and driving adoption of the process turned out to be the most difficult part.
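One small thing that helped adoption (a hedged sketch, with hypothetical names): expose final models with readable, business-friendly column names so teams don’t have to decode warehouse abbreviations when building pivots in Data Studio:

-- models/operations/operations_hiring_funnel.sql
-- Final model exposed to Data Studio: abbreviated warehouse columns are
-- renamed to readable, business-friendly names.
select
    appl_dt       as application_date,
    job_city      as city,
    n_appl        as applications,
    n_intv_sched  as interviews_scheduled,
    n_hired       as candidates_hired
from {{ ref('core_hiring_funnel') }}   -- hypothetical core model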

Adoption was completed in 3 phases:

  1. First-level training for all the major stakeholders in business teams: after this training, some data requests were pushed back to the teams to handle themselves.
  2. Retraining to address the issues faced so far and to build a detailed understanding of the schema structure.
  3. Identification of early adopters in every department and continuous issue resolution with them: they were responsible for handling the query requirements of their department and for helping their team members get hands-on with the tool. They also played a key role in suggesting further improvements to the transformed views.

It took us two months to implement this completely, but the effort paid off in a big way. We cut the analytics bandwidth spent on daily data requirements by 50% and reduced the TAT of data extraction to less than an hour. This is just the beginning; we will keep identifying more opportunities to optimise analytics bandwidth.

Thanks for reading this article. Do clap if you liked it. In case of any questions, feedback or suggestions, feel free to comment or reach out to me directly.

Why work with OLX People?

We don’t give you a job; we build your career and offer an opportunity to work with the brightest minds.

Join our team. Forward your CV to ta@olxpeople.com
