DATA ENGINEERING

Structuring a Self-Service BI Environment with DBT — Part 1

How do we structure a solution to give more autonomy to data analysts here at Afya using DBT + Redshift?

Dec 23, 2023

For a long time, there’s been talk about Self-Service Business Intelligence (BI), which is an approach within the data realm that grants more freedom to the business/product team to explore data and create dashboards without relying on more technical professionals.

The purpose of this text is to provide a more conceptual view, but rest assured, in the upcoming articles, we’ll delve into the technical aspects — how we, as the data engineering team, are ‘paving the way’ for Self-Service BI using the DBT tool.

However, before we proceed, it’s relevant to understand our context here at Afya and how we’re structured as a Data team.

Afya

Afya, which means ‘health and well-being’ in Swahili, was born from the union of NRE Educacional, the largest group of medical colleges in the country (established in 1999), and MEDCEL, a brand specializing in preparatory courses for medical residency exams. The group’s first college began operating in Tocantins, in the North of Brazil.

In 2019, Afya debuted on the Nasdaq stock exchange, and starting from 2020, through a Corporate Venture Capital (CVC) structure, it began acquiring and merging various healthtech companies to shape Afya Digital Health — a structure where I currently work as the Data Lead Engineer in the Clinical Decisions pillar.

But how did we structure the data team…

We have a Data Directorate within Afya, and within this directorate, we have 4 Chapters that operate cross-functionally between the Business Units (BUs) with the goal of maximizing synergy across the businesses.

These 4 Chapters are:

Data Product
Data Science
Data Analysis
Data Engineering

This team operates in two types of structures: within the Data Squad, which leans towards a more technical direction and includes members from all chapters, and within the Product Squad, consisting of multifunctional teams more focused on the business aspect. Specifically, analysts and data scientists participate in these product squads.

Now, returning to Self-Service BI…

Within this structure, the data squad often can’t deliver information at the speed the product squads need. Therefore, in order to scale adequately for the product teams, an environment with more autonomy is crucial!

The process currently operates as follows: the engineering team not only ingests data into the Data Lake but is also responsible for creating datasets in the Data Warehouse, which the data analysis team then uses to craft dashboards for the business areas. An important detail: all the business understanding needed to create a dataset already comes from the analysis team.

Therefore, as the initial step towards Self-Service BI, it makes more sense to provide a tool that grants the analysis team greater autonomy to create their own datasets. Among the solutions that came with the healthtechs acquired by Afya, DBT, which was already in use by iClinic, proved the most promising. Besides data transformation, the tool is comprehensive in terms of documentation, resulting in better Data Governance. After all, something I’ve learned is that there’s no Self-Service BI without Data Governance!

With this in mind, we built the architecture, with our engineer Alice Thomaz leading the work and designing the solution. It integrates DBT Cloud as the tool that orchestrates our transformation pipelines into our Data Warehouse, an Amazon Redshift cluster. All model code to be executed resides in our GitLab repository.
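To make the Redshift side a bit more concrete: when DBT runs outside DBT Cloud, the warehouse connection is described by a profile with a handful of fields (DBT Cloud keeps the equivalent settings in its own connection screen). The sketch below builds that structure in Python purely as an illustration. The profile name `afya_dw`, the schema `dbt_dev`, the `REDSHIFT_*` variable names, and the host value are all hypothetical and do not come from our actual setup.

```python
import json


def redshift_target(env, schema):
    """Build one DBT output target for Redshift from environment-style settings."""
    return {
        "type": "redshift",
        "host": env["REDSHIFT_HOST"],  # hypothetical variable names throughout
        "user": env["REDSHIFT_USER"],
        "password": env["REDSHIFT_PASSWORD"],
        "port": int(env.get("REDSHIFT_PORT", "5439")),  # 5439 is Redshift's default port
        "dbname": env["REDSHIFT_DBNAME"],
        "schema": schema,  # schema where the models are materialized
        "threads": 4,      # number of models DBT may build in parallel
    }


def build_profile(profile_name, env, schema):
    """Wrap the target in the usual layout: profile -> default target -> outputs."""
    return {
        profile_name: {
            "target": "dev",
            "outputs": {"dev": redshift_target(env, schema)},
        }
    }


if __name__ == "__main__":
    # Placeholder values for illustration only; real credentials would come
    # from the environment (for example os.environ), never from source code.
    demo_env = {
        "REDSHIFT_HOST": "example-cluster.example.com",
        "REDSHIFT_USER": "analyst",
        "REDSHIFT_PASSWORD": "change-me",
        "REDSHIFT_DBNAME": "warehouse",
    }
    print(json.dumps(build_profile("afya_dw", demo_env, "dbt_dev"), indent=2))
```

In a real project this structure would live as YAML in a `profiles.yml` file; it is printed as JSON here only to keep the sketch free of third-party dependencies.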

For development, we use an Ubuntu container that handles the entire DBT installation and configuration process, including pulling the repository into the container. This lets engineers and analysts test the models they create for DBT locally, on their own machines. Once a model is validated and approved, we open a Merge Request to the repository.
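As a rough idea of what “testing a model locally” can look like inside the container, here is a minimal sketch that chains three standard DBT CLI commands for a single model. It is an assumption about a reasonable workflow, not a description of our actual scripts, and the model name `stg_appointments` is hypothetical.

```python
import shlex
import subprocess


def validation_commands(model):
    """Return the DBT CLI calls used to check one model before a Merge Request."""
    return [
        ["dbt", "debug"],                    # verify the connection and project config
        ["dbt", "run", "--select", model],   # build only the selected model
        ["dbt", "test", "--select", model],  # run the tests defined for that model
    ]


def validate(model, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in validation_commands(model):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failing step
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    validate("stg_appointments")  # hypothetical model name; dry run by default
```

With `dry_run=True` the script only prints the commands, which makes it safe to try on a machine without DBT installed; passing `dry_run=False` inside the container would actually run them against the configured warehouse.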

Are you interested in how we structured our repository, integrated with Redshift, and created a container for local development? Don’t miss out on part 2 of this article, coming next month!

To exchange ideas, share your opinion, or suggest something, feel free to reach out to me on LinkedIn. If you want to be part of the country’s largest medical ecosystem, with cutting-edge data technologies in an innovative environment, come to Afya!