Modern Data Stack: How to build cutting-edge applications

Arthur Rogério
Indicium Engineering
May 9, 2024


The Modern Data Stack (MDS) is an innovative approach that offers numerous benefits to organizations. With it, you’ll be prepared to tackle a wide range of data analysis challenges.

In this edition, our goal is to build new skills and knowledge through a case study on modern data pipeline development, using technologies such as Airbyte, Snowflake, and dbt.

Let’s explore the Modern Data Stack and see how to build a state-of-the-art application.

We’ve put together a set of powerful tools: MongoDB as the data source, Airbyte for extraction and loading, Snowflake as the data warehouse, and dbt for transformation. With this combination, you’ll be equipped to extract, load, and transform data efficiently and at scale.

Modern Data Stack: What are the benefits?

Among the advantages, we can highlight:

  1. Flexibility and scalability: the Modern Data Stack (MDS) lets you handle diverse and growing volumes of data flexibly, so your application can grow with your needs.
  2. Connectivity with various data sources: Airbyte’s catalog of connectors lets you connect to a wide range of sources (MongoDB, in this project) and consolidate all the important information in one place.
  3. Efficiency in data ingestion: Airbyte simplifies data extraction and ingestion, letting you set up pipelines that bring information from MongoDB to Snowflake quickly and reliably.
  4. Advanced storage and querying capabilities: Snowflake is a cloud-based data warehouse designed to handle large volumes of data and complex queries. Its scalability and performance let you explore your data quickly and effectively.

Modern Data Stack with MongoDB, a powerful data source

MongoDB is a highly flexible and scalable NoSQL database with a document-based data model, which makes it a great ally when building a Modern Data Stack.

With its ability to efficiently store and access large volumes of data, MongoDB is an ideal choice as the data source for our modern application.

Figure 1 — Database in MongoDB
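Because MongoDB does not enforce a fixed schema, documents in the same collection can carry different fields and types. Before that data can be loaded into a relational warehouse, a pipeline has to discover the union of fields it will find. The sketch below illustrates this in plain Python by sampling documents and recording the types seen for each top-level field; the documents and field names are hypothetical, not taken from the project’s database.

```python
from collections import defaultdict
from typing import Any


def infer_schema(documents: list[dict[str, Any]]) -> dict[str, set[str]]:
    """Map each top-level field to the set of Python type names seen across documents."""
    schema: dict[str, set[str]] = defaultdict(set)
    for doc in documents:
        for field, value in doc.items():
            schema[field].add(type(value).__name__)
    return dict(schema)


# Hypothetical documents from a single collection: note the differing shapes.
orders = [
    {"_id": 1, "customer": "ana", "total": 120.5},
    {"_id": 2, "customer": "bruno", "total": 80, "coupon": "WELCOME"},
    {"_id": 3, "customer": "carla", "items": [{"sku": "A1", "qty": 2}]},
]

schema = infer_schema(orders)
for field in sorted(schema):
    print(field, sorted(schema[field]))
```

Fields such as `total`, which appears with more than one type, or `coupon`, which appears in only some documents, are exactly what the downstream steps of the pipeline have to handle.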

Modern Data Stack with Airbyte, making data extraction and ingestion easy

Airbyte is an open-source data integration platform designed to simplify the process of extracting and ingesting data from various sources to different destinations.

With an intuitive interface, Airbyte lets you set up connections to data sources like MongoDB and define pipelines that extract that data and load it into destinations such as a data warehouse.

Figure 2 — How Airbyte works
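Airbyte connectors can sync incrementally instead of re-reading the whole source on every run, either with a cursor field or with change data capture, depending on the connector. The sketch below shows only the cursor idea in plain Python: keep the highest cursor value seen as "state" and, on the next run, extract only newer records. It is a conceptual illustration, not Airbyte’s code; the function name and the `updated_at` field are hypothetical.

```python
from typing import Any, Iterable, Optional


def incremental_sync(
    records: Iterable[dict],
    cursor_field: str,
    state: Optional[Any],
) -> tuple[list[dict], Optional[Any]]:
    """Return records whose cursor is newer than the saved state, plus the new state."""
    new_records = [r for r in records if state is None or r[cursor_field] > state]
    # The next state is the highest cursor value extracted; keep the old one if nothing is new.
    new_state = max((r[cursor_field] for r in new_records), default=state)
    return new_records, new_state


# Hypothetical source rows; ISO-8601 strings in one format compare correctly as text.
source = [
    {"_id": 1, "updated_at": "2024-05-01T10:00:00"},
    {"_id": 2, "updated_at": "2024-05-02T09:30:00"},
]
first_batch, state = incremental_sync(source, "updated_at", state=None)
print(len(first_batch), state)  # first run: every record is new

source.append({"_id": 3, "updated_at": "2024-05-03T08:00:00"})
second_batch, state = incremental_sync(source, "updated_at", state=state)
print([r["_id"] for r in second_batch], state)  # second run: only the new record
```

Using a strict `>` avoids re-reading the last record but can miss late rows that share the same cursor value; comparing with `>=` trades that risk for possible duplicates, which then have to be removed downstream.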

Modern Data Stack with Snowflake, a scalable destination for data ingestion

Snowflake is a cloud-based data warehouse that offers exceptional scalability and performance. For these reasons, we chose it as the destination of our Modern Data Stack.

With its architecture designed to handle large volumes of data, Snowflake is an ideal choice for storing and managing the data extracted from MongoDB by Airbyte.

Moreover, Snowflake lets you query the data in standard SQL and separates storage from compute: queries run on virtual warehouses that can be resized on demand and, with multi-cluster warehouses, scaled out automatically, so the application can handle large workloads efficiently.

Figure 3 — Completed ingestion into the Snowflake DW
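Nested MongoDB documents usually arrive in Snowflake with at least part of their structure still semi-structured (Snowflake stores such data in VARIANT columns), and it is later unpacked into ordinary columns. The Python sketch below only illustrates that idea, turning nested keys into dotted column names; the document is hypothetical, and lists are deliberately left untouched.

```python
from typing import Any


def flatten(doc: dict[str, Any], parent: str = "", sep: str = ".") -> dict[str, Any]:
    """Flatten nested dicts into one level with dotted keys; lists are kept as-is."""
    flat: dict[str, Any] = {}
    for key, value in doc.items():
        column = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into sub-documents, prefixing their keys with the parent path.
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


# Hypothetical MongoDB document with a nested sub-document and an array.
customer = {
    "_id": "c42",
    "name": "Ana",
    "address": {"city": "Recife", "geo": {"lat": -8.05, "lon": -34.9}},
    "tags": ["vip", "newsletter"],
}

flat = flatten(customer)
print(flat)
```

Inside Snowflake, the same extraction is usually written in SQL with path expressions such as `raw:address.city::string` (where `raw` would be the VARIANT column), while arrays such as `tags` are typically expanded with `LATERAL FLATTEN`.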

Modern Data Stack with dbt, the data transformation wizard

It’s time to turn your data into gold in our Modern Data Stack.

With dbt, you’ll have a true wizard in your hands.

This open-source data transformation tool allows you to apply business rules, clean the data, and prepare it for analysis. But that’s not all!

dbt also lets you build analytical models, such as aggregations, that feed visualizations and deliver valuable insights.

It’s pure magic happening behind the scenes.
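Behind the scenes, each dbt model is a SELECT statement; with a "table" materialization, dbt wraps that SELECT in a CREATE TABLE ... AS statement and runs the models in dependency order. The sketch below mimics this with Python’s built-in `sqlite3` standing in for Snowflake: a staging model cleans raw data, and an aggregate model builds on it. The table and model names are hypothetical, and in a real dbt project the aggregate would select from `{{ ref('stg_orders') }}` rather than a hard-coded table name.

```python
import sqlite3

# Staging model: apply cleaning rules to the raw data (hypothetical columns).
STG_ORDERS = """
SELECT
    order_id,
    LOWER(TRIM(customer_email)) AS customer_email,
    CAST(total AS REAL) AS total
FROM raw_orders
WHERE total IS NOT NULL
"""

# Analytical model: aggregate the cleaned data per customer.
FCT_CUSTOMER_REVENUE = """
SELECT
    customer_email,
    COUNT(*) AS orders,
    SUM(total) AS revenue
FROM stg_orders
GROUP BY customer_email
"""


def materialize(conn: sqlite3.Connection, name: str, select_sql: str) -> None:
    """Rebuild a model as a table from its SELECT, similar to a table materialization."""
    conn.execute(f"DROP TABLE IF EXISTS {name}")
    conn.execute(f"CREATE TABLE {name} AS {select_sql}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (order_id INTEGER, customer_email TEXT, total TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [
        (1, " Ana@Example.com", "120.5"),
        (2, "ana@example.com ", "79.5"),
        (3, "bruno@example.com", None),
    ],
)

# Build models in dependency order: staging first, then the aggregate.
materialize(conn, "stg_orders", STG_ORDERS)
materialize(conn, "fct_customer_revenue", FCT_CUSTOMER_REVENUE)

rows = conn.execute(
    "SELECT customer_email, orders, revenue FROM fct_customer_revenue ORDER BY customer_email"
).fetchall()
print(rows)
```

The cleaning rules in the staging model merge the two differently formatted e-mails into one customer and drop the order without a total, so the aggregate only ever sees clean data.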

At the heart of this modern solution is dbt, a powerful tool that brings several additional benefits:

  • Version control: dbt transformations are code (SQL files in a project), so they can be kept in Git, giving you a complete history of changes to your transformation logic and the ability to roll back if needed.
  • Governance and compliance: with dbt, you can encode rules as tests (for example, that a key is unique and never null) and apply them to your transformation processes, helping you comply with regulations and established standards.
  • Documentation and collaboration: dbt lets you document your entire transformation pipeline and generate browsable documentation from it, making the project transparent and easy to share among team members and encouraging knowledge sharing.
  • Data lineage: dbt tracks how models depend on each other and on their sources, letting you trace the origin and downstream impact of each model. This gives complete visibility of the data flow, which is crucial for audits and troubleshooting.

Figure 4 — Data lineage of this project
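dbt derives lineage from the `ref()` calls inside each model: every `ref` becomes an edge in a directed acyclic graph, which dbt uses both to decide run order and to draw lineage graphs like the one in Figure 4. The sketch below reproduces that idea in simplified form. The model names are hypothetical, `source()` calls are ignored, and a regex that only understands the single-argument `{{ ref('model') }}` form stands in for dbt’s actual Jinja rendering.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models: name -> SQL body using dbt's ref() macro.
MODELS = {
    "stg_orders": "select * from {{ source('mongo', 'orders') }}",
    "stg_customers": "select * from {{ source('mongo', 'customers') }}",
    "int_orders_enriched": (
        "select o.*, c.city from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_customers') }} c using (customer_id)"
    ),
    "fct_revenue_by_city": (
        "select city, sum(total) as revenue "
        "from {{ ref('int_orders_enriched') }} group by city"
    ),
}

# Matches the single-argument form {{ ref('model') }} or {{ ref("model") }}.
REF_PATTERN = re.compile(r"""\{\{\s*ref\(\s*['"]([\w.]+)['"]\s*\)\s*\}\}""")


def build_graph(models: dict[str, str]) -> dict[str, set[str]]:
    """Map each model to the set of models it selects from (its parents)."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def upstream(graph: dict[str, set[str]], model: str) -> set[str]:
    """Collect every ancestor of a model by walking parent edges."""
    seen: set[str] = set()
    stack = list(graph.get(model, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph.get(parent, ()))
    return seen


graph = build_graph(MODELS)
# TopologicalSorter expects node -> predecessors, so parents come out first.
run_order = list(TopologicalSorter(graph).static_order())
print(run_order)
print(sorted(upstream(graph, "fct_revenue_by_city")))
```

The same graph answers both questions the lineage view is used for: what must run before a model, and which upstream models a broken number could have come from.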

These are just some of the positive aspects dbt brings to the solution; its capabilities go further, offering a complete and powerful experience for transforming your data.

With the Modern Data Stack, you are ready to tackle a wide range of data analysis challenges.

The combination of MongoDB, Airbyte, Snowflake, and dbt provides:

  • Robust solutions;
  • Flexibility;
  • Feature-rich capabilities for extracting, transforming, and loading data efficiently.

Take advantage of the benefits of this modern approach and explore new insights and opportunities for your business!

Indicium: the path to building a Modern Data Stack

Indicium is a leader in data consulting and data product development in Latin America.

We aim to help you achieve high analytical performance using the Modern Data Stack.

Visit our website to explore our data-driven solutions, delivered with speed, security, and governance.

See you soon!

Project repository: https://github.com/rogeriothur/mds-na-pratica
