Building a modern analytics stack

Everything you need to know about data analytics stacks with real examples.

Fabio Di Leta
paradime.io
9 min read · Aug 15, 2022


Introduction

Building a modern analytics stack is one of the most critical investments a business can make. It provides the tools to collect, process, and analyze data so that you can make smarter, data-driven decisions than ever before.

Data analytics 101

In this blog, you’ll explore the importance of building a modern analytics stack. We will answer questions such as: what is an analytics pipeline, what is an analytics stack, and how does it all work? You will also go through its significance and the stages of building one, and you will learn how to capture, store, and process your data, as well as how to analyze and visualize it.

Let’s start with a definition. Data analytics is the practice of analyzing data to gain insight into business processes and operations, and businesses can use it to improve their efficiency and effectiveness.

  • What is an analytics pipeline? — An analytics pipeline is a set of instructions for processing data, either in batch or in real time. It is the process a business uses to gather, store, process, analyze, and visualize data in order to understand its customers, its operations, and its performance relative to its competitors.
  • What is an analytics stack? — An analytics stack is a set of tools and workflows that allows for extracting useful information from large amounts of data. It is a superpower that every organization needs today. This stack is composed of different components, including data warehouses and various modular tools used to build analytics systems that integrate, aggregate, transform, model, and report data from disparate data sources. These tools typically store and process large amounts of data efficiently while ensuring that it’s easily accessible whenever necessary.

The significance of the analytics stack

Data analytics is essential for every business and is the key to obtaining a competitive advantage. It helps you understand your business, your product, and your customers so that you can ultimately provide a better service and experience. As data workloads move to the cloud and modern organizations deploy a suite of apps across their business functions, it is becoming increasingly complicated to process data and generate valuable business insights. An average organization has more than 200 SaaS applications and many hundreds of internal data sources to keep track of.

The marketing team may use a combination of HubSpot and Marketo for marketing automation; the sales team may use Salesforce and Apollo to manage prospects; and the customer support team may use Intercom to manage customer requests and troubleshoot issues. This fragments data across multiple sources with no single source of truth, which in turn leads to inaccurate insights and reporting.

Data silos can make even fundamental business insights hard to obtain. An analytics stack helps you consolidate and refine data from various sources and then transform it into meaningful insights. This typically leads to broader data access, better governance, and a faster path to your business insights. It also helps ensure consistent data quality and brings standardization to your reporting layer, which is crucial for reliable business insights.

How does an analytics stack work?

Now that you know what an analytics stack is, let’s find out how it works. Though the specifics may differ from business to business, the basic workflow is similar and is composed of the following stages (a minimal end-to-end sketch in code follows the list):

  • The first stage of a data analytics stack involves extracting data either from your product or from third-party sources using tools such as Fivetran, Stitch, Segment, and Rudderstack
  • The next stage involves storing the collected data in a data warehouse such as Firebolt, Snowflake, Redshift, or BigQuery
  • This is followed by modeling and transforming your data to fit your business requirements using tools such as dbt and Apache Airflow
  • The final stage of the stack involves exposing the data for exploration, visualization, and reporting using tools such as Tableau, Looker, Preset, and Mode
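
To make these stages concrete, here is a minimal, illustrative Python sketch that walks through the same four steps locally, using pandas and SQLite as stand-ins for a real ingestion tool and cloud warehouse. The file, table, and column names (orders_export.csv, raw_orders, daily_revenue) are invented for illustration only.

    # A minimal sketch of the four stages, using pandas and SQLite as
    # stand-ins for a real ingestion tool and cloud warehouse.
    import sqlite3
    import pandas as pd

    warehouse = sqlite3.connect("warehouse.db")

    # 1. Extract: pull raw data from a source (here, a CSV export).
    raw_orders = pd.read_csv("orders_export.csv")

    # 2. Load: land the raw data in the warehouse untouched.
    raw_orders.to_sql("raw_orders", warehouse, if_exists="replace", index=False)

    # 3. Transform: model the raw data into an analysis-ready table.
    daily_revenue = pd.read_sql(
        "SELECT order_date, SUM(amount) AS revenue "
        "FROM raw_orders GROUP BY order_date",
        warehouse,
    )
    daily_revenue.to_sql("daily_revenue", warehouse, if_exists="replace", index=False)

    # 4. Expose: hand the modeled table to a BI tool or a quick report.
    print(daily_revenue.head())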

But just putting the stages one after the other does not guarantee success. In the following sections, we will explain how to make sure you are not just building a data analytics stack, but building a good one.

A multi-source approach to data capture

This is the first stage of successful analytics. In reality, capturing data is much harder than it sounds. There is so much data to capture that it is easy to drown in it or, worse, to capture data that gives incomplete or incorrect insights.

  • All data sources must work together: It’s important to collect data from multiple sources; otherwise, you run the risk of collecting and analyzing biased or incomplete data. For example, companies often rely on a single source for their CRM (Customer Relationship Management) data. However, this can lead to inaccuracies if the CRM system is not updated in real-time or if the company has poor integration between its other systems, such as Salesforce, Intercom, Marketo, Zuora, and the CRM system. Your CRM data should be tightly integrated with your marketing automation tools, subscription management software, and engagement and customer support platforms.
  • Quality defines accuracy: Data quality is an important aspect of building an analytics stack. Poor-quality data can easily mislead decision-making processes that depend on accurate information about customers, suppliers, partners, or vendors within your organization’s ecosystem. It is therefore essential that organizations establish strong data quality practices around how they capture and transform data. Some popular tools for checking data quality are Monte Carlo, Datafold, Metaplane, and Anomalo; a few lightweight checks of your own, like those sketched below, also go a long way.
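
Below is an illustrative Python/pandas sketch of such lightweight checks: it flags duplicate IDs, missing emails, and stale records. The customers.csv file and its columns (customer_id, email, updated_at) are hypothetical.

    # Illustrative data quality checks with pandas. The file and column
    # names (customer_id, email, updated_at) are hypothetical.
    import pandas as pd

    customers = pd.read_csv("customers.csv", parse_dates=["updated_at"])

    issues = {
        "duplicate_ids": customers["customer_id"].duplicated().sum(),
        "missing_emails": customers["email"].isna().sum(),
        "stale_records": (customers["updated_at"]
                          < pd.Timestamp.now() - pd.Timedelta(days=90)).sum(),
    }

    for check, count in issues.items():
        status = "OK" if count == 0 else f"FAILED ({count} rows)"
        print(f"{check}: {status}")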

[Image: The Modern Data Analytics Stack]

Finding the ideal storage for your data

If you’re reading this article, you’ve probably already decided that data analytics is important to your organization. Now, it’s time to start thinking about where all this data will be housed. The first step in building a strong data analytics stack is choosing a data warehouse system that can handle your unique needs, fit your cost profile and allow for easy access to all the information being collected.

There are many factors to consider when choosing a modern cloud data warehouse, some of which are:

  • Elasticity: do you need storage and compute to scale independently of each other, or do you need both to scale at the same time?
  • Cost: depending on your elasticity needs, how will your warehouse cost scale, not only with increasing storage but, more importantly, with increasing compute?
  • Security: what permission models does the warehouse support, and do they fit your organization’s security needs?
  • Speed: if you are a growing or scaling organization, you are likely to be processing terabytes of data; is your warehouse fast enough even for your slowest queries?

A modern data analytics stack has cloud-based solutions at its core, which allow it to easily scale storage and compute to support analytics use cases. The market today offers a variety of cloud data warehouses for storing and transforming data, each with different functionality and degrees of flexibility. The most popular solutions include Snowflake, Google BigQuery, Amazon Redshift, and Firebolt.

Data processing like a pro

Data processing involves extracting data from the source and transforming it into a format that is more suitable for analysis. The industry has now largely adopted an ELT (Extract, Load, Transform) workflow, where data is first extracted and loaded into a data warehouse, and only then transformed. The process consists of two well-defined steps:

  • Data ingestion (EL): This step involves extracting data from various sources and loading it into a storage layer/data warehouse. The current market offers an abundance of tools that can extract and load data into a data warehouse, such as Fivetran, Stitch, Segment, and Rudderstack for the most well-known sources, and Portable.io for the long tail of data sources.
  • Data transformation (T): In this step, the previously ingested raw data is converted into data models by transforming its structure, making it easier to understand. Well-defined models enable users to analyze their company’s data without having to trawl through large amounts of raw data. Done correctly, they also help align teams on common metrics, ensuring that everyone speaks the same data language. The most commonly used framework for transforming data in the warehouse is dbt, which allows analysts and analytics engineers to write their transformation code in SQL (Structured Query Language) and take advantage of the ergonomics of software engineering workflows such as CI/CD. A minimal sketch of this kind of transformation follows this list.
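
The sketch below illustrates the kind of logic a transformation step expresses: raw tables go in, an analysis-ready model comes out. In dbt this would live in a SQL model; here it is shown in pandas purely for illustration, and the source files and columns (raw_orders.csv, raw_customers.csv, customer_id, amount, and so on) are hypothetical.

    # A typical transformation, sketched in pandas: raw tables in, an
    # analysis-ready customer model out. In dbt, the same logic would be
    # written as a SQL model. All names are hypothetical.
    import pandas as pd

    raw_orders = pd.read_csv("raw_orders.csv", parse_dates=["order_date"])
    raw_customers = pd.read_csv("raw_customers.csv")

    # Clean and join the raw sources.
    orders = raw_orders.dropna(subset=["customer_id"])
    enriched = orders.merge(raw_customers, on="customer_id", how="left")

    # Aggregate into a customer-level model the whole team can reuse.
    customer_model = (
        enriched.groupby(["customer_id", "plan"], as_index=False)
        .agg(total_orders=("order_id", "count"),
             lifetime_value=("amount", "sum"),
             first_order=("order_date", "min"))
    )

    customer_model.to_csv("dim_customers.csv", index=False)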

Insights powered by analytics

Once data has been processed, the next step is to present it in the form of a report or visualization to draw insights from. This visualization is done using BI (Business Intelligence) dashboards, such as Looker, Tableau, Preset, and Mode. These dashboards can help you make informed decisions, predict outcomes, and make recommendations. This stage of your data stack is one of the most critical as it is the point of contact between data analyst teams and business functions across the organization.

Reverse ETL for enriched data

Reverse ETL solutions, such as Census and Hightouch, offer out-of-the-box connectors to various systems, such as Salesforce, HubSpot, and Intercom. Adding this component to your data analytics pipeline allows you to push and sync transformed data back into those SaaS (Software as a Service) applications’ standard and custom fields. Pushing data back into these third-party systems operationalizes, or activates, that data across the organization, giving sales, marketing, and operations teams direct access to enriched data daily. Some common use cases include:

  • Helping personalize customer marketing efforts by combining support, sales, and product data in HubSpot
  • Enriching customer profiles in Salesforce with product usage to improve the sales process
  • Unifying and syncing your customer data with Intercom to improve customer support and reduce churn

By implementing reverse ETL tools in your data stack, you can push data directly into the SaaS tools used by line of business users while also streamlining automation and eliminating manual export and import using CSV (Comma-Separated Values) in your third-party applications.
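
To make the mechanics concrete, here is an illustrative Python sketch: it reads an enriched model from the warehouse and pushes each row to a SaaS application’s REST API. The endpoint, field names, and token are placeholders, not a real vendor API; in practice, a tool such as Census or Hightouch manages these syncs for you.

    # Illustrative reverse ETL: read an enriched table from the warehouse
    # and push it to a SaaS tool's REST API. The endpoint, fields, and
    # token below are placeholders, not a real vendor API.
    import sqlite3
    import pandas as pd
    import requests

    warehouse = sqlite3.connect("warehouse.db")
    accounts = pd.read_sql(
        "SELECT account_id, product_usage_score FROM dim_accounts", warehouse
    )

    API_URL = "https://api.example-crm.com/v1/accounts/{id}"  # placeholder endpoint
    HEADERS = {"Authorization": "Bearer <token>"}             # placeholder credentials

    for row in accounts.itertuples(index=False):
        payload = {"custom_fields": {"product_usage_score": row.product_usage_score}}
        response = requests.patch(API_URL.format(id=row.account_id),
                                  json=payload, headers=HEADERS, timeout=10)
        response.raise_for_status()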

Integrated is the way to go

Building your data analytics stack requires an integrated approach. The steps in the stack include capturing, storing, and processing data, visualizing it, and then taking action based on what you learn.

To get you started on this journey, here are some basic steps:

  • Capturing: You need to know what kind of data sources or tools you want to use. You also need a plan for capturing all of this information reliably so that you can centralize your data in a single destination.
  • Storing: Once you have captured your information in one location, it is important that it is stored safely, so that no valuable insights are lost to a technical or human error.
  • Processing: Then comes the actual processing, which can involve tasks such as removing duplicate entries from databases, breaking unstructured JSON down into structured fields for analysis, or building data models that better serve your business (see the sketch below).

These three vital processes form the foundation upon which everything else rests!
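
To make the processing step tangible, here is a small pandas sketch of the two clean-up tasks mentioned above: dropping duplicate rows and flattening nested JSON into structured columns. The sample events are invented for illustration.

    # Deduplication and JSON flattening with pandas. The sample events
    # are invented for illustration.
    import pandas as pd

    events = pd.DataFrame([
        {"event_id": 1, "payload": {"user": {"id": 42, "plan": "pro"}, "action": "login"}},
        {"event_id": 1, "payload": {"user": {"id": 42, "plan": "pro"}, "action": "login"}},  # duplicate
        {"event_id": 2, "payload": {"user": {"id": 7, "plan": "free"}, "action": "signup"}},
    ])

    # Remove duplicate entries.
    events = events.drop_duplicates(subset=["event_id"])

    # Break the nested JSON payloads down into flat, structured columns.
    flat = pd.json_normalize(events["payload"].tolist())
    processed = pd.concat(
        [events[["event_id"]].reset_index(drop=True), flat], axis=1
    )

    print(processed)  # columns: event_id, action, user.id, user.plan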

Building your first data analytics stack

Transitioning from dependence on siloed applications to building your own modern data analytics stack can definitely be an uphill struggle. However, because we rely so heavily on data to make business decisions, building a modern analytics stack is now a necessity rather than a nice-to-have. Furthermore, a well-built stack can evolve with your business, providing you with meaningful insights that you can act upon. It is also no secret that a data analytics stack is made of many different components, with many competing tools to choose from. It is important to embrace good engineering practices and build your data stack on a set of tools that provide scalability, reduce switching costs, and keep the challenges of gluing these tools together under control.

In the next blog of this series, you will learn how to build an analytics stack by looking at the architecture of a modern data stack and its components. We will also evaluate some tools based on their ease of use, integration capabilities, community, documentation, and pricing.
