ETL vs ELT for Analytics Backend

Jacques Sham
6 min read · Oct 25, 2023

Even if you are not a data engineer, you may have noticed that ETL and ELT are not the same thing, yet you might not be able to tell the difference. In this article, we discuss the difference between ETL and ELT from both a technical perspective and through business use cases for analytics.

What do E, T, and L stand for?
Let’s get straight to the point: E stands for Extract, T for Transform, and L for Load in both ETL and ELT. The difference between the two is simply the order in which extraction, transformation, and loading happen. But let’s dive deeper to understand how they differ from a technical perspective:

ETL — ETL is widely used as the data pipeline for companies using the Enterprise Data Warehouse approach. After the data is gathered, it is transformed right away, and the transformed data is then loaded into the data warehouse.

ELT — ELT has become more popular in recent years. The major difference is that the data is loaded into the data warehouse right after it is gathered. Then, the data is transformed, and the transformed data is loaded into different tables in the same data warehouse.
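The difference in ordering can be sketched in a few lines of Python. Everything here is hypothetical: the function names and the dict standing in for a warehouse are made up purely to illustrate the order of operations, not any real pipeline framework.

```python
# Minimal sketch of the ETL vs. ELT ordering. The "warehouse" is just a dict
# of tables; all names are illustrative, not a real library API.

def extract():
    # Pretend these rows came from an operational data store.
    return [{"name": " Alice ", "amount": "10"}, {"name": "Bob", "amount": "5"}]

def transform(rows):
    # Clean and type-cast the raw rows.
    return [{"name": r["name"].strip(), "amount": int(r["amount"])} for r in rows]

def etl(warehouse):
    rows = extract()
    warehouse["sales"] = transform(rows)  # transform BEFORE loading

def elt(warehouse):
    warehouse["raw_sales"] = extract()    # load the raw data first
    # ...then transform inside the warehouse itself
    warehouse["sales"] = transform(warehouse["raw_sales"])

etl_wh, elt_wh = {}, {}
etl(etl_wh)
elt(elt_wh)
print(etl_wh["sales"] == elt_wh["sales"])  # → True: same clean table either way
```

Both pipelines end with the same clean table; the ELT warehouse additionally keeps the raw data it loaded first, which is exactly the property the rest of this article leans on.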

Which is better? It depends

Which approach is better for your data pipeline? The unsurprising answer is that it depends on your business use cases, and you should consider these questions:

  1. What is the goal of your data warehouse? And what does your data storage infrastructure look like?
  2. What do your data sources look like? Do they give you clean, tabular data?
  3. Do you need to scale your data warehouse and pipeline? If so, what does your plan look like in the long term?
  4. Most importantly, how many resources and how much effort are you allocating, including budget and staff headcount?

To explain the difference better, let’s look at some business use cases and see why ETL or ELT is used in each.

Case 1: Enterprise Data Warehouse Solution

In the modern era, many large companies have adopted the Enterprise Data Warehouse solution. The data architecture leans closer to Inmon’s data warehouse approach, though it is not exactly the same: it gathers all data from the sources (such as operational data stores), transforms it, and loads the transformed data into the data warehouse. Business intelligence tools in the solution ecosystem receive data from the data warehouse either directly or indirectly (that is, the applications connect to OLAP cubes or data marts).

A typical ETL workflow for an Enterprise Data Warehouse Approach

There are a few aspects of this approach:

  1. The data is gathered from operational data stores, which only keep data in the short term, so the data must be moved to centralized storage
  2. The centralized data warehouse is the only place where you can find lifetime data; the architecture is top-down, meaning it prioritizes storing data rather than being shaped by specific business use cases
  3. Data transformation needs to be done before loading into the data warehouse
  4. Business intelligence (BI) tools are highly adopted in the enterprise infrastructure

One reason the data needs to be transformed before entering the data warehouse is that data gathered from the operational data stores does not necessarily follow the tabular format the data warehouse requires. Good examples are CRM exports and JSON files: those formats are not in a row-and-column structure, and it takes effort and resources to transform them into a format the data warehouse accepts. Another reason is the data model in the data warehouse: a denormalized data model is preferred there, so some effort is required to denormalize the data before loading it. This is why the data lake strategy works very well alongside the Enterprise Data Warehouse solution; the data lake is where unstructured data can be stored and transformed before it is loaded into the data warehouse.
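As a toy illustration of that transformation step, here is a Python sketch that flattens a nested JSON record into row-and-column records before any loading happens, i.e. the T sitting before the L in ETL. All field names are invented for the example, not taken from any real CRM.

```python
import json

# Hypothetical nested CRM record, as it might arrive in a JSON export.
raw = json.loads("""
{"customer": {"id": 7, "name": "Acme"},
 "orders": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]}
""")

# Flatten the nested document into the row-and-column records that a
# warehouse table expects: one row per order, customer fields repeated.
rows = [
    {"customer_id": raw["customer"]["id"],
     "customer_name": raw["customer"]["name"],
     "sku": order["sku"],
     "qty": order["qty"]}
    for order in raw["orders"]
]
print(rows)
```

Only after this flattening (and any denormalization like the repeated customer columns above) would the rows be loaded into the warehouse.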

Case 2: Utilizing 3rd Party BI tools

As data volume and business use cases grow, the Enterprise Data Warehouse model becomes hard to upkeep, and businesses utilize more 3rd party BI tools, to the point that companies might create new branches of data storage just for those tools. In my experience working at GoodData, I learned that GoodData aims to improve user experience and dashboard performance, so it requires users to build a sophisticated output stage according to a well-designed logical data model in the data warehouse (GoodData calls it a semantic layer). Therefore, most companies that utilize GoodData run a satellite data warehouse alongside their enterprise data warehouse infrastructure. A typical example: a company would utilize Vertica or Snowflake as the backend of GoodData; data from the enterprise data warehouse would be exported and loaded into source tables on Vertica/Snowflake, transformed via the staging stage, saved into the output stage tables, and exposed to GoodData dashboards. The data warehouse would therefore typically have 3 schemas to represent the source, staging, and output stages (Databricks calls these the bronze, silver, and gold layers). The data exported from the enterprise data warehouse is relatively clean and denormalized, so it requires only simple transformations in Vertica/Snowflake to match the logical data model, all of which can be handled with SQL commands. The ELT model therefore makes sense because the data can enter the data warehouse at an earlier stage.

Data Infrastructure when using 3rd Party BI tools

Note: Recent versions of GoodData do not require a standalone data warehouse just to support the backend of the dashboard. Instead, they allow the source, staging, and output tables to live in the data warehouse of your core data infrastructure.

Case 3: Simple Data Structure that Requires Minimal Data Transformation

Besides companies utilizing 3rd party BI tools, smaller companies may have simple data sets that do not require complex transformations. The ELT model may fit those companies’ data strategy requirements because they simply load data into a data warehouse and apply simple transformations from the staging stage to the output stage with SQL scripts, without any complex transformation tasks. This approach also allows a company to use only one data storage system (or a minimal number of them) to keep the data, especially when the company only receives data in a tabular format such as CSV files or Excel spreadsheets.

Similar to Case 2, the data warehouse would have roughly 3 schemas to represent the source, staging, and output stages. Another advantage of this approach is that data is preserved at every stage, so it is very easy to revert a change and retrieve the backup data.
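This ELT flow can be sketched with Python’s built-in sqlite3 as a stand-in warehouse. Table-name prefixes play the role of the three schemas here, and all table and column names are made up for the example.

```python
import sqlite3

# In-memory "warehouse"; table-name prefixes stand in for the three schemas.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE source_sales (region TEXT, amount INTEGER)")

# L of ELT: load the raw rows as-is, duplicates and all.
con.executemany("INSERT INTO source_sales VALUES (?, ?)",
                [("west", 10), ("west", 10), ("east", 5)])

# T: staging removes duplicates, output aggregates -- plain SQL, no external tool.
con.execute("CREATE TABLE staging_sales AS "
            "SELECT DISTINCT region, amount FROM source_sales")
con.execute("""CREATE TABLE output_sales AS
               SELECT region, SUM(amount) AS total
               FROM staging_sales GROUP BY region""")

print(con.execute("SELECT region, total FROM output_sales "
                  "ORDER BY region").fetchall())
# → [('east', 5), ('west', 10)]
```

Because each stage is kept as its own table, reverting a change is just re-running the SQL from an earlier table, which is the preservation advantage described above.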

The transformations in both Case 2 and Case 3 require only SQL, as they may involve nothing more than removing duplicates, aggregating, and other simple tasks. Because of that, dbt has become more popular: it is very easy to manage and maintain because it uses Jinja and SQL for data orchestration. You may learn more in my other article, Why dbt Makes Sense for ELT?

Conclusion

The major difference between ETL and ELT lies in whether the data is loaded into the data warehouse before or after transformation. Neither approach is superior to the other; it is a choice companies make to best fit their use cases, depending on data volume, data format, and available resources when building their data storage infrastructure. A typical large company will likely use the ETL approach because it gathers data in many formats and needs to transform the data before loading it into the data warehouse. A small company that only gathers data in a tabular format, or a company that utilizes a 3rd party BI tool, may choose the ELT approach to keep the pipeline simple and easy to maintain. At the end of the day, it really depends on the use case: customize the approach to the goals and needs of your data strategy and situation.



Jacques Sham

A data analyst at a digital consulting firm, a whisky lover, an aviation enthusiast, and a gamer. Concerned with how to use data science to answer questions.