ETL and How It Works

Nugroho
3 min read · May 19, 2024

[Cover image: “What is ETL and How ETL works?” by Nugroho]

Data is often called the new oil. Companies compete to collect, process, and use data to guide their business decisions.

Well, that’s where ETL comes in. In this story, I will explain what ETL is and how the ETL process works.

What is ETL and how does it work?

ETL stands for Extract, Transform, Load. It is a process used in data warehousing: data is extracted from various sources, transformed into a suitable and consistent format, and then loaded into a destination system, usually a data warehouse or data lake.

[Figure: ETL Process, by Nugroho]

Here are the three operations in ETL.

Extract

In this stage, data is extracted from various sources, including:

  1. Transactional systems such as RDBMSs (Relational Database Management Systems) like PostgreSQL and MySQL.
  2. Spreadsheets such as Microsoft Excel and Google Sheets, and flat files such as CSV (Comma-Separated Values), JSON (JavaScript Object Notation), XML (eXtensible Markup Language), and TXT (plain text).
  3. Cloud storage on platforms such as AWS (Amazon Web Services) and Azure.

The data is read from these sources and stored in a staging area.
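To make the extract stage concrete, here is a minimal sketch using only Python's standard library. Everything in it is an assumption for illustration: the sample data, the `customers` table, the function names, and the use of an in-memory SQLite database as a stand-in for an RDBMS such as PostgreSQL or MySQL. The staging area is simply a Python list.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample sources; in practice these would be files or live systems.
CSV_TEXT = "id,name,country\n1,Alice,United States\n2,Budi,Indonesia\n"
JSON_TEXT = '[{"id": 3, "name": "Chen", "country": "China"}]'


def extract_csv(text):
    """Read rows from CSV text as dictionaries (all values arrive as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Read rows from a JSON array of objects."""
    return json.loads(text)


def extract_sqlite(conn, query):
    """Read rows from a relational source (SQLite stands in for an RDBMS)."""
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(query)]


def build_demo_db():
    """Create an in-memory database that plays the role of a transactional system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
    conn.execute("INSERT INTO customers VALUES (4, 'Dewi', 'Indonesia')")
    return conn


# The staging area: raw rows from every source, tagged with where they came from.
staging = []
for source, rows in [
    ("csv", extract_csv(CSV_TEXT)),
    ("json", extract_json(JSON_TEXT)),
    ("rdbms", extract_sqlite(build_demo_db(), "SELECT id, name, country FROM customers")),
]:
    staging.extend({"source": source, **row} for row in rows)

print(len(staging))
```

Note that the rows are staged as-is: the CSV rows carry `id` as a string while the other sources carry it as an integer. Reconciling such differences is left to the transform stage.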

Transform

In this stage, the extracted data is transformed into a format suitable for loading into the data warehouse. This process involves data cleaning and validation operations such as:

  • Filtering (keeping only the data that meets certain criteria),
  • Cleaning (standardizing the writing format, for example changing “United States” to “US”),
  • Joining (combining data with similar characteristics into one),
  • Splitting (breaking data with different characteristics into two or more parts), and
  • Sorting (ordering data based on certain characteristics).

The purpose of these transformations is to ensure the quality and consistency of the data, so that it is useful for later analysis.
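The five operations can be sketched in a few lines of plain Python. This is only one possible reading of each operation, and the staged rows, the `segments` lookup, the `COUNTRY_CODES` mapping, and the `transform` function are all hypothetical names made up for the example. In particular, joining is shown here as attaching data from a second source by key, which is one common interpretation.

```python
# Hypothetical staged rows, as they might look right after extraction.
staged = [
    {"id": "2", "full_name": "Budi Santoso", "country": "Indonesia", "amount": "150.0"},
    {"id": "1", "full_name": "Alice Smith", "country": "United States", "amount": "200.5"},
    {"id": "3", "full_name": "Chen Wei", "country": "US", "amount": ""},
]

# Hypothetical lookup from a second source, used for the joining step.
segments = {"1": "retail", "2": "wholesale"}

# Hypothetical mapping used to standardize country names.
COUNTRY_CODES = {"United States": "US", "Indonesia": "ID"}


def transform(rows, segments):
    """Apply filtering, splitting, cleaning, joining, and sorting to staged rows."""
    result = []
    for row in rows:
        # Filtering: drop rows that fail validation (here, a missing amount).
        if not row["amount"]:
            continue
        # Splitting: break one field into two.
        first, _, last = row["full_name"].partition(" ")
        result.append({
            "id": int(row["id"]),
            "first_name": first,
            "last_name": last,
            # Cleaning: standardize the country format, e.g. "United States" -> "US".
            "country": COUNTRY_CODES.get(row["country"], row["country"]),
            "amount": float(row["amount"]),
            # Joining: attach data from another source using the id as the key.
            "segment": segments.get(row["id"], "unknown"),
        })
    # Sorting: order the output by id.
    return sorted(result, key=lambda r: r["id"])


clean = transform(staged, segments)
for row in clean:
    print(row)
```

The row with the empty amount is filtered out, and the remaining two rows come back typed, standardized, and ordered by `id`.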

Load

After the data has been transformed, it is loaded into the data warehouse, data lake, or another storage system, where it can later be used for analysis and reporting.
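As a rough illustration of the load stage, the sketch below writes transformed rows into a table and then runs a simple reporting query. It assumes an in-memory SQLite database as a stand-in for the warehouse; the `sales` table, its columns, and the `load` function are hypothetical.

```python
import sqlite3

# Hypothetical rows, as they might look after the transform stage.
rows = [
    {"id": 1, "name": "Alice", "country": "US", "amount": 200.5},
    {"id": 2, "name": "Budi", "country": "ID", "amount": 150.0},
]


def load(conn, rows):
    """Write transformed rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "id INTEGER PRIMARY KEY, name TEXT, country TEXT, amount REAL)"
    )
    # The connection as a context manager commits on success, rolls back on error.
    with conn:
        # INSERT OR REPLACE means re-running the load does not duplicate rows.
        conn.executemany(
            "INSERT OR REPLACE INTO sales (id, name, country, amount) "
            "VALUES (:id, :name, :country, :amount)",
            rows,
        )


# SQLite in memory stands in for a real data warehouse here.
warehouse = sqlite3.connect(":memory:")
load(warehouse, rows)
load(warehouse, rows)  # a second run overwrites the same two rows

# A simple reporting query against the loaded data.
total = warehouse.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone()
print(total)
```

Keying the table on `id` and using `INSERT OR REPLACE` is a design choice for this sketch: it lets a failed or repeated job be rerun safely, at the cost of overwriting earlier versions of a row.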

Some popular ETL tools are Talend, Pentaho, Informatica, and Oracle Data Integrator.

What are the advantages and disadvantages of ETL?

Like any approach, ETL has advantages and disadvantages in practice.

Advantages of using ETL

The ETL process provides a number of important benefits in data warehouse management.

One of them is improved data quality, which includes the accuracy, completeness, and currency of the stored information.

It also makes it easier to integrate data from various sources, and it automates the work of moving data, reducing the time and effort required to load and update data in the data warehouse.

ETL can also help maintain data security, for example by masking or removing sensitive fields during the transform stage, before the data reaches the warehouse.

Disadvantages of using ETL

Although the ETL process provides great benefits, there are also some drawbacks to consider.

One of them is the high cost of implementation and maintenance. In addition, implementation can be complex and challenging, especially if the organization lacks sufficient expertise and resources.

Traditional batch-oriented ETL is also limited in flexibility, which makes it difficult to handle unstructured data or real-time data.

In conclusion, the ETL process is a crucial stage in data warehouse management that helps ensure the accuracy, completeness, and currency of the data stored in the data warehouse or data lake.

However, you should also weigh the challenges and limitations of the ETL process before adopting it.

That’s all for this story, thanks for reading :).

Nugroho

Enthusiastic about data, machine learning, web scraping, Python, SQL & data viz. I also talk about money at www.cashnug.com