Do You Still Need Commercial ETL Tools?

Seckin Dinc
5 min read · Feb 21, 2024


Photo by Ma Joseph on Unsplash

In my recent articles, I have been trying to demystify the fundamentals of ETL, ELT, Reverse-ETL and Zero-ETL. I got quite a few questions along the lines of “Do You Still Need Commercial ETL Tools?”. If we start from the premise that there is no perfect solution that solves every problem in the market, we should focus on the use cases.

The data domain is saturated with ETL tools; there are probably more than 50 of them on the market. That is normal! ETL has been part of our lives for 20+ years. When a problem has been worked on for over 20 years and still isn't 100% solved, it is normal to see this many tools competing to solve it.

On the other side of the coin, over the last few years ETL vendors have started to rebrand their products as data movement or data integration tools. Some position themselves around B2B marketing integrations, some in the no-code space, and some around everything!

In this article, we will take a look at ETL tools and why they still exist in our lives (and will most likely continue to exist for the next 10+ years).

ETL Tools 2.0

Over the past decade, ETL (Extract, Transform, Load) tools have undergone significant evolution, driven by technological advancements and changing business needs. Some parts of this evolution have been completed successfully, and some are still under construction. Some were necessary to meet market needs, while others were developed only because a competitor had developed them.

Let’s take a look at where most of the investment has gone over the past decade:

Cloud Adoption

ETL tools have increasingly moved to the cloud, offering greater scalability, flexibility, and cost-effectiveness. Cloud-based ETL platforms like AWS Glue, Google Dataflow, and Azure Data Factory provide managed services for data integration, making it easier to handle large volumes of data.

Real-time Data Processing

There has been a shift towards real-time data processing and streaming analytics. Modern ETL tools are equipped to handle streaming data sources such as Apache Kafka, Amazon Kinesis, and Azure Event Hubs, enabling organizations to make timely decisions based on up-to-date information.

Integration with Data Lakes and Data Warehouses

ETL tools now seamlessly integrate with data lakes (e.g., Amazon S3, Azure Data Lake Storage) and data warehouses (e.g., Amazon Redshift, Google BigQuery) to facilitate data ingestion, transformation, and analytics. They offer connectors and plugins for various storage and analytics platforms, enabling interoperability across the data ecosystem.

Integration with SaaS Products

With the rise of SaaS solutions like Salesforce (CRM), Shopify (e-commerce), and others, businesses started leveraging these platforms for various aspects of their operations due to their flexibility, scalability, and ease of use. ETL tools now seamlessly integrate with these products through their APIs.

Self-Service Data Integration

There’s a rising demand for self-service data integration tools that empower business users to extract, transform, and load data without relying on IT departments. These tools typically offer intuitive interfaces and drag-and-drop functionality, enabling users to create data pipelines with minimal coding.

Containerization and Microservices

Containerization technologies like Docker and Kubernetes have influenced the design of ETL tools, enabling them to be deployed as microservices and orchestrated at scale. Containerized ETL pipelines offer portability, resource isolation, and easier management across hybrid and multi-cloud environments.

Do You Still Need Commercial ETL Tools?

ETL tools have been going through a massive rebranding over the last 3–5 years. They are trying to shift from pure ETL tools to data integration tools, moving beyond the classical ETL process to combine ETL, ELT, third-party data gathering and data management. Alongside these capability changes, every vendor in the market is putting massive marketing effort behind this direction.

Back to the initial question, “Do You Still Need Commercial ETL Tools?”: it depends on various factors:

  • Being a B2B vs B2C company
  • Company and operational size
  • Integration with 3rd party SaaS products
  • Data and Infrastructure Team size, maturity and motivation
  • Data Freshness requirements
  • Analytics, Data Science and AI data product requirements
  • Regulations in the countries where you operate, …

I think we could list 100 more factors that affect the decision-making process, and of course, the more we think about it, the more complex it gets. So it is better to prioritise the topics that are most critical for your organization. Here is my decision-making process:

  • If your data team is smaller than 3–5% of the company, then get an ETL Tool.
  • If your data team is not senior enough, then get an ETL Tool.
  • If your organization needs the latest 3rd party API integrations, then get an ETL Tool.
  • If your organization is a small or mid-size B2B company, then get an ETL Tool.
  • If data movement is not your company’s competitive advantage in the market, then get an ETL Tool.
  • If your organization has the budget to invest in tooling, then get an ETL Tool.
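The checklist above works like a simple decision procedure: if any single rule applies, the recommendation is to get an ETL tool. Purely as an illustration, it can be sketched as a small Python function. The `OrgProfile` fields and the `reasons_to_buy_etl_tool` function are hypothetical names, and the 3% default threshold is an assumption, since the first rule only gives a 3–5% range.

```python
from dataclasses import dataclass


@dataclass
class OrgProfile:
    """Hypothetical inputs for the checklist; field names are illustrative."""
    company_headcount: int
    data_team_headcount: int
    data_team_is_senior: bool
    needs_latest_third_party_integrations: bool
    is_small_or_mid_b2b: bool
    data_movement_is_competitive_advantage: bool
    has_tooling_budget: bool


def reasons_to_buy_etl_tool(org: OrgProfile, min_team_share: float = 0.03) -> list[str]:
    """Return every checklist rule that points towards buying an ETL tool.

    min_team_share is an assumption: the checklist gives a 3-5% range,
    so the threshold is configurable (default 3%).
    """
    if org.company_headcount <= 0:
        raise ValueError("company_headcount must be positive")

    reasons = []
    # Rule 1: data team is small relative to the whole company.
    if org.data_team_headcount / org.company_headcount < min_team_share:
        reasons.append("data team is small relative to the company")
    # Rule 2: data team lacks seniority.
    if not org.data_team_is_senior:
        reasons.append("data team is not senior enough")
    # Rule 3: the business depends on up-to-date third-party API connectors.
    if org.needs_latest_third_party_integrations:
        reasons.append("needs the latest third-party API integrations")
    # Rule 4: small or mid-size B2B company.
    if org.is_small_or_mid_b2b:
        reasons.append("small or mid-size B2B company")
    # Rule 5: data movement is not what sets the company apart.
    if not org.data_movement_is_competitive_advantage:
        reasons.append("data movement is not a competitive advantage")
    # Rule 6: there is budget available for tooling.
    if org.has_tooling_budget:
        reasons.append("budget available for tooling")
    return reasons


# Example: a 200-person B2B company with a 4-person, mostly junior data team.
profile = OrgProfile(
    company_headcount=200,
    data_team_headcount=4,
    data_team_is_senior=False,
    needs_latest_third_party_integrations=True,
    is_small_or_mid_b2b=True,
    data_movement_is_competitive_advantage=False,
    has_tooling_budget=True,
)
reasons = reasons_to_buy_etl_tool(profile)
print("Get an ETL tool" if reasons else "Building in-house may be viable")
for reason in reasons:
    print("-", reason)
```

Returning the list of matching rules, rather than a single yes/no, keeps the reasoning visible when the decision is discussed with stakeholders.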

Conclusion

Today, data teams are under high pressure to generate ROI. The C-suite no longer has the patience to fund custom ETL or streaming pipelines built with open-source tools just for the sake of using tools that data team members can add to their CVs.

Most companies miss their AI targets and fall behind the competition because they are still struggling with infrastructure-related problems. As data leaders, we should be more careful about where to invest and where to cut costs. To invest in AI and the future, we should stop spending our team’s time on a problem the market has been working on for 20+ years. Just get an ETL tool!

Thanks a lot for reading 🙏

If you are interested in Data Engineering, don’t forget to check out my new article series.
