Why are you still doing batch processing? “ETL is dead”

William Scott
Mar 26, 2019
ETL: Rest in Peace

About a year ago, a few colleagues suggested that I research Apache Kafka for an application I was designing. I watched the recording of the QCon 2016 talk “ETL is Dead; Long Live Streams”, in which Neha Narkhede (CTO of Confluent) describes replacing ETL batch data processing with messaging and microservices. It took some time for the paradigm to really sink in, but after designing and writing a data streaming system, I can say that I am a believer. I will describe the difference between ETL batch processing and a data streaming process.

Every company is still doing batch processing; it’s just a fact of life. A file of data is received and must be processed: it needs to be parsed, validated, cleansed, calculated, organized, and aggregated, then eventually delivered to some downstream system. Most companies use some sort of workflow tool, such as SSIS or Informatica, that can intuitively wrap these tasks into a rigid “package” that lives on a single server and executes on a schedule.
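To make that sequence concrete, here is a minimal Python sketch of such a batch job. It assumes a small CSV file of hypothetical customer orders; the data and the function names (`parse`, `validate`, `cleanse`, `aggregate`, `deliver`, `run_batch`) are illustrative only, and a real SSIS or Informatica package is built inside those tools rather than written like this. "Calculate" and "organize" are reduced here to converting amounts and sorting the output. What matters is the shape: each stage runs over the whole file before the next one starts.

```python
import csv
import io
from collections import defaultdict

# Hypothetical input "file": one order per line (customer, amount).
RAW_FILE = """customer,amount
alice,10.50
bob,not-a-number
alice,4.50
 carol ,7.00
"""


def parse(text):
    """Parse: turn the raw file contents into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def validate(rows):
    """Validate: keep only rows whose amount is a number."""
    valid = []
    for row in rows:
        try:
            float(row["amount"])
        except ValueError:
            continue  # reject the bad row
        valid.append(row)
    return valid


def cleanse(rows):
    """Cleanse and calculate: normalize names and convert amounts to floats."""
    return [
        {"customer": row["customer"].strip().lower(), "amount": float(row["amount"])}
        for row in rows
    ]


def aggregate(rows):
    """Aggregate: total amount per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


def deliver(totals):
    """Organize and deliver: sort by customer and print (stand-in for a downstream system)."""
    for customer in sorted(totals):
        print(f"{customer}: {totals[customer]:.2f}")


def run_batch(text):
    # Each stage processes the entire file before the next stage starts,
    # so nothing is delivered until every row has been handled.
    totals = aggregate(cleanse(validate(parse(text))))
    deliver(totals)
    return totals


if __name__ == "__main__":
    run_batch(RAW_FILE)
```

With a file of four rows the wait is invisible, but the downstream system still gets nothing until the last row has passed through every stage; as the file grows, so does that wait.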

Modern business demands are outgrowing the “batch processing” paradigm. Most data can’t wait for a “batch” to be collected: analysts want the data now, and downstream systems want an instant response. In some cases, batch data has grown so large that these ETL tools just can’t process it fast enough.
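For contrast, here is a minimal sketch of the streaming idea: each record is validated, cleansed, and aggregated the moment it arrives, and the updated result is emitted immediately instead of after a batch completes. It assumes a plain Python generator (`events`, hypothetical) stands in for a message topic; it does not use Kafka or any Kafka client API, and the record format and function names are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def events() -> Iterator[str]:
    """Hypothetical stand-in for a message topic: yields one raw record at a time."""
    yield "alice,10.50"
    yield "bob,not-a-number"
    yield "alice,4.50"
    yield " carol ,7.00"


def process_stream(records: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Validate, cleanse, and aggregate each record as soon as it arrives."""
    totals: Dict[str, float] = defaultdict(float)
    for raw in records:
        customer, _, amount_text = raw.partition(",")
        try:
            amount = float(amount_text)
        except ValueError:
            continue  # skip the invalid record; later records are not held up
        customer = customer.strip().lower()
        totals[customer] += amount
        # Emit the updated running total right away rather than waiting for a batch.
        yield customer, totals[customer]


if __name__ == "__main__":
    for customer, total in process_stream(events()):
        print(f"{customer}: {total:.2f}")
```

The running total for a customer is available after that customer's first valid record, so a downstream consumer can react without waiting for the rest of the data to arrive.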


William Scott

I am a technologist! I help companies find ways to improve their business with tech. @BillScottCoder https://www.linkedin.com/in/bill-scott-boston/