Zero ELT could be the death of the Modern Data Stack

Hugo Lu
May 8, 2023

Zero ELT is getting a fair bit of press at the moment despite the fact that, as data professionals, we probably don’t do a lot of it. In this article, we dive into some practical use-cases for Zero ELT so you can cut through the noise of modern / postmodern / hardcore etc. data stacks vs. Zero ELT.

What is Zero ELT anyway?

Zero ELT describes an approach, rather than a set of tools, that aims to eliminate “traditional” extraction, loading and transformation tools.

It’s worth noting that conceptually, this isn’t fundamentally different from what most data teams already do. Indeed, consolidating multiple parts of the data stack into a single tool would qualify under the zero ELT approach above. It would also mean “all-in-one” platforms are “zero ELT”-friendly. The fact is, though, that extraction, loading and transformation still happen; you just hopefully use fewer tools to do it.

I think that, as data engineers, we are all for simplifying the stack. The reason we like using tools specialised for a specific purpose is that they can get really, really good at it. As the data stack matures, we’ll see zero ELT tools specialising along use-case grounds rather than functional grounds. There is, therefore, a trade-off between increasing our stack complexity and having the best tool for the job. It’s also worth noting that the problems zero ELT tools solve can be solved by “traditional” / “Modern Data Stack” approaches too.

Use-cases for Zero ELT

The easiest way to identify use-cases is by looking at who is talking about them. Right now, that is mainly Amazon and Snowflake. In Amazon’s case, they announced an integration between Aurora and Redshift, where Aurora data is available in cleaned and aggregated form in Redshift automatically, thereby eliminating the need to ingest data from Aurora to Redshift and to transform it in Redshift using dbt.

Functionally, this is kinda exactly what Fivetran does. It essentially ensures database replication from one place to another, and can even run pre-configured dbt models automatically. See the Fivetran docs. It’s hard to see how Amazon’s integration is different, particularly given that Fivetran enables multiple connectors to multiple destinations. Aurora to Redshift is just 1:1, and no-one uses Redshift!

Taking a different approach

Another company that’s caught my eye is Jebra.io. They essentially allow you to write SQL transformations as you would with dbt, but on pre-configured sets of data, with the option to push the transformations you write to pre-configured destinations. It’s sort of like offering Fivetran + Snowflake + dbt + Hightouch in one.

I guess this is the vision

This is a super attractive idea insofar as it’s an awesome way, conceptually, to deal with simple end-to-end reverse ELT flows. Imagine you have usage by product in a database (Source A) and customer data in Salesforce (Source B). You can essentially write SQL, push this to Jebra, define the ingestion and reverse ELT, and suddenly get usage by customer in your CRM system. But wait: isn’t this what all-in-one platforms were?
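The SQL for a flow like this could look roughly as follows. This is only a minimal sketch using Python’s built-in sqlite3 module: the table names (usage_events, salesforce_accounts), the columns and the sample rows are all hypothetical, and nothing here reflects Jebra’s actual interface. The final result set stands in for what a reverse ELT step would push to the CRM.

```python
import sqlite3

# Hypothetical stand-ins for Source A (product database) and Source B (Salesforce).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE usage_events (customer_id TEXT, product TEXT, units INTEGER);
    CREATE TABLE salesforce_accounts (salesforce_id TEXT, customer_id TEXT, name TEXT);

    INSERT INTO usage_events VALUES
        ('c1', 'api', 120), ('c1', 'dashboard', 30), ('c2', 'api', 45);
    INSERT INTO salesforce_accounts VALUES
        ('001A', 'c1', 'Acme'), ('001B', 'c2', 'Globex'), ('001C', 'c3', 'Initech');
""")

# The transformation a user would write: usage by product rolled up to the customer.
# LEFT JOIN keeps accounts with no usage so the CRM field is still populated (as 0).
USAGE_BY_CUSTOMER_SQL = """
    SELECT a.salesforce_id, a.name, COALESCE(SUM(u.units), 0) AS total_units
    FROM salesforce_accounts AS a
    LEFT JOIN usage_events AS u ON u.customer_id = a.customer_id
    GROUP BY a.salesforce_id, a.name
    ORDER BY a.salesforce_id
"""

# Rows a reverse ELT connector would write back to Salesforce.
usage_by_customer = conn.execute(USAGE_BY_CUSTOMER_SQL).fetchall()
for row in usage_by_customer:
    print(row)
```

The join itself is trivial, which is the appeal. The open questions are everything around it: where the result lives, who else can query it, and how it behaves once the usage table is large.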

The downside is that this data is not usable anywhere else, and your queries are likely not scalable (it’s not clear how incrementality on massive datasets will be handled). It also doesn’t interface nicely with data contracts, semantic layering or the concept of having “data assets”. These are fairly important considerations for large enterprises.

Ok, so are Zero ELT tools just all-in-one providers?

I guess so? It feels like there are two use-cases that are clear for zero ELT:

  • Database replication with some out-of-the-box transformations
  • End-to-end data flows where reverse ELT is required, e.g. usage by customer

The pros:

  • Use fewer tools

The cons:

  • Don’t use the best tool for the job
  • Create data silos
  • Inability to operate with semantic layering / data assets / reuse logic
  • Stack complexity; zero ELT tools are mutually exclusive with other tools, so you will actually increase the complexity of your stack if you use one for a few use-cases

Zero ELT platforms are kinda like all-in-one platforms, and suffer from their shortcomings.

Right now, I think I know what we prefer! There is, however, some hope…

Zero ELT use-cases that we know work

At Codat there was an immense amount of work done in Salesforce. There were hundreds of calculated fields, and these were surfaced in Salesforce dashboards that were widely consumed and used by every commercial team. HubSpot served a similar purpose, and ingested thousands of datapoints on web activity, form fills, and meetings to calculate “marketing scores” for prospective clients. It was really valuable, and was basically zero ELT.

This was all done without any Modern Data Stack, and it is the kind of zero ELT that I believe could be really valuable.

For example, it’s possible to imagine a use-case-specific zero ELT tool that needs certain inputs (or combinations of them) to serve certain use-cases, with transformations provided out of the box. You could have an API with two ingest connectors:

  1. Salesforce
  2. Database e.g. SQL

And some predefined fields

  1. Salesforce: Salesforce ID
  2. Database: Salesforce ID
  3. Database: Customer ID
  4. Database: time column
  5. Database: usage column

And one reverse ELT connector, with some predefined destination fields:

  1. Salesforce (usage last week)

This would give you enough to auto-calculate usage per customer and surface it in Salesforce. This would be pretty cool because as long as the zero ELT tool supports the use-case or problem you’re trying to solve, and has the connectors, then that’s it: you solve the problem without having to do anything apart from specifying some basic configuration.
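To make the idea concrete, here is a minimal sketch of what such a configuration-only tool might do internally. Everything in it is hypothetical: the UsageSyncConfig class, the field names (including the Usage_Last_Week__c destination field) and the sample records are invented for illustration, and “last week” is assumed to mean the seven days before a given date.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class UsageSyncConfig:
    """Hypothetical field mapping; this is all the user would have to supply."""
    sf_id_field: str         # Salesforce: Salesforce ID
    db_sf_id_column: str     # Database: Salesforce ID
    db_customer_column: str  # Database: Customer ID (mapped, but unused in this sketch)
    db_time_column: str      # Database: time column
    db_usage_column: str     # Database: usage column
    destination_field: str   # Salesforce field that receives "usage last week"


def usage_last_week(config, sf_records, db_rows, today):
    """Sum usage per Salesforce ID over the 7 days before `today` (exclusive)."""
    window_start = today - timedelta(days=7)
    # Start every known account at 0 so accounts without usage still get a value.
    totals = {record[config.sf_id_field]: 0 for record in sf_records}
    for row in db_rows:
        sf_id = row[config.db_sf_id_column]
        in_window = window_start <= row[config.db_time_column] < today
        if sf_id in totals and in_window:
            totals[sf_id] += row[config.db_usage_column]
    # Shape of the payload a reverse ELT connector might send to Salesforce.
    return [
        {config.sf_id_field: sf_id, config.destination_field: total}
        for sf_id, total in totals.items()
    ]


config = UsageSyncConfig(
    sf_id_field="Id",
    db_sf_id_column="salesforce_id",
    db_customer_column="customer_id",
    db_time_column="event_date",
    db_usage_column="usage",
    destination_field="Usage_Last_Week__c",
)
sf_records = [{"Id": "001A"}, {"Id": "001B"}]
db_rows = [
    {"salesforce_id": "001A", "customer_id": "c1", "event_date": date(2023, 5, 3), "usage": 10},
    {"salesforce_id": "001A", "customer_id": "c1", "event_date": date(2023, 5, 7), "usage": 5},
    {"salesforce_id": "001A", "customer_id": "c1", "event_date": date(2023, 4, 20), "usage": 99},
    {"salesforce_id": "001B", "customer_id": "c2", "event_date": date(2023, 5, 8), "usage": 7},
]
payload = usage_last_week(config, sf_records, db_rows, today=date(2023, 5, 8))
print(payload)
```

The user never writes a transformation here; they only fill in the mapping. The flip side is that the meaning of “usage last week” is fixed by the tool rather than by the team using it.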

The question is — what use-cases can have commoditised transformations? Should commoditised transformations inform how we solve use-cases?

For example, suppose there is a transformation that accepts the following columns:

  1. Customer name
  2. Revenue field
  3. Revenue type (accepted values: subscription, usage, non-recurring)
  4. Date

And there is an out-of-the-box transformation that calculates MRR. What if a company looking to make use of this functionality wants to offset the date by a month? The transformation needs to be parameterised further. What if the company’s underlying data is not in this format? They need to do pre-transformations for this to work anyway, which surely defeats the point!
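A minimal sketch of such a commoditised transformation shows how quickly the parameters pile up. The function below is hypothetical, and it assumes one of many possible MRR definitions: only “subscription” revenue counts as recurring, and it is summed per customer per calendar month. The month_offset and recurring_types parameters are invented to illustrate the customisation problem.

```python
from collections import defaultdict
from datetime import date

ACCEPTED_REVENUE_TYPES = {"subscription", "usage", "non-recurring"}


def shift_month(year, month, offset):
    """Return (year, month) moved by `offset` calendar months."""
    index = year * 12 + (month - 1) + offset
    return index // 12, index % 12 + 1


def monthly_recurring_revenue(rows, month_offset=0, recurring_types=("subscription",)):
    """Sum recurring revenue per (customer, year, month).

    rows: iterable of (customer_name, revenue, revenue_type, date) tuples.
    month_offset: shifts the month each row is reported in.
    recurring_types: which revenue types count towards MRR (an assumption).
    """
    mrr = defaultdict(float)
    for customer, revenue, revenue_type, when in rows:
        # Reject anything outside the accepted values instead of guessing.
        if revenue_type not in ACCEPTED_REVENUE_TYPES:
            raise ValueError(f"unexpected revenue type: {revenue_type!r}")
        if revenue_type in recurring_types:
            year, month = shift_month(when.year, when.month, month_offset)
            mrr[(customer, year, month)] += revenue
    return dict(mrr)


rows = [
    ("Acme", 100.0, "subscription", date(2023, 12, 15)),
    ("Acme", 50.0, "usage", date(2023, 12, 20)),
    ("Acme", 25.0, "subscription", date(2023, 12, 28)),
]
default_mrr = monthly_recurring_revenue(rows)
offset_mrr = monthly_recurring_revenue(rows, month_offset=1)
print(default_mrr)
print(offset_mrr)
```

Two parameters already exist after a single customer request, and any company whose data is not already shaped as (customer, revenue, type, date) still has to reshape it before calling the function.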

Give me the use-case

Zero ELT doesn’t feel like “a thing” yet. The tools feel like “all-in-one” platforms and carry real costs because they can’t replace the rest of the data stack. Using them therefore entails more complexity and encourages data processes in silos.

If there are use-cases they can solve, however, this is attractive. If you can say “surface me revenue by customer in my CRM system” and it just does it, who wouldn’t want that? The question is: can it? Does anyone actually do that today? Can they add a GPT layer that just “works it out”? I guess we’ll just have to wait and see!

Hugo Lu

I write on Data engineering and the coolest data stuff. CEO@ Orchestra, the best-in-class data pipeline management platform. https://app.getorchestra.io/signup