Analytics Engineering

Ravikiran durbha
Business Intelligence and Analytics
Jul 30, 2021

What is the hype?

The evolution of the data warehouse from on-premise to cloud has ushered in plenty of new patterns. For example, in the on-premise days it was common to extract data from the source, apply transformations on a separate server (with dedicated compute), and then load the result into the data warehouse. This pattern is called “ETL” (Extract, Transform and Load), with the “T” firmly embedded in the center. There were many vendors in this space, such as Informatica, Microsoft and IBM.

Traditional ETL Pattern

With the availability of elastic cloud compute that scales with need, it became more cost-effective to move transformations to the cloud. There is little reason to invest in dedicated compute for transforming data when you can pay for it based on usage. This uprooted the “T” from the center and attached it to the end, giving rise to the “ELT” (Extract, Load and then Transform) pattern.

Modern approach to transformations in the cloud
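To make the ELT ordering concrete, here is a minimal sketch in Python. It assumes an in-memory SQLite database standing in for the cloud warehouse, and the table names, column names and sample rows are hypothetical. The point is only the sequence: raw data lands untouched first, and the transformation then runs as SQL inside the warehouse itself.

```python
import sqlite3

# Hypothetical raw rows, standing in for data extracted from a source system.
RAW_ORDERS = [
    ("1001", " alice ", "19.99"),
    ("1002", "BOB", "5.00"),
    ("1003", "alice", "7.50"),
]


def extract_and_load(conn, rows):
    """E and L: land the data as-is (all text) in a raw table."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform(conn):
    """T: clean and aggregate with SQL, using the warehouse's own compute."""
    conn.execute(
        """
        CREATE TABLE customer_totals AS
        SELECT LOWER(TRIM(customer))               AS customer,
               COUNT(*)                            AS order_count,
               ROUND(SUM(CAST(amount AS REAL)), 2) AS total_amount
        FROM raw_orders
        GROUP BY LOWER(TRIM(customer))
        """
    )


def run_elt():
    # SQLite is only a stand-in for a cloud data warehouse (an assumption).
    conn = sqlite3.connect(":memory:")
    extract_and_load(conn, RAW_ORDERS)
    transform(conn)
    return conn.execute(
        "SELECT customer, order_count, total_amount "
        "FROM customer_totals ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    for row in run_elt():
        print(row)
```

In an ETL setup, the cleaning done in `transform` would instead run on a separate server before anything reached the warehouse.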

This meant vendors could focus purely on cloud-native transformation software and leave moving data to legacy players (there are also cloud-native vendors, like Matillion, that both move and transform data). In the initial stages these transformations were written in Python, Scala and similar languages, but embedding transformation logic in esoteric code made it opaque to data analysts, who were very skilled in SQL but less so in other forms of coding. At the same time, there are well-established software engineering principles that can really help improve data quality, such as version control, continuous integration and deployment, automated testing and reusability. Data analysts are generally not trained in these practices and do not usually write SQL this way.

The solution was to bring these engineering principles to SQL coding, hence “Analytics Engineering”. A tool called dbt (data build tool) introduced this concept and excels at it. Apart from supporting the software engineering principles mentioned above, it also automates documentation and data lineage generation. The company behind it recently raised about $150 million in a Series C round at a $1.5 billion valuation. Clearly, investors feel the market is headed in this direction.
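As a rough illustration of what engineering principles applied to SQL can look like, the sketch below treats each transformation as a named SQL model with declared upstream dependencies. It derives a build order from those dependencies (a simple form of lineage) and then runs two automated data tests. This is not dbt's API: the `MODELS` dictionary, the helper functions and the sample data are all hypothetical, and SQLite again stands in for the warehouse.

```python
import sqlite3
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical models: name -> (upstream tables, SELECT statement).
# Declared out of order on purpose; the build order comes from the dependencies.
MODELS = {
    "customer_totals": (
        {"stg_orders"},
        "SELECT customer, SUM(amount) AS total_amount "
        "FROM stg_orders GROUP BY customer",
    ),
    "stg_orders": (
        {"raw_orders"},
        "SELECT order_id, LOWER(TRIM(customer)) AS customer, amount "
        "FROM raw_orders",
    ),
}


def build_order(models):
    """Lineage: order the models so each is built after its upstream models."""
    graph = {name: deps for name, (deps, _sql) in models.items()}
    ordered = TopologicalSorter(graph).static_order()
    return [name for name in ordered if name in models]  # skip raw sources


def build(conn, models):
    """Materialize every model as a table, in dependency order."""
    for name in build_order(models):
        conn.execute(f"CREATE TABLE {name} AS {models[name][1]}")


def failing_rows(conn, table, column, check):
    """Count values that violate a check; 0 means the test passes."""
    queries = {
        "not_null": f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL",
        "unique": (
            f"SELECT COUNT(*) FROM (SELECT {column} FROM {table} "
            f"GROUP BY {column} HAVING COUNT(*) > 1)"
        ),
    }
    return conn.execute(queries[check]).fetchone()[0]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [(1, " Alice", 10.0), (2, "alice", 5.0), (3, "Bob", 2.5)],
    )
    build(conn, MODELS)
    return {
        "order": build_order(MODELS),
        "totals_not_null": failing_rows(conn, "customer_totals", "customer", "not_null"),
        "totals_unique": failing_rows(conn, "customer_totals", "customer", "unique"),
        # Expected to fail: the staging table has one row per order, not per customer.
        "stg_unique": failing_rows(conn, "stg_orders", "customer", "unique"),
    }


if __name__ == "__main__":
    print(demo())
```

Because the models and tests are plain text, they can be kept under version control and run automatically on every change, which is the kind of workflow the paragraph above describes.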

In my view, this goes even further than just bringing software engineering principles to data. It democratizes the data landscape in an organization. It enables data analysts, who are closer to the business processes, to transform raw data and provide clean data sets for business analysis, and to do so with proper controls and well-established engineering principles. In the traditional pattern this work was done by specialized ETL engineers who knew the tools very well, but not necessarily the business.

We can go further still in democratizing data if we add an intuitive GUI layer on top of a tool like dbt for business analysts who know the business processes very well but are less comfortable with SQL. This way super users (or data analysts) can create robust models in SQL, and others can simply use the GUI layer to work with those models and build their own simple transformations.
