Understanding Reverse ETL
Reverse ETL promotes operational analytics, avoids data silos, and enables operational teams to make informed decisions without leaving their trusted tools.
Reverse ETL has become a standard pattern in the Modern Data Stack. Today, organizations and practitioners leverage reverse ETL tools like Census and Hightouch to sync analytics from the warehouse back to operational systems, bringing analytics back to where it matters most: operational decision-making.
This article dives deep into reverse ETL to understand why it is needed, how it works, and what value it delivers to the business.
Operational teams and data teams
In the beginning, there was no data warehouse.
Organizations had different operational teams: sales, marketing, product, and customer support all utilized operational systems for their daily business tasks. CRMs, support help desks, accounting packages, HR systems, and JIRA are examples of the systems employees and customers touch.
As organizations grew, leaders wanted to measure how the business was performing. That is where business intelligence (BI) tools entered the scene, with leaders demanding reports and KPI dashboards produced from operational data. For example, consider generating a sales report for the last week, which requires the BI tool to extract sales data from a CRM like Salesforce. Data teams owned BI and other analytics processes, separating those responsibilities from the operational teams.
The birth of ETL
Running BI against operational systems wasn't scalable, as it imposed a severe performance penalty on those systems. Therefore, people moved data from operational systems into a central place like a data warehouse, which allowed different BI tools to be plugged in and run analytics at scale.
That process required extracting raw data from operational systems (E), cleaning and transforming it (T), and loading it into the data warehouse (L). That paved the way for the ETL practice, which began in the 1970s.
Has the data warehouse solved the problem?
Today, organizations are in a far better place than the 60s and 70s when it comes to enterprise data management. Data teams comprising data scientists, analysts, and engineers leverage modern data tools to make informed decisions. The center of gravity of data platforms has shifted towards data warehouses like Snowflake, Google BigQuery, Amazon Redshift, and Azure Synapse.
But have all these modern marvels of data made life easier for operational teams? How are these modern data tools contributing to efficient and painless operational analytics? It seems that analytics has not been utilized to its full capacity.
Friction in data activation
Let’s take a realistic example for a better understanding.
Consider a lead scoring example for sales-qualified leads (SQLs). Typically, sales leads are synced into the data warehouse from a CRM tool like Salesforce. At the sales team's request, the data analyst generates and shares a report containing lead scores.
The report is hosted in a BI tool, so salespeople must log in before accessing it, which corporate security policies make tedious. You know the typical drill, right? Navigating to the BI tool, providing user credentials, answering the OTP, and so on.
Ultimately, that kind of friction often means nobody opens the report; it sits idle inside the BI tool, wasting the data analyst's time and effort. It also fails to close the analytics feedback loop: the analyst never receives feedback on their work.
Breaking data silos for operational analytics
This is the harsh reality we are facing today. The modern analytics infrastructure gives you more advanced analytics, yet it fails to walk the last mile of the data journey: data activation.
Analytics resting in the warehouse or BI tools creates data silos. Breaking them demands training operational teams to use BI tools, which is often a tough job and doesn't scale well in practice.
If the operations people don’t go to BI tools, we need to bring analytics to their tools of the trade. This is called operational analytics.
What if we synced the lead scores back to Salesforce, the familiar and trusted operational tool the sales team already uses? That would make the salesperson's life easier by showing the weighted lead score as a field in Salesforce.
That way, the salesperson never leaves the operational tool to access operational analytics, the data required to make informed decisions daily. Reverse ETL is a great enabler of that. Let’s find out.
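The lead-score sync described above can be sketched in a few lines. This is a minimal illustration, not a real connector: an in-memory SQLite table stands in for the warehouse, a plain dict stands in for the CRM, and the table, column, and field names (including the Salesforce-style custom field `Lead_Score__c`) are hypothetical.

```python
import sqlite3

# Stand-in for the data warehouse: an in-memory SQLite table holding
# the analyst's lead-score model. Names and values are illustrative.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE lead_scores (lead_id TEXT, score REAL)")
warehouse.executemany(
    "INSERT INTO lead_scores VALUES (?, ?)",
    [("00Q001", 0.92), ("00Q002", 0.41), ("00Q003", 0.78)],
)

# Stand-in for the CRM: a dict mapping lead IDs to record fields.
# A real sync would call the CRM's update API, one request per record
# (or batched), with retries and rate limiting.
crm_records = {lead_id: {} for lead_id in ("00Q001", "00Q002", "00Q003")}

def sync_lead_scores(conn, crm):
    """Copy each lead's score from the warehouse onto a CRM field."""
    for lead_id, score in conn.execute("SELECT lead_id, score FROM lead_scores"):
        crm[lead_id]["Lead_Score__c"] = score  # hypothetical custom field

sync_lead_scores(warehouse, crm_records)
print(crm_records["00Q001"]["Lead_Score__c"])  # 0.92
```

A production reverse ETL tool adds what this sketch omits: incremental (diff-based) syncs, field mapping UIs, and error handling, but the core data movement is exactly this read-from-warehouse, write-to-CRM loop.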
What is reverse ETL?
Hightouch, a leading vendor in the reverse ETL space, defines reverse ETL as:
Reverse ETL is the process of copying data from a central data warehouse to operational systems of record, including but not limited to SaaS tools used for growth, marketing, sales, and support.
Reverse ETL is the opposite of ETL, syncing analyzed data back to operational systems from the data warehouse. Reverse ETL improves data activation, enabling operational teams to make data-driven decisions without leaving their operational tools.
How does it work?
Reverse ETL is not another architectural paradigm that demands rearchitecting your data platform. Instead, it fits nicely with your existing data tools and practices.
As usual, the ETL/ELT tools sync operational data into the data warehouse, allowing the data team to work on requested operational analytics. For example, analysts and data scientists produce analytics like:
- The propensity to buy
- Personalized content/product recommendations
- Risk factor of churning a customer account
- The conversion rate and attribution for a marketing campaign
- Customer lifetime value (CLV)
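To make one of these concrete, customer lifetime value is often estimated with a simplified formula: average order value × purchase frequency × expected customer lifespan. The function below is a sketch of that textbook model (real models add churn probability and discounting); the numbers are made up for illustration.

```python
def customer_lifetime_value(avg_order_value, orders_per_year, lifespan_years):
    """Simplified CLV: average order value x purchase frequency x lifespan."""
    return avg_order_value * orders_per_year * lifespan_years

# A customer spending $50 per order, 6 orders a year, retained ~3 years:
clv = customer_lifetime_value(50.0, 6, 3)
print(clv)  # 900.0
```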
Typically, the data team uses a tool like dbt to transform raw data into high-fidelity analytics models, materialized inside the warehouse.
A reverse ETL tool then syncs these data models from the warehouse back to operational systems such as CRMs, support help desks, and marketing automation platforms (e.g., Salesforce, Zendesk, Marketo, HubSpot).
Why do we need reverse ETL?
As more organizations become data-driven, the need for reverse ETL grows, for two primary reasons.
Instill operational analytics culture across the organization: Having analytics stuck in the data warehouse or BI tools inhibits data-driven decision-making. Operational teams often have to bank on the data team to translate analytics into their operational domain. Reverse ETL tools break this barrier by bringing operational analytics back to operational systems, allowing operational teams to leverage analytics for daily decision-making.
Automate the data infrastructure: Without a reverse ETL tool, developers and data engineers would have to build pipelines and APIs that sync analytics back to disparate operational systems. A reverse ETL tool takes that pain away and automates the syncing from a single control plane.
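The "single control plane" idea can be illustrated with a toy dispatcher: instead of one bespoke pipeline per destination, each sync is a declarative entry saying which model goes where. Everything here (model names, destinations, the dict-based config) is hypothetical and stands in for a real tool's configuration.

```python
# Stand-in destinations; a real tool would use pre-built connectors.
def push_to_crm(rows):
    return [("crm", row) for row in rows]

def push_to_helpdesk(rows):
    return [("helpdesk", row) for row in rows]

# One control plane: each entry declares which model syncs to which system.
SYNCS = [
    {"model": "lead_scores", "destination": push_to_crm},
    {"model": "churn_risk", "destination": push_to_helpdesk},
]

# Stand-in warehouse query results, keyed by model name.
MODELS = {
    "lead_scores": [{"lead_id": "L1", "score": 0.9}],
    "churn_risk": [{"account_id": "A7", "risk": 0.3}],
}

def run_all_syncs():
    """Run every configured sync and collect what was delivered where."""
    delivered = []
    for sync in SYNCS:
        rows = MODELS[sync["model"]]
        delivered.extend(sync["destination"](rows))
    return delivered

print(run_all_syncs())
```

Adding a new destination becomes a one-line config change rather than a new hand-built pipeline, which is the automation benefit described above.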
Closing remarks: Reverse ETL — an enterprise integration in disguise?
As an experienced enterprise integration architect, my thought process always goes back to treating reverse ETL as a fancy term for integration middleware.
Back in the day, we used middleware like ESBs to avoid point-to-point integration among enterprise applications and databases. I think reverse ETL tools are walking the same path: in both practices, we have a source system and many different target systems, and data must move between them in a reliable and scalable manner. I see no distinction in terms of technical details.
You could’ve gotten the job done using either an ESB or a reverse ETL tool.
But, in my opinion, reverse ETL tools are purpose-built for the modern data stack, providing seamless integration with operational systems via pre-built connectors. They have also already started playing a critical role in many data platforms, making reverse ETL a standard pattern.