Reverse ETL: creating and making use of external integrations. START online streaming service: a case study
In order to meet the needs of a constantly changing business and provide personalised services to our subscribers, we have to work with large amounts of data and rely on the usage of reverse ETL approach. Let’s discuss this approach and try to figure out why it was implemented within our system and how it functions in conjunction with external integrations.
Choosing the solution
At one point, our primary business goal was to segment users into groups by certain criteria in order to prepare mailing lists. We used the reverse ETL (Extract, Transform and Load) approach as a way to provide our marketing team with relevant information about our customers. That way we could extract specific data about viewership and time preferences, devices being used, etc.
Overall, data integration helps combine and centralise data from different sources. As such, ETL represents a traditional data integration process. First, information is extracted from first-party databases and third-party or external sources such as SaaS tools for sales and marketing. Then, it is transformed to meet specific business needs, and loaded into a data warehouse (DWH) — a data management system that contains large amounts of archived information and is designed to support business intelligence activities. Popular DWH solutions are Clickhouse, Snowflake, and Google BigQuery; they allow to store, structure, organise and analyse data in a number of ways.
Reverse ETL works differently: you can use the same tools to move data out of the warehouse and into the relevant recording systems across the company — CRM, sales, and support systems, ensuring constant synchronisation. For instance, we use it for mailing segmentation and content recommendation systems — marketers can create or update records, as well as analyse them in order to better understand their parameters.
Companies can implement reverse ETL for several purposes, such as operational analysis, data automation and infrastructure management. It can help teams from sales, marketing, product, and finance areas make deliberate data-driven decisions, such as whether or not to extract data from the warehouse and sync in with external tools.
Integration with external services
Microservices, also known as microservice architecture, is a cloud-based system design approach. Under this approach, a single application is composed from several smaller, not necessarily related components. Microservices used at our company cover three aspects — platform users, streaming content, and the billing system.
Microservices deal with user data, such as consent to receive marketing communications, user emails, SMS messages, etc. An ETL tool is essentially a service that enters a database, transforms relevant data and moves it to another database. For instance, we perform the following steps in order to integrate our DWH with external services:
● We collect all the data from a DWH (Clickhouse, in our case) with the use of CDC (Change Data Capture);
● We create a Data Mart connected to the DWH that has access to information from different services;
● Finally, we provide all the relevant information to the email sending system.
Basically, you can implement any kind of ETL tool to work with data. We developed our own solution for the “Extract and Load” part of the process. We use Apache Airflow as a data orchestration system, which helps automate data management processes by centralising relevant information from multiple sources and preparing it for data analysis. It is worth mentioning that any destination with an API (Application Programming Interface) can be used.
Current status
Reverse ETL helps analyse DWH and obtain relevant data points from it, such as statistics on views, events, devices, user IPs, etc., and then send it over. Every 15 minutes, bit by bit, we collect information and send it to the marketing department.
#STARTteam, #data, #dataengineering, #START