Data Engineering for Marketing: Data Lake, Data Warehouse, Data Lakehouse… Everything you need to know to organize your company’s marketing data — Data Engineering Series (Part IV)

DP6 Team
DP6 US
Published in
5 min readJan 4, 2022

With the evolution of digital marketing, we have seen an increased diversity of tools, and a significant growth in the volume of data generated by AdTech (Advertising Technology) and MarTech (Marketing Technology) platforms. As a result, simple solutions like Excel are no longer capable of serving the new ecosystem of data. More is needed to store and update this data on a constant basis, in an agile and assertive way. We have many things to take into account: LGPD/GDPR; the constant update rate in data and formats; treatment of conversion windows; storage of allocation flows; exponential growth in data volume; and the mitigation of data silos. So, how do we manage it?

What are Data Warehouses and Data Lakes (and their variations)?

A Data Warehouse (DW) for Marketing is a place where all the data necessary for the operation of marketing and media is structured and organized. It complements the analytics solutions (e.g. Google Analytics) with views that are beyond its scope, such as data from media vehicles, data in spreadsheets (e.g. media plans), and data from other systems (CRM, product transactional base etc). The main DW solution for Marketing that we work with is Google BigQuery (BQ), which is part of the Google Cloud Platform (GCP), but we maintain an independent view when looking at other vendors (such as AWS and Microsoft Azure). However, a DW is much more of a concept than a plug and play solution, so we normally build a DW according to business needs.

There are other common concepts related to DW, of which the most important is the Data Lake. The primary purpose of a Data Lake is to centralize the data, unlike the DW, which must be aligned with the actual use of the data in relation to business rules. These solutions are complementary and in some cases the line that divides them is not so clear. They may end up being compositional concepts, not just a choice of one or the other.

Source: DataCamp

This architecture brings a series of benefits to the marketing area and to the business:

  • Assistance in making the right decisions.
  • Consolidated data from multiple sources.
  • Historical data analysis.
  • Quality, consistency and accuracy of data.
  • Flexibility in the use of data for Machine Learning and media activation.
  • Well-defined data lifecycle and orchestration.

Implementation of the Data Warehouse concept

The construction of DW solutions for marketing is complex and requires adequate planning, so that projects achieve their goals. You need to identify all essential data sources, define the frequency of updates, determine the necessary treatments of the data so that they meet business needs, define a data management methodology etc. To deliver successful results, it will be necessary to have highly qualified professionals with different profiles, such as:

Business Analyst — BA: This is the person who is responsible for understanding the customer’s problems and objectives, monitoring their market, raising project requirements, analyzing information from different sources, and preparing reports. In addition, they provide recommendations for optimizing websites and campaigns in a clear and simple way.

Data Engineer — DE: This key member of the team is responsible for gaining a deep understanding of the customer’s situation, mapping their needs and all possible data sources, in order to plan consistent architecture that will take care of the application of best practices for the DW, processing and building databases, and making data available for analysis by MarTech and AdTech tools. They will also be responsible for maintaining the data lifecycle, keeping the DW documented, and ensuring that it is accessible to everyone in the company.

Common architecture of a data warehouse system

Robust Data Warehouse architecture contains a series of technologies, which can be based on Cloud, On-Premise (own server), or a hybrid of various models and providers. The identification of the variety of data sources is another key factor for a successful model, as many media platforms still do not have automated systems for data extraction. It is necessary to use the most diverse data engineering resources to perform consistent extraction on this data.

For this, you must implement what we call ETL:

Extract: Techniques for extracting data, from connecting to an API, to loading an Excel file into the data flow.

Transform: Alignment and cleaning of data according to business rules.

Load: Save the data in the best database architecture for the business.

The entire ETL flow must be monitored by a data quality layer that will ensure the accuracy, assertiveness and veracity of the data.

Consuming a Data Warehouse

Once the data is stored and processed, it is essential that it is used in the processes of the marketing area, via automated reports and dashboards, exploratory analysis, audience segmentation, and the use of machine learning to extract the full potential of DW.

Source: Google Data Studio Gallery

Data only generates value “in motion”, so it is critical to have easily accessible and coherent data available to the company’s operation.

Innovations and Results Achieved

In addition to all the care taken while creating and implementing the DW, it is necessary to commit to the evolution, innovation and quality of the data area. To contribute to the market, DP6 has a repository of open source projects that aim to help the market in general to develop more robust analytics solutions, as we believe that sharing knowledge is the best way to ensure the healthy evolution of services and tools.

Let’s talk about two solutions

  • Raft Suite Hub: This is an ecosystem created by DP6 to ensure Data Quality on data engineering projects implemented with DP6 customers, using automated data monitoring.
  • Marketing Data Sync: This is a solution for integrating DW data with media and analytics tools, such as Google Analytics 360, Google Ads, Facebook Business Center and Google Campaign Manager.

DP6 works in a consultative manner with the integration of data in the customer’s Data Lake or Warehouse (in house), or in Cloud solutions and marketing suites such as Adobe, Salesforce, Oracle and Google. The aim is to develop the collection, automation and integration of data for the unique identification of your customers, combining data from CRM, media, web analytics, applications, email, SMS, relationship rules and other touch points.

Profile of the Author: Robson Oliveira | Graduated in Information Systems from São Judas Tadeu University, he is a Data Engineer at DP6 who is passionate about Data Analysis and Business Intelligence and is always looking for the best way to extract insights from the data around him.

Profile of the author: Bianca Santos | Interested in the wide world of technology, she has been working at DP6 for years on the implementation, management and maintenance of a variety of technical solutions aimed at improving the entire data cycle, such as collecting, structuring, crossing, and converting data to any format necessary to extract the intelligence contained within it.

--

--