Photo by George Prentzas on Unsplash

Summary

This article illustrates how we modernized a client’s data platform by implementing an entity resolution pipeline. Using Google Cloud Dataflow, we tied disparate data sets together and performed customer matching to create a Golden Customer Record. The solution enabled the client’s marketing and analytics teams to derive insights about their customers, make better-informed decisions for marketing campaigns, and explore new ways to improve customer experience and retention.

Preface

What is Entity Resolution?

Entity resolution (ER), also known as record linkage or data matching, is the process of disambiguating (identifying, matching, and merging) different manifestations of the same real-world entity across disparate data sources.
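
To make the identify, match, and merge steps concrete, here is a minimal sketch in plain Python. It is not the Dataflow pipeline described in this article. The record fields (`name`, `email`, `phone`), the function names, the 0.85 similarity threshold, and the greedy single-pass strategy are all assumptions chosen for illustration, and the string similarity comes from the standard library's `difflib.SequenceMatcher` rather than from any dedicated matching library.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Lowercase and collapse whitespace in the fields used for comparison."""
    return {
        "name": " ".join(record.get("name", "").lower().split()),
        "email": record.get("email", "").strip().lower(),
    }


def is_match(a, b, threshold=0.85):
    """Assumed rule: same email, or sufficiently similar names."""
    na, nb = normalize(a), normalize(b)
    if na["email"] and na["email"] == nb["email"]:
        return True
    if not na["name"] or not nb["name"]:
        return False
    return SequenceMatcher(None, na["name"], nb["name"]).ratio() >= threshold


def merge(a, b):
    """Combine two records, preferring non-empty values from `a`."""
    merged = dict(b)
    merged.update({key: value for key, value in a.items() if value})
    return merged


def resolve(records):
    """Greedy single pass: fold each record into the first golden record it matches."""
    golden = []
    for rec in records:
        for i, existing in enumerate(golden):
            if is_match(existing, rec):
                golden[i] = merge(existing, rec)
                break
        else:
            # No existing golden record matched, so this is a new entity.
            golden.append(dict(rec))
    return golden


# Hypothetical records from two sources describing overlapping customers.
crm = {"name": "Jane Doe", "email": "jane.doe@example.com", "phone": ""}
web = {"name": "jane  doe", "email": "JANE.DOE@example.com", "phone": "555-0100"}
other = {"name": "John Smith", "email": "john@example.com", "phone": ""}

golden_records = resolve([crm, web, other])
```

Comparing every incoming record against every golden record is quadratic in the worst case, so a sketch like this only suits small inputs. Larger data sets typically need some way to limit which pairs are compared.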

For instance, a…


Summary

In this article, we introduce a modern data architecture paradigm known as the Data Lakehouse. The Data Lakehouse provides various advantages over traditional data lakes. We illustrate how the challenges of scalability, data quality, and latency faced by a client were addressed by modernizing their data platform and incorporating a lakehouse into their architecture. By the end of this article, you will be equipped with the fundamental knowledge to implement a Data Lakehouse using Apache Hudi.

Preface

With the evolution of IoT, cloud applications, social media, and machine learning in the past decade, the volume of data being collected by companies…
