CDP part 1: introduction to end-to-end data lakehouse architecture with CDP

Adaltas
10 min read · Jun 19, 2023


Cloudera Data Platform (CDP) is a hybrid data platform for big data transformation, machine learning, and data analytics. In this series, we describe how to build and use an end-to-end big data architecture with CDP Public Cloud on Amazon Web Services (AWS).

Our architecture is designed to retrieve data from an API, store it in a data lake, move it to a data warehouse, and finally serve it to analytics end users through a data visualization application.
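To make the four stages concrete, here is a minimal local sketch of the same flow. It does not use any CDP service: the API call is replaced by made-up sample records, the data lake by a JSON Lines file in a temporary directory, and the data warehouse by an in-memory SQLite database. All function names, the `prices` table, and the sample symbols are hypothetical and serve only to illustrate the order of the steps.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def fetch_from_api():
    """Stand-in for the API call; returns made-up sample records."""
    return [
        {"symbol": "AAA", "day": "2023-01-02", "close": 10.0},
        {"symbol": "AAA", "day": "2023-01-03", "close": 12.0},
        {"symbol": "BBB", "day": "2023-01-02", "close": 20.0},
    ]


def store_in_lake(records, lake_dir):
    """Land raw records unchanged as a JSON Lines file (the 'data lake')."""
    path = Path(lake_dir) / "raw_prices.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return path


def load_into_warehouse(lake_file, conn):
    """Copy the raw file into a typed table (the 'data warehouse')."""
    conn.execute("CREATE TABLE prices (symbol TEXT, day TEXT, close REAL)")
    with Path(lake_file).open(encoding="utf-8") as f:
        rows = [json.loads(line) for line in f]
    conn.executemany("INSERT INTO prices VALUES (:symbol, :day, :close)", rows)
    conn.commit()


def serve_for_visualization(conn):
    """Query a dashboard could issue: average closing price per symbol."""
    cursor = conn.execute(
        "SELECT symbol, AVG(close) FROM prices GROUP BY symbol ORDER BY symbol"
    )
    return cursor.fetchall()


def run_pipeline():
    """Run the four stages in order and return the rows to visualize."""
    with tempfile.TemporaryDirectory() as lake_dir:
        lake_file = store_in_lake(fetch_from_api(), lake_dir)
        conn = sqlite3.connect(":memory:")
        try:
            load_into_warehouse(lake_file, conn)
            return serve_for_visualization(conn)
        finally:
            conn.close()


if __name__ == "__main__":
    print(run_pipeline())
```

Keeping the raw file separate from the typed table mirrors the lake-then-warehouse split described above: the raw landing zone stays untouched, while the warehouse table carries the schema that the visualization layer queries.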

This series includes the following six articles:

  • CDP part 1: introduction to end-to-end data lakehouse architecture with CDP
  • CDP part 2: CDP Public Cloud deployment on AWS
  • CDP part 3: Data Services activation on CDP Public Cloud environment
  • CDP part 4: user management on CDP Public Cloud with Keycloak
  • CDP part 5: user permission management on CDP Public Cloud
  • CDP part 6: end-to-end data lakehouse use case with CDP

Architectural considerations

The purpose of our architecture is to support a data pipeline that allows the analysis of variations in the stock price of multiple companies. We are going to retrieve data, ingest it into…

Adaltas

Open Source consulting - Big Data, Data Science, Node.js