CDP part 3: Data Services activation on CDP Public Cloud environment
15 min read · Jun 29, 2023
One of the big selling points of Cloudera Data Platform (CDP) is its mature offering of managed services. These services are easy to deploy on-premises, in the public cloud, or as part of a hybrid solution.
The end-to-end architecture we introduced in the first article of our series makes heavy use of some of these services:
- DataFlow is powered by Apache NiFi and lets us move data from a wide range of sources to an equally wide range of destinations. We use DataFlow to ingest data from an API and transport it to our Data Lake hosted on AWS S3.
- Data Engineering builds on Apache Spark and offers powerful features to streamline and operationalize data pipelines. In our architecture, the Data Engineering service runs the Spark jobs that transform our data and load the results into our analytical data store, the Data Warehouse.
- Data Warehouse is a self-service analytics solution that gives business users access to vast amounts of data. It supports Apache Iceberg, a modern open table format, which we use to store both ingested and transformed data. Finally, we serve our data through the Data Visualization feature built into the Data Warehouse service.
This article is the third in a series of six: