Streaming Change Events Building Iceberg(s)
CDC, Debezium, GoldenGate, Change Data Capture, Real-Time Events, Apache Iceberg, Apache NiFi, Cloudera DataFlow, Cloudera Data Platform
Source: https://github.com/tspannhw/FLaNK-Ice/tree/main
Apache Iceberg is a high-performance format for huge analytic tables, ideal for building Open Data Lakehouses. The Cloudera Apache NiFi PutIcebergCDC processor applies CDC (Change Data Capture) operations to Iceberg tables using the Hive Iceberg catalog.
Supported operation types
- c (Debezium create) or I (GoldenGate insert): the record specified in the after field is inserted into Iceberg.
- d (Debezium delete) or D (GoldenGate delete): the record specified in the before field is deleted from Iceberg.
- u (Debezium update) or U (GoldenGate update): the record specified in the before field is replaced with the new content specified in the after field.
- r (Debezium read): read records are handled as create records, so they are inserted into Iceberg.
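To make the mapping above concrete, here is a minimal Python sketch of how one change event could be classified. It is not the PutIcebergCDC implementation; the function name `plan_change`, the action labels, and the flattened event shape are all assumptions for illustration. Debezium puts the operation code in the `op` field; the `op_type` name used for GoldenGate here is an assumption about the JSON formatter's output, so check it against your own events.

```python
import json

# Operation codes from the list above; the set names are illustrative only.
INSERT_OPS = {"c", "r", "I"}  # Debezium create/read, GoldenGate insert
DELETE_OPS = {"d", "D"}       # Debezium delete, GoldenGate delete
UPDATE_OPS = {"u", "U"}       # Debezium update, GoldenGate update


def plan_change(event, op_field="op"):
    """Return (action, old_row, new_row) for one flattened change event.

    op_field is an assumption: "op" for Debezium, often "op_type" for
    GoldenGate JSON output.
    """
    op = event.get(op_field)
    if op in INSERT_OPS:
        # Inserts (and snapshot reads) use only the "after" image.
        return ("insert", None, event["after"])
    if op in DELETE_OPS:
        # Deletes use only the "before" image.
        return ("delete", event["before"], None)
    if op in UPDATE_OPS:
        # Updates replace the "before" image with the "after" image.
        return ("replace", event["before"], event["after"])
    raise ValueError(f"unsupported operation type: {op!r}")


# Hypothetical Debezium-style update event, already unwrapped from its envelope.
sample = json.loads(
    '{"op": "u", "before": {"id": 1, "name": "old"}, '
    '"after": {"id": 1, "name": "new"}}'
)
print(plan_change(sample))
```

A helper like this is handy for checking events in a test harness before they reach NiFi, since a missing before or after field shows up as a `KeyError` right away.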
Let’s start landing data in our Open Data Lakehouse. This is critical for many use cases and for powering our Generative AI applications.
NiFi to Iceberg
Depending on where your table is stored, you may need to set some permissions.
Here is an example:
To give a user write access to an Iceberg table, we need to do two things:
- Create a Ranger policy that grants write access to the table object.
- Create a Ranger policy that grants “RW Storage” access to the storage URL.
For the storage policy, set the following:
Storage type: iceberg
Storage URL: DBname/Table*, or
Storage URL: DBname/*
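The difference between the two Storage URL patterns is scope: the first covers tables whose names start with a given prefix, the second covers everything in the database. The sketch below illustrates that difference using Python's glob-style `fnmatchcase`. This is only an approximation of wildcard behavior, not Ranger's own matcher, and the table names and the `covers` helper are hypothetical.

```python
from fnmatch import fnmatchcase


def covers(pattern, storage_url):
    """Glob-style check of whether a pattern matches a storage URL.

    Assumption: '*' behaves like a shell wildcard; Ranger's real
    matching rules may differ in details.
    """
    return fnmatchcase(storage_url, pattern)


# Hypothetical storage URLs in the same DBname/Table form used above.
paths = ["DBname/Table", "DBname/Table_v2", "DBname/Other"]

for pattern in ("DBname/Table*", "DBname/*"):
    matched = [p for p in paths if covers(pattern, p)]
    print(pattern, "->", matched)
```

Under this approximation, `DBname/Table*` leaves `DBname/Other` uncovered while `DBname/*` covers all three, so prefer the narrower pattern unless the user really needs every table in the database.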
Cloudera Data Platform Group Rules
DataCatalogCspRuleViewer
DFCatalogAdmin
DFCatalogViewer
DFFunctionMachineUser
EnvironmentUser
RESOURCES
https://www.youtube.com/watch?v=R2T6_eOnV8Y
https://www.youtube.com/watch?v=Q9Cys_N4iQQ
https://www.youtube.com/watch?v=aPSG8hmzbmc