Ingest From Iceberg Tables with Cloudera DataFlow

Tim Spann
Published in Cloudera
May 5, 2023

See: https://github.com/tspannhw/FLaNK-DataFlows/blob/main/jdbc/README.md

Reading from Apache Iceberg Tables with Cloudera DataFlow

Add a processor to your flow canvas to read the table, for example ExecuteSQLRecord 1.20.0.2.3.8.1-1, and name it ExecuteSQLRecord Impala. You can use any processor that uses a JDBC connection, such as:

  • ExecuteSQL
  • ExecuteSQLRecord
  • QueryDatabaseTable
  • QueryDatabaseTableRecord

Processor Settings

  • Normalize Table/Column Names: true
  • Use Avro Logical Types: true
  • Query:
SELECT * FROM `default`.tim_syslog_critical_archive

Set all your parameters in the processor.
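
To make the Normalize Table/Column Names setting more concrete, here is a rough sketch of the kind of renaming it performs so that column names are valid in an Avro record. It assumes the processor replaces every character other than letters, digits, and underscores with an underscore, and prefixes names that start with a digit. The function name `normalize_name_for_avro` and the sample column names are hypothetical, and the real processor may differ in detail.

```python
import re


def normalize_name_for_avro(name: str) -> str:
    """Approximate the column-name normalization (an assumption, not NiFi's code)."""
    # Replace every character that is not a letter, digit, or underscore.
    normalized = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # Avro names may not start with a digit, so prefix an underscore.
    if normalized and normalized[0].isdigit():
        normalized = "_" + normalized
    return normalized


# Hypothetical column names, such as a query with expressions might return.
for column in ["tim_syslog.priority", "count(*)", "2xx_total"]:
    print(column, "->", normalize_name_for_avro(column))
```

Names that are already plain identifiers pass through unchanged, so the setting only matters when a query returns expression or dotted column names.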

Services Settings

First, add a connection pool service (DBCPConnectionPool) for your processor.

Then set the following parameters on the service.

  • Service Name: DBCPConnectionPool Impala Iceberg
  • Database Connection URL:
jdbc:impala://oss-kudu-demo-gateway.oss-demo.qsm5-opic.cloudera.site:443/;ssl=1;transportMode=http;httpPath=oss-kudu-demo/cdp-proxy-api/impala;AuthMech=3;
  • Database Driver Class Name:
com.cloudera.impala.jdbc.Driver
  • Database Driver Location(s):
#{Database Driver Location}

Set this parameter, then upload the driver JAR.

  • Database User:
#{CDP Workload Username}
  • Password:
#{CDP Workload User Password}
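
The Database Connection URL above packs several driver settings into one string. The sketch below rebuilds that same URL from its parts to show what each piece does. The helper name `build_impala_jdbc_url` is hypothetical, and the host and path are simply the demo values from this post, so substitute your own.

```python
def build_impala_jdbc_url(host: str, port: int, http_path: str) -> str:
    """Assemble an Impala JDBC URL in the shape used by the connection pool above."""
    properties = {
        "ssl": "1",               # encrypt the connection with TLS
        "transportMode": "http",  # use HTTP transport instead of the binary protocol
        "httpPath": http_path,    # gateway path that proxies requests to Impala
        "AuthMech": "3",          # authenticate with user name and password
    }
    # The settings follow the slash as semicolon-terminated key=value pairs.
    settings = "".join(f"{key}={value};" for key, value in properties.items())
    return f"jdbc:impala://{host}:{port}/;{settings}"


url = build_impala_jdbc_url(
    host="oss-kudu-demo-gateway.oss-demo.qsm5-opic.cloudera.site",
    port=443,
    http_path="oss-kudu-demo/cdp-proxy-api/impala",
)
print(url)
```

Because AuthMech=3 means user name and password authentication, the Database User and Password fields must be set, here through the workload credential parameters.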

Detailed Parameters

References

https://docs.cloudera.com/cdw-runtime/1.5.0/iceberg-how-to/topics/iceberg-data-types.html

Tim Spann
Cloudera

Principal Developer Advocate, Zilliz. Milvus, Attu, Towhee, GenAI, Big Data, IoT, Deep Learning, Streaming, Machine Learning. https://www.datainmotion.dev/