Announcing CDAP 6.2.0 Release

Edwin Elia
Published in cdapio · 2 min read · Jun 1, 2020

On behalf of the CDAP community, it is my pleasure to announce the release of CDAP version 6.2.0. This release introduces Replication, an easy way to replicate changes from transactional databases into analytical data warehouses. It also enhances the Google Cloud Dataproc runtime provisioner to use Google Cloud Dataproc's native job APIs. Additionally, it includes a few improvements to the Pipeline Studio that enhance the user experience of building pipelines.

Replication

Replication allows users to create replication pipelines easily. The user interface guides users through configuring the source database and then selecting the tables and columns to replicate. Once users have finished adding the target configuration, the system runs an assessment of the configuration to determine whether any potential issues need to be addressed before the pipeline is deployed. The assessment also reports possible issues during replication, including data type mappings between the source and target databases.

Select tables and columns to replicate
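To make the assessment idea concrete, here is a minimal sketch of a data type mapping check over the selected tables and columns. It is not CDAP's implementation: the `TYPE_MAPPING` table, the `Issue` class, the `assess` function, and the sample `orders` table are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical mapping from source database types to target warehouse types.
# Illustrative only; this is not CDAP's actual mapping table.
TYPE_MAPPING = {
    "INT": "INT64",
    "BIGINT": "INT64",
    "VARCHAR": "STRING",
    "DATETIME": "TIMESTAMP",
}


@dataclass
class Issue:
    table: str
    column: str
    message: str


def assess(tables):
    """Return one Issue per column whose source type has no known target type."""
    issues = []
    for table, columns in tables.items():
        for column, source_type in columns.items():
            if source_type.upper() not in TYPE_MAPPING:
                issues.append(
                    Issue(table, column, f"no target mapping for source type {source_type}")
                )
    return issues


# Hypothetical selection of tables and columns to replicate.
selected = {
    "orders": {"id": "BIGINT", "placed_at": "DATETIME", "shape": "GEOMETRY"},
}

issues = assess(selected)
for issue in issues:
    print(f"{issue.table}.{issue.column}: {issue.message}")
```

In this sketch, a column with an unmapped type is reported before deployment rather than failing during replication, which is the purpose the assessment step serves.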

Google Cloud Dataproc Runtime Improvement

Previously, the Google Cloud Dataproc runtime used SSH for job submission, which required port 22 to be open for the environment running CDAP. With this improvement, job submission uses the native Google Cloud Dataproc APIs, so port 22 no longer needs to be open.
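For readers unfamiliar with the Dataproc job APIs, the sketch below builds the URL and JSON body for the public Dataproc `jobs:submit` REST method without sending anything. It only illustrates that submission is an HTTPS API call rather than an SSH session. It is not CDAP's provisioner code, and the project, region, cluster, class, and bucket names are placeholders assumed for the example.

```python
import json


def build_submit_request(project_id, region, cluster_name, main_class, jar_uris):
    """Build the URL and body for a Dataproc jobs:submit call (nothing is sent)."""
    url = (
        "https://dataproc.googleapis.com/v1/"
        f"projects/{project_id}/regions/{region}/jobs:submit"
    )
    body = {
        "job": {
            # The cluster that should run the job.
            "placement": {"clusterName": cluster_name},
            # A Spark job described by its main class and the jars it needs.
            "sparkJob": {
                "mainClass": main_class,
                "jarFileUris": list(jar_uris),
            },
        }
    }
    return url, body


# Placeholder values, assumed for illustration only.
url, body = build_submit_request(
    "my-project",
    "us-central1",
    "cdap-cluster",
    "com.example.PipelineLauncher",
    ["gs://my-bucket/launcher.jar"],
)
print(url)
print(json.dumps(body, indent=2))
```

An authenticated client would POST this body to the URL over HTTPS, which is why an open SSH port is not needed for this style of submission.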

Pipeline Studio Improvements

Users can now select multiple plugins by dragging a selection box around them. Once the plugins are selected, users can move, copy, or delete them. Additionally, the Pipeline Studio canvas now has a right-click menu. By right-clicking, users can add a new Wrangler connection or perform common actions such as zooming and aligning plugins.

Right click on the canvas to open the menu

Download CDAP 6.2.0 today and take it for a spin! Also consider helping us develop the platform by reaching out to the community with any comments, feedback, suggestions, or improvements, or by creating and following JIRA issues and submitting pull requests.

For Hadoop distribution packages, you can build them from the following repositories:


Edwin is a Senior Software Engineer at Netflix, previously at Google Cloud. He specializes in data analytics user interfaces.