

Announcing CDAP 6.2.0 Release

On behalf of the CDAP community, it is my pleasure to announce the release of CDAP version 6.2.0. This release introduces Replication, an easy way to replicate changes from transactional databases into analytical data warehouses. It also enhances the Google Cloud Dataproc runtime provisioner to use Google Cloud Dataproc's native job APIs. Finally, it includes several Pipeline Studio improvements that make building pipelines easier.


Replication lets users create replication pipelines easily. The user interface guides users through configuring the source database and then selecting the tables and columns to replicate. Once users have finished adding the target configuration, the system runs an assessment of the configuration to determine whether there are any potential issues that need to be addressed before the pipeline is deployed. The assessment stage also reports issues that may arise during replication, including data type mappings between the source and target databases.

Select tables and columns to replicate

Google Cloud Dataproc Runtime Improvement

Previously, the Google Cloud Dataproc runtime used SSH for job submission, which required port 22 to be open in the environment running CDAP. With this improvement, jobs are submitted through the native Google Cloud Dataproc APIs, so port 22 no longer needs to be open.

Pipeline Studio Improvements

Users can now select multiple plugins by dragging across the canvas. Once the plugins are selected, users can move, copy, or delete them as a group. Right-clicking is also now supported on the Pipeline Studio canvas: from the right-click menu, users can add a new Wrangler connection or perform common actions such as zooming and aligning plugins.

Right-click on the canvas to open the menu

Download CDAP 6.2.0 today and take it for a spin! You can also help us develop the platform by reaching out to the community with comments, feedback, suggestions, or improvements, or by creating and following JIRA issues and submitting pull requests.

Hadoop distribution packages can be built from the following repositories:




CDAP is a 100% open-source framework for building data analytics applications

Edwin Elia

Edwin is a Software Engineer for Cloud Data Fusion at Google. He specializes in front-end development, specifically user interfaces for data analytics.