CDAP 3.1 adds MapR support, Spark integration, enhanced Datasets and much more!

cdapio
Apr 19, 2019

August 4, 2015

Shankar Selvam is a software engineer at Cask where he is building software to enable the next generation of data applications. Prior to Cask, he worked on Hadoop and HBase performance evaluation/analysis at Intel.

We are excited to announce the release of the Cask Data Application Platform (CDAP) v3.1.0. This release adds support for MapR, which gives users a wider choice of Hadoop distributions when using CDAP. It also expands our footprint to support CDH 5.4, HDP 2.2, and Apache Hadoop with HBase 1.0 and Hive 1.1.

In a previous release of CDAP we introduced Spark integration as an experimental feature, with Spark programs running in standalone mode only. We are now proud to support Spark 1.2 and 1.3 in distributed CDAP. This gives CDAP users a wider choice of processing paradigms, with the ability to run MapReduce, Realtime, and Spark programs for production use cases.

In addition, we made a number of improvements to CDAP with v3.1, including:

  • Enabling Workflow token persistence
  • Custom and system metadata for fileset partitions
  • Incremental processing in workflows for partitioned filesets
  • Ability to consume existing files in HDFS as CDAP datasets
  • A quick and easy way to create real-time and batch ETL pipelines via the UI

A complete list of the new features, improvements, and bug fixes in this release can be found in the Release Notes.

CDAP v3.1 also introduces a way to create real-time and batch ETL pipelines via the UI, which simplifies setting up and configuring those pipelines.

Check out CDAP 3.1 (download here), give it a whirl, and let us know what you think. Help us make CDAP better by sending your questions or suggestions to the CDAP user group.


A 100% open source framework for building data analytics applications.