Apr 24 · 6 min read

December 21, 2016

Vinisha Vyasa is a software engineer at Cask, where she is developing software to enable big-data developers to quickly build data-centric applications. Prior to Cask, she worked on Container Monitoring as a Research Engineer at Ericsson.

We are very happy to announce the general availability of the fourth generation of Cask’s flagship product, CDAP 4. This release builds on what we have learned over the past few years from our users and the community. This post summarizes the major enhancements in CDAP 4: a new and revamped user experience; Cask Market, Cask’s “Big Data App Store”; the new Cask Wrangler extension; Cask Hydrator enhancements; and new platform features and improvements.

New & Revamped User Experience

CDAP 4 contains a completely revamped user experience that focuses on improving speed and productivity by reducing the number of clicks users need to get things done. The new user experience is centered around three major themes:

I. Jump Button and Fast Actions

One of the unique aspects of CDAP is the combination of the platform and multiple extensions, each providing different functionality across the same datasets. The Jump button allows you to “jump” to various parts of the product for a given entity. For example, you can jump to lineage information in Cask Tracker from CDAP search results, or create a pipeline in Cask Hydrator from a dataset you are analyzing in Cask Tracker. Additionally, we provide fast actions that give single-click access to frequently used operations for various entity types. The fast-action icons are customized for each entity; for example, you can explore a dataset or delete an application with a single click.

II. Global Navigation Refresh and Plus Button

CDAP and its extensions now have a consistent look and feel for navigation. We have added a global “Plus” button so you can reach the Cask Market and Resource Center from anywhere in the product.

III. Cleaner Card Views

The new UI presents card views of all entities, including Applications, Streams, and Datasets. This allows users to view and filter the entities they are interested in, get a quick snapshot, and instantly access the Jump button and fast actions.

Other user-experience improvements include a “spotlight search”, easy navigation via keyboard shortcuts, an improved management screen for viewing operational stats of Hadoop ecosystem components, and a splash screen for onboarding new users.

Cask’s “Big Data App Store”, Cask Market

Cask provides an ecosystem of pre-built big data solutions, reusable templates, and plugins via our new big data app store, Cask Market. Within CDAP, users can access the market and deploy pre-built Hadoop solutions and big data applications with easy-to-use guided wizards. Enterprises can create their own internal ecosystem by hosting a private instance of Cask Market, fostering discoverability and reusability in their controlled environment.

Cask Wrangler Extension

One recurring theme across our customers is the pain data scientists and data engineers experience during data preparation: loading, cleansing, and transforming data. While the custom transformation operator in earlier releases of Cask Hydrator solved part of that problem, the new Cask Wrangler offers a simple, interactive interface that makes data preparation not only easier but more fun too. The result of the wrangling process is a set of rules and an output schema that can be seamlessly integrated into a Cask Hydrator production pipeline.

Cask Hydrator Enhancements

I. Pipeline Preview

One of the most common requests from users is the ability to actively debug pipelines with real, live data. We have introduced a feature to preview data pipelines within Cask Hydrator without deploying them, so you can see the data as it flows through each stage while you create, debug, and fine-tune a pipeline. This helps users get pipelines right before they deploy them, significantly improving time to value.

II. New Pipeline Plugins

CDAP 4 also introduces a variety of new plugins, including a plugin to ingest and process mainframe data, a plugin for Amazon Kinesis, and a plugin to stream files in batch and Spark Streaming. We are also working with our customers to improve the traditional form of ingesting data via MapReduce using JDBC, and have introduced new Cask Hydrator plugins for faster export of data from relational databases like Oracle.

New Platform Features & Improvements

I. Transactional Messaging System

CDAP 4 introduces a foundational transactional messaging system for reliable messaging between different CDAP components and programs. This will enable many upcoming use cases that both the platform and programs need, such as reliably publishing and subscribing to audit log messages for audit trails and lineage computation. ACID transactional guarantees, combined with simple and easy-to-use APIs, ensure that messages can be published and consumed reliably with consistent, exactly-once delivery semantics.
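To make the idea of transactional publish and exactly-once consumption concrete, here is a minimal in-memory sketch. It is a toy model, not the CDAP messaging API: the names `Topic`, `PublishTransaction`, `Consumer`, and `consume` are all hypothetical and chosen only for illustration. Messages published inside a transaction become visible only on commit, and a consumer advances its offset only when a whole batch has been processed successfully.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Topic:
    """Toy in-memory topic (hypothetical); holds only committed messages."""
    committed: List[str] = field(default_factory=list)


class PublishTransaction:
    """Buffers messages and makes them visible atomically on commit."""

    def __init__(self, topic: Topic) -> None:
        self._topic = topic
        self._pending: List[str] = []

    def publish(self, message: str) -> None:
        # Buffered only; consumers cannot see this message yet.
        self._pending.append(message)

    def commit(self) -> None:
        # All buffered messages become visible together.
        self._topic.committed.extend(self._pending)
        self._pending = []

    def rollback(self) -> None:
        # Discard everything published since the last commit.
        self._pending = []


class Consumer:
    """Tracks an offset that advances only when a batch fully succeeds."""

    def __init__(self, topic: Topic) -> None:
        self._topic = topic
        self._offset = 0

    def consume(self, process: Callable[[str], str], sink: List[str]) -> int:
        batch = self._topic.committed[self._offset:]
        # Stage results first: if process() raises, neither the sink nor
        # the offset changes, so the batch is retried from the same point.
        staged = [process(message) for message in batch]
        sink.extend(staged)
        self._offset += len(batch)
        return len(batch)


topic = Topic()
tx = PublishTransaction(topic)

tx.publish("audit: dataset created")
tx.rollback()                      # rolled back, never visible

tx.publish("audit: pipeline deployed")
tx.commit()                        # now visible to consumers

sink: List[str] = []
consumer = Consumer(topic)
first = consumer.consume(str.upper, sink)
second = consumer.consume(str.upper, sink)  # nothing new, no redelivery
```

In this sketch the rolled-back message never reaches the consumer, and the second call to `consume` processes nothing because the offset already covers the committed message. A real system would also need durable storage and coordination between the offset and the consumer's own writes, which this toy deliberately leaves out.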

II. Operational & Management Stats

CDAP 4 provides greater visibility into the components that CDAP relies on, such as HDFS, YARN, and HBase, by bringing their relevant operational metrics into a single management user interface. This saves administrators from switching between multiple UIs to gather the stats they need to manage and monitor these components. These stats can also help debug issues with CDAP applications. Since Operational Stats are implemented as extensions, CDAP users can now bring in metrics from any system they depend on and view them in the management screen of the new UI by implementing a simple Operational Stats API. In addition, the stats are published to JMX, so users can also monitor them using external tools such as JConsole and Ganglia.
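The extension pattern described above can be sketched roughly as follows. This is not the actual Operational Stats API (which is a Java extension interface); the `OperationalStats` base class, its `service_name` and `collect` members, and the `snapshot` helper are hypothetical names used only to show the shape of the idea: each monitored system supplies one small implementation, and the management screen aggregates them.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class OperationalStats(ABC):
    """Hypothetical extension contract: one implementation per system."""

    @property
    @abstractmethod
    def service_name(self) -> str:
        """Name under which the stats are grouped."""

    @abstractmethod
    def collect(self) -> Dict[str, float]:
        """Return the current metrics for this system."""


class QueueDepthStats(OperationalStats):
    """Example extension reporting the depth of an in-memory queue."""

    def __init__(self, queue: List[str]) -> None:
        self._queue = queue

    @property
    def service_name(self) -> str:
        return "example-queue"

    def collect(self) -> Dict[str, float]:
        return {"depth": float(len(self._queue))}


def snapshot(extensions: List[OperationalStats]) -> Dict[str, Dict[str, float]]:
    """Gather metrics from every registered extension, keyed by service."""
    return {ext.service_name: ext.collect() for ext in extensions}


report = snapshot([QueueDepthStats(["a", "b", "c"])])
```

The point of the design is that the aggregation code never needs to know which systems exist; adding a new system means adding one more implementation to the list.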

Other Improvements

We made a number of other significant changes in CDAP 4, such as migrating from Angular to React as the front-end framework, allowing the log level of distributed applications to be controlled dynamically at runtime, and providing a much-needed transactional checkpointing feature for Spark Streaming applications.

Supported Distributions

Previous releases of CDAP already supported CDH, HDP and MapR. With CDAP 4, enterprises and developers have even more choices in running CDAP in production. CDAP 4 is now available:

Learn more — take a look at the Cask Product Tour:

Download CDAP 4 and give it a spin. We actively welcome questions, comments and suggestions. Our user group is a great place to engage with the Cask team and the entire CDAP community.


CDAP is a 100% open-source framework for building data analytics applications.

