Hello everyone! It’s nice to be back after a long pause. It has been a while since we last blogged about CDAP. It was this month last year that Cask was acquired, and since then a lot has happened with CDAP as well as around it.
Before I get into the details of what we have been up to in the CDAP world, I would like to thank all our users, partners, and customers for their tremendous support during this transition. You have been incredible over the last 12 months. In addition to valuable feedback and contributions, you have collaborated in the true spirit of open source to help make CDAP better, and I would like to thank you for that.
Now, as part of Google Cloud, our commitment to CDAP open source is even stronger than before, and we look forward to building an even more vibrant CDAP community with you. Since the acquisition we have shipped five OSS releases. The Big Data Application Meetup (BDAM) has not missed a beat either: since the acquisition we have held four meetups, with interesting use-case talks from various companies in the valley. We will continue to curate content and build further on the existing 2,000+ subscribers.
One of the major achievements for the team has been the integration of CDAP with the Google Cloud Platform (GCP) stack, and after 11 months of work I am proud to announce that Cloud Data Fusion (CDF), a managed, cloud-native data integration service fully powered by CDAP, is now available on GCP in public beta. CDF is one of the first CDAP accelerators to be announced as a managed offering. CDF runs the same code base as CDAP (no fork), with the customizations necessary for running as a managed offering. If you are interested in taking it for a spin, it is available in the left navigation bar of the Google Cloud Console. Here is my talk, “Cloud Data Fusion: Data Integration on Google Cloud,” from Google Next ’19 in SF:
Today, I am also happy to announce the release of CDAP 6.0. This release has a few major enhancements, including the ability to run CDAP on Kubernetes (K8s), with a CDAP Kubernetes Operator available on GitHub. While native support for running on Hadoop still exists, K8s support allows you to run CDAP on a separate cluster while still submitting jobs to a Hadoop cluster. This release also gives users the option to store CDAP metadata in storage systems other than HBase, such as PostgreSQL, and brings improved stability, a hardened Sandbox, and much more. The CDF service itself uses the CDAP K8s operator and stores all of its metadata in PostgreSQL.
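For readers curious what deploying CDAP through the operator might look like, here is a minimal sketch of a custom resource the operator could consume. This is illustrative only: the API version, resource kind, image tag, and configuration keys shown are assumptions and should be checked against the cdap-operator repository on GitHub before use.

```yaml
# Illustrative CDAPMaster resource for the CDAP Kubernetes Operator.
# Field names and values below are assumptions -- consult the
# cdap-operator repository on GitHub for the actual CRD schema.
apiVersion: cdap.cdap.io/v1alpha1
kind: CDAPMaster
metadata:
  name: cdap-example
spec:
  image: gcr.io/cdapio/cdap:latest        # hypothetical CDAP image tag
  config:
    # Hypothetical setting: point metadata storage at PostgreSQL
    # instead of HBase, as CDF does.
    data.storage.implementation: postgresql
```

Applying a resource like this with `kubectl apply -f` would hand lifecycle management of the CDAP services over to the operator, which is the general pattern Kubernetes operators follow.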
While we have added new capabilities to CDAP, we have also cleaned up the stack as part of our cleaner-technology-stack initiative. CDAP Flows and CDAP Streams were deprecated a year ago, and with CDAP 6.0 they have been removed. We recommend using Spark Streaming in place of CDAP Flows and Apache Kafka in place of CDAP Streams. Because 6.0 is a major release, there are a few incompatible changes you should be aware of; all of them are listed here. If you have any questions, please reach us through one of the CDAP OSS channels.
The journey continues, and we look forward to simplifying Big Data, Edge, and Cloud with you. Onwards and upwards!