Published in cdapio · Sep 3, 2019
A Definitive Guide to Wrangler User Defined Directive (UDD)
CDAP Wrangler makes it delightful to transform, cleanse, standardize, harmonize, run DQ checks on, and enrich data in a code-free manner within data pipelines[1]. While Wrangler provides a ton of built-in functions and directives to manipulate data, there will always be gaps. …
Cdap · 5 min read

Published in cdapio · Jul 22, 2019
Introduction to CDAP Wrangler
You often have to deal with incomplete or messy datasets. Data from varied sources can be unusable at first, but once it is transformed, mapped, and cleansed, it becomes usable. …
Big Data · 4 min read

Published in cdapio · May 28, 2019
Journey Continues — Onwards and Upwards!
Hello everyone! It's nice to be back after a long pause. It has been a while since we last blogged about CDAP. It was in this month last year that Cask was acquired, and since then a lot has happened both with CDAP and around it. Before I get into the details…
Google Cloud Platform · 3 min read

Published in cdapio · May 26, 2019
Building a Data Lake on Google Cloud Platform with CDAP
It is no secret that traditional platforms for data analysis, such as data warehouses, are difficult and expensive to scale to meet current data demands for storage and compute. Purpose-built platforms designed to process big data, meanwhile, often require significant up-front and ongoing investment if deployed on-premises. Alternatively, cloud computing…
Google Cloud Platform · 11 min read