Welcome to the third installment of this four-part series. In the first article I discussed some of the concepts related to continuous integration and testing. In the second article we got into some hands-on examples for extracting pipelines from CDF/CDAP and used GitHub as a repository for storing pipelines and related artifacts.

In this article we’ll discuss the process of migrating artifacts from GitHub into a TEST, QA, or PROD environment, and explore automation options by leveraging the API more broadly.

Now that you have your pipelines checked into GitHub, deploying those pipelines onto another environment, like Cloud Data…
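Once the exported JSON files live in GitHub, the promotion step can be scripted against the CDAP REST API. Below is a minimal sketch, assuming a hypothetical repo layout with pipeline JSON under `pipelines/` and placeholder values for the target endpoint and token; the `run` helper echoes each command (a dry run) so you can inspect everything before executing it.

```shell
# Dry-run sketch: promote exported pipelines from a GitHub repo to a
# target CDF/CDAP environment. All names are placeholders.
run() { echo "+ $*"; }  # replace the echo with "$@" to actually execute

REPO="https://github.com/example-org/cdf-pipelines.git"  # hypothetical repo
CDAP_ENDPOINT="https://TARGET_INSTANCE/api"              # target environment
AUTH_TOKEN="PLACEHOLDER_TOKEN"  # e.g. from: gcloud auth print-access-token

run git clone "$REPO" pipelines-repo

# Deploy every exported pipeline JSON as a CDAP application (PUT /apps/<name>).
for f in pipelines-repo/pipelines/*.json; do
  [ -e "$f" ] || continue   # skip if the glob matched nothing
  app=$(basename "$f" .json)
  run curl -s -X PUT \
    -H "Authorization: Bearer $AUTH_TOKEN" \
    -d @"$f" \
    "$CDAP_ENDPOINT/v3/namespaces/default/apps/$app"
done
```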

Welcome to the second article in this four-part series. In the first article I discussed some of the concepts related to continuous integration and testing. In this article we’ll get into some hands-on examples for extracting pipelines from CDF/CDAP and use GitHub as a repository for storing pipelines and related artifacts.

I will cover the following topics in this article:

  • Creating a checklist of all the artifacts you will need to test in a target environment
  • How to set up a GitHub project to house our pipeline artifacts
  • How to export pipelines from CDF/CDAP using the export tools in…
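To preview the export step, here is a sketch of pulling a deployed pipeline's definition over the CDAP REST API. The endpoint assumes a local sandbox on the default router port (11015; confirm against your `cdap-site.xml`), and the pipeline name is hypothetical; the `run` helper echoes commands rather than executing them.

```shell
# Dry-run sketch: export a deployed pipeline's definition via the REST API.
run() { echo "+ $*"; }  # replace the echo with "$@" to actually execute

CDAP_ENDPOINT="http://localhost:11015"  # sandbox router; confirm in cdap-site.xml
APP="customer_load"                     # hypothetical pipeline name

# The application detail includes the pipeline configuration (the same JSON
# that CDAP Studio's export action produces), which you then commit to GitHub.
run curl -s "$CDAP_ENDPOINT/v3/namespaces/default/apps/$APP" \
  -o "exports/$APP.json"
```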

Typical CI/CD process for data pipelines

Welcome to my latest series on continuous integration of data pipelines with Cloud Data Fusion (CDF) and/or CDAP. This will be a four-part series of articles in which I’ll discuss the promotion process for data pipelines across multiple environments and all the tools and techniques that we’ll use along the way.

Change Control and CI/CD

Whenever we consider a development lifecycle in an enterprise setting, there are a number of gates that a product has to go through before being released to production. Typically we do development in a segregated development environment, most often our very own laptops. Artifacts that have completed…


To really become a ninja with Wrangler directives, you have to get to know all the functions that Wrangler supports. In this article I’m going to list all the Wrangler functions with a short description of what each one does.

At the time of this writing the Wrangler code branch on GitHub is at version 4.1 for the latest release. The link to the Wrangler functions documentation can be found here.

In a previous article I discussed how you can use JEXL expressions in your directive, and these functions are no different. …

In a previous article I showed you how to get started with plugin development in CDAP. But wouldn’t it be nice if we could attach a debugger to a deployed plugin and see how it performs inside of a pipeline with actual data?

I’m going to show you how to attach a debugger to a CDAP sandbox instance so that we can inspect the data that is being processed by the plugin. …
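As a preview, the setup can be sketched in two commands. This assumes your sandbox's start script supports the `--enable-debug` flag (verify with `bin/cdap sandbox --help` for your version) and that the JDWP port is the common default of 5005; the `run` helper echoes the commands rather than executing them.

```shell
# Dry-run sketch: start the sandbox with remote debugging, then attach.
run() { echo "+ $*"; }  # replace the echo with "$@" to actually execute

CDAP_HOME="$HOME/cdap-sandbox"  # wherever you unpacked the sandbox

# Flag support varies by version; verify with: bin/cdap sandbox --help
run "$CDAP_HOME/bin/cdap" sandbox start --enable-debug

# Attach over JDWP (port 5005 is the usual default). An IDE debugger can be
# pointed at the same host:port, with breakpoints set in your plugin code.
run jdb -attach localhost:5005
```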


By now you probably know that you run CDAP pipelines on Cloud Data Fusion (CDF), but did you know that you can also control your CDF instance remotely using the REST API? In this blog I will walk you through the process of deploying and starting a CDAP pipeline on CDF using only a REST client. We’ll use the handy curl utility since it is easy to work with and is available on most platforms.

Using the REST API you can do quite a lot with CDF. You can create an instance from scratch, retrieve information about running instances, deploy…
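As a taste of what's ahead, here is a sketch of the two calls involved. The instance endpoint, token, and pipeline name are placeholders (on CDF the endpoint comes from `gcloud data-fusion instances describe` and the token from `gcloud auth print-access-token`), and the `run` helper echoes the commands rather than executing them.

```shell
# Dry-run sketch: deploy and start a pipeline purely through the REST API.
run() { echo "+ $*"; }  # replace the echo with "$@" to actually execute

CDAP_ENDPOINT="https://INSTANCE.datafusion.googleusercontent.com/api"  # placeholder
AUTH_TOKEN="PLACEHOLDER_TOKEN"   # e.g. gcloud auth print-access-token
APP="my_pipeline"                # hypothetical pipeline name

# 1. Deploy: PUT the exported pipeline JSON as a new application.
run curl -s -X PUT \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -d @"$APP.json" \
  "$CDAP_ENDPOINT/v3/namespaces/default/apps/$APP"

# 2. Start: batch pipelines run as the DataPipelineWorkflow program.
run curl -s -X POST \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  "$CDAP_ENDPOINT/v3/namespaces/default/apps/$APP/workflows/DataPipelineWorkflow/start"
```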


One of my favorite features of CDAP is its extensibility: the platform allows you to add new functionality yourself. If you need to perform a transformation, or need to source or sink data to or from a system that is not currently available in the plugin ecosystem, you can easily add your own plugin to provide that capability.

Getting started with plugin development is as simple as cloning one of the example plugins and modifying it to add your own implementation logic. In this article I will cover what you’ll need to know to get started with plugin…
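For a flavor of the workflow, here is a sketch that bootstraps a plugin project from the Maven archetype documented by CDAP and builds it. Treat the archetype version as an assumption to verify against your CDAP release; the `run` helper echoes the commands rather than executing them.

```shell
# Dry-run sketch: generate a plugin project skeleton and build it.
run() { echo "+ $*"; }  # replace the echo with "$@" to actually execute

# Archetype coordinates per the CDAP plugin docs; check the version against
# your CDAP release before running.
run mvn archetype:generate \
  -DarchetypeGroupId=io.cdap.cdap \
  -DarchetypeArtifactId=cdap-data-pipeline-plugins-archetype \
  -DarchetypeVersion=6.2.0 \
  -DgroupId=com.example -DartifactId=my-transform -Dversion=1.0.0

# Build the plugin JAR and its JSON descriptor (both land under target/).
run mvn clean package -f my-transform/pom.xml
```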


CDAP provides a number of ways to process data in a pipeline. One of the most flexible ways to build advanced data processing logic is through the use of directives with JEXL expressions in Wrangler. Transformation logic can be created directly in CDAP Studio, where CDAP plugins are used to source, transform, and sink data by building a directed acyclic graph (DAG) of processing logic; each transformation is done in sequential phases via purpose-built transform plugins.

Wrangler is itself a transform plugin, but you can think of it as the Swiss Army…

Part 3 of 3 — DB Data Pipeline

In part 2 of this series we set up and tested connectivity to the Cloud SQL databases for MySQL and Postgres. In this final article I’ll walk you through the process of loading data into Cloud SQL and using both the MySQL and Postgres database instances to build a pipeline where we join data from disparate databases and write the resulting data to yet another database, BigQuery to be exact.

The first step requires that we load some data into our database. There are a number of sample databases available on the internet for working with MySQL or Postgres, and…
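Loading the sample data can be sketched with the gcloud tooling. Bucket, file, instance, and database names below are placeholders, and the `run` helper echoes the commands (a dry run) rather than executing them.

```shell
# Dry-run sketch: stage a SQL dump in Cloud Storage, then import it into
# Cloud SQL server-side. All names are placeholders.
run() { echo "+ $*"; }  # replace the echo with "$@" to actually execute

run gsutil cp ./employees.sql gs://example-bucket/employees.sql
run gcloud sql import sql my-mysql-instance \
  gs://example-bucket/employees.sql --database=employees
```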

Part 1 of 3

How to integrate CDAP with Google Cloud SQL

This is part one of a three-part series that will walk you through the following concepts:

  1. Configure Google Cloud SQL with two databases: MySQL and Postgres.
  2. Build the custom JDBC drivers to connect to the Cloud SQL databases with CDAP.
  3. Load databases with sample data, and create a pipeline that joins data from two separate databases in a single pipeline.
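Step 1 of the list above can be sketched with gcloud. Instance names, region, versions, and tiers are placeholders to adapt to your project; the `run` helper echoes the commands rather than executing them.

```shell
# Dry-run sketch of step 1: create a MySQL and a Postgres Cloud SQL instance.
run() { echo "+ $*"; }  # replace the echo with "$@" to actually execute

run gcloud sql instances create my-mysql \
  --database-version=MYSQL_8_0 --region=us-central1 --tier=db-n1-standard-1

run gcloud sql instances create my-postgres \
  --database-version=POSTGRES_13 --region=us-central1 --tier=db-custom-1-3840
```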

The concepts covered in this series will allow you to either configure CDAP locally or use Cloud Data Fusion, Google’s cloud-managed version of CDAP. …

Tony Hajdari

Tony is a lifelong tinkerer who is passionate about data and analytics.
