Welcome to the third installment of this four-part series. In the first article I discussed some of the concepts related to continuous integration and testing. In the second article we got into some hands-on examples for extracting pipelines from CDF/CDAP and used GitHub as a repository for storing pipelines and related artifacts.

In this article we’ll discuss the process of migrating artifacts from GitHub into a TEST, QA, or PROD environment, and explore automation options by leveraging the API more broadly.

Now that you have your pipelines checked into GitHub, deploying those pipelines onto another environment, like Cloud Data…
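Once the exported JSON files live in GitHub, the promotion step can be scripted against the CDAP REST API. Below is a minimal sketch, assuming a hypothetical repo layout with pipeline JSON under `pipelines/` and placeholder values for the target endpoint and token; the `run` helper echoes each command (a dry run) so you can inspect everything before executing it.

```shell
# Dry-run sketch: promote exported pipelines from a GitHub repo to a
# target CDF/CDAP environment. All names are placeholders.
run() { echo "+ $*"; }  # replace the echo with "$@" to actually execute

REPO="https://github.com/example-org/cdf-pipelines.git"  # hypothetical repo
CDAP_ENDPOINT="https://TARGET_INSTANCE/api"              # target environment
AUTH_TOKEN="PLACEHOLDER_TOKEN"  # e.g. from: gcloud auth print-access-token

run git clone "$REPO" pipelines-repo

# Deploy every exported pipeline JSON as a CDAP application (PUT /apps/<name>).
for f in pipelines-repo/pipelines/*.json; do
  [ -e "$f" ] || continue   # skip if the glob matched nothing
  app=$(basename "$f" .json)
  run curl -s -X PUT \
    -H "Authorization: Bearer $AUTH_TOKEN" \
    -d @"$f" \
    "$CDAP_ENDPOINT/v3/namespaces/default/apps/$app"
done
```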

Welcome to the second article in this four-part series. In the first article I discussed some of the concepts related to continuous integration and testing. In this article we’ll get into some hands-on examples for extracting pipelines from CDF/CDAP and use GitHub as a repository for storing pipelines and related artifacts.

I will cover the following topics in this article:

  • Creating a checklist of all the artifacts you will need to test in a target environment
  • How to set up a GitHub project to house our pipeline artifacts
  • How to export pipelines from CDF/CDAP using the export tools in…
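To preview the export step, here is a sketch of pulling a deployed pipeline's definition over the CDAP REST API. The endpoint assumes a local sandbox on the default router port (11015; confirm against your `cdap-site.xml`), and the pipeline name is hypothetical; the `run` helper echoes commands rather than executing them.

```shell
# Dry-run sketch: export a deployed pipeline's definition via the REST API.
run() { echo "+ $*"; }  # replace the echo with "$@" to actually execute

CDAP_ENDPOINT="http://localhost:11015"  # sandbox router; confirm in cdap-site.xml
APP="customer_load"                     # hypothetical pipeline name

# The application detail includes the pipeline configuration (the same JSON
# that CDAP Studio's export action produces), which you then commit to GitHub.
run curl -s "$CDAP_ENDPOINT/v3/namespaces/default/apps/$APP" \
  -o "exports/$APP.json"
```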

Typical CI/CD process for data pipelines

Welcome to my latest series on continuous integration of data pipelines with Cloud Data Fusion (CDF) and/or CDAP. This will be a four-part series of articles in which I’ll discuss the promotion process for data pipelines across multiple environments and all the tools and techniques that we’ll use along the way.

Change Control and CI/CD

Whenever we consider a development lifecycle in an enterprise setting, there are a number of gates that a product has to go through before being released to production. Typically we do development in a segregated development environment, most often our very own laptops. Artifacts that have completed…


To really become a ninja with Wrangler directives, you have to get to know all the functions that Wrangler supports. In this article I’m going to list all the Wrangler functions with a short description of what each one does.

At the time of this writing the Wrangler code branch on GitHub is at version 4.1 for the latest release. The link to the Wrangler functions documentation can be found here.

In a previous article I discussed how you can use JEXL expressions in your directive, and these functions are no different. …

In a previous article I showed you how to get started with plugin development in CDAP. But wouldn’t it be nice if we could attach a debugger to a deployed plugin and see how it performs inside of a pipeline with actual data?

I’m going to show you how to attach a debugger to a CDAP sandbox instance so that we can inspect the data that is being processed by the plugin. …
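As a preview, the setup can be sketched in two commands. This assumes your sandbox's start script supports the `--enable-debug` flag (verify with `bin/cdap sandbox --help` for your version) and that the JDWP port is the common default of 5005; the `run` helper echoes the commands rather than executing them.

```shell
# Dry-run sketch: start the sandbox with remote debugging, then attach.
run() { echo "+ $*"; }  # replace the echo with "$@" to actually execute

CDAP_HOME="$HOME/cdap-sandbox"  # wherever you unpacked the sandbox

# Flag support varies by version; verify with: bin/cdap sandbox --help
run "$CDAP_HOME/bin/cdap" sandbox start --enable-debug

# Attach over JDWP (port 5005 is the usual default). An IDE debugger can be
# pointed at the same host:port, with breakpoints set in your plugin code.
run jdb -attach localhost:5005
```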


By now you probably know that you run CDAP pipelines on Cloud Data Fusion (CDF), but did you know that you can also control your CDF instance remotely using the REST API? In this blog I will walk you through the process of deploying and starting a CDAP pipeline on CDF using only a REST client. We’ll use the handy curl utility since it is easy to work with and is available on most platforms.

Using the REST API you can do quite a lot with CDF. You can create an instance from scratch, retrieve information about running instances, deploy…
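As a taste of what's ahead, here is a sketch of the two calls involved. The instance endpoint, token, and pipeline name are placeholders (on CDF the endpoint comes from `gcloud data-fusion instances describe` and the token from `gcloud auth print-access-token`), and the `run` helper echoes the commands rather than executing them.

```shell
# Dry-run sketch: deploy and start a pipeline purely through the REST API.
run() { echo "+ $*"; }  # replace the echo with "$@" to actually execute

CDAP_ENDPOINT="https://INSTANCE.datafusion.googleusercontent.com/api"  # placeholder
AUTH_TOKEN="PLACEHOLDER_TOKEN"   # e.g. gcloud auth print-access-token
APP="my_pipeline"                # hypothetical pipeline name

# 1. Deploy: PUT the exported pipeline JSON as a new application.
run curl -s -X PUT \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -d @"$APP.json" \
  "$CDAP_ENDPOINT/v3/namespaces/default/apps/$APP"

# 2. Start: batch pipelines run as the DataPipelineWorkflow program.
run curl -s -X POST \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  "$CDAP_ENDPOINT/v3/namespaces/default/apps/$APP/workflows/DataPipelineWorkflow/start"
```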


One of my favorite features of CDAP is its extensibility: the platform allows you to add new functionality yourself. If you need to perform a transformation, or need to source or sink data to or from a system that is not currently available in the plugin ecosystem, you can easily add your own plugin to provide that capability.

Getting started with plugin development is as simple as cloning one of the example plugins and modifying it to add your own implementation logic. In this article I will cover what you’ll need to know to get started with plugin…
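For a flavor of the workflow, here is a sketch that bootstraps a plugin project from the Maven archetype documented by CDAP and builds it. Treat the archetype version as an assumption to verify against your CDAP release; the `run` helper echoes the commands rather than executing them.

```shell
# Dry-run sketch: generate a plugin project skeleton and build it.
run() { echo "+ $*"; }  # replace the echo with "$@" to actually execute

# Archetype coordinates per the CDAP plugin docs; check the version against
# your CDAP release before running.
run mvn archetype:generate \
  -DarchetypeGroupId=io.cdap.cdap \
  -DarchetypeArtifactId=cdap-data-pipeline-plugins-archetype \
  -DarchetypeVersion=6.2.0 \
  -DgroupId=com.example -DartifactId=my-transform -Dversion=1.0.0

# Build the plugin JAR and its JSON descriptor (both land under target/).
run mvn clean package -f my-transform/pom.xml
```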


CDAP provides a number of ways to process data in a pipeline. One of the most flexible ways to build advanced data processing logic is through the use of directives with JEXL expressions in Wrangler. Transformation logic can be created directly in CDAP Studio, where CDAP plugins are used to source, transform, and sink data by building a directed acyclic graph (DAG) of processing logic; each transformation is done in sequential phases via purpose-built transform plugins.

Wrangler is itself a transform plugin, but you can think of it as the Swiss Army…

Part 3 of 3 — DB Data Pipeline

In part 2 of this series we set up and tested connectivity to the Cloud SQL databases for MySQL and Postgres. In this final article I’ll walk you through the process of loading data into Cloud SQL and using both the MySQL and Postgres database instances to build a pipeline where we join data from disparate databases and write the resulting data to yet another database, BigQuery to be exact.

The first step requires that we load some data into our database. There are a number of sample databases available on the internet for working with MySQL or Postgres, and…
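Loading the sample data can be sketched with the gcloud tooling. Bucket, file, instance, and database names below are placeholders, and the `run` helper echoes the commands (a dry run) rather than executing them.

```shell
# Dry-run sketch: stage a SQL dump in Cloud Storage, then import it into
# Cloud SQL server-side. All names are placeholders.
run() { echo "+ $*"; }  # replace the echo with "$@" to actually execute

run gsutil cp ./employees.sql gs://example-bucket/employees.sql
run gcloud sql import sql my-mysql-instance \
  gs://example-bucket/employees.sql --database=employees
```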

Part 1 of 3

How to integrate CDAP with Google Cloud SQL

This is part one of a three-part series that will walk you through the following concepts:

  1. Configure Google Cloud SQL with two databases: MySQL and Postgres.
  2. Build the custom JDBC drivers to connect to the Cloud SQL databases with CDAP.
  3. Load databases with sample data, and create a pipeline that joins data from two separate databases in a single pipeline.
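Step 1 of the list above can be sketched with gcloud. Instance names, region, versions, and tiers are placeholders to adapt to your project; the `run` helper echoes the commands rather than executing them.

```shell
# Dry-run sketch of step 1: create a MySQL and a Postgres Cloud SQL instance.
run() { echo "+ $*"; }  # replace the echo with "$@" to actually execute

run gcloud sql instances create my-mysql \
  --database-version=MYSQL_8_0 --region=us-central1 --tier=db-n1-standard-1

run gcloud sql instances create my-postgres \
  --database-version=POSTGRES_13 --region=us-central1 --tier=db-custom-1-3840
```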

The concepts covered in this series will allow you to either configure CDAP locally or use Cloud Data Fusion, Google’s cloud-managed version of CDAP. …

Tony Hajdari

Tony is a lifelong tinkerer who is passionate about data and analytics.
