CDAP — Geocoding Transform

Neil Kolban
Oct 22, 2020

Geocoding takes an address and determines information about its location on the Earth, typically its position as latitude and longitude coordinates. Recently, a CDAP/Data Fusion user had a list of addresses as input and needed to enrich the data with geocoding information. While Google Maps provides an easy-to-use API for this task, nothing built into CDAP allowed us to leverage it. When we encounter situations that the base product does not accommodate, we can create our own extensions using the documented CDAP plugin architecture.

We created a new transform plugin for CDAP, and the remainder of this article describes it.

From a usage perspective, we install the new plugin and then wire it into our pipelines in Studio just as we would any other stage. The following screenshot shows the Geocoding transform in the palette of available transforms and wired into a pipeline.

We now consider the configuration properties of the Geocoding transform. The first parameter is called “APIKey”. This is the Google Maps API key you have obtained, which allows you to use the services of Google Maps.

The next parameter names the field within the input record that contains the address you wish to look up. This is expected to be a String field. The value of that field is supplied as a parameter to the request made to Google Maps.
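
To make the request step concrete, here is a minimal Python sketch of how an address field and an API key could be turned into a call to the Google Maps Geocoding web service. It is not the plugin's code: the plugin's internals are not shown in this article, and the function name `build_geocode_url` and the dict-based record are assumptions made for illustration. Only the endpoint and its `address` and `key` query parameters come from the public Geocoding API.

```python
from urllib.parse import urlencode

# Public endpoint of the Google Maps Geocoding web service.
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(record, address_field, api_key):
    """Build the Geocoding request URL for the address held in one record.

    `record` is a plain dict standing in for a pipeline record (an assumption
    of this sketch); `address_field` is the configured input field name.
    """
    address = record.get(address_field)
    if not isinstance(address, str) or not address.strip():
        raise ValueError(f"field {address_field!r} must hold a non-empty string")
    # urlencode escapes spaces, commas and other reserved characters.
    query = urlencode({"address": address.strip(), "key": api_key})
    return f"{GEOCODE_ENDPOINT}?{query}"
```

For example, `build_geocode_url({"addr": "1600 Amphitheatre Parkway, Mountain View, CA"}, "addr", "YOUR_KEY")` yields a URL that can be fetched with any HTTP client.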

The final parameter names a field that will be created in the output record to hold the results of the Google Maps lookup. The input record is copied to the output record and this new field is added. The field is of record type and contains:

  • formattedAddress — String — The full address returned from Google Maps.
  • geometry — Record — Geometry information.
  • -> latlng — Record
  • -> -> lat — Double — latitude.
  • -> -> lng — Double — longitude
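
The structure listed above can be sketched as a mapping from the JSON returned by the Geocoding web service. This is a hedged illustration rather than the plugin's implementation: the function name `to_geocode_record` is hypothetical, taking the first entry of `results` is an assumption, and returning `None` when nothing is found is a choice of this sketch (the article does not say how the plugin handles failed lookups). The source keys `formatted_address` and `geometry.location` are those of the Geocoding API's JSON response.

```python
def to_geocode_record(response):
    """Map a Geocoding API JSON response (already parsed to a dict) to the
    nested result record described in the article.

    Returns None when the lookup produced no usable result (sketch assumption).
    """
    if response.get("status") != "OK" or not response.get("results"):
        return None
    first = response["results"][0]  # assumption: use the first result only
    location = first["geometry"]["location"]
    return {
        "formattedAddress": first["formatted_address"],
        "geometry": {
            "latlng": {
                "lat": float(location["lat"]),
                "lng": float(location["lng"]),
            }
        },
    }
```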

You do not need to define the structure in the output schema; it will be added for you automatically. The following is an example:
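
To illustrate the copy-and-add behavior in code, here is a small Python sketch under stated assumptions. Records are modeled as plain dicts, the helper `add_geocode_field` is hypothetical, and the field names and values in the usage note are made up for illustration. How the real plugin treats an output field name that already exists in the input is not stated in this article; this sketch simply rejects it.

```python
import copy


def add_geocode_field(input_record, output_field, geocode_result):
    """Return a copy of input_record with geocode_result stored under output_field.

    The input record itself is left unmodified.
    """
    if output_field in input_record:
        # Sketch assumption: refuse to overwrite an existing field.
        raise ValueError(f"field {output_field!r} already exists in the input record")
    output_record = copy.deepcopy(input_record)
    output_record[output_field] = geocode_result
    return output_record
```

Called with an input such as `{"name": "HQ", "address": "..."}`, an output field name of `"geo"`, and a result record holding `formattedAddress` and `geometry`, it returns the original fields plus a nested `geo` record.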

To conclude, here is a short video illustrating the installation and use of the plugin.

Note: Google Maps geocoding APIs have a cost associated with them. There is a free allowance (1000 calls/month) and a charge of $5 per 1000 calls after that. Factor this into your plans for using this transform.
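
As a rough planning aid, the arithmetic in the note above can be sketched in Python. The default figures are the ones quoted in this article and may not match current Google pricing, so they are parameters rather than constants. The function name `estimate_monthly_cost` is hypothetical, and the sketch assumes calls beyond the free allowance are billed pro rata.

```python
def estimate_monthly_cost(calls, free_calls=1000, price_per_thousand=5.0):
    """Estimate the monthly geocoding charge in dollars.

    Defaults follow the figures quoted in the article; check current pricing
    before relying on them. Assumes pro rata billing above the free allowance.
    """
    if calls < 0:
        raise ValueError("calls must be non-negative")
    billable = max(0, calls - free_calls)
    return billable * price_per_thousand / 1000
```

Under these assumptions, a pipeline that geocodes 101,000 addresses in a month has 100,000 billable calls, which comes to $500.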

Note: Google Maps geocoding constrains requests for an individual API key to no more than 50 queries per second. That is typically far more than you need, but because big data pipelines can issue requests at high volume and in parallel, you should take this limit into account as well.
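
One simple way to stay under such a limit is to space requests out on the client side. The following Python sketch shows the idea; it is not part of the plugin, and the class name `Throttle` and its parameters are assumptions made for illustration. The clock and sleep functions are injectable only so the behavior can be exercised without real waiting.

```python
import time


class Throttle:
    """Keeps at least 1/max_per_second seconds between successive calls."""

    def __init__(self, max_per_second=50, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next request may be sent, then reserve the slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            # Guard against a sleep that returns slightly early.
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self._interval
```

Calling `wait()` before each lookup limits a single process to roughly 50 requests per second. Because the quota applies per API key, a pipeline that runs on several workers at once would need to divide the budget between them; this sketch does not handle that.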


Neil Kolban

IT specialist with 30+ years industry experience. I am also a Google Customer Engineer assisting users to get the most out of Google Cloud Platform.