CDAP — Static Mapper — transform

Neil Kolban
Oct 16, 2020

When we write a CDAP pipeline, we commonly wish to transform the data received at the source into the data that we wish to expose at the sink. CDAP provides a rich set of features for this task. One common transformation is mapping data values by lookup. Consider a table that contains US state names and their corresponding two-character codes:

  • Alabama -> AL
  • Alaska -> AK
  • Wyoming -> WY

Now imagine we have a source of data in CDAP whose records include a field called state containing the full name of a US state.

Our goal is to take the value of the state field, which we assume contains a US state name, use that value as the key into the lookup table, and set the state field in the output to the corresponding value from the table (for example, “Alabama” becomes “AL”).
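To make the idea concrete, here is a minimal Java sketch of the lookup itself, independent of CDAP. It models a record as a `Map<String, Object>` and the lookup table as a `Map<String, String>`; the class and method names are hypothetical, and this is an illustration of the concept rather than the plugin's code.

```java
import java.util.HashMap;
import java.util.Map;

public class StateLookupSketch {

    // Lookup table from full state name to two-character code
    // (only the three example entries from the text).
    static final Map<String, String> STATE_CODES = Map.of(
            "Alabama", "AL",
            "Alaska", "AK",
            "Wyoming", "WY");

    // Returns a copy of the row with its "state" field replaced by the
    // matching code. If there is no match, the copy is left unchanged.
    static Map<String, Object> mapState(Map<String, Object> row) {
        Map<String, Object> out = new HashMap<>(row);
        Object name = row.get("state");
        if (name != null) {
            String code = STATE_CODES.get(name.toString());
            if (code != null) {
                out.put("state", code);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> input = new HashMap<>();
        input.put("name", "Jane"); // hypothetical extra field, passed through untouched
        input.put("state", "Alaska");
        System.out.println(mapState(input).get("state")); // prints AK
    }
}
```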

A custom plugin called Static Mapping has been created for this purpose. Its configuration accepts a JSON array of objects, each containing a set of fields and values; together, these objects describe our mapping. For our example, the mapping table might be:

[
  {
    "state_long": "Alabama",
    "state_short": "AL"
  },
  {
    "state_long": "Alaska",
    "state_short": "AK"
  },
  {
    "state_long": "Wyoming",
    "state_short": "WY"
  }
]
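Once such an array has been parsed (for example with a JSON library; parsing is left out here so the sketch needs only the JDK), a lookup like this amounts to indexing the objects by one field and reading another. The sketch below assumes the parsed array is available as a `List<Map<String, String>>`; `buildIndex` is a hypothetical helper, not part of the plugin.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MappingTableSketch {

    // Index a parsed JSON array of objects: for each object, the value of
    // keyField becomes the key and the value of valueField becomes the value.
    // Objects missing either field are skipped; for duplicate keys the last
    // object wins.
    static Map<String, String> buildIndex(List<Map<String, String>> table,
                                          String keyField, String valueField) {
        Map<String, String> index = new HashMap<>();
        for (Map<String, String> obj : table) {
            String key = obj.get(keyField);
            String value = obj.get(valueField);
            if (key != null && value != null) {
                index.put(key, value);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // The three objects from the JSON example, already parsed.
        List<Map<String, String>> table = List.of(
                Map.of("state_long", "Alabama", "state_short", "AL"),
                Map.of("state_long", "Alaska", "state_short", "AK"),
                Map.of("state_long", "Wyoming", "state_short", "WY"));

        Map<String, String> longToShort = buildIndex(table, "state_long", "state_short");
        Map<String, String> shortToLong = buildIndex(table, "state_short", "state_long");

        System.out.println(longToShort.get("Alabama")); // prints AL
        System.out.println(shortToLong.get("WY"));      // prints Wyoming
    }
}
```

Because the key field and value field are chosen per lookup, the same table can serve in either direction, as the two indexes in `main` show.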

Once the mapping table has been provided, we can describe the individual mappings we want to perform. Each mapping is described by four parameters:

  • The field in the input data whose value is used as the lookup key. In our example, this is “state”.
  • The field in the JSON objects that is matched against that key to select the correct object. In our example, this is “state_long”.
  • The field in the selected JSON object whose value becomes the final output. In our example, this is “state_short”.
  • An optional default value to use if there is no match in the lookup table.
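The four parameters map naturally onto a small value type. The Java sketch below shows one way to apply such a mapping to a record; the `Mapping` record, its field names, and the choice to leave the input value untouched when there is neither a match nor a default are assumptions for illustration, not the plugin's documented behavior.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StaticMappingSketch {

    // One mapping: the input field to read, the table field to match against,
    // the table field that supplies the result, and an optional default
    // (null when not configured).
    record Mapping(String inputField, String keyField, String valueField, String defaultValue) {}

    // Applies one mapping to a copy of the row, writing the result back into
    // the same input field. A linear scan keeps the sketch short; building a
    // HashMap index once would avoid rescanning the table for every row.
    static Map<String, Object> apply(Map<String, Object> row,
                                     List<Map<String, String>> table,
                                     Mapping m) {
        Map<String, Object> out = new HashMap<>(row);
        Object input = row.get(m.inputField());
        String result = null;
        if (input != null) {
            for (Map<String, String> obj : table) {
                if (input.toString().equals(obj.get(m.keyField()))) {
                    result = obj.get(m.valueField());
                    break;
                }
            }
        }
        if (result == null) {
            result = m.defaultValue(); // fall back to the optional default
        }
        // Assumption: with no match and no default, the value is left unchanged.
        if (result != null) {
            out.put(m.inputField(), result);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> table = List.of(
                Map.of("state_long", "Alabama", "state_short", "AL"),
                Map.of("state_long", "Alaska", "state_short", "AK"),
                Map.of("state_long", "Wyoming", "state_short", "WY"));
        // "UNKNOWN" is a hypothetical default value.
        Mapping m = new Mapping("state", "state_long", "state_short", "UNKNOWN");

        Map<String, Object> row = new HashMap<>();
        row.put("state", "Alabama");
        System.out.println(apply(row, table, m).get("state")); // prints AL

        row.put("state", "Atlantis");
        System.out.println(apply(row, table, m).get("state")); // prints UNKNOWN
    }
}
```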

A full example of the configuration can be seen here:

The custom plugin can be found on GitHub and is available in both source and binary forms. The binary can be downloaded from the repository's Releases section.

A YouTube video illustrating the plugin is available.

Neil Kolban

IT specialist with 30+ years of industry experience. I am also a Google Customer Engineer helping users get the most out of Google Cloud Platform.