Building a custom Apache Atlas Hook

Rahul Nandi · Published in The Startup · Aug 13, 2019

Apache Atlas is a metadata management tool built for the Hadoop ecosystem. An Atlas hook is an application that listens to a source system’s event logs for metadata changes and notifies Atlas so it can reflect them. Building an Atlas hook is fairly easy, though it can be tricky if the source system does not have adequate logging available.

This article assumes a basic understanding of Atlas features and the Atlas Type System. It focuses on what it takes to write an Atlas hook from scratch and what to consider along the way.

Atlas ships with predefined types (hive_table, hive_column, etc.) to represent metadata objects of the supported systems. Hooks listen to the source system’s event logs to generate or update metadata objects of the respective types.

Let’s say we create a database table and the source system generates the following log line.

INFO:: CREATING TABLE user WITH COLUMNS name, age, gender

This information can be parsed and sourced into Atlas to be available as a metadata object. But first, we need a placeholder in Atlas to hold this information. For that, we have to define types.

We can create two types, “table” and “column”, to hold this information by calling the typedefs API with the two entity definitions. This is a one-time operation. Once those two types are created, we can start parsing the log message to extract the information and generate the appropriate Atlas entities. In this case, we will create one entity representing the table “user” and three column entities named “name”, “age”, and “gender”. We can also create a relationship between “table” and “column” to express that a table can have multiple columns.
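Here is a minimal sketch of that one-time type registration, assuming a local Atlas server at http://localhost:21000 with the default admin/admin credentials. The demo_table/demo_column type names and the DataSet supertype are illustrative choices, not mandated by Atlas; inheriting from DataSet gives the types the standard name and qualifiedName attributes.

import requests

ATLAS = "http://localhost:21000/api/atlas/v2"  # assumed local Atlas endpoint
AUTH = ("admin", "admin")                      # assumed default credentials

# One-time registration of the two entity types via the v2 typedefs API.
typedefs = {
    "entityDefs": [
        {"name": "demo_table",  "superTypes": ["DataSet"], "attributeDefs": []},
        {"name": "demo_column", "superTypes": ["DataSet"], "attributeDefs": []},
    ]
}

resp = requests.post(ATLAS + "/types/typedefs", json=typedefs, auth=AUTH)
resp.raise_for_status()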

In a similar way, we can build our own Apache Atlas hook application. To do so, we need to make sure the source system we are targeting produces adequate logging; the hook interprets that information to generate the corresponding metadata objects and sources them into Atlas. The interesting bit is knowing how to parse the log messages and map the extracted information onto entities of the corresponding types.
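To make the parsing step concrete, here is a sketch that turns the log line from earlier into Atlas entity JSON. The regular expression, the demo_ type names, and the demo:// qualifiedName scheme are all assumptions for illustration.

import re

LOG_LINE = "INFO:: CREATING TABLE user WITH COLUMNS name, age, gender"

def parse_create_table(line):
    # Extract the table name and the list of column names from the log line.
    m = re.match(r"INFO:: CREATING TABLE (\w+) WITH COLUMNS (.+)", line)
    if not m:
        return None
    return m.group(1), [c.strip() for c in m.group(2).split(",")]

def to_entities(table, columns):
    # Map the parsed names onto the demo_table/demo_column types.
    return {
        "entities": [
            {"typeName": "demo_table",
             "attributes": {"name": table,
                            "qualifiedName": "demo://" + table}},  # assumed scheme
        ] + [
            {"typeName": "demo_column",
             "attributes": {"name": col,
                            "qualifiedName": "demo://" + table + "." + col}}
            for col in columns
        ]
    }

table, columns = parse_create_table(LOG_LINE)
payload = to_entities(table, columns)
# POST payload to <atlas>/api/atlas/v2/entity/bulk, or wrap it in a hook
# notification and publish it to the ATLAS_HOOK topic (see the Kafka sketch below).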

Apart from the metadata objects themselves, lineage and relationships can be updated in the same fashion. By now you have probably noticed that the richness of the metadata objects depends entirely on the source system’s event logs. So, before you start writing a hook, make sure the source system emits enough log detail.
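For instance, the table-to-column relationship from earlier could be registered as a relationship definition in the same typedefs call. The name, the COMPOSITION category, and the end names below are assumptions; check them against the Atlas version you run.

# Registered alongside the entityDefs above, via POST <atlas>/api/atlas/v2/types/typedefs.
relationship_defs = {
    "relationshipDefs": [
        {
            "name": "demo_table_columns",
            "relationshipCategory": "COMPOSITION",  # a column belongs to exactly one table
            "endDef1": {"type": "demo_table",  "name": "columns",
                        "isContainer": True,  "cardinality": "SET"},
            "endDef2": {"type": "demo_column", "name": "table",
                        "isContainer": False, "cardinality": "SINGLE"},
        }
    ]
}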

[Figure: Apache Atlas architecture diagram]

There are two ways Atlas can be notified about metadata changes: the REST API or a Kafka topic. Both are shown in the Integration layer of the diagram above, and the JSON structure of the notification message is almost the same for either path. The preferred way is to publish notification messages to the ATLAS_HOOK Kafka topic; any message published there is processed automatically.
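A sketch of publishing a create notification with the kafka-python client follows. The broker address is an assumption, and the envelope below only approximates the hook notification format; verify it against the Atlas version you run.

import json
import time
from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Approximate hook-notification envelope wrapping the entities built earlier.
notification = {
    "version": {"version": "1.0.0"},
    "msgCreationTime": int(time.time() * 1000),
    "message": {
        "type": "ENTITY_CREATE_V2",
        "user": "demo-hook",
        "entities": {
            "entities": [
                {"typeName": "demo_table",
                 "attributes": {"name": "user", "qualifiedName": "demo://user"}}
            ]
        },
    },
}

producer.send("ATLAS_HOOK", notification)
producer.flush()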

Before you jump into writing an Atlas hook application, spend some effort identifying all the information you want to capture in the metadata objects. Once you are clear on that, check whether each of those pieces of information is actually available in the source system. Also be mindful of the level of abstraction you want to maintain for your metadata objects, and validate these assumptions with the users of the Atlas UI. This will help you design a system that is both useful and accurate.

The next step is to define a strategy for handling type evolution. Types can be thought of as the DDL statements of a SQL database. There is currently no concept of type migration in Atlas, so if an error occurs after the types have evolved, you have to work out how to roll the system back. Because Atlas stores all of its information in JanusGraph, that means backing up and restoring the JanusGraph data, which in turn relies on the backup-and-restore capabilities of HBase and Solr. All the while, you also need to keep the hook application in sync with the types present in Atlas.

I hope this gives you a basic understanding of how to start writing an Atlas Hook application.

Rahul Nandi

Data Engineer, enthusiastic about coding, big data systems, and finance. If you like my work, buy me a coffee at https://www.buymeacoffee.com/rahulnandi