How to Discover Your Sap HANA Assets in Google Data Catalog

Learn how to ingest SAP HANA metadata to Google Data Catalog and extend it with your user needs

Google Data Catalog has community-maintained tools like the open source connectors, to ingest metadata from different data sources:

Data Catalog connectors use cases, from official docs.

The target audience for this article should have a basic understanding of Data Catalog features and the available connectors. This article is focused on the SAP HANA connector and a new feature that enables the existent RDBMS connectors to ingest any SQL statement into Data Catalog.

For more context in the existent connectors please check the previous post talking about the RDBMS connectors.

Disclaimer: All opinions expressed are my own, and represent no one but myself…. They come from the experience of participating in the development of fully operational sample connectors, available on: GitHub.

What is SAP HANA?

SAP HANA is a high-performance in-memory database that speeds data-driven, real-time decisions and actions. As the data foundation of SAP’s Business Technology Platform, it provides advanced analytics on multi-model data, on premise and in the cloud. From SAP official docs.

SAP HANA data masking

One of its interesting features is using data masking and obfuscation on columns like a social security number data:

From SAP HANA docs

In SAP HANA, when you create Tables you can specify the type of data mask used — inline or with functions.

So users without the proper access, like in the image above, won’t be able to view the sensitive data.

A sample of how it works, first creating the Function:

Then we can use the Function we’ve created in a Table:

The main idea here is, besides ingesting the basic metadata like Schemas, Tables and Columns.

We can ingest metadata about those Functions and discover all Tables that depend on it, and could contain sensitive data.

Tell me more…

The Connector

It is available in the GitHub: google-datacatalog-saphana-connector. The usual structure of the RDBMS connector, ingests the basic metadata from the source SQL database:

From google-datacatalog-rdbms-connector

Now with the SQL Objects component, it also supports ingesting any SQL statement into Data Catalog:

From google-datacatalog-rdbms-connector

We are going to use it to ingest the Functions type from SAP HANA, and look at each required file to enable its usage.

SQL Objects component

This component is set up by creating 3 files:

  • Metadata Definition
Omitted some fields to improve readability

The Metadata Definition file contains the mapping between the SQL Query and the Data Catalog Entry and Tag types to be ingested.

  • SQL Query
Omitted some fields to improve readability

The SQL Query file contains the SQL statement with the fields queried from the source database.

  • Feature Flag
Omitted some fields to improve readability

And finally, the Feature Flag enables users to activate/disable the SQL Object type according to their needs.

When the RDBMS connector implements those files, the connector will ingest that SQL Object type. At the moment of this writing we have two RDBMS connectors using the component:

  • SAP HANA set up with functions.
  • SQL Server with functions, triggers, and stored procedures.

But it can be extended to all the other RDBMS connectors by just providing the demonstrated configuration files.

Execution in 4 steps

At last let’s ingest the SAP HANA metadata with the Functions SQL Object config, following 4 steps:

  1. Install the Connector

2. Set up the Service Account

If you are running the connector in a GCP environment, like a compute engine VM, Cloud Functions, Cloud Run, App Engine… you don’t need to download the service account file.

3. Set up environment variables

4. Run Connector

In case you want a quick overview, I’ve put together a demo video showing the execution:

Results

After running the connector, we are able to discover the main assets from a SAP HANA database like Tables, Schemas, and Functions.

1 Search results

1. SAP HANA search results

Data Catalog indexes the Table and Columns metadata, and we have access to Functions that mask data on each Column and may work with sensitive data:

2 Ingested Table

2. SAP HANA ingested Table

We can even look at the Function metadata, and leverage Data Curators to enrich it with additional Tags, like if the Function is responsible for anonymizing data.

3 Ingested Function

SAP HANA ingested Function

Finally, we can discover the Functions which work with anonymized data and Tables that use them:

4 Finding dependent table

SAP HANA finding dependent table

Having that metadata discoverable can help us in many ways, like when we think about data lineage. If we need to change that credit_mask function, how do we know which data assets will be affected?

A good start is by making it easily discoverable :).

The sample connector

All topics discussed in this article are covered in a sample connector, available on GitHub: google-datacatalog-saphana-connector.

It is also available on PyPI, google-datacatalog-saphana-connector, if you don’t want to bother cloning the GitHub repo.

Feel free to get it and run it according to the instructions. Contributions are welcome, by the way!

It’s licensed under the Apache License Version 2.0, distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

Closing thoughts

In this article, we have covered how to ingest metadata from SAP HANA into Google Data Catalog. We also went through the SQL Objects component that helps us ingest user-defined Functions into Data Catalog, by doing it we can improve our decision making, knowing what data assets depend on them and could possibly contain sensitive data. That’s it, cheers!

References

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store