Enabling Enterprise Governed Data Access

Indrajit Das
Published in Zaloni Engineering
Feb 24, 2020

With the exponential growth of data in both volume and variety, it has become imperative for organizations to accommodate new data sources in order to tap into their constantly filling data lakes. They then need data governance solutions that grant self-service access so business users can derive meaningful insights from their data.

However, traditional data warehousing tools and processes lack the capabilities needed to dig into those data lakes and meet the analytical needs of these organizations. They need a way to quickly access a catalog of their data assets and provision those assets in a governed way so they can be consumed in an analytics platform.

What is a data catalog?

A data catalog is a critical component of an organization’s data management platform. It surfaces data assets and gives users self-service access so they can identify the data and understand how it is used across the enterprise data consumption landscape.

A well-crafted data catalog provides a centralized place to store information about the organization’s data assets in the form of a well-structured metadata model, and it also facilitates utilization, enrichment and management of the data. The importance of data catalogs to the success of a data management strategy is emphasized in Gartner’s 2017 report, “Data Catalogs Are the New Black in Data Management and Analytics.”

Providing secure and governed access

But the reality is that while data catalogs are a “must-have,” there are also risks associated with allowing access to data. In today’s highly regulated world, companies face challenges in effectively securing and governing their data. Organizations also need to ensure that their data is clean and reliable.

Without appropriate data governance or data quality, data lakes can quickly turn into unmanageable data swamps. Data users know that the data they need lives in these swamps, but without a clear data governance strategy they won’t be able to find it, trust it or use it.

At Zaloni, we have enabled customers to build a governed self-service data catalog from various data sources, along with a built-in approval process for data consumption. The idea was to build a mechanism by which users can request access to specific datasets from the rich data catalog. Once a user is granted access to the selected datasets, a sandbox is provisioned that allows the user to run queries on them.

ServiceNow integration with data catalog

The Zaloni Data Platform’s (ZDP) self-service data catalog allows users to check out datasets and provision them for further analytics. In this case, however, each request first had to be approved through an external ticketing system based on ServiceNow. This approval by the data owners validated a user’s access to the dataset for further consumption.

Additionally, a virtual machine (VM) based sandbox with appropriate tools was dynamically provisioned to support ad hoc, exploratory ML model building. For data governance compliance, the VMs and datasets were made available for a specific lease period, after which access to the data had to be revoked.

Zaloni plus ServiceNow

In order to accomplish this goal, ZDP had to:

  • Integrate with ServiceNow
  • Provision approved datasets and VMs, along with the required analytics tools
  • Manage data leases for approved datasets and VMs
  • Track and report provisioned requests and approvals/denials

A new approval mechanism, based on an external plugin, was introduced into the provisioning pipeline in ZDP. Provisioning requests from users for the selected datasets would be queued, and the approval plugin would be triggered to submit a request to ServiceNow through an API call.

ServiceNow would then issue an approval ticket if the dataset is approved for provisioning; otherwise it would issue a rejection, and ZDP would fail the provisioning request.
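The queue-then-approve flow described above can be sketched roughly as follows. This is a minimal illustration, not ZDP's actual plugin interface: `ProvisionRequest`, `Status`, `process_queue` and the `approve` and `provision` callbacks are all hypothetical names, and the approval is modeled as a synchronous call, whereas a real ticketing round trip would normally be asynchronous.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List
import uuid


class Status(Enum):
    QUEUED = "queued"
    PROVISIONED = "provisioned"
    FAILED = "failed"


@dataclass
class ProvisionRequest:
    """Hypothetical stand-in for a queued dataset provisioning request."""
    requestor: str
    datasets: List[str]
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: Status = Status.QUEUED


def process_queue(
    queue: List[ProvisionRequest],
    approve: Callable[[ProvisionRequest], bool],
    provision: Callable[[ProvisionRequest], None],
) -> List[ProvisionRequest]:
    """Run each queued request through the approval step.

    `approve` plays the role of the external approval plugin (assumed to
    return True on approval, False on rejection). Only approved requests
    reach `provision`; rejected ones are marked as failed.
    """
    for request in queue:
        if approve(request):
            provision(request)
            request.status = Status.PROVISIONED
        else:
            request.status = Status.FAILED
    return queue


if __name__ == "__main__":
    # Toy approval rule for demonstration only: reject anything touching HR data.
    requests = [
        ProvisionRequest("analyst_a", ["sales_2019"]),
        ProvisionRequest("analyst_b", ["hr_salaries"]),
    ]
    process_queue(
        requests,
        approve=lambda r: "hr_salaries" not in r.datasets,
        provision=lambda r: None,
    )
    for r in requests:
        print(r.requestor, r.status.value)
```

Keeping the approval step behind a plain callable means the same pipeline could be exercised with a stub in tests and with a real ticketing client in production.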

Defining the dataset provision with attributes like VM Details and Data Lease Duration
Reviewing the dataset provision request attributes
A screenshot of the provisioning screen on the Zaloni Data Platform
Monitoring and auditing the dataset provision requests

To ensure well-tracked data governance throughout the pipeline, an audit entry is recorded for every approval request. Data lineage is also maintained to track the data from its source to the destination where it is provisioned.
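As a rough illustration of what such an audit trail might look like, the sketch below keeps an append-only list of entries, each tying an approval event to the source and destination of the data. The `AuditEntry` and `AuditLog` classes and their field names are assumptions made for this example; the article does not describe how ZDP stores its audit or lineage records.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class AuditEntry:
    """One immutable record per approval event (hypothetical schema)."""
    request_id: str
    event: str        # e.g. "submitted", "approved", "rejected"
    source: str       # where the dataset lives
    destination: str  # where it is, or would be, provisioned
    recorded_at: datetime


class AuditLog:
    """Append-only, in-memory log; a real system would persist entries."""

    def __init__(self) -> None:
        self._entries: List[AuditEntry] = []

    def record(self, request_id: str, event: str,
               source: str, destination: str) -> AuditEntry:
        entry = AuditEntry(request_id, event, source, destination,
                           datetime.now(timezone.utc))
        self._entries.append(entry)
        return entry

    def history(self, request_id: str) -> List[AuditEntry]:
        """Return every entry for one request, in the order recorded."""
        return [e for e in self._entries if e.request_id == request_id]
```

Recording the source and destination on every entry is one simple way to let the same log answer both "who approved this?" and "where did this data go?".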

ServiceNow integration

A plugin was created to integrate with ServiceNow to leverage existing enterprise deployments for approval of data provisioning requests.

The ServiceNow plugin in ZDP uses the following fields for submitted requests:

  • Datasets: list of datasets requested for approval
  • Intended Use: details on the intended use of the provisioned data
  • Tools: the tools the provisioning user will use
  • Expected Output: the expected output of the analytics performed on the provisioned dataset
  • Lease Duration: the from and to dates during which the data is available to the provisioning user for analysis

Additional properties passed to ServiceNow are:

  • ZaloniID: the ID used by ZDP to correlate requests with ServiceNow.
  • Approvers: list of approver SIDs for which ServiceNow creates approval tickets. These could be data owners or data stewards responsible for granting access to the data and approving such provisioning requests.
  • Requestor: SID of the user requesting the dataset
A sample workflow of how data flows from the Zaloni Data Platform into ServiceNow and back
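To make the request fields listed above concrete, here is a minimal sketch of how they could be gathered into a JSON-ready payload. The `ApprovalRequest` class, its snake_case key names and the validation rules are assumptions for illustration; the actual field names and wire format used between ZDP and ServiceNow are not given in this article.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date
from typing import List


@dataclass
class ApprovalRequest:
    """Hypothetical container for the fields sent with an approval request."""
    zaloni_id: str          # correlating ID (ZaloniID)
    requestor: str          # requestor SID
    approvers: List[str]    # approver SIDs
    datasets: List[str]
    intended_use: str
    tools: List[str]
    expected_output: str
    lease_from: date        # start of the lease duration
    lease_to: date          # end of the lease duration

    def to_payload(self) -> dict:
        """Validate the request and return a JSON-serializable dict."""
        if self.lease_to < self.lease_from:
            raise ValueError("lease end date precedes lease start date")
        if not self.approvers:
            raise ValueError("at least one approver is required")
        payload = asdict(self)
        # Dates are not JSON-serializable, so send them as ISO 8601 strings.
        payload["lease_from"] = self.lease_from.isoformat()
        payload["lease_to"] = self.lease_to.isoformat()
        return payload


if __name__ == "__main__":
    request = ApprovalRequest(
        zaloni_id="zdp-001",
        requestor="S100",
        approvers=["S200"],
        datasets=["sales_2019"],
        intended_use="churn model exploration",
        tools=["jupyter"],
        expected_output="model evaluation metrics",
        lease_from=date(2020, 3, 1),
        lease_to=date(2020, 3, 31),
    )
    print(json.dumps(request.to_payload(), indent=2))
```

Validating the lease dates and approver list before submission keeps malformed requests from ever reaching the ticketing system.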

Once the data owner approves the provision request via ServiceNow, ZDP provisions those datasets into VMs with the required analytical tools.

Once the data lease expires, and in accordance with data governance rules, ZDP revokes access to the provisioned dataset using security policies, deletes the storage locations (typically S3 or ADLS), and finally destroys the VMs that were launched after the provision requests were approved.
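The teardown order described above (revoke access, then delete storage, then destroy VMs) can be sketched as a small lease sweep. Everything here is hypothetical: the `Lease` record, the `expire_leases` function and the three callbacks stand in for whatever security, storage and VM services a real deployment would call, and the sketch assumes a lease stays valid through its end date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List


@dataclass
class Lease:
    """Hypothetical record of one provisioned dataset and its sandbox VMs."""
    request_id: str
    storage_location: str   # e.g. an S3 or ADLS path
    vm_ids: List[str]
    expires_on: date        # last day the lease is valid (assumption)


def expire_leases(
    leases: List[Lease],
    today: date,
    revoke_access: Callable[[str], None],
    delete_storage: Callable[[str], None],
    destroy_vm: Callable[[str], None],
) -> List[Lease]:
    """Tear down every expired lease and return the ones still active.

    Access is revoked first so that nobody can read the data while the
    storage and VMs are being removed.
    """
    active = []
    for lease in leases:
        if lease.expires_on >= today:
            active.append(lease)
            continue
        revoke_access(lease.request_id)
        delete_storage(lease.storage_location)
        for vm_id in lease.vm_ids:
            destroy_vm(vm_id)
    return active
```

A scheduler could run a sweep like this once a day and persist the returned list as the new set of active leases.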

Turn governed self-service into a reality

The Zaloni Data Platform (ZDP) provides unified capabilities to support data management, data governance and self-service data preparation.

Zaloni’s actionable data catalog turns self-service data into a reality. As a capability of the Zaloni Data Platform, it provides management and governance of data throughout the supply chain, from source to consumer, resulting in a solid foundation for secure, reliable, analytics-ready data.
