Protect your data with Microsoft Purview, achieving Azure data governance and enterprise risk compliance.

The components of Microsoft Purview and how it can securely manage your data initiatives.

Daniel Paes
Tech x Talent
9 min read · Oct 2, 2023


Photo by Sajad Nori on Unsplash

Using data to gain a competitive advantage has become crucial in some business domains, and thanks to technological advances it is now possible to ingest data from sources and in formats that weren’t around a couple of years ago. However, these new ingestions call for updated procedures and tools to achieve data governance. This publication explores how Microsoft Purview helps your team achieve robust data governance in your organization. We will start by looking at some of its most used components and the core concepts and functionality behind them, such as the data map and how it feeds the data catalog. Once that is covered, I’d like to present some considerations and good practices for managing your Microsoft Purview environment. Let’s start by revisiting the Microsoft Purview solutions and their components.

Overview of Microsoft Purview solutions and their components.

The Microsoft Purview suite comprises two solutions that target different use cases: the unified data solutions and the risk governance solutions. The unified data solutions aim to facilitate the onboarding and exploration of new data sources while keeping the existing ones up to date. The risk governance solutions come with out-of-the-box classifiers that prevent data from being wrongly shared, and they also allow you to create custom classifiers that better fit your business's data compliance needs. Both services, alongside Azure Synapse Analytics, make up a solid toolset for designing and developing your data governance implementation. Let’s start by checking the components of the unified data solutions under the Microsoft Purview governance portal.

Microsoft Purview unified data solution governance portal.

Unified data solutions are data services that help data producers, data consumers, and data officers know which data assets exist in the organization. They are based on the open-source project Apache Atlas and are accessible through the user interface of the Purview governance portal or through the REST APIs available for Apache Atlas. They also allow data to be shared securely and efficiently; you only need to configure workflows that guard the approval pipeline when sharing data. The diagram below represents how the Microsoft Purview governance portal is structured.

Microsoft Purview governance portal
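
Since the catalog can be reached over Apache Atlas-style REST APIs, an entity can in principle be fetched by its GUID over HTTP. The sketch below only builds the request, so it can be inspected without any network access. The account name, the GUID, the token, and the `/catalog/api/atlas/v2/entity/guid/...` path are assumptions made for illustration; please check the endpoint and API version against the current Purview REST reference before relying on it.

```python
from urllib.parse import quote
from urllib.request import Request


def build_entity_request(account_name: str, guid: str, token: str) -> Request:
    """Build (but do not send) a GET request for a single catalog entity."""
    # Assumed endpoint layout: Atlas v2 API under the account's catalog host.
    url = (
        f"https://{account_name}.purview.azure.com"
        f"/catalog/api/atlas/v2/entity/guid/{quote(guid, safe='')}"
    )
    # Assumes an Azure AD bearer token was obtained elsewhere.
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


# Hypothetical values, used only to show the resulting URL.
req = build_entity_request("contoso-purview", "hypothetical-guid", "hypothetical-token")
print(req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send the call; that step is left out on purpose so the sketch stays side-effect free.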

The Microsoft Purview unified data solution components bring valuable outcomes when it comes to:

  • Data classification and labeling
  • Business Glossary
  • Lineage Extraction

Let’s start by understanding what a data map is and its crucial role in Microsoft Purview.

What is a Data map?

The data map is the building block of the unified data governance solutions. It scans metadata from different sources, such as on-premises operational systems or, for organizations with multi-cloud environments, systems hosted by another cloud provider.

Each data map starts with one capacity unit, which is also its base charge, and auto-scales to fit the compute it needs. Each capacity unit provides 10 GB of metadata storage and 25 operations per second. Within a data map, sources and assets are grouped logically into collections. The size of a data map is therefore described by its operations throughput and its capacity units.
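
To make the capacity unit arithmetic concrete, here is a minimal sketch that estimates how many capacity units a data map would need. It assumes that whichever dimension (storage or throughput) demands more drives the scaling; the function name and that rule are illustrative and do not reproduce the actual billing logic.

```python
import math

GB_PER_CAPACITY_UNIT = 10    # metadata storage per capacity unit
OPS_PER_CAPACITY_UNIT = 25   # operations per second per capacity unit


def required_capacity_units(metadata_gb: float, ops_per_second: float) -> int:
    """Return the capacity units needed to cover both storage and throughput."""
    by_storage = math.ceil(metadata_gb / GB_PER_CAPACITY_UNIT)
    by_throughput = math.ceil(ops_per_second / OPS_PER_CAPACITY_UNIT)
    # A data map never drops below its base of one capacity unit.
    return max(1, by_storage, by_throughput)


small = required_capacity_units(metadata_gb=4, ops_per_second=10)
busy = required_capacity_units(metadata_gb=12, ops_per_second=60)
print(small, busy)
```

In the second call, storage alone would need two units (12 GB over 10 GB each), but 60 operations per second needs three, so throughput decides.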

Below is an example of a data map with corsoPurview as root collection while having Outreach, Development, Sales, Marketing, Finance, Azure data lake, and powerBI datasets as leaf collections.

Different authentication methods are required depending on the data source being configured. Microsoft Purview supports the following ones:

  • Managed Identity
  • Service Principal
  • SQL Authentication
  • Windows Authentication
  • Role ARN
  • Delegated Authentication
  • Consumer Key
  • Account Key or Basic Authentication

Now that we have a clearer picture of what a data map is, we can look at the other components, starting with the data catalog.

Data catalog

The data catalog in Purview was designed to work as a centralized source of truth. New data assets are onboarded into the Microsoft Purview data catalog through the data map, and the collections you design there are shown as data assets. The data assets, together with the business rules called workflows, allow users to curate and share data securely, enabling self-service consumption of your data assets. A workflow contains connectors that orchestrate the actions needed to validate the data asset. Each collection can have workflows attached to it, and the parent's workflow is used when a child collection doesn't have one in place. This is advantageous when defining enterprise-wide default workflows for self-service data access. The collections are hierarchically structured; in this sense, all configuration and workflow request pipelines defined on the root or on a higher hierarchical level also apply to their children.
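
The workflow fallback described above can be pictured as a walk up the collection tree. The following is a minimal sketch of that idea only: the collection names come from the example data map, while the workflow names and both dictionaries are hypothetical and not part of any Purview API.

```python
from typing import Dict, Optional

# Hypothetical collection tree: child -> parent (the root has no parent).
PARENTS: Dict[str, Optional[str]] = {
    "corsoPurview": None,
    "Sales": "corsoPurview",
    "Finance": "corsoPurview",
}

# Hypothetical workflow bindings; Sales has no workflow of its own.
WORKFLOWS: Dict[str, str] = {
    "corsoPurview": "default-self-service-access",
    "Finance": "finance-approval",
}


def resolve_workflow(collection: str) -> Optional[str]:
    """Return the nearest workflow, walking up from the collection to the root."""
    current: Optional[str] = collection
    while current is not None:
        if current in WORKFLOWS:
            return WORKFLOWS[current]
        current = PARENTS[current]
    return None


print(resolve_workflow("Sales"))    # falls back to the root's workflow
print(resolve_workflow("Finance"))  # uses its own workflow
```

Binding a single default workflow at the root is what makes an enterprise-wide self-service default possible, with overrides only where a department needs them.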

The data discovery capability of scanning file servers and databases, on-premises and in the cloud, is very handy. Microsoft Purview data discovery allows data to be scanned across different network topologies, such as those found on-premises, and it can run inside VNet subnets for an extra layer of security.

Data Policy

The data policy option becomes available after enabling policy enforcement in the Microsoft Purview admin console. A data policy statement lists the allowed access actions per data resource per subject. Policies are enforced hierarchically, which makes it possible to deny data services at a higher level while granting granular permissions to the leaf collections of the data map.
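
One way to reason about statements that list allowed actions per resource per subject is a default-deny check in which a grant on a collection also covers everything beneath it. The sketch below is a generic allow-list model written under that assumption; the class, the path format, and the subject names are hypothetical, and it is not Purview's policy engine.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class PolicyStatement:
    subject: str              # user or group the statement applies to
    resource: str             # collection path, e.g. "corsoPurview/Sales"
    actions: FrozenSet[str]   # access actions the statement allows


def is_allowed(statements: List[PolicyStatement], subject: str,
               resource: str, action: str) -> bool:
    """Default deny: allow only if a statement on the resource or an ancestor grants it."""
    for s in statements:
        covers = resource == s.resource or resource.startswith(s.resource + "/")
        if s.subject == subject and covers and action in s.actions:
            return True
    return False


statements = [
    PolicyStatement("sales-analysts", "corsoPurview/Sales", frozenset({"read"})),
]
on_leaf = is_allowed(statements, "sales-analysts", "corsoPurview/Sales/orders", "read")
on_sibling = is_allowed(statements, "sales-analysts", "corsoPurview/Finance", "read")
print(on_leaf, on_sibling)
```

Nothing is granted at the root here, so every other collection stays closed while the Sales leaf is opened for reading only.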

Data estate insights

Microsoft Purview gives a bird’s-eye view of your data assets. Data estate insights is a robust data stewardship tool that gives data stakeholders a unified view, making data management, compliance, and usage easier for data stewards. It surfaces governance gaps by highlighting them in its top metrics.

Best practices when using data governance solutions in Microsoft Purview

Now that we have covered each listed component, let's go over some tips for using and configuring your data governance solutions in Microsoft Purview.

Your data stewards can be notified of changes in the Microsoft Purview environment in real time and take immediate action, thanks to Kafka notifications and Azure Event Hubs. The interaction is done using the Microsoft Purview account. As of this writing, the Kafka topic created by Microsoft Purview can carry the following event types:

ENTITY_CREATE : create an entity.
ENTITY_FULL_UPDATE : update an entity.
ENTITY_PARTIAL_UPDATE : update specific attributes of an entity.
ENTITY_DELETE : delete an entity.
ENTITY_CREATE_V2 : create an entity.
ENTITY_FULL_UPDATE_V2 : update an entity.
ENTITY_PARTIAL_UPDATE_V2 : update specific attributes of an entity.
ENTITY_DELETE_V2 : delete one or more entities.
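
A consumer of these notifications usually wants to treat the plain and `_V2` variants of an event the same way. Below is a minimal sketch of that normalization. The JSON payload shape (`message`, `operationType`, `entity`, `guid`) is an assumption made for illustration, and reading from the Kafka or Event Hubs endpoint is not shown.

```python
import json
from typing import Tuple


def normalize_operation(operation_type: str) -> str:
    """Map V1 and V2 names onto one action, e.g. ENTITY_CREATE_V2 -> ENTITY_CREATE."""
    return operation_type[:-3] if operation_type.endswith("_V2") else operation_type


def summarize_notification(raw: str) -> Tuple[str, str]:
    """Return (normalized operation, entity GUID) from a JSON notification."""
    # Assumed shape: {"message": {"operationType": ..., "entity": {"guid": ...}}}
    message = json.loads(raw)["message"]
    return normalize_operation(message["operationType"]), message["entity"]["guid"]


# Hypothetical payload, used only to exercise the functions.
sample = json.dumps(
    {"message": {"operationType": "ENTITY_CREATE_V2", "entity": {"guid": "hypothetical-guid"}}}
)
summary = summarize_notification(sample)
print(summary)
```

A steward-facing alert could then branch on the normalized name alone, for example notifying only on `ENTITY_DELETE`.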

Managed identities are the preferred authentication method when interacting with Purview. Extra layers of security can be achieved by using private endpoints configured on secured Azure VNets or subnets. These give Purview secure endpoints for accessing its portal, integrating with Azure data services, and managing the Purview account.

By narrowing the scope of each scan, your team lowers the risk of data leaks caused by overly broad read permissions. Note that the account used to explore the data sources must have Data Reader access configured.

Implementing the least privilege principle while creating the data policy is also a good practice. It is an excellent practice to have different access policies: one to handle your user principals and groups, another for the self-service workflow, and a last one to manage your DevOps pipelines. Creating custom policies per organizational requirement then narrows the attack surface exposed by a lost credential. More details about best practices for building and maintaining Azure environments can be found in the Microsoft Azure Well-Architected Framework.

Now that we have covered the data governance components of the Microsoft Purview solutions, let's discuss some positive and negative aspects of using them for data governance.

The Good

The onboarding of new data sources and the capability of grouping them into logical components, called collections, are very useful. Collections let you organize sources not only by the data center servers where the data resides but also by department, grouping all the data services each department uses.

Also, lineage is not limited to database entities such as tables and views; it also lets you integrate temporary data containers created by your ingestion code. Microsoft Purview enables the exploration of temporary data containers used by Azure Data Factory, Synapse SQL entities, and data visualization tools like Power BI and Looker. In addition to cataloging sub-artifacts such as datasets and tables, along with Azure Data Factory and Azure Synapse Spark jobs, Microsoft Purview can also scan single CSV files compressed in GZIP format.

The support for stored procedures in Snowflake and code stored in Azure Databricks is also a relief for departments that run most of their analytics there. Support for modeling tools such as Erwin is also comforting for some users, since it covers legacy systems that still rely on them. Purview also supports services from other cloud providers, such as Google Cloud Storage and Amazon S3.

The Bad

Microsoft Purview is a solid data governance platform. However, like any other tool, it has some flaws. One drawback is the regional limitation of a Microsoft Purview account's coverage when scanning and onboarding new data sources. While this helps secure the data, it should be possible to manage multiple regions from a centralized platform. Creating multiple Microsoft Purview accounts, one per Azure region where your data resides, is suggested as a security best practice.

It is good to remember that additional accounts will lead to higher costs, as the compute cannot be shared. The compute configuration is set during Purview account creation, alongside the region the account will cover.

Another downside is that it is impossible to share the compute used by an Azure Data Factory self-hosted integration runtime with the Microsoft Purview self-hosted integration runtime.

The Ugly

Unfortunately, Master Data Management (MDM) is not a feature of Microsoft Purview at the time of this writing. To achieve MDM capabilities, a partner solution developed by Profisee is needed. When the two are used in conjunction, Microsoft Purview complements the MDM functionality, at the cost of an extra service that needs to be managed in addition to your Microsoft Purview environment.

For tips on working with Microsoft Purview day to day, see the earlier section on best practices when using data governance solutions in Microsoft Purview.

Now let’s look at what the Microsoft Purview Risk and Compliance solutions are.

Microsoft Purview Risk and Compliance solutions

Microsoft Purview compliance solutions have out-of-the-box classifiers that manage insider risks, information protection, and the data lifecycle based on resource tags, for compliance purposes. By adding rules, you can implement data loss prevention that asks for confirmation on some actions, such as sending an email with data objects classified as confidential. It integrates easily with your organization's Azure Active Directory. Support for Azure AD administrative units is attained by using adaptive scopes.

Microsoft Purview Risk and Compliance solutions enforce guardrails on your data assets, and thanks to the Microsoft Purview auditing solutions, your environment becomes more responsive to security events. The auditing solutions also provide valuable insights for root cause analysis in post-mortems. In addition, the following capabilities are available:

  • Communication compliance
  • Insider risk management
  • Information Barriers
  • Privileged access management
  • Data classification
  • Sensitivity labels
  • Data lifecycle retention policies
  • Data archival and deletion lifecycle policies
  • Industry-specific guidance on security and compliance for Financial and Energy services

Now that we have covered both major domains that Microsoft Purview aims to address, let's sum up what we covered in this publication.

Conclusion

Microsoft Purview solutions are a solid choice for putting compliance and risk guardrails in place while providing strong support for managing and onboarding new data into your data governance and observability environments. Based on Apache Atlas, a management and compliance framework for Hadoop environments, Purview supports prescriptive and forensic models for operationalizing your data sources, which are not limited to Hadoop.

See you folks next time!


Daniel Paes
Tech x Talent

Data-focused professional with an interest in AI for cognitive enhancement. Evangelist on the awareness of the risks about security and privacy on our data.