Azure Databricks Unity Catalog — Part 2: Get the infra — build UC metastore and initial set up

hwangdb
Mar 12, 2023

This is Part 2 of the series Azure Databricks Unity Catalog: up and running. Here we cover the infrastructure you need to set up a Unity Catalog (UC) metastore. The detailed steps are all in the official documentation; this article lists them briefly.

Resources required for a UC metastore

Step 1: Create an Azure Databricks Access Connector

The Access Connector is a first-party Azure service, and its network traffic stays on the Azure backbone network. You can build the connector with either a system-assigned managed identity (the default) or a user-assigned managed identity; see the documentation for details.

Create Access Connector on Azure
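If you prefer to keep the connector in an ARM template rather than clicking through the portal, the sketch below builds its resource definition as a Python dict. It deploys nothing. The `apiVersion` value and the connector name are assumptions, so check them against the current `Microsoft.Databricks/accessConnectors` template reference before use.

```python
from __future__ import annotations

import json
from typing import Optional


def access_connector_resource(
    name: str,
    location: str,
    user_assigned_identity_id: Optional[str] = None,
) -> dict:
    """Build an ARM resource definition for an Azure Databricks Access Connector.

    Defaults to a system-assigned managed identity; pass the resource ID of a
    user-assigned managed identity to use that instead.
    """
    if user_assigned_identity_id is None:
        identity = {"type": "SystemAssigned"}
    else:
        identity = {
            "type": "UserAssigned",
            "userAssignedIdentities": {user_assigned_identity_id: {}},
        }
    return {
        "type": "Microsoft.Databricks/accessConnectors",
        "apiVersion": "2023-05-01",  # assumption: check the current schema version
        "name": name,
        "location": location,
        "identity": identity,
        "properties": {},
    }


if __name__ == "__main__":
    # Hypothetical connector name; the region matches this article's example.
    print(json.dumps(access_connector_resource("uc-access-connector", "southeastasia"), indent=2))
```

Keeping the system-assigned identity as the default mirrors the portal's default choice.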

Step 2: Create an ADLS storage account

Next, create a storage account. This is the actual storage behind your UC metastore. For production, we recommend zone-redundant storage (ZRS). Always tick Enable hierarchical namespace so that the account is ADLS Gen2.

ZRS recommended for production
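The same settings can be expressed as an ARM resource definition. This is a minimal sketch: `isHnsEnabled` corresponds to the "Enable hierarchical namespace" checkbox and `Standard_ZRS` to the recommended redundancy. The account name, the `apiVersion`, and the use of LRS for non-production are assumptions.

```python
from __future__ import annotations

import json


def uc_storage_account_resource(name: str, location: str, production: bool = True) -> dict:
    """Build an ARM resource definition for the ADLS Gen2 account behind a UC metastore."""
    return {
        "type": "Microsoft.Storage/storageAccounts",
        "apiVersion": "2023-01-01",  # assumption: check the current schema version
        "name": name,
        "location": location,
        "kind": "StorageV2",  # general-purpose v2 account, which supports hierarchical namespace
        # ZRS for production as recommended; LRS for non-production is an assumption.
        "sku": {"name": "Standard_ZRS" if production else "Standard_LRS"},
        "properties": {
            "isHnsEnabled": True,  # the "Enable hierarchical namespace" checkbox (ADLS Gen2)
        },
    }


if __name__ == "__main__":
    # Hypothetical account name in the article's example region.
    print(json.dumps(uc_storage_account_resource("ucmetastoresea", "southeastasia"), indent=2))
```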

Note that each region can have only one UC metastore (per Databricks account), and that metastore must use a storage account in the same region. In this example, I create a UC metastore in Southeast Asia backed by a storage account in Southeast Asia.
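A small pre-flight check can catch a region mismatch before you create the metastore. This sketch assumes that removing spaces and lowercasing turns a portal display name such as "Southeast Asia" into its ARM form "southeastasia". That holds for common regions but is a simplification, not an official mapping.

```python
def normalize_region(region: str) -> str:
    """Map a display name like 'Southeast Asia' to the ARM form 'southeastasia'.

    Assumption: stripping spaces and lowercasing is enough for common regions.
    """
    return region.replace(" ", "").lower()


def check_same_region(metastore_region: str, storage_location: str) -> None:
    """Raise ValueError if the metastore and its storage account are in different regions."""
    if normalize_region(metastore_region) != normalize_region(storage_location):
        raise ValueError(
            f"metastore region {metastore_region!r} does not match "
            f"storage account location {storage_location!r}"
        )


if __name__ == "__main__":
    check_same_region("Southeast Asia", "southeastasia")  # passes silently
```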

Now open the storage account's Access control (IAM) blade and add a role assignment that grants Storage Blob Data Contributor to the managed identity of the access connector you created in Step 1. Repeat this to grant the Storage Queue Data Contributor role to the same identity. These two assignments correspond to steps 2 and 3 of the official documentation.

Add storage blob data contributor role to managed identity
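To script the two role assignments, one option is to generate Azure CLI commands. The sketch below only builds the `az role assignment create` argument lists and does not run them. The principal ID (the object ID of the connector's managed identity) and the storage account resource ID are placeholders you must fill in.

```python
from __future__ import annotations

import shlex
from typing import List

# The two roles this article assigns to the access connector's managed identity.
UC_STORAGE_ROLES = ("Storage Blob Data Contributor", "Storage Queue Data Contributor")


def role_assignment_commands(principal_id: str, storage_account_id: str) -> List[List[str]]:
    """Return one `az role assignment create` argument list per required role."""
    return [
        [
            "az", "role", "assignment", "create",
            "--assignee-object-id", principal_id,
            # A managed identity is represented by a service principal in Entra ID.
            "--assignee-principal-type", "ServicePrincipal",
            "--role", role,
            "--scope", storage_account_id,  # scope the grant to this storage account only
        ]
        for role in UC_STORAGE_ROLES
    ]


if __name__ == "__main__":
    # Placeholders: substitute your connector's principal ID and storage account resource ID.
    for argv in role_assignment_commands(
        "<connector-principal-id>",
        "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
        "Microsoft.Storage/storageAccounts/<account>",
    ):
        print(shlex.join(argv))
```

Building argument lists instead of shell strings lets you pass them directly to `subprocess.run` without quoting issues.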

Then go to Networking, select Enabled from selected virtual networks and IP addresses, and add the access connector's resource ID as an allowed resource instance. You should also add the Databricks workspace's VNet to the selected networks (Part 4 covers networking options in detail).
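In an ARM template, these portal choices roughly correspond to the storage account's `properties.networkAcls` block, with `publicNetworkAccess` left `Enabled`. The sketch below builds that block. The tenant ID, connector ID, and subnet IDs are hypothetical inputs. The Databricks subnets typically also need the `Microsoft.Storage` service endpoint before a VNet rule can reference them; treat that as something to verify for your setup.

```python
from __future__ import annotations

from typing import Iterable


def uc_network_acls(tenant_id: str, access_connector_id: str,
                    databricks_subnet_ids: Iterable[str]) -> dict:
    """Build a storage account `networkAcls` block for the UC metastore account.

    Deny by default, allow the access connector as a resource instance, and
    allow the Databricks workspace subnets as virtual network rules.
    """
    return {
        "defaultAction": "Deny",  # "selected virtual networks and IP addresses"
        "resourceAccessRules": [
            {"tenantId": tenant_id, "resourceId": access_connector_id},
        ],
        "virtualNetworkRules": [
            {"id": subnet_id, "action": "Allow"} for subnet_id in databricks_subnet_ids
        ],
        "ipRules": [],  # add IP ranges here if you need them
    }
```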

Now create a container in the storage account to serve as your UC metastore's root storage location. This ADLS Gen2 path holds the data files of managed tables whose parent schema and catalog do not have their own storage location configured.
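The sketch below shows two things. First, how the root path is usually written (an `abfss://<container>@<account>.dfs.core.windows.net/` URL). Second, the fallback order described above: schema location, then catalog location, then metastore root. The account and container names are hypothetical. The name checks encode Azure's published naming rules and are not a Databricks API.

```python
from __future__ import annotations

import re
from typing import Optional

# Azure naming rules: account names are 3-24 lowercase letters/digits; container names
# are 3-63 lowercase letters, digits and non-consecutive hyphens, starting and ending
# with a letter or digit.
_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
_CONTAINER_RE = re.compile(r"(?!.*--)[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def metastore_root(container: str, account: str) -> str:
    """Return the ADLS Gen2 (abfss) URL of a container, used as the metastore root."""
    if not _ACCOUNT_RE.fullmatch(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    if not _CONTAINER_RE.fullmatch(container):
        raise ValueError(f"invalid container name: {container!r}")
    return f"abfss://{container}@{account}.dfs.core.windows.net/"


def managed_table_location(
    metastore_root_url: str,
    catalog_location: Optional[str] = None,
    schema_location: Optional[str] = None,
) -> str:
    """Pick where a managed table's files land: schema, then catalog, then metastore root."""
    for location in (schema_location, catalog_location, metastore_root_url):
        if location:
            return location
    raise ValueError("no storage location available")
```

For example, `metastore_root("unity-catalog", "ucmetastoresea")` yields `abfss://unity-catalog@ucmetastoresea.dfs.core.windows.net/`, and a managed table in a schema and catalog without their own locations resolves to that root.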

Step 3: Create the UC metastore in the account console and assign it to workspaces

Fill in the required info to create your UC metastore

Because only one metastore per region is allowed, plan to segregate groups at the catalog and schema levels rather than with separate metastores.

Next, assign the metastore to the selected workspaces in the same region. You are now ready to create catalogs and schemas and to manage your user groups' access in Unity Catalog.
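Before assigning metastores, it can help to plan which metastore each workspace should get. The sketch below enforces the two rules from this section: at most one metastore per region, and each workspace gets the metastore in its own region. The `Metastore` and `Workspace` types are hypothetical stand-ins, not objects from a Databricks SDK. The region normalization (strip spaces, lowercase) is a simplifying assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Metastore:  # hypothetical stand-in for a metastore record
    name: str
    region: str


@dataclass(frozen=True)
class Workspace:  # hypothetical stand-in for a workspace record
    name: str
    region: str


def _norm(region: str) -> str:
    # Assumption: 'Southeast Asia' -> 'southeastasia' is enough for common regions.
    return region.replace(" ", "").lower()


def plan_assignments(metastores: Iterable[Metastore],
                     workspaces: Iterable[Workspace]) -> Dict[str, str]:
    """Map each workspace name to the name of the metastore in its region."""
    by_region: Dict[str, Metastore] = {}
    for metastore in metastores:
        region = _norm(metastore.region)
        if region in by_region:
            raise ValueError(f"more than one metastore in region {region!r}")
        by_region[region] = metastore
    plan: Dict[str, str] = {}
    for workspace in workspaces:
        metastore = by_region.get(_norm(workspace.region))
        if metastore is None:
            raise ValueError(f"no metastore in the region of workspace {workspace.name!r}")
        plan[workspace.name] = metastore.name
    return plan
```

For instance, two Southeast Asia workspaces both map to a single Southeast Asia metastore, while a workspace in a region without a metastore raises an error instead of being silently skipped.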

In Part 3, we will look at how to automate the setup of the Unity Catalog metastore.
