Streamlining Databricks Catalog Permission Management
The Databricks Unity Catalog is the industry’s only unified and open governance solution for data and AI, built into the Databricks Data Intelligence Platform. With Unity Catalog, organizations can seamlessly govern both structured and unstructured data in any format, as well as machine learning models, notebooks, dashboards and files across any cloud or platform. This unified and open approach to governance promotes interoperability and accelerates data and AI initiatives while simplifying regulatory compliance.
One key benefit of Unity Catalog is that it allows for fine-grained access controls. Being able to navigate and understand these access controls is fundamental to proper data governance and helps ensure that the right people have access to the right data. While access controls are easy to set up and are viewable within the Databricks UI, there is no single place to view them all together for a specific group of users. To see all the grants for a specific group, you need to query each catalog, schema and table individually. This blog post provides a way to do that programmatically, both saving time and decreasing the probability of human oversight. To show access grants end to end, we will also create the user groups, catalogs, schemas and tables we want to assess programmatically. In addition to learning how to view catalog permissions, readers who follow this post can expect to gain a deeper understanding of Databricks access controls and the databricks-sdk in general.
Concepts
To better understand how to achieve programmatic access control, I’ll explain two concepts essential for managing RBACs (Role Based Access Controls) and ACLs (Access Control Lists) in Databricks.
1. Workspaces and Accounts:
- Workspaces are the individual environments where users can create and manage their notebooks, jobs, and other resources.
- Accounts are the higher-level organizational units that contain multiple workspaces. They provide a way to manage billing, user access, and other administrative tasks across multiple workspaces.
2. Users, Service Principals and Groups:
- Users are individual accounts that can log in to Databricks and perform actions within a workspace.
- Service principals are non-human accounts that can be used to authenticate applications or services to access Databricks resources.
- Groups are collections of users or service principals that can be assigned permissions to resources within a workspace. Groups can be used to simplify permission management by allowing you to assign permissions to a group rather than individual users or service principals.
Databricks recommends using groups to manage permissions because it allows for easier management of access controls. By assigning permissions to groups rather than individual users or service principals, you can simplify the process of managing access to resources and reduce the risk of errors or inconsistencies in permission assignments.
Groups and service principals can be created at both the workspace and account levels, but workspace-level groups cannot be granted privileges on Unity Catalog securables. So, in this post we will create both a Workspace Client and an Account Client: the Account Client is used to create the groups, and the Workspace Client is used to create the catalogs, schemas and tables.
With these two concepts out of the way we’re ready to dive into the code!
Plan of Attack
We can’t query permissions without having any groups or permissions set up in the first place. To that end, summarizing tables and privileges will be the last of four high-level steps:
- Create groups
- Create a catalog with a set of schemas and tables
- Assign groups to random sets of schemas and tables
- Query access for a group
Install SDK and configure clients
We’ll use the Databricks SDK for Python to manage our workspace and account. The SDK is a powerful tool that allows you to interact with Databricks resources programmatically.
Using the SDK, we’ll create a workspace client and an account client. A common way to configure the clients to access the correct Databricks resources is to use a .databrickscfg file, which is similar to a .env file. This file contains the necessary credentials and configuration settings to connect to your Databricks workspace and account. You’ll see that I configure an ACCOUNT client and a WORKSPACE client.
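As a minimal sketch of that setup (the profile names `ACCOUNT` and `WORKSPACE` and the placeholder values are assumptions; use whatever you put in your own `.databrickscfg`):

```python
# pip install databricks-sdk

def make_clients(account_profile="ACCOUNT", workspace_profile="WORKSPACE"):
    """Build the account and workspace clients from named profiles in
    ~/.databrickscfg. Example file contents (placeholders, not real values):

    [ACCOUNT]
    host          = https://accounts.cloud.databricks.com
    account_id    = <your-account-id>
    client_id     = <service-principal-client-id>
    client_secret = <service-principal-secret>

    [WORKSPACE]
    host  = https://<your-workspace>.cloud.databricks.com
    token = <personal-access-token>
    """
    # Imported lazily so this sketch parses even before the SDK is installed.
    from databricks.sdk import AccountClient, WorkspaceClient

    a = AccountClient(profile=account_profile)
    w = WorkspaceClient(profile=workspace_profile)
    return a, w
```

The functions in the rest of this post take these clients as parameters, so you can construct them once and pass them around.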
1. Create groups
With the prerequisites out of the way, we’re ready to create some groups. In the cell below, I create two groups using the account client. The one we’re interested in is called `marshall-super-group` but I also created `some-other-group` so that we can set up assets that might have multiple access groups assigned to them.
I’ll note again that groups should be created at the account level so that they can manage permissions.
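A sketch of that cell, assuming an `AccountClient` configured as above; the group names come from this post, and the helper signature is my own:

```python
def create_groups(account_client, names=("marshall-super-group", "some-other-group")):
    """Create account-level groups so they can be granted Unity Catalog
    privileges. Returns a mapping of group name -> assigned group id."""
    created = {}
    for name in names:
        # groups.create on the account client returns the new Group,
        # including the id Databricks assigned to it.
        group = account_client.groups.create(display_name=name)
        created[name] = group.id
    return created
```

Call it as `create_groups(a)` with the account client from the previous step.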
2. Create a catalog with a set of schemas and tables
In this section, we create a catalog and a set of schemas and tables. The catalog is created using the workspace client, and the schemas and tables are created within that catalog.
There are a decent number of lines below, but reading through them should be fairly straightforward.
We have two helper functions:
- raise_if_failed: raise an exception with an error message if a statement we run doesn’t succeed
- recreate_catalog: create the catalog, or rebuild it if it already exists
And we have our main function create_catalog_assets. This function creates a catalog, a set of schemas, and runs 15 create table queries across those schemas. The function also supports passing an external storage location if you have that configured though setting it up is outside the scope of this post. In the example, we simply pass None for the external storage location.
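A condensed sketch of those functions. The names `raise_if_failed`, `recreate_catalog`, and `create_catalog_assets` come from the post; the catalog/schema/table names, the warehouse id, and the exact failure check are assumptions, and `w` is a `WorkspaceClient`:

```python
def raise_if_failed(response):
    """Raise if a SQL statement run via the Statement Execution API failed."""
    status = response.status
    if getattr(status, "error", None):  # the status carries an error on failure
        raise RuntimeError(f"Statement failed: {status.error.message}")

def recreate_catalog(w, catalog_name):
    """Create the catalog, dropping it first if it already exists."""
    try:
        w.catalogs.delete(catalog_name, force=True)  # force drops contained assets
    except Exception:
        pass  # catalog didn't exist yet
    return w.catalogs.create(name=catalog_name)

def create_catalog_assets(w, warehouse_id, catalog_name="permissions_demo",
                          n_schemas=3, n_tables=15, storage_root=None):
    """Create a catalog, a set of schemas, and n_tables tables spread
    round-robin across those schemas. storage_root may be None to use the
    metastore default, or an external storage location if you have one."""
    recreate_catalog(w, catalog_name)
    schemas = [f"schema_{i}" for i in range(n_schemas)]
    for s in schemas:
        w.schemas.create(name=s, catalog_name=catalog_name, storage_root=storage_root)
    for t in range(n_tables):
        schema = schemas[t % n_schemas]
        resp = w.statement_execution.execute_statement(
            warehouse_id=warehouse_id,
            statement=f"CREATE TABLE {catalog_name}.{schema}.table_{t} "
                      f"(id INT, value STRING)",
        )
        raise_if_failed(resp)
    return catalog_name, schemas
```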
3. Assign groups to tables and schemas
Now that we have a way to create catalogs, schemas and tables, we can assign groups to them. Since we’re assigning permissions to many tables, we can create another helper function called grant_table_access that chooses a random access level and assigns it to a table. The statement it runs is simply:
GRANT {privilege} ON TABLE {table_name} TO `{group_name}`;

Next, we decide which tables to grant access to. The logic below for assigning access is admittedly silly, but it demonstrates what we’re trying to achieve:
- Generate a random number
- If above X, both groups get access
- If below X and above Y, a random group gets access
- If below Y, no access is granted
Lastly, we grant access to a schema to make sure we can show that level of access in the final query. The modified function create_catalog_assets is below.
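The grant helper and selection logic can be sketched as follows. The SQL template is the one quoted above; the privilege list, the default thresholds for X and Y, and the `rng` parameter are assumptions of mine (the post leaves X and Y unspecified):

```python
import random

PRIVILEGES = ("SELECT", "MODIFY", "ALL PRIVILEGES")

def grant_statement(privilege, table_name, group_name):
    """The GRANT statement quoted above, ready to run via execute_statement."""
    return f"GRANT {privilege} ON TABLE {table_name} TO `{group_name}`;"

def choose_groups(groups, x=0.6, y=0.3, rng=random):
    """Decide which groups get access to a table:
    roll > x -> both groups; y < roll <= x -> one random group; else none."""
    roll = rng.random()
    if roll > x:
        return list(groups)
    if roll > y:
        return [rng.choice(groups)]
    return []

def grant_table_access(w, warehouse_id, table_name, group_name, rng=random):
    """Grant a randomly chosen privilege on one table to one group."""
    privilege = rng.choice(PRIVILEGES)
    w.statement_execution.execute_statement(
        warehouse_id=warehouse_id,
        statement=grant_statement(privilege, table_name, group_name),
    )
    return privilege
```

Passing `rng` in makes the random choices easy to pin down when testing.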
In short, we now have some schemas, tables and two groups, and no idea which group has access to which asset.
4. Query access for a group
So far, we have created our groups and randomly assigned them to a bunch of tables and schemas that we also created. We can continue to use the workspace client to query access for our group within a catalog.
Our approach is as follows:
- Find all schemas in our catalog
- Collect schema level permissions (Schema level permissions are inherited by tables within the schema)
- For each schema
- Find all tables in the schema
- Collect table level permissions
- Collect function level permissions (if any exist)
The function get_catalog_permissions takes a group name and a catalog name and returns a list of permissions for that group. We run the function on both of the groups we created earlier and print the results.
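A sketch of that walk, assuming a `WorkspaceClient` `w`. The function name and return shape follow the post's description; note that depending on your SDK version, `grants.get` may expect a `SecurableType` enum from `databricks.sdk.service.catalog` rather than the plain strings used here:

```python
def get_catalog_permissions(w, group_name, catalog_name):
    """Walk every schema in the catalog and collect the grants held by
    group_name on schemas, tables, and functions. Returns a list of
    (securable_type, full_name, privileges) tuples."""
    def grants_for(securable_type, full_name):
        # Newer SDK versions accept a catalog.SecurableType enum here.
        info = w.grants.get(securable_type=securable_type, full_name=full_name)
        return [
            [str(p) for p in pa.privileges]
            for pa in (info.privilege_assignments or [])
            if pa.principal == group_name
        ]

    found = []
    for schema in w.schemas.list(catalog_name=catalog_name):
        # Schema-level permissions are inherited by the tables inside.
        for privs in grants_for("schema", schema.full_name):
            found.append(("schema", schema.full_name, privs))
        for table in w.tables.list(catalog_name=catalog_name, schema_name=schema.name):
            for privs in grants_for("table", table.full_name):
                found.append(("table", table.full_name, privs))
        for fn in w.functions.list(catalog_name=catalog_name, schema_name=schema.name):
            for privs in grants_for("function", fn.full_name):
                found.append(("function", fn.full_name, privs))
    return found
```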
Finally, we can run our function and view the output we were looking for.
Cleanup
Mission accomplished! As a final step, we can clean up the catalog we created as well as its underlying assets with a one-liner.
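For reference, the one-liner relies on the `force` flag of the catalog delete call, which cascades to everything inside the catalog (a sketch, assuming a `WorkspaceClient` `w` and the catalog name used earlier):

```python
def cleanup_catalog(w, catalog_name="permissions_demo"):
    # force=True drops the catalog together with all of its schemas and tables.
    w.catalogs.delete(catalog_name, force=True)
```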
Conclusion and Next Steps
Investigating grants across a Databricks workspace is a common task for platform administrators. While the Databricks UI can be used to view permissions on individual assets, it can be cumbersome to navigate through the UI to find the information you need. With the Databricks SDK for Python, we can automate this process, making it much easier to manage and view our catalog permissions.
Along the way, we talked about some important concepts in Databricks, including workspaces, accounts, users, service principals, and groups. We also discussed the importance of using groups to manage permissions and how to create and assign permissions to groups programmatically.
Investigating governance across the organization is never a one-time need. Luckily, Databricks has the capabilities to turn our simple script into an end-to-end audit report. Doing so is outside the scope of this post, but I list the steps below for those who might find it useful.
- Schedule Script: Configure the script to run regularly using Databricks Jobs or Workflows.
- Store output in Delta Table: Ensure the output is stored in a Databricks Delta table for efficient querying and historical tracking.
- Build a Dashboard: Query the data and design dashboard visualizations to display group access and privileges for easy auditing.
- Monitor Regularly: Use the dashboard for ongoing monitoring of permission changes and adherence to policies.
This post touched on only one aspect of data governance. Databricks provides additional recommendations for using Unity Catalog to meet your organization’s data governance needs. I encourage curious readers to visit our documentation and learn about further best practices regarding Unity Catalog.

