Providing Fine-Grained Access Control (FGAC) for single-user clusters

Data filtering for everyone

Michele Lamarca
Sep 6, 2024


TL;DR

Databricks users have many different needs, and there is often tension between choosing the right compute for your job and enforcing data access control. With the latest Public Preview, customers using single-user clusters can combine the benefits of single-user compute with Fine-Grained Access Control (FGAC).

What is FGAC?

When we talk about access control, we usually mean coarse-grained access control: granting access at the catalog, schema, and table levels.
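In Unity Catalog, coarse-grained control is expressed with standard GRANT statements. A minimal sketch (catalog, schema, and group names here are illustrative):

-- Coarse-grained: grant access at the catalog, schema, and table level
GRANT USE CATALOG ON CATALOG main TO `data_analysts`;
GRANT USE SCHEMA ON SCHEMA main.sales TO `data_analysts`;
GRANT SELECT ON TABLE main.sales.orders TO `data_analysts`;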

In certain scenarios, you want to restrict access only to select columns or rows. This is called Fine-Grained Access Control. You can implement such controls by different means: views (standard, materialized, dynamic), row filters, and column masks.
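Row filters and column masks, for instance, are plain SQL UDFs attached to a table. A minimal sketch with illustrative names:

-- Row filter: admins see every row, everyone else only US rows
CREATE FUNCTION region_filter(region STRING)
RETURN is_account_group_member('admins') OR region = 'US';
ALTER TABLE sales SET ROW FILTER region_filter ON (region);

-- Column mask: reveal emails only to members of the auditors group
CREATE FUNCTION email_mask(email STRING)
RETURN CASE WHEN is_account_group_member('auditors') THEN email ELSE '***' END;
ALTER TABLE sales ALTER COLUMN email SET MASK email_mask;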

Data filtering enables FGAC for single-user compute for:

  • Views built over tables that the user does not have the SELECT privilege on
  • Dynamic views (see the sketch after this list)
  • Tables with row filters or column masks defined
  • Materialized views and streaming tables
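
A dynamic view covers the same ground at query time, deciding per user what to return. A minimal sketch, again with illustrative names:

-- Dynamic view: redact the email column for non-auditors
CREATE VIEW sales_redacted AS
SELECT
  id,
  CASE WHEN is_account_group_member('auditors') THEN email ELSE 'REDACTED' END AS email,
  country
FROM sales_raw;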

When and how to enable

In the context of all-purpose and jobs compute, Databricks offers two different access modes for Unity Catalog compute: single-user and shared.

The general recommendation is to use shared clusters for all your workloads. Reality is more nuanced: shared clusters provide data isolation among users, but they are not the right choice if you need to code in R, use RDDs, or run Databricks Runtime ML; in those cases you need a single-user cluster.

To enable FGAC, you need serverless compute enabled in your workspace and a cluster running Databricks Runtime 15.4 LTS or newer.
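For a cluster created through the Clusters API, the relevant settings look roughly like this (a sketch: node type and user name are placeholders, and serverless itself is enabled at the workspace level rather than in the cluster spec):

{
  "cluster_name": "fgac-single-user",
  "spark_version": "15.4.x-scala2.12",
  "data_security_mode": "SINGLE_USER",
  "single_user_name": "user@example.com",
  "node_type_id": "i3.xlarge",
  "num_workers": 2
}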

How does it work

When using a single-user cluster, your queries take one of two paths, depending on which level of access control applies. In the image below, the green arrows represent coarse-grained access control, while the orange arrows indicate fine-grained access control.

When using a single-user cluster, you have dedicated compute resources, and Spark can only fetch the entire table; it cannot enforce FGAC on its own. To enforce FGAC, the client must be decoupled from the compute that fetches the data. This is why, with FGAC, your cluster delegates to a serverless “Filtering Service” that evaluates the relevant dynamic view, row filter, or column mask and returns only the permitted data.

Suppose you have the following rights:

  • SELECT on table_1
  • SELECT on view_2 (not on table_2)
  • SELECT on table_w_rls

When executing the first query, your single-user compute will handle the access to the table, while the serverless compute will handle the data filtering for the other two cases.
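
In SQL terms, with the rights listed above, the three queries would run as follows:

-- Served directly by your single-user cluster (coarse-grained access)
SELECT * FROM table_1;

-- Handled by the serverless filtering service (no SELECT on the underlying table_2)
SELECT * FROM view_2;

-- Handled by the serverless filtering service (row-level security applies)
SELECT * FROM table_w_rls;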

Note that you will incur additional costs for the serverless compute used by the data filtering service.

How to verify it’s working

If you want to explore the feature, you can use the “Table ACL & Row and Column Level Security With Unity Catalog” demo from the Databricks Demo Center. The notebook on row- and column-level access control contains relevant examples. Keep in mind that the notebook tells you to use “Shared” access mode; to test the new feature, select “Single user” access mode instead.

Suppose you have a row filter on dbdemos.uc_acl.customers. Even before running a query, you can verify whether it will succeed (with filtering applied) or fail.
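
For context, such a row filter would have been attached with a statement along these lines (a hypothetical sketch; the demo notebook creates its own filter function):

-- Hypothetical: restrict rows of the customers table by country
ALTER TABLE dbdemos.uc_acl.customers
  SET ROW FILTER dbdemos.uc_acl.country_filter ON (country);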

You can ask Spark to show the physical plan for the query with:

EXPLAIN SELECT DISTINCT(country) FROM customers;

When using DBR version 15.3 or older, Spark does not leverage the filtering service, so you’ll get an error message like the following:

Failed to acquire a SAS token for list on /metastore/[…]/tables/[…]/_delta_log
due to java.util.concurrent.ExecutionException:
com.databricks.sql.managedcatalog.UnityCatalogServiceException:
[RequestId=fb10520a-c43b-4b94-be89-9fa986a6138a
ErrorClass=INVALID_PARAMETER_VALUE.ROW_COLUMN_ACCESS_POLICIES_NOT_SUPPORTED_ON_ASSIGNED_CLUSTERS]
Query on table dbdemos.uc_acl.customers with row filter or column mask
not supported on assigned clusters.

With the correct configuration, you will instead see:

== Physical Plan ==
*(1) Project [country#782]
+- RemoteSparkConnectScan `dbdemos`.`uc_acl`.`customers`[country#782]
   class com.databricks.sql.remotefiltering.SparkConnectScan RuntimeFilters: []

The RemoteSparkConnectScan node indicates that the data filtering service will be used. You can find more detailed information by looking at the “SQL/Dataframe” tab in the Spark UI.

Conclusion

Customers who need to use the Single User access mode (e.g., users of distributed ML libraries, GPUs, RDD-heavy libraries, the R language, DCS, etc.) may also want to take advantage of Unity Catalog’s fine-grained access control features (views, row filters, column masks), as well as Materialized Views and Streaming Tables. With this Public Preview, they can.

NB: The views/opinions expressed in the blog are my own and do not necessarily represent the views/opinions of Databricks.
