
Databricks System Tables Overview — the Node Types and Clusters Tables

Rob Wesley


Part 3

Note: The views expressed in this blog are my own and do not represent the official stance of Databricks.

What are Databricks System Tables, and how do you use them? I want to provide some answers to these questions in a series of posts. My intention is to cover them in depth, so I’ll be covering one to two tables in each post, providing details on the tables and example queries for these tables. This is part 3 of the series. Part 2 covered the list_prices and usage tables.

These are intended to be introductory-level posts, and I will provide SQL queries for these tables. However, these queries are written to be understandable rather than focused on performance. If you have a lot of experience with SQL, you’ll have different ways of writing the queries and the ability to write much more complex queries against these tables.

Overview

From part 1, here is a recap of the two tables we’re going to look at in this post:

system.compute.node_types — A regional dimension table providing details on available node types and hardware information such as the number of cores and the amount of memory. Join this to the clusters table to flesh out details on your cluster hardware.

system.compute.clusters — A regional Type 2 slowly changing dimension providing a history of changes made to all clusters (all-purpose and jobs) in the region. This does not have information on DBSQL warehouses (fear not, there are other tables that will provide information on your warehouses). Any question about when/how a cluster was changed, or about the current setup of a cluster, will use this table to get the answer. For instance, if you want to know how many clusters use a particular init script, or a particular instance pool, this is the table that provides those answers.

Node Types

This link provides Databricks documentation for system.compute.node_types. Let’s run a describe table query to get a list of the table’s columns and datatypes.

describe table system.compute.node_types

Node_types is a pretty straightforward table. The node_type column ties to the node_type columns in other tables (e.g., usage, clusters) and acts as the primary key of node_types. The remaining columns provide the hardware specs for the node type: how many cores, how much memory in megabytes, and how many GPUs.
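For example, a quick way to browse what’s available (this sketch just converts memory from megabytes to gigabytes for readability) is:

select
  node_type,
  core_count,
  memory_mb / 1024 as memory_gb,
  gpu_count
from
  system.compute.node_types
order by
  core_count,
  memory_mb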

Node Type Questions

What are some questions that node_types can answer by itself?

Are there any node types that have more than one GPU?

select
  node_type,
  memory_mb,
  core_count,
  gpu_count
from
  system.compute.node_types
where
  gpu_count >= 2
order by
  gpu_count desc,
  core_count desc

Running this in an Azure account shows that there are currently nine node types in Azure Databricks with multiple GPUs, the Standard_ND96asr_v4 node type having 8 of them.

What node type has both the fewest cores and the least amount of memory?

with min_count as (
  select
    min(core_count) as min_cores,
    min(memory_mb) as min_mb
  from
    system.compute.node_types
)
select
  node_type,
  m.*
from
  system.compute.node_types n
  inner join min_count m
    on n.memory_mb = m.min_mb
    and n.core_count = m.min_cores

If you ran the CTE alone in an Azure account, you would see that the minimum number of cores is 4, and the minimum memory is 8192 MB (8 GB). From the CTE alone, however, you don’t know whether those two values come from the same row(s). Running the full query in an Azure account will show that there are three node types in Azure Databricks, each with both 4 cores and 8 GB of memory. So those three node types have both the fewest cores and the least amount of memory. Note there may be other node types that have 4 cores but more memory (hint: there are) or node types with 8 GB of memory but more than 4 cores (hint: there aren’t).
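If you want to verify those hints yourself, a quick follow-up sketch along these lines lists every node type that has either 4 cores or 8192 MB of memory:

select
  node_type,
  core_count,
  memory_mb
from
  system.compute.node_types
where
  core_count = 4
  or memory_mb = 8192
order by
  core_count,
  memory_mb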

Clusters

This link provides Databricks documentation for system.compute.clusters. Let’s run a describe table query to get a list of the table’s columns and datatypes.

describe table system.compute.clusters

Remember that clusters is a regional table and a Type 2 slowly changing dimension. So, it only has data for clusters (all-purpose and jobs, not warehouses) in the region where the query is executed, and it will return the entire history of each cluster unless a filter is applied to limit it. Change_date and change_time are the columns used to determine history and to find the current row. Note that, unlike list_prices, the current row can’t be found by looking for a null date. We’ll see in the example queries how to find the current row using the row_number function.

Because this table contains a history of changes to a cluster, some columns have the potential for their value to change from row to row for the same cluster_id, whereas others will be constant for the cluster across all of its rows. In the discussion below, I’ll use the phrase "variable by time" for columns whose values can change from row to row for a cluster, and either "constant" or "fixed" for columns whose values don’t change for a cluster.

You can see from the above that clusters has quite a few columns.

As you would expect, account_id has the same value in all rows. Workspace_id is the workspace where the cluster exists (or existed, if it has been deleted) and is constant for a cluster. Cluster_id is fixed for each cluster. Cluster_name is the name of the cluster and is variable by time. Owned_by is the user ID of the cluster owner and is variable by time. Create_time is fixed for a cluster and records when the cluster was created. Delete_time is null unless the cluster was deleted; at most one row per cluster will have a non-null delete_time.

The node type columns, driver_node_type and worker_node_type, are the cluster’s node types and are variable by time. For single-node clusters, driver_node_type and worker_node_type will have the same value. Worker_count shows the number of workers for non-autoscale clusters and is variable by time; it is zero for single-node clusters and null for autoscaling clusters. Min_autoscale_workers and max_autoscale_workers are the minimum and maximum worker counts for autoscaling clusters and are variable by time.

Auto_termination_minutes is the auto-termination timeout in effect at the time of that row and is variable by time; if auto-termination is off, it is null (I highly recommend setting an auto-termination time for all-purpose clusters). Enable_elastic_disk is either true or false and is variable by time. Tags is a map of tag strings and is variable by time. See this link for more information on the map datatype. Cluster_source indicates how the cluster was created (JOB, API, or UI) and is constant for a cluster. Init_scripts is an array of strings listing the cluster’s init scripts and is variable by time.

The attributes columns (azure_attributes, aws_attributes, gcp_attributes) hold any attributes specific to the cloud the account is on. Each is null unless the account is on that cloud; otherwise it is non-null (although fields within the struct may be null) and is variable by time. The instance pool ID columns, driver_instance_pool_id and worker_instance_pool_id, are either null or hold the instance pool IDs for the driver and workers at the time of that row and are variable by time. Dbr_version is the DBR version and is variable by time. As mentioned earlier, change_date and change_time are the columns used to determine history and to find the current row.
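As a quick illustration of the map and array columns, here is a sketch that pulls a tag value out of tags and looks for clusters referencing a particular init script (the tag key and script path below are made up for the example):

select
  cluster_id,
  cluster_name,
  tags['team'] as team_tag, -- hypothetical tag key
  init_scripts
from
  system.compute.clusters
where
  array_contains(init_scripts, '/Volumes/main/default/scripts/my_init.sh') -- hypothetical path

Run against the raw table, this returns every historical row that matches; once the clusters_current view is created below, you would typically query that instead.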

Clusters Questions

What are the current records for all of the clusters in the region?

SELECT
  *,
  ROW_NUMBER() OVER (
    PARTITION BY cluster_id
    ORDER BY change_time DESC
  ) AS row_num
FROM
  system.compute.clusters
QUALIFY row_num = 1

If you haven’t used the ROW_NUMBER() window function before, see this link for more information. Since we’re partitioning by cluster_id, and ordering by change_time descending, by using the QUALIFY clause, we get the newest (i.e., the current) row for each cluster. Since it’s very helpful to be able to quickly get the current cluster record, rather than always remembering this syntax, we’ll turn this query into a view to use in our queries below.

CREATE CATALOG system_reporting;
CREATE SCHEMA system_reporting.compute;

CREATE OR REPLACE VIEW system_reporting.compute.clusters_current AS
SELECT
  *,
  ROW_NUMBER() OVER (
    PARTITION BY cluster_id
    ORDER BY change_time DESC
  ) AS row_num
FROM
  system.compute.clusters
QUALIFY row_num = 1

If you’ve already got a system_reporting catalog, or a system_reporting.compute schema, you won’t need those two commands. Now that we have system_reporting.compute.clusters_current, we can join to this view instead of rewriting that query every time. My naming of the view may be cryptic, but I wanted to name it similarly to the base clusters system table so that people looking at the system_reporting.compute schema can recognize that this view is a form of the system.compute.clusters table. Your naming convention may vary to match whatever standards your organization has. It also makes it very easy to change queries that use the base clusters table to use this view.

What clusters currently have autoscaling turned on with max workers set to 50 or more?

select
  *
from
  system_reporting.compute.clusters_current
where
  max_autoscale_workers >= 50
  and delete_time is null

This query uses our new view along with the max_autoscale_workers column to filter. Remember that this column will only be non-null if autoscaling is turned on. We also filter out deleted clusters. This will work with this view because only the current record for deleted clusters will have a non-null delete_time. Typically in production queries, it’s not a good idea to use select *. This is done in the example to shorten the length of the query. If you’d like, rewrite this query against system.compute.clusters to see how much using the view cleans up the query. It gets even better with even more complex queries.
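For reference, a sketch of that rewrite against the base table (inlining the ROW_NUMBER logic in a CTE) would look something like this:

with clusters_current as (
  select
    *,
    row_number() over (
      partition by cluster_id
      order by change_time desc
    ) as row_num
  from
    system.compute.clusters
  qualify row_num = 1
)
select
  *
from
  clusters_current
where
  max_autoscale_workers >= 50
  and delete_time is null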

What clusters currently don’t have auto-termination enabled, or have an auto-termination timeout longer than two hours?

select
  *
from
  system_reporting.compute.clusters_current
where
  (
    auto_termination_minutes is null
    or auto_termination_minutes > 120
  )
  and cluster_name not like 'job-%'
  and delete_time is null

This query also uses our view for simpler syntax. It then filters on several things. First, it looks for auto_termination_minutes values that are null or greater than 120 minutes. Remember, null indicates that auto-termination is not turned on. (The parentheses are important for these filters to return the correct answer.) We then exclude clusters whose names start with ‘job-’ to filter out job clusters. The reason for this filter is that job clusters have auto-termination turned off, since they only run as long as necessary, and if we didn’t filter them out, we’d get thousands of job clusters in the results. And again we filter out deleted clusters.

Note that this query could be set up as an alert (see this link for how to create alerts). Have the alert notification sent to an admin group email address or a slack channel and you’ll quickly know if a user has created a cluster without setting an appropriate auto-termination. You could also set up a policy (highly recommended) to make sure auto-termination is configured correctly.
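For an alert, you usually only need a single number rather than the full rows, so a minimal sketch of an alert query built on the same filters might be:

select
  count(*) as clusters_missing_auto_termination
from
  system_reporting.compute.clusters_current
where
  (
    auto_termination_minutes is null
    or auto_termination_minutes > 120
  )
  and cluster_name not like 'job-%'
  and delete_time is null

Set the alert to trigger whenever the count is greater than zero.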

Give a full history of changes made to the cluster currently named ‘xyz’

select c.*
from system.compute.clusters c
inner join system_reporting.compute.clusters_current curr using(cluster_id)
where curr.cluster_name = 'xyz'
order by c.change_date desc, c.change_time desc

Just because a cluster is currently named ‘xyz’ doesn’t mean it’s always had that name. So, we use our view which has the current names in it and join that to the system.compute.clusters table for the complete history of the matching cluster_id, which is a unique value. Ordering it will have the current record as the first row and the oldest change as the last row.

Warehouse Names

In part 1 we included a query to get warehouse names. Let’s use that query but turn it into a view named system_reporting.compute.warehouses so that it’s easier to use.

CREATE OR REPLACE view system_reporting.compute.warehouses as
with data as ( -- get all of the successful creates and edits of warehouses and endpoints
select event_time, request_params.name as warehouse_name, from_json(response ['result'], 'Map<STRING, STRING>') ["id"] as warehouse_id
from system.access.audit
where service_name = 'databrickssql'
and action_name in ('createWarehouse', 'createEndpoint')
and response.status_code = '200'
union
select event_time, request_params.name as warehouse_name, request_params.id as warehouse_id
from system.access.audit
where service_name = 'databrickssql'
and action_name in ('editWarehouse', 'editEndpoint')
and response.status_code = '200'
),
current_data as ( -- get the most recent create or edit of each warehouse or endpoint
select *,
ROW_NUMBER() OVER (
PARTITION BY warehouse_id
ORDER BY
event_time DESC
) AS row_num
from data
qualify row_num = 1
)
select warehouse_id, warehouse_name from current_data
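A quick sanity check that the new view works is just to select from it:

select
  warehouse_id,
  warehouse_name
from
  system_reporting.compute.warehouses
order by
  warehouse_name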

Usage by cluster/warehouse name

Let’s use our clusters_current and warehouses views to break out usage for each cluster or warehouse name.

select
  'JOBS' as work_type,
  u.usage_date,
  cc.cluster_id as cluster_or_warehouse_id,
  cc.cluster_name as cluster_or_warehouse_name,
  sum(u.usage_quantity) as total_dbus,
  sum(lp.pricing.default * u.usage_quantity) as list_cost
from
  system.billing.usage u
  inner join system.billing.list_prices lp on u.cloud = lp.cloud
    and u.sku_name = lp.sku_name
    and u.usage_start_time >= lp.price_start_time
    and (
      u.usage_end_time <= lp.price_end_time
      or lp.price_end_time is null
    )
  inner join system_reporting.compute.clusters_current cc
    on u.usage_metadata.cluster_id = cc.cluster_id
where
  (
    usage_metadata.job_id is not null
    or upper(lp.sku_name) like '%JOBS%'
  )
  and usage_metadata.cluster_id is not null
group by
  all
union all
select
  'ALL PURPOSE' as work_type,
  u.usage_date,
  cc.cluster_id as cluster_or_warehouse_id,
  cc.cluster_name as cluster_or_warehouse_name,
  sum(u.usage_quantity) as total_dbus,
  sum(lp.pricing.default * u.usage_quantity) as list_cost
from
  system.billing.usage u
  inner join system.billing.list_prices lp on u.cloud = lp.cloud
    and u.sku_name = lp.sku_name
    and u.usage_start_time >= lp.price_start_time
    and (
      u.usage_end_time <= lp.price_end_time
      or lp.price_end_time is null
    )
  inner join system_reporting.compute.clusters_current cc
    on u.usage_metadata.cluster_id = cc.cluster_id
where
  (
    usage_metadata.job_id is null
    and usage_metadata.cluster_id is not null
  )
  or upper(lp.sku_name) like '%ALL_PURPOSE%'
group by
  all
union all
select
  'SQL' as work_type,
  u.usage_date,
  w.warehouse_id as cluster_or_warehouse_id,
  w.warehouse_name as cluster_or_warehouse_name,
  sum(u.usage_quantity) as total_dbus,
  sum(lp.pricing.default * u.usage_quantity) as list_cost
from
  system.billing.usage u
  inner join system.billing.list_prices lp on u.cloud = lp.cloud
    and u.sku_name = lp.sku_name
    and u.usage_start_time >= lp.price_start_time
    and (
      u.usage_end_time <= lp.price_end_time
      or lp.price_end_time is null
    )
  inner join system_reporting.compute.warehouses w
    on u.usage_metadata.warehouse_id = w.warehouse_id
where
  (
    usage_metadata.job_id is null
    and usage_metadata.cluster_id is null
    and usage_metadata.warehouse_id is not null
  )
  or upper(lp.sku_name) like '%SQL%'
group by
  all

By adding an inner join in each sub-query to the correct view, and adding the cluster or warehouse ID and name to the select column list, we have the usage across our account by cluster and warehouse name. Note that this query will return a lot of rows for jobs, as each job run will have its own cluster (and cluster name). It’s likely that you’ll also want to filter on a single workspace to reduce the number of rows returned. Also, remember that clusters is a regional table, not global, and audit is regional for workspace events (which databrickssql events are), so the usage returned by this query will only be for clusters and warehouses in this region.
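If you do want to narrow the results to a single workspace, add a filter such as u.workspace_id = '&lt;your workspace id&gt;' to the where clause of each sub-query. To see which workspace IDs have usage to choose from, a quick sketch is:

select
  workspace_id,
  count(*) as usage_rows
from
  system.billing.usage
group by
  workspace_id
order by
  usage_rows desc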

Up Next

In the following post, I’ll cover the warehouse_events and audit tables in depth. You’ve already seen a little bit of audit to find warehouse names. We’ll dig into it in more depth in the next post. Audit is a very comprehensive table, so we’ll see if I can cover it adequately in one post or if I need to have multiple posts to cover it fully.
