Monitoring Mount Point Health in Databricks

Alexandre Bergere · Published in datalex · 2 min read · Aug 9, 2023

In Databricks, a mount point refers to a user-defined mapping of an external data source, typically located in cloud-based storage systems like Amazon S3, Azure Data Lake Storage, or Google Cloud Storage, to a virtual directory within the Databricks file system.

This mapping allows users to access and manipulate data stored externally as if it were stored within the Databricks environment itself.

Data engineers rely on mount points across many jobs to streamline access to cloud-based storage. You can list everything mounted within DBFS with the following command:

dbutils.fs.mounts()
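Each entry returned by dbutils.fs.mounts() exposes, among other fields, the mount point and its backing source, so you can print a quick overview (a small sketch):

# Print each mount point and the external location it maps to.
for m in dbutils.fs.mounts():
    print(m.mountPoint, "->", m.source)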

Before a job can access a mount point, the mount must already be set up; only then can it read or write the desired data seamlessly.
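As a reminder, here is a minimal mount setup sketch for Azure Data Lake Storage Gen2 using OAuth; the storage account, container, secret scope, and secret key names are placeholders to replace with your own:

# OAuth configuration for a service principal; scope and key names are placeholders.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("my-scope", "sp-client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("my-scope", "sp-client-secret"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Map the external container to a directory under /mnt.
dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/bronze_ds_plateform",
    extra_configs=configs,
)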

Nevertheless, a job can start before the mount has been set up, or it can reference an incorrectly named mount point. In such cases, attempting to read will fail with a “java.io.FileNotFoundException: Operation failed: ‘The specified path does not exist.’, 404, GET”.

Attempting to write, on the other hand, raises no error at all: your job silently writes into a plain folder within the Databricks file system (DBFS) rather than the intended cloud-based storage location, without any explicit warning.
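One way to guard against this is a defensive check before any write: verify that the target really is a mount point and fail fast otherwise (a minimal sketch, using the example path from this article):

# Abort before writing if the target path has no backing mount.
target = "/mnt/bronze_ds_plateform"
if not any(m.mountPoint == target for m in dbutils.fs.mounts()):
    raise RuntimeError(f"{target} is not mounted; aborting to avoid writing into a plain DBFS folder.")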

If this happens, you can see the difference in the DBFS browser, as shown below:

The directory named “bronze_ds_plateform” is merely a folder residing within the DBFS, not a mount point.

Mistakes like this do happen, so it is important to proactively check the health of your mount points and catch such issues as early as possible.

To do so, run the following code and make sure that “unlinkedEndpoint” stays empty:

# Collect folders under /mnt that are not backed by an actual mount point.
unlinkedEndpoint = []
mounts = dbutils.fs.mounts()  # fetch the mount list once instead of on every iteration

for endpoint in dbutils.fs.ls('/mnt'):
    name = f"/mnt/{endpoint.name.replace('/', '')}"
    if any(m.mountPoint == name for m in mounts):
        print(f"{name} matches.")
    else:
        print(f"{name} doesn't match.")
        unlinkedEndpoint.append(name)
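In a scheduled monitoring job you can, for example, fail the run whenever an unlinked folder is detected (a small sketch; how you alert on the failure is up to you):

# Abort the monitoring run if any folder under /mnt has no backing mount.
assert not unlinkedEndpoint, f"Unlinked folders detected under /mnt: {unlinkedEndpoint}"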

If you want to know how to connect Azure storage to Databricks, check my dedicated article.

Mounted data does not work with Unity Catalog, and Databricks recommends migrating away from using mounts and managing data governance with Unity Catalog. Volumes in Databricks Unity Catalog are in public preview.
