How to structure your Databricks Workspaces?

Leigh Robertson
2 min read · Aug 8, 2024

Photo by shawnanggg on Unsplash

Introduction

Databricks defines a workspace as “an environment for accessing all of your Databricks assets. A workspace organizes objects (notebooks, libraries, dashboards, and experiments) into folders and provides access to data objects and computational resources.” Your first decision, then, is whether to use one or multiple workspaces.

As with any decision, each approach has pros and cons, and there are best-practice recommendations to weigh.

Common Approaches

TLDR Best Practice Recommendation: Use multiple workspaces with infrastructure as code (IaC) like Terraform to build and manage each.

Why: The best data engineering teams adopt software engineering practices, including safeguarding production and separating environments. By creating a separate workspace for each environment, you achieve a clear separation of production data. This provides the most protection and isolation of production, albeit at the expense of additional overhead.

When using multiple workspaces, I highly recommend using IaC. Without it, management at scale becomes difficult, if not impossible. Additionally, keeping all infrastructure decisions in code gives you better auditability than making changes through the UI.
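To make the idea of "one workspace per environment, defined in code" concrete, here is a minimal Python sketch. It is not Terraform and does not call any Databricks API. It only shows the shape of the data an IaC setup would typically be driven by. The environment names, the `WorkspaceSpec` class, the `build_workspace_specs` helper, and the naming scheme are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed environment names; adjust to whatever your team actually uses.
ENVIRONMENTS = ("dev", "staging", "prod")


@dataclass(frozen=True)
class WorkspaceSpec:
    """Hypothetical description of one workspace, kept in version control."""
    environment: str
    workspace_name: str
    is_production: bool


def build_workspace_specs(prefix, environments=ENVIRONMENTS):
    """Return one workspace spec per environment, named '<prefix>-<env>'."""
    return [
        WorkspaceSpec(
            environment=env,
            workspace_name=f"{prefix}-{env}",
            is_production=(env == "prod"),
        )
        for env in environments
    ]
```

Calling `build_workspace_specs("acme-data")` would give three specs, one each for `acme-data-dev`, `acme-data-staging`, and `acme-data-prod`. Because the list lives in code, adding or changing an environment goes through review and shows up in version history, which is the auditability benefit described above.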

Alternative Approach: Use one workspace with a specific catalog structure that allows for clean separation of environments.

Why: In some cases, the additional overhead of multiple workspaces may not be necessary. A single workspace is simpler to manage. However, the trade-off is an increased likelihood of unintended access to production data. I would still recommend using some form of infrastructure as code, as it’s more traceable and makes it easier to recover if something goes wrong.
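One way to picture the single-workspace layout is one catalog per environment, with access to each catalog granted to specific groups. The Python sketch below only generates the SQL statements as strings and does not run them. It assumes Unity Catalog is in use, and the group names and the `catalog_statements` helper are hypothetical.

```python
# Assumed environment names, each mapped to its own catalog.
ENVIRONMENTS = ("dev", "staging", "prod")

# Hypothetical groups allowed to use each catalog; prod is the most restricted.
CATALOG_USERS = {
    "dev": ["data-engineers", "analysts"],
    "staging": ["data-engineers"],
    "prod": ["prod-service-principals"],
}


def catalog_statements(environments=ENVIRONMENTS, users=CATALOG_USERS):
    """Build CREATE CATALOG and GRANT statements, one catalog per environment."""
    statements = []
    for env in environments:
        statements.append(f"CREATE CATALOG IF NOT EXISTS {env}")
        for group in users.get(env, []):
            # Backticks quote principal names that contain hyphens.
            statements.append(f"GRANT USE CATALOG ON CATALOG {env} TO `{group}`")
    return statements
```

With the mapping above, only `prod-service-principals` is granted access to the `prod` catalog. That grant list is the main line of defense against the unintended production access mentioned earlier, so it is worth keeping it in reviewed code rather than editing it by hand.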

Conclusion

At the end of the day, when planning your workspace structure, you need to balance simplicity with protection and isolation of environments. If you are a small company with limited resources, it might make sense to use just one workspace. On the other hand, if you are a large organization with complex access requirements, it will almost certainly make sense to use multiple workspaces. Hopefully this article has explained the two common approaches. Whichever you choose, I would recommend using Terraform for infrastructure as code. It will make everything more scalable and auditable.
