Portable Databricks: How to migrate Databricks from one cloud to another 🚢
Imagine your delight: you have just finished deploying Databricks for your large organization onto your chosen Cloud Service Provider (AWS, Azure, or GCP), and you are now getting good value out of the Data Intelligence Platform. Everything is awesome!🚀
However, what happens if your chosen Cloud Service Provider (CSP) is not so good to you anymore? What if your new CEO/CTO wants to use a different CSP as part of a new multi-cloud strategy? What happens if you want (or need) to go multi-cloud and run Databricks across all three main CSPs with full Disaster Recovery and High Availability?
This is where the Databricks Terraform Resource Exporter (still an experimental tool, mind you) comes in to help!
So what is the Databricks Resource Exporter?🤔
From the Terraform Registry page, Experimental resource exporter | Guides | databricks/databricks (the latest provider release at the time of writing is 1.42.0):
Generates *.tf files for Databricks resources together with import.sh that is used to import objects into the Terraform state. Available as part of provider binary. The only way to authenticate is through environment variables. It's best used when you need to export Terraform configuration for an existing Databricks workspace quickly. After generating the configuration, we strongly recommend manually reviewing all created files.
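To make the "environment variables only" point concrete, here is a minimal sketch of what a run of the exporter can look like, assuming a personal access token against the source workspace. The host, token, binary path, and the exact -services/-listing selection are illustrative; check the exporter guide for the current flags and service names before running it yourself.

```
# Authentication is via environment variables only (hypothetical host/token values):
export DATABRICKS_HOST="https://adb-1234567890123456.7.azuredatabricks.net"
export DATABRICKS_TOKEN="dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

# The exporter ships inside the provider binary that `terraform init` downloads;
# this path is illustrative and will vary with provider version and platform:
./.terraform/providers/registry.terraform.io/databricks/databricks/1.42.0/linux_amd64/terraform-provider-databricks_v1.42.0 \
  exporter -skip-interactive \
  -services=groups,secrets,access,compute,users,jobs,storage \
  -listing=jobs,compute \
  -directory=./exported-workspace
```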
If you’re a more visual learner, then here’s an overview diagram to help explain:
In a nutshell, the Databricks Terraform Exporter will:
[1] Discover all the existing Databricks resources within your current Databricks workspace using the Terraform provider's exporter. Here's an example screenshot of the tool in action:
[2] Create HashiCorp Configuration Language (HCL) files for each resource type that mirror your existing configurations, e.g. access.tf, compute.tf, etc.
[3] The exported files will need to be manually adjusted for the underlying differences from one CSP to another. Here's my VS Code after running the exporter:
Editing the .tf files will be required because, for example, a compute cluster node type on Azure is named differently on AWS/GCP, and region-specific feature availability will need to be addressed (e.g. SQL Serverless is not available in every region, even within the same CSP). A hypothetical example of the kind of edit involved is sketched below:
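This is not taken from a real export; the resource name, Spark version, and node types are purely illustrative of the remapping you would do by hand:

```
# Hypothetical cluster exported from an Azure workspace (values illustrative only):
resource "databricks_cluster" "shared_autoscaling" {
  cluster_name            = "Shared Autoscaling"
  spark_version           = "14.3.x-scala2.12"
  node_type_id            = "Standard_DS3_v2"   # Azure VM size - must be remapped for the target CSP
  autotermination_minutes = 30

  autoscale {
    min_workers = 1
    max_workers = 4
  }
}

# After editing for an AWS-hosted workspace you would swap in an EC2 instance type,
# e.g. node_type_id = "i3.xlarge"; on GCP it would be a machine type such as "n2-highmem-4".
```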
[4] The modified .tf files can then be applied to the new CSP to recreate all the Databricks resources, with the autogenerated import.sh file used to bring the objects from the existing workspace into Terraform state.
NB: The import.sh script just runs a long list of terraform import commands so you don't have to (see the illustrative excerpt below)!
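As a sketch of what that boils down to, here is the shape of a generated import.sh followed by the usual apply workflow against the target workspace. The resource addresses and object IDs are hypothetical, not output from a real run:

```
# Illustrative excerpt of a generated import.sh (resource names and IDs are hypothetical):
terraform import databricks_cluster.shared_autoscaling "0301-123456-abcd1234"
terraform import databricks_job.nightly_etl "456789012345"
terraform import databricks_group.data_engineers "987654321098"

# Recreating the resources on the target CSP is then the standard Terraform workflow
# against the new workspace:
terraform init
terraform plan
terraform apply
```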
[5] Repeat Steps [1] to [4] for all remaining Databricks workspaces.
You can read even more about the Databricks Terraform provider here: GitHub — databricks/terraform-provider-databricks: Databricks Terraform Provider
I’ve only scratched the surface of what this tool can deliver! If you would like to engage with Mphasis Datalytyx to help perform Databricks-to-Databricks migrations or to get more out of Terraform in your environments, then please reach out to me!🎉
Special shout out to the NextGenLakehouse YouTube channel for making some excellent videos that go in-depth into using Databricks with Terraform, especially this one: How to setup Databricks Unity Catalog with Terraform — YouTube. Thanks NGL! 🤗
Please note the opinions above are my own and not necessarily those of my current employer. This blog article is intended to generate discussion and dialogue with the audience. If I have inadvertently hurt your feelings in any way, then I’m sorry.