Integrating Azure Databricks with Microsoft Fabric OneLake

Dhanasri Mahendramani
BI3 Technologies
Apr 25, 2024

Introduction:

This guide explores a powerful integration between two industry leaders: Azure Databricks and Microsoft Fabric OneLake. This strategic integration allows you to harness the strengths of both platforms, facilitating smooth data processing, sophisticated analytics, and extensive business intelligence.

By following this guide, you will be equipped to:

  • Read data directly from OneLake: Access and process data stored within your OneLake for advanced analytics using Azure Databricks notebooks.
  • Write data back to OneLake: Prepare and transform data in Databricks notebooks and seamlessly write the results back to your OneLake storage for centralized management.
  • Maintain a secure and centralized data environment: Leverage Azure Databricks and Microsoft Fabric within the Azure cloud, ensuring robust security and simplified data governance.

Necessary Resources:

  • Azure Databricks workspace (Premium): You’ll need an active Premium Azure Databricks workspace to run your data processing and analytics tasks.
  • Microsoft Fabric workspace with a configured OneLake: Ensure you have a Microsoft Fabric workspace with a configured OneLake for data storage and management.
  • Basic understanding of Azure Databricks notebooks: Familiarity with creating and using notebooks within Databricks is helpful for writing code to interact with data.

Step 1:

Create a Databricks Cluster:

Open your Databricks workspace and navigate to the Compute pane.

  • Click Create Cluster to initiate cluster creation.
  • Configure your cluster with desired settings like worker types, number of nodes, and runtime version.
  • In the Advanced Options, enable Azure Data Lake Storage (ADLS) credential passthrough. This is crucial for authentication to OneLake using your Microsoft Entra identity.
  • Create the cluster with your chosen configuration.
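
If you prefer to set credential passthrough explicitly rather than through the checkbox, the equivalent entry in the cluster's Spark config (under Advanced Options) is shown below. This is a sketch based on the commonly used passthrough setting; verify it against the documentation for your Databricks runtime version.

spark.databricks.passthrough.enabled true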

Step 2:

Open a Notebook and Connect to the Cluster:

  • Once the cluster is created, launch a new notebook.
  • In the notebook, use the Cluster menu to select the cluster you just created. This establishes the connection between your notebook and the processing power of the cluster.

Step 3:

Locate Your OneLake ABFS Path:

Navigate to your Fabric Lakehouse within the Microsoft Fabric console.

Locate the ABFS path for your OneLake storage. You can usually find this path in the properties section of your Lakehouse.

The ABFS path typically follows the format: abfss://myWorkspace@onelake.dfs.fabric.microsoft.com/ (replace "myWorkspace" with your actual workspace name).
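
For instance, a CSV file stored under the Files section of a lakehouse would use a path along these lines (the workspace, lakehouse, and file names here are placeholders):

abfss://myWorkspace@onelake.dfs.fabric.microsoft.com/myLakehouse.Lakehouse/Files/sales/raw_data.csv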

Read Data from OneLake:

Within your Databricks notebook, use Spark functions to read data from your OneLake storage, using the ABFS path you obtained earlier.

Here’s an example using the spark.read function:

data = spark.read.format('csv').option('header','true').load("abfss://myWorkspace@onelake.dfs.fabric.microsoft.com/path/to/your/data")

Replace "path/to/your/data" with the actual location of your data within your One Lake storage.

Step 4:

Write Data to OneLake:

Use the Spark DataFrame writer, for example write.format("delta"), to save your prepared data to your OneLake storage location.

Here’s an example:

data.write.format("delta").save("abfss://myWorkspace@onelake.dfs.fabric.microsoft.com/path/to/your/data")

# Replace 'path/to/your/data' with the desired location in your OneLake

Remember to replace "myWorkspace" with your actual workspace name and "path/to/your/data" with the specific folder path within your OneLake storage where you want to save the data.
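
To verify the write, you can read the Delta output back from the same OneLake location (a brief sketch reusing the path from the example above):

result = spark.read.format("delta").load("abfss://myWorkspace@onelake.dfs.fabric.microsoft.com/path/to/your/data")
result.show(5)  # confirm the rows written to OneLake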

Additional aspects:

Overwriting vs. Appending:

By default, Spark raises an error if data already exists at the specified location. If you want to append new data to an existing Delta table, use the mode("append") option:

data.write.format("delta").mode("append").save("...")

Partitioning Data:

Delta tables support partitioning data for efficient querying. You can specify partition columns while writing using the partitionBy option:

data.write.format("delta").partitionBy("year", "month").save("...")

Conclusion:

Now you can seamlessly access, analyze, and manage your data directly within Databricks notebooks, unlocking valuable insights and driving informed decisions within a secure Azure environment.

For a concise overview, refer to the Microsoft documentation.

Integrate OneLake with Azure Databricks: https://learn.microsoft.com/en-us/fabric/onelake/onelake-azure-databricks

About Us

Bi3 has been recognized as one of the fastest-growing companies in Australia. Our team has delivered substantial and complex projects for some of the largest organizations around the globe, and we’re quickly building a brand that is well-known for superior delivery.

Website: https://bi3technologies.com/

Follow us on:
LinkedIn: https://www.linkedin.com/company/bi3technologies
Instagram: https://www.instagram.com/bi3technologies/
Twitter: https://twitter.com/Bi3Technologies
