A Comprehensive Guide to Connecting Power BI with Databricks

Pubudu Dewagama
Published in Tributary Data · Jan 17, 2024

In today’s data-driven landscape, businesses seek powerful solutions to harness the full potential of their data. This post aims to guide you through the process of connecting Power BI, a leading business intelligence tool, with Databricks, a robust analytics platform. By seamlessly integrating these two technologies, users can unlock a myriad of advantages.

1. Advantages of Connecting Power BI with Databricks

  1. Unified Data Processing and Visualization: Combine the powerful data processing capabilities of Databricks with the intuitive visualization tools of Power BI for a unified end-to-end analytics solution.
  2. Real-time Data Exploration: Leverage the real-time data processing capabilities of Databricks to enable Power BI to visualize and analyze the most up-to-date information.
  3. Scalability and Performance: Databricks, built on Apache Spark, offers scalable and high-performance data processing, ensuring that Power BI can handle large datasets efficiently.
  4. Advanced Data Transformations: Utilize the data transformation capabilities of Power BI in tandem with Databricks’ data wrangling capabilities, allowing for advanced and seamless data preparation.
  5. Collaboration and Data Sharing: Foster collaboration among data teams by integrating Databricks notebooks and Power BI reports, enabling the sharing of insights and analyses in a collaborative environment.
  6. Enhanced Predictive Analytics: Databricks’ machine learning capabilities can be seamlessly integrated with Power BI, enhancing predictive analytics and enabling the creation of more sophisticated data models.
  7. Cost-Effective and Cloud-Ready: Leverage the cloud-ready architecture of both Power BI and Databricks for a cost-effective and scalable analytics solution suitable for cloud platforms like Azure and AWS.

2. Prerequisites

Before embarking on the journey to connect Power BI with Databricks, it’s crucial to ensure that you have the following prerequisites in place:

Power BI Desktop: Ensure that you have Power BI Desktop installed. If you don’t, you can download it from the Microsoft site.

Download link: https://powerbi.microsoft.com/en-us/downloads/

Databricks Workspace Access: Access to a Databricks workspace is essential. If you don’t have a Databricks account, sign up for one and create a Databricks workspace.

3. Setting up Databricks Community Edition

Databricks Community Edition provides a free, cloud-based environment for learning and experimenting with Databricks capabilities. Follow these steps to create your Databricks Community Edition account and set up a workspace:

  1. Go to the Databricks Community Edition website: Open your web browser and navigate to the Databricks Community Edition website.
  2. Sign Up for an Account: Click on the “Sign Up” button to create a new account. You may need to provide your email address and create a password.
  3. Verification Email: Check your email inbox for a verification email from Databricks. Click on the verification link to confirm your account.
  4. Log In: Once your account is verified, log in to the Databricks Community Edition platform. A workspace is provisioned for you automatically.
  5. Access the Databricks Workspace: After logging in, you’ll be directed to the Databricks workspace. Here, you can create and manage clusters, notebooks, and libraries.
  6. Create a Cluster: To get started, create a cluster. A cluster is a set of computation resources that you can use to run notebooks and jobs.
  • Click on the “Clusters” tab in the sidebar.
  • Click the “Create Cluster” button.
  • Provide a cluster name and configure the cluster settings based on your requirements.
  • Click the “Create Cluster” button.
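The UI steps above can also be done programmatically on a full (non-Community) workspace through the Databricks Clusters REST API. The sketch below only builds the request body for `POST /api/2.0/clusters/create`; the host, runtime version, and node type are placeholder assumptions, and the actual HTTP call is left commented out because it needs a real workspace and personal access token.

```python
import json

# Placeholder workspace URL -- replace with your own (full workspaces only;
# Community Edition does not expose the REST API).
DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"

def build_cluster_payload(name: str) -> dict:
    """Build the JSON body for POST /api/2.0/clusters/create."""
    return {
        "cluster_name": name,
        "spark_version": "13.3.x-scala2.12",  # an LTS runtime, as an example
        "node_type_id": "Standard_DS3_v2",    # Azure node type; varies by cloud
        "num_workers": 1,
        "autotermination_minutes": 30,        # auto-stop idle clusters
    }

payload = build_cluster_payload("powerbi-demo-cluster")
print(json.dumps(payload, indent=2))

# With the `requests` library and a real personal access token you would send:
# requests.post(f"{DATABRICKS_HOST}/api/2.0/clusters/create",
#               headers={"Authorization": "Bearer <token>"},
#               json=payload)
```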

7. Create a notebook: Next, create a notebook where you can run Spark code.

  • Click on the “Workspace” tab in the sidebar.
  • Click the “Create” button and select “Notebook.”
  • Choose the default language (Scala, Python, or SQL) for your notebook.
  • Provide a name for the notebook, and select the cluster you created.
  • Click the “Create” button.

8. Explore the Databricks Workspace: Take some time to explore the Databricks workspace. Familiarize yourself with the different tabs, such as “Clusters,” “Workspace,” and “Data.”

4. Create a Delta table with sample data

Databricks File System (DBFS) is a distributed file system that lets you store data persistently in Databricks. Follow these steps to create a database and a Delta table in the Databricks environment.

  • Open a Python notebook and create a database called “landing”:
spark.sql("create database if not exists landing")

Then you can see the database in the Databricks Catalog section.

Then you can create a Delta table with sample data using a DataFrame.

student_data = [
    (1, "Pubudu", 10, 10000),
    (2, "Suranga", 20, 20000),
    (3, "Dewagama", 30, 30000),
]

student_schema = ["studentId", "StudentName", "DeptNo", "Salary"]

df = spark.createDataFrame(data=student_data, schema=student_schema)

Table creation script:

df.write.format("delta").saveAsTable("landing.Student_Table")
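Note that `saveAsTable` fails if the table already exists. A small sketch of wrapping the write with an explicit save mode, assuming the same `df` as above (the helper name is my own, not part of any API):

```python
def save_student_table(df, mode: str = "error"):
    """Write a DataFrame as the managed Delta table landing.Student_Table.

    mode: "error"     -- the default; fail if the table already exists
          "overwrite" -- replace the existing table
          "append"    -- add rows to the existing table
    """
    (df.write
       .format("delta")
       .mode(mode)
       .saveAsTable("landing.Student_Table"))

# Example (on a Databricks cluster, with `df` defined as above):
# save_student_table(df, mode="overwrite")
```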

You can view the sample data by executing the below query.

%sql
select * from landing.Student_Table

5. Connecting Databricks to Power BI

  • Launch Power BI Desktop on your local machine.
  • Click “Get Data” from the Home tab.
  • In the “Get Data” window, select “More” and find “Databricks” in the list of available connectors.
  • Enter Connection Details: In the Databricks connector window, enter the following details, which you can copy from the cluster’s JDBC/ODBC configuration section:

Server Hostname: The hostname of your Databricks workspace (without the https:// prefix).

HTTP Path: The HTTP path to your cluster, copied from the cluster’s JDBC/ODBC tab. This field is required.

Connection details for Power BI Desktop.

Then you have to enter the credentials for the Databricks environment.

  • After configuring the connection, click “Connect” to establish the connection to your Databricks workspace.
  • Choose the desired tables, views, or write custom queries to load data into Power BI.
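As a quick sanity check of the same connection details outside Power BI, you can query the cluster from Python with the `databricks-sql-connector` package. The hostname, HTTP path, and token below are placeholders, not real values, and the actual call is left commented out:

```python
# Placeholder connection details -- replace with the values from your
# cluster's JDBC/ODBC tab (these will not connect as-is).
conn_params = {
    "server_hostname": "adb-1234567890123456.7.azuredatabricks.net",
    "http_path": "sql/protocolv1/o/1234567890123456/0123-456789-abcde123",
    "access_token": "dapiXXXXXXXXXXXXXXXX",
}

def fetch_students(params: dict):
    """Run the same query Power BI would issue, over the same endpoint."""
    from databricks import sql  # pip install databricks-sql-connector
    with sql.connect(**params) as conn:
        with conn.cursor() as cursor:
            cursor.execute("SELECT * FROM landing.Student_Table")
            return cursor.fetchall()

# rows = fetch_students(conn_params)  # uncomment with real credentials
```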

6. Creating Visualizations

Click on the “Report” view in Power BI to start building visualizations.

In the Fields pane on the right, you’ll see the tables and fields imported from Databricks.

Drag and drop the desired fields onto the report canvas to start creating visualizations.

7. Refreshing Data

If you change the data in the Delta table, you have to refresh the dataset in Power BI Desktop. To refresh, click the Refresh button on the Home ribbon.
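In Power BI Desktop, refresh is a manual click. Once a report is published to the Power BI Service, a dataset refresh can also be triggered programmatically through the Power BI REST API. A sketch of building the endpoint; the dataset ID is a placeholder, and the actual call (which needs an Azure AD access token) is left commented out:

```python
DATASET_ID = "00000000-0000-0000-0000-000000000000"  # placeholder dataset ID

def refresh_url(dataset_id: str) -> str:
    """Endpoint for triggering a dataset refresh in the Power BI Service."""
    return f"https://api.powerbi.com/v1.0/myorg/datasets/{dataset_id}/refreshes"

print(refresh_url(DATASET_ID))

# With the `requests` library and a valid Azure AD access token:
# requests.post(refresh_url(DATASET_ID),
#               headers={"Authorization": "Bearer <access-token>"})
```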

8. Conclusion

In this comprehensive guide, we’ve explored the seamless integration of Power BI with Databricks, combining the robust data processing capabilities of Databricks with the intuitive visualization tools of Power BI. By following the step-by-step instructions, users can harness the power of these two platforms to unlock valuable insights from their data.

As you embark on your journey with Power BI and Databricks integration, consider exploring additional features, such as scheduled data refresh, data governance best practices, and optimization techniques. Stay connected with the vibrant communities around both platforms for ongoing learning and troubleshooting.

Remember, the true power of this integration lies in your ability to adapt and tailor it to your specific business needs. Whether you’re creating financial dashboards, predictive analytics models, or exploring trends in your data, the combined force of Power BI and Databricks empowers you to turn your data into actionable insights.

As technology continues to evolve, staying informed about updates and new features in both Power BI and Databricks will be essential. The integration between these platforms is a journey, not a destination, and your commitment to learning and adapting will ensure you stay at the forefront of data analytics capabilities.

Here’s to the journey of turning raw data into meaningful stories, driving informed decisions, and unlocking the true potential of your data with Power BI and Databricks. Happy analyzing!
