How to refresh a firewall enabled Azure Data Lake Storage gen2 data source from Power BI Service?

Moumi Panja
Microsoft Azure
Published in
7 min readDec 12, 2020

Have you ever tried to refresh a dataset in Power BI Service which is connected to an Azure Data Lake Storage (ADLS) gen2? Probably, yes. When you publish your report from Power BI Desktop to Service, usually you provide the data source credentials again in the Power BI Service to configure on-demand or scheduled refresh.

What if your ADLS gen2 data store resides behind the firewall? When you try to refresh such a dataset from Power BI Service, it will prompt you an error. In this case, providing just the data source credentials in Power BI Service will not work. You will need to configure a gateway and few additional steps to establish a connection between your storage account and Power BI even though both are in the cloud.

Prerequisites:

  • Advanced knowledge of Power BI
  • Basic understanding of ADLS gen2
  • Access to Azure portal and permission to create resources
  • (optional) Access to an existing ADLS gen2 resource protected by firewall
  • (optional) A Power BI report published in the Service based on ADLS gen2 connection

This articles covers the steps you will need to follow to establish a successful connection between your Power BI service and firewall-enabled ADLS gen2 resource.

STEP 1 (Optional): If you do not have an ADLS gen2 resource available, you can create one using the Azure portal. You can find more information here — LINK.

Once it is created, please visit your Azure portal and check the Networking details of the ADLS gen2 resource as shown in the image. I have an ADLS gen2 resource already created named as adlsgen2mp in the North Central US region. As you can see from the image above, my ADLS gen2 allows only selected networks to connect to it.

If you scroll down a little bit in the same Networking configuration page of your storage account, you will find an option to “Allow trusted Microsoft services to access this storage account”.

By enabling this option, you can grant a subset of Microsoft services access to the storage account, while maintaining network rules for other apps. You can find more information here — LINK. However, Power BI is not included in the trusted Microsoft services list yet. Hence, to connect Power BI to your firewall enabled ADLS gen2 resource, you will need to follow some steps discussed later in this article.

Next, if you have not already published a report connected to your ADLS gen2 from Power BI Desktop to Service, you can create a report by following the steps here — LINK.

A quick note here: because your ADLS gen2 is secured by firewall, to access your data from Power BI desktop you will need to add your client IP address in the ADLS gen2 Networking page under the Firewall section .

STEP 2: Let’s see what are different resources you will need to create. As I mentioned before, my ADLS gen2 account named adlsgen2mp sits behind the firewall in the North Central US region. I have a csv file inside a private container named “data” with no anonymous access. I have created a Resource Group named Experiment2 where my ADLS gen2 resource resides. I would be using the same resource group to create the other necessary Azure resources.

First, create a Virtual Network (VNet) resource in the same region as your ADLS gen2. You can find more information here — LINK.

I used the Azure portal to create a VNet resource without changing anything in the default configuration in the same region as my ADLS gen2 i.e. North Central US region.

Second, create a private endpoint resource. You can find more information here — LINK.

Give the private endpoint a name, choose the same region as your ADLS gen2, next in the Resource page settings select Microsoft.Storage/storageAccounts as Resource Type, select the ADLS gen2 account for which you want to create the private endpoint as Resource and then set the Target sub-resource as dfs.

Next, in the Configuration page choose the VNet which you created earlier. You can keep subnet selection as default. Keep the rest of the settings as it is and create the resource.

Third, you need to create a Virtual Machine (VM) which will be used to set up the Power BI gateway. Again, I used the Azure portal to create a Windows Server 2019 Datacenter — Gen 1 in the North Central US region. I selected the inbound ports as RDP.

Next, in the Networking configurations page, select the vnet which you created earlier and also the default subnet. I kept the rest of the settings as it is and created the resource.

All of the three necessary resources are created now. If you check your resource group, you would be able to see the list of different resources which were created as part of the process.

STEP 3: Next you will need to set up the gateway VM for Power BI. Access the VM resource which you just created from the Azure portal. In the Overview tab, click on Connect and then download the RDP file.

Once downloaded, double click on the RDP file to connect to your VM, provide the credentials which you created earlier at the time of the VM creation process. You can find more information here — LINK.

Once you access your VM, open a browser there and visit the Power BI website (link given below) to download the Power BI gateway installation file: https://powerbi.microsoft.com/en-us/gateway/ . Select the Download Standard Mode option on the page. Once downloaded, double click on the .exe file, accept the terms of use and run the installation.

It would take few minutes to complete the installation and then prompt you to provide the email address which you would want to use with this gateway. Use the same email address which you use to access Power BI Service and sign in. I registered this gateway as a new gateway on the VM.

In the next dialogue box, give the gateway a name and set up your recovery key. The most important step here is to ensure that you are selecting the same region as your Power BI tenant (see image below). Finish the set up and your gateway will be ready to use.

Note: To find the region where your Power BI tenant is located, login to your Power BI Service > Help & Support > About Power BI. You will see a section as “Your data is stored in” where you will find the region name. You can find more information here — LINK.

STEP 4: Now we will configure the gateway in Power BI Service. Once your gateway VM is setup and ready, login to Power BI Service. Open Settings > Manage Gateways page. First select the name of the gateway which you just created and then click on ADD DATA SOURCE from the top.

Give the data source a name, select the data source type as Azure Data Lake Storage Gen2, select the Authentication method as Key and paste the access key from ADLS gen2.

Additionally, provide the server details as below:

<datastoragename>.dfs.core.windows.net

Provide the path as below:

/<filesystemname>/<subfolder>/

Your data source configuration under the selected gateway should look like this:

STEP 5: Next we will set up the gateway for the Power BI dataset in the dataset settings page. In the dataset settings page, expand the Gateway connection and select the gateway which you created for this purpose. In the Maps to dropdown select the gateway data source which you created earlier.

Your data source credentials are automatically updated. Now if you hit Refresh Now in your Power BI dataset or configure a scheduled refresh, Power BI Service will be able to successfully create a connection with your ADLS gen2 resource sitting behind the firewall.

Explore and enjoy!

<<Update: Microsoft has recently released a new capability that helps Power BI connect with Azure data sources seamlessly within an Azure Virtual Network (VNet). To learn more visit this link.>>

--

--

Moumi Panja
Microsoft Azure

Senior Technical Solutions Consultant at Google. Opinions are my own and not the views of my employer.