Power BI & R Part 2: Running R scripts in Power BI

Suraj Thatte
4 min read · Mar 27, 2019


In my previous article, I outlined various scenarios in which Power BI and R can be used together. The focus of this article is to demonstrate the use of R scripts in Power BI Desktop. R scripts can be used to import data into Power BI from sources that are not currently supported by Power BI connectors. In addition, R has powerful packages for numerous data transformation tasks, or "data wrangling" as they call it! Let's look at a specific example to understand this functionality.

Consider a hypothetical furniture manufacturer with a Distribution Center (DC) in Chicago, IL. Towards the end of each quarter, their sales team generates a demand file for the next quarter. The file includes order number, SKU (item number), order quantity, destination address and requested delivery date.

The supply chain team has asked for our help in analyzing this data for planning the distribution operation. They are looking for trends in orders and for variation in demand by region and by SKU. They also want to know the distance from the DC to each destination, to estimate transportation costs and the resources required.

Our approach is to use R for importing the data, transforming it with the 'dplyr' package, and computing distances with the 'gmapsdistance' package. We will then run the R script in Power BI as a data source and leverage the interactive visualization capabilities of Power BI.

If you want to try this on your Power BI Desktop, it might be worthwhile to check this link to make sure you have all the prerequisites covered for running R.

To begin with, let’s load the required packages -

library(readxl)        # read Excel files
library(dplyr)         # data wrangling
library(gmapsdistance) # driving distances via the Google Maps API

Then, the 'read_xlsx()' function imports the Excel file into R as a data frame.

d1<-read_xlsx("<Enter file path here>OrdersFileSample.xlsx",1)
names(d1)[2] <- "RequestedDate" # rename the second column

Now, we re-format the address to make it compatible with the 'gmapsdistance' call. We extract the unique destination address strings to limit the number of API calls and reduce the script's run time.

# Make address compatible for the API call
d1$Dest_Addr<-gsub(" ","+",d1$Address)
d1$Dest<-paste(d1$Dest_Addr,d1$City,d1$State,sep = "+")
Origin_address<-"<Address of Distribution center>"

d<-unique(d1$Dest) #Get unique address strings
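To make the transformation concrete, here is what the reformatting does to a made-up address (the street and city below are illustration values, not rows from the orders file):

```r
# Hypothetical address, reformatted the same way as in the script above:
# spaces become "+" and the address parts are joined with "+".
addr <- gsub(" ", "+", "123 Main St")            # "123+Main+St"
dest <- paste(addr, "Chicago", "IL", sep = "+")  # "123+Main+St+Chicago+IL"
```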

The next step is to run the function with a Google Maps API key for all unique origin-destination combinations. We then save the results to the 'distance' data frame. With the destination address as the key, we join the two data frames. Note that the merge function with 'all.x=TRUE' performs a left outer join. The final result is the 'output' data frame.

set.api.key("<enter your API key here>") #set your API key here

dist <- c()
for (i in seq_along(d)) {
  results <- gmapsdistance(origin = Origin_address, destination = d[i], mode = "driving")
  dist[i] <- results$Distance * 0.000621371 # convert metres to miles
}

distance <- data.frame(dist, d)
names(distance)[2] <- "Dest" # rename the column to match d1
output <- merge(d1, distance, by = "Dest", all.x = TRUE) # left outer join
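To see what 'all.x=TRUE' buys us, here is a toy example (made-up order and distance rows, not our actual data) showing that orders without a matching distance are still kept, with NA in the dist column:

```r
# Toy data illustrating merge() with all.x = TRUE (left outer join):
# destination "C" has no distance row, but its order survives the join.
orders    <- data.frame(Dest = c("A", "B", "C"), Qty = c(10, 20, 30))
distances <- data.frame(Dest = c("A", "B"), dist = c(5.0, 7.5))
out <- merge(orders, distances, by = "Dest", all.x = TRUE)
nrow(out)                  # 3 -- all three orders kept
out$dist[out$Dest == "C"]  # NA -- no distance found for C
```

Without 'all.x=TRUE', merge() would perform an inner join and silently drop any order whose address failed the API lookup.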

You can test the script in RStudio (or an IDE of your choice), where it's easier to debug the code. Once the script runs successfully, we can drop it into Power BI using 'Get Data > Other > R script'.

We can then create the presentation layer over the data using Power BI. The Orders tab of the report shows orders by region (A & C), by SKU (B) and by requested delivery date (D).

On the Transportation planning tab, Chart E shows average order quantity and variation by SKU. Visual F shows relative volume (width of the line) and distances from the DC. The Tooltip feature in Power BI provides additional details as shown. Given the transportation cost per ton-mile, we can estimate the total variable costs.
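As a rough illustration of that last estimate, a back-of-the-envelope sketch (the rate, weights and mileages below are made-up numbers, not values from the demand file):

```r
# Variable transportation cost ~ sum over orders of tons * miles * rate.
rate_per_ton_mile <- 0.25        # assumed $ per ton-mile (illustration only)
order_tons  <- c(2.0, 0.5, 1.5)  # hypothetical shipment weights (tons)
order_miles <- c(120, 300, 80)   # driving distances from the DC (miles)
total_cost <- sum(order_tons * order_miles * rate_per_ton_mile)
total_cost  # 127.5
```

In practice the miles would come from the 'output' data frame built above, and the weights from a SKU master table.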

Thanks for going through this post. Feedback is greatly appreciated.
