Creating Data Visualizations in R using Power BI
This is the last of the 3 part series, “Gentle Introduction to R for Power BI”. In previous blogs I described how to use R in Power BI to bring in data, I described how to leverage R in Power Query, and in this blog I will describe how to leverage R in Power BI to create custom visualizations.
The package that we are going to use to develop our custom visualization is ggplot2. The ggplot2 package is arguably the most popular data visualization package in R. It is based on the “grammar of graphics” concept that was created by the statistician, Leland Wilkinson. The ggplot2 package allows you to approach creating charts and graphs in the same manner that Bob Ross approached painting trees in the forest. With ggplot2 you are able to start with a blank canvas and add layers upon layers via short code snippets that builds on each other until you end up with the desired visualization.
The pbix file that is being used in this blog can be found here: http://bit.ly/2jwoCyP. The GentleIntroToR_ChartExample.pbix file contains an example of using R to create a box plot chart that shows the distribution of player scores for the L.A. Lakers. Chiclet slicers were added that allows you to filter by division and/or opponent. The R visualization was created in four steps. Those steps were:
1. Add the “R Visual Script” component to your dashboard
2. Identify the fields for the data frame in the “Fields” tab
3. Make any required transformations to your data frame in R
4. Develop the visualization using the ggplot package
Add the “R Visual Script” component to your dashboard
The first thing we need to do is add the “R Visual Script” component to our dashboard. The component is pictured below:
Clicking on it creates a plot area for a R visualization on the dashboard. You can position and resize it to your desired specifications.
Identify the fields for the data frame in the “Fields” tab
After you put the “R Visual Script” component on your dashboard you must tell R what data you want the R visual to use. You do so by dragging fields to the value section of the “Fields” tab. Pictured below is a screen shot of the fields that I passed to R for the visualization that we are going to build:
R will take that data set and convert it into a R object called a “data frame”. To the MS Excel users that are reading this post, R data frames are similar to MS Excel tables but much more powerful.
One of the requirements of a data frame is that all rows need to be unique. If you pass a data set that contains duplicate rows then R will automatically remove the dups. This may cause undesired results. You can make sure that your data set is unique by passing a field with unique values. In this example I used the “ID” field to make sure that the rows in my data set are unique.
Make any required transformations to your data frame in R
Since we are required to send a data set to R with unique rows the data may be at a level that is too detail for the visual that we want to create. Say for instance we wanted to create a bar chart that displayed sales by month but the data that we passed to R was at the day level. We would need to aggregate the data to the month level before we are able to create the visualization. Fortunately R has many packages that will help you get your data in the shape that it needs to be in. My “go to” packages are dplyr and tidyr. The data in this tutorial is already in the required shape so we can by pass this step.
Develop the visualization using the ggplot package
Now to the fun part! Developing a graph in R is very intuitive. Have you ever developed a chart on the fly in MS Excel using VBA? Not a good experience. Even if you start with code generated from a recorded macro the resulting code is not intuitive at all. This is not the case with R. Here is all the code needed for us to create a box plot chart in R:
The code above is very simple and human readable. The code in line 2 simply loads the ggplot2 package into your R session. Line 4 assigns the “dataset” data frame that Power BI sent to R to a new variable that we will use for our visualization. Line 5 and 6 are converting the “Player” and “opponent” field into a “factor” variable (field). Any field in your data frame that represents categorical data should be converted into the “factor” data type. There are many benefits to having categorical data in that format and some visualizations require it. Line 8 builds the base of your visualization using the “ggplot()” function. We defined our data source using the “data” parameter and we map our data using the “mapping” parameter along with the “aes()” function. Line 8 adds on to the base we built in line 7 and tells R that we want to use the variable we created in line 7 to build a “box plot”. By putting “p” in line 9 by itself tells R that we want to display the chart that is represented in the variable “p”. When executed the code generates the box plot visualization below:
Isn’t that probably the best experience you had creating a visualization via code? Lol. Hadley Wickham and his crew did a great job of making it much easier to create custom visualization via custom code.
Now we have a R visualization in Power BI that is totally interactive with other elements in our Power BI dashboards. The filters created from the outside slicers and charts are propagated to the R visualization.
We kept things very simple since this is a 100 level tutorial. If you migrate to the “Complete Example” tab will see an example of the dashboard that is production ready. Here is the screen shot of that dashboard:
I was able to add dynamic titles that changes based on the filter selections and change background color with very little code. There are many more things we can do to make the visualization better. Many of those things are easier to do in R than they are using other programming language.