R with SAP Analytics Cloud: Part Two

Rajkumar Benny
SAP Analytics Cloud
4 min read · Oct 2, 2019

The previous article dealt with the very basics of the R programming language that are useful in SAP Analytics Cloud. In this segment, let’s move on to the dplyr package and see how it can be used when creating visualizations.

There are hundreds of articles on the internet if you want to explore further applications of this package. For now, let’s just cover the bare basics of getting started with it. The upside of understanding this package is that it makes your journey through R programming easier and a lot more fun.

The dplyr package is built around a small set of core verbs. The main ones are select(), arrange(), filter(), mutate() and summarize(), with group_by() used alongside them.

The names seem fairly intuitive, right? Let’s apply them and try to understand what they do.

Here’s a GIF below to show you what select(), arrange() and filter() do:

After watching the GIF above, you may ask: what is the %>% symbol, and what role does it play?

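%>% is known as the pipe operator. It comes from the magrittr package and becomes available when you load dplyr. It passes the result on its left as the first argument to the function on its right, which keeps the steps of an analysis in the order you read them.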
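To make the pipe concrete, here is a minimal sketch that compares a nested call with its piped equivalent. The vector `scores` and its values are made up purely for illustration.

```r
library(dplyr)  # loading dplyr also makes %>% available

# Made-up numbers, purely for illustration.
scores <- c(12, 7, 30, 18)

# Nested call: has to be read from the inside out.
nested <- round(mean(sort(scores)), 1)

# Piped call: read left to right; each result feeds the next function
# as its first argument.
piped <- scores %>%
  sort() %>%
  mean() %>%
  round(1)

# Both styles compute the same value.
print(identical(nested, piped))
```

The two forms are interchangeable; the piped one simply lists the steps in the order they happen.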

Let us translate the R code that I typed into pseudocode to get a better understanding:

a) Retrieve the dataset called sports_data.

b) Next, pipe the retrieved dataset with %>% into select() to keep only the player, Rebounds and Teams columns.

c) Arrange the selected data by Rebounds. (The default is ascending order. To sort in descending order, use desc(Rebounds).)

d) From the resulting dataset, filter and display only the rows where Rebounds is greater than 100.
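Steps a) to d) can be sketched roughly as follows. The original sports_data is not reproduced in this post, so the data frame below is a hypothetical stand-in: the column name `Player`, the extra `Points` column, and all values are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical stand-in for sports_data; names and values are assumptions.
sports_data <- data.frame(
  Player   = c("A", "B", "C", "D"),
  Teams    = c("X", "X", "Y", "Y"),
  Rebounds = c(250, 80, 120, 95),
  Points   = c(900, 400, 650, 300),
  stringsAsFactors = FALSE
)

result <- sports_data %>%                # a) start from the dataset
  select(Player, Rebounds, Teams) %>%    # b) keep only three columns
  arrange(Rebounds) %>%                  # c) sort ascending; desc(Rebounds) would reverse it
  filter(Rebounds > 100)                 # d) keep rows with more than 100 rebounds

print(result)
```

With this toy data, only the two players above 100 rebounds remain, and the `Points` column is gone because it was not selected.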

Fairly intuitive, right?

Next, let’s look into group_by() and summarize() with GIFs.

After inserting the group_by function and grouping the data by Teams, we see that nothing really changes in the output. Wait, are we hitting a dead end here?
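A small sketch shows why the output looks unchanged: group_by() keeps every row and only records which column defines the groups. The data frame is again a hypothetical stand-in with assumed names and values.

```r
library(dplyr)

# Hypothetical stand-in for sports_data; names and values are assumptions.
sports_data <- data.frame(
  Player   = c("A", "B", "C", "D"),
  Teams    = c("X", "X", "Y", "Y"),
  Rebounds = c(250, 80, 120, 95),
  stringsAsFactors = FALSE
)

grouped <- sports_data %>% group_by(Teams)

# Same number of rows as before grouping...
print(nrow(grouped) == nrow(sports_data))

# ...but the result now remembers its grouping column.
print(group_vars(grouped))
```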

Let’s introduce summarize() and check it out.

Yes, the group_by function does little that is visible on its own. It only marks the groups, and it is usually paired with the summarize function, which, as the name implies, collapses each group into a single summary row. Thus, by grouping the data by either player names or Teams, we can see the maximum rebounds for each of them.
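As a rough sketch of that pairing, the snippet below computes the maximum rebounds per team. The data frame and the output column name `max_rebounds` are assumptions for illustration, not the data shown in the GIF.

```r
library(dplyr)

# Hypothetical stand-in for sports_data; names and values are assumptions.
sports_data <- data.frame(
  Player   = c("A", "B", "C", "D"),
  Teams    = c("X", "X", "Y", "Y"),
  Rebounds = c(250, 80, 120, 95),
  stringsAsFactors = FALSE
)

team_max <- sports_data %>%
  group_by(Teams) %>%                        # mark one group per team
  summarize(max_rebounds = max(Rebounds))    # collapse each group to one row

print(team_max)
```

Swapping `group_by(Teams)` for `group_by(Player)` would give one row per player instead.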

The summarize() function can also calculate the mean, the median and the counts of a variable, to name a few of its many applications.
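For instance, several summaries can be computed in one summarize() call. This is a sketch on made-up data; the output column names are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for sports_data; names and values are assumptions.
sports_data <- data.frame(
  Teams    = c("X", "X", "Y", "Y", "Y"),
  Rebounds = c(250, 80, 120, 95, 100),
  stringsAsFactors = FALSE
)

team_stats <- sports_data %>%
  group_by(Teams) %>%
  summarize(
    mean_rebounds   = mean(Rebounds),    # average per team
    median_rebounds = median(Rebounds),  # middle value per team
    players         = n()                # number of rows per team
  )

print(team_stats)
```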

For more, type ?summarize in the console to open its help page.

Let’s apply the skills we have picked up so far to create a donut chart!

The donut chart is an alternative to the pie chart, simply because it occupies less area and, arguably, looks better.

The goal here is to count the records for each agent and find out which agents are in high demand and which aren’t.

The code below:

On SAP Analytics Cloud:

Agent Kay is in very high demand, while Dylan has a low count in contrast.

So, I started by counting the records for each agent, as denoted by count = n(), and then I used the plot_ly function from the plotly package, which I had installed beforehand.
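The counting step might look something like the sketch below. The data frame `agent_data`, the column name `Agent`, the agent "Sam" and all rows are hypothetical; only the names Kay and Dylan and the `count = n()` expression come from this post.

```r
library(dplyr)

# Hypothetical records, one row per request handled by an agent.
agent_data <- data.frame(
  Agent = c("Kay", "Kay", "Dylan", "Kay", "Sam", "Kay", "Sam"),
  stringsAsFactors = FALSE
)

agent_counts <- agent_data %>%
  group_by(Agent) %>%
  summarize(count = n())   # n() returns the number of rows in each group

print(agent_counts)
```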

It’s a package dedicated to visualizations, and it rivals another popular R package called ggplot2. https://plot.ly/r/

So, in the code, we insert a pie chart by adding the add_pie function. But since our goal is to create a donut chart, I added a hole and specified its size.

Lastly, the layout function, as the name implies, controls the design of the chart’s layout. I removed the legend since it made the chart look a little cluttered.
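Putting the plotting steps together, a donut chart could be sketched as follows. This is not the exact code from the screenshot: the counts, the agent "Sam", the hole size of 0.6 and the chart title are all assumptions chosen for illustration.

```r
library(plotly)

# Hypothetical per-agent counts; the values are made up.
agent_counts <- data.frame(
  Agent = c("Dylan", "Kay", "Sam"),
  count = c(1, 4, 2),
  stringsAsFactors = FALSE
)

donut <- plot_ly(agent_counts, labels = ~Agent, values = ~count) %>%
  add_pie(hole = 0.6) %>%                 # hole > 0 turns the pie into a donut
  layout(
    title = "Requests per agent",         # assumed title
    showlegend = FALSE                    # hide the legend to reduce clutter
  )

# Evaluate `donut` as the last expression (or print it) to render the chart.
```

A larger `hole` value gives a thinner ring, and setting it to 0 gives back an ordinary pie chart.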

I hope that this blog gave you a fair idea of how one could use the dplyr package in R and create some neat visualizations in SAC.
