Short: Retail Dashboarding with Shiny

Connor Higgins
Connor Higgins
Published in
5 min readDec 10, 2019

Lately I have been looking at the Kaggle Retail Data Analytics set. Businesses large and small have a need to make sense of their data.

Dashboards are popular tools for visualizing business data. Using R’s Shiny package we can make a dashboard of a sorts–though one where we don’t need to worry about keeping the data up to date. As it happens, this is a highly effective way to visualize the Retail Analytics dataset.

The goal here is to compare the performance of the 45 stores in the dataset.

Code behind the app

Shiny apps have two components: a ui and a server function.

The ui governs the layout of what the user sees, buttons they can click, and plots they are presented with.

The server meanwhile is what runs the R code to generate plots, tables etc.

The two communicate using two lists:

  • ui sends and server can recieve inputs (i.e. input$’selection’)
  • server sends and the ui can recieve outputs (i.e. output$YearlySales)

Here is the barebones of the app being used here, with the server code removed for conciseness (this code is described in the next section). We load in the librarys used, the data, then we set up the ui and server sections. Finally we run the app.

The ui code is governing the layout, in this case organizing the visuals into tabs.

library(shiny)
library(ggplot2)
library(data.table)
library(scales)
library(magrittr)
library(DT)
library(tidyr)
sales <- read.csv("https://raw.githubusercontent.com/connorH982/projects/master/sales%20data-set.csv")
sales$Store<-as.factor(sales$Store)
sales$Date <- sales$Date %>% as.Date("%d/%m/%Y")
storesDat <- read.csv("https://raw.githubusercontent.com/connorH982/projects/master/stores%20data-set.csv")
storesDat$Store<-as.factor(storesDat$Store)
ui <- fluidPage( tabsetPanel(
tabPanel("Weekly sales over time",
sidebarLayout(position = "left",
sidebarPanel(
selectInput(inputId = "selection",
label = "Select a Store's Sales to visualize", choices = c(1:length(unique(sales$Store)),"All"))
),
mainPanel(
plotOutput("sales") )
)
),
tabPanel("Total yearly sales by store",
p("Table is sorted by 2012 (descending total sales)"),
dataTableOutput("YearlySales")
),
tabPanel("Store size and type",
plotOutput("characteristics"))
)
)server <- function(input, output, session) {
output$sales <- renderPlot({
if(input$selection=="All"){
# If all stores are selected
# "sales" plot
}
else{
# If individual stores are selected
# "sales" plot
}
})
output$YearlySales <-DT::renderDataTable({
# "YearlySales" plot
})
output$characteristics <- renderPlot({
# "characteristics" plot
})
}
shinyApp(ui, server)

Code behind the visualizations

The data we are using comes from two sets: “stores data-set.csv”, “sales data-set.csv”

The two sets tell us the following. We have 45 separate stores, of varying sizes and types (stores data-set.csv). We also know their weekly sales over 3 years.

First prepare the data, then on to the visualizations…

library(shiny)
library(ggplot2)
library(data.table)
library(scales)
library(magrittr)
library(DT)
library(tidyr)
sales <- read.csv("https://raw.githubusercontent.com/connorH982/projects/master/sales%20data-set.csv")
sales$Store<-as.factor(sales$Store)
sales$Date <- sales$Date %>% as.Date("%d/%m/%Y")
storesDat <- read.csv("https://raw.githubusercontent.com/connorH982/projects/master/stores%20data-set.csv")
storesDat$Store<-as.factor(storesDat$Store)

‘WEEKLY SALES OVER TIME’

if(input$selection=="All"){      ggplot(data = sales,aes(x=sales$Date,y=sales$Weekly_Sales)) +
geom_smooth(color = "dodgerblue4", aes(group = as.factor(sales$Store)), size = 0.1)+
xlab("Date")+ylab("Weekly Sales")+ggtitle("Weekly Sales for all Stores (45)")+scale_x_date(
labels = labelsdate_format("%Y")) + geom_smooth(color = "dark red", size =1,
aes(x=sales$Date,y=sales$Weekly_Sales,group = NULL))
}
else{ subData <- sales[sales$Store==input$selection,] ggplot(data = subData,aes(x=subData$Date,y=subData$Weekly_Sales))+
geom_smooth(color = "dark red", size=0.1) +
geom_point(color = "dodgerblue4", size = 0.2, shape = 18)+
xlab("Date")+ylab("Weekly Sales")+ggtitle(paste("Weekly Sales for Store", ... input$selection))+
scale_x_date(labels = date_format("%Y"))
}

This code block is dependent on user input, where they can choose to plot weekly sales over time for one or all stores. If they choose to visualize one store, then a the data is subset by that store. A scatter plot (geom_point) is generated that includes a smoothed conditional means line to show overall behavior (geom_smooth).

If “All” is selected by the user, then smoothed lines are generated showing the conditional means of the individual stores in blue (with a 95% prediciton interval). A red line showing the overall behavior of all stores is also generated.

‘Total yearly sales by store’

##Prepare data, cut data by year, convert to wide format while summing up yearly revenue    SaleDT <-aggregate(sales$Weekly_Sales, by = list(sales$Store,cut.Date(sales$Date, 
x = "years")),sum)%>%spread(2,3)
colnames(SaleDT) <- c("Store","2010","2011","2012")
Type <-storesDat[,-3]
SaleDT <- merge(Type,SaleDT, by = "Store")
rownames(SaleDT) <- paste("Store",rownames(SaleDT))
SaleDT <- SaleDT[order(SaleDT$`2012`,decreasing = TRUE),-1]
SaleDT$Type<-as.factor(SaleDT$Type)
##Display using Datatable #Set up colors used in the heatmap
brks <- quantile(SaleDT[,-1], probs = seq(.05, .95, .05), na.rm = TRUE)
clrs <- round(seq(255, 150, length.out = length(brks) + 1), 0) %>%
{paste0("rgb(", ., ",", ., ",255)")}
#Set up the datatable
datatable(SaleDT) %>% formatCurrency(2:4) %>% formatStyle(c("2010","2011","2012"),
backgroundColor =styleInterval(brks, clrs),backgroundPosition = 'center')%>%
formatStyle("Type",backgroundColor = styleEqual(levels(SaleDT$Type),c("tomato","seagreen","goldenrod")))

Arguably the most interesting plot though also the most complicated to implement. This code block makes extensive use of two packages: tidyvr and DT.

Here is what the data we are interested in currently looks like:

StoreDateWeekly_Sales105/02/201024924.50112/02/201046039.49119/02/201041595.55126/02/201019403.54105/03/201021827.90

What we want is to show yearly sales by each store. First prepare the data using tidyvr’s spread function and base R’s aggreate function. We aggregate the data by slicing it into sections by year, which gives us the following columns: Store, Year, and sum of sales given year and store.

This is “long” format data, so for presentation we spread the data using tidyvr by year and yearly revenue. We are left with Stores names as “rows” and 3 new columns for years, with the cells containing the yearly revenue. We also perform a few simple steps the make the data more presentable: merging the store “Types” from the “stores data-set”, removing Stores as a column and instead renaming the rows “Store 1”,“Store 2”, etc.

SaleDT <-aggregate(sales$Weekly_Sales, by = list(sales$Store,cut.Date(sales$Date, 
x = "years")),sum)%>%spread(2,3)

The second code section is all focused on making the data more presentable using the DT package, which is designed to implement datatables in Shiny. First set up breaks and colors for a heatmap (I went with a light blue theme).
Second we set up the visuals:

  • convert SaleDT to a datatable
  • format the yearly revenue as currency
  • For year columns: set up the “heatmap” based on the breaks and colors set up previously
  • For the Type column: color these based on values used in the next visual for consistency
##Display using Datatable    #Set up colors used in the heatmap
brks <- quantile(SaleDT[,-1], probs = seq(.05, .95, .05), na.rm = TRUE)
clrs <- round(seq(255, 150, length.out = length(brks) + 1), 0) %>%
{paste0("rgb(", ., ",", ., ",255)")}
#Set up the datatable
datatable(SaleDT) %>% formatCurrency(2:4) %>% formatStyle(c("2010","2011","2012"),
backgroundColor =styleInterval(brks, clrs),backgroundPosition = 'center')%>%
formatStyle("Type",backgroundColor = styleEqual(levels(SaleDT$Type),c("tomato","seagreen","goldenrod")))

‘Store size and type’

We set up a barplot that gives us the size of each of the 45 stores colored by their type. This is nothing particularly complicated compared to the other visuals, though we can immediately see a pattern in the data where we can roughly see Type A stores are larger than B, which is larger than C.

ggplot(data = storesDat)+geom_bar(aes(x = storesDat$Store,y = storesDat$Size, fill = storesDat$Type),
stat = "identity")+xlab("Store #")+ylab("Store Size")+guides(fill = guide_legend(title = "Store Type"))+
scale_fill_manual(values = c("tomato","seagreen","goldenrod"))

--

--

Connor Higgins
Connor Higgins

Current graduate student at Northeastern University, pursuing a career in data science. Also an avid reader of speculative fiction!