Using googlesheets and mailR packages in R to automate reporting

By Pauline Glikman

Most companies make extensive use of spreadsheets to store, work and share data internally.

One important use is for reporting purposes. If you are an analyst, you are often tasked with reporting sales, service level, or marketing campaign results on a monthly, weekly or even daily basis. Most decision-makers do not pull their own data and are used to seeing and working on this data in spreadsheets. For instance, at Airbnb, I share a weekly summary of our performance in a spreadsheet for my business stakeholders to be able to make data-driven decisions.

For years, Excel has been the main software that allowed people to share internal data. But using Excel for reporting has its drawbacks:

  1. First manually updating data into spreadsheets is time-consuming and takes a large amount of analysts’ time. Before automating much of the reporting I am responsible for, I was often spending well over 30% of my time pulling data from our database, inputting the updated dataset in a spreadsheet and wait until all computations were done.
  2. Manual reporting can lead to errors. Shifting the inputted data by one cell will yield wrong results and will trigger angry emails from decision-makers!
  3. Excel files are shared via emails as attached files. It is not uncommon for multiple versions of the same files to be shared in the organization. This can lead to confusions, miscommunication and loss in efficiency.

With its Google docs suite, Google has solved point 3) by giving anyone the ability to create online collaborative spreadsheets for teams to work on simultaneously, for colleagues to give comments and communicate with one another in one place without having to email different mark-ups back and forth.

This post is about providing solutions to points 1) and 2) to help you reduce the amount of time you spend on reporting.

R allows for easy automation of the reporting process. This means that the analyst’s task of pulling data, making some computations, uploading it in a nice spreadsheets and emailing it to relevant business stakeholders can be completely automated in a single script. And because with the same script and the same dataset, you will confidently obtain the same reproducible results you will not have to worry about making errors anymore.

First, one of the most important feature of R is that it allows for reproducible results: with the same script and the same dataset, you will confidently obtain the same results and not have to worry about making errors. At Airbnb, for example, our Data Science team built an R package which allows employees to collaborate and share reproducible R code. More info about how Airbnb scales data science by using R packages here.

So whether you are currently running a daily reports showing sales or service level, this post can help you improve your productivity.

Installing the googlesheets package and the mailR package

Before we get started, and assuming you have already installed R and Rstudio on your computer, let’s install everything we need:

Googlesheets package

If you have never installed this package on your computer, install it by typing in the console:

install.packages(“googlesheets”)

You can now load the googlesheets package by typing:

library(“googlesheets”)

You are now ready to work on your google spreadsheets in R!

Installing rJava and mailR

Before installing mailR, you will need to install another package called rJava. In Rstudio:

install.packages(“rJava”)
library(rJava)
install.packages(“mailR”)
library(mailR)

Building your report with googlesheets package

For the purpose of this post, we will use this spreadsheet.

To Authorize googlesheets to view and manage your files, you first need to type:

gs_auth()

You will be directed to a web browser, asked to sign in to your Google account, and to grant googlesheets permission to operate on your behalf with Google Sheets and Google Drive.

Now, let’s assume that you have been asked to create a daily report in a google spreadsheet. The first thing you need to do is to get some data. This may be via the web, via an API or a database. At Airbnb, our internal R package called Rbnb allows anyone in the company to quickly access our database without leaving the friendly environment of Rstudio.

For the purpose of this post we will be using the mtcars dataset and we will be pretending that the report you need to update daily shows the top 10 cars with the highest mpg.

We can select the top 10 cars in the following way:

# select 10 cars with highest mgp in order and store it in a
# dataframe called data
data = head(mtcars[with(mtcars, order(-mpg)), ], 10)

You now have some data to upload to your spreadsheet. To do so you will first load your spreadsheet with either the gs_title() or the gs_url() function. It is probably a good practice to use gs_url(“url_of_your_spreadsheet”) in case the name of your spreadsheet changes over time.

You can then use gs_edit_cells() and specify the worksheet you want to update with the ws argument:

# To load your spreadsheet use the gs_title() or gs_url() functions
top_10_cars <- gs_url(“https://docs.google.com/spreadsheets/d/1LYxV8Z324o-OALSwO2h99fUecBdb_t-8BYgBbe0G1KU/edit#gid=0")
# To edit the cells of your spreadsheet with your newly pulled data
# use gs_edit_cells and specify the work sheet with the ws argument
top_10_cars <- top_10_cars %>%
gs_edit_cells(ws = “Tab 1”, input =data)

Done! You can now run your script every single day and it will update your sheet faster than loading your browser!

mailR: Updating your stakeholder with an automated email

Once your report is updated, you might want to send an email to the people looking at your dashboard. If you are running this script daily, opening your gmail and drafting an email seems like a waste of time.

Don’t worry, mailR has your back!

First go to this page to add a specific app password. This will generate a random password that you can just use for mailR, allowing you to save it in a script without revealing your actual credentials. You can also revoke this password at any time from your Google accounts settings page. Be careful, however — if this password is compromised, it can be used to fully access your account from anywhere! Consequently, make sure you don’t share a script with your password in it with anyone else.

Then add these lines at the end of your script and observe the magic:

# Write the content of your email
msg <- paste(“Hi team,”,””,”The mpg car dashboard is up-to-date as of”,as.character(date()),”and can be accessed here: https://docs.google.com/spreadsheets/d/1LYxV8Z324o-OALSwO2h99fUecBdb_t-8BYgBbe0G1KU/edit#gid=0","","Best,","Your name”)
# Define who the sender is
sender <- “firstname.lastname@email.com”
# Define who should get your email
recipients <- c(“email_of_recipient1”,
”email_of_recipient2",
”email_of_recipient3")
# Send your email with the send.mail function
send.mail(from = sender,
to = recipients,
subject = “Top 10 cars dashboard”,
body = msg,
smtp = list(host.name = “smtp.gmail.com”, port = 587,
user.name = “firstname.lastname@email.com”,
passwd = “your_app_specific_password”, ssl = TRUE),
authenticate = TRUE,
send = TRUE)

Appendix : One more thing on googlesheets…

There is one major limitation with the googlesheets package is that it relies on a Google API which is apparently slow (see this). As a result, uploading large datasets to your spreadsheet will not work in one chunk.

One fix I have found and been using is to batch my data and load it in the spreadsheet in chunks.

You can use this code to chunk your data:

# Chunk your data in batches of 500 rows to ease the upload
data_in_chunks = list()
batches = round(nrow(data)/500)
for(i in 1:batches){
data_in_chunks[[i]] = data[(((i-1)*500)+1):(i*500),]
}
# Then upload your batches with a for loop
for(i in 1:(length(data_in_chunks)-1)){
row = paste("A",as.character(2+(i*500)), sep='')
top_10_cars <- top_10_cars %>% gs_edit_cells(ws = "Tab 1", input = data_in_chunks[[i+1]], anchor=row, col_names = FALSE, verbose = TRUE)
}

Check out all of our open source projects over at airbnb.io and follow us on Twitter: @AirbnbEng + @AirbnbData