How to read and write data from and to an S3 bucket using R?

Saurabh Agarwal
2 min read · Mar 23, 2020


This article covers reading .csv files from and writing them to an S3 bucket. There are plenty of ways to do this; this article shares a simple approach using the built-in functions available in the "aws.s3" package.

To get started, install the necessary package and import the library:

# on Mac this works; it may work on Windows as well
install.packages("aws.s3", repos = c("cloudyr" = "http://cloudyr.github.io/drat"))

OR

# on windows you may need:
install.packages("aws.s3", repos = c("cloudyr" = "http://cloudyr.github.io/drat"), INSTALL_opts = "--no-multiarch")

Now that you have installed the only necessary package, let's dive deeper.

#importing the library
library("aws.s3")

To read a .csv file from an S3 bucket, a connection needs to be set up between R and the bucket. This is done by setting the AWS access key ID and AWS secret access key in the system environment, as below:

Sys.setenv("AWS_ACCESS_KEY_ID" = "mykey",
           "AWS_SECRET_ACCESS_KEY" = "mysecretkey")

OR

Sys.setenv("AWS_ACCESS_KEY_ID" = "mykey",
           "AWS_SECRET_ACCESS_KEY" = "mysecretkey",
           "AWS_DEFAULT_REGION" = "us-east-1",
           "AWS_SESSION_TOKEN" = "mytoken")
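
Either way, you can sanity-check that the variables are visible to the R session using base R's Sys.getenv() (the values shown are the placeholders set above):

Sys.getenv(c("AWS_ACCESS_KEY_ID", "AWS_DEFAULT_REGION"))
# avoid printing AWS_SECRET_ACCESS_KEY in shared notebooks or logs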

Once the environment is set up correctly, the get_bucket command lets you connect to the required bucket and check its contents:

get_bucket(bucket)
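
A short sketch of what this looks like in practice (assuming the credentials above are set and a bucket named "s3bucketName" exists; both functions below are part of "aws.s3"):

library("aws.s3")

# bucket_exists() returns TRUE/FALSE — a quick connectivity check
bucket_exists("s3bucketName")

# get_bucket_df() returns the bucket listing as a data frame,
# one row per object (columns include Key, Size, LastModified)
objects <- get_bucket_df("s3bucketName")
head(objects$Key)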

Let’s start with the magic!!

Note: the filename mentioned below includes the path through which the file is accessed.

The function below reads a .csv file from S3 using the in-built s3read_using() function available in the "aws.s3" package. By default it reads comma-separated files; for pipe- or tab-separated files, pass '|' or '\t' through the sep argument.

#' Read .csv from S3 without needing to download it locally
#' @param filename: name of the file to be read, including its path, for example 'path/filename.csv'
#' @param bucket: name of the bucket
#' @param sep: separator used in the csv, for example '|' or '\t'
#' @return: csv data as a data frame
readFromS3 = function(filename, bucket, sep = ','){
    return(s3read_using(FUN = read.csv,
                        bucket = bucket,
                        object = filename,
                        sep = sep, header = T))
}

Example:

readFromS3('folder1/folder2/myS3file.csv', 's3bucketName', sep = '|')

The function below does the same thing (reads a .csv file from S3) but offers additional flexibility in case you want to modify it, for example to control whether headers are read.

readFromS3_other = function(filename, bucket){

    key = paste("s3://", bucket, '/', filename, sep = '')

    csvcharobj <- rawToChar(get_object(key))
    con <- textConnection(csvcharobj)
    data <- read.csv(con, sep = '|', header = T)
    close(con)

    return(data)
}

The function below writes a .csv file to an S3 bucket.

#' Write csv to S3 without needing to store it locally
#' @param file: the variable in which the data is stored
#' @param bucket: name of the bucket
#' @param filename: name of the file to be written, including its path, for example 'path/filename.csv'
writeToS3 = function(file, bucket, filename){
    s3write_using(file, FUN = write.csv,
                  bucket = bucket,
                  object = filename)
}

Example:

writeToS3(dataframe, 's3bucketName', 'folder1/folder2/myS3file.csv')
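
One detail worth noting: write.csv() includes R's row names as an extra, unnamed first column by default. Since s3write_using() forwards additional arguments to FUN, a minimal variant (a sketch, with a hypothetical name) suppresses that column:

writeToS3_noRowNames = function(file, bucket, filename){
    # row.names = FALSE is forwarded to write.csv(),
    # so the output contains only the data columns
    s3write_using(file, FUN = write.csv,
                  row.names = FALSE,
                  bucket = bucket,
                  object = filename)
}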

Note: setting up the environment with your access keys, as shown earlier, is required for all of the above methods.

Enjoy! Let me know of any comments to improve the post, or of any additional articles that may help.

References:
https://github.com/cloudyr/aws.s3

