Loading Data from Excel Files into R with readxl package.

Maureen Waitherero
4 min readJan 8, 2018

--

Excel is a powerful tool that is in most data analyst toolbox. This is for analysts taking their first steps to working with R alongside excel. In this one we look at how to;

1. Prep your data for importation in excel to R

2. Setting your working directory

3. Read excel files into R

1. Prepping Your Data in Excel

Best Practices of preparing data before importing it into R

Getting your data into a ‘Tidy’ state is recommended for ease of importation and subsequent data wrangling of data from Excel to R.
Data is considered tidy when:

i). Each variable has its own column.
ii). Each observation has its own row.
iii). Each value has its own cell.

The first row of the spreadsheet is commonly reserved for the header, while the first column is used to identify the unit count.
In the case that the first row of the spreadsheet is the header(column names) in your spreadsheet ,it is recommended that the column names;

i). Have short names over longer names.

ii). Avoid using names that contain symbols such as ?, $,%, ^, &, *, (, ),-,#, ?,,,<,>, /, |, \, [ ,] ,{, and }Only underscore can be used.

iii). Avoid names with blank spaces. Good column names: Long_jump or Long.jump. Bad column name: Long jump.

iv). Avoid beginning variable names with a number. Use letter instead. Good column names: sport_100m or x100m. Bad column name: 100m

v). Column names must be unique. Duplicated names are not allowed.

Other recommended practices include;

  • Avoid blank rows in your data
  • Delete any comments in your file
  • Replace missing values by NA (for not available)
  • If you have a column containing date, use the four digit format. Good format: 01/01/2016. Bad format: 01/01/16

2. Setting up working directory

A working directory is the default location that R looks for files to bring in and where to store files as outputs. Among other best practices (learn others here) setting a working directory is considered good practice and gives the convenience of knowing where all your files are located.

R is always pointed at a directory on your computer. You can find out which directory by running the getwd()(get working directory) function; this function has no arguments.

getwd()  #Get current working directory

To change your working directory, use setwd() function and specify the path to the desired folder, preferably the one containing your data.

setwd("/home/maureen/Documents/R-Analytics") # set working directory

3. Read excel files into R

i). Installation and loading of readxl package.

The readxl package makes it easy to get data out of Excel and into R. Install the readxl package as illustrated below;

install.packages("readxl")

When you have gone through the installation, just type in the following to activate the package into your workspace:

library(readxl)

ii). Loading one spreadsheet in the workbook

To read in worksheets we use the read_excel().This function reads both xls and xlsx files and detects the format from the extension.

We load a worksheet by specifying a worksheet by its name or its index (sheet indexing starts at 1)

#import a specific worksheet by referencing the sheet's index (number)read_excel(“MUMIAS_SURVEY.xlsx”,sheet=1)#import a specific worksheet by referencing the sheet's nameread_excel("MUMIAS_SURVEY.xlsx",sheet="Financial") 

If sheet= argument is not specified, by default read_excel() will import the first worksheet of the excel worksheet.

read_excel("MUMIAS_SURVEY.xlsx")

To load any other spreadsheet with its index number, change the worksheet sheet argument index number. Sheet indexing starts at 1 and the sheet argument takes only one integer and one string argument.
Note! you
cannot concatenate.

iii). Load entire workbook or more than one worksheet in a workbook.

In the case that you would like to load in the entire workbook, essentially all the worksheets in the workbook, use the lapply() function as illustrated below.

The lapply( ) function does the following simple series of operations:

It loops over the first argument which should be a list (If the first argument is not a list, it will be coerced to a list), iterating over each element in that list while applying a function to each element of the list (a function that you specify) and returns a list.

Using the excel_sheets()function we create a list of all the file names in our working directory. We now use the lapply() to loop over that list and read each file with the read_excel()function from the readxl package.

#List names of all sheets in an excel workbook.
sp.list <- excel_sheets("MUMIAS_SURVEY.xlsx")

We have our list all worksheets so lets read them in.

# load all sheets in a workbook, using lapplydata <- lapply(sp.list, read_excel, path="MUMIAS_SURVEY.xlsx")

iv). Load multiple workbooks.

Using the list.files()function we create a list of all the file names in our working directory. We now use the lapply to loop over that list and read each file with the read_excel function from the readxl package.

#List names of all workbooks with the .xlsx extension in your working directorydata.list <- list.files(pattern = '*.xlsx')# load all workbooks with the .xlsx extension in your working directorydf.list <- lapply(data.list, read_excel)

Hope you learnt alot. See you in the next one!

--

--