A Comprehensive Summary of Data Import in R: From the Most User-Friendly to the Most Time-Saving
In this short article, I summarize different techniques in data import to R. A spectrum of approaches that trade-off speed vs convenience is presented for my readers to choose from.
Assuming the data files are all stored on Folder F:\BDMA\data\. The most user-friendly data import approach is an interactive method; that is, it allows you to open folders to locate the data file like a regular file-open operation. Make sure you notice a flashing window on the taskbar that invites you to choose the destination file.
Another you-import-what-you-see approach is through copy & paste.
Suppose you want ‘directly’ copy &paste from a spreasheet. read.table function can help. Copy the selected cells first.
And then go to R-Studio, paste the content inside the pair of quotation marks. Do not adjust anything in the content, e.g., spaces, blanks, returns. Just leave them as is.
df.name=read.table(text=c(“ID Ads Orders Leads Proved
1 265 454 245 Y
2 317 197 434 Y
3 322 478 416 Y
4 233 198 377 Y
5 365 364 193 Y
6 349 384 437 Y
7 331 203 331 N
8 237 133 443 N
9 229 312 445 N
Alternatively, use read.delim. Amazingly this one-line code gets you the same result.
When you have to run the same code numerous times, all the way from data import to analyses, and to results export, you want to avoid the manual work above. You want to select the chunk of code and click “run,” so they can take care of themselves while you can have a rest. Then you need to use the following syntax approaches.
The most common file type for R users is csv format. Excel can open a csv file as a regular spreadsheet file with a single worksheet. Suppose your data file is stored as F:\BDMA\data\my_data.csv. You can use the build-in function read.csv to import the file to R.
Note that I use “/“ in telling R the address/location of the data file, whereas the folders and files are described using “\” in Windows’ file manager. To avoid this nuisance, I usually select the file address and replace all “\” with “/” in R-Studio.
Another way to deal with the switch of slashes is to pre-designate the working directory where the data file is stored using the following code. where wd stands working directory.
You can use getwd() to check if the working directory is correctly set. Then
Accordingly, it is just a one-line code to export dataframe in your R-Studio back to a csv file using write.csv function.
write.csv(data.name, “new_file_name.csv”, row.names=F)
Make sure you have included the argument row.names=F; otherwise, you just add a useless column of row IDs to the file. And when you import from that file next time, a new row-ID column is added to the imported dataframe, which screws up your code where you cite variables by column IDs.
Dropbox is a service that I enjoy storing data files. in a Dropbox folder I can use the Dropbox link to import data, completely avoiding those slashes. Dropbox generates a URL link similar to the one below.
To apply read.csv to Dropbox links, simply change the last character in the Dropbox link from 0 to 1.
df.name=read.csv(“https://www.dropbox.com/z/rs4j7617jlrsk1mu8of/my_data.csv?dl=1",header = T)
Zip files are conveinent to tranfer with compressed csv files. To import a zip file without unzipping it, we can use a package called readr and use user read_csv to read zipped data file.
If a zip file includes multiple csv files, and you don’t bother to unzip all of them, you can use the following code.
df.name = read.csv(unz(“zip_file.zip”, “my_data.csv”), header = T, fileEncoding=”UTF-16")
If we are dealing with Excel files such as xls and xlsx, there are several packages can help you. readxl is one of them. You need to install Java on your computer first to be able to use readxl. Once loaded, you can use read_excel function to import both xls and xlsx files.
For xlsx files, I recommend read.xlsx function in openxlsx package.
df.name=openxlsx::read.xlsx(“my_data.xlsx”, sheet = 1, skipEmptyRows = T)
I found read.xlsx cannot import xls files.
Regarding how to export dataframes to Excel, please refer to this article.
Huge csv files take long time to import. If you have a huge csv file and you cannot even open it in Excel, you can use fread function in data.table package, where “f” stands for “fast.”(no joking)
for a csv file that read.csv needs 4 minutes to import, fread takes about 14 seconds on my computer.
Accordingly, to export a large data file to a csv file, you can use fwrite.
Finally, some files are stored as JSON format, and this format is really beyond the scope of this article. A Google search of rjson package hopefully will give you what you are looking for.
# — — — — -
This article is devoted to my students who are suffering a headache from our first BDMA lecture. The second lecture is coming… Hope this article helps.