We talk about R functions that will make your data science journey easier.
Having a bunch of data is nice, but the real fun starts when you load that data into a program that can interpret what’s going on. The most common way to get data into R is the read.csv function. However, I suggest you use read_csv instead.
Here’s why, and how to do it.
What’s the difference?
Sometimes in coding, the difference between a dot and an underscore is little more than a coder’s preference. In this case, however, that subtle change means everything.
The read_csv function imports data into R as a tibble, while read.csv imports a regular old R data frame instead.
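Tibbles, together with the read_csv function that produces them, have several advantages over regular data frames and read.csv. They:
- load faster, because read_csv typically parses large files more quickly than read.csv
- don’t change input types (for example, strings are never converted to factors, which read.csv did by default before R 4.0)
- make list columns easy to create and work with
- allow non-standard variable names (e.g. your variables can start with a number and can contain spaces)
- never create row names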
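A minimal sketch of two of these differences, assuming readr (version 2.0 or later, for the show_col_types argument) is installed. The file contents and column names are made up for illustration:

```r
library(readr)

# Hypothetical sample data: one column name starts with a digit and contains a space.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("1st score,name", "90,Ada", "85,Grace"), csv_path)

base_df <- read.csv(csv_path)                          # regular data frame
tidy_df <- read_csv(csv_path, show_col_types = FALSE)  # tibble

class(base_df)   # a plain data.frame
class(tidy_df)   # includes "tbl_df", the tibble class

names(base_df)   # read.csv rewrites the name into a syntactically valid one
names(tidy_df)   # read_csv keeps "1st score" exactly as written
```

Because the tibble keeps the original name, the column has to be referenced with backticks, as in tidy_df$`1st score`.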
There are other nuanced reasons why tibbles are better than classic data frames, but for now all you need to know is that:
- read_csv creates a tibble
- read.csv creates a regular data frame
- you should load a tibble instead of a data frame if you’re a data scientist with better things to do than wait for your data to load into R
How to load read_csv()
Before you can use the read_csv function, you have to load readr, the R package that houses read_csv.
You have two options to do so.
Option 1: Install and load the readr package
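If you know you just want to install readr, use:
install.packages("readr")
If you’d like to install the development version from GitHub instead, then, assuming the devtools package is already installed, use:
devtools::install_github("tidyverse/readr")
Then, load readr using:
library(readr)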
Option 2: Load the whole tidyverse package
Installing readr by itself can be beneficial in some specific cases. But if you know you’re going to use more than just readr from the tidyverse world (which, if you’re reading this, is probably the case), you can install the whole tidyverse collection using:
install.packages("tidyverse")
Doing so allows you to load readr, along with the other core tidyverse packages, through:
library(tidyverse)
How to use read_csv()
Now that you have readr loaded into R, you can use read_csv to import data for analysis.
To do so, make sure the CSV file is in your working directory and use:
read_csv("CSV file name.csv")
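A bare file name like this is resolved relative to the working directory. The sketch below shows one way to check that, using a temporary folder and a made-up file name (example_data.csv) so nothing in your own project is touched. It assumes readr 2.0 or later for show_col_types:

```r
library(readr)

# Hypothetical file written to a temporary folder, for illustration only.
demo_dir <- tempdir()
write.csv(data.frame(id = 1:2, city = c("Oslo", "Lima")),
          file.path(demo_dir, "example_data.csv"), row.names = FALSE)

getwd()                      # shows the current working directory
old_wd <- setwd(demo_dir)    # switch to the folder that holds the CSV

file.exists("example_data.csv")   # confirms the file is visible from here
from_wd <- read_csv("example_data.csv", show_col_types = FALSE)

setwd(old_wd)                # restore the previous working directory

# Alternatively, leave the working directory alone and pass a full path.
from_path <- read_csv(file.path(demo_dir, "example_data.csv"),
                      show_col_types = FALSE)
```

Passing a full path is often the safer habit in scripts, since it does not depend on where R happens to be running.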
Of course, typically you’ll want to load the CSV into a variable when using R so you can refer to it whenever that dataset is needed. All that takes is:
variable <- read_csv("CSV file name.csv")
Voilà. Now your variable holds a tibble with all your CSV data inside. It’s a straightforward process, and one you should become familiar with if you use R regularly.
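Once the data is in a variable, a quick inspection confirms what was loaded. The sketch below uses a throwaway file and a hypothetical variable name (scores). It also shows the optional col_types argument, which states the column types explicitly instead of letting read_csv guess them:

```r
library(readr)

# Throwaway sample file with made-up contents.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,Ada,90.5", "2,Grace,85"), csv_path)

scores <- read_csv(
  csv_path,
  col_types = cols(
    id    = col_integer(),
    name  = col_character(),
    score = col_double()
  )
)

scores         # printing a tibble shows its dimensions and column types
dim(scores)    # number of rows and columns
spec(scores)   # the column specification read_csv used
```

Declaring the types up front is optional, but it makes a script fail loudly if a file arrives with unexpected contents.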
Always remember: having data is great, but getting that data ready for analysis is the key. The read_csv function is one of the quickest and most efficient ways to do that.