We talk about R functions that will make your data science journey easier.
Having a bunch of data is nice, but the real fun starts when you load that data into a program that can interpret what’s going on. The most common way to get data into R is the read.csv function. However, I suggest you use read_csv instead.
Here’s why, and how to do it.
What’s the difference?
Sometimes in coding, the difference between a dot and an underscore is little more than a coder’s preference. In this case, however, that subtle change means everything.
The read_csv function imports data into R as a tibble, while read.csv imports a regular old R data frame instead.
Tibbles are better than regular data frames because they:
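- load faster (read_csv is typically much quicker than read.csv on large files)
- don’t change input types (for example, strings are never converted to factors)
- allow list columns
- allow non-standard variable names (e.g. your variables can start with a number and can contain spaces)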
- never create row names
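A few of these points are easy to check for yourself. The snippet below is a minimal sketch, assuming the tibble package is installed; the column names and values are made up for illustration.

```r
library(tibble)

# A tibble keeps a non-standard column name as-is and accepts a list column
tb <- tibble(
  `2020 sales` = c(10, 20),
  tags = list(c("a", "b"), "c")
)

# A base data frame rewrites the same name by default (check.names = TRUE)
df <- data.frame(`2020 sales` = c(10, 20))

names(tb)        # "2020 sales" "tags"
names(df)        # "X2020.sales"
is.list(tb$tags) # TRUE
```

Note that the backticks are only needed when you type a non-standard name; the tibble stores it without them.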
There are other nuanced reasons why tibbles are better than classic data frames, but for now all you need to know is that:
- read_csv creates a tibble
- read.csv creates a regular data frame
- you should load a tibble instead of a data frame if you’re a data scientist with better things to do than wait for your data to load into R
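To see the difference side by side, here is a small sketch that assumes readr is already installed. The temporary file and its contents are hypothetical, used only so the example runs on its own.

```r
library(readr)

# Write a tiny CSV to a temporary file so the example is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("id,name", "1,Ada", "2,Grace"), path)

# suppressMessages() hides readr's column specification message
tb <- suppressMessages(read_csv(path))  # readr: returns a tibble
df <- read.csv(path)                    # base R: returns a data frame

inherits(tb, "tbl_df")  # TRUE
inherits(df, "tbl_df")  # FALSE
is.data.frame(df)       # TRUE
```

A tibble is still a data frame underneath, so most code written for data frames should keep working on the result of read_csv.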
How to load read_csv()
Before you can use the read_csv function, you have to load readr, the R package that houses read_csv.
You have two options to do so.
Option 1: Install and load the readr package
If you know you just want to install readr, use:
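A minimal sketch of the usual CRAN route, assuming a CRAN mirror is configured for your R session:

```r
# Install readr from CRAN (only needed once per machine)
install.packages("readr")

# Load readr so read_csv() is available (needed once per session)
library(readr)
```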
If you’d like to install the development version from GitHub instead, then use:
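One common way to do this is through the devtools package. The sketch below assumes devtools is available and that the readr source still lives in the tidyverse/readr repository.

```r
# Install devtools first if you don't already have it
# install.packages("devtools")

# Install the development version of readr from GitHub
devtools::install_github("tidyverse/readr")

# Load it as usual
library(readr)
```

Development versions can change without notice, so the CRAN release is usually the safer choice for everyday work.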