Introduction to R for Data Science (Part Three)

This is the third part of my introduction to R. It covers the basics of data frames, data frame indexing and selection, operations, and more.

Ivan Huang
3 min read · Mar 23, 2023

*Originally published on my Substack. This is just a part of the article.

PS: Please read ‘Introduction to R for Data Science (Part Two)’ before reading this one. This part continues from where part two left off.

Part Two: Introduction to R for Data Science (Part Two)

Data Frames

Data frames let us organize data as a table of rows and columns, where each column can hold a different data type (for example text, numbers, or TRUE/FALSE values).

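R ships with several built-in datasets. Of the ones below, women is a data frame, while the others are matrices that print as tables in much the same way. Type any of these into the console: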

  • state.x77
  • USPersonalExpenditure
  • women
  • WorldPhones
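If you are not sure what kind of object a built-in dataset is, you can ask R directly. This is a small sketch using base R only; the name states is just one I picked for the converted copy.

```r
# Check what kind of object each built-in dataset is.
print(is.data.frame(women))     # women is a data frame
print(is.matrix(state.x77))     # state.x77 is a matrix
print(is.matrix(WorldPhones))   # WorldPhones is a matrix too

# A matrix can be converted when you need a real data frame.
# "states" is a hypothetical name chosen for this example.
states <- as.data.frame(state.x77)
print(is.data.frame(states))
```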

To list all the built-in datasets available in R, use:

  • data()

To take a peek at the top or bottom of a data frame, use:

  • head(state.x77)

This would show the first six rows (default).

  • head(state.x77, 9)

This would show the first nine rows. You can change the number to show as many rows as you want. For eleven rows, use head(state.x77, 11). For four rows, use head(state.x77, 4). You get the idea.

  • tail(state.x77)

This would show the last six rows.
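Here are the three calls together as a short sketch you can paste into the console. The variable names (first_six, first_nine, last_six) are my own and only there so the results can be inspected.

```r
# Peek at the top and bottom of the built-in state.x77 dataset.
first_six  <- head(state.x77)      # first six rows (the default)
first_nine <- head(state.x77, 9)   # first nine rows
last_six   <- tail(state.x77)      # last six rows

print(nrow(first_six))
print(nrow(first_nine))
print(last_six)
```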

PS: Ignore the error in the screenshot. I accidentally typed a period instead of a comma.

You can create your own data frames using the data.frame function.
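As a minimal sketch, here is one way to build a small data frame called dda with the columns days, temps, and rain used in the rest of this article. The values themselves are made up for illustration.

```r
# Hypothetical example values; only the names dda, days, temps,
# and rain come from the article.
days  <- c("Mon", "Tue", "Wed", "Thu", "Fri")
temps <- c(22.2, 21.0, 23.0, 24.3, 25.0)
rain  <- c(TRUE, TRUE, FALSE, FALSE, TRUE)

# Each vector becomes one column; all vectors must have the same length.
dda <- data.frame(days, temps, rain)
print(dda)
```

Notice that the three columns hold three different types (text, numbers, and logical values), which is exactly what a data frame is for.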

str(dda) gives information about the structure of the data frame: the number of rows and columns, plus the name, type, and first few values of each column.

summary(dda) gives a summary of each column in our data frame. For numeric columns, that means the minimum, first quartile, median, mean, third quartile, and maximum.
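To try both functions, here is a short sketch. It rebuilds dda with made-up values so it can run on its own; temps_summary is a name I introduced for the example.

```r
# Hypothetical example data (values are assumptions, not from the article).
dda <- data.frame(
  days  = c("Mon", "Tue", "Wed", "Thu", "Fri"),
  temps = c(22.2, 21.0, 23.0, 24.3, 25.0),
  rain  = c(TRUE, TRUE, FALSE, FALSE, TRUE)
)

# Structure: dimensions plus the type of each column.
str(dda)

# Per-column summary of the whole data frame.
print(summary(dda))

# Summary of a single numeric column.
temps_summary <- summary(dda$temps)
print(temps_summary)
```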

Data Frames Indexing and Selection

This is how we’re going to grab data out of our data frame.

We can use dda[1,] to get the first row back.

We can use dda[,1] to get all the rows from the first column.

We can use dda[,'rain'] to get all the values for rain.

We can use dda[1:5,c('days','temps')] to get rows one through five, but only the values for days and temps.
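The bracket forms above follow the pattern dda[rows, columns]. Here is a sketch that runs them on a made-up dda; the data values and the result names (first_row, first_col, and so on) are assumptions for illustration.

```r
# Hypothetical example data (values are assumptions, not from the article).
dda <- data.frame(
  days  = c("Mon", "Tue", "Wed", "Thu", "Fri"),
  temps = c(22.2, 21.0, 23.0, 24.3, 25.0),
  rain  = c(TRUE, TRUE, FALSE, FALSE, TRUE)
)

first_row   <- dda[1, ]                        # row 1, every column
first_col   <- dda[, 1]                        # every row, column 1
rain_values <- dda[, 'rain']                   # every row, column "rain"
some_cols   <- dda[1:5, c('days', 'temps')]    # rows 1 to 5, two columns

print(first_row)
print(first_col)
print(rain_values)
print(some_cols)
```

Leaving the part before or after the comma empty means "everything" for that dimension.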

We can use dda$days to get all the values for days.

We can do the same with temps (dda$temps) to get all the values for temps.

We can use dda['days'] to get all the days too. The difference from dda$days is that dda['days'] returns a data frame with one column, while the dollar sign returns a vector.
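To see the difference between the two forms, here is a quick sketch on made-up data. The names days_vec and days_df are mine.

```r
# Hypothetical example data (values are assumptions, not from the article).
dda <- data.frame(
  days  = c("Mon", "Tue", "Wed", "Thu", "Fri"),
  temps = c(22.2, 21.0, 23.0, 24.3, 25.0),
  rain  = c(TRUE, TRUE, FALSE, FALSE, TRUE)
)

days_vec <- dda$days      # dollar sign: a plain vector
days_df  <- dda['days']   # single brackets with a name: a one-column data frame

print(is.data.frame(days_vec))
print(is.data.frame(days_df))
print(days_df)
```

Which one you want depends on what comes next: functions that work on vectors usually want the dollar-sign form, while the bracket form keeps the table shape.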

We can use the subset() function to grab a subset of rows from our data. In this case, we want to return only the rows where rain is TRUE.
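A minimal sketch of that call, again on made-up data, could look like this. The name rainy is an assumption, and the bracket version is shown only as an equivalent way to write the same filter.

```r
# Hypothetical example data (values are assumptions, not from the article).
dda <- data.frame(
  days  = c("Mon", "Tue", "Wed", "Thu", "Fri"),
  temps = c(22.2, 21.0, 23.0, 24.3, 25.0),
  rain  = c(TRUE, TRUE, FALSE, FALSE, TRUE)
)

# Keep only the rows where rain is TRUE.
rainy <- subset(dda, subset = rain == TRUE)
print(rainy)

# The same filter written with bracket indexing.
rainy_brackets <- dda[dda$rain, ]
print(rainy_brackets)
```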

I have also used a subs
