R Programming Basics: dplyr

T Z J Y
4 min read · Oct 17, 2021

As one of the core packages of the tidyverse in R, dplyr is a set of functions designed to make data frame manipulation intuitive and user-friendly. Data analysts typically use it to transform existing datasets into a format better suited to a particular type of analysis or data visualization. In this post, I want to explore some of the commonly used functions and writing styles of dplyr.

The dplyr commands do NOT change the original dataset; they only return modified copies for us to use. To keep a result, assign it to a variable.
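As a minimal sketch of this behaviour, the snippet below builds a small, hypothetical data frame (the names df, id, score and group are made up for illustration) and checks that it is unchanged after a dplyr call.

```r
library(dplyr)

# Hypothetical toy data frame, used only for illustration
df <- data.frame(
  id    = 1:3,
  score = c(90, 85, 70),
  group = c("a", "b", "a")
)

# select() returns a new data frame; df itself is left untouched
ids_only <- select(df, id)

ncol(df)        # 3: the original still has all of its columns
ncol(ids_only)  # 1: only the copy was narrowed down
```

If the modified version should replace the original, it has to be reassigned explicitly, for example df <- select(df, id).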

Manipulating Data

Select

This command allows us to select certain columns from the dataset. The syntax is straightforward.

select(df, Column1, Column2)

A few things to keep in mind when using select:

  1. No quotes required for column names
  2. No $ sign required in front of the column names
  3. Columns can also be selected by position, e.g. select(df, 2:5) selects columns 2 through 5 of the data frame df
  4. We can also exclude columns from a selection by prefixing them with a minus sign, using a syntax like select(df, 2:5, -(3:4)). This command would choose columns 2 through 5 without the…
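The points in this list can be tried out on a small example. The data frame below, with columns a through f, is hypothetical and exists only to make the column positions easy to follow. The results in the comments are what I would expect from current dplyr releases.

```r
library(dplyr)

# Hypothetical data frame with six columns, a to f (positions 1 to 6)
df <- data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8, e = 9:10, f = 11:12)

# Bare column names: no quotes and no df$ prefix
by_name <- select(df, b, c)

# Positions: columns 2 through 5
by_pos <- select(df, 2:5)

# Positions 2 through 5, then drop positions 3 and 4
excluded <- select(df, 2:5, -(3:4))

names(by_name)   # "b" "c"
names(by_pos)    # "b" "c" "d" "e"
names(excluded)  # "b" "e"
```

Note that the positions in the exclusion refer to the original data frame, not to the intermediate selection, so -(3:4) drops columns c and d here.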
