Data Handling with dplyr

With the ‘dplyr’ Package | R Programming

What is data handling?
The first and most important skill a data analyst or data scientist needs is knowing how to handle data. So what is data handling?

Data handling means gathering and recording information and presenting it in a way that is meaningful to others.

Let us take an example. Think back to the early 2000s: do you remember the phone directory, which listed people’s names and their phone numbers? The names were arranged in alphabetical order, that is, in a systematic manner, which is why it was possible to find the number of a particular person. This is an example of data handling, as the data is arranged in a way that is meaningful to others.

Two different approaches towards data handling

  1. Statistical approach
  2. Non-statistical approach

The statistical approach means arranging the data in a meaningful manner and extracting summary information from it. Suppose we have observations on the weights of 1,000 students in a random sequence. Just by looking at the raw data, we can’t say anything about the distribution of the students’ weights. To learn something about the data, we have to arrange it in order and compute summaries such as the mean and standard deviation. These are some of the points to keep in mind before starting the analysis of any data.
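As a rough sketch of the statistical approach in R: the weights below are simulated, since the article does not come with an actual data set, and the mean of 55 kg and standard deviation of 8 kg are illustrative assumptions only.

```r
# Simulated weights (kg) for 1,000 students; the values are illustrative only
set.seed(42)
weights <- rnorm(1000, mean = 55, sd = 8)

# Arrange the observations in ascending order
sorted_weights <- sort(weights)

# Summarise the distribution with its mean and standard deviation
mean_weight <- mean(weights)
sd_weight <- sd(weights)

# Minimum, quartiles, mean and maximum in one call
summary(weights)
```

Once the data is sorted and summarised like this, statements about the typical weight and its spread become possible, which the raw sequence alone did not allow.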

The non-statistical approach to data handling simply means arranging your data in a form that is meaningful to others. It can be as simple as arranging names in alphabetical order on a sheet of paper, so that when we want the information for a given person we can find it easily.
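The same idea can be sketched in base R. The names and phone numbers below are hypothetical and exist only for illustration.

```r
# Hypothetical directory entries, in no particular order
directory <- data.frame(
  name  = c("Meera", "Arjun", "Zoya", "Kabir"),
  phone = c("555-0103", "555-0101", "555-0104", "555-0102"),
  stringsAsFactors = FALSE
)

# Arrange the rows alphabetically by name
directory_sorted <- directory[order(directory$name), ]

# Look up the number of one particular person
kabir_phone <- directory_sorted$phone[directory_sorted$name == "Kabir"]
```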

dplyr package in R programming

One of the most important packages in R programming is the dplyr package, which is used for handling and manipulating data frames. The d in the name reinforces that the package is meant to work with data.frames in R. The dplyr package can be used to extract columns (i.e. variables) from a data frame, extract rows, add new variables, apply functions to variables, and split the data according to a variable.

In this article, we will learn these features of the dplyr package through examples. We will use the “mtcars” data set that ships with R.
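A minimal setup might look like the following, assuming dplyr still needs to be installed from CRAN on your machine (the install line is commented out so it only runs when you want it to).

```r
# install.packages("dplyr")  # run once if the package is not yet installed
library(dplyr)

# mtcars is part of the datasets package that comes with R
data(mtcars)

# Peek at the first six rows
head(mtcars)
```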

About the data

The data frame consists of 32 observations on 11 variables. The dataset comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles.
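These dimensions can be checked directly with a few base R calls:

```r
# Number of rows (observations) and columns (variables)
data_dim <- dim(mtcars)
data_dim

# Variable names, e.g. mpg (miles per gallon), cyl (cylinders), hp (horsepower)
names(mtcars)

# Type and first few values of every variable
str(mtcars)
```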

Further reading includes sub-topics (along with the respective R code) such as:

Selecting different columns (“select” function)

Selecting different rows (“filter” and “slice” functions)

Adding new columns to the data (“mutate” function)

Applying functions to different columns of the data frame (“summarise” function)

Grouping data using a factor variable and then applying a function to a column (“group_by” function)

Arranging the data frame according to a variable (“arrange” function)
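To give a flavour of each function listed above, here is a short sketch on mtcars. It is not the code from the full article; the chosen columns, the condition `cyl == 4`, and the new column name `kpl` are illustrative assumptions. The `%>%` pipe, which dplyr re-exports, passes the result of one step to the next.

```r
library(dplyr)

# select(): keep only some columns (variables)
cars_small <- select(mtcars, mpg, cyl, hp)

# filter(): keep rows that satisfy a condition
four_cyl <- filter(mtcars, cyl == 4)

# slice(): keep rows by position
first_five <- slice(mtcars, 1:5)

# mutate(): add a new column; kpl (km per litre) is a made-up name
with_kpl <- mutate(mtcars, kpl = mpg * 0.4251)

# summarise(): collapse columns to single values
overall <- summarise(mtcars, mean_mpg = mean(mpg), sd_mpg = sd(mpg))

# group_by() + summarise(): apply a function within each group
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n_cars = n())

# arrange(): order rows by a variable; desc() sorts in descending order
by_mpg <- arrange(mtcars, desc(mpg))
```

Each function takes a data frame as its first argument and returns a new data frame, which is what makes chaining them with the pipe convenient.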

Conclusion:

From the functions listed above, we can say that the “dplyr” package is easy to write code with and quick to run. “dplyr” makes data handling and manipulation much easier than relying on the “base” package in R alone.
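One way to see the readability point is to write the same task both ways. The task here (mean mpg per cylinder count, for cars above 100 hp) is a made-up example, and the comparison is about how the code reads, not a speed benchmark.

```r
library(dplyr)

# Base R: subset the rows, then aggregate by group
powerful <- mtcars[mtcars$hp > 100, ]
base_result <- aggregate(mpg ~ cyl, data = powerful, FUN = mean)

# dplyr: the same steps, read from top to bottom
dplyr_result <- mtcars %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(mpg = mean(mpg))
```

Both versions give the same group means; the dplyr version names each step with a verb, which many readers find easier to follow.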

Written by StepUp Analytics

StepUp Analytics is a community of creative, highly energetic data science and analytics professionals and data enthusiasts.
