Day 3 (Part 2): Data Frame

SaiGayatri Vadali
Dec 21, 2017


We have already seen what a data frame looks like in the previous article. Today we will learn more about data frames, as they are going to be extremely helpful in our data analysis. R also ships with built-in data frames, and we will use one of them, named ‘mtcars’. Let us not view the data frame right at the beginning. Instead, we will gather information about it using the methods described below, and at the end we will have a look at it. If you want to see it right away, just type ‘mtcars’ in the R console.

Dimensions:

From the previous article of today, you probably already have an idea of what to do first. Yes, let’s find the dimensions of ‘mtcars’ using the dim() function.

> dim(mtcars)
[1] 32 11

So there are 32 rows and 11 columns in this data frame.
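If you only need one of the two numbers, base R also provides nrow() and ncol(). Here is a small sketch using only base R and the built-in ‘mtcars’ (the variable names are just illustrative):

```r
# dim() returns both dimensions as a vector: c(rows, columns)
d <- dim(mtcars)

# nrow() and ncol() return each dimension on its own
n_rows <- nrow(mtcars)
n_cols <- ncol(mtcars)

print(d)       # [1] 32 11
print(n_rows)  # [1] 32
print(n_cols)  # [1] 11
```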

View():

Another option is the View() function, which displays the data frame in a clear tabular format.

Just type View(mtcars) in your console and all the data will appear as a table in a new window (in RStudio, it opens in a new tab).

Finding names of rows and columns:

As mentioned in the previous article, the rows and columns of a data frame can have names. Let’s check whether our data frame ‘mtcars’ has column names using the following function.

> names(mtcars)
[1] "mpg" "cyl" "disp" "hp" "drat" "wt"
[7] "qsec" "vs" "am" "gear" "carb"

Yes, it does. Now it’s time to find the row names.

> row.names(mtcars)
[1] "Mazda RX4" "Mazda RX4 Wag"
[3] "Datsun 710" "Hornet 4 Drive"
[5] "Hornet Sportabout" "Valiant"
[7] "Duster 360" "Merc 240D"
[9] "Merc 230" "Merc 280"
[11] "Merc 280C" "Merc 450SE"
[13] "Merc 450SL" "Merc 450SLC"
[15] "Cadillac Fleetwood" "Lincoln Continental"
[17] "Chrysler Imperial" "Fiat 128"
[19] "Honda Civic" "Toyota Corolla"
[21] "Toyota Corona" "Dodge Challenger"
[23] "AMC Javelin" "Camaro Z28"
[25] "Pontiac Firebird" "Fiat X1-9"
[27] "Porsche 914-2" "Lotus Europa"
[29] "Ford Pantera L" "Ferrari Dino"
[31] "Maserati Bora" "Volvo 142E"

So here are the 32 row names of our data frame.
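names() and row.names() have close relatives, colnames() and rownames(), which return the same information for a data frame. A quick check using only base R (variable names are illustrative):

```r
# For a data frame, colnames() gives the same result as names()
cols <- colnames(mtcars)
print(identical(cols, names(mtcars)))      # [1] TRUE

# rownames() gives the same result as row.names()
rows <- rownames(mtcars)
print(identical(rows, row.names(mtcars)))  # [1] TRUE

# %in% checks whether a given row name exists
print("Maserati Bora" %in% rows)  # [1] TRUE
print(length(rows))               # [1] 32
```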

Accessing rows and columns in a data frame:

What if we want to access the row named “Maserati Bora” and see its values? It can be done in the following way:

> mtcars['Maserati Bora',]
mpg cyl disp hp drat wt qsec vs
Maserati Bora 15 8 301 335 3.54 3.57 14.6 0
am gear carb
Maserati Bora 1 5 8

A row can also be accessed by its row index. Try accessing the 30th row and the 33rd row.
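Here is one way to try that exercise. Row 30 is “Ferrari Dino” in the list of row names above, so indexing by position and by name should agree. Since ‘mtcars’ has only 32 rows, asking for row 33 does not raise an error; R returns a row filled with NA values instead. A sketch (variable names are illustrative):

```r
# Access the 30th row by position
row30 <- mtcars[30, ]
print(row30)

# The same row accessed by name has the same values
by_name <- mtcars["Ferrari Dino", ]
print(all(row30 == by_name))  # [1] TRUE

# Row 33 is out of range: R returns a single row of NA values
row33 <- mtcars[33, ]
print(row33)
print(all(is.na(row33)))      # [1] TRUE
```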

Now, I want to access the column named ‘cyl’. By now, you can probably guess how to do it.

> mtcars[,'cyl']
[1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8
[23] 8 8 8 4 4 4 8 6 8 4

The same column can also be accessed as follows:

> mtcars[['cyl']]
[1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8
[23] 8 8 8 4 4 4 8 6 8 4
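One related detail is worth knowing: single brackets with just a column name, as in mtcars['cyl'], keep the result as a one-column data frame, while mtcars[['cyl']] returns a plain vector. A quick sketch in base R (variable names are illustrative):

```r
# Double brackets extract the column as a plain numeric vector
v <- mtcars[["cyl"]]
print(class(v))   # [1] "numeric"

# Single brackets without a comma keep a data frame with one column
df <- mtcars["cyl"]
print(class(df))  # [1] "data.frame"
print(dim(df))
```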

$ operator:

The $ operator comes in handy in many ways. Like the two forms above, it can be used to access the elements of a column by its name. Let’s see how to use it:

> mtcars$cyl
[1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8
[23] 8 8 8 4 4 4 8 6 8 4
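Because mtcars$cyl is an ordinary vector, you can index into it directly to pick out individual elements. A small sketch in base R:

```r
# The three ways of accessing the column return the same vector
print(identical(mtcars$cyl, mtcars[["cyl"]]))  # [1] TRUE
print(identical(mtcars$cyl, mtcars[, "cyl"]))  # [1] TRUE

# Pick out the first five elements of the column by position
print(mtcars$cyl[1:5])  # [1] 6 6 4 6 8

# Look up one car's value by matching its row name
print(mtcars$cyl[rownames(mtcars) == "Maserati Bora"])  # [1] 8
```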

head() and tail():

Now that we have a little idea about this ‘mtcars’ data, let’s dive deeper into it using the head() and tail() functions. By default, head() shows the first 6 rows of the data frame along with its header.

> head(mtcars)
mpg cyl disp hp drat wt
Mazda RX4 21.0 6 160 110 3.90 2.620
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875
Datsun 710 22.8 4 108 93 3.85 2.320
Hornet 4 Drive 21.4 6 258 110 3.08 3.215
Hornet Sportabout 18.7 8 360 175 3.15 3.440
Valiant 18.1 6 225 105 2.76 3.460
qsec vs am gear carb
Mazda RX4 16.46 0 1 4 4
Mazda RX4 Wag 17.02 0 1 4 4
Datsun 710 18.61 1 1 4 1
Hornet 4 Drive 19.44 1 0 3 1
Hornet Sportabout 17.02 0 0 3 2
Valiant 20.22 1 0 3 1
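Six rows is only the default. head() takes an n argument for the number of rows to return, and it also works on a single column. A sketch (variable names are illustrative):

```r
# Ask for the first 3 rows instead of the default 6
first3 <- head(mtcars, n = 3)
print(first3)

# head() also works on a vector, such as a single column
print(head(mtcars$mpg, n = 3))  # [1] 21.0 21.0 22.8
```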

Similarly, the tail() function gives the last 6 records.

> tail(mtcars)
mpg cyl disp hp drat wt
Porsche 914-2 26.0 4 120.3 91 4.43 2.140
Lotus Europa 30.4 4 95.1 113 3.77 1.513
Ford Pantera L 15.8 8 351.0 264 4.22 3.170
Ferrari Dino 19.7 6 145.0 175 3.62 2.770
Maserati Bora 15.0 8 301.0 335 3.54 3.570
Volvo 142E 21.4 4 121.0 109 4.11 2.780
qsec vs am gear carb
Porsche 914-2 16.7 0 1 5 2
Lotus Europa 16.9 1 1 5 2
Ford Pantera L 14.5 0 1 5 4
Ferrari Dino 15.5 0 1 5 6
Maserati Bora 14.6 0 1 5 8
Volvo 142E 18.6 1 1 4 2
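tail() accepts n as well. The sketch below keeps only the last 2 rows and checks that they are the last two cars in the row names listed earlier (the variable name is illustrative):

```r
# The last 2 rows of the data frame
last2 <- tail(mtcars, n = 2)
print(last2)

# Their row names are the last two cars in the data set
print(rownames(last2))
```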

summary():

What if we could get the maximum and minimum values of our columns right away, without calling functions like max() and min()?

Let’s use the summary() function to get such information. Just type summary(mtcars) in your R console. Simple, right?

> summary(mtcars)
mpg cyl disp
Min. :10.40 Min. :4.000 Min. : 71.1
1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8
Median :19.20 Median :6.000 Median :196.3
Mean :20.09 Mean :6.188 Mean :230.7
3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0
Max. :33.90 Max. :8.000 Max. :472.0
hp drat wt
Min. : 52.0 Min. :2.760 Min. :1.513
1st Qu.: 96.5 1st Qu.:3.080 1st Qu.:2.581
Median :123.0 Median :3.695 Median :3.325
Mean :146.7 Mean :3.597 Mean :3.217
3rd Qu.:180.0 3rd Qu.:3.920 3rd Qu.:3.610
Max. :335.0 Max. :4.930 Max. :5.424
qsec vs
Min. :14.50 Min. :0.0000
1st Qu.:16.89 1st Qu.:0.0000
Median :17.71 Median :0.0000
Mean :17.85 Mean :0.4375
3rd Qu.:18.90 3rd Qu.:1.0000
Max. :22.90 Max. :1.0000
am gear
Min. :0.0000 Min. :3.000
1st Qu.:0.0000 1st Qu.:3.000
Median :0.0000 Median :4.000
Mean :0.4062 Mean :3.688
3rd Qu.:1.0000 3rd Qu.:4.000
Max. :1.0000 Max. :5.000
carb
Min. :1.000
1st Qu.:2.000
Median :2.000
Mean :2.812
3rd Qu.:4.000
Max. :8.000

It gives the minimum value, the 1st quartile (the value below which 25% of the column’s values fall), the median, the mean, the 3rd quartile (the value below which 75% of the values fall) and the maximum value, thus helping us understand the range of values in each column. We will learn about these statistical terms in the next article. We can draw a few small yet interesting inferences from this summary, like:

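  1. All the columns are of type ‘numeric’.
  2. The columns lie in quite different ranges. The range of a column is given by its minimum and maximum values: mpg goes from 10.4 to 33.9 and disp from 71.1 to 472.0, while vs and am only take the values 0 and 1. Here is an illustration.
> range(mtcars$mpg)
[1] 10.4 33.9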
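Each number in the summary() output can also be computed on its own, which is handy when you need only one of them. Below is a sketch for the mpg column, plus sapply() (standard base R, though not covered in this article) to get the range of every column at once and to confirm that all columns are numeric. Variable names are illustrative.

```r
# Individual statistics for one column
mpg <- mtcars$mpg
print(min(mpg))             # [1] 10.4
print(quantile(mpg, 0.25))  # 1st quartile (25%): 15.425
print(median(mpg))          # [1] 19.2
print(mean(mpg))
print(max(mpg))             # [1] 33.9

# Minimum and maximum of every column: a 2 x 11 matrix
ranges <- sapply(mtcars, range)
print(ranges)

# TRUE if every column is numeric
print(all(sapply(mtcars, is.numeric)))  # [1] TRUE
```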

Slicing the data frame:

Slicing comes in handy when we want to view only a certain part of the data frame. Slicing can be done using column names, row indices and column indices. Let’s look at each of them now.

Using column names:

We combine the column names with the c() function and pass them inside the square brackets to slice the data frame. The resulting output is also a data frame.

> head(mtcars[c("mpg", "cyl")])
mpg cyl
Mazda RX4 21.0 6
Mazda RX4 Wag 21.0 6
Datsun 710 22.8 4
Hornet 4 Drive 21.4 6
Hornet Sportabout 18.7 8
Valiant 18.1 6
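Row names can be used in the same way. With a comma inside the brackets, the part before the comma selects rows and the part after it selects columns. A sketch picking two cars and two columns (the variable name is illustrative):

```r
# Rows by name (before the comma), columns by name (after the comma)
subset_df <- mtcars[c("Mazda RX4", "Valiant"), c("mpg", "cyl")]
print(subset_df)
```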

Using row indices:

We can slice a data frame to keep only the required rows by using row indices. Here, I am getting the rows from index 6 to index 9 (both included).

> mtcars[6:9,]
mpg cyl disp hp drat wt qsec vs
Valiant 18.1 6 225.0 105 2.76 3.46 20.22 1
Duster 360 14.3 8 360.0 245 3.21 3.57 15.84 0
Merc 240D 24.4 4 146.7 62 3.69 3.19 20.00 1
Merc 230 22.8 4 140.8 95 3.92 3.15 22.90 1
am gear carb
Valiant 0 3 1
Duster 360 0 3 4
Merc 240D 0 4 2
Merc 230 0 4 2
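Row indices do not have to be consecutive. A vector built with c() selects any rows you like, and the row names travel with them. A short sketch (variable names are illustrative):

```r
# Consecutive rows 6 to 9, as above
rows_6_9 <- mtcars[6:9, ]
print(rownames(rows_6_9))

# Non-consecutive rows: the 1st, 15th and 32nd cars
picked <- mtcars[c(1, 15, 32), ]
print(rownames(picked))
```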

Using column indices:

We can also slice a data frame based on column indices, as follows:

> head(mtcars[,6:9])
wt qsec vs am
Mazda RX4 2.620 16.46 0 1
Mazda RX4 Wag 2.875 17.02 0 1
Datsun 710 2.320 18.61 1 1
Hornet 4 Drive 3.215 19.44 1 0
Hornet Sportabout 3.440 17.02 0 0
Valiant 3.460 20.22 1 0
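Row and column indices can be combined in one call. One thing to watch: selecting a single column by index returns a plain vector unless you add drop = FALSE. A sketch in base R (variable names are illustrative):

```r
# Rows 6 to 9 and columns 6 to 9 together
block <- mtcars[6:9, 6:9]
print(block)

# A single column index drops the result to a plain vector...
col6 <- mtcars[, 6]
print(is.data.frame(col6))     # [1] FALSE

# ...while drop = FALSE keeps it as a one-column data frame
col6_df <- mtcars[, 6, drop = FALSE]
print(is.data.frame(col6_df))  # [1] TRUE
print(names(col6_df))          # [1] "wt"
```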

All these operations help us learn more about the data, especially when we have a huge data set with many rows and columns.

Now that we have learned quite a bit about the data, let’s end this article by having a glance at it.

> mtcars
mpg cyl disp hp drat
Mazda RX4 21.0 6 160.0 110 3.90
Mazda RX4 Wag 21.0 6 160.0 110 3.90
Datsun 710 22.8 4 108.0 93 3.85
Hornet 4 Drive 21.4 6 258.0 110 3.08
Hornet Sportabout 18.7 8 360.0 175 3.15
Valiant 18.1 6 225.0 105 2.76
Duster 360 14.3 8 360.0 245 3.21
Merc 240D 24.4 4 146.7 62 3.69
Merc 230 22.8 4 140.8 95 3.92
Merc 280 19.2 6 167.6 123 3.92
Merc 280C 17.8 6 167.6 123 3.92
Merc 450SE 16.4 8 275.8 180 3.07
Merc 450SL 17.3 8 275.8 180 3.07
Merc 450SLC 15.2 8 275.8 180 3.07
Cadillac Fleetwood 10.4 8 472.0 205 2.93
Lincoln Continental 10.4 8 460.0 215 3.00
Chrysler Imperial 14.7 8 440.0 230 3.23
Fiat 128 32.4 4 78.7 66 4.08
Honda Civic 30.4 4 75.7 52 4.93
Toyota Corolla 33.9 4 71.1 65 4.22
Toyota Corona 21.5 4 120.1 97 3.70
Dodge Challenger 15.5 8 318.0 150 2.76
AMC Javelin 15.2 8 304.0 150 3.15
Camaro Z28 13.3 8 350.0 245 3.73
Pontiac Firebird 19.2 8 400.0 175 3.08
Fiat X1-9 27.3 4 79.0 66 4.08
Porsche 914-2 26.0 4 120.3 91 4.43
Lotus Europa 30.4 4 95.1 113 3.77
Ford Pantera L 15.8 8 351.0 264 4.22
Ferrari Dino 19.7 6 145.0 175 3.62
Maserati Bora 15.0 8 301.0 335 3.54
Volvo 142E 21.4 4 121.0 109 4.11
wt qsec vs am gear carb
Mazda RX4 2.620 16.46 0 1 4 4
Mazda RX4 Wag 2.875 17.02 0 1 4 4
Datsun 710 2.320 18.61 1 1 4 1
Hornet 4 Drive 3.215 19.44 1 0 3 1
Hornet Sportabout 3.440 17.02 0 0 3 2
Valiant 3.460 20.22 1 0 3 1
Duster 360 3.570 15.84 0 0 3 4
Merc 240D 3.190 20.00 1 0 4 2
Merc 230 3.150 22.90 1 0 4 2
Merc 280 3.440 18.30 1 0 4 4
Merc 280C 3.440 18.90 1 0 4 4
Merc 450SE 4.070 17.40 0 0 3 3
Merc 450SL 3.730 17.60 0 0 3 3
Merc 450SLC 3.780 18.00 0 0 3 3
Cadillac Fleetwood 5.250 17.98 0 0 3 4
Lincoln Continental 5.424 17.82 0 0 3 4
Chrysler Imperial 5.345 17.42 0 0 3 4
Fiat 128 2.200 19.47 1 1 4 1
Honda Civic 1.615 18.52 1 1 4 2
Toyota Corolla 1.835 19.90 1 1 4 1
Toyota Corona 2.465 20.01 1 0 3 1
Dodge Challenger 3.520 16.87 0 0 3 2
AMC Javelin 3.435 17.30 0 0 3 2
Camaro Z28 3.840 15.41 0 0 3 4
Pontiac Firebird 3.845 17.05 0 0 3 2
Fiat X1-9 1.935 18.90 1 1 4 1
Porsche 914-2 2.140 16.70 0 1 5 2
Lotus Europa 1.513 16.90 1 1 5 2
Ford Pantera L 3.170 14.50 0 1 5 4
Ferrari Dino 2.770 15.50 0 1 5 6
Maserati Bora 3.570 14.60 0 1 5 8
Volvo 142E 2.780 18.60 1 1 4 2

Hope you got some new insights about data frames from this article. In the next article, we will look at arithmetic operations on R objects and a few statistical details. Let me know if you liked it by clapping and commenting in the response section!


An inquisitive Machine Learning Engineer, yoga trainer, fitness freak and a passionate writer!