The Five Number Summary

Descriptive Statistics Using Box Plots


Hello!

Now that we've discussed what box plots are, what they tell us, and how to construct them on paper, let me show you how R can replicate the graphs presented as visual aids, using the life expectancy data of 197 countries:

https://d396qusza40orc.cloudfront.net/introstats/Data/LifeExpTable.txt

CHANGING THE WORKING DIRECTORY

  1. Using the GUI: File -> Change Dir… -> *your directory*
  2. Using the R console: setwd("your/directory/path")
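For the console route, here is a minimal sketch: getwd() shows where R is currently looking for files, and setwd() changes it. In this sketch tempdir() merely stands in for *your directory* so the snippet runs anywhere. In practice you would pass the folder that holds LifeExpTable.txt.

```r
# Show the current working directory and remember it
old_dir <- getwd()
print(old_dir)

# Switch to another folder; tempdir() is only a stand-in for your own directory
setwd(tempdir())
print(getwd())

# list.files() shows which files read.table() can find by bare file name here
print(list.files())

# Return to the original directory when done
setwd(old_dir)
```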

READING DATA FROM GIVEN FILE

data = read.table("LifeExpTable.txt")

This command reads LifeExpTable.txt into a data frame and stores it in the variable data. From now on, typing data at the console prints the life expectancy records.
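You can see what read.table() hands back without having the file at hand. The sketch below uses the text = argument to parse a small string instead of a file. It assumes the file has whitespace-separated columns, no header row, and the life expectancy in the second column (the column the next step extracts). The three rows are invented for illustration and are not taken from LifeExpTable.txt.

```r
# Invented three-row sample in the assumed two-column layout
sample_txt <- "A 70.1
B 65.4
C 80.2"

# text = lets read.table() parse a string instead of a file
sample_data <- read.table(text = sample_txt)

print(class(sample_data))  # read.table() returns a data frame
print(dim(sample_data))    # number of rows and columns
print(sample_data)         # same as typing the variable name at the console
str(sample_data)           # column types; default names are V1, V2, ...
```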

ASSIGNING COLUMNS TO VARIABLES

lifeexp = data[,2]

In data[,2], the number after the comma selects the column (here, the second one), and leaving the space before the comma empty selects every row. That column is then stored in the variable lifeexp.
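The same bracket notation also selects rows, and you can pick a column by name. The small sketch below uses an invented two-column sample. It relies on the default column names V1 and V2, which read.table() assigns when there is no header row.

```r
# Invented sample in the assumed layout (not the real file)
sample_data <- read.table(text = "A 70.1
B 65.4
C 80.2")

col2 <- sample_data[, 2]    # empty row index = all rows; 2 = second column
row2 <- sample_data[2, ]    # 2 before the comma = second row, all columns
by_name <- sample_data$V2   # the same column, selected by its default name

print(col2)
print(row2)
print(identical(col2, by_name))
```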

CREATING A SIMPLE SCATTER PLOT DIAGRAM

plot(lifeexp)
A simple scatter plot diagram of the given Life Expectancy Data of 197 countries

CHANGING THE LABELS OF A SCATTER PLOT DIAGRAM

It's good practice to label the axes so that even a raw scatter plot makes sense to the reader.

plot(lifeexp, xlab="Country", ylab="Life Expectancy")
A simple scatter plot diagram of the given Life Expectancy Data of 197 countries with appropriate labels

In addition to labels, we can set the axis limits (ylim for the y-axis, xlim for the x-axis) to control how the plot is framed.

plot(lifeexp, xlab="Country", ylab="Life Expectancy", ylim=c(0,86))
A simple scatter plot diagram of the given Life Expectancy Data of 197 countries with appropriate labels and customized y-axis limits
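The upper limit of 86 sits just above the largest life expectancy (83.39 in the summary statistics reported later). You can also derive such a limit from the data instead of hard-coding it, as in the sketch below. y_limits() is a hypothetical helper. The vector holds four values copied from that summary output (minimum, first quartile, median, maximum), not the full dataset.

```r
# Hypothetical helper: y-axis limits from 0 to just above the largest value
y_limits <- function(x, pad = 2) {
  c(0, ceiling(max(x, na.rm = TRUE)) + pad)
}

# Four values taken from the summary() output, standing in for lifeexp
values <- c(47.79, 64.67, 73.24, 83.39)
lims <- y_limits(values)
print(lims)

# pdf(NULL) draws to a null device so the sketch runs without a plot window;
# drop it and the dev.off() call to see the plot in your own session
pdf(NULL)
plot(values, xlab = "Country", ylab = "Life Expectancy", ylim = lims)
invisible(dev.off())
```

With these values the helper returns c(0, 86), the same limit used above.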

SORTING SCATTER PLOTS

To sort a scatter plot, apply sort() to the variable holding the life expectancy values (here, lifeexp) before plotting it. The points then appear in ascending order.

plot(sort(lifeexp), xlab="Country", ylab="Life Expectancy", ylim=c(0,86))
A simple SORTED scatter plot diagram of the given Life Expectancy Data of 197 countries with appropriate labels and customized y-axis limits
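sort() returns a new vector in ascending order and leaves the original untouched. Passing decreasing = TRUE reverses the order. The short sketch below uses the nine grades introduced later in this article. Note that a sorted plot no longer shows which country sits at which position. order() returns the positions that produce the sorted order, which is one way to keep track of them.

```r
# The nine grades used later in this article
grades <- c(78, 68, 69, 88, 90, 74, 87, 76, 93)

ascending  <- sort(grades)                     # smallest to largest (default)
descending <- sort(grades, decreasing = TRUE)  # largest to smallest

# sort() returns a copy; grades itself keeps its original order
print(ascending)
print(descending)
print(grades)

# order() gives the positions that would sort the vector
print(order(grades))
print(identical(grades[order(grades)], ascending))
```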

CREATING BOXPLOTS

All the tedious calculations we did on paper to produce a boxplot can be done with this single R command:

boxplot(lifeexp, ylab="Life Expectancy", ylim=c(0,86))
A boxplot image of the life expectancy data of 197 countries
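Behind that one command, R computes the five numbers the box is drawn from. The sketch below uses the nine grades that appear later in this article. fivenum() returns Tukey's five-number summary. boxplot.stats() returns the values boxplot() actually draws, plus any points it would flag as outliers.

```r
# The nine grades used later in this article
grades <- c(78, 68, 69, 88, 90, 74, 87, 76, 93)

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
print(fivenum(grades))

# What boxplot() draws: lower whisker end, lower hinge, median,
# upper hinge, upper whisker end
stats <- boxplot.stats(grades)
print(stats$stats)

# Points more than 1.5 box lengths beyond the hinges (outliers)
print(stats$out)
```

For these grades both calls give 68, 74, 78, 88 and 93, and no outliers are flagged.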

PRODUCING SUMMARY STATISTICS

summary(lifeexp)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  47.79   64.67   73.24   69.86   76.65   83.39
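summary() is not a separate algorithm. It reports the minimum, the quartiles from quantile() (R's default, type 7), the median, the mean and the maximum. The sketch below rebuilds the six numbers by hand for the nine grades from the next section.

```r
grades <- c(78, 68, 69, 88, 90, 74, 87, 76, 93)

s <- summary(grades)
print(s)

# Rebuild the same six numbers in summary()'s order
manual <- c(min(grades),
            quantile(grades, 0.25, names = FALSE),  # default quantile type 7
            median(grades),
            mean(grades),
            quantile(grades, 0.75, names = FALSE),
            max(grades))
print(manual)
print(all.equal(as.numeric(s), manual))
```

For these grades the six values are 68, 74, 78, 80.33 (rounded), 88 and 93.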

THE COMBINE — c() FUNCTION

The combine function, c(), lets us manually assign a set of values to a variable as a vector. We can then sort, summarize, and plot that vector just like the life expectancy data.

grades = c(78, 68, 69, 88, 90, 74, 87, 76, 93)
sort(grades)
summary(grades)
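Calling sort(grades) only prints a sorted copy. grades itself keeps its original order until you assign the result back. c() can also join existing vectors together, as in the sketch below. The two extra grades are hypothetical values added purely to show the combining.

```r
grades <- c(78, 68, 69, 88, 90, 74, 87, 76, 93)

# c() joins vectors end to end; 81 and 95 are hypothetical extra grades
more_grades <- c(grades, c(81, 95))
print(length(grades))
print(length(more_grades))

# Assigning the sorted copy back is what makes the sort permanent
grades <- sort(grades)
print(grades)
```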

boxplot(grades, ylab="Grades", ylim=c(60,100))
A boxplot of the grades of 9 students

CONNECTING THE POINTS OF A SORTED SCATTER PLOT WITH LINES

Scatter plots are easier to read when the points are sorted and consecutive points are connected by lines. Setting type="b" in plot() draws both the points and the connecting lines.

six_grades = c(68, 84, 90, 74, 78, 93)
sort(six_grades)
summary(six_grades)
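With six values, summary() and the hand method can disagree on the quartiles. The sketch below compares two R functions. summary() takes its quartiles from quantile() type 7, which interpolates. fivenum() uses hinges, the medians of the lower and upper halves, a rule often used on paper. Whether this matches your own paper calculations depends on which quartile rule you used.

```r
six_grades <- c(68, 84, 90, 74, 78, 93)

# Quartiles from summary() use quantile() type 7 (interpolation)
print(summary(six_grades))

# Hinges from fivenum() are the medians of the lower and upper halves
print(fivenum(six_grades))
```

Here fivenum() gives 68, 74, 81, 90 and 93, while summary() reports quartiles of 75 and 88.5.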

plot(sort(six_grades), type="b", xlab="Students", ylab="Grade", ylim=c(60,100))
A sorted scatter plot of six grades connected by a line
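The type argument controls how plot() draws the values: "p" gives points only, "l" gives lines only and "b" gives both. The sketch below draws the three side by side for the sorted six grades. pdf(NULL) sends the drawing to a null device so it runs without a plot window. Remove that line and the dev.off() call to see the panels.

```r
six_grades <- c(68, 84, 90, 74, 78, 93)
sorted <- sort(six_grades)

pdf(NULL)                   # null device: nothing is displayed or saved
op <- par(mfrow = c(1, 3))  # three panels side by side
plot(sorted, type = "p", main = 'type = "p"', ylim = c(60, 100))  # points only
plot(sorted, type = "l", main = 'type = "l"', ylim = c(60, 100))  # lines only
plot(sorted, type = "b", main = 'type = "b"', ylim = c(60, 100))  # both
par(op)                     # restore the previous panel layout
invisible(dev.off())
```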

SAVING GRAPHS AS IMAGES/BITMAPS/PDFS

png(filename="your/file/location/name.png")
plot(data)
dev.off()

Notice that after executing the plot command, no image appears. This is because png() opens a PNG graphics device, which sends all graphics output to the file instead of the screen. NB: the device stays open until it is explicitly closed. To see plots in the RStudio plot pane again, and to make sure the PNG file is completely written, close the device with dev.off().
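The same open, plot, close pattern works for the other formats named in the heading. jpeg(), bmp() and tiff() write bitmaps, and pdf() writes a PDF. The sketch below saves the grades boxplot as a PDF. The output path under tempdir() is a placeholder for your own location.

```r
# Placeholder output path; replace with your own file location
out_file <- file.path(tempdir(), "grades_boxplot.pdf")

grades <- c(78, 68, 69, 88, 90, 74, 87, 76, 93)

pdf(file = out_file, width = 6, height = 6)  # open a PDF device (size in inches)
boxplot(grades, ylab = "Grades", ylim = c(60, 100))
invisible(dev.off())                          # close the device to finish the file

print(file.exists(out_file))
```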


Now try a few of these examples yourself and see how easy it is to create beautiful graphs in R. ☺

Be sure to recommend this article if you found it helpful. Thanks!