R for Newbies: Explore the Iris dataset with R

data_datum
3 min read · Aug 30, 2018

In this short tutorial, I will show the main functions you can run to get a first glimpse of your dataset, in this case the iris dataset.

The dataset

The iris dataset ships with R, so you can load it directly:

data(iris)

To take a glimpse at the first 4 rows:

head(iris, 4)

Optionally, you may want to view the last rows of your dataset (six by default):

tail(iris)

The dimensions of the dataframe

dim(iris)

The names of the columns

names(iris)

The attributes of the dataframe

attributes(iris)
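
The functions above each answer one question. As a complement, str() prints the dimensions, column names, and column types in one compact view. The snippet below is a small sketch using only base R; the variable names n_rows, n_cols, and col_types are my own, chosen for illustration.

```r
# Load the built-in dataset
data(iris)

# Compact overview: number of observations, variables, and each column's type
str(iris)

# Rows and columns separately (the same two numbers that dim() returns)
n_rows <- nrow(iris)
n_cols <- ncol(iris)

# Class of every column: four numeric measurements and one factor
col_types <- sapply(iris, class)
print(col_types)
```

This is handy before plotting, because it tells you which columns are numeric and which are categorical.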

To get a summary of descriptive statistics for each column:

summary(iris)
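
summary() describes the whole data frame at once. If you also want the numbers per species, one possible approach (my addition, not part of the walkthrough above) is table() for the counts and aggregate() for the group means. The names species_counts and species_means are just illustrative.

```r
data(iris)

# How many flowers of each species are in the dataset
species_counts <- table(iris$Species)
print(species_counts)

# Mean of every numeric column, computed separately for each species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)
```

You can swap mean for another function such as median or sd to get a different per-species statistic.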

Indexing the first 5 rows

iris[1:5,]

Indexing the first 4 columns

iris[,1:4]

To explore the first 10 rows of a particular column, in this case Sepal.Length:

iris[1:10, "Sepal.Length"]
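
Beyond row and column positions, data frames can also be indexed by name and by condition. The following is a minimal sketch of those variants; the threshold of 5 and the variable names are arbitrary choices of mine.

```r
data(iris)

# Three equivalent ways to pull the first 10 values of one column
by_bracket <- iris[1:10, "Sepal.Length"]
by_dollar  <- iris$Sepal.Length[1:10]
by_double  <- iris[["Sepal.Length"]][1:10]

# Logical indexing: keep only setosa flowers with sepal length above 5
long_setosa <- iris[iris$Species == "setosa" & iris$Sepal.Length > 5, ]
head(long_setosa)

# Select rows and named columns at the same time
iris[1:5, c("Sepal.Length", "Species")]
```

Note the comma inside the brackets: everything before it selects rows, everything after it selects columns.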

Basic Visualizations with Base R

The plot() function is the generic function for plotting R objects. Called on a data frame, it draws a scatterplot matrix of every pair of columns.

plot(iris)
An exploratory scatterplot matrix of the iris dataset
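
The scatterplot matrix is easier to read when each species gets its own color. Here is one way to do it with base R, as a sketch: the color names and the variables palette_cols and point_cols are my own choices, not something the dataset prescribes.

```r
data(iris)

# One color per species (the color names are arbitrary)
palette_cols <- c(setosa = "red", versicolor = "darkgreen", virginica = "blue")
point_cols <- palette_cols[as.character(iris$Species)]

# Scatterplot matrix of the four numeric columns, colored by species
pairs(iris[, 1:4], col = point_cols, pch = 19,
      main = "Iris measurements by species")

# A single pair of variables, with a legend
plot(iris$Petal.Length, iris$Petal.Width, col = point_cols, pch = 19,
     xlab = "Petal Length", ylab = "Petal Width")
legend("topleft", legend = names(palette_cols), col = palette_cols, pch = 19)
```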

A histogram is a plot that breaks the data into bins (or breaks) and shows how frequently values fall into each bin. You can also change the number of breaks and see how that affects the readability of the visualization (1).

Histogram with hist() function

sepal_length <- iris$Sepal.Length
hist(sepal_length)
Histogram of Sepal Length in the iris dataset

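If we pass more arguments to the hist() function, we can override some of its defaults, such as the title, the axis label and limits, and the color. With freq=FALSE, the y-axis shows densities instead of counts.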

hist(sepal_length, main="Histogram of Sepal Length", xlab="Sepal Length", xlim=c(4,8), col="blue", freq=FALSE)
Histogram of Sepal Length, with hist() function
sepal_width <- iris$Sepal.Width
hist(sepal_width, main="Histogram of Sepal Width", xlab="Sepal Width", xlim=c(2,5), col="darkorchid", freq=FALSE)
Histogram with 20 breaks
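
The hist() calls shown so far do not set the number of breaks themselves. As a sketch of how that could be done, the snippet below passes breaks = 20 and compares the result with the default binning. The value 20 and the names h_default and h_fine are assumptions of mine for illustration. Keep in mind that R treats a single number given to breaks as a suggestion and rounds it to "pretty" bin edges.

```r
data(iris)
sepal_width <- iris$Sepal.Width

# Ask for roughly 20 bins; R may pick a slightly different number
hist(sepal_width, breaks = 20, main = "Histogram of Sepal Width, breaks = 20",
     xlab = "Sepal Width", col = "darkorchid")

# plot = FALSE returns the binning without drawing anything
h_default <- hist(sepal_width, plot = FALSE)
h_fine    <- hist(sepal_width, breaks = 20, plot = FALSE)

# Compare how many bins each call produced
length(h_default$counts)
length(h_fine$counts)
```

More bins show finer detail but also more noise, so it is worth trying a few values before settling on one.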

The following image shows how to change the default parameters of the hist() function (2).

A box plot shows a five-number summary of the data: the minimum, the 25th percentile, the median, the 75th percentile, and the maximum. It is thus useful for visualizing the spread of the data and drawing inferences accordingly (1).

Boxplots with boxplot() function

The boxplot() function takes in any number of numeric vectors, drawing a boxplot for each vector. You can also pass in a list (or data frame) with numeric vectors as its components (3).

irisVer <- subset(iris, Species == "versicolor")
irisSet <- subset(iris, Species == "setosa")
irisVir <- subset(iris, Species == "virginica")
par(mfrow=c(1,3),mar=c(6,3,2,1))
boxplot(irisVer[,1:4], main="Versicolor, Rainbow Palette",ylim = c(0,8),las=2, col=rainbow(4))
boxplot(irisSet[,1:4], main="Setosa, Heat color Palette",ylim = c(0,8),las=2, col=heat.colors(4))
boxplot(irisVir[,1:4], main="Virginica, Topo colors Palette",ylim = c(0,8),las=2, col=topo.colors(4))
Boxplots drawn with three different color palettes. The R code for the graph is available in (4).
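
The five numbers behind a box plot can also be computed directly. This is a small sketch with base R functions: fivenum() returns Tukey's five-number summary, and boxplot.stats() returns the values that boxplot() actually draws. By default R extends the whiskers only to the most extreme point within 1.5 times the interquartile range and draws anything beyond as a separate point, so the whisker ends can differ from the true minimum and maximum. The variable names and colors are my own.

```r
data(iris)
x <- iris$Sepal.Length

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
print(five)

# The statistics boxplot() draws, plus any points flagged as outliers
box_stats <- boxplot.stats(x)
print(box_stats$stats)
print(box_stats$out)

# Formula interface: one box per species in a single call
boxplot(Sepal.Length ~ Species, data = iris,
        main = "Sepal Length by Species", ylab = "Sepal Length",
        col = c("red", "darkgreen", "blue"))
```

The formula interface is an alternative to creating one subset per species when you want to compare a single variable across groups.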

References

  1. Comprehensive guide to Data Visualization in R. http://bit.ly/2wnVjqY
  2. R Base Graphics. An idiot’s guide. http://bit.ly/2wqoV6L
  3. R Box Plot. http://bit.ly/2wnsRoY
  4. Exploratory Data Analysis Iris Dataset. http://bit.ly/2wpZwu2
