R for Newbies: Explore the Iris dataset with R
In this short tutorial, I will show the main functions you can run to get a first look at a dataset, in this case the iris dataset.
The dataset
This dataset ships with R, so you can load it directly:
data(iris)
To take a glimpse at the first 4 rows:
head(iris, 4)
Optionally, you may want to view the last rows of your dataset (tail() shows 6 rows by default)
tail(iris)
The dimensions of the dataframe
dim(iris)
The names of the columns
names(iris)
The attributes of the dataframe
attributes(iris)
If you want a summary of descriptive statistics for each column
summary(iris)
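As a complement to summary(), here is a minimal sketch of a few other base R calls that are often handy at this stage. The variable names species_counts and species_means are just illustrative, not part of the dataset.

```r
# Compact overview: one line per column with its type and first values
str(iris)

# Number of observations per species (Species is the only factor column)
species_counts <- table(iris$Species)
print(species_counts)

# Mean of every numeric column, computed separately for each species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)
```

summary() describes the dataset as a whole, while the aggregate() call breaks the same kind of statistic down by group, which is usually the next question with a labelled dataset like this one.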
Indexing the first 5 rows
iris[1:5,]
Indexing the first 4 columns
iris[,1:4]
If you want to explore the first 10 rows of a particular column, in this case, Sepal length
iris[1:10, "Sepal.Length"]
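The same values can be reached in more than one way. The sketch below compares a few equivalent forms and adds a logical filter; the variable names and the threshold of 7 are arbitrary choices for illustration.

```r
# Three equivalent ways to get the first 10 values of Sepal.Length
by_name   <- iris[1:10, "Sepal.Length"]
by_dollar <- iris$Sepal.Length[1:10]
by_pos    <- iris[1:10, 1]

identical(by_name, by_dollar)
identical(by_name, by_pos)

# Logical indexing: keep only the rows that satisfy a condition
long_sepals <- iris[iris$Sepal.Length > 7, c("Sepal.Length", "Species")]
print(long_sepals)

# drop = FALSE keeps a one-column result as a data frame instead of a vector
one_col_df <- iris[1:10, "Sepal.Length", drop = FALSE]
class(one_col_df)
```

Selecting a single column with [ , ] returns a plain vector by default, which is why drop = FALSE is needed when later code expects a data frame.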
Basic Visualizations with Base R
The plot() function is the generic function for plotting R objects. Called on a data frame, it draws a scatterplot matrix of all column pairs.
plot(iris)
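Because plot() is generic, what it draws depends on the class of its argument. As a rough illustration, the sketch below calls it on two numeric vectors and then on a factor; the choice of columns, colors, and legend position is arbitrary.

```r
# Two numeric vectors -> a single scatterplot, one color per species
plot(iris$Sepal.Length, iris$Petal.Length,
     xlab = "Sepal Length", ylab = "Petal Length",
     col = as.integer(iris$Species), pch = 19)

# Add a legend that maps each color back to its species
legend("topleft", legend = levels(iris$Species),
       col = seq_along(levels(iris$Species)), pch = 19)

# A factor -> bar chart with the number of observations per level
plot(iris$Species)
```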
A histogram breaks the data into bins (or breaks) and shows the frequency distribution of those bins. You can also change the breaks and see how that affects the readability of the visualization (1).
Histogram with hist() function
sepal_length <- iris$Sepal.Length
hist(sepal_length)
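To see the effect of the breaks mentioned above, here is a small sketch that draws the same variable with three bin settings. The specific values (4 bins, a bin width of 0.2) are arbitrary picks for comparison. Note that a single number passed to breaks is only a suggestion that R rounds to convenient values, whereas a vector of break points is used exactly.

```r
sepal_length <- iris$Sepal.Length

# Draw three histograms side by side with different bin settings
old_par <- par(mfrow = c(1, 3))
h_default <- hist(sepal_length, main = "Default breaks")
h_coarse  <- hist(sepal_length, breaks = 4, main = "breaks = 4")
h_fine    <- hist(sepal_length, breaks = seq(4, 8, by = 0.2),
                  main = "Bin width 0.2")
par(old_par)  # restore the previous plot layout

# hist() invisibly returns the bin edges and counts it used
h_fine$breaks
h_fine$counts
```

Too few bins hide the shape of the distribution and too many make it noisy, so it is worth trying a couple of settings before settling on one.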
If we add more information in the hist() function, we can change some default parameters.
hist(sepal_length, main="Histogram of Sepal Length", xlab="Sepal Length", xlim=c(4,8), col="blue", freq=FALSE)
sepal_width <- iris$Sepal.Width
hist(sepal_width, main="Histogram of Sepal Width", xlab="Sepal Width", xlim=c(2,5), col="darkorchid", freq=FALSE)
The two calls above show how to change the default parameters of the hist() function (2).
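A box plot shows five summary statistics: the minimum, the 25th percentile, the median, the 75th percentile, and the maximum. It is thus useful for visualizing the spread of the data and drawing inferences accordingly (1). Note that by default boxplot() in R extends the whiskers only to the most extreme point within 1.5 times the interquartile range of the box, and draws anything beyond that as individual points.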
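The five numbers behind a box plot can also be computed directly. The following is a minimal sketch using Sepal.Width as an example column; bp is just an illustrative variable name.

```r
sepal_width <- iris$Sepal.Width

# Minimum, lower hinge, median, upper hinge, maximum
fivenum(sepal_width)

# The same idea expressed as quantiles (0%, 25%, 50%, 75%, 100%)
quantile(sepal_width)

# The statistics that boxplot() uses when drawing
bp <- boxplot.stats(sepal_width)
bp$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bp$out    # points beyond the whiskers, drawn individually
```

The hinges from fivenum() and the quartiles from quantile() can differ slightly on some data because they use different interpolation rules.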
Boxplots with boxplot() function
The boxplot() function takes any number of numeric vectors and draws a boxplot for each one. You can also pass in a list (or data frame) with numeric vectors as its components (3).
irisVer <- subset(iris, Species == "versicolor")
irisSet <- subset(iris, Species == "setosa")
irisVir <- subset(iris, Species == "virginica")
par(mfrow=c(1,3),mar=c(6,3,2,1))
boxplot(irisVer[,1:4], main="Versicolor, Rainbow Palette",ylim = c(0,8),las=2, col=rainbow(4))
boxplot(irisSet[,1:4], main="Setosa, Heat color Palette",ylim = c(0,8),las=2, col=heat.colors(4))
boxplot(irisVir[,1:4], main="Virginica, Topo colors Palette",ylim = c(0,8),las=2, col=topo.colors(4))
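As an alternative to plotting all four measurements for one species, you may want one box per species for a single measurement. Here is a sketch of two ways to do that with base R: the formula interface and a list of numeric vectors. The choice of Sepal.Length and the name by_species are only for illustration.

```r
# Go back to a single plot per page in case a multi-panel layout is active
par(mfrow = c(1, 1))

# Formula interface: one box per species for a single measurement
boxplot(Sepal.Length ~ Species, data = iris,
        main = "Sepal Length by Species",
        ylab = "Sepal Length", col = rainbow(3))

# The same plot from a named list of numeric vectors
by_species <- split(iris$Sepal.Length, iris$Species)
boxplot(by_species, main = "Sepal Length by Species (list input)",
        ylab = "Sepal Length")
```

The formula version avoids creating a separate subset for each species, which keeps the code shorter when the grouping column has many levels.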
References
1. Comprehensive guide to Data Visualization in R. http://bit.ly/2wnVjqY
2. R Base Graphics. An idiot’s guide. http://bit.ly/2wqoV6L
3. R Box Plot. http://bit.ly/2wnsRoY
4. Exploratory Data Analysis Iris Dataset. http://bit.ly/2wpZwu2