Analyzing and Visualizing Ecological Data in R

Rajesh Sigdel
Published in The Startup, June 26, 2020
Amolops formosus, a herpetofauna species found in the Annapurna Conservation Area, Nepal (Photo Credit: Bivek Gautam)

Introduction

As a geographer, I am interested in analyzing spatial data to find patterns in nature. I primarily use ArcGIS, a Geographic Information System (GIS) platform, to analyze data and create maps. Recently, my friend and his team reached out to me to help them create a map of the study site where they conducted their herpetofaunal research. They conducted their research in the Annapurna Conservation Area of Nepal, an area I am familiar with.

Their research was published in a peer-reviewed journal, and my cartographic contribution was recognized in the article's acknowledgments.

In the article, my friend and his colleagues used a Gantt chart to show the elevational range of each species in their study site. Gantt charts were traditionally used for project scheduling, but today they are used in many other applications, including visualizing the ecological ranges of organisms.

The use of this graphic intrigued me, and I was curious about how the chart was made. My friend told me that he and his team had created it in Microsoft Excel and had spent several hours manually wrangling the data into the right format for the visualization. My opinion on this is as follows:

Excel is a great tool for creating basic visualizations. However, there is no need to spend hours manually wrangling data into the right format in Excel. R lets you do the same work in minutes, and the work is easily reproducible.

At that time, my friend only knew that I use ArcMap regularly to create maps. He did not know that I am passionate about R and a big fan of the tidyverse packages for data wrangling and manipulation.

Over the years, many ecologists have reached out to me to help them learn how to use R for ecological data manipulation and visualization. This blog post is dedicated to them. My goal is to help ecologists learn to use R to wrangle data, create graphs, and simplify complex visualizations like the Gantt chart using real datasets collected from fieldwork.

In this post, we will simplify the process of creating a Gantt chart in R using a real-world dataset. The following sections guide you through the process step by step.

Importing the data

The first step is to import the dataset into the R environment.

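library(readxl)  # reads .xls/.xlsx files; also installed as part of the tidyverse

setwd("D:/Rfiles/Folder/ElevationGraph")  # Set the working directory (adjust to your own folder)
dataset <- read_excel("ExcelSheet.xlsx")  # Read the first sheet of the workbook into a tibble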
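read_excel() reads the first sheet of a workbook by default. If the field data live on another sheet, a minimal sketch like the one below lists the sheet names with excel_sheets() and then picks one with the sheet argument. It uses the example workbook that ships with readxl (readxl_example("datasets.xlsx")) as a stand-in for ExcelSheet.xlsx, so the sheet name "iris" belongs to that example file, not to the herpetofauna data.

```r
library(readxl)

# Example workbook bundled with readxl; a stand-in for "ExcelSheet.xlsx"
path <- readxl_example("datasets.xlsx")

# List the sheets first, then read the one you need by name (or by position)
sheets <- excel_sheets(path)
print(sheets)
iris_tbl <- read_excel(path, sheet = "iris")

dim(iris_tbl)  # rows and columns, just like dim(dataset)
```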

We will use the tidyverse in this example to wrangle data and create graphs. Note that the tidyverse is a collection of packages, including dplyr, tidyr, and ggplot2. We can install and load it using the following code:

install.packages("tidyverse")  # only needed once per computer
library(tidyverse)             # run in every new R session

Once the data are imported, type dim(dataset) in your script to find the dimensions of the data frame. You will see [1] 140 26 printed in the console: 140 is the number of rows and 26 is the number of columns.

Peeking at the dataset

Type names(dataset) to print all the column names, and View(dataset) to see the entire dataset. This shows you the format in which the data are stored.

Only a subset of the data is shown here.

My friend and his colleagues stored their data as a sparse matrix: each row is an elevation, each column is a species, and most cells are zero, because a species that was not found at a given elevation is recorded as zero. Sparse matrices are a common way to store field data, but to plot the data with ggplot2 we first need to convert them into tidy format. Hadley Wickham describes tidy data as follows:

  • Each variable forms a column
  • Each observation forms a row, and
  • Each type of observational unit forms a table
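Before reshaping, it can help to check how sparse the table actually is. One option is to compute the share of zero cells among the species columns. The sketch below uses a tiny made-up wide table (hypothetical species names and counts, not the study data) in the same layout: an Ele column plus one column per species. Applying it to the real file assumes every column other than Ele holds species counts, which is worth confirming with names(dataset) first.

```r
library(tidyverse)

# Made-up wide table (NOT the study data): one row per elevation (Ele),
# one column per species, 0 = species not recorded at that elevation
toy <- tibble(
  Ele = c(1060, 1500, 2000, 2500),
  `Species A` = c(12, 3, 0, 0),
  `Species B` = c(0, 0, 7, 0),
  `Species C` = c(0, 5, 0, 4)
)

# Share of species cells that are zero; na.rm skips any empty cells
counts <- as.matrix(select(toy, -Ele))
prop_zero <- mean(counts == 0, na.rm = TRUE)
prop_zero  # 7 of the 12 cells are zero
```

A value close to 1 means that most of the table records absences, which is exactly the information the reshaping step below throws away.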

How to get data into tidy format

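We can use the gather() function from the tidyr package (part of the tidyverse) to make the data tidy. In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(), but it still works. ggplot2 works best with tidy data.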

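# Reshape from wide (one column per species) to long (one row per elevation-species pair)
a <- dataset %>%
  gather(key = Species, value = value,
         `Duttaphrynus melanostictus`:`Oligodon erythrogaster`) %>%
  # Presence/absence: 1 if the species was counted at that elevation, otherwise 0
  mutate(SpeciesNo = ifelse(value > 0, 1, 0))

# Keep only presences; filter() also drops rows where value is NA
b <- a %>%
  filter(SpeciesNo == 1)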

We first gathered all the species columns into a single column named Species, with the corresponding counts in a column named value. We then converted each count into 1 or 0 (presence or absence of the species at that elevation) using the ifelse function, and kept only the rows where the species is present, which drops the zero counts. Don’t forget to run View(a) or View(b) to see the results of each step.
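For readers on a recent tidyr, the same reshaping can be written with pivot_longer(). This is a minimal sketch on a made-up wide table (hypothetical species and counts, not the study data); with the real file, cols = `Duttaphrynus melanostictus`:`Oligodon erythrogaster` would mirror the gather() call above, and filter(value > 0) stands in for the presence/absence step.

```r
library(tidyverse)

# Made-up wide table standing in for `dataset` (NOT the study data)
toy <- tibble(
  Ele = c(1060, 1500, 2000, 2500),
  `Species A` = c(12, 3, 0, 0),
  `Species B` = c(0, 0, 7, 0),
  `Species C` = c(0, 5, 0, 4)
)

long <- toy %>%
  pivot_longer(
    cols = -Ele,           # every column except Ele holds a species
    names_to = "Species",  # former column names go into a Species column
    values_to = "value"    # cell counts go into a value column
  ) %>%
  filter(value > 0)        # keep presences only (drops zeros and NAs)

long
```

pivot_longer() keeps the unpivoted Ele column first, so the result has the same Ele, Species, and value columns as the gathered data.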

Let’s see what the tidy dataset looks like. Below is a subset of it, showing the Ele, Species, and value columns.

╔══════╦════════════════════════════╦═══════╗
║ Ele  ║ Species                    ║ value ║
╠══════╬════════════════════════════╬═══════╣
║ 1060 ║ Euphlyctis cyanophlyctis   ║ 70    ║
║ 1811 ║ Fejervarya nepalensis      ║ 50    ║
║ 1096 ║ Euphlyctis cyanophlyctis   ║ 50    ║
║ 1981 ║ Duttaphrynus himalayanus   ║ 30    ║
║ 1142 ║ Duttaphrynus melanostictus ║ 25    ║
║ 1131 ║ Fejervarya nepalensis      ║ 25    ║
║ 1064 ║ Zakerana syhadrensis       ║ 15    ║
║ 1064 ║ Euphlyctis cyanophlyctis   ║ 15    ║
║ 2264 ║ Laudakia tuberculata       ║ 13    ║
║ 1080 ║ Fejervarya nepalensis      ║ 11    ║
╚══════╩════════════════════════════╩═══════╝
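Each bar in a Gantt chart of ranges runs from a species' lowest to its highest recorded elevation. A minimal sketch that makes those endpoints explicit with group_by() and summarise() is shown below; the tidy rows are made up (hypothetical species, not the study data) but use the same Ele, Species, and value columns.

```r
library(tidyverse)

# Made-up tidy rows (NOT the study data): one row per elevation where a species was found
b_toy <- tibble(
  Ele = c(1060, 1500, 1500, 2000, 2500),
  Species = c("Species A", "Species A", "Species C", "Species B", "Species C"),
  value = c(12, 3, 5, 7, 4)
)

# Lowest and highest elevation at which each species was recorded
ranges <- b_toy %>%
  group_by(Species) %>%
  summarise(min_ele = min(Ele), max_ele = max(Ele), .groups = "drop")

ranges
```

Note that Species B was recorded at a single elevation, so its minimum equals its maximum. geom_line() needs at least two points in a group to draw a line, so a species like this may appear in a line-based chart with no visible bar.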

Elevational Graph

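Now, we can finally plot the graph. Yay!!! 😂 We will use the ggplot2 data visualization package to do this. ggplot2 is based on the grammar of graphics, which breaks a graph into semantic components such as the data, aesthetic mappings, and geometric objects. Mapping elevation to the x-axis and species to the y-axis, and grouping by species, makes geom_line() connect all the elevations where each species was found, so each species appears as a horizontal bar from its lowest to its highest recorded elevation. The end result is a beautiful visualization that is highly useful in ecological research. Our final Gantt chart shows the elevational range of each species at the study site in the Annapurna Conservation Area of Nepal.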

ggplot(b, aes(x = Ele, y = Species, group = Species)) +
  geom_line(linewidth = 1.5) +  # one line per species; use size = 1.5 on ggplot2 < 3.4.0
  labs(x = "Elevation", y = NULL,
       title = "Elevational ranges of herpetofaunal species") +
  scale_x_continuous(breaks = seq(0, 3500, 100)) +  # a tick every 100 units of elevation
  theme_bw() +
  theme(axis.text.y = element_text(face = "italic"))  # italicize species names
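One possible variation, not the approach used in the original chart, is to draw one geom_segment() per species from pre-computed range endpoints, order the species by their lower elevation limit with forcats::fct_reorder(), and add points at both ends so that species recorded at a single elevation stay visible. The ranges below are made up (hypothetical species, not the study results), and the linewidth argument assumes ggplot2 3.4.0 or later.

```r
library(tidyverse)

# Made-up elevational ranges (NOT the study results)
ranges <- tibble(
  Species = c("Species A", "Species B", "Species C"),
  min_ele = c(1060, 2000, 1500),
  max_ele = c(1500, 2000, 2500)
)

# Order the y-axis by each species' lowest recorded elevation
ranges <- ranges %>%
  mutate(Species = fct_reorder(Species, min_ele))

p <- ggplot(ranges, aes(y = Species)) +
  # one horizontal bar per species, from lowest to highest elevation
  geom_segment(aes(x = min_ele, xend = max_ele, yend = Species), linewidth = 1.5) +
  # endpoints keep single-elevation species (min == max) visible
  geom_point(aes(x = min_ele)) +
  geom_point(aes(x = max_ele)) +
  labs(x = "Elevation", y = NULL,
       title = "Elevational ranges of herpetofaunal species") +
  theme_bw() +
  theme(axis.text.y = element_text(face = "italic"))  # italic species names

print(p)
```

Ordering by the lower limit tends to produce a staircase pattern that makes elevational turnover easier to read than alphabetical order.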

Are you interested in learning every nuance of this data analysis? Check out this RPubs page.

Enjoy 😃 and don’t forget to check out the awesome article my friend and his team wrote.

Gautam, B., Chalise, M., Thapa, K., & Bhattarai, S. (2020). Distributional Patterns of Amphibians and Reptiles in Ghandruk, Annapurna Conservation Area, Nepal. IRCF Reptiles and Amphibians, 27, 48–49.

Rajesh Sigdel

Research Assistant and Doctoral Candidate at the University of North Carolina at Greensboro