Data Science for Biodiversity Conservation: A basic project

Jan 24, 2018

The development of cutting-edge technologies in data science and machine learning has coincided with the largest biodiversity crisis of our times. It is estimated that roughly 10,000 species go extinct every year. This makes it all the more important to apply those technologies not just in business, but also in biodiversity conservation.

In this article I will show you a tiny example of how computational methods can be applied to an ecological problem. We will have a look at the dunes dataset, which contains data about vegetation on the Dutch island of Terschelling.

The data were collected in 1983, and published in the following form.

You can observe that the abundances of 30 different plant species were recorded across 12 different sites. Eyeballing the data alone would probably not reveal much about the underlying patterns, so we turn to several visual methods.

First we will try to plot a rarefaction curve. This plot shows how many species are discovered as sampling increases. If sampling is sufficient, the curve should eventually plateau (it becomes harder to discover new species, regardless of how much more sampling is done). On the other hand, if the curve is still steep, more sampling effort is probably needed before we can draw sound ecological conclusions from these data.
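To make the idea concrete, here is a minimal sketch of a site-based species accumulation curve, one common way to produce the kind of plot described above. It is not the code behind the original figure: the function name `species_accumulation`, its parameters, and the toy matrix are assumptions made for illustration, and the numbers are invented rather than taken from the dunes dataset.

```python
import numpy as np


def species_accumulation(counts, n_permutations=100, seed=0):
    """Mean number of species seen after pooling 1, 2, ..., n sites.

    counts: 2-D array-like, rows are sites, columns are species abundances.
    Sites are added in random order many times and the curves are averaged.
    """
    counts = np.asarray(counts)
    rng = np.random.default_rng(seed)
    n_sites = counts.shape[0]
    curves = np.empty((n_permutations, n_sites))
    for k in range(n_permutations):
        order = rng.permutation(n_sites)
        # Running total of individuals per species as sites are pooled
        pooled = np.cumsum(counts[order], axis=0)
        # A species counts as "discovered" once its pooled abundance is > 0
        curves[k] = (pooled > 0).sum(axis=1)
    return curves.mean(axis=0)


# Hypothetical toy matrix (4 sites x 5 species), NOT the dunes data
toy = np.array([
    [3, 0, 1, 0, 0],
    [2, 1, 0, 0, 0],
    [0, 4, 0, 2, 0],
    [1, 0, 0, 0, 5],
])

curve = species_accumulation(toy)
print(curve)  # one value per number of pooled sites
```

Plotting `curve` against the number of pooled sites gives the accumulation plot. A curve that is still climbing at the last site suggests that more sampling would keep turning up new species.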

Probably the most important question we might ask of such data is: what is the ecological diversity per site? Measuring ecological diversity (biodiversity) is a whole field in itself, and there is a multitude of methods available (each with its advantages and disadvantages), so here we will focus on just one. We will compute the so-called Shannon Index for every sampling site. This is a widely used index, and it is also relatively easy to explain. It takes into account both how many species are present (richness) and how evenly the individuals are distributed among them (evenness). Let’s plot the results!

Differences in species diversity between sampling sites

For now we can observe that the sites are quite similar (at least to the eye) in terms of species diversity. The one notable exception is Site 1.
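For readers who want to see what the index actually computes, the Shannon Index is usually written as H' = -sum(p_i * ln(p_i)), where p_i is the proportion of individuals at a site that belong to species i. The following is a minimal sketch under that standard definition. The function name `shannon_index` and the two toy sites are hypothetical and are not values from the dunes dataset.

```python
import numpy as np


def shannon_index(abundances):
    """Shannon index H' = -sum(p_i * ln(p_i)) for one site.

    abundances: 1-D array-like of non-negative species abundances.
    Species with zero abundance are ignored (0 * ln 0 is treated as 0).
    """
    counts = np.asarray(abundances, dtype=float)
    counts = counts[counts > 0]
    if counts.size == 0:
        return 0.0
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())


# Hypothetical toy sites with the same richness (4 species) and the
# same total abundance (40), but very different evenness
even_site = [10, 10, 10, 10]
uneven_site = [37, 1, 1, 1]

h_even = shannon_index(even_site)
h_uneven = shannon_index(uneven_site)
print(h_even, h_uneven)
```

Both toy sites hold four species, yet the evenly distributed site scores higher. With all species equally abundant the index reaches its maximum for that richness, which is ln(4) here.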

In a future article we will go deeper and start using more advanced techniques on this same dataset, code included!

Written by Boyan Angelov