# Understanding species co-occurrence

## With calculations in R

In ecology, co-occurrence networks can help us identify relationships between species using repeated measurements of the species’ presence or absence. When evaluating potential relationships, we might ask: Given presence-absence data, are two species co-occurring at a frequency higher or lower than expected by chance? Somewhat surprisingly, although co-occurrence analysis has been around since the ’70s, there’s no universally agreed upon method for measuring co-occurrence and testing its statistical significance (Veech, 2012). In this post, we’re going to examine the probabilistic model as seen in Veech’s A probabilistic model for analysing species co‐occurrence(2012). We’ll start by defining the model before moving into calculating co-occurrence probabilities in R. To view this article with proper subscripts, go to the original article. Photo by Wynand van Poortvliet on Unsplash.

# Defining the probabilistic model of co-occurrence

## Overview

In order to understand whether two species co-occur at a frequency greater than or less than expected, we first need to know the probability of two species co-occurring at a given number of sites. This will depend on the number of sites sampled (N) and the number of sites each species inhabits (N1 and N2). Using this information, we can determine pj, the probability that 2 species co-occur at exactly j sites, for j = 0…N. To calculate pj, we’ll count the number of ways species 1 and 2 can be arranged among N sites while co-occurring at j sites and divide that by the total number of ways species 1 and 2 can be arranged among N sites (Eq. 1).

## The math behind pj

The numerator of can be calculated by multiplying 1) the number of ways j sites can be arranged among N sites, by 2) the number of ways species 2 can be arranged in the remaining sites that don’t have both species, by 3) the number of ways species 1 can be arranged among sites that don’t have species 2. The denominator can be calculated by multiplying the number of ways species 2 can be arranged by the number of ways species 1 can be arranged (Eq. 2).

There are limitations to the number of sites two species can co-occur at. Let’s say we sample 10 sites. Species 1 is found in 7 sites and species 2 is found in 5 sites. If you were to randomly place species 1 in 7 sites, you’d have 3 sites empty sites leftover. Since species 2 is found in 5 sites, the two species have to co-occur at a minimum of 2 sites. Thus, max{0, N1 + N2 - N } ≤ j. Additionally, j can’t exceed the number of sites the species with the lowest presence inhabits. For instance, species 1 and 2 can’t co-occur at 5 sites if species 1 is only present at 2 sites. Therefore, max{0, N1 + N2 - N } ≤ j ≤ min{N1, N2 }. If j doesn’t meet these criteria, then pj = 0.

# Calculating pj: an example

Let’s say we’re interested in the co-occurrence of two different bird species across 4 different sampling sites. Both species 1 and species 2 are present at exactly 2 sites. What’s the probability species 1 and 2 are found together at exactly one site? In other words, what’s p1?

## Breaking down the numerator

We’ll start by looking at the numerator of pj. We can see from the Fig. 1 there are 4 different ways the single co-occurrence could be arranged among the 4 sites. For each unique way of placing the co-occurrence, there are three sites where species 1 and 2 don’t co-occur. That means, there are three sites (N- j) where we can arrange species 2. Since species 2 is only found in two sites, we only need to place species 2 in one more site (N2- j). That gives us three different ways of placing species 2 in one of the three remaining sites.

Now, we have two sites leftover that don’t have species 2 (N - N2). Again, we only need to place species 1 at one site (N1- j) and there are two ways to place species 1 among two sites. Multiplying these all together, we get 4 * 3 * 2 = 24 ways species 1 and 2 can co-occur at 1 of the 4 sites given they are each found in two sites.

## Breaking down the denominator

The denominator is a bit more straightforward (Fig. 2). There are six different ways of arranging species 2 across 4 sites (see picture below). Since this is the same for species 1, this gives us 6 * 6 for the denominator. Altogether, = 24/36 ≈ 0.67.

# Calculating p1 in R:

Once we’ve defined and j, we can use the choose() function to evaluate Eq. 2 in R.

`# Define the number of sites. N = 4 # Define the number of sites occupied by species 1. n1 = 2 # Define the number of sites occupied by species 2. n2 = 2 # Number of sites species 1 and 2 co-occur at. j = 1 # Probability that species 1 and 2 occur at exactly 1 site. choose(N, j) * choose(N - j, n2 - j) * choose(N - n2, n1 - j)/ (choose(N, n2) * choose(N, n1))`

# Using pj to assess significance

Assessing the statistical significance of an observed co-occurrence relies on the fact that ∑pj = 1 for j = max {0, N1 + N2 - N } to min{N1, N2}. Let’s say Qobs represents the observed co-occurrence. To assess whether or not two species co-occur less than expected, we’ll want to know the probability of seeing them co-occur at least Qobs times, ∑pj for j = max {0, N1 + N2 - N } to Qobs. If this probability is less than our significance level, say 0.05, then the two species co-occur significantly less than expected by chance.

On the other hand, if two species co-occur at a frequency greater than expected, then the probability of seeing them co-occur Qobs times or more will be less than the significance level, ∑pj for j = Qobs to min{N1, N2}. To find the expected co-occurrence, we can take the weighted sum of each j with pj as the weights. Mathematically, this is ∑( pj × j ) for j = max {0, N1 + N2 - N} to min{N1, N2}.

# Assessing species co-occurrence significance: an example

Imagine we’ve sampled 30 sites and found two lizard species co-occur at 6 sites. Species 1 is present at 10 sites and species 2 at 25 sites. Do these species occur more or less frequently than expected by chance?

To answer this question, we can use our code from above with a few modifications:

`# Define the number of sites. N = 30 # Define the number of sites occupied by species 1. n1 = 10 # Define the number of sites occupied by species 2. n2 = 25 # Number of sites species 1 and 2 co-occur at. j = max(0, n1 + n2 - N):min(n1, n2) # Probability that species 1 and 2 occur at exactly j sites. pj = choose(N, j) * choose(N - j, n2 - j) * choose(N - n2, n1 - j)/ (choose(N, n2) * choose(N, n1)) # Show table for j, pj, and the cumulative distribution. round(data.frame(j, pj, sumPj = cumsum(pj)), 4)`

The probability of the two lizard species randomly co-occurring at 6 sites or less is 0.0312 (p5 + p6). Assuming a significance level of 0.05, we can conclude the two lizard species occur less frequently than expected by chance. On the other hand, the probability of the two lizard species co-occurring at 6 sites or more is 0.9982 (p6 + p7 + p8 + p9 + p10 or 1 - p5). Additionally, the expected co-occurrence is 8 sites.

`# Expected number of co-occurrence.  sum(pj * j)`

# Using the probabilistic model of co-occurrence in practice

Now that we’ve worked through understanding the probabilistic model of co-occurrence for two species, how can we extend this to multiple pairs of species? Luckily, the R package ‘ cooccur’ can do this for us. Check out our blog post to see how to create co-occurrence networks using ‘ cooccur’ and ‘visNetwork’. Hopefully, you’ll now have a good understanding of how they calculate probabilities of species co-occurrence and can replicate their results if you desire. As always, happy networking!

## Citations

Veech, J. A. (2012). A probabilistic model for analysing species co-occurrence. Global Ecology and Biogeography, 22(2), 252–260. doi:10.1111/j.1466–8238.2012.00789.x

Originally published at https://thatdarndata.com on November 1, 2020.

--

--