# Understanding species co-occurrence

## With calculations in R

In ecology, co-occurrence networks can help us identify relationships between species using repeated measurements of the species’ presence or absence. When evaluating potential relationships, we might ask: Given presence-absence data, are two species co-occurring at a frequency higher or lower than expected by chance? Somewhat surprisingly, although co-occurrence analysis has been around since the ’70s, there’s no universally agreed-upon method for measuring co-occurrence and testing its statistical significance (Veech, 2012). In this post, we’re going to examine the probabilistic model described in Veech’s *A probabilistic model for analysing species co-occurrence* (2012). We’ll start by defining the model before moving on to calculating co-occurrence probabilities in R. To view this article with proper subscripts, go to the original article.

# Defining the probabilistic model of co-occurrence

## Overview

In order to understand whether two species co-occur at a frequency greater than or less than expected, we first need to know the probability of two species co-occurring at a given number of sites. This will depend on the number of sites sampled (*N*) and the number of sites each species inhabits (*N1* and *N2*). Using this information, we can determine *pj*, the probability that 2 species co-occur at exactly *j* sites, for *j* = 0…*N*. To calculate *pj*, we’ll count the number of ways species 1 and 2 can be arranged among *N* sites while co-occurring at *j* sites and divide that by the total number of ways species 1 and 2 can be arranged among *N* sites (Eq. 1).

## The math behind pj

The numerator of *pj* can be calculated by multiplying 1) the number of ways *j* sites can be arranged among *N* sites, by 2) the number of ways species 2 can be arranged in the remaining sites that don’t have both species, by 3) the number of ways species 1 can be arranged among sites that don’t have species 2. The denominator can be calculated by multiplying the number of ways species 2 can be arranged by the number of ways species 1 can be arranged (Eq. 2).
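The equation itself is not reproduced in this version of the post. Reconstructed from the verbal description above and from the R code used later in the post (so treat the exact layout as an assumption rather than a copy of Veech’s typesetting), Eq. 2 would read:

$$
p_j = \frac{\binom{N}{j}\,\binom{N - j}{N_2 - j}\,\binom{N - N_2}{N_1 - j}}{\binom{N}{N_2}\,\binom{N}{N_1}}
$$

Here the three binomial coefficients in the numerator correspond to steps 1), 2), and 3), and the two in the denominator count the unrestricted arrangements of species 2 and species 1.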

There are limits on the number of sites at which two species can co-occur. Let’s say we sample 10 sites. Species 1 is found in 7 sites and species 2 is found in 5 sites. If you were to randomly place species 1 in 7 sites, you’d have 3 empty sites left over. Since species 2 is found in 5 sites, at most 3 of them can be empty sites, so the two species have to co-occur at a minimum of 2 sites. Thus, max{0, *N1 + N2 - N*} ≤ *j*. Additionally, *j* can’t exceed the number of sites inhabited by the species with the lowest presence. For instance, species 1 and 2 can’t co-occur at 5 sites if species 1 is only present at 2 sites. Therefore, max{0, *N1 + N2 - N*} ≤ *j* ≤ min{*N1, N2*}. If *j* doesn’t meet these criteria, then *pj* = 0.
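As a rough sketch, these bounds can be computed directly in R. The helper name `j_range()` is hypothetical (it is not from Veech’s paper or any package) and is only meant to illustrate the 10-site example.

```r
# Hypothetical helper: feasible co-occurrence counts for two species.
j_range <- function(N, n1, n2) {
  lower <- max(0, n1 + n2 - N)  # overlap forced when n1 + n2 exceeds N
  upper <- min(n1, n2)          # capped by the species at fewer sites
  lower:upper
}

# 10 sites, species 1 at 7 sites, species 2 at 5 sites.
j_range(N = 10, n1 = 7, n2 = 5)
```

For this example the feasible values run from 2 to 5, matching the reasoning above.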

# Calculating pj: an example

Let’s say we’re interested in the co-occurrence of two different bird species across 4 different sampling sites. Both species 1 and species 2 are present at exactly 2 sites. What’s the probability species 1 and 2 are found together at exactly one site? In other words, what’s *p1*?

## Breaking down the numerator

We’ll start by looking at the numerator of *pj*. We can see from Fig. 1 that there are 4 different ways the single co-occurrence could be arranged among the 4 sites. For each unique way of placing the co-occurrence, there are three sites where species 1 and 2 don’t co-occur. That means there are three sites (*N - j*) where we can arrange species 2. Since species 2 is only found in two sites, we only need to place species 2 in one more site (*N2 - j*). That gives us three different ways of placing species 2 in one of the three remaining sites.

Now, we have two sites left over that don’t have species 2 (*N - N2*). Again, we only need to place species 1 at one site (*N1 - j*), and there are two ways to place species 1 among two sites. Multiplying these all together, we get 4 × 3 × 2 = 24 ways species 1 and 2 can co-occur at 1 of the 4 sites given they are each found in two sites.

## Breaking down the denominator

The denominator is a bit more straightforward (Fig. 2). There are six different ways of arranging species 2 across 4 sites (see picture below). Since this is the same for species 1, this gives us 6 × 6 = 36 for the denominator. Altogether, *p1* = 24/36 ≈ 0.67.
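If you’d like to double-check the counting, one option is to enumerate every arrangement by brute force. This is only a sanity-check sketch (the variable names are illustrative and it is not part of Veech’s method): it lists every way to place each species with `combn()` and counts how many pairs of placements share exactly one site.

```r
N  <- 4  # number of sites
n1 <- 2  # sites occupied by species 1
n2 <- 2  # sites occupied by species 2

# Every possible set of sites for each species.
sp1 <- combn(N, n1, simplify = FALSE)
sp2 <- combn(N, n2, simplify = FALSE)

# Number of shared sites for every pairing of placements (a 6 x 6 matrix).
overlap <- sapply(sp1, function(a) {
  sapply(sp2, function(b) length(intersect(a, b)))
})

sum(overlap == 1)   # arrangements with exactly one shared site
length(overlap)     # total number of arrangements
mean(overlap == 1)  # proportion with exactly one shared site
```

The counts should come out to 24 of 36 arrangements, the same ratio as above.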

# Calculating p1 in R

Once we’ve defined *N*, *N1*, *N2*, and *j*, we can use the `choose()` function to evaluate Eq. 2 in R.

```r
# Define the number of sites.
N <- 4

# Define the number of sites occupied by species 1.
n1 <- 2

# Define the number of sites occupied by species 2.
n2 <- 2

# Number of sites species 1 and 2 co-occur at.
j <- 1

# Probability that species 1 and 2 co-occur at exactly 1 site.
choose(N, j) * choose(N - j, n2 - j) * choose(N - n2, n1 - j) /
  (choose(N, n2) * choose(N, n1))
```
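As a cross-check, the counting formula is algebraically the same as the hypergeometric probability mass function, so base R’s `dhyper()` should give the same answer. This is an alternative route rather than the approach used in the post; the variable names `p_choose` and `p_hyper` are just for illustration.

```r
N <- 4; n1 <- 2; n2 <- 2; j <- 1

# Counting formula written with choose().
p_choose <- choose(N, j) * choose(N - j, n2 - j) * choose(N - n2, n1 - j) /
  (choose(N, n2) * choose(N, n1))

# Hypergeometric version: draw n1 sites (species 1) from n2 sites that hold
# species 2 and N - n2 sites that do not; j of the draws hold species 2.
p_hyper <- dhyper(j, n2, N - n2, n1)

all.equal(p_choose, p_hyper)
```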

# Using pj to assess significance

Assessing the statistical significance of an observed co-occurrence relies on the fact that ∑*pj* = 1 for *j* = max{0, *N1 + N2 - N*} to min{*N1, N2*}. Let’s say *Qobs* represents the observed co-occurrence. To assess whether two species co-occur less than expected, we’ll want to know the probability of seeing them co-occur *at most* *Qobs* times, ∑*pj* for *j* = max{0, *N1 + N2 - N*} to *Qobs*. If this probability is less than our significance level, say 0.05, then the two species co-occur significantly less than expected by chance.

On the other hand, if two species co-occur at a frequency greater than expected, then the probability of seeing them co-occur *Qobs* times or more, ∑*pj* for *j* = *Qobs* to min{*N1, N2*}, will be less than the significance level. To find the expected co-occurrence, we can take the weighted sum of each *j* with *pj* as the weights. Mathematically, this is ∑(*pj* × *j*) for *j* = max{0, *N1 + N2 - N*} to min{*N1, N2*}.
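The three quantities described above can be bundled into one small function. This is a minimal sketch under the definitions in this post; the name `cooccur_summary()` and its arguments are hypothetical and are not part of any package.

```r
# Hypothetical helper: tail probabilities and expected co-occurrence.
cooccur_summary <- function(N, n1, n2, q_obs) {
  # Feasible numbers of shared sites and their probabilities.
  j  <- max(0, n1 + n2 - N):min(n1, n2)
  pj <- choose(N, j) * choose(N - j, n2 - j) * choose(N - n2, n1 - j) /
    (choose(N, n2) * choose(N, n1))

  list(
    p_lt     = sum(pj[j <= q_obs]),  # P(co-occur at q_obs sites or fewer)
    p_gt     = sum(pj[j >= q_obs]),  # P(co-occur at q_obs sites or more)
    expected = sum(pj * j)           # expected number of shared sites
  )
}

# Toy input: 4 sites, each species at 2 sites, 1 observed co-occurrence.
cooccur_summary(N = 4, n1 = 2, n2 = 2, q_obs = 1)
```

Comparing `p_lt` and `p_gt` against the chosen significance level then tells us whether the observed co-occurrence is lower or higher than expected by chance.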

# Assessing species co-occurrence significance: an example

Imagine we’ve sampled 30 sites and found two lizard species co-occur at 6 sites. Species 1 is present at 10 sites and species 2 at 25 sites. Do these species co-occur more or less frequently than expected by chance?

To answer this question, we can use our code from above with a few modifications:

```r
# Define the number of sites.
N <- 30

# Define the number of sites occupied by species 1.
n1 <- 10

# Define the number of sites occupied by species 2.
n2 <- 25

# Possible numbers of sites species 1 and 2 can co-occur at.
j <- max(0, n1 + n2 - N):min(n1, n2)

# Probability that species 1 and 2 co-occur at exactly j sites.
pj <- choose(N, j) * choose(N - j, n2 - j) * choose(N - n2, n1 - j) /
  (choose(N, n2) * choose(N, n1))

# Show table for j, pj, and the cumulative distribution.
round(data.frame(j, pj, sumPj = cumsum(pj)), 4)
```

The probability of the two lizard species randomly co-occurring at 6 sites or fewer is 0.0312 (*p5 + p6*). Assuming a significance level of 0.05, we can conclude the two lizard species co-occur less frequently than expected by chance. On the other hand, the probability of the two lizard species co-occurring at 6 sites or more is 0.9982 (*p6 + p7 + p8 + p9 + p10*, or 1 - *p5*). Additionally, the expected co-occurrence is about 8.3 sites.

```r
# Expected number of co-occurrences.
sum(pj * j)
```
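Because *pj* follows a hypergeometric distribution, the same numbers can be cross-checked with base R’s `phyper()` and the closed-form mean *N1* × *N2* / *N*. This is an alternative sketch, not the calculation used above; the names `q_obs`, `p_lt`, `p_gt`, and `expected` are illustrative.

```r
N <- 30; n1 <- 10; n2 <- 25; q_obs <- 6

# P(co-occur at q_obs sites or fewer).
p_lt <- phyper(q_obs, n2, N - n2, n1)

# P(co-occur at q_obs sites or more), i.e. P(X > q_obs - 1).
p_gt <- phyper(q_obs - 1, n2, N - n2, n1, lower.tail = FALSE)

# Mean of the hypergeometric distribution.
expected <- n1 * n2 / N

round(c(p_lt = p_lt, p_gt = p_gt, expected = expected), 4)
```

These should agree with the 0.0312 and 0.9982 reported above and with the value returned by `sum(pj * j)`.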

# Using the probabilistic model of co-occurrence in practice

Now that we’ve worked through understanding the probabilistic model of co-occurrence for two species, how can we extend this to multiple pairs of species? Luckily, the R package *cooccur* can do this for us. Check out our blog post to see how to create co-occurrence networks using *cooccur* and *visNetwork*. Hopefully, you’ll now have a good understanding of how the probabilities of species co-occurrence are calculated and can replicate the results if you desire. As always, happy networking!

## Citations

Veech, J. A. (2012). A probabilistic model for analysing species co-occurrence. *Global Ecology and Biogeography*, 22(2), 252–260. doi:10.1111/j.1466-8238.2012.00789.x

*Originally published at **https://thatdarndata.com** on November 1, 2020.*