CCA plot using ggplot2

Microbiome Series

Saurav Das
Feb 18, 2023

A Little Background
Canonical Correspondence Analysis (CCA) is a multivariate statistical technique used to explore relationships between two sets of variables, typically species abundance data and environmental factors, in ecological studies. It is an extension of correspondence analysis (CA), which is used to analyze relationships in contingency tables or presence-absence data. CCA was introduced by Cajo J.F. ter Braak in 1986 and has since been widely applied in ecology and other fields.

CCA aims to identify the underlying environmental gradients that structure species compositions in communities, as well as the extent to which these gradients explain variation in species abundance or occurrence. By analyzing the associations between species and environmental variables, CCA can help researchers understand the ecological processes that drive community patterns.

The main steps in CCA are:

  1. Data preparation: Organize the species abundance data in a samples-by-species matrix (samples in rows, species in columns, which is the layout vegan expects) and the environmental data in a samples-by-environmental variables matrix. Ensure that both matrices contain the same samples in the same row order.
  2. Data transformation: Standardize the environmental data and, if needed, transform the species abundance data to meet statistical assumptions (e.g., log or square root transformation to reduce the influence of highly abundant species).
  3. Perform CCA: Carry out the canonical correspondence analysis using specialized software (e.g., R, CANOCO, or PC-ORD). The analysis will generate canonical axes, which are linear combinations of the environmental variables that best explain the variation in species data.
  4. Interpretation: Examine the CCA output, including biplots, eigenvalues, and species-environment correlations, to interpret the relationships between species and environmental gradients. This may involve identifying key environmental factors that drive community composition or determining which species are most sensitive to particular environmental conditions.
  5. Statistical testing: Perform permutation tests or other significance tests to determine whether the observed relationships between species and environmental variables are significant or could be attributed to random chance.
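
The code further down covers the ordination itself, so here is a minimal sketch of the two steps it skips: transformation (step 2) and permutation testing (step 5). It assumes the same `varespec` and `varechem` subsets used later in this post; the square-root transform, the seed, and the 999 permutations are illustrative choices, not recommendations from the original analysis, and the names `species_sqrt`, `env_std`, `cca_fit`, and `perm_test` are made up for this example.

```r
library(vegan)

data("varespec")  # community data: samples in rows, species in columns
data("varechem")  # environmental data: the same samples in rows

# Step 1: confirm both tables describe the same samples in the same order
stopifnot(identical(rownames(varespec), rownames(varechem)))

# Step 2: square-root transform abundances to damp dominant species,
# and standardize the environmental variables (mean 0, sd 1)
species_sqrt <- sqrt(varespec[, 1:18])
env_std <- as.data.frame(scale(varechem[, 2:7]))

# Step 3: fit the CCA using the formula interface
cca_fit <- cca(species_sqrt ~ ., data = env_std)

# Step 5: permutation test of the overall model
set.seed(42)  # permutation results vary slightly without a fixed seed
perm_test <- anova(cca_fit, permutations = 999)
perm_test
```

The `Pr(>F)` column of the resulting table is the permutation p-value for the full set of constraints. Calling `anova()` with `by = "terms"` or `by = "axis"` would test individual variables or axes instead.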

CCA has some limitations, such as its sensitivity to outliers and its assumption that species respond unimodally to gradients built as linear combinations of the environmental variables. Nevertheless, it remains a valuable tool for understanding species-environment relationships in ecological studies.

``` r
library(vegan)
#> Loading required package: permute
#> Loading required package: lattice
#> This is vegan 2.6-4
library(ggplot2)
library(ggrepel)

# example data shipped with vegan
data("varechem")
data("varespec")

# select the species columns and a subset of environmental variables
species_data <- varespec[, 1:18]
env_data <- varechem[, 2:7]

# run the CCA
cca_result <- cca(species_data, env_data)

# summary of the CCA result
summary(cca_result)
#>
#> Call:
#> cca(X = species_data, Y = env_data)
#>
#> Partitioning of scaled Chi-square:
#> Inertia Proportion
#> Total 1.6122 1.0000
#> Constrained 0.8179 0.5073
#> Unconstrained 0.7943 0.4927
#>
#> Eigenvalues, and their contribution to the scaled Chi-square
#>
#> Importance of components:
#> CCA1 CCA2 CCA3 CCA4 CCA5 CCA6 CA1
#> Eigenvalue 0.3320 0.2201 0.13891 0.09574 0.02372 0.007491 0.2351
#> Proportion Explained 0.2059 0.1365 0.08616 0.05939 0.01471 0.004646 0.1458
#> Cumulative Proportion 0.2059 0.3424 0.42860 0.48799 0.50270 0.507346 0.6532
#> CA2 CA3 CA4 CA5 CA6 CA7 CA8
#> Eigenvalue 0.15887 0.10887 0.09654 0.05552 0.04798 0.02920 0.01855
#> Proportion Explained 0.09854 0.06753 0.05988 0.03444 0.02976 0.01811 0.01151
#> Cumulative Proportion 0.75171 0.81924 0.87912 0.91355 0.94332 0.96143 0.97294
#> CA9 CA10 CA11 CA12 CA13 CA14
#> Eigenvalue 0.013709 0.013228 0.009530 0.003745 0.002662 0.0005775
#> Proportion Explained 0.008503 0.008205 0.005911 0.002323 0.001651 0.0003582
#> Cumulative Proportion 0.981442 0.989647 0.995558 0.997881 0.999532 0.9998906
#> CA15 CA16 CA17
#> Eigenvalue 1.437e-04 3.222e-05 4.255e-07
#> Proportion Explained 8.915e-05 1.999e-05 2.639e-07
#> Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00
#>
#> Accumulated constrained eigenvalues
#> Importance of components:
#> CCA1 CCA2 CCA3 CCA4 CCA5 CCA6
#> Eigenvalue 0.3320 0.2201 0.1389 0.09574 0.02372 0.007491
#> Proportion Explained 0.4059 0.2691 0.1698 0.11705 0.02900 0.009158
#> Cumulative Proportion 0.4059 0.6750 0.8448 0.96184 0.99084 1.000000
#>
#> Scaling 2 for species and site scores
#> * Species are scaled proportional to eigenvalues
#> * Sites are unscaled: weighted dispersion equal on all dimensions
#>
#>
#> Species scores
#>
#> CCA1 CCA2 CCA3 CCA4 CCA5 CCA6
#> Callvulg 2.57168 0.15673 -0.63802 -0.06562 0.067032 0.01129
#> Empenigr -0.06935 -0.40579 0.04932 0.11667 -0.035967 0.06627
#> Rhodtome -0.07832 0.12504 0.76681 0.16541 0.757904 -0.25609
#> Vaccmyrt -0.17812 0.56869 0.40013 0.46643 0.251211 0.03065
#> Vaccviti -0.09778 -0.43264 -0.08910 -0.03392 -0.102486 -0.03499
#> Pinusylv 0.19834 -0.59590 -0.10517 -0.11337 -0.193911 -0.76024
#> Descflex -0.13505 0.91983 0.26095 0.14060 0.345271 -0.25979
#> Betupube -0.26527 -0.63923 1.13716 0.42468 1.008634 -0.20317
#> Vacculig -0.08587 -1.41072 -0.48424 1.66750 0.335998 0.03952
#> Diphcomp -0.44456 -0.91255 -0.12334 0.71868 0.147851 0.54828
#> Dicrsp -0.98043 -0.09717 -0.93823 -0.86742 0.293345 0.09486
#> Dicrfusc 0.38345 -0.12889 0.82509 -0.34473 0.028287 0.04902
#> Dicrpoly -0.48517 -0.48492 -0.13767 -0.09379 0.733020 -0.13245
#> Hylosple -0.35881 1.49975 -0.16731 0.56841 -0.483065 0.14075
#> Pleuschr -0.13764 0.43159 -0.08553 0.01777 -0.007886 -0.01821
#> Polypili -0.25346 -0.73932 0.09453 0.46744 -0.252424 -1.69510
#> Polyjuni -0.58033 -0.18238 0.40419 -0.08637 -0.271425 -0.09106
#> Polycomm -0.29703 -0.10609 0.43704 0.18878 0.826908 -0.09553
#>
#>
#> Site scores (weighted averages of species scores)
#>
#> CCA1 CCA2 CCA3 CCA4 CCA5 CCA6
#> 18 -0.17042 -1.666843 -0.26027 1.1400710 -1.47655 4.0884
#> 15 -0.04774 0.702898 0.48115 -0.6126208 -0.72913 -1.4145
#> 24 -1.20972 0.276711 -2.55681 -2.8692413 3.61001 1.4906
#> 27 -0.36039 1.090487 0.05969 0.9617973 -0.25794 -1.0185
#> 23 -0.28438 -0.584187 0.06148 -0.0575345 -2.41279 -0.9309
#> 19 -0.38841 0.198202 -0.05343 0.4601034 -1.13593 -1.0576
#> 22 0.61633 0.064282 2.04144 -0.6458030 1.46691 1.2255
#> 16 0.76479 0.117610 1.86114 -1.3063274 0.75610 2.8142
#> 28 -0.48603 2.152827 -0.03161 1.4699053 -0.33984 0.4195
#> 13 4.64810 0.315734 -2.54115 -0.5693811 1.19388 0.5721
#> 14 1.04014 -0.289410 1.31135 -1.0750813 -0.36921 2.1125
#> 20 -0.33710 -0.221132 -0.33614 0.1862634 -2.11688 -0.4192
#> 25 -0.73728 0.054260 -0.40348 -2.0407617 0.61073 1.3170
#> 7 -0.26430 -3.475174 -1.25049 6.3683987 3.17746 3.0412
#> 5 -0.45937 -1.734073 0.42223 -0.3661606 -5.05274 -15.3930
#> 6 -0.06295 -1.804235 -0.04961 -0.0007979 -3.07454 -0.5771
#> 3 -0.21333 -2.604706 -0.42757 3.0733850 0.23372 5.5640
#> 4 3.14276 0.006144 -2.02943 -0.2827821 -0.07181 -1.1078
#> 2 -0.23381 -1.737290 -0.13906 0.4417230 -2.75502 2.0123
#> 9 -0.30783 -1.373733 -0.38349 -0.0444625 -2.95630 -2.8976
#> 12 -0.20822 -1.243113 -0.34061 0.1353851 -2.65016 -2.9294
#> 10 -0.18164 -1.763048 -0.14154 0.3011646 -2.59855 0.3740
#> 11 0.19343 0.166270 -0.79830 -0.0900196 -1.69244 -6.6981
#> 21 -0.39218 -0.599925 0.81845 1.2941353 4.05314 -2.3795
#>
#>
#> Site constraints (linear combinations of constraining variables)
#>
#> CCA1 CCA2 CCA3 CCA4 CCA5 CCA6
#> 18 -0.63340 -0.31786 0.37974 0.21219 0.43688 0.74049
#> 15 0.46543 -0.10681 0.26531 0.67400 0.16474 1.46966
#> 24 -1.25644 -0.15953 -2.26896 -1.39064 1.20519 0.26496
#> 27 -0.23642 1.31797 0.02128 0.22874 0.39147 -0.26287
#> 23 -0.80784 0.29387 0.01364 -0.53692 0.40217 0.04258
#> 19 -0.05934 -0.26542 -0.63462 -0.05827 2.07815 0.14965
#> 22 0.62292 0.11204 1.31191 -1.27458 0.53227 -0.67300
#> 16 0.57735 -0.46180 1.15852 0.45190 -0.12442 0.35027
#> 28 -0.50250 1.84393 -0.29614 1.11375 -0.84179 0.41762
#> 13 4.33030 0.69454 -1.32804 -0.09139 0.58858 0.48974
#> 14 0.33403 -0.66906 0.55766 -0.86800 -0.31141 0.57704
#> 20 0.10136 -0.11798 -0.16873 -1.76264 -2.16595 0.10477
#> 25 -0.78884 0.01926 0.97335 -0.15003 -1.03166 -0.24249
#> 7 -0.01684 -2.33678 -1.01904 2.82201 0.54679 0.12316
#> 5 -1.06688 -1.65798 1.41339 1.08988 0.08827 0.63800
#> 6 -0.64260 -2.01548 0.35450 1.37131 -0.18129 0.29717
#> 3 -0.65633 -2.43432 -1.33752 3.06115 -0.10950 -0.02680
#> 4 1.23455 -1.08386 -2.48146 1.22948 -3.28181 -0.56056
#> 2 0.70367 -1.37183 -1.36739 0.96833 -1.53539 -0.08939
#> 9 -0.17709 -1.90878 -0.82710 -1.65822 -1.97055 0.82079
#> 12 0.32339 -0.97422 0.73354 -0.29964 -0.22904 0.11216
#> 10 0.11254 0.02724 -0.90758 -0.42922 -1.38852 0.07579
#> 11 0.28124 -0.30360 -0.78421 0.66154 -0.29780 -4.85261
#> 21 -0.29444 -0.75202 1.13629 0.60660 1.20997 -0.16244
#>
#>
#> Biplot scores for constraining variables
#>
#> CCA1 CCA2 CCA3 CCA4 CCA5 CCA6
#> P -0.3459 0.6835 -0.5316 -0.1310 -0.32795 -0.07683
#> K 0.3098 0.8293 -0.4382 -0.1034 -0.09716 -0.06394
#> Ca -0.2648 0.5588 -0.3310 -0.4201 0.09276 -0.56838
#> Mg -0.2048 0.3944 -0.6494 -0.3760 0.46608 -0.14908
#> S 0.2018 0.3644 -0.7895 -0.4298 -0.08002 0.10939
#> Al 0.5015 -0.5136 -0.6433 0.1355 -0.17028 -0.15333

plot(cca_result)
```

![](https://i.imgur.com/8heHISu.png)<!-- -->

``` r

# extract the biplot scores of the environmental variables as a data frame
veg_1 <- as.data.frame(cca_result$CCA$biplot)
veg_1$env <- rownames(veg_1)

# extract the species scores as a data frame
# (the label column is called "genus" to match a microbiome workflow)
veg_2 <- as.data.frame(cca_result$CCA$v)
veg_2$genus <- rownames(veg_2)

# base plot: species in red, environmental variables in blue
plot <- ggplot() +
  geom_point(data = veg_2, aes(x = CCA1, y = CCA2), color = "red") +
  geom_point(data = veg_1, aes(x = CCA1, y = CCA2), color = "blue")

plot
```

![](https://i.imgur.com/e08Vd88.png)<!-- -->
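
Note that `cca_result$CCA$v` holds the raw, unscaled species scores, so the coordinates plotted above will not match the scaling 2 values printed by `summary()`. If you want the plotted points to agree with the summary table, one option is vegan's `scores()` function. This is a sketch of that alternative, not the approach used for the figures in this post; `species_scores` and `env_scores` are names made up for the example.

```r
library(vegan)

data("varespec")
data("varechem")
cca_result <- cca(varespec[, 1:18], varechem[, 2:7])

# species scores on the first two axes, scaling 2 (the summary() default)
species_scores <- as.data.frame(
  scores(cca_result, display = "species", choices = 1:2, scaling = 2)
)
species_scores$genus <- rownames(species_scores)

# biplot scores (arrow heads) for the environmental variables
env_scores <- as.data.frame(
  scores(cca_result, display = "bp", choices = 1:2, scaling = 2)
)
env_scores$env <- rownames(env_scores)

head(species_scores)
env_scores
```

These two data frames have the same columns as `veg_2` and `veg_1`, so they could be dropped into the ggplot code unchanged.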

``` r


# add species labels, arrows for the environmental variables, and their labels
plot +
  geom_text_repel(
    data = veg_2,
    aes(x = CCA1, y = CCA2, label = genus),
    nudge_y = -0.05
  ) +
  theme_bw() +
  geom_segment(
    data = veg_1,
    aes(x = 0, y = 0, xend = CCA1, yend = CCA2),
    arrow = arrow(length = unit(0.25, "cm"))
  ) +
  geom_text_repel(
    data = veg_1,
    aes(x = CCA1, y = CCA2, label = env),
    nudge_y = -0.05,
    color = "blue",
    size = 5
  ) +
  theme(
    axis.text = element_text(size = 16),
    axis.title = element_text(size = 18)
  )
```

![](https://i.imgur.com/4RvDctq.png)<!-- -->

<sup>Created on 2023-02-17 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>

This should give you a plot like the one above, which you can then annotate and restyle as you wish!
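
As one example of such an annotation, the axis titles can show how much of the total inertia each axis carries. The sketch below rebuilds the model so it runs on its own and computes the percentages from `cca_result$CCA$eig` and `cca_result$tot.chi`; the names `pct`, `species_df`, and `labelled_plot` are made up for this example, and only the species points are drawn to keep it short.

```r
library(vegan)
library(ggplot2)

data("varespec")
data("varechem")
cca_result <- cca(varespec[, 1:18], varechem[, 2:7])

# percentage of total inertia explained by each constrained axis
pct <- round(100 * cca_result$CCA$eig / cca_result$tot.chi, 1)

# species scores on the first two constrained axes
species_df <- as.data.frame(cca_result$CCA$v[, 1:2])

# same kind of scatter plot, with the percentages in the axis titles
labelled_plot <- ggplot(species_df, aes(x = CCA1, y = CCA2)) +
  geom_point(color = "red") +
  labs(
    x = paste0("CCA1 (", pct[["CCA1"]], "%)"),
    y = paste0("CCA2 (", pct[["CCA2"]], "%)")
  ) +
  theme_bw()

labelled_plot
```

The same `labs()` call can be added to the full plot built earlier in the post.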

Buy me a coffee if this helps you: https://www.buymeacoffee.com/sauravdastsk
