Canonical Correlation Analysis and Neural Network Representation Similarities
Part 2 of 5: Canonical Correlation Analysis (CCA) and its use to measure representation similarities of neural networks
Our similarities bring us to a common ground.
In a previous post, I talked about what an internal representation learned by a neural network is and why researchers are focused on exploring aspects of representation similarity.
Now I would like to talk about Canonical Correlation Analysis (CCA) and how it emerged as a tool of choice for measuring representation similarities of neural networks. Introduced in 1936 by Harold Hotelling, CCA is a statistical method that investigates relationships among two or more sets of variables, where each set consists of at least two variables. Why require at least two variables per set? Because with fewer than two variables in a set, the canonical logic reduces to something like a t-test or a regression analysis.
CCA is a multivariate method that considers all the variables simultaneously in a single analysis. It honors the reality that in nature variables can all interact with each other, which tends to yield more faithful results and larger variance-accounted-for effect sizes than analyzing each variable in isolation. It can be seen as a multivariate form of the general linear model, which recognizes that all such analyses are correlational and yield variance-accounted-for effect sizes.
So what runs under the hood of CCA? The first step in a CCA is to compute a product-moment correlation matrix involving all the variables from both sets (remember, two variable sets). CCA then finds the linear combination of one set of variables that is most highly correlated with a linear combination of the other set. The resulting compact linear representations are called canonical variates. Each canonical variate is a weighted sum of the original variables in its set, and the weights that define it are collected in a canonical vector.
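To make these mechanics concrete, here is a minimal NumPy sketch (my own illustration, not from the original post) of how canonical correlations fall out of the covariance blocks of the two variable sets: whiten each set, then take the singular values of the cross-covariance. The helper name canonical_correlations and the small regularization term reg are illustrative choices.

```python
import numpy as np

def _inv_sqrt(S):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def canonical_correlations(X1, X2, reg=1e-8):
    """Canonical correlations between two sets of variables.

    X1: (n_samples, p) observations of the first variable set.
    X2: (n_samples, q) observations of the second variable set.
    Returns min(p, q) correlations sorted in decreasing order.
    """
    # Center each variable, since CCA works on (co)variances
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    n = X1.shape[0]

    # Within-set and between-set covariance blocks
    S11 = X1.T @ X1 / (n - 1) + reg * np.eye(X1.shape[1])
    S22 = X2.T @ X2 / (n - 1) + reg * np.eye(X2.shape[1])
    S12 = X1.T @ X2 / (n - 1)

    # Whiten each set, then take the SVD of the cross-covariance:
    # the singular values are exactly the canonical correlations.
    M = _inv_sqrt(S11) @ S12 @ _inv_sqrt(S22)
    return np.linalg.svd(M, compute_uv=False)
```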
Let us take an example with a set of demographic factors X1 (= age, sex, diet) and a set of health measurements X2 (= heart rate, hemoglobin, blood pressure). CCA can be used to estimate the possible association between X1 and X2 by quantifying the correlation between the two sets of multidimensional variables.
Consider a sample of N = 50 survey participants in a study attempting to determine which factors influence the health measurements in X2. Two collections of variables were measured: the first set X1 contained the age, sex, and diet of each participant, and the second set X2 comprised the heart rate, hemoglobin, and blood pressure measured for each participant. CCA then re-expresses the two datasets as multiple pairs of canonical variates that are highly correlated with each other across participants, as shown in the figure above.
In each domain of data, the resulting canonical variate is the weighted sum of that domain's variables, with the weights given by the canonical vector. In the scatter plot shown above, each participant is then described by two canonical variates that are maximally correlated. The linear correspondence between the canonical variates of X1 and X2 is the canonical correlation, the primary performance metric used in CCA modeling.
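A hedged, synthetic version of this survey example can be run with scikit-learn's CCA implementation. The data below are random stand-ins for the real measurements (the shared latent signal is only there so the two sets are actually related), and the printed correlations are the per-pair canonical correlations that would appear on the scatter-plot axes.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
N = 50  # number of survey participants

# Synthetic stand-ins for the two measurement sets (illustrative only)
X1 = rng.normal(size=(N, 3))              # age, sex, diet
shared = X1 @ rng.normal(size=(3, 3))     # shared signal linking the two sets
X2 = shared + rng.normal(size=(N, 3))     # heart rate, hemoglobin, blood pressure

# Fit CCA and project each participant onto the canonical variates
cca = CCA(n_components=3)
U, V = cca.fit_transform(X1, X2)

# Canonical correlation for each pair of canonical variates
for i in range(3):
    r = np.corrcoef(U[:, i], V[:, i])[0, 1]
    print(f"canonical correlation {i + 1}: {r:.2f}")
```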
Canonical Correlation Analysis on Neural Network Representations
Coming back to the use of CCA to gauge representation similarities between neural networks: the underlying process is a neural network trained on some task. The multidimensional variates in this case are neuron activation vectors over some dataset X. As explained in Part 1, a neuron activation vector denotes the outputs of a single neuron (z) on X.
In short, one multidimensional variate = a single neuron activation vector
Then, a set of multidimensional variates = a layer consisting of neurons
We can then consider two layers, L1 and L2, of a neural network as two sets of observations, and apply CCA to determine the similarity between the two layers. Most importantly, this also enables comparisons between different neural networks, which is not naively possible due to the absence of any neuron-to-neuron alignment.
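Putting the pieces together, here is a minimal sketch of that comparison, assuming the activations of each layer have already been collected into a matrix of shape (datapoints, neurons), so that each column is one neuron activation vector. It reuses the canonical_correlations helper sketched earlier; the toy layer sizes and the mean-correlation summary are illustrative choices, not prescribed by the post.

```python
import numpy as np

# Toy activation matrices: rows are datapoints from X, columns are neurons,
# so each column is one neuron activation vector (one multidimensional variate).
rng = np.random.default_rng(1)
n_datapoints = 1000
acts_L1 = rng.normal(size=(n_datapoints, 64))                 # layer L1, 64 neurons
acts_L2 = np.tanh(acts_L1 @ rng.normal(size=(64, 32)))        # layer L2, 32 neurons
acts_L2 += 0.1 * rng.normal(size=acts_L2.shape)               # small noise for realism

# Canonical correlations between the two layers,
# using the canonical_correlations helper defined above.
rhos = canonical_correlations(acts_L1, acts_L2)

# A common scalar summary of layer similarity: the mean canonical correlation.
print(f"mean CCA similarity between L1 and L2: {rhos.mean():.3f}")
```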