Warning! Unsupervised Neuroscience Ahead
The unruly offspring of machine learning and neuroscience
If there’s one thing we humans are good at, it’s finding patterns. Maths is awash with them — Fibonacci sequences, triangle numbers, the golden spiral. You’re doing it right now, these black marks on a screen making patterns you recognise as letters, words, sentences.
If there’s one thing we humans are bad at, it’s finding patterns where none exist. Balls of gas light years distant are not tracing the outline of a hunter, a plough, or a crab for the sole benefit of a bipedal primate on a blue dot near a middling orange star. Nor is the number of people drowning in a pool in any given year caused by how many films Nic Cage starred in that year, no matter what statistics tells you.
Neuroscience is entering an era where our amazing ability to find patterns will be tested to its limits. The era of unsupervised neuroscience, where we will throw ever more powerful tools at our data to find the patterns within. There are exciting times ahead. But that we love to find patterns in anything means we have to redouble our guard, lest we fool ourselves.
But if it holds such clear and present danger, why are we going unsupervised?
Brains are absurdly complex. All brains. The 302-neuron brain of the tiny nematode worm C. elegans. The smallest vertebrate brains we study, the roughly 100,000 neurons in the baby zebrafish. The mouse brain. The rat’s. Ours.
From those absurdly complex brains, we can now record absurdly complex data. Hundreds or thousands or more neurons at the same time; and the number we can record keeps growing exponentially. We can trace the wiring between regions of the entire brain; the wiring from many single neurons in one region to everywhere they send their axons; the entire wiring diagram within a bit of brain no bigger than a speck of dust. We can take samples of the genes expressed throughout the brain, or drill down to individual regions or even layers. And we can track the behaviour that those brains produce, at ever finer resolution, in ever more complex, more natural tasks.
It would be foolhardy indeed to assume we know what we’re looking for in these data. Because the chances that we have guessed correctly how a part of the brain codes or computes are practically zero. The chances that we know the wiring motifs made by every neuron in a region of brain are practically zero. The chances that we have a correct understanding of how behaviours are organised, what triggers them, and what sequences them are practically zero.
Perhaps one thing we can agree on: we don’t know what we need to know. So how do we find what we’re looking for when we don’t know what it looks like?
We go unsupervised: we take tools from machine learning and elsewhere that are designed to find patterns in data, and which can do so without any kind of feedback. For, after all, we don’t know what those patterns should look like. So we reach into the computational toolbox and pull out clustering and dimension reduction and community detection and other ways of finding structure in data. Tools that can tell us when the most interesting stuff happens while an animal is behaving, which neurons are firing together, what genes are expressed together. And those patterns are then, potentially, “it” — the thing we need to know, the basis for our ideas, hypotheses, and theories.
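To make "finding structure without feedback" concrete, here is a toy sketch of one tool from that box, dimension reduction. Everything below is simulated and illustrative, not any particular lab's pipeline: if the activity of 50 neurons secretly varies along only two latent dimensions, a PCA-style eigendecomposition recovers that low-dimensional structure without being told it is there.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 50 neurons over 1000 time bins, whose activity
# secretly varies along just 2 latent dimensions plus a little noise.
latents = rng.normal(size=(1000, 2))
mixing = rng.normal(size=(2, 50))
activity = latents @ mixing + 0.1 * rng.normal(size=(1000, 50))

# Unsupervised dimension reduction: eigendecompose the population
# covariance and ask how much variance the top two components
# capture. No labels, no feedback, no hypothesis about what the
# neurons code for.
centred = activity - activity.mean(axis=0)
cov = centred.T @ centred / (len(centred) - 1)
eigvals = np.linalg.eigvalsh(cov)  # sorted ascending
top_two = eigvals[-2:].sum() / eigvals.sum()
```

Here `top_two` comes out near 1: two components explain almost all the variance, flagging the hidden two-dimensional structure. The catch, of course, is that the algorithm will also hand you "top components" when the data are pure noise.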
An unsupervised neuroscience movement is afoot. An ever-increasing roster of labs are buying into this idea, are going unsupervised to find the elusive “it”.
Many studies now use unsupervised approaches to group neurons by their activity, to find which neurons are active together, and so seem to be computing or coding the same thing. One way of grouping them is to cluster together the moments in time when a population of neurons reproduces a similar pattern of activity, trying to find when a population returns to similar states.
In one spin on this approach, Alon Rubin, Yaniv Ziv and their colleagues came up with the idea of looking at each neuron’s tuning for those states, by measuring how active the neuron was in each state of the population’s activity. You can think of this as a purely internal tuning curve: a tuning curve for the states of the population which that neuron cares about. When Rubin, Ziv and friends worked out these internal tuning curves for neurons in the hippocampus of mice, they found something remarkable: the internal tuning curves were the same as the neuron’s tuning for a location in the outside world, for that neuron’s “place field”. Simply by clustering together similar states of population activity, they could recover one of the key codes of the hippocampus.
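The internal tuning curve idea can be sketched in a few lines. This is a toy with simulated data, not Rubin and Ziv's actual pipeline: assume each moment of population activity has already been assigned to a state (say, by clustering similar activity patterns), and then a neuron's internal tuning curve is simply its average activity in each state.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: 200 time bins, each already assigned to one of
# 5 population states by some earlier clustering step.
states = rng.integers(0, 5, size=200)

# One simulated neuron's spike counts, secretly tuned to state 2.
activity = rng.poisson(1.0, size=200).astype(float)
activity[states == 2] += rng.poisson(8.0, size=(states == 2).sum())

# The "internal tuning curve": the neuron's mean activity in each
# population state. No reference to the outside world is needed.
tuning = np.array([activity[states == s].mean() for s in range(5)])
preferred = int(np.argmax(tuning))
```

The curve peaks at the state the neuron cares about; the remarkable finding was that, for hippocampal neurons, these purely internal curves matched the neurons' tuning to places in the external world.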
Other studies use unsupervised algorithms to group neurons by the similarity of their responses to events in the world. In doing so, they are seeking what those neurons are coding. Hirokawa and colleagues in Adam Kepecs’ lab used just this approach to look at how the variables in a decision-making task were encoded in the orbitofrontal cortex of rats. They characterised each neuron by its activity in response to 42 different properties of the task they set, including different mixtures of two odours that mice sniffed to make a decision, the three different conditions in which odours were delivered, and whether a small or large reward was given for choosing the correct response after sniffing the odours. By clustering 485 neurons by the similarity of their activity across these 42 different properties, they found nine discrete groups of neurons — nine groups that in principle encode different things.
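The logic of that grouping step can be sketched with simulated data. The sketch below is illustrative, not the paper's analysis: it keeps the 42 task properties but invents 20 toy neurons drawn from two hypothetical coding groups, and shows that the similarity of response profiles is what separates them.

```python
import numpy as np

rng = np.random.default_rng(2)

# Each neuron is summarised by its mean response to 42 task
# properties; neurons are then grouped by the similarity of
# those response profiles.
n_props = 42
profile_a = rng.normal(size=n_props)  # e.g. a "confidence" profile
profile_b = rng.normal(size=n_props)  # e.g. a "reward size" profile

# 20 toy neurons: ten noisy copies of each underlying profile.
neurons = np.vstack(
    [profile_a + 0.3 * rng.normal(size=n_props) for _ in range(10)]
    + [profile_b + 0.3 * rng.normal(size=n_props) for _ in range(10)]
)

# Similarity of response profiles: correlations are high within a
# coding group, and much lower between groups. Any clustering
# algorithm fed this matrix will find the two groups.
corr = np.corrcoef(neurons)
within = corr[:10, :10][np.triu_indices(10, k=1)].mean()
between = corr[:10, 10:].mean()
```

Scale that up to 485 real neurons and 42 real task properties, and the same logic yields the nine groups.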
And they did. One group’s activity correlated with the confidence in a decision; another group’s with the previous trial’s outcome; another with the size of the reward. The key insight of Hirokawa and colleagues was simple: by using unsupervised clustering they could show evidence for genuinely discrete handling of a decision in orbitofrontal cortex, a division of labour where each element needed to make a decision is seemingly assigned to a dedicated group of neurons.
The unsupervised mentality is encroaching deeply on how we analyse behaviour too. Sandeep Robert Datta’s lab have come up with smart unsupervised ways to divide the spontaneous behaviour of mice into its elementary components, finding a library of behavioural motifs that mice seem to draw upon when behaving freely. They and others have then gone looking for the neural correlates of these elementary components, finding correlations between these components and activity in the striatum. And it’s not just what mice do in their spare time. Recently, Valerio Mante’s lab used a neat combination of unsupervised approaches — nearest neighbours and t-SNE — to measure and track the development of songs in zebra finches, revealing that what seems a complex garbled mess of song syllables is actually deeply structured, made up of renditions and regressions as the birds practice and target mistakes. (A paper published in Nature, incidentally, which was entirely about behaviour, and nothing more). And as I type, Nature Neuroscience have just published a review entirely about the unsupervised analysis of behaviour.
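The nearest-neighbour half of that song-tracking approach is simple enough to sketch. This is a toy with made-up feature vectors, not the Mante lab's pipeline: summarise each syllable by a few acoustic features, and assign each new rendition to the most similar syllable already in the library.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical stand-in: each known syllable summarised by a short
# acoustic feature vector (pitch, duration, amplitude, and so on).
syllable_a = np.array([1.0, 0.0, 0.0, 1.0])
syllable_b = np.array([0.0, 1.0, 1.0, 0.0])
library = np.vstack([syllable_a, syllable_b])

# A new rendition: a noisy attempt at syllable "a" by a practising bird.
rendition = syllable_a + 0.2 * rng.normal(size=4)

# Nearest-neighbour assignment: which known syllable is it closest to?
dists = np.linalg.norm(library - rendition, axis=1)
nearest = int(np.argmin(dists))  # index 0 is syllable a, 1 is b
```

Track those assignments over days of singing and the "garbled mess" resolves into structured renditions and regressions.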
These are the tip of a swelling movement. What was a small coterie of people concocting unsupervised algorithms to group together neurons that had similar activity over time — to detect neural ensembles or cell assemblies — has swollen into an industry. Now it seems that every time I crack open a copy of Nature, or other such journal that holds itself in absurdly high regard, the systems neuroscience papers therein are doubling down on unsupervised analyses of their data.
The warning: Care is needed. We are experts at finding patterns in noise, and so are our algorithms.
Take clustering. The problem with clustering is that it returns clusters. I mean, I know that’s what it’s supposed to do, but that’s actually the problem. Give a clustering algorithm the phone numbers of everyone in Llandudno, ask it to find four clusters in the data, and it will. Now you have four clusters of Welsh people, and are none the wiser as to what to do with them, and neither are they. The mere existence of clusters does not mean there is actual cluster structure in the data.
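To see the problem concretely, here is a bare-bones k-means, standing in for Llandudno's phone book, asked for four clusters in pure noise. The data and the algorithm are both sketches, but the moral is exact.

```python
import numpy as np

rng = np.random.default_rng(4)

# 500 "phone numbers": pure, structureless noise.
data = rng.uniform(0, 1, size=(500, 1))

# A bare-bones k-means, asked to find four clusters.
centres = data[rng.choice(500, size=4, replace=False)]
for _ in range(20):
    # Assign each point to its nearest centre...
    labels = np.argmin(np.abs(data - centres.T), axis=1)
    # ...then move each centre to the mean of its points
    # (keeping the old centre if a cluster ever empties out).
    centres = np.array(
        [[data[labels == k, 0].mean()] if np.any(labels == k)
         else [centres[k, 0]] for k in range(4)]
    )

# It dutifully returns four clusters, whether or not any genuine
# cluster structure exists in the data.
n_found = len(np.unique(labels))
```

Four clusters come out because four were asked for; nothing in the output flags that the input was noise. That sanity check is on us, not the algorithm.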
Unsupervised algorithms are about making sense of data for us, the observer. What they find does not have to coincide with reality. Reality doesn’t have a ground truth, because it does not have clean, neat joints to carve at except in special cases. And most of those special cases are from artificial systems, with the joints built in. Run a different clustering algorithm on your data, and you’ll get different clusters; run a different dimension reduction algorithm on your data, and you’ll get different dimensions. As Ulrike von Luxburg and friends have so clearly argued, clustering is an art.
The unsupervised organisation of data is just a description of that data. Just because we can cluster neurons into groups doesn’t mean there exist actual, meaningful groups of neurons in the brain; just because we can cluster behaviour into discrete elements — into states, motifs, syllables, or whatever term you prefer — doesn’t mean behaviour is actually discrete. To show that the discovered organisation means something, we have to link it to reality. In neuroscience, that typically means linking that organisation of data to something happening in the world, or elsewhere in the brain, or both.
And that acid test is passed by the best attempts at unsupervised neuroscience. In the paper from Rubin, Ziv and friends, they showed the neurons’ “tuning” to the hippocampus’ internal dynamics had meaning as location in space (and repeated the same trick for neurons that code for head direction in the rodent thalamus). In the study from Adam Kepecs’ lab on the orbitofrontal cortex, their discrete groups of neurons in turn each encoded a meaningful variable in the decision-making process. Even better, they re-did the whole analysis with another cohort of animals with more neurons, reusing all parameters from the first cohort, and ended up with the same results. These studies could show us a mapping between the unsupervised structure of the data and the real world.
Terrific work, but those are the “easy” ways of doing unsupervised neuroscience — by relating what we found to what we already know. We already know that hippocampus has place cells, and that there is a head direction system in the rodent thalamus. We already know the neurons in orbitofrontal cortex are heavily implicated in decision making, and to work out what their groups of neurons were encoding Kepecs’ lab interpreted their activity as the variables within a mathematical model of decision making. If that model is wrong, the mapping between variables and activity is of little consequence in building our confidence that the clustered neurons are really there. And others, of course, may find different answers: when Anne Churchland’s group went looking for discrete groups of coding neurons in posterior parietal cortex, for example, they found none.
The ultimate test for unsupervised neuroscience is discoveries that could not be found any other way. There are some examples of that too. For example, we took large-scale recordings of neurons in the motor system of the venerable sea-slug Aplysia. Using a fully unsupervised pipeline to analyse those data, we discovered its motor system was doubly discrete: on one level, groups of neurons with correlated activity were laid out contiguously in the motor system, beautifully tessellating the bit of brain they were in. On another level, mass populations of neurons with clearly different dynamics were in different parts of the motor system, including a distinct population of oscillating neurons in one spot that were most likely the pattern-generating network for movement — a discovered hypothesis waiting to be tested. Joshua Vogelstein and friends discovered a richly detailed map of the relationship between neural activity and the resulting behaviour of Drosophila maggots, by individually stimulating each of 1054 types of neurons, videoing the behaviour, and clustering it. They thus revealed 29 different types of behaviour and which neurons drove each. But these are discoveries of relationships, of structure; they are not yet that final step of unsupervised discovery of a theory for how a bit of brain works.
Many will feel uncomfortable with this view of the world. For many, science means: test hypotheses. It doesn’t mean: fish around in the data until something interesting pops up.
But what hypotheses? Science is an exercise in bootstrapping, of scrounging meaning from meagre data, to build an hypothesis to test. How do we find the data to suggest hypotheses in the first place? There’s only so far we can get with theories based on extant knowledge; unsupervised neuroscience promises a way to get those data, and get them in a neatly structured package waiting for interpretation.
Systems neuroscience is hardly the first to take this “look mum, no hands” approach to science. Geneticists have for years routinely used unsupervised algorithms to group things (people, animals, cells) by the genes they express; those things can just as easily be neurons as anything else. Whatever your feelings about unsupervised analysis of big data sets, whether you fervently believe we are in a brave new world of knowledge discovery, or that we are in the midst of a world-beating exercise in noise mining, unsupervised neuroscience is here to stay. You have been warned.
Want more? Follow us at The Spike