Crossover: neuroscience meets data science

Published in

MCD-UNISON

7 min read · Jan 8, 2024

In this post, we’ll explore the crossroads of two important scientific disciplines: neuroscience and data science. Specifically, we’ll look at certain data science methods and tools that have enabled significant advances in neuroscientific research.

Picture a typical day where you find yourself contemplating the dataset of electrophysiological recordings you obtained from an animal subject, maybe one of our close relatives: the rhesus macaque (Macaca mulatta). Your statistical background tells you that in order to understand the activity of the neuron population in a specific brain region, it’s essential to study the activity of a representative sample of it. Based on this, you opted for extracellular recordings. This involves implanting microelectrodes, with tip diameters of 2 to 3 micrometers, in such a way that their final position often ends up in the spaces between several small neurons, allowing their activity to be recorded simultaneously.

**Extracellular recordings.** A) Neuronal activity in this area is considerably higher than the background noise and is therefore easily detected. The colored neurons are very active, while the gray neurons fire so rarely that they usually go undetected. B) Neuronal activity in this area is not large enough to be clearly detected. Image taken from this article.

Therefore, you can’t stop thinking about the fact that the voltage recorded by your electrodes actually contains the diffuse activity of several neurons. Again, your statistical background tells you that you can only get an idea of the population’s activity by analyzing the individual activity of the neurons that constitute that population. This leads to the question central to this post: How can you discern and segregate the activity coming from different neurons?

Deep in your heart, your great motivation comes from the fact that the recordings were made during a behavioral task that took you months to train the animal to perform. Perhaps the task involved choosing between various options, and as a result you suspect that this work will provide insights into the relationship between the brain area activity and the decision-making process. Or maybe, the task involves retaining information over a period, and you hope to get clues about how this neuron population participates in memory processes. Regardless of the specific behavioral task, you know that if you execute your work correctly, someday people might see your name in cognitive neuroscience books.

**Example of a memory and decision-making behavioral task: Bimodal Discrimination Task.** A trial begins with the mechanical stimulator applying sustained pressure to one of the fingers on the monkey’s fixed hand (probe down or pd). After this signal, the animal places its free hand on a touch-sensitive lever (key down or kd) and keeps it there until the end of the trial. After a period of 2 to 4 seconds, the primate receives a first stimulus lasting 0.5 seconds. This stimulus can be tactile (T), through the mechanical stimulator, or acoustic (A), through a speaker. After a 3-second delay period, the monkey receives a second stimulus of modality A or T. After 2 more seconds, the stimulator stops applying fixed pressure to the animal’s finger (probe up or pu). This is the signal for the primate to release the lever (key up or ku) and indicate which stimulus had the higher frequency by pressing one of the two buttons (press button or pb).

So, coming back to the important question, let me inform you it was answered long ago through the development of a technique called Spike Sorting. However, before I rush into explaining this tool, allow me to rephrase the question as: is there some type of marker that allows us to distinguish one neuron from another? Fortunately, the answer to this question is a resounding yes! Just like people in society can be uniquely identified by some specific markers like fingerprints, neurons in the brain also have their own identifying markers.

To continue the explanation, let me invoke your neuroscience background for a moment. As we know, neurons are cells characterized by their ability to produce action potentials, or spikes, in response to certain stimuli. A spike is a voltage oscillation caused by a brief structural modification in the cell membrane, which generates a momentary change in the flow of ions between the inside and outside of the neuron. As we might imagine, the specific course of this voltage oscillation depends first of all on the original membrane structure. But other factors matter too, such as the cell’s shape and size, the ionic composition of the extracellular space, and the distance between the cell and the recording site, among others. Together, these factors make each neuron identifiable by the specific shape of the action potential it produces in response to a stimulus. This is the principle of the aforementioned Spike Sorting technique.

**Identity markers.** A) Fingerprint examples that could allow us to identify the people to whom they belong through forensic analysis. B) Action potential examples that could allow us to identify the neurons to which they belong through a Spike Sorting procedure. Images taken from this web page and this article.

The Spike Sorting technique is composed of several ordered steps, all related to the standard procedures data scientists employ for insightful data analysis. These steps are: data filtering, spike detection and alignment, feature selection and extraction, and clustering. It’s worth mentioning that the specific algorithm(s) used in each step can vary from one version of the technique to another. The remainder of this post describes the Spike Sorting technique, using as a reference the version that I used during my bachelor’s thesis, which you can consult here.

**Theoretical bases of Spike Sorting.** A) A microelectrode receives the signal from a small set of neurons near the recording tip. B) These signals are made up of both action potentials and slow oscillations. i) A high-pass filter is applied to focus only on action potentials. ii) This new dataset contains spikes with different shapes, depending on their neuron of origin. iii) The features of the action potentials that will allow them to be differentiated from each other are extracted. iv) Based on the extracted characteristics, the spikes that putatively come from the same neuron are grouped. Image taken and adapted from this publication.

The first step is a simple high-pass filter. Given that the spike shape serves as the marker and the recordings encompass a spectrum of voltage oscillations, the process involves discarding slow oscillations and retaining only rapid oscillations that may be action potentials. A cutoff frequency of 300 Hz is commonly used. That is, we retain only voltage oscillations above 300 Hz.
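To make this step concrete, here is a minimal sketch of a 300 Hz high-pass filter using SciPy. The Butterworth design, the filter order, the 30 kHz sampling rate, and the synthetic signal are all assumptions made for illustration; they are not necessarily what the thesis used.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def highpass(signal, fs, cutoff_hz=300.0, order=4):
    """Zero-phase Butterworth high-pass filter (assumed filter design)."""
    sos = butter(order, cutoff_hz, btype="highpass", fs=fs, output="sos")
    # Filtering forward and backward avoids shifting the spikes in time.
    return sosfiltfilt(sos, signal)

# Synthetic recording: a slow 10 Hz oscillation plus a fast 1 kHz component.
fs = 30_000  # assumed sampling rate in Hz
t = np.arange(0, 1.0, 1 / fs)
slow = 100.0 * np.sin(2 * np.pi * 10 * t)
fast = 20.0 * np.sin(2 * np.pi * 1000 * t)
raw = slow + fast

filtered = highpass(raw, fs)
```

Away from the edges of the trace, `filtered` should closely follow the fast component, while the slow oscillation is almost entirely removed.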

Following the isolation of putative action potentials, which are typically detected as events whose amplitude crosses a threshold, the next step is to align the spikes within a single temporal window, just like when you line up a group of children to take their measurements. This alignment makes it possible to analyze their features without the disturbance of the temporal distance between them. The result is a bunch of overlapped putative action potentials. Occasionally, plotting these overlapped spikes is enough to reveal distinct shapes, implying the presence of multiple neurons. To enhance comparability, this stage typically includes noise elimination to discard aberrant voltage oscillations during the spikes. Also, to ensure continuous signals throughout the window period, interpolation algorithms are commonly employed to fill gaps caused by recording imperfections.
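The detection and alignment step can be sketched as follows. This is only an illustration under several assumptions: spikes are taken to be positive peaks, the threshold is a multiple of a median-based noise estimate, and the window lengths are arbitrary. The function name `detect_and_align` and its parameters are hypothetical, and the noise removal and interpolation mentioned above are left out.

```python
import numpy as np

def detect_and_align(filtered, fs, thr_factor=5.0, pre_ms=1.0, post_ms=2.0):
    """Detect positive threshold crossings and cut peak-aligned windows."""
    # Robust noise estimate from the median absolute amplitude (assumed choice).
    noise_sd = np.median(np.abs(filtered)) / 0.6745
    thr = thr_factor * noise_sd
    pre = int(pre_ms * fs / 1000)    # samples kept before the peak
    post = int(post_ms * fs / 1000)  # samples kept from the peak onward

    above = filtered > thr
    # Upward crossings: below threshold at i, above at i + 1.
    crossings = np.flatnonzero(~above[:-1] & above[1:]) + 1

    peaks = []
    for c in crossings:
        peak = c + int(np.argmax(filtered[c:c + post]))
        too_close = bool(peaks) and peak - peaks[-1] < post
        out_of_bounds = peak - pre < 0 or peak + post > len(filtered)
        if not (too_close or out_of_bounds):
            peaks.append(peak)

    peaks = np.array(peaks, dtype=int)
    if len(peaks) == 0:
        return np.empty((0, pre + post)), peaks
    # Each window starts `pre` samples before its detected peak.
    waveforms = np.stack([filtered[p - pre:p + post] for p in peaks])
    return waveforms, peaks

# Synthetic stand-in for a filtered trace: unit noise plus five identical bumps.
fs = 30_000
rng = np.random.default_rng(0)
trace = rng.normal(0.0, 1.0, fs)
template = 12.0 * np.exp(-0.5 * ((np.arange(31) - 15) / 3.0) ** 2)
true_peaks = np.array([3000, 9000, 15000, 21000, 27000]) + 15
for p in true_peaks:
    trace[p - 15:p + 16] += template

waveforms, peaks = detect_and_align(trace, fs)
```

Each row of `waveforms` is one putative spike, and plotting all rows on the same axes gives the overlapped view described above.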

**Example of spike shapes.** A) Putative raw action potentials obtained after the high-pass filter. B) Spike shapes after the interpolation and elimination of the aberrant voltage oscillations.

Continuing with the children’s example, suppose that after collecting all their measurements, we are asked to form the fairest possible teams for a basketball game. The first thing we would need to do is identify the measured variables most relevant to this goal. While features like hair color or favorite music band turn out to be unimportant on the playground, we can’t ignore the age and height of the children if we don’t want to end up with a team of 1.5-meter 10-year-olds against a squad of towering 2-meter 15-year-olds. A similar process is involved in the next Spike Sorting step. We need to ask ourselves: what are the most important features of spike shapes in order to differentiate them? Fortunately, the answer to this question is that we don’t exactly know, but also we don’t exactly care. This is because of an old friend named Principal Component Analysis or PCA. This tool constructs a set of new features known as principal components, each one a specific linear combination of the original features, ordered by how much of the variance in the data they capture. So, although we might not know exactly which features go into each component and with what weight, we can be sure that the first few components are very informative features.
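As a rough sketch of this idea, PCA can be computed with a singular value decomposition in NumPy. The helper name `pca_features`, the choice of three components, and the two synthetic spike shapes are assumptions for illustration only.

```python
import numpy as np

def pca_features(waveforms, n_components=3):
    """Project spikes onto their first principal components."""
    centered = waveforms - waveforms.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = s ** 2 / np.sum(s ** 2)  # fraction of variance per component
    return scores, explained[:n_components]

# Two synthetic spike shapes (a narrow tall one and a wide short one) plus noise.
rng = np.random.default_rng(0)
n = np.arange(90)
shape_a = 10.0 * np.exp(-0.5 * ((n - 30) / 3.0) ** 2)
shape_b = 6.0 * np.exp(-0.5 * ((n - 30) / 6.0) ** 2)
waveforms = np.vstack([
    shape_a + rng.normal(0.0, 0.5, (100, 90)),
    shape_b + rng.normal(0.0, 0.5, (100, 90)),
])

scores, explained = pca_features(waveforms)
```

Here each spike of 90 samples is reduced to 3 numbers, and on this synthetic data the first component alone already separates the two shapes.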

**Example of PCA projected data.** Spike data projected in the two-dimensional spaces formed by all the possible combinations of the first three principal components (named Features). Data seems to cluster into two groups, indicating the presence of two spike shapes, and therefore two neurons.

Once we have the features to take into consideration, it is time to group those spikes sharing similar measures. To achieve this, a clustering algorithm such as k-means can be employed. This is an unsupervised learning algorithm that requires the number of groups to look for as an input parameter. Therefore, we now face the question of how many neurons we suspect there are in the analyzed recording. Although at this point we potentially have visual insights from the overlapped spikes plot and the behavior of the data in principal component space, it is important to be sure if we don’t want to count extraneous voltage oscillations as neuron activity. Luckily, there are metrics to estimate how good different group numbers are. An example is the Silhouette metric: the clustering is run with several candidate cluster numbers, and a Silhouette value is computed for each one. The computation is grounded in the similarity of elements within a group (cohesion) and their dissimilarity with elements of other groups (separation). The resulting value falls within the range of -1 to 1, where a higher value implies a better grouping for that specific cluster number.
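A minimal sketch of this selection procedure with scikit-learn could look like the following. The helper name `choose_k`, the range of candidate cluster numbers, and the synthetic features are assumptions for illustration. Note that the Silhouette value is only defined for two or more clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def choose_k(features, k_values=range(2, 7), seed=0):
    """Run k-means for each candidate k and keep the best Silhouette value."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = model.fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Synthetic features: two well-separated groups in a 3-dimensional PCA space.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(0.0, 1.0, (150, 3)),
    rng.normal(8.0, 1.0, (150, 3)),
])

best_k, scores = choose_k(features)
```

With two clearly separated groups like these, the Silhouette value peaks at 2 clusters, which mirrors the situation shown in the clustering figure.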

**Clustering example.** A) Plot of the Silhouette values for the different number of clusters, with a maximum value for 2 clusters. B) Spike shapes of each group after selecting 2 as the number of clusters. By common practice, spikes that are more than 4 standard deviations from the cluster centroid are considered noise.
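The noise criterion in the caption can be read in more than one way. The sketch below assumes one interpretation: a spike is discarded when any of its features lies more than 4 standard deviations from its cluster centroid, with the standard deviation computed per feature within the cluster. The function name `reject_outliers` is hypothetical.

```python
import numpy as np

def reject_outliers(features, labels, n_sd=4.0):
    """Return a boolean mask that is False for spikes treated as noise."""
    keep = np.ones(len(features), dtype=bool)
    for lab in np.unique(labels):
        members = labels == lab
        centroid = features[members].mean(axis=0)
        sd = features[members].std(axis=0)
        sd = np.where(sd > 0, sd, 1.0)  # avoid division by zero
        # Per-feature distance to the centroid, in units of standard deviation.
        z = np.abs(features[members] - centroid) / sd
        keep[members] = (z <= n_sd).all(axis=1)
    return keep

# One synthetic cluster of 100 spikes plus a single far-away point.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0.0, 1.0, (100, 3)), [[20.0, 20.0, 20.0]]])
labels = np.zeros(101, dtype=int)

keep = reject_outliers(features, labels)
```

Only the planted far-away point is flagged here; the remaining spikes stay in the cluster.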

After this last step, you have segregated your recording into the activity coming from different neurons. Once you apply this technique to all of your recordings, then you’ll have a whole sample of the neuron population activity obtained during the behavioral task. From this point on, to get your name into a cognitive neuroscience book, it will depend on the results of the statistical analysis of this data. But that… will be for the next chapter.

References

Barreras, J. C. (2021). Papel de la corteza premotora ventral en el procesamiento de memoria de trabajo somatosensorial y auditiva. México: UNAM, Facultad de Medicina. Tesis de Licenciatura en Neurociencias. https://ru.dgb.unam.mx/bitstream/20.500.14330/TES01000833713/3/0833713.pdf

Paraskevopoulou, S. E., Barsakcioglu, D. Y., Saberi, M. R., Eftekhar, A., & Constandinou, T. G. (2013). Feature extraction using first and second derivative extrema (FSDE) for real-time and hardware-efficient spike sorting. Journal of Neuroscience Methods, 215(1), 29–37. https://doi.org/10.1016/j.jneumeth.2013.01.012

Pedreira, C., Martinez, J., Ison, M. J., & Quian Quiroga, R. (2012). How many neurons can we see with current spike sorting algorithms? Journal of Neuroscience Methods, 211(1), 58–65. https://doi.org/10.1016/j.jneumeth.2012.07.010

Quiroga, R. Q. (2012). Spike sorting. Current Biology, 22(2), R45–R46. https://doi.org/10.1016/j.cub.2011.11.005

Written by José Carlos Barreras Maldonado