Computer Vision EEG Pipeline Tutorial: Part 1

Adolfo Ramirez-Aristizabal
Published in Labs Notebook
10 min read · May 11, 2023
1 second of brain data decomposed into Mel-Frequency Cepstrum Coefficients as power in decibels

Processing brain signals has traditionally involved researchers carefully cleaning data and extracting features by hand, but recent techniques have shown that AI models can take on that responsibility within their neural architecture. In this tutorial series I demonstrate how, and how well, Computer Vision techniques can retrieve information from brain signals. In the previous tutorial, I demonstrated how to retrieve, unpack, and visualize an EEG dataset. Here we begin with simple feature extraction and 1-dimensional modeling, so that the contrast with Computer Vision approaches later on is clear.

The Big Picture

The research and development of brain sensing is moving past proof-of-concepts and closer to consumer-grade implementations. Researchers from Accenture Labs have begun to bring competitive innovation to the space. Such industry perspectives outline Neuroscience-as-a-Service, which references developments in Digital and Precision Health services. Accenture's partners such as Immersion Neuroscience and BehaVR invest in a future of mental health and personalized training services powered by non-invasive, precision bio-sensing. On the ground, novel technical methodology for training AI systems has focused on balancing naturalistic data collection with the efficiency of end-to-end processing. Computer vision techniques can improve both performance and efficiency, and we will explore how throughout this tutorial series.

Downloading the Data

To get us started, access the notebook link above and follow the code along with this blog. Set your Google Colab session to run with a GPU for ease of use; every free account comes with limited GPU access.

The first step is to run the first three sections, which install dependencies, then download and load the EEG dataset into memory. In the previous tutorial we saw how to gather, unpack, and organize the dataset ourselves, but here we start with direct access to it.

This dataset is a portion of the Naturalistic Music Electroencephalogram Dataset — Tempo (NMED-T) published by Losorelli et al. (2017). It contains people's brain signals recorded while they listen to music.

Visually Inspect

Next, let's run the code cell that helps us visually inspect our dataset. You will want to do this with every dataset, not only to check that you downloaded the right data, but also to build a visual understanding of what it looks like. Here we see 1 second of brain data recorded while the participant listened to 'First Fires — Bonobo'. In the code cell you can explore further by changing the index number from '10' to inspect other samples; this is recommended for understanding the variety of brain signals.
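A minimal sketch of that inspection step is below. The array shape and variable name `eeg_data` are assumptions for illustration (examples × channels × timesteps at 125 Hz); in the notebook you would index into the loaded NMED-T data instead of simulated noise.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this also runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
eeg_data = rng.standard_normal((100, 125, 125))  # simulated stand-in data

idx = 10      # change this index to inspect other samples
channel = 5   # recording channel to plot
sample = eeg_data[idx, channel, :]

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(np.arange(sample.size) / 125.0, sample)
ax.set_xlabel("Time (s)")
ax.set_ylabel("Amplitude (a.u.)")
ax.set_title(f"1 s of EEG, example {idx}, channel {channel}")
fig.tight_layout()
```

Swapping `idx` for other values is all it takes to browse through the dataset.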

Generate Target Labels

In this next section, the code cells generate target labels for the 'Songs' and 'Participant' classes. These labels are what we use to map our training data during supervised learning: we can train models to classify the data by the song a participant was listening to, or by which participant's brain produced it. Here I show the code used to generate the labels, but in the next tutorial they will be accessible through a simple download.
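The label-generation idea can be sketched as below. The counts are hypothetical (20 participants × 5 songs × 60 one-second segments, ordered participant-major); the notebook's actual dimensions may differ.

```python
import numpy as np

n_participants, n_songs, n_segments = 20, 5, 60  # assumed layout, for illustration

# Song label repeats for every segment of that song, and the whole
# song pattern repeats once per participant.
song_labels = np.tile(np.repeat(np.arange(n_songs), n_segments), n_participants)

# Participant label repeats for every song/segment belonging to that person.
participant_labels = np.repeat(np.arange(n_participants), n_songs * n_segments)
```

Either array can then be paired with the EEG examples as the supervised-learning target.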

Feature Extraction

Now we are ready to explore some feature extraction. This is both a process of transforming and understanding the data by looking at how the information is represented through abstraction.

1 second of EEG at recording channel 5 while participant listens to music

Let's start by taking the Hilbert envelope of our selected recording channel. This is a common signal processing step when dealing with time-series such as seismic quakes, music, and brain data. Simply put, the Hilbert envelope traces the signal's amplitude over time. It is often used to make the signal easier to process in machine learning and for simple denoising.

Hilbert envelope 1 second of EEG at recording channel 5 while participant listens to music

Here you can see that the signal looks very similar. The main changes are that the new signal is now centered and that its amplitude values lie only on the positive axis. The code cells following this visualization in the notebook then transform all of your EEG data into a Hilbert-envelope copy, which will be used to test machine learning models.
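The transform itself is one SciPy call: the magnitude of the analytic signal is the amplitude envelope. The input below is a simulated 1 s, 125 Hz signal standing in for one EEG channel.

```python
import numpy as np
from scipy.signal import hilbert

sr = 125                      # sampling rate in Hz
t = np.arange(sr) / sr        # 1 second of time stamps

# Amplitude-modulated test tone standing in for an EEG channel.
signal = np.sin(2 * np.pi * 10 * t) * (1 + 0.5 * np.sin(2 * np.pi * 2 * t))

# The analytic signal's magnitude is the Hilbert envelope.
envelope = np.abs(hilbert(signal))
```

Applied over the whole dataset, this yields the envelope copy the notebook uses.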

Simpler machine learning models cannot handle complex data and need a signal to be reduced to a single but informative number. In the next code cell we obtain one by taking the Root Mean Square (RMS) of a recording channel. This metric approximates the power dissipation of a time-varying function; in other words, it is an average power metric for the electrical current of interest, e.g., brain signals.
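RMS is just the square root of the mean squared amplitude. For a pure sine wave this comes out to amplitude divided by the square root of two, which makes a handy sanity check:

```python
import numpy as np

def rms(x):
    """Root Mean Square of a 1-D signal: sqrt of the mean squared amplitude."""
    return np.sqrt(np.mean(np.square(x)))

t = np.arange(125) / 125.0
channel = np.sin(2 * np.pi * 10 * t)  # stand-in for one EEG channel

value = rms(channel)  # ~0.7071 for a unit-amplitude sine (1/sqrt(2))
```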

Distribution of RMS values across all recording channels for 1 second examples

Next we plot the distribution of RMS values from a 1 second time slice to see how power dissipation varies across recording channels. There are a total of 125 recording channels on a participant's scalp, and here we see that their RMS values cluster at the low end with a long tail of channels producing high values. If the distribution were closer to uniform, that could indicate that the channel dimension is not very informative and that it might be reasonable to average the data into a single channel. Because we see interesting variability in RMS across channels, we will use the code cells in the notebook to then create an RMS version of our data.
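A sketch of that per-channel distribution check, with `example` simulating a (channels × timesteps) slice where the real data would be used:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
example = rng.standard_normal((125, 125))  # 125 channels x 1 s at 125 Hz

# One RMS value per recording channel.
channel_rms = np.sqrt(np.mean(np.square(example), axis=1))

fig, ax = plt.subplots()
ax.hist(channel_rms, bins=20)
ax.set_xlabel("RMS")
ax.set_ylabel("Channel count")
ax.set_title("RMS across recording channels (1 s example)")
```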

1 second EEG example transformed into cepstrum coefficients at a mel-frequency scale.

Lastly, let's explore feature extraction commonly applied to complex auditory signals such as music. Here we compute the Mel-Frequency Cepstrum Coefficients (MFCC), which output a 1-dimensional array. This function approximates the short-term power spectrum of a sound on a mel scale. You can think of the cepstrum as a second-order frequency measure, a spectrum taken of the log spectrum. The mel scale transforms the signal toward how humans perceive sound, where certain frequencies are represented more strongly than others, reflecting what we consider an audible and perceptible signal.

The MFCC features look to have enough complexity to serve as features of the training data. This type of feature extraction faces one main limitation: it depends on the sampling rate of the signal, i.e., the available recording resolution. Here the sampling rate is 125 Hz, which is on the lower end, so we must note that limited recording resolution may limit the efficacy of the features that can be extracted.

These coefficients can be visualized as a type of bar code when looking at the power in decibels. This is ultimately a good analogy for what happens with machine learning and feature extraction. We take something that is of higher dimensionality and reduce it to simpler but informative representations that specific algorithms can then interpret.
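To make the pipeline concrete, here is a from-scratch, single-frame sketch of the idea in NumPy/SciPy: power spectrum, triangular mel filterbank, log power in dB, then a DCT. The FFT length, filterbank size, and coefficient count are illustrative assumptions; in practice a library such as librosa would handle this.

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_1d(signal, sr=125, n_fft=64, n_mels=20, n_coeffs=13):
    # Power spectrum of one short frame (first n_fft samples, for simplicity).
    spectrum = np.abs(np.fft.rfft(signal, n=n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)

    # Triangular mel filterbank between 0 Hz and the Nyquist frequency.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    fbank = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, center, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (center - lo)
        falling = (hi - freqs) / (hi - center)
        fbank[i] = np.clip(np.minimum(rising, falling), 0.0, None)

    # Log mel power (in dB), then a DCT decorrelates it into cepstral coefficients.
    log_mel = 10.0 * np.log10(fbank @ spectrum + 1e-10)
    return dct(log_mel, type=2, norm="ortho")[:n_coeffs]

# 1 s of a 10 Hz test tone at 125 Hz sampling, standing in for an EEG channel.
coeffs = mfcc_1d(np.sin(2 * np.pi * 10 * np.arange(125) / 125))
```

The resulting 13-element array is the "bar code" visualized above.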

Load MFCC Data

The code in this next section lets you download the brain data already transformed into MFCCs. There is also optional code if you want to process the data yourself, but this can be computationally intensive and take anywhere from 10–20 minutes.

1-Dimensional Machine Learning

First, let's try some simple machine learning modeling. The goal is to get an initial idea of what simpler, traditional methods can do for us. Traditional machine learning usually depends on researchers performing feature extraction beforehand. This makes the data easier for models to handle, as we shrink its dimensionality and identify the salient features we care about. We already explored several analyses that do this by denoising the signal, reducing dimensionality, and measuring frequency components.

Let's start by exploring univariate model training. For this we will use the Root Mean Square values as our predictors of the song-name target classes, with a simple Logistic Regression model for classification.

Running the univariate predictor, which uses an RMS score per recording channel to predict what song someone is listening to via a Logistic Regression model, gives us poor performance. Random chance for this classification is 20% and model validation reaches 21.34%, barely better than chance and not a desirable result.
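The workflow looks roughly like the sketch below, using scikit-learn. The data here is synthetic (random features, 5 classes), and the one-RMS-value-per-channel feature layout is an assumption about the notebook's exact setup, so the accuracy only illustrates the pipeline, not the reported 21.34%.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_examples, n_channels = 600, 125
X = rng.standard_normal((n_examples, n_channels))  # stand-in RMS features
y = rng.integers(0, 5, size=n_examples)            # 5 song classes

X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
val_acc = clf.score(X_val, y_val)  # hovers near the 0.2 chance level here
```

Swapping the targets for participant labels reuses exactly the same code.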

Next let's try the same training procedure, but instead of predicting the song names, let's predict which person's brain the data comes from.

Again we see poor performance: random chance is 5% but our validation performance is 6.4%. The good news is that the model is at least trying to learn; it simply can't do it well. This could be for several reasons, but the most obvious hypothesis is that boiling a complex signal like brain data down to one number erases too much information.

Next let's try classifying our data using vector arrays. Here we will use the Mel-Frequency Cepstral Coefficients as predictors with the same Logistic Regression models. Let's see whether having more than one number, while still not modeling the data as a 2-dimensional representation, gives better results.

It looks like even our MFCC arrays are not giving better results. Validation performance is still around random chance.

1-Dimensional Deep Learning

We saw that it was difficult to train simple, standard machine learning models on features from brain data. Next, let's try leveraging the power of simple deep learning. The strength of deep learning is that it lets the model learn more on its own, which usually means less manual feature extraction is required. Therefore, let's test a simple deep learning approach on the Hilbert envelope of the brain data, as it retains more of the original signal's dimensionality.

The first two code cells in this section set our previous dataset copies to empty lists to free up memory. They then turn our targets into categorical representations for our neural network's multi-class classification.
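That categorical step is a one-hot encoding. Here is a minimal NumPy stand-in for what a utility like Keras' `to_categorical` does:

```python
import numpy as np

def to_one_hot(labels, n_classes):
    """Integer class labels -> one-hot rows (stand-in for Keras' to_categorical)."""
    return np.eye(n_classes, dtype="float32")[labels]

y = np.array([0, 2, 4, 1])          # four example song labels
y_cat = to_one_hot(y, n_classes=5)  # shape (4, 5), a single 1.0 per row
```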

Next, we use TensorFlow with the Keras API to define our neural network. The architecture provided here is a simple 1-dimensional Convolutional Neural Network (CNN). The model has 3 CNN layers, which do the job of feature extraction, followed by a Global Average Pooling (GAP) layer that condenses the dimensionality of the data. Lastly, this is mapped onto the output layer, which is simply the size of however many classes we want to classify. In our case, there are a total of 5 songs that a brain signal can correspond to.
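A sketch of that architecture is below. The overall structure (three Conv1D layers, dropout, GAP, softmax over 5 classes) follows the description; the filter counts, kernel sizes, input shape, and optimizer settings are assumptions, not the notebook's exact values.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_1d_cnn(n_timesteps=125, n_channels=125, n_classes=5):
    model = models.Sequential([
        layers.Input(shape=(n_timesteps, n_channels)),
        # Three Conv1D layers act as learned feature extractors.
        layers.Conv1D(64, kernel_size=3, activation="relu"),
        layers.Conv1D(64, kernel_size=3, activation="relu"),
        layers.Conv1D(64, kernel_size=3, activation="relu"),
        layers.Dropout(0.5),               # regularization via dropout
        layers.GlobalAveragePooling1D(),   # condenses the time dimension
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_1d_cnn()
probs = model(tf.zeros((2, 125, 125)))  # two dummy examples -> (2, 5) class scores
```

Training then proceeds with `model.fit` on the Hilbert-envelope data and one-hot labels.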

In the next code cell the training parameters are set, and running it starts the deep learning process. Make sure you have set your Colab session to a GPU, otherwise this can take much longer.

The graph below shows accuracy increasing across epochs of model training. We trained for only 25 epochs as a demonstration; typical deep learning requires training for hundreds to thousands of epochs. But even at 25 epochs we see that training is starting to plateau, so we can expect performance to keep improving over hundreds of epochs, though not by much. In our case, the deep learning model did better than the Logistic Regressions, but not by a large margin.

The graph also demonstrates a classic training-validation divergence. This is to be expected, as models will always try to find the easiest solution to the training data, which usually involves memorizing its patterns; when tested on unseen data, performance is therefore lower. Optimizing deep learning is a topic outside the scope of this tutorial, but here a simple form of regularization was applied by adding a dropout layer. Dropout randomly turns off a set percentage of network connections during training, so that different neurons learn the data's patterns at different epochs, giving the model a more diverse perspective in its memory.

Conclusion

In this tutorial we learned how to explore feature extraction and compared simple machine learning with simple deep learning. Representing our data in a single dimension, either as a univariate metric or vector array, has not yielded strong performance. Our efforts so far have been for instructive purposes of Data Science skills and exploration of the topic. In the next tutorial we will finally get to more advanced deep learning methods and see how representing a time-series like brain signals as an image can improve accuracy.


Associate Principal Researcher at Accenture Labs — Digital Experiences