Computer Vision EEG Pipeline Tutorial: Part 0

Adolfo Ramirez-Aristizabal · Published in Labs Notebook · 5 min read · Dec 1, 2022

Extracting information from brain data may sound hard, but in this tutorial series I will demonstrate that it doesn’t have to be. The code will be implemented in Google Colab with data stored on Google Drive so that anyone can easily access it, and the data modeling will be done with TensorFlow-Keras. In this first tutorial we will go through how to unpack EEG data into easy-to-use NumPy arrays. The end goal is to have the data in Train-Validation-Test sets that are ready to feed into a deep learning model.

Data Procurement

In this tutorial we will use the Naturalistic Music Electroencephalogram Dataset — Tempo (NMED-T) published by Losorelli et al. (2017). We will go through the common hassle of procuring the data; for tutorial purposes we will only use part of it. This will be done once as an exercise. For the following tutorials I will provide an easy-to-use link to the data we unpack here.

First, proceed to their website: https://exhibits.stanford.edu/data/catalog/jn859kj8079

Then click to download each of the first 5 files, from “song21_Imputed.mat” to “song25_Imputed.mat”. They provide a total of 10 files, but because this is an exercise we will simply use half the data. These are the cleaned, ready-to-go brain recordings. The files on their site are separated by the songs participants listened to, so each file contains the full-length EEG recordings of all 20 participants for that song.

Data Storage

You can use these files locally on your own computer, but here the focus is on Google Colab, which pairs best with Google Drive for data storage and management. Upload your downloaded data into a folder on your Drive. Next, rename the files for interpretability. For example, turn “song21_Imputed.mat” into “1.mat” and so on until you have “1.mat…5.mat”.
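If you prefer to do the renaming in code rather than by hand, a quick sketch (run it after mounting Drive, which we do in the next section; the folder name here is an assumption):

```python
import os

path = '/content/drive/MyDrive/NMED-T/'    # assumed folder name; adjust to yours
for i, song in enumerate(range(21, 26), start=1):
    src = os.path.join(path, f'song{song}_Imputed.mat')
    dst = os.path.join(path, f'{i}.mat')
    if os.path.exists(src):
        os.rename(src, dst)                # song21_Imputed.mat -> 1.mat, etc.
```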

Loading Data

Now we can start using the Colab notebook that is provided in the link at the beginning of the article. Open the Colab notebook and run the first two cells.

The first cell imports the necessary libraries. The second cell mounts your personal Google Drive onto the notebook, so files on your Drive can then be indexed and read directly.
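If you are following along outside the notebook, those first two cells look roughly like this (a sketch; the notebook’s exact imports may differ):

```python
import os
import numpy as np
from scipy.io import loadmat
import matplotlib.pyplot as plt

# Mount Google Drive so files can be accessed from the notebook.
from google.colab import drive
drive.mount('/content/drive')
```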

Then run the next cell. You will need to define the directory path of your files in the ‘path’ variable. The code then pulls the file names from that folder into a list called ‘files’, and loadmat() uses those names to open each file and store the brain data in the list ‘eeg’. We use loadmat() because most researchers collect and store their data with MATLAB, which produces .mat files. The cell output prints the names of the files we pulled.
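A sketch of that cell, assuming the Drive folder from the Data Storage step (adjust ‘path’ to match your own folder):

```python
# Point 'path' at the Drive folder holding 1.mat ... 5.mat (assumed name).
path = '/content/drive/MyDrive/NMED-T/'

# Pull the .mat file names from that folder into a list.
files = sorted(f for f in os.listdir(path) if f.endswith('.mat'))
print(files)

# Open each file; loadmat() returns a dict of the MATLAB variables inside.
eeg = [loadmat(path + f) for f in files]
```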

The last two cells in this section then simply pull out the actual numerical data and keep only the first four minutes. The brain recordings vary in length depending on which song someone was listening to, so we use only the first four minutes to get equal array lengths.
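One way to express that, assuming each .mat file stores its EEG under a ‘data’ key with time as the last axis (the key name and axis order are assumptions; check eeg[0].keys() and the array shapes in your own files):

```python
fs = 125                         # NMED-T's cleaned EEG is sampled at 125 Hz
four_min = 4 * 60 * fs           # number of samples in four minutes

# Trim every recording to four minutes and stack the songs together,
# giving (songs, participants, channels, samples).
X_4min = np.stack([e['data'][:, :, :four_min] for e in eeg])
print(X_4min.shape)
```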

Organize Data

Now that we have the data in easy-to-use NumPy arrays, the next step is to organize it into Train, Validation, and Test sets. The first cell in this section splits the data into 15-second chunks. This matters because we can then use those chunks to manually shuffle the data across the recording’s time domain. In some deep learning training procedures researchers prefer to train on the first half of the recordings and test on the second half. In other procedures, like the one we will perform here, it is important to balance the distribution of the data across time. The second cell does this by allocating chunks from the start, middle, and end of the recordings across the Training, Validation, and Test sets.
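A sketch of the idea; the interleaved allocation below is illustrative, not the notebook’s exact split:

```python
chunk_len = 15 * fs                        # 15-second chunks
n_chunks = X_4min.shape[-1] // chunk_len   # 16 chunks per 4-minute recording

# Split the time axis into equal 15 s chunks.
chunks = np.split(X_4min, n_chunks, axis=-1)

# Interleave chunks so every set sees the start, middle, and end of songs.
train_chunks = chunks[0::2]                # half of the chunks -> training
val_chunks   = chunks[1::4]                # a quarter -> validation
test_chunks  = chunks[3::4]                # a quarter -> test
```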

The last cell in this section then finally splits the data into 1-second examples. You can see in the cell output that this gives us 24K training examples of brain data and 12K each for validation and test.
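Continuing the sketch above (the helper below is hypothetical; the notebook’s own reshaping is what produces the 24K/12K/12K split shown in its output):

```python
def to_examples(chunk_list, fs=125):
    # Rejoin the chunks along the time axis, then cut into 1-second windows.
    x = np.concatenate(chunk_list, axis=-1)
    s, p, c, t = x.shape
    n = t // fs
    x = x[..., :n * fs].reshape(s, p, c, n, fs)
    x = np.moveaxis(x, 3, 2)               # -> (songs, participants, n, channels, fs)
    return x.reshape(-1, c, fs)            # -> (examples, channels, samples)

X_train = to_examples(train_chunks)
X_val   = to_examples(val_chunks)
X_test  = to_examples(test_chunks)
print(X_train.shape, X_val.shape, X_test.shape)
```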

Plot Data

If you have been able to follow along with the Colab notebook so far, then you now successfully have Train, Validation, and Test sets. In the previous cell output we saw the dimensionality of these arrays using ‘print(X_train.shape)’, but to really see the fruits of our labor we should plot the brain data.

The first two cells refer back to the array X_4min and plot a single EEG channel of a 4-minute recording.
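Something along these lines (the indices are illustrative; pick any song, participant, and channel):

```python
sig = X_4min[0, 0, 0, :]                   # first song, participant, channel
t = np.arange(sig.size) / fs               # time axis in seconds
plt.figure(figsize=(12, 3))
plt.plot(t, sig)
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.title('Single EEG channel, first 4 minutes')
plt.show()
```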

The following plot indexes the ‘X_train’ array and plots a 1-second example across all 125 channels. We also import ‘vapeplot’ to select an aesthetic color palette.
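Roughly like this (the palette name is an assumption; any matplotlib palette works if you skip vapeplot):

```python
!pip install vapeplot
import vapeplot
vapeplot.set_palette('cool')               # assumed palette name

example = X_train[0]                       # shape: (125 channels, 125 samples)
t = np.arange(example.shape[-1]) / fs
plt.figure(figsize=(10, 4))
for channel in example:
    plt.plot(t, channel, linewidth=0.5, alpha=0.6)
plt.xlabel('Time (s)')
plt.title('One 1-second example across all 125 channels')
plt.show()
```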

Finally, the last plot renders 1 second from all 125 channels as an image instead of a time series like in the previous plot. This is how the data will be processed in the deep learning training to come in the following tutorials.
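For instance (the colormap choice here is ours):

```python
plt.figure(figsize=(10, 4))
plt.imshow(X_train[0], aspect='auto', cmap='viridis')
plt.xlabel('Time (samples)')
plt.ylabel('Channel')
plt.colorbar(label='Amplitude')
plt.title('One 1-second example as a channels-by-time image')
plt.show()
```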

Adolfo Ramirez-Aristizabal
Associate Principal Researcher at Accenture Labs — Digital Experiences