Music in Your Head: My First Step towards Neuroscience

What I’ve learned about neural decoding at Brain Code Camp (2023) in 10 weeks

Mongkud Klungpornkun (Paul)
7 min read · Nov 5, 2023
Image by vecstock

TL;DR

  • I spent 10 weeks learning neuroscience to decode music from EEG.
  • The MLP model is unstable, while the Bi-LSTM model can't learn much.
  • I haven't been able to reproduce the original paper's results yet.
  • Talk is cheap. Show me the code.

On the first day of Brain Code Camp, we were told to dig into real data and build a project within 10 weeks. It reminded me of my last year at university, when I took an individual study course and learned machine learning for the first time. With some experience in data science, I accepted the challenge and set out, without hesitation, to learn how to preprocess brain data.

Finding a Dataset

After skimming through a hundred public datasets on OpenNeuro, I found that many MRI datasets are either really huge (100+ GB) or come from tasks that were completely unknown to me. I noticed an 'auditory' tag and picked it as my main topic on the spot; then I recalled the news about researchers who had decoded a song from brain activity. So I wondered: what if we could also decode music from imagination?

EEG by Channels with Event Annotations on the Top Axis

Fortunately, I found an interesting dataset that reported successful decoding of music from joint EEG-fMRI recordings, and for the imagery part there is an open-source dataset named "OpenMIIR". Both can be downloaded, and together they total only around 20 GB.

fMRI with Connected Regions of BOLD signal

I really enjoyed coding visualizations of the fMRI for a week. However, after trying to extract image features from it, I didn't rate my chances of finishing very highly, because the data consists of time series of 3D images. So I proceeded to decode with the EEG data only.

Data Preprocessing

The most fun part of the project is here. As we all know, the longest and most crucial step in data science work is data preprocessing, and it took 8 weeks of my time.

1. EEG Preparation

Electroencephalography (EEG) is a method of recording an electrogram of the spontaneous electrical activity of the brain using small metal discs (electrodes) attached to the scalp.

The process includes filtering out bad channels, down-sampling the data, and repairing artifacts. Most of these steps can be learned from the MNE library's many practical tutorials and best-practice guides, with the exception of repairing artifacts.

Note about down-sampling: the library provides a resampling function, but be aware that it also distorts the event IDs recorded in the stimulus channel, so the events should be extracted first.
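One way to work around this: extract the events from the stimulus channel before resampling, then rescale their sample indices to the new rate. Below is a minimal sketch with MNE; the file name and stimulus-channel name are placeholders rather than the ones from the project.

```python
import numpy as np
import mne

# Placeholder path and stim-channel name; adjust to the actual recording.
raw = mne.io.read_raw_fif("sub-01_eeg.fif", preload=True)

# Find events *before* resampling, because resampling smears the pulses
# in the stimulus channel and can corrupt the recorded event IDs.
events = mne.find_events(raw, stim_channel="STI 014")

# Basic cleaning: band-pass filter the EEG.
raw.filter(l_freq=0.5, h_freq=40.0)

# Down-sample the continuous data, then rescale the event sample indices
# so they stay aligned with the resampled signal.
old_sfreq = raw.info["sfreq"]
new_sfreq = 128.0
raw.resample(new_sfreq)
events[:, 0] = np.round(events[:, 0] * new_sfreq / old_sfreq).astype(int)
```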

When recording EEG data, it is not only task-related activity that gets collected but also every tiny physical movement, such as blinking (captured as electro-oculography, or EOG), tilting the head, and any muscle activity, including the beating heart.

So the recorded brain signal is a mixture of all these sources, and we can separate them with Independent Component Analysis (ICA). Unfortunately, identifying the components requires more domain knowledge in neurology, which we can learn and practice on this site.
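With MNE, the decomposition itself is only a few lines. A sketch, continuing from the raw recording above (the excluded component indices are made-up examples):

```python
from mne.preprocessing import ICA

# Decompose the recording into 32 independent components.
ica = ICA(n_components=32, random_state=42)
ica.fit(raw)

# Inspect the spatial patterns to spot blink, eye-movement, or heartbeat components.
ica.plot_components()

# Exclude the suspicious components (indices here are made up) and
# reconstruct a cleaned copy of the recording.
ica.exclude = [0, 3]
raw_clean = ica.apply(raw.copy())
```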

Example of Eye Blinking Component from ICA

After trying to manually label all 32 components for each participant, I got really lost among the weird ICA patterns, so I tackled the problem in an alternative way: an ICA labeling model, which is easy to use after some required data preparation.
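One such model is ICLabel, available through the mne-icalabel package. The sketch below shows how it would plug into the ICA from the previous snippet (the exact model and settings used in the project may differ); note that ICLabel expects the ICA to be fitted on data band-pass filtered at roughly 1–100 Hz and re-referenced to the common average, which is the extra data preparation mentioned above.

```python
from mne_icalabel import label_components

# Classify each ICA component automatically ('brain', 'eye blink',
# 'muscle artifact', 'heart beat', 'line noise', 'channel noise', 'other').
result = label_components(raw, ica, method="iclabel")
print(result["labels"])

# Drop everything that is not brain signal (or 'other'), then rebuild the EEG.
ica.exclude = [i for i, label in enumerate(result["labels"])
               if label not in ("brain", "other")]
raw_clean = ica.apply(raw.copy())
```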

2. Music Preparation

For the outputs, I prepared two different representations of the music. The first one is simply the sound wave: it is averaged from stereo to mono and scaled to the [0, 1] range to match a tanh activation.
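In Librosa this is a single load call followed by a rescale. A minimal sketch (the file name is a placeholder):

```python
import numpy as np
import librosa

# Placeholder stimulus file; mono=True averages the stereo channels.
y, sr = librosa.load("stimuli/song_01.wav", sr=22050, mono=True)

# Librosa returns samples in [-1, 1]; rescale the wave to [0, 1] as described above.
y_scaled = (y - y.min()) / (y.max() - y.min())
```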

Spectrogram of a song with Onset Detection

The second representation uses the Short-Time Fourier Transform (STFT) to describe segments of a song in the frequency domain. I learned a new library called "Librosa" to create the spectrogram above and also to detect onsets (the starting frame of each musical note).
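A sketch of both steps with Librosa, continuing from the loaded waveform above (the FFT and hop sizes are common defaults, not necessarily the project's settings):

```python
import numpy as np
import librosa

# Magnitude spectrogram via the Short-Time Fourier Transform.
S = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))
S_db = librosa.amplitude_to_db(S, ref=np.max)

# Detect note onsets (the frame index where each note starts).
onset_env = librosa.onset.onset_strength(y=y, sr=sr, hop_length=512)
onset_frames = librosa.onset.onset_detect(onset_envelope=onset_env, sr=sr,
                                          hop_length=512, units="frames")
onset_times = librosa.frames_to_time(onset_frames, sr=sr, hop_length=512)
```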

P.S.: During this preparation, I also mapped each onset to musical note notation, hoping it could be used for prediction, but handling inputs of varying shapes was too difficult within this project.

Transforming frequencies to musical notes with Librosa
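For that note-notation idea, Librosa can also estimate a fundamental frequency per frame and turn it into a note name. A rough sketch, reusing the waveform and onset frames from the earlier snippet:

```python
# Estimate the fundamental frequency per frame, then read it off at each onset.
f0 = librosa.yin(y, fmin=librosa.note_to_hz("C2"),
                 fmax=librosa.note_to_hz("C7"), sr=sr, hop_length=512)
onset_notes = [librosa.hz_to_note(f0[frame]) for frame in onset_frames]
print(onset_notes[:8])   # hypothetical output, e.g. ['E5', 'G5', ...]
```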

Data Analysis

In this section, I present the analysis of OpenMIIR only; the "Joint EEG-fMRI" dataset turned out differently because I couldn't find the exact start time of each trial in it.

I checked the EEG at the start of each trial to see what the data looks like. As we can see below, the EEG doesn't reveal much about what the participant hears, but the most obvious sign is that exactly at the start (at 0 ms) it shows the cue beep! (For those who want to hear the real audio from the experiment and more, please see the final presentation.)

Averaged Brain Signals at the Start of Trial

By averaging all the short signal segments aligned to the beats, we see a significant change in the EEG. This is also a sign that a model could learn the music from brain activity.

Averaged Brain Signals at each beat in Listening Trial
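These beat-locked averages can be computed with MNE's epoching. A minimal sketch, assuming the beat times of a trial are already known (the times below are placeholders; the real ones come from the stimulus annotations):

```python
import numpy as np
import mne

# Placeholder beat times in seconds; real ones come from the trial annotations.
beat_times = np.array([10.0, 10.5, 11.0, 11.5])
sfreq = raw.info["sfreq"]
beat_events = np.column_stack([
    (beat_times * sfreq).astype(int),        # sample index of each beat
    np.zeros(len(beat_times), dtype=int),    # unused "previous value" column
    np.ones(len(beat_times), dtype=int),     # event ID 1 = beat
])

# Cut a short epoch around every beat and average them into an evoked response.
epochs = mne.Epochs(raw, beat_events, event_id={"beat": 1},
                    tmin=-0.1, tmax=0.5, baseline=(None, 0), preload=True)
epochs.average().plot()
```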

In contrast, the imagination experiment shows no such sign, only fluctuating EEG data. I investigated further but couldn't find any clues yet.

Averaged Brain Signals at each beat in Imagining Trial

Models

As a baseline model, I proposed a simple MLP network that consists of 4 layers with batch normalization. As a comparison model, I created a bi-directional LSTM, which is mentioned in the "Joint EEG-fMRI" paper.

Sample Output from Baseline Model
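For reference, here is roughly what the two architectures look like in PyTorch. This is only a sketch under my own assumptions: the post pins down 4 layers with batch normalization and a bi-directional LSTM, but the hidden sizes, inner activations, and input/output shapes below are guesses.

```python
import torch
import torch.nn as nn

class BaselineMLP(nn.Module):
    """Four fully connected layers with batch normalization (sizes are guesses)."""
    def __init__(self, n_in, n_out, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, hidden), nn.BatchNorm1d(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.BatchNorm1d(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.BatchNorm1d(hidden), nn.ReLU(),
            nn.Linear(hidden, n_out), nn.Tanh(),   # tanh output, as in the post
        )

    def forward(self, x):                  # x: (batch, n_in) flattened EEG window
        return self.net(x)


class BiLSTMDecoder(nn.Module):
    """Bi-directional LSTM over the EEG time steps, projected to the output size."""
    def __init__(self, n_channels, n_out, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_out)

    def forward(self, x):                  # x: (batch, time, channels)
        out, _ = self.lstm(x)
        return torch.tanh(self.head(out[:, -1]))   # decode from the last time step
```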

After training on the prepared inputs, neither model could learn from the "Joint EEG-fMRI" dataset. I tried switching the input sampling size and changing the output from the waveform to the STFT data, and it still didn't work. I concluded that my data might not be clean enough, or that I was missing some steps, because the stimulus channel (the record of when each song was played) seemed broken in my project.

In the last two weeks, I wrote new preparation scripts for "OpenMIIR" (luckily, they provide pre-computed ICA components) and a new experiment setup with many more parameter controls. So far its performance seems somewhat unstable, reaching only 8–15% Structural Similarity.

Baseline Performance in an OpenMIIR Dataset
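As a side note on the metric: Structural Similarity (SSIM) is computed between the predicted and target outputs treated as 2-D arrays (e.g. spectrograms). A minimal sketch of one straightforward way to compute it with scikit-image; the exact settings in the project may differ:

```python
import numpy as np
from skimage.metrics import structural_similarity

def spectrogram_ssim(pred: np.ndarray, target: np.ndarray) -> float:
    """SSIM between a predicted and a target (dB-scaled) spectrogram."""
    data_range = float(max(pred.max(), target.max()) - min(pred.min(), target.min()))
    return structural_similarity(pred, target, data_range=data_range)
```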

Experiment Results

Last but not least, I would like to show some results from the baseline model. I purposely created test datasets with completely new subjects, and with songs from the same instruments as the training set.

The OpenMIIR data come with 3 pieces of music that each contain a single instrument: a Celesta (the Harry Potter theme), a Trombone (somewhat mixed, the Star Wars theme), and a Violin (a classical piece by Mozart, Eine kleine Nachtmusik). These three pieces appear in both datasets, and the models perform well on the Trombone and Violin but not on the Celesta.

I investigated the output waves further and saw that these results relate to the instruments' waveforms: the Celesta produces a continuous, sine-like wave, while the Trombone and Violin produce fluctuating waves (see more in the presentation). So I conclude that the models can't perform well on the Celesta sound because the input EEG is also fluctuating.

Structural Similarity by Each Instrument

Lastly, an unexpected result came after hours of training. The figure shows that the model could learn unknown songs it had no prior knowledge of. I haven't investigated these predicted outputs yet, as it was already the last day of the project.

Structural Similarity on Different Test Datasets

Acknowledgment

Thanks to all the professors and assistants of the Brain Code 101 project, which is sponsored by PMU-B. This camp was inspired and started by one of the founders of Neuromatch Academy. I am really glad to have learned from many experts who gave plenty of advice about neuroscience along with hands-on workshops.
