Neural Networks for Brainzzz

Tobias de Taillez
7 min read · Jun 26, 2019


Neural networks have come a long way since their invention in the early 1960s (see my short history article), and today I want to share with you an aspect of neural networks that has only recently come into the focus of research:

How can we learn from neural networks?

You see, weak artificial intelligence is remarkable at engineering tasks like distinguishing cats from dogs or steering robots. It is indeed so awesome that it surpasses human skill in an increasing number of tasks, and the recipe has been the same every time:

  • Have a data set that includes the features and labels
  • Train a network
  • Predict on similar data

However, wouldn’t it be super exciting to learn what features the neural network utilized to make its decision? Especially in tasks where it outperforms humans? Did we miss some relevant information?

In the visual domain, a straightforward approach exists: you input an image with random pixel values and iterate on it until one specific neuron in the DNN is maximally activated while the rest of the neurons in that layer stay minimal. The result is an image of what this particular neuron is looking for.
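
Here is a minimal sketch of that idea in PyTorch, assuming you already have some trained vision model at hand; the model, layer, and hyperparameters below are placeholders, not a specific published setup.

```python
import torch

def maximize_neuron(model, layer, unit, steps=200, lr=0.1, shape=(1, 3, 224, 224)):
    """Gradient-ascend a random image until `unit` in `layer` responds strongly."""
    grabbed = {}
    hook = layer.register_forward_hook(lambda m, inp, out: grabbed.update(act=out))

    img = torch.rand(shape, requires_grad=True)   # start from random pixels
    optimizer = torch.optim.Adam([img], lr=lr)

    model.eval()
    for _ in range(steps):
        optimizer.zero_grad()
        model(img)                                # forward pass fills `grabbed`
        act = grabbed["act"][0]
        # Push the chosen unit up while mildly keeping its neighbours down.
        loss = -act[unit].mean() + 0.01 * act.mean()
        loss.backward()
        optimizer.step()

    hook.remove()
    return img.detach()
```

For example, `maximize_neuron(vgg, vgg.features[10], unit=42)` would give you an image of what channel 42 of that (hypothetical) layer "wants to see".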

But this method depends on a straightforward assumption:

We as humans have a basic understanding of the input data and can make associations.

However, what happens in cases where the input is not so clear? Like brain activity, for example?

Brain-computer interfaces are still in the early stages of development and far from ready for the market. Nevertheless, as soon as they reach production, they will turn the world upside down.

Do you want to know what happens under the hood in the brain?
Then read on, and I'll give you an insight into the research on auditory brain-computer interfaces that I did during my academic career.

My first attempt in this field was my bachelor thesis in physics.

[own illustration]

I played several tone sequences from different directions to my subjects at the same time and recorded their brain activity with the help of electroencephalography (EEG). Based on this data, I wanted to find out which direction their auditory attention was focused on. Back then, neural networks were not the way to go, so instead I measured the strength of neural activity at a specific latency after a tone occurred. For attended tones, this activity is typically stronger than for unattended ones.

Red: brain activity at the central electrode for attended tones; blue: unattended tones.
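
To make "stronger activity at a specific latency" concrete, here is a small NumPy sketch of how such an analysis could look; the array names, sampling rate, and latency window are hypothetical, not the exact values from my thesis.

```python
import numpy as np

fs = 500  # sampling rate in Hz (hypothetical)

def erp_amplitude(epochs, t_from=0.08, t_to=0.12):
    """epochs: (n_trials, n_samples) EEG from one channel, time-locked to tone onsets.
    Average over trials, then take the mean amplitude in a latency window."""
    erp = epochs.mean(axis=0)                        # averaging suppresses the noise
    window = slice(int(t_from * fs), int(t_to * fs))
    return erp[window].mean()

# Attended tones typically yield a larger response than unattended ones:
# erp_amplitude(epochs_attended) > erp_amplitude(epochs_unattended)
```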

While planning this bachelor thesis, I dreamed of listening to the brain and hopefully extending my paradigm to words, maybe even thoughts!

Oh, how naïve I was. Let me put it briefly and cruelly:

EEG is noisy as fuck and you can’t see shit.

The typical signal-to-noise ratio (SNR) you get from an EEG recording is around -60 dB, which is comparable to listening to someone speaking (60 dB) while a fighter jet (120 dB) takes off right next to you.

You can improve your recordings by measuring hundreds and hundreds of occurrences and averaging them together. But those were the days.
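
Why does averaging help? Uncorrelated noise shrinks with the square root of the number of trials, so every quadrupling of trials buys you about 6 dB of SNR. A toy example with synthetic data (not the original recordings):

```python
import numpy as np

rng = np.random.default_rng(0)
fs, n_trials = 500, 400
t = np.arange(fs) / fs

signal = 1e-6 * np.sin(2 * np.pi * 4 * t)            # tiny "brain" response
noise = 1e-3 * rng.standard_normal((n_trials, fs))   # EEG-scale noise, roughly -60 dB SNR

single_trial = signal + noise[0]
averaged = (signal + noise).mean(axis=0)             # noise amplitude drops by sqrt(400) = 20

def snr_db(true, estimate):
    return 10 * np.log10(np.mean(true**2) / np.mean((estimate - true)**2))

print(snr_db(signal, single_trial))   # around -60 dB
print(snr_db(signal, averaged))       # about 10*log10(400) ≈ 26 dB better
```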

I left brain-computer interfaces alone in my master's thesis and took a little detour into digital signal processing and hearing-aid algorithms.

Surprisingly, and entirely by chance, I ended up with an experimental design like this one for my doctoral thesis:

Yes. This is the same paradigm, just with speech.

Also, while doing my Master of Science, I found my love for automation, robotics and (you guessed it) neural networks. So my approach was clear.

Even the task was the same: Which story does the subject listen to?

Due to my painful previous EEG experience, it was apparent to me that the test subjects should not have to hear every sentence, every word a hundred times just so that I could extract a representation from the EEG. No, I was determined to make this an interesting experiment, with two naturally spoken audiobooks.

This prerequisite raised the question of how to connect the noisy EEG with the continuous speech from each direction. I did what any competent physicist would do: I simplified my world view so drastically that I could create a model that solved the problem to my satisfaction:

  1. Speech is modulated loudness.
  2. Loudness is sound pressure level, which is about energy.
  3. More energy must generate more brain activity.

Speech ~ Brain Activity

Fortunately, someone smarter than me had already shown that this connection exists (Aiken and Picton, 2008), using the speech envelope as the target feature. The envelope is like a carpet draped over the high-frequency time course of speech.

Envelope (black) on speech signal (blue). [source]
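
For reference, a common way to compute such an envelope is to take the magnitude of the analytic signal (Hilbert transform) and low-pass filter it so that only the slow loudness fluctuations remain. This is a generic SciPy sketch, not necessarily the exact preprocessing from my thesis:

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def speech_envelope(audio, fs, cutoff_hz=8.0):
    """Broadband speech envelope: |analytic signal|, low-pass filtered."""
    env = np.abs(hilbert(audio))               # the "carpet" over the waveform
    b, a = butter(3, cutoff_hz / (fs / 2))     # keep only slow loudness changes
    return filtfilt(b, a, env)
```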

So the neural net I came up with looked like this.

A lot is going on here, but the essential point is the network's pyramidal architecture. It takes a matrix of 84 EEG channels and 27 time frames as input and outputs a single value. The input window is then shifted by one sample, the network returns the next point of the envelope, and so on.
After a certain number of shifts, the predicted piece of the envelope is compared with the attended envelope, and the weights of the net are adjusted so that the prediction fits better next time. Given that you train the network long enough, you can predict an envelope from any specific EEG time frame. The prediction is then compared against both speech envelopes that were presented, and the better-fitting one “wins” and marks the attended direction/source.
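
To make this more tangible, here is a rough PyTorch sketch of the idea: a pyramidal stack that funnels one 84-channel x 27-sample EEG window down to a single envelope sample, plus the correlation-based decision between the two presented envelopes. The layer sizes, activations, and training details are placeholders, not the published architecture.

```python
import torch
import torch.nn as nn

class EnvelopeDecoder(nn.Module):
    """Pyramidal net: one EEG window (84 channels x 27 samples) -> one envelope sample."""
    def __init__(self, n_channels=84, n_lags=27):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                                   # 84 * 27 = 2268 inputs
            nn.Linear(n_channels * n_lags, 256), nn.Tanh(),
            nn.Linear(256, 64), nn.Tanh(),
            nn.Linear(64, 1),                               # single envelope sample
        )

    def forward(self, eeg_window):                          # (batch, 84, 27)
        return self.net(eeg_window).squeeze(-1)

def decode_attention(model, eeg, env_a, env_b, n_lags=27):
    """Slide over the EEG sample by sample, predict the envelope, then pick the
    presented envelope (already aligned to the predictions) that correlates better."""
    windows = eeg.unfold(dimension=1, size=n_lags, step=1)  # (84, T - 26, 27)
    with torch.no_grad():
        pred = model(windows.permute(1, 0, 2))              # (T - 26,)
    corr = lambda x, y: torch.corrcoef(torch.stack([x, y]))[0, 1]
    return "A" if corr(pred, env_a) > corr(pred, env_b) else "B"
```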

Enough details. More fascination, please!

Fast forward: The network is trained to predict the attended envelope, and all weights are fixed now.

If we now predict a sample of the envelope, we can measure which input neurons contributed to this prediction and in what relative proportion. This works similarly to a backpropagation run.

The relevance analysis is calculated recursively, starting from the output neuron. The initial assumption is that the output neuron has a relevance of 1. The relevance of each neuron is then calculated from the relevance of the neurons it feeds into in the next layer, weighted by the strength of its contribution (activation times weight).
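
For a plain fully connected net, that rule can be written in a few lines of NumPy. This is a generic epsilon-stabilized relevance-propagation sketch, not necessarily the exact variant from the paper:

```python
import numpy as np

def relevance_propagation(weights, activations, eps=1e-6):
    """Propagate relevance from the single output neuron back to the input neurons.

    weights:      list of weight matrices W[l], each of shape (n_in, n_out)
    activations:  list of layer activations a[l]; a[0] is the input layer
    """
    relevance = np.array([1.0])                   # the output neuron starts with R = 1
    for W, a in zip(reversed(weights), reversed(activations[:-1])):
        z = a @ W                                 # each neuron's contribution to the layer above
        s = relevance / (z + eps * np.where(z >= 0, 1.0, -1.0))
        relevance = a * (W @ s)                   # share relevance in proportion to activation * weight
    return relevance                              # relevance per input neuron (the EEG samples)
```

Summing the resulting input relevances over the time axis then tells you how much each EEG channel mattered.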

Okay, math. What about science?

Since we use the EEG channels as input neurons, we can now learn something about their relevance.

Relevance analysis for the input neurons of the network. Blue = not relevant for good speech envelope reconstruction; red = strong relevance.

You can immediately see that most of the EEG channels (plotted on the y-axis) are irrelevant for a good speech envelope reconstruction (blue). Moreover, only specific points in time seem to contribute to the reconstruction. To be precise, one point around 50 ms and one around 170 ms are essential.

Mapping the interesting channels onto their positions on the head yields a consistent picture.

Relevance analysis of EEG channels. The network chose small clusters of 1–2 electrodes as informative channels for envelope reconstruction. Red = High relevance; Blue = Low relevance.

Three clusters above and behind the ears seem to be particularly useful for reconstruction. This is a remarkable result in two ways: First, the spatial resolution is surprisingly fine. Typical localization approaches are based on Principal Component Analysis (PCA) and yield clusters about a quarter of the head surface in size. Second, the clusters sit at physiologically unsurprising locations, namely near the auditory cortices. This is good because it shows that the algorithm reflects physiologically meaningful processing.

If the 12 best channels are averaged, the time course of relevance becomes clear.

Twelve most relevant channels averaged — clear relevance peaks at 50 ms and 170 ms.
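
Picking the most relevant channels and averaging their relevance time courses is only a couple of lines; the relevance map below is a random placeholder standing in for the real analysis output:

```python
import numpy as np

# Placeholder relevance map (channels x time lags); substitute the real analysis output.
relevance = np.abs(np.random.default_rng(0).standard_normal((84, 27)))

top = np.argsort(relevance.sum(axis=1))[-12:]   # the 12 channels with the most total relevance
time_course = relevance[top].mean(axis=0)       # their averaged relevance across time lags
```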

These two significant peaks show that the information for speech reconstruction can be tapped at specific locations along the auditory pathway.

An even more interesting effect can be seen when the net is trained to predict not the attended but the unattended speech envelope.

Doing the relevance analysis the same way and overlaying the two results gives:

The second relevance peak at 170 ms disappears.

What does that imply?

This means that the neural network is not able to identify and use information in the EEG at this specific time delay for the reconstruction of the speech envelope.

So why isn't it able to do that? Presumably because no processing of the suppressed speech signal takes place at this depth. And since the EEG time delay is related to the anatomical hierarchy, we get information about speech processing in relation to anatomical location. In particular, our unique ability of auditory attention may sit (hierarchically) on the auditory pathway shortly after 50 ms, because after this point the identifiable information diverges between attended and unattended processing.

This is an exciting finding and will lead future research onto a more precise path toward understanding the human brain, and speech processing in particular. With these results, I would like to make a plea for the use of neural networks in research: it is possible to deduce information about the underlying processes from a neural network trained on the task, and thereby gather meaningful insights.

Today I shared with you my first Ph.D. paper. Once my second one gets accepted (sometime in the future), I would like to show you how to turn the paradigm around and predict the brainwaves/EEG corresponding to incoming speech, and see whether we can find these aspects of physiological auditory attention again.

Stay attended
