Brain Computer Interface for Decoding Speech in a Paralyzed Person — Paper Summary

Alexander Kovalev
the last neural cell
Mar 7, 2022

🧐 At a glance:

Anarthria (the inability to articulate speech) makes it hard for paralyzed people to interact with the world. The opportunity to decode words and sentences directly from cerebral activity (ECoG) could give such patients a way to communicate.

The authors build an AI model to predict words from neural activity. They achieve 98% accuracy for speech detection and 47.1% for word classification across 50 classes.

Neuroprosthesis for Decoding Speech in a Paralyzed Person with Anarthria — paper 2021

🤿 Motivation:

  • The authors divide the speech decoder into two submodules: speech detection and word classification. The first predicts the onset and offset of a speech attempt, and the second predicts probabilities over the “imagined words”.
  • They then construct 50 sentences from the 50 words and train a language model to predict the next word from the previous ones.
  • Combining the classifier’s predictions with the language model yields a low word error rate.
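As a rough illustration of how per-word classifier probabilities can be combined with a bigram language model, here is a minimal Viterbi decoder. The tiny vocabulary, probabilities, and decoding details are my own assumptions for the sketch, not the paper's actual implementation:

```python
import numpy as np

def viterbi_decode(emissions, lm_bigram, lm_prior):
    """Combine per-slot word probabilities with a bigram language model.

    emissions: (T, V) array, classifier probability of each word at each slot.
    lm_bigram: (V, V) array, P(word_j | word_i).
    lm_prior:  (V,) array, P(first word).
    Returns the most probable word-index sequence.
    """
    T, V = emissions.shape
    log_em = np.log(emissions + 1e-12)
    score = np.log(lm_prior + 1e-12) + log_em[0]
    back = np.zeros((T, V), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best score ending in word i, then transitioning to j.
        cand = score[:, None] + np.log(lm_bigram + 1e-12)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_em[t]
    # Backtrack from the best final word.
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy vocabulary ["i", "am", "good"]: the classifier is ambiguous at slot 1,
# but the LM strongly prefers "am" after "i".
emissions = np.array([[0.90, 0.05, 0.05],
                      [0.01, 0.495, 0.495]])
bigram = np.array([[0.00, 0.90, 0.10],
                   [0.10, 0.10, 0.80],
                   [0.34, 0.33, 0.33]])
prior = np.array([1/3, 1/3, 1/3])
print(viterbi_decode(emissions, bigram, prior))  # → [0, 1], i.e. "i am"
```

The language model breaks ties that the neural classifier alone cannot resolve, which is exactly why the word error rate drops when the LM is added.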

🍋 Main Ideas:

Experiments settings

  • Set up an ECoG array of 16 × 8 = 128 electrodes over the left sensorimotor region.
  • Ask the patient to produce, in his head, the word he sees on the screen (isolated-word setting). This procedure labels the data for training the model.

Data preprocessing

  • Common average re-referencing.
  • Extract the high-gamma band (70–150 Hz) from each electrode and compute its envelope. ECoG gamma frequencies are considered good features for movement and speech prediction.
  • Apply a z-score to each electrode over a large window (30 s) to account for signal drift over the course of the experiment.
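The preprocessing chain could be sketched roughly as follows (pure-NumPy stand-ins; the FFT-based band-limiting, envelope extraction, and window parameters are my assumptions, not the paper's exact pipeline):

```python
import numpy as np

def common_average_reference(ecog):
    """ecog: (channels, samples). Subtract the across-channel mean."""
    return ecog - ecog.mean(axis=0, keepdims=True)

def high_gamma_envelope(x, fs, lo=70.0, hi=150.0):
    """Band-limit one channel to 70-150 Hz via FFT masking, then take the
    magnitude of the analytic signal (Hilbert envelope)."""
    n = len(x)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    spec = np.fft.fft(x)
    spec[(np.abs(freqs) < lo) | (np.abs(freqs) > hi)] = 0.0
    # Analytic signal: keep DC/Nyquist, double positive frequencies.
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    return np.abs(np.fft.ifft(spec * h))

def rolling_zscore(x, fs, window_s=30.0):
    """Z-score each sample against a trailing window to handle slow drift."""
    w = int(window_s * fs)
    out = np.empty_like(x, dtype=float)
    for i in range(len(x)):
        seg = x[max(0, i - w + 1):i + 1]
        out[i] = (x[i] - seg.mean()) / (seg.std() + 1e-8)
    return out
```

A pure 100 Hz sine passed through `high_gamma_envelope` yields a flat envelope of its amplitude, which is the sanity check one would run before feeding real ECoG through this chain.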

🤖Models

Speech detection model.

  • Use an LSTM to predict the probability that imagined speech is being produced (i.e., detect a speech attempt). They then apply a threshold to obtain the onset time t*.
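As a toy illustration of the thresholding step, here is one way to turn per-frame LSTM probabilities into an onset t* (the threshold value and minimum-duration debounce are my own assumptions; the paper's detector is more involved):

```python
import numpy as np

def detect_speech_onset(probs, threshold=0.5, min_frames=3):
    """Find the first frame t* where the detector's probability stays above
    `threshold` for at least `min_frames` consecutive frames.

    probs: 1-D array of per-frame speech-attempt probabilities (LSTM output).
    Returns the onset frame index, or None if no attempt is detected.
    """
    above = probs >= threshold
    run = 0
    for i, flag in enumerate(above):
        run = run + 1 if flag else 0
        if run >= min_frames:
            return i - min_frames + 1
    return None

probs = np.array([0.1, 0.2, 0.7, 0.8, 0.9, 0.3])
print(detect_speech_onset(probs))  # → 2
```

Requiring several consecutive supra-threshold frames avoids triggering the word classifier on single-frame noise spikes.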

Word classification model.

  • The authors use a convolution + LSTM model for 50-class prediction.
  • The model receives an input window of [t* − 1 s, t* + 3 s] for classification.
  • A Kaggle-style trick was applied: train 10 such models and average their outputs, which improves performance. Unfortunately, I did not find accuracy figures for each model separately.
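The ensembling trick amounts to averaging the models' class-probability outputs before taking the argmax, as in this generic sketch (not the authors' code):

```python
import numpy as np

def ensemble_predict(model_probs):
    """Average class probabilities from several models and pick the argmax.

    model_probs: (n_models, n_classes) array of softmax outputs
    from each model for a single input window.
    Returns (predicted class index, averaged probability vector).
    """
    avg = np.mean(model_probs, axis=0)
    return int(np.argmax(avg)), avg

# Three disagreeing models: the average still settles on class 1.
probs = np.array([[0.6, 0.4],
                  [0.3, 0.7],
                  [0.2, 0.8]])
label, avg = ensemble_predict(probs)
print(label)  # → 1
```

Averaging softmax outputs reduces the variance of any single trained network, which is why such ensembles reliably buy a few points of accuracy.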

📈 Experiment insights / Key takeaways:

The authors investigate which brain regions drive each model’s predictions.

  • Speech detection: the dorsal part of the sensorimotor cortex, which is associated with speech intention.
  • Word classification: the ventral part of the sensorimotor cortex, a language-specific region.

Results:

  • Speech detection accuracy: 98%; word classification accuracy: 47.1%.
  • Word error rate in sentence prediction: 60%.
  • Word error rate in sentence prediction with the language model: 25.6%.

✏️ My Notes:

  • Using a dictionary of the 50 most frequent words is a good idea, because it lets people cover basic communication needs.
  • Using a language model pretrained on only 50 possible sentences seems like a trick or hack: the model just overfits to these sentences. We should consider accuracy without the language model!
  • Using an ensemble of 10 ANNs is a “Kaggle-like” trick.
  • It is hard to draw conclusions about stability when comparing models trained on different amounts of data (they assessed stability by adding more historical data to the training set).

Further investigation:

  • To check stability, we should train a model on some data and then reuse those weights for prediction some time later.
  • We could use a different architecture for speech detection: end-to-end binary classification with a fully temporal-convolutional architecture, with the same thresholding to obtain t*.
  • Use modern transformer models instead of RNNs for temporal aggregation in the word classification stage.
  • It may be worthwhile to combine speech detection and word classification, since I think they may share features.
  • It would be interesting to use a true predictive language model that is not restricted to the 50-sentence set but dynamically suggests the most probable words based on history (the last 1, 2, … words).

This review was made with Alexey Timchenko


CEO of ALVI Labs | Machine learning engineer | Brain computer interfaces researcher. 🧠