Verbalize

Published in Bootcamp

May 20, 2023

Brain-Computer Interface for Imagined Speech Recognition

This story was originally published in my portfolio: Pradhyumnaa G’s Portfolio. I’m currently looking for an internship, so if you’re interested in my work, please contact me at pradhyumnaag30@gmail.com.

1 The Goal

Verbalize is a machine learning model that aims to predict imagined speech from brain signals. I plan to use electroencephalography (EEG) recordings to train a model that maps these brain signals to their corresponding labels.

2 The Target Audience

Our main target audience is people who have a temporary or permanent condition that prevents them from speaking. For example:

People who are mute.
People with multiple sclerosis.
People with muscular dystrophy.
People with spinal cord injuries.
Stroke survivors.
People who have undergone a laryngectomy or suffered vocal cord damage.

3 Comparisons with AAC

How does this product compare with augmentative and alternative communication (AAC) devices, which help people “talk” artificially?

Pros of AAC:

Many no-tech, low-tech, and high-tech options are available.
Less expensive.
Easily accessible (for example, a text-to-speech mobile application).

Cons of AAC:

Some options are only viable for people whose hands still function.
Less “natural” than communicating through imagined speech.
Can be cumbersome to carry around and set up.
Requires time and effort to learn (eye-tracking AAC, for example, has a steep learning curve).

4 The Dataset

The dataset I will use to build the imagined speech model comes from the 2020 International BCI Competition. It contains five labels, all commonly used phrases: Hello, Help me, Stop, Thank you, and Yes.

The EEG data for each trial spans 795 time samples (approximately 2 seconds) across all 64 channels. There are 300 trials in total, 60 for each class.
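Most deep learning frameworks expect the trial axis to come first. The sketch below assumes the raw array is stored as (time, channels, trials), i.e. (795, 64, 300); that axis order is an assumption about the file layout, so check the shape after loading before reusing it.

```python
import numpy as np

# ASSUMPTION: the raw array is laid out as (time samples, channels, trials).
N_SAMPLES, N_CHANNELS, N_TRIALS = 795, 64, 300


def to_trials_first(x: np.ndarray) -> np.ndarray:
    """Reorder an assumed (time, channels, trials) array to (trials, channels, time)."""
    if x.ndim != 3:
        raise ValueError(f"expected a 3-D array, got shape {x.shape}")
    # Move the trial axis (2) to the front and the time axis (0) to the end.
    return np.transpose(x, (2, 1, 0))


if __name__ == "__main__":
    # Random placeholder data standing in for the real recordings.
    raw = np.random.default_rng(0).standard_normal((N_SAMPLES, N_CHANNELS, N_TRIALS))
    trials = to_trials_first(raw)
    print(trials.shape)  # (300, 64, 795)
```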

The labels are stored as a matrix in which the rows represent the class and the columns represent the trial number. For example, trials 1 and 2 have label 2, trial 3 has label 1, and so on.
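Since each column of the label matrix is a one-hot vector, taking the row index of the 1 in each column recovers one label per trial. A minimal NumPy sketch; the helper name `onehot_to_labels` is hypothetical, and the `one_based` flag exists only so the output matches the 1-based numbering used in the description above.

```python
import numpy as np


def onehot_to_labels(y: np.ndarray, one_based: bool = True) -> np.ndarray:
    """Convert a (classes, trials) one-hot matrix into one label per trial.

    Each column is one trial; the row holding the 1 is that trial's class.
    With one_based=True, labels run 1..n_classes.
    """
    if not np.all(y.sum(axis=0) == 1):
        raise ValueError("each column must contain exactly one 1")
    labels = np.argmax(y, axis=0)  # row index of the 1 in each column
    return labels + 1 if one_based else labels


if __name__ == "__main__":
    # Toy matrix (5 classes, 3 trials): trials 1 and 2 are class 2, trial 3 is class 1.
    y = np.array([
        [0, 0, 1],
        [1, 1, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
    ])
    print(onehot_to_labels(y))  # [2 2 1]
```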

5 Extracting the Data

The dataset is provided as MATLAB files, which I can’t work with directly when building a machine learning model in Python. I used a library called pymatreader, which imports a MATLAB file and reads its contents.
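pymatreader’s `read_mat` function returns the contents of a MATLAB file as a nested Python dictionary. The sketch below separates the file read from the extraction step. The struct name `epo_train` and the field names `x` (EEG data) and `y` (label matrix) are assumptions, not confirmed by this post; printing `mat.keys()` on a real file will show the actual names.

```python
from typing import Any, Dict, Tuple

import numpy as np


def extract_epochs(mat: Dict[str, Any], struct_name: str = "epo_train") -> Tuple[np.ndarray, np.ndarray]:
    """Pull the EEG array and label matrix out of a dict returned by read_mat.

    ASSUMPTION: struct_name ("epo_train") and the field names "x" and "y"
    are placeholders; inspect mat.keys() on a real file to find the actual names.
    """
    struct = mat[struct_name]
    x = np.asarray(struct["x"], dtype=np.float64)  # assumed (time, channels, trials)
    y = np.asarray(struct["y"])                    # assumed (classes, trials), one-hot
    return x, y


def load_mat_file(path: str, struct_name: str = "epo_train") -> Tuple[np.ndarray, np.ndarray]:
    """Read a MATLAB file with pymatreader and return (x, y)."""
    # Imported here so the rest of the module works without pymatreader installed.
    from pymatreader import read_mat

    return extract_epochs(read_mat(path), struct_name)
```

Usage would look like `x, y = load_mat_file("path/to/file.mat")`, where the path is a placeholder for one of the downloaded dataset files.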

6 Practical Limitations

Expecting our users to wear a 64-channel EEG cap all the time isn’t realistic. We need a better approach.

Principal Component Analysis?

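This may sound like a good option, but we need to keep the hardware limitations in mind. Each principal component is a weighted combination of all 64 channels, so keeping only a few components does not reduce the number of electrodes that must be worn. And even if we kept only the channels that carry the most relevant information, those channels may not be located near each other, which would make it impractical to build an EEG headset that captures all of them.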
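The point about principal components can be checked numerically. This sketch computes principal directions with NumPy’s SVD on synthetic correlated data (not the real EEG) and shows that each component is a vector of 64 channel weights, so projecting onto even 8 components still needs all 64 channels as input.

```python
import numpy as np


def pca_loadings(x: np.ndarray, n_components: int) -> np.ndarray:
    """Return the top principal directions of x, shape (n_components, n_channels).

    x has shape (n_observations, n_channels), e.g. every time sample of every
    trial stacked as one row. Each returned row is a unit vector of channel weights.
    """
    centered = x - x.mean(axis=0)
    # The right singular vectors of the centered data are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for EEG: 2,000 observations of 64 correlated channels.
    mixing = rng.standard_normal((64, 64))
    x = rng.standard_normal((2000, 64)) @ mixing
    comps = pca_loadings(x, n_components=8)
    # Count how many channels each component draws on (non-negligible weight).
    used = (np.abs(comps) > 1e-6).sum(axis=1)
    print(comps.shape, used)
```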

The Alternative?

Look at real-world portable EEG headsets. Two options are:

Neurosity Crown (8 Channels)
Emotiv Epoc X (14 Channels)

I will experiment thoroughly with both an 8-channel and a 14-channel version of the dataset and choose the one that yields higher accuracy.
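Building the 8-channel and 14-channel versions amounts to picking channels by name from the 64-channel recording. In this sketch the two electrode lists reflect my understanding of the manufacturers’ published layouts and are an assumption to verify against the official spec sheets; `select_channels` is a hypothetical helper, and the channel names are assumed to be stored alongside the data.

```python
from typing import List, Sequence

import numpy as np

# ASSUMPTION: electrode positions as I understand the published layouts;
# verify against the manufacturers' documentation before relying on them.
EMOTIV_EPOC_X_14 = ["AF3", "F7", "F3", "FC5", "T7", "P7", "O1",
                    "O2", "P8", "T8", "FC6", "F4", "F8", "AF4"]
NEUROSITY_CROWN_8 = ["CP3", "C3", "F5", "PO3", "PO4", "F6", "C4", "CP4"]


def select_channels(x: np.ndarray, channel_names: Sequence[str], keep: Sequence[str]) -> np.ndarray:
    """Keep only the named channels from x shaped (trials, channels, time)."""
    # Case-insensitive lookup from channel name to its index in the recording.
    lookup = {name.upper(): i for i, name in enumerate(channel_names)}
    missing = [name for name in keep if name.upper() not in lookup]
    if missing:
        raise KeyError(f"channels not in recording: {missing}")
    idx: List[int] = [lookup[name.upper()] for name in keep]
    return x[:, idx, :]
```

For example, `select_channels(trials, channel_names, EMOTIV_EPOC_X_14)` would return an array shaped (300, 14, 795) if all 14 electrodes are present in the recording; a `KeyError` flags any that are not.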

7 The Model

After a lot of experimentation, I settled on the following model. It achieved 68% accuracy on a test set of 50 trials.

[Figure: The model I settled on after a lot of experimentation.]
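The architecture itself is not reproduced here, but the evaluation around it can be sketched. Holding out 10 trials per class (an assumption; the post does not say how the 50 test trials were chosen) gives exactly 50 test trials and leaves 250 for training. For scale, 68% of 50 trials is 34 correct predictions, against a 20% chance level for five balanced classes.

```python
import numpy as np


def stratified_holdout(labels: np.ndarray, n_test_per_class: int, seed: int = 0):
    """Split trial indices so each class contributes n_test_per_class test trials."""
    rng = np.random.default_rng(seed)
    test_idx = []
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)
        # Sample test trials for this class without replacement.
        test_idx.extend(rng.choice(cls_idx, size=n_test_per_class, replace=False))
    test_idx = np.sort(np.array(test_idx))
    train_idx = np.setdiff1d(np.arange(labels.size), test_idx)
    return train_idx, test_idx


def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


if __name__ == "__main__":
    labels = np.repeat(np.arange(5), 60)  # 300 trials, 60 per class
    train_idx, test_idx = stratified_holdout(labels, n_test_per_class=10)
    print(len(train_idx), len(test_idx))  # 250 50
    print("chance level:", 1 / 5)         # 0.2 for five balanced classes
```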

8 The Results

[Figure: Model consistency]

[Figure: Classwise performance of the model]
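One common way to produce a classwise breakdown is a confusion matrix plus per-class recall (the share of each class’s test trials predicted correctly). This is an illustrative sketch, not necessarily how the figure above was produced; the mapping of label indices 0 to 4 onto the phrase names is an assumption.

```python
import numpy as np

# ASSUMPTION: label index i corresponds to CLASS_NAMES[i].
CLASS_NAMES = ["Hello", "Help me", "Stop", "Thank you", "Yes"]


def confusion_matrix(y_true, y_pred, n_classes: int) -> np.ndarray:
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # Increment cell (true, predicted) once per trial, handling repeated pairs.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def per_class_recall(cm: np.ndarray) -> np.ndarray:
    """Fraction of each true class that was predicted correctly."""
    totals = cm.sum(axis=1)
    # Classes with no test trials get 0 instead of a division-by-zero warning.
    return np.divide(np.diag(cm), totals, out=np.zeros(len(cm)), where=totals > 0)


if __name__ == "__main__":
    # Toy predictions, not the model's real outputs.
    y_true = [0, 0, 1, 1, 2, 3, 4, 4]
    y_pred = [0, 1, 1, 1, 2, 3, 4, 0]
    cm = confusion_matrix(y_true, y_pred, len(CLASS_NAMES))
    for name, recall in zip(CLASS_NAMES, per_class_recall(cm)):
        print(f"{name}: {recall:.2f}")
```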

9 Final Reflections

As I mentioned before, this is only a starting point: a working prototype built on a dataset that contains just five labels. As the number of labels increases, so will the complexity of the model, and a fully working imagined speech model that can replicate regular speech may require more advanced methods such as transfer learning or ensemble models. Building such a model will require considerable expertise, experience, and research, and acquiring and processing the necessary data can be expensive. Regardless, I am confident that my initial prototype can serve as a good starting point toward a working solution.
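Ensembles are mentioned only as a possible next step. As an illustration, one of the simplest forms, soft voting, averages the class probabilities predicted by several independently trained models and picks the most probable class. The models here are hypothetical and represented only by their probability outputs.

```python
import numpy as np


def soft_vote(prob_list):
    """Average class probabilities from several models and pick the top class.

    prob_list: sequence of arrays, each shaped (n_trials, n_classes).
    Returns (predicted class per trial, averaged probabilities).
    """
    probs = np.stack([np.asarray(p, dtype=float) for p in prob_list])
    mean_probs = probs.mean(axis=0)  # average over models
    return mean_probs.argmax(axis=1), mean_probs


if __name__ == "__main__":
    # Two hypothetical models scoring one trial over three classes.
    model_a = [[0.5, 0.3, 0.2]]
    model_b = [[0.1, 0.6, 0.3]]
    pred, mean_probs = soft_vote([model_a, model_b])
    print(pred, mean_probs)
```

Soft voting lets a confident model outweigh an uncertain one, which a simple majority vote over hard labels cannot do.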

The best model is one tailored to each individual, but it is not practical to bring in every user, collect their data, and design a model solely for them. One way to circumvent this is to collect EEG data from a larger number of participants so that the model generalizes somewhat, although this can be expensive and time-consuming. Ideally, the participants should belong to the intended audience for the EEG headset, because EEG signals can vary significantly across populations, and a model trained on data from people who are not representative of the target audience may be less accurate. In addition, participants from the intended audience may be more motivated to take part in data collection, since they may eventually use the product.
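Whether a model generalizes to people it has never seen can be estimated with leave-one-subject-out evaluation: train on every participant but one, test on the held-out participant, and repeat for each participant. A minimal sketch, assuming each trial is tagged with a subject ID (the post does not say how many participants the dataset includes).

```python
import numpy as np


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx) for each subject."""
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        test_mask = subject_ids == subject
        # Train on everyone else, test only on this subject's trials.
        yield subject, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


if __name__ == "__main__":
    # Hypothetical IDs: 3 subjects with 4 trials each.
    ids = np.repeat([1, 2, 3], 4)
    for subject, train_idx, test_idx in leave_one_subject_out(ids):
        print(subject, len(train_idx), len(test_idx))
```

Averaging accuracy across the held-out subjects gives a more honest estimate of performance on a new user than a random split, where the same person’s trials can appear in both training and test sets.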

References

https://www.nidcd.nih.gov/health/statistics/quick-statistics-voice-speech-language

https://www.degruyter.com/document/doi/10.1515/jisys-2022-0076/html

https://www.ncbi.nlm.nih.gov/books/NBK499849/

https://www.who.int/news-room/fact-sheets/detail/spinal-cord-injury

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8848641/

https://osf.io/pq7vb/

https://osf.io/ymvjz/

https://www.emotiv.com/product/emotiv-epoc-x-14-channel-mobile-brainwear/#tab-description

https://neurosity.co/crown

If you liked the project and want to hire me for a UI/UX internship, please contact me at pradhyumnaag30@gmail.com. I’d also love to hear your feedback on the project.