A silent speech interface is a device that enables communication without the sound people make when they vocalize speech. Its main goal is to accurately capture speech without requiring vocalization. The end result is similar to “reading someone’s mind”. It’s an exciting and growing technology, as it is well suited to human-machine interaction.
This Medium post will be about how science accomplishes this task, and what can be done using silent speech.
Not all silent speech interfaces are created for one general purpose. The goal of a silent speech interface can be generating actual sound (e.g. for larynx cancer patients), generating text, or serving as a means of interfacing between humans and computer systems. As the goal differs, methods and even tools differ as well.
Brainmab aims to create a brain-computer interface (or human-machine interface) for interacting with the Brainmab Platform, on which people can send money to peers using Brainmab Token, or send and receive information from the cloud. Silent speech is our go-to technology, in addition to EEG (for gathering data from the brain itself) and EMG sensors on the extremities (for detecting motion gestures).
The first step toward silent speech is finding the right tool to monitor activity. The data can come from the elements of human speech production, the neural pathways, or the brain itself. Here are some popular tools for gathering data from the human speech system:
Electroglottograph: Measures how much electricity flows across the larynx. It can be used to measure the distance between the vocal folds.
Electromyograph: Measures skeletal muscle electrical activity. It can be used to measure electrical activity in the facial muscles and tongue to gather important information about what a person intends to say.
Photoglottograph: Observes glottal movements and vocal fold vibrations.
Raw data does not mean anything by itself. It must be processed and analyzed before it can be used. Thanks to developments in machine learning and statistics, it is now more possible than ever to analyze huge amounts of seemingly noisy and random data and extract meaning from it.
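To make the processing step concrete, here is a toy sketch of one common first step: slicing a raw signal into windows and computing a simple feature (root-mean-square amplitude) per window. The signal here is synthetic random data standing in for an EMG trace; real pipelines use far more sophisticated features, so treat this only as an illustration of the idea.

```python
import numpy as np

def window_rms(signal, window_size):
    """Split a 1-D signal into non-overlapping windows and
    compute the root-mean-square (RMS) of each window."""
    n_windows = len(signal) // window_size
    trimmed = signal[: n_windows * window_size]
    windows = trimmed.reshape(n_windows, window_size)
    return np.sqrt((windows ** 2).mean(axis=1))

# Synthetic "EMG" trace: a quiet baseline followed by a burst of
# muscle activity (purely made-up data for illustration).
rng = np.random.default_rng(0)
quiet = rng.normal(0, 0.05, 500)
burst = rng.normal(0, 1.0, 500)
features = window_rms(np.concatenate([quiet, burst]), window_size=100)
# The RMS features make the activity burst stand out from the baseline.
```

A feature vector like this, rather than the raw samples, is what typically gets handed to a learning model.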
Training machine learning models for silent speech is conceptually no different from training them for image recognition. The algorithms may differ, but the idea is the same. Data, either labeled or unlabeled, is given to the learning model, which is expected to establish a link between the data and its label, or to cluster inputs into categories if labels are absent. After training, the model can accurately predict the result for a given input.
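The labeled case can be sketched with a deliberately minimal classifier: each class is summarized by the mean of its training vectors, and a new input is assigned to the nearest mean. The two-dimensional "feature vectors" and the words "yes"/"no" below are entirely made up; real silent speech models use high-dimensional features and far more capable algorithms.

```python
import numpy as np

class NearestCentroid:
    """Minimal nearest-centroid classifier: each class is represented
    by the mean of its training vectors; prediction picks the closest mean."""

    def fit(self, X, y):
        self.labels_ = sorted(set(y))
        self.centroids_ = np.array(
            [X[np.array(y) == label].mean(axis=0) for label in self.labels_]
        )
        return self

    def predict(self, X):
        # Distance from every input to every class centroid.
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return [self.labels_[i] for i in dists.argmin(axis=1)]

# Toy labeled data: feature vectors for two imagined words.
X_train = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.9]])
y_train = ["yes", "yes", "no", "no"]
model = NearestCentroid().fit(X_train, y_train)
print(model.predict(np.array([[0.15, 0.15], [0.95, 0.95]])))  # → ['yes', 'no']
```

Swapping this toy model for a neural network changes the algorithm, not the workflow: fit on labeled examples, then predict on new input.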
Not all silent speech interfaces aim to control a computer system, but since Brainmab does, we’ll explain how that process works.
People use computer-based systems everywhere. The control mechanisms differ, but at the core they are alike. People use their fingers to tell their smartphone something and receive the response through their eyes or ears. Personal computers get their instructions from a mouse, keyboard or (nowadays) touchscreen, and people get their response the same way they do with smartphones. Voice assistants receive queries by voice and output the response as sound. You give instructions to the device, and the device gives a response back to you. Everything in between is automated by software. This is the essence of computer software.
A silent speech interface does not use a mouse, keyboard, voice command or touchscreen. It uses people’s inner voice, which takes no effort to generate. The device parses and analyzes the raw silent speech input and transforms it into useful answers, queries or requests. The rest is automated by software. All you have to do is think, and you’ll receive answers. Again, at the core it’s the same experience; only the I/O differs. It comes to a point where the following scenario can take place in reality:
Silent Speech Interface: Do you want to pay your USD 50.00 electric bill right now?
User: Oh, yes. Confirm.
Off it goes. And where did the request come from? The system was programmed in advance to push you notifications as payment requests arrive. The possibilities are as limitless as software technology itself. You could have paid for groceries, stocks, cryptocurrencies and so on.
User: Brainmab, send 100 Brainmab tokens to Alice.
Silent Speech Interface: “Send 100 Brainmab tokens to Alice” do you confirm?
User: Yes, confirm
And again, off it goes.
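The confirm-then-execute flow in the dialogues above can be sketched in a few lines. The command format, parser and response strings below are hypothetical, chosen only to mirror the example conversation; a real system would sit behind a trained language model, not string splitting.

```python
def parse_command(text):
    """Very naive parser for commands like
    'send 100 Brainmab tokens to Alice' (hypothetical format)."""
    words = text.split()
    if words and words[0].lower() == "send":
        return {"action": "send", "amount": int(words[1]), "to": words[-1]}
    return None

def handle(text, confirm):
    """Run a parsed command only after explicit confirmation,
    mirroring the dialogue above. `confirm` is a callback that
    asks the user and returns True or False."""
    command = parse_command(text)
    if command is None:
        return "Sorry, I did not understand."
    if not confirm(f'"{text}" do you confirm?'):
        return "Cancelled."
    return f"Sent {command['amount']} tokens to {command['to']}."

# Simulated user who always confirms.
print(handle("send 100 Brainmab tokens to Alice", confirm=lambda q: True))
# → Sent 100 tokens to Alice.
```

Whether the input arrives from a keyboard, a voice assistant or a silent speech interface, only `parse_command` and the I/O change; the confirmation logic stays the same.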
The video below is a remarkable example of a silent speech interface. Coming from MIT, the project is named AlterEgo. It uses EMG electrodes to measure the electrical activity of specific facial muscles, then uses a machine learning model to control the software inside the device. The device can send information back to the user through bone-conduction headphones.
AlterEgo is a closed-loop, non-invasive, wearable system that allows humans to converse in high-bandwidth natural language with machines, artificial intelligence assistants, services, and other people without any voice — without opening their mouth, and without externally observable movements — simply by vocalizing internally. The wearable captures electrical signals, induced by subtle but deliberate movements of internal speech articulators (when a user intentionally vocalizes internally), in likeness to speaking to one’s self.
This is another example of a silent speech interface. It differs from the previous one in that it requires visible oral movements to generate text from silent speech. The presence of visible movement increases the quality of the output.
This demo shows how a silent speech interface can be used to generate sound. Coming from Thomas Hueber of CNRS/GIPSA-lab, it uses ultrasound imaging for the tongue and an infrared camera for the frontal mouth view. The system then analyzes the image input and tries to mimic what it has learned.