When we were first making the Ubi, one of the projects that we started was implementing algorithms to help with far field interaction. We knew we had to do a few things:
- Boost the gain of the voice signal (i.e., turn up the microphone volume)
- Isolate voice from other noise in the signal
- Remove reverberation
If we could accomplish these three things, far field interaction would improve significantly. We might extend the effective interaction distance with the Ubi from 2 m to 4 m in a quiet environment and from 1 m to 2 m in a loud one. Those are not trivial improvements.
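To make the first two goals concrete, here is a minimal NumPy sketch (not the Ubi's actual DSP) of gain boosting via peak normalization and a crude noise gate that zeroes frames below an energy threshold. The threshold is estimated from a noise-only lead-in, which is an assumption of this toy; real far field pipelines use beamforming, spectral subtraction, and dereverberation filters instead.

```python
import numpy as np

def boost_gain(signal, target_peak=0.9):
    """Scale the signal so its peak amplitude reaches target_peak."""
    peak = np.max(np.abs(signal))
    if peak == 0:
        return signal
    return signal * (target_peak / peak)

def noise_gate(signal, frame_len=256, noise_frames=4, margin=2.0):
    """Zero out frames whose RMS energy is below margin * noise floor.

    The noise floor is estimated from the first `noise_frames` frames,
    assuming the recording begins with background noise only.
    """
    n = len(signal) // frame_len * frame_len
    frames = signal[:n].reshape(-1, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    noise_floor = rms[:noise_frames].mean()
    keep = rms > margin * noise_floor
    return (frames * keep[:, None]).reshape(-1)

# Synthetic example: a quiet noise lead-in followed by a louder "voice" tone.
rng = np.random.default_rng(0)
noise = 0.01 * rng.standard_normal(1024)
voice = 0.3 * np.sin(2 * np.pi * 440 * np.arange(1024) / 16000)
audio = np.concatenate([noise, voice + 0.01 * rng.standard_normal(1024)])

cleaned = noise_gate(boost_gain(audio))
```

Even this toy shows the trade-off the post describes: the gate removes low-energy content wholesale, and anything it strips is gone before the recognizer ever sees it.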
It was a black hole.
We spent close to a year on different projects and sank a small fortune (for a startup) into trying to implement our own DSP. The real challenge was taking theoretical far field algorithms, implementing them in simulation, then porting the code to run on low-power DSP chips or even in non-real-time OSes. The latter two steps were excruciating.
After all of that effort, it wasn't uncommon for us to ship audio to Google for speech-to-text only to have it come back as a null result or with worse accuracy than the unprocessed signal. Frustrating! It turned out that some of the artifacts we were stripping from the audio may have been exactly the cues the machine learning models relied on.
While the technology wasn't on the market at the time, today we'd take a very different approach.
We'd likely use an off-the-shelf DSP chip with built-in far field technology and focus on tuning. We'd run experiments across different speaker distances, different tunable DSP parameters, and different microphone placements. A huge saving in time, headaches, and frustration over trying to implement from scratch!
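The tuning experiments above amount to a grid sweep. Here is a hypothetical Python sketch: iterate over distance and a gain parameter, score each combination, and keep the best. The `run_trial` function is a stand-in of my own invention; in practice it would drive the DSP chip's tuning interface and a speech-to-text service, and the score would be word error rate against a reference transcript.

```python
from itertools import product

def run_trial(distance_m, agc_gain_db):
    """Placeholder scoring function (lower is better).

    A real version would play a test phrase at `distance_m`, capture it
    through the DSP with `agc_gain_db` applied, run speech-to-text, and
    compute word error rate against the known transcript. This toy model
    just makes error grow with distance, with a sweet spot at 20 dB.
    """
    return 0.05 * distance_m + 0.01 * abs(agc_gain_db - 20)

distances = [1, 2, 3, 4]       # metres
gains = [10, 15, 20, 25, 30]   # dB of automatic gain control

# Score every (distance, gain) combination and pick the best settings.
results = {(d, g): run_trial(d, g) for d, g in product(distances, gains)}
best = min(results, key=results.get)
```

The point of the sweep is that tuning a proven DSP is a search problem over a handful of knobs, which is far cheaper than reimplementing the algorithms themselves.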