Improving Voice Experiences

Julien Odent
Wit.ai
Nov 11, 2021

Since we launched the Wit-Unity SDK, we’ve been deeply focused on improving voice applications in AR and VR. Today, we’re excited to announce the experimental release of the Voice SDK, as part of the Meta Presence Platform. If you missed the Voice Interactions in VR session at Facebook Connect, you can watch it here. You can download the Voice SDK today, in the Oculus Integration SDK v34 release. Read more about the Voice SDK on the Oculus Developer Blog.

When deploying voice applications, a key factor in the quality of the experience is the perceived latency. Using voice is natural, but it’s only efficient if it’s fast.

We’ve upgraded our POST /speech API to stream transcriptions as they are produced. For API versions after 20210928, intermediate speech recognition results are sent in chunks; the last chunk returns the final transcription along with the intents, entities and traits, as usual. Refer to the latest API documentation for further details.
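To make the chunked flow concrete, here is a minimal Python sketch of a consumer for the streamed response. It assumes each chunk has already been decoded into a JSON object, that intermediate chunks carry a partial transcription in a "text" field, and that the final chunk is the one carrying the "intents" payload — the field layout is an assumption for illustration, so check the API documentation for the exact contract.

```python
def handle_speech_chunks(chunks):
    """Consume streamed chunks from POST /speech.

    `chunks` is an iterable of decoded JSON objects, one per chunk.
    Assumed layout (illustrative, not authoritative): intermediate
    chunks carry a partial transcription in "text"; the final chunk
    also carries "intents", "entities" and "traits".
    """
    final = None
    for chunk in chunks:
        if "intents" in chunk:
            # Final chunk: full transcription plus NLU results.
            final = chunk
        else:
            # Intermediate chunk: partial transcription, e.g. for a
            # live caption or your own attention system.
            print("partial:", chunk.get("text", ""))
    return final
```

In practice you would feed this from a streamed HTTP response (for example, `requests.post(..., stream=True)`), decoding each chunk as it arrives instead of waiting for the full body.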

You can now use the partial transcriptions to build your own attention system! We believe this is an important update toward richer voice experiences.

We’ve also significantly improved the overall latency of our POST /speech API. For the best performance, please consider sending 16 kHz raw, wav or flac audio. The other audio encodings are still supported, but they incur a higher processing time.
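If your app captures raw PCM samples, wrapping them in a WAV container at the recommended rate is a one-liner with the Python standard library. A minimal sketch, assuming 16-bit mono PCM input:

```python
import io
import wave

def pcm_to_wav(pcm_bytes, sample_rate=16000):
    """Wrap raw 16-bit mono PCM samples in a WAV container.

    16 kHz, 16-bit mono wav is one of the formats the post recommends
    for lower processing time on POST /speech.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)       # mono
        w.setsampwidth(2)       # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm_bytes)
    return buf.getvalue()
```

The resulting bytes can then be sent as the request body with a `Content-Type: audio/wav` header.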

Finally, we’ve upgraded our Automated Speech Recognition (ASR) English model to a better architecture, with personalization support. This means that the utterances you’ve validated in your app will help improve the ASR accuracy.

Note that POST /speech is still optimized for short voice inputs.

As you know, the context of a natural language request matters. We’ve enabled dynamic entities support for GET /message and POST /speech. You can now extend your defined keyword entities with new keywords and synonyms for each request! Dynamic entities will help inform the ASR processing in English as well.
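As an illustration, a per-request dynamic entities payload might look like the sketch below. The parameter names and the payload shape (entity name mapped to a list of keyword records) are assumptions for illustration — consult the current API documentation for the exact contract.

```python
import json

def message_params(utterance, dynamic_entities, version="20210928"):
    """Query parameters for a GET /message call with dynamic entities.

    `dynamic_entities` maps a keyword entity name to a list of records,
    e.g. {"color": [{"keyword": "teal", "synonyms": ["teal blue"]}]}.
    The parameter names and payload shape are illustrative assumptions;
    check the API documentation for the exact format.
    """
    return {
        "v": version,
        "q": utterance,
        "entities": json.dumps(dynamic_entities),  # JSON-encoded per request
    }
```

The idea is that each request can carry its own extension of a keyword entity — for example, the names currently visible in the user’s scene — without redefining the entity in your app.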

To help you get familiar with NLP in Unity, we’ve enabled Built-In NLP in the Voice SDK. We’ve shared 16 new built-in intents for English, and we’ve extended the support for built-in entities with 59 new entities in 16 languages. Thank you for all your contributions to Duckling!

The Oculus Voice SDK is a drop-in replacement for the Wit-Unity SDK. It offers the same API and uses the Wit-Unity SDK under the hood. We encourage developers already using the Wit-Unity SDK to upgrade to the Oculus Voice SDK.

As always, we welcome your feedback, comments and suggestions. Please reach out to us on GitHub!

Wit.ai Team
