Siri Gets a Barista Job: Adding Offline Voice AI to a SwiftUI App

Ian Lavery
Apr 29 · 3 min read

Siri has access to the world’s knowledge, but what if all I want is my favorite cup of coffee? The Speech framework offers cloud-computed transcription, but it seems like overkill to send audio data over the internet to a server farm that understands the entirety of the English language when all I need it to understand is how I take my coffee. Let’s see how we can achieve this with the offline voice recognition platform from Picovoice.

1 — Create a Simple GUI

Before we dive into using the Picovoice platform, we need a UI to work with. SwiftUI has made it easier than ever to create visually appealing stateful UIs. In about half an hour I was able to mock up a GUI with a coffee maker image, some text prompts, and a collection of stateful buttons. The result is the Barista app used in the rest of this tutorial.

2 — Add the Picovoice CocoaPod

If you’re new to iOS development, you may be unfamiliar with CocoaPods. CocoaPods brings modern package management to iOS, allowing developers to add powerful extensions to their apps with minimal effort.

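To install the Picovoice pod using CocoaPods, add the following to your Podfile:

source 'https://cdn.cocoapods.org/'
# Assumed pod name; confirm it against the Picovoice iOS docs before installing.
pod 'Picovoice-iOS'

Then run pod install and open the generated .xcworkspace file.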

3 — Initialize the Voice AI

The Picovoice Platform contains two speech recognition engines: the Porcupine Wake Word engine and the Rhino Speech-to-Intent engine. The combination of these two engines allows us to create voice interactions similar to Alexa, but without requiring an internet connection. For instance, we could say:

Hey Barista, could I have a medium coffee?

The phrase ‘Hey Barista’ will be detected by Porcupine; Rhino will interpret the rest of the request that follows. Rhino uses a bespoke context to decode the command, without transcribing it to text. When the engine has made an inference, it returns an instance of an Inference struct; for the above sample phrase, the struct will look like this:

IsUnderstood: true,
Intent: 'orderBeverage',
Slots: {
    size: 'medium',
    beverage: 'coffee'
}
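To make the shape of that result concrete, here is a minimal Swift sketch of consuming it. `BaristaInference` and `describeOrder` are hypothetical names invented for this example, modeled on the fields shown above; the SDK's own Inference type may differ in naming and casing, and the fallback values are assumptions.

```swift
/// Hypothetical stand-in mirroring the fields shown above; not the SDK's own type.
struct BaristaInference {
    let isUnderstood: Bool
    let intent: String?
    let slots: [String: String]
}

/// Turns an inference into a short order summary, or nil if the command
/// was not understood or was not a beverage order.
func describeOrder(_ inference: BaristaInference) -> String? {
    guard inference.isUnderstood, inference.intent == "orderBeverage" else {
        return nil
    }
    // A speaker may leave a slot out, so fall back to assumed defaults.
    let size = inference.slots["size"] ?? "regular"
    let beverage = inference.slots["beverage"] ?? "coffee"
    return "\(size) \(beverage)"
}

let sample = BaristaInference(
    isUnderstood: true,
    intent: "orderBeverage",
    slots: ["size": "medium", "beverage": "coffee"]
)
print(describeOrder(sample) ?? "not understood") // prints "medium coffee"
```

Because the engine returns a named intent and a dictionary of slots rather than a transcript, the app never has to parse free-form text.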

In order to initialize the voice AI, we’ll need both Porcupine and Rhino model files. The wake word model file (.ppn) tells the Porcupine engine what phrase it is supposed to continuously listen for, while the context model file (.rhn) describes the grammar that the Rhino engine will use to understand natural voice commands specifically related to ordering coffee.

Picovoice has made several pre-trained Porcupine and Rhino models freely available under Apache 2.0, which can be found in the Picovoice GitHub repositories[1][2]. For our Barista app, we’re going to use the trigger phrase Hey Barista and the Coffee Maker context, which understands a collection of basic coffee maker commands.

After downloading hey barista_ios.ppn and coffee_maker_ios.rhn, add them to the iOS project as bundled resources so that we can load them at runtime. Then we can initialize the Picovoice Platform:
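Initialization needs the absolute paths of the two bundled model files. The following is a minimal sketch of that lookup using Foundation's `Bundle.path(forResource:ofType:)`. The `ModelError` type and `modelPath` helper are hypothetical names for this example. The Picovoice initializer itself is left out because its exact parameters depend on the SDK version you install.

```swift
import Foundation

/// Hypothetical error for this sketch; not part of the Picovoice SDK.
enum ModelError: Error, Equatable {
    case missing(String)
}

/// Returns the absolute path of a bundled model file, or throws if the file
/// is not present in the given bundle.
func modelPath(_ name: String, ofType ext: String, in bundle: Bundle = .main) throws -> String {
    guard let path = bundle.path(forResource: name, ofType: ext) else {
        throw ModelError.missing("\(name).\(ext)")
    }
    return path
}

do {
    let keywordPath = try modelPath("hey barista_ios", ofType: "ppn")
    let contextPath = try modelPath("coffee_maker_ios", ofType: "rhn")
    // Hand these two paths to the Picovoice initializer for your SDK version.
    print("Models found:", keywordPath, contextPath)
} catch {
    print("Model lookup failed:", error)
}
```

Failing early with a descriptive error makes a missing or mistyped resource obvious at launch, before any audio is captured.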

The method picovoiceManager.start() starts audio capture and automatically passes incoming frames of audio to the voice recognition engines.
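To illustrate how frames can be routed between the two engines, here is a toy two-stage dispatcher. It is a sketch of the idea only, not the SDK's internals: `VoicePipeline`, `Stage`, and the two closures are hypothetical stand-ins for the wake word and intent engines, and the 16-bit frame type is an assumption.

```swift
/// Which engine is currently consuming audio frames.
enum Stage {
    case waitingForWakeWord
    case listeningForCommand
}

/// Hypothetical dispatcher: frames go to the wake word stage until it fires,
/// then to the intent stage until it finalizes, then back again.
struct VoicePipeline {
    private(set) var stage = Stage.waitingForWakeWord
    private let detectsWakeWord: ([Int16]) -> Bool   // stand-in for the wake word engine
    private let finalizesCommand: ([Int16]) -> Bool  // stand-in for the intent engine

    init(detectsWakeWord: @escaping ([Int16]) -> Bool,
         finalizesCommand: @escaping ([Int16]) -> Bool) {
        self.detectsWakeWord = detectsWakeWord
        self.finalizesCommand = finalizesCommand
    }

    /// Routes one audio frame to the active stage and returns the resulting stage.
    @discardableResult
    mutating func process(_ frame: [Int16]) -> Stage {
        switch stage {
        case .waitingForWakeWord:
            if detectsWakeWord(frame) { stage = .listeningForCommand }
        case .listeningForCommand:
            if finalizesCommand(frame) { stage = .waitingForWakeWord }
        }
        return stage
    }
}

// Toy detectors: a frame starting with 1 marks the wake word,
// a frame starting with 2 marks the end of the command.
var pipeline = VoicePipeline(
    detectsWakeWord: { $0.first == 1 },
    finalizesCommand: { $0.first == 2 }
)
for frame in [[0], [1], [0], [2]] as [[Int16]] {
    print(pipeline.process(frame))
}
```

Only one stage looks at the audio at any time, which is why the app can listen continuously for the wake phrase and interpret a command only after hearing it.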

To capture microphone audio, we must add a usage description to Info.plist. iOS shows this text to the user when the app first requests microphone access:

<key>NSMicrophoneUsageDescription</key> 
<string>To recognize voice commands</string>

4 — Integrate Voice Controls

A common way to drive a SwiftUI view from non-UI code is to create a ViewModel and have the UI observe it. Our UI needs are simple: we want 1) some indication that the wake word has been detected and 2) a display of our drink order. Create a struct to represent each button's state, plus state variables to show and hide text. Because these properties are marked with the @Published property wrapper, any view bound to them updates whenever they change. After adding these, our ViewModel will look like this:
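A minimal sketch of such a ViewModel follows. The type and property names (`ButtonState`, `BaristaViewModel`, `statusText`), the button labels, and the status strings are assumptions made for illustration rather than the original listing. The two methods are meant to be called from the wake word and inference callbacks.

```swift
import Combine

/// Hypothetical model for one selectable button in the UI.
struct ButtonState: Identifiable, Equatable {
    let id: String            // slot value the button stands for, e.g. "medium"
    var isSelected = false
}

/// Hypothetical ViewModel; views observing it redraw when a @Published value changes.
final class BaristaViewModel: ObservableObject {
    @Published var isListening = false
    @Published var statusText = "Say 'Hey Barista'"
    @Published var sizeButtons = ["small", "medium", "large"].map { ButtonState(id: $0) }
    @Published var beverageButtons = ["coffee", "espresso", "latte"].map { ButtonState(id: $0) }

    /// Call when the wake word is detected.
    func wakeWordDetected() {
        isListening = true
        statusText = "Listening..."
    }

    /// Call with the result of an inference to update the displayed order.
    func apply(isUnderstood: Bool, slots: [String: String]) {
        isListening = false
        guard isUnderstood else {
            statusText = "Sorry, I didn't catch that"
            return
        }
        // Select the button matching each slot and deselect the others.
        for index in sizeButtons.indices {
            sizeButtons[index].isSelected = sizeButtons[index].id == slots["size"]
        }
        for index in beverageButtons.indices {
            beverageButtons[index].isSelected = beverageButtons[index].id == slots["beverage"]
        }
        statusText = "Order received"
    }
}
```

SwiftUI expects published changes on the main thread, so if the voice callbacks arrive on a background queue, wrap these calls in `DispatchQueue.main.async`.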

We can now issue voice commands that alter the UI automatically, making our app entirely hands-free. Finally, Siri knows how I like my coffee.

The full source code from this tutorial can be found here. For more information about Picovoice SDKs and products, visit the website and docs, or explore the GitHub repositories. If your project requires custom wake word or context models, sign up for the Picovoice Console.

Picovoice

Edge Voice AI Platform

Picovoice

Picovoice is the end-to-end platform for building voice products on your terms. Unlike Alexa and Google services, Picovoice runs entirely on-device while being more accurate.
