Add Voice Interfaces to Python Apps, in Minutes!

Published in Picovoice · 3 min read · Jan 26, 2021

There are already several approaches for adding speech recognition to Python apps. In this article, I’d like to introduce you to a solution that is offline, private, cross-platform, and highly accurate — setting it apart from alternatives. Enter Picovoice! Picovoice is an end-to-end platform for building voice products on your terms.

Picovoice enables developers to add voice recognition to existing Python apps within minutes. I’ll prove this to you by showing you how to build voice-controlled radio buttons using Picovoice’s Porcupine wake word engine and the Tkinter GUI framework. The demo is cross-platform and runs on Linux, macOS, Windows, and Raspberry Pi. The code is open-source and available on Porcupine’s GitHub repository.

Porcupine Wake Word Engine — Python SDK — Radio Buttons

1 — Install Porcupine

Porcupine is a wake word engine. It detects utterances of given phrases in a stream of audio in real time. Install Porcupine from a terminal:

pip3 install pvporcupine

2 — Create an Instance of Porcupine

Construct an instance of the Porcupine engine that can detect utterances of Alexa and Jarvis:

import pvporcupine

ppn = pvporcupine.create(
    access_key='${YOUR_ACCESS_KEY}',  # replace with your AccessKey
    keywords=['alexa', 'jarvis'])

The above loads the default wake word models that ship with Porcupine’s pip package. The set of default models can be retrieved using pvporcupine.KEYWORDS. Sign up for Picovoice Console to get your free `AccessKey`, which the Porcupine SDK uses for authentication and authorization.
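Before constructing the engine, it can help to check the requested phrases against the built-in set. The sketch below is only an illustration: `unknown_keywords` is a hypothetical helper, and `EXAMPLE_BUILT_IN` is a stand-in for `pvporcupine.KEYWORDS` so that the snippet runs without the package installed.

```python
from typing import Iterable, List


def unknown_keywords(requested: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return the requested keywords that have no built-in model, sorted."""
    available_set = set(available)
    return sorted(k for k in requested if k not in available_set)


# Stand-in for `pvporcupine.KEYWORDS`; the real set depends on the installed package.
EXAMPLE_BUILT_IN = {"alexa", "jarvis", "porcupine"}

missing = unknown_keywords(["alexa", "hey radio"], EXAMPLE_BUILT_IN)
print(missing)
```

In a real app you would pass `pvporcupine.KEYWORDS` as `available`. Any phrase this returns would need a custom keyword file instead.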

It is possible to instantiate the engine to track other (custom) phrases as well. The following tracks two phrases defined by the keyword files located at keyword_paths:

keyword_paths = [
    "/absolute/path/to/keyword_file/1",
    "/absolute/path/to/keyword_file/2"
]

ppn = pvporcupine.create(
    access_key='${YOUR_ACCESS_KEY}',  # replace with your AccessKey
    keyword_paths=keyword_paths)

3 — Process Audio with Porcupine

Once the engine is instantiated, it can monitor a stream of audio for utterances of the phrases. Simply pass frames of audio to the engine:

keyword_index = ppn.process(audio_frame)
if keyword_index >= 0:
    pass  # a phrase was detected; react to it here

The keyword_index is -1 if no phrase was spotted. Otherwise, it is the index of the detected phrase in the list of keywords (or keyword paths) passed to the factory method.

4 — Read audio from the Microphone

Install pvrecorder (pip3 install pvrecorder). Then, read the audio:

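from pvrecorder import PvRecorder

# `-1` is the default input audio device.
# The recorder must deliver frames of the length Porcupine expects.
recorder = PvRecorder(device_index=-1, frame_length=ppn.frame_length)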
recorder.start()

Read frames of audio from the recorder and pass them to Porcupine’s .process method:

pcm = recorder.read()
ppn.process(pcm)
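In practice the read and process calls run in a loop. Here is a minimal sketch of such a loop, not the demo's actual code. It assumes only that the engine has a `process(frame)` method and the recorder has a `read()` method, as used above. The function name, the `on_detect` callback, and the `max_frames` limit are hypothetical additions for illustration.

```python
from typing import Callable, Optional, Sequence


def run_detection_loop(engine, recorder, keywords: Sequence[str],
                       on_detect: Callable[[str], None],
                       max_frames: Optional[int] = None) -> int:
    """Feed recorder frames to the engine and report each detected keyword.

    `max_frames` is a hypothetical limit so the loop can end;
    None means "run until interrupted". Returns the number of detections.
    """
    detections = 0
    frames_read = 0
    while max_frames is None or frames_read < max_frames:
        pcm = recorder.read()
        frames_read += 1
        keyword_index = engine.process(pcm)
        if keyword_index >= 0:  # -1 means nothing was spotted in this frame
            detections += 1
            on_detect(keywords[keyword_index])
    return detections
```

With the objects from the earlier steps, a call could look like `run_detection_loop(ppn, recorder, ['alexa', 'jarvis'], print)`.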

5 — Create a Cross-Platform GUI using Tkinter

Tkinter is the standard GUI framework shipped with Python. Create a frame (window), add radio buttons to it, and launch the app:

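import tkinter as tk

window = tk.Tk()
keyword_var = tk.StringVar(window)

KEYWORDS = ['alexa', 'jarvis']  # the wake words from step 2

for x in KEYWORDS:
    # pack() places the button in the window; without it nothing is shown.
    tk.Radiobutton(window, text=x, variable=keyword_var, value=x).pack(anchor=tk.W)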

window.mainloop()

6 — Putting it Together

There are about 100 lines of code altogether for GUI, audio recording, and voice recognition. I also created a separate thread for audio processing to avoid blocking the main GUI thread.
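One way to keep audio processing off the GUI thread is a worker thread that hands detections to the GUI through a queue. The following is a sketch of that pattern under stated assumptions, not the demo's implementation. `DetectionThread` and `drain` are hypothetical names, and the engine and recorder are assumed to expose only `process(frame)` and `read()`.

```python
import queue
import threading


class DetectionThread(threading.Thread):
    """Runs the read/process loop in the background and queues detections."""

    def __init__(self, engine, recorder, keywords, detections: queue.Queue):
        super().__init__(daemon=True)
        self._engine = engine
        self._recorder = recorder
        self._keywords = keywords
        self._detections = detections
        self._stop_event = threading.Event()

    def run(self):
        while not self._stop_event.is_set():
            pcm = self._recorder.read()
            keyword_index = self._engine.process(pcm)
            if keyword_index >= 0:
                self._detections.put(self._keywords[keyword_index])

    def stop(self):
        """Ask the loop to exit after the current frame."""
        self._stop_event.set()


def drain(detections: queue.Queue, select) -> None:
    """Call `select(keyword)` for every queued detection without blocking."""
    while True:
        try:
            keyword = detections.get_nowait()
        except queue.Empty:
            return
        select(keyword)
```

On the GUI side, `select` could be `keyword_var.set`, and `drain` could be rescheduled periodically with `window.after` so that only the main thread touches Tkinter widgets.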

If you have technical questions or suggestions, please open an issue on Porcupine’s GitHub repository. If you wish to make modifications or improvements to this demo, feel free to submit a pull request.