How to Build a Voice-Controlled Speaker that Protects Your Privacy

A step-by-step guide to building a privacy-enabled, open-source, voice-controlled speaker using ready-to-buy hardware

Quentin Zibra
Snips Blog
Apr 21, 2017


We’ve built an open-source smart speaker for streaming music from Spotify. It makes it easy to control what you are listening to just by saying out loud what you’d like to happen. It’s purely a demo project, but we’ve grown used to the convenience, so we wanted to make it as easy as possible for anyone interested to replicate it at home.

Making a voice-driven product using Snips AI technology — our natural language processing Snips SDK — is actually very quick.

How quick? We’ll show you.

We learned a lot throughout this project about the state of music playback on the Raspberry Pi, Arduino, and various IoT technologies, and would like to share the most interesting parts. We’ll go through every part of the making of the speaker. To keep it as simple as possible, we’ve broken it down into 6 steps:

Step 1: The Raspbian image
Step 2: Bill of Materials
Step 3: Setting up Software & Drivers on the Pi
Step 4: Adding Voice Control to your speaker
Step 5: Lights & Sounds
Step 6: Assemble everything

So, here’s a short account of the making of this speaker, with descriptions of the main components and key learnings. Some parts are more technical than others, but we hope that it may be of use to anyone attempting to build a similar project themselves. It certainly would have been very useful to us a couple of months ago when we made our first steps on this project! So let’s get started.

Step 1: The Raspbian image

Here are the steps to install an all-in-one Raspbian image dedicated to sound control. If you want to add other functionality to your speaker (say, a weather forecast), you can check a more advanced configuration here. Let’s get started:

  • Download the Snips Spkr Raspbian image
  • Burn it onto a micro SD card that will go into your Pi
    (some great tools are available online, for instance Etcher)
  • Plug your SD card into your Pi, turn it on and wait a minute
    for the Pi to boot
  • SSH into your Pi (hostname: spkr, username: pi, password: raspberry)
$ ssh pi@spkr.local
  • Now onto the interesting part! Here’s how easy it is to add voice control to your speaker. First, retrieve the NLU engine from Docker
docker pull snipsdocker/platform
  • Copy your Spotify credentials into a config file in your home folder (refer to step 3.a)
  • Install dependencies
cd home-python
sudo -H pip install -r requirements.txt
/home/pi/mopidy_setup.sh
  • Run the Snips NLU service at boot
sudo systemctl enable spkr.snips.service
  • Reboot your Raspberry Pi

OK, so now you have the AI building block on your Raspberry Pi, ready to be connected to its communication organs: ears (microphone), vocal cords (speaker), and lights (LEDs)!

Any questions or improvements? Check the forum.

Step 2: Bill of Materials

Here’s the hardware you need to get our voice-activated speaker working in your home. We’ve included links to find each part at a good price. If the links are no longer valid, please send us an email and we’ll help find replacements.

Important parts:

1. Raspberry Pi Model 3 — link / 35€ ($37)
2. Hifiberry AMP+ & 12V Adapter — link / 50€ ($53)
3. Speaker (25W, 4 Ω, 10mm) link / 13€ ($14)
4. USB Microphone — link / 20€ ($21)

Fancy parts:

These parts are not mandatory, but they make your voice-activated speaker stand out! 😎

5. Arduino Micro — link / 18€ ($19)
6. Neopixel Ring 24 — link / 15€ ($16)
7. Passive radiator — link / 11€ ($12)

The Case:

To make it easier for you to assemble those parts we made a 3D printable case.

The Plans are here.

If you don’t have a 3D printer, that’s fine, we don’t either! We used 3D Hubs to print our version (it was fast and the price made sense). The price will depend on the material you use. It cost us 95€ ($100) to print at our hub in Paris.

Miscellaneous:

DC Female plug — link / 3€
HotGlue Pistol — link / 6€
Screws 2mm x 6mm
Wires (Audio and Electronic)


Step 3: Setting up Software & Drivers on the Pi

a. Setting up Spotify

The first step in making a voice-activated speaker was to set up music playback. For this, we decided to use Mopidy, an open-source music playback platform that provides easy ways to connect multiple sources of music and multiple ways of controlling playback. It has built-in support for Spotify, a number of existing web-based controller apps, and a well-documented API for controlling the playback programmatically.
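To give an idea of what controlling playback programmatically can look like, here is a minimal sketch that talks to Mopidy's HTTP JSON-RPC endpoint. It assumes the Mopidy HTTP frontend is enabled and reachable on its default port (6680) at spkr.local; the helper names are illustrative and not part of Mopidy or of the spkr code.

```python
import json
from urllib import request

# Assumption: Mopidy's HTTP frontend is enabled on its default port (6680).
MOPIDY_RPC_URL = "http://spkr.local:6680/mopidy/rpc"


def build_rpc_payload(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request body for a Mopidy core API method."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        payload["params"] = params
    return payload


def call_mopidy(method, params=None, url=MOPIDY_RPC_URL, timeout=5):
    """POST a JSON-RPC request to Mopidy and return the 'result' field."""
    body = json.dumps(build_rpc_payload(method, params)).encode("utf-8")
    req = request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8")).get("result")
```

With a running Mopidy instance, `call_mopidy("core.playback.pause")` would pause the music and `call_mopidy("core.playback.get_state")` would return the current playback state.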

To stream music from Spotify, you need a Premium account with e-mail and password authentication. If you created your account with Facebook, there is an option to add an e-mail and password login.

/!\ This account must use a username and password to authenticate, and not the Facebook authentication.
  • Create a Spotify application
  • Retrieve the client_id and the client_secret from the application
  • Keep this information at hand, and copy it into the “/home/pi/config” file
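The exact layout of the spkr config file is not shown here, so the sketch below is only an illustration. It renders the `[spotify]` section that the Mopidy-Spotify extension documents (username, password, client_id, client_secret); whether the spkr image expects exactly this format is an assumption, and the function name is ours.

```python
import configparser
import io


def render_spotify_section(username, password, client_id, client_secret):
    """Return an INI-formatted [spotify] section as a string."""
    # interpolation=None so that '%' characters in a password are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser["spotify"] = {
        "username": username,
        "password": password,
        "client_id": client_id,
        "client_secret": client_secret,
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()
```

The returned text can then be pasted into the config file on the Pi; the values passed in are the ones retrieved from your Spotify application.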

b. Connecting the Hifiberry

To power the speaker and to improve the sound quality (the default sound quality on the Raspberry Pi is … not the best), we are using a HiFiBerry AMP+. There are many sound cards that you can use; some are made specifically for the Raspberry Pi, although most sound cards ought to work. We decided to go with the HiFiBerry because it seemed like the easiest option to get started with at the time, and we’re happy with the result.

The good thing is that you don’t need an extra power supply for the Raspberry Pi, because the HiFiBerry powers it.

The HiFiBerry sits on top of the Raspberry Pi, as a shield. It requires a 12V power source to power itself and the Pi, and has a connector for hooking up speakers. In order for the card to be recognized by the system, some minor modifications must be made to the Pi’s boot configuration, but the HiFiBerry tutorials make the process fairly straightforward.

From the /boot/config.txt file, remove the line:

dtparam=audio=on

And add the following line (this varies depending on the model of card you are using; here we are using a HiFiBerry AMP+):

dtoverlay=hifiberry-amp

And you’re set!
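If you would rather script this edit than do it by hand, the two changes to /boot/config.txt can be sketched as a small text transformation. This is only an illustration: the function name is ours, and writing the result back to /boot/config.txt (which requires root) is left to you.

```python
def enable_hifiberry(config_text, overlay="hifiberry-amp"):
    """Return config.txt contents with onboard audio off and the overlay on."""
    overlay_line = f"dtoverlay={overlay}"
    # Drop the line that enables the Pi's onboard audio.
    kept = [
        line for line in config_text.splitlines()
        if line.strip() != "dtparam=audio=on"
    ]
    # Add the sound card overlay only if it is not already present.
    if overlay_line not in (line.strip() for line in kept):
        kept.append(overlay_line)
    return "\n".join(kept) + "\n"
```

Running the function twice on the same text gives the same result, so it is safe to re-apply.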


Step 4: Adding Voice Control to your Speaker

Using the Snips SDK, we were able to easily add voice control to our music player via the web interface. Once you’ve told the Snips SDK what kinds of phrases you want it to understand, all you need to do is pass it a voice command in text, and it will return its meaning.

A few separate components take you from saying a command out loud to the expected action happening on the Pi. For fully integrated voice control, you need a microphone, a way to determine when to start recording the voice command (this is called hotword detection), and a way to transform the audio of the command into text.

At Snips, we believe that you shouldn’t have to sacrifice your privacy for the convenience of using AI. That means there has to be a way to do this without a constant live audio stream going from your house to the cloud, which is how the most popular similar products work today 🤔. For now, the speaker is not fully private by design, because speech-to-text still runs in the cloud. As soon as we solve on-device speech-to-text, we will just need to swap the two code bricks, and we will claim it loud and clear!
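The chain of components described above can be sketched as one function that wires the stages together. This is a simplified illustration, not the actual spkr code: every stage is passed in as a plain callable, and all the names are hypothetical.

```python
from typing import Callable, Optional


def handle_one_command(
    wait_for_hotword: Callable[[], None],
    record_audio: Callable[[], bytes],
    speech_to_text: Callable[[bytes], str],
    parse_intent: Callable[[str], Optional[dict]],
    act: Callable[[dict], None],
) -> Optional[dict]:
    """Run one hotword -> recording -> text -> intent -> action cycle."""
    wait_for_hotword()             # blocks until the hotword is heard
    audio = record_audio()         # audio is only captured after the hotword
    text = speech_to_text(audio)   # the only step that leaves the device
    if not text:
        return None                # nothing understood, nothing to do
    intent = parse_intent(text)    # NLU: text in, meaning out
    if intent is not None:
        act(intent)                # e.g. tell the music player to play
    return intent
```

Keeping each stage behind its own callable is what makes it possible to swap a cloud speech-to-text step for an on-device one later without touching the rest.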

a. Adding the Hot Word to the mix

Detecting the hotword among the ambient noises of your house is crucial to voice control: once a hotword is detected, it triggers the recording of the voice command.

Fortunately, there is a way to do hotword detection on device thanks to Snowboy. This open-source framework allows you to use your own hotword, and thus allows you to start recording at the right time. Once a certain amount of time has passed, the recording is stopped, and the audio file is transformed to text.
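The "record for a fixed time after the hotword" behaviour can be sketched as a tiny state machine. This is an illustration only: it does not use the Snowboy API, the class name is ours, and the 4-second window is an assumed value rather than the one spkr uses.

```python
class CommandRecorder:
    """Collect audio chunks for a fixed time window after a hotword."""

    def __init__(self, record_seconds=4.0):
        self.record_seconds = record_seconds  # assumed window length
        self._started_at = None
        self._chunks = []

    @property
    def recording(self):
        return self._started_at is not None

    def on_hotword(self, now):
        """Start a new recording window at timestamp `now` (seconds)."""
        self._started_at = now
        self._chunks = []

    def feed(self, chunk, now):
        """Feed one audio chunk; return the recording once the window ends."""
        if not self.recording:
            return None  # ambient audio is discarded, never stored
        if now - self._started_at >= self.record_seconds:
            # Window is over: hand back what was captured and go idle.
            audio = b"".join(self._chunks)
            self._started_at = None
            self._chunks = []
            return audio
        self._chunks.append(chunk)
        return None
```

The hotword detector would call `on_hotword()` from its callback, while the microphone loop keeps calling `feed()` with each chunk and a timestamp; whatever `feed()` eventually returns is what gets sent to speech-to-text.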

By default, spkr will react to the “Hey Snips” hotword. You are free to use whatever hotword you’d like though:

  • Train your own hotword model by heading over to: https://snowboy.kitt.ai/
  • Retrieve the parameters file (with the .pmdl extension)
  • Copy it to your Raspberry Pi, into the folder: /opt/snips/config

b. Configuring the Speech-to-Text

To enable speech-to-text using Google Speech to Text, you need access to the Speech service. You can get an account and a credentials file here:

https://cloud.google.com/speech/

You can have free access for limited monthly usage. Don’t worry: the audio is only sent to Google once the hotword has been detected, so nothing is sent while the hotword has not been detected, and you should stay within the monthly limit of 60 minutes.

  • Enable Google Speech API
  • Download the credentials in json format
  • Copy this file to the right place on the Raspberry Pi:
$ scp <PATH_TO_GOOGLE_CREDENTIALS_JSON> pi@spkr.local:/opt/snips/config/googlecredentials.json
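To get a feel for how far the 60 free minutes go, here is a back-of-the-envelope calculation. The recording length per command and the billing increment are assumptions (check the current Google pricing page for how requests are rounded); only the 60-minute figure comes from the text above.

```python
import math


def monthly_command_budget(
    free_minutes=60, seconds_per_command=4.0, billing_increment=1.0
):
    """Estimate how many voice commands fit in the free monthly quota."""
    # Each request is billed in whole increments, rounded up.
    billed = math.ceil(seconds_per_command / billing_increment) * billing_increment
    return int((free_minutes * 60) // billed)
```

With the default assumptions (4-second commands, billed per second), `monthly_command_budget()` gives 900 commands a month, or about 30 a day. If every request were rounded up to 15 seconds, the same quota would cover 240 commands.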


Step 5: Lights & Sounds

Without some form of feedback, it is impossible to know whether the speaker is listening or not, and whether your command worked! 🤖 So, we wanted to add both sounds and some LEDs to give that confirmation from spkr to the user. For the LEDs, there are multiple options available. The simplest is to hook up some LEDs to the GPIO pins of the Raspberry Pi. There is a built-in Python library for controlling the pins, and we had this running within a few hours.

While this gave us some fun results, we wanted more LEDs (they are limited to 8 on the GPIO). So we chose to go with a NeoPixel ring controlled by an Arduino. The ring we used has far more lights, and has a great deal of expressivity even without the same intricacy of animations.

a. Configure the Arduino and the NeoPixel ring

We made a library to use the Arduino Micro as an actuator for the Raspberry Pi. Here are the steps to configure it:

  • Install the Arduino IDE (here); it will help you upload the code to the Arduino.
  • Clone/Download the Neopixel Ring library into the Arduino library folder.
*/documents/arduino/libraries
  • Clone/Download our Snips_lights library into the same Arduino library folder
  • Open the Snips_lights library example called Home.ino
  • Plug in your Arduino Micro and upload the example via the Arduino IDE.
Now that the Arduino is set up, you might want to test the ring right away. Go to step 6 to find out how to wire everything...  😎
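On the Raspberry Pi side, driving the ring comes down to sending short commands to the Arduino over USB serial. The sketch below shows the idea with a made-up one-letter protocol; the real commands understood by the Snips_lights example may well differ, and all names here are hypothetical.

```python
# Hypothetical protocol: one letter per speaker state, newline-terminated.
# The commands actually understood by Snips_lights may differ.
STATE_COMMANDS = {
    "idle": b"i",
    "listening": b"l",
    "success": b"s",
    "error": b"e",
}


def encode_state(state):
    """Translate a speaker state name into the bytes sent to the Arduino."""
    try:
        return STATE_COMMANDS[state] + b"\n"
    except KeyError:
        raise ValueError(f"unknown speaker state: {state!r}") from None


def send_state(port, state):
    """Write the command for `state` to an open, file-like serial port."""
    port.write(encode_state(state))
    port.flush()
```

With the pyserial package, `port` could be something like `serial.Serial("/dev/ttyACM0", 9600)`; both the device path and the baud rate are assumptions that depend on your setup and on the Arduino sketch.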

b. The sound library

In order to communicate the state of the speaker to the user, such as “started listening” or “error”, we wanted to play some custom sounds, without disrupting the music. We explored various text-to-speech solutions, and found some very impressive offerings, all of which were too expensive. Instead, we found some royalty-free sounds, similar to those made by R2-D2, that communicated the state well enough. And that, of course, was cool! 🙊


Step 6: Assemble everything

You are almost done! In this step we will guide you through all the wiring. You will need a soldering iron for some parts, but it’s not super complicated otherwise. You got this!

If you choose to go with the 3D-Printed case you will receive these 3 parts:

  1. The top part will be holding the Neopixel Ring, the Raspberry Pi, the USB Microphone
  2. The body part is only for the speaker and the passive radiator
  3. The base is screwed to the body to close the case.

a. The body

You will want to start with the body because it will simply become too hard to handle afterwards.

  • First you need to attach two 30cm wires to the DC female plug.
  • Then you need to glue the DC plug into its dedicated hole (see picture),
    making sure that the two wires pass through the little hole just behind it and go up to the top hole. Tape them to the top surface of the body so they don’t fall during the next steps.
  • Next you need to solder two 30cm audio wires onto the speaker.
  • When done, you need to glue the speaker into its support (1 on the above image), on the side opposite the DC plug, with the wires coming out of the body via the upper hole.

The key here is to have an enclosure as sealed as possible, so don’t hesitate to add a lot of glue to attach it. ✌️

  • Do the same for the Passive Radiator, on the other side.
You should end up with something like this :)
  • Then screw the Base to the bottom of the Body part.
  • You can now add a blob of glue where the cables come out at the top, to seal the hole and prevent air from passing through.
We also soldered a Y-shaped wire so that the left and right channels are both sent to the single speaker.
  • Next, mount the HiFiBerry on top of your Raspberry Pi and connect the audio wires to the left and right outputs. Connect the DC wires to the power input.

b. The top

Now that the body is done, the best part is coming: the brain.

  • First you need to solder the Neopixel ring and the Arduino Micro. Start by soldering a red and a black wire to the 5V and GND pins of the Neopixel ring, then solder a colored wire to the Data pin.
Do the soldering on the front of the ring; it will make the ring easier to glue later. Be careful not to create a short circuit, though.
Neopixel soldering
  • Then solder the red and black wires to the 5V and GND pins of the Arduino Micro, and the Data wire to pin 6.
  • Then glue the neopixel inside the TOP Part of the case. (1)
  • The next part involves the extraction of the USB microphone from its case. Keep just the mic, the wires, the USB card and plug. Then glue the mic on the TOP part of the case where the holes are. (2 on the image above).
  • Then plug the Arduino Micro and the USB mic into the USB ports of the Raspberry Pi
  • Close the top part, and power it using the HiFiBerry power supply.
  • You are done! 🎉 Congratulations!

Your spkr should work as soon as you turn on the Raspberry Pi, and you should not have to run any command manually to get it working.

To accomplish that, we created two system services:

  • spkr.snips.service: this service is responsible for launching the docker container that contains the NLU engine and the communication bus.
  • spkr.python.service: this service is responsible for launching the Python scripts that control the lights and communicate with the container.
You can find the unit files in the /lib/systemd/system folder.
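If the speaker stays silent after boot, a quick way to see whether both services came up is to ask systemd. The following is only a sketch: it assumes Python 3.7+ on the Pi and shells out to `systemctl is-active`; the helper names are ours.

```python
import subprocess

SERVICES = ("spkr.snips.service", "spkr.python.service")


def systemctl_is_active(unit):
    """Return systemctl's one-word state for a unit, e.g. 'active'."""
    result = subprocess.run(
        ["systemctl", "is-active", unit],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit code just means "not active"
    )
    return result.stdout.strip()


def service_states(query=systemctl_is_active, services=SERVICES):
    """Map each spkr service name to its current state."""
    return {unit: query(unit) for unit in services}
```

Calling `service_states()` on the Pi returns a dictionary with one entry per service; anything other than `active` is a hint to look at that unit's logs with `journalctl -u <unit>`.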

You can test it by saying “Hey Snips!” Wait for the light to become blue and then say, “Play me some jazz.” or “Play Supertramp”.

🎉 🎉 🎉 🎉 🎉🎉 🎉🎉 🎉🎉 🎉🎉 🎉🎉 🎉🎉 🎉🎉

So, there you go! Congratulations! You now have your own Smart Speaker!

It’s not perfect, but it’s open source — so if you want to contribute to the project please join the community, and post your contribution.

We’d be happy to hear from anyone who attempts to recreate this and are happy to help and answer any questions on the forum!

PS: Here are the sound commands you can use after saying the hotword. They might come in handy.

- Play / Pause / Stop
- Next Song
- Play some Jazz/Rock/Dirty South/… you name it
- Play some Beyonce/Eminem/etc.
- and other secret ones you’ll have to find out

If you enjoyed this article, it would really help if you hit recommend below :)

Follow us on Twitter @zibra_, @anthooo, @michaelfester, @murdix, @nebuto, @superlopuh and @snips

If you want to work on AI + Privacy, check our jobs page!
