How I Built a Voicemail Server on Raspberry Pi

Sameer Mehra
9 min read · Oct 24, 2023

In this world of cloud hosting, it is easy to forget just how much you can do with a good old Raspberry Pi. So, as a challenge to myself, I decided to build a secure voicemail server that can accept audio messages and respond in real time using the OpenAI transcription and chat completion APIs.

You can visit the site at sammy.sytes.net and the code is viewable here.

Note: most browsers will not let you access mediaDevices (including the client’s microphone) unless the page is served from localhost or over HTTPS (a secure context). Therefore, you will need to set up a TLS certificate with certbot or OpenSSL before visitors outside your network can record messages. Those steps are outlined below.

Technologies used: Node/Express.js, yarn, OpenAI, Mailgun, ejs, nginx, SSL/TLS.

Step 1: Set up Raspberry Pi

Photo by Harrison Broadbent on Unsplash

Set up your Raspberry Pi. I’m using the Raspberry Pi 4B running Raspbian.

Note: I had difficulty using nvm to manage Node versions on my device, so I decided to use the ‘n’ version manager instead. A relatively up-to-date Node version is required for interacting with the OpenAI APIs.

I highly recommend the free tier of VNC Viewer for the Raspberry Pi: I did most of my development on a laptop and controlled the Raspberry Pi remotely with it.

Step 2: Create Express Server

You can set up an Express server quickly with the following commands:

npm init -y
npm install express

I created a file called server.js and set up the following routes:

const express = require("express");
// Assumption: config is a local module that exports PORT, VOICEMAIL_ACCESS_URL, etc.
const config = require("./config");
const app = express();

// Enable /public directory to serve static files
app.use(express.static("public"));

// Start server
app.listen(config.PORT, () => {
  console.log(`http://localhost:${config.PORT}`);
});

// Home
app.get("/", (req, res) => {...});

// Recording page
app.get("/record", (req, res) => {...});

// Get the messages. Hidden endpoint
app.get(`/${config.VOICEMAIL_ACCESS_URL}`, (req, res) => {...});

// Clear all messages. Hidden endpoint
app.post(`/${config.VOICEMAIL_ACCESS_URL}delete`, (req, res) => {...});

// Upload an audio file message, and get a response from OpenAI
app.post("/upload", express.raw({ type: "*/*", limit: "6mb" }), async (req, res) => {...});

I have hidden the voicemail listening endpoint so that only I can have access (sorry!). This is not the ideal way to hide this site, and I may add password protection in the future.
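If you do add password protection later, one option is HTTP Basic authentication in a small Express-style middleware. The following is a minimal sketch and not the code running on my server: basicAuth, safeEqual, the realm name, and the credentials are all hypothetical, and it only uses Node's built-in crypto module. Basic authentication sends the password with every request, so it is only reasonable once the site is served over HTTPS.

```javascript
const crypto = require("node:crypto");

// Compare two strings in constant time by hashing both to equal-length digests.
function safeEqual(a, b) {
  const hashA = crypto.createHash("sha256").update(a).digest();
  const hashB = crypto.createHash("sha256").update(b).digest();
  return crypto.timingSafeEqual(hashA, hashB);
}

// Hypothetical middleware factory: returns an Express-style (req, res, next) handler.
function basicAuth(expectedUser, expectedPassword) {
  return function (req, res, next) {
    const header = req.headers.authorization || "";
    const [scheme, encoded] = header.split(" ");

    if (scheme === "Basic" && encoded) {
      // The credentials arrive as base64("user:password")
      const decoded = Buffer.from(encoded, "base64").toString("utf8");
      const separator = decoded.indexOf(":");
      if (separator !== -1) {
        const user = decoded.slice(0, separator);
        const password = decoded.slice(separator + 1);
        if (safeEqual(user, expectedUser) && safeEqual(password, expectedPassword)) {
          return next();
        }
      }
    }

    // Ask the browser to prompt for credentials
    res.set("WWW-Authenticate", 'Basic realm="voicemail"');
    res.status(401).send("Authentication required.");
  };
}
```

With something like this in place, the middleware could be passed as an extra argument to the two hidden routes, in the same position where express.raw(...) is passed to /upload.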

First things first: let’s render the homepage. We will use ejs for this.

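// Home
// Requires at the top of server.js: const ejs = require("ejs");
app.get("/", (req, res) => {
  ...
  const data = {
    browser: req.headers["user-agent"],
  };

  ejs.renderFile("./index.html", data, {}, function (err, str) {
    if (err) {
      console.error(err);
      // Stop here so we never send an undefined body
      return res.status(500).send("Could not render page.");
    }
    res.send(str);
  });
});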

If we have template variables to pass, we can pass them in the data parameter. Just as a proof of concept, I added the client browser info as part of the response, so that we could see it working.

No longer a static site!

Step 2.5: Record and Upload Audio

Next, let’s add the ability to record a voicemail. You can follow the MDN guide here, or you can do what I did and borrow (read: steal) the code from a demo website. This demo uses WebAudioRecorder.js, which makes it easy to record multiple types of audio as a blob which can be easily uploaded.

Once you get the audio file blob from the user, you can upload it like this:

let response = await fetch(`${window.location.origin}/upload`, {
  method: "POST",
  body: blob,
  headers: { "Content-Type": "audio/wav" },
});

Next, let’s handle this upload in the /upload endpoint. First, let’s limit the number of messages that can be present on the server. A bad actor (my girlfriend) could easily leave multiple messages (love notes) and cause my hard drive to fill up quickly. So let’s first check to see that there are not too many voicemails already present:

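// Upload an audio file message, and get a response from OpenAI
app.post("/upload", express.raw({ type: "*/*", limit: "6mb" }), async (req, res) => {
  console.log("received upload");

  readdir("./public/messages", async (err, files) => {
    if (err) {
      console.error(err);
      // files is undefined when readdir fails, so stop here
      res.status(500).json({ message: "Could not read messages." });
      return;
    }

    // Limit to 20 messages
    if (files.length >= 20) {
      // Pass an object: res.json() serializes it, so the client can read json.message
      res.json({ message: "Message inbox full." });
      return;
    }
    ...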

One important note is the limit option in the /upload route, which caps the size of the request body. I set this to 6mb, which I estimated to be around 30 seconds of .wav audio.
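To sanity-check that estimate, here is a rough calculation. It is a sketch under assumptions I have not verified against the recorder settings: uncompressed PCM audio at 44.1 kHz, 16-bit samples, and a canonical 44-byte WAV header. The wavSeconds helper is hypothetical and only exists for this illustration.

```javascript
// Assumed recording format (not confirmed by the recorder settings):
// 44.1 kHz sample rate, 16-bit samples, stereo unless stated otherwise.
const WAV_HEADER_BYTES = 44; // size of a canonical PCM WAV header

// Approximate playback length in seconds of a PCM WAV file of totalBytes.
function wavSeconds(totalBytes, sampleRate = 44100, channels = 2, bitsPerSample = 16) {
  const bytesPerSecond = sampleRate * channels * (bitsPerSample / 8);
  return Math.max(0, totalBytes - WAV_HEADER_BYTES) / bytesPerSecond;
}

// Express parses "6mb" as 6 * 1024 * 1024 bytes
const limitBytes = 6 * 1024 * 1024;

console.log(wavSeconds(limitBytes).toFixed(1)); // stereo: prints "35.7"
console.log(wavSeconds(limitBytes, 44100, 1).toFixed(1)); // mono: prints "71.3"
```

Under these assumptions the 6mb limit allows roughly 35 seconds of stereo audio, or a little over a minute of mono, so the limit is in the right ballpark.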

Now here is where the fun begins. Once the user uploads the file, we want to do a few things:

  1. Save the file
  2. Respond with a witty remark
  3. Notify me of a new message by email

1 — Save the file — To save the file, we will use writeFileSync in the fs package.

...
const audioBuffer = Buffer.from(req.body);
const dateTime = new Date().toISOString();
writeFileSync(`./public/messages/${dateTime}.wav`, audioBuffer);
...
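Because the /upload route accepts any content type (type: "*/*"), it may be worth checking that the body actually looks like a WAV file before writing it to disk. The following is a minimal sketch of such a check, based on the RIFF/WAVE marker at the start of every WAV file. The looksLikeWav helper is hypothetical and is not part of the original server.

```javascript
// Returns true when the buffer starts with a RIFF container header
// whose format field is "WAVE" (bytes 0-3 and 8-11 respectively).
function looksLikeWav(buffer) {
  return (
    Buffer.isBuffer(buffer) &&
    buffer.length >= 12 &&
    buffer.toString("ascii", 0, 4) === "RIFF" &&
    buffer.toString("ascii", 8, 12) === "WAVE"
  );
}

// Example: a fake header followed by the start of a "fmt " chunk
const sample = Buffer.concat([Buffer.from("RIFF"), Buffer.alloc(4), Buffer.from("WAVEfmt ")]);
console.log(looksLikeWav(sample)); // true
console.log(looksLikeWav(Buffer.from("not audio at all"))); // false
```

This only inspects the first 12 bytes, so it filters out obviously wrong uploads rather than proving the file is valid audio.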

Now you have a voicemail server! You could stop here, but I decided to take it to the next level with AI responses.

2 — Respond with a witty remark — We want to give the impression that not only was the message received, but that it is important enough to merit an immediate response.

// Convert audio to text
const text = await getAudioText(openai, audioFile);

// Get completion from OpenAI
const completion = await getOpenAICompletion(openai, text);

Each of these functions will be calling a different OpenAI endpoint. First we convert the audio file to text using the audio.transcriptions endpoint, then we come up with a relevant response using the chat.completions endpoint.

/**
 * Convert audio to text using OpenAI
 * @param {OpenAI} openai - OpenAI client
 * @param {File} audioFile - Audio file
 * @returns {Promise<string>} The transcribed text
 */
async function getAudioText(openai, audioFile) {
  const audioText = await openai.audio.transcriptions.create({
    file: audioFile,
    model: "whisper-1",
    language: "en",
    temperature: 0.9,
    response_format: "json",
  });

  return audioText.text;
}

/**
 * Get an OpenAI chat response based on the user input
 * @param {OpenAI} openai - OpenAI client
 * @param {string} text - User input as text
 * @returns {Promise<string>} The content of the first completion choice
 */
async function getOpenAICompletion(openai, text) {
  const completion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    max_tokens: 200,
    temperature: 0.9,
    top_p: 1,
    frequency_penalty: 0,
    presence_penalty: 0,
    messages: [
      {
        role: "system",
        content: "You are a happy-go-lucky person. Come up with a funny quip about the last thing the user said.",
      },
      { role: "user", content: text },
    ],
  });

  return completion.choices[0].message.content;
}

As you can see, I chose a system prompt that would let the completion service know what tone of response I am looking for. I wanted the responses to be kind of whimsical without trying too hard, so I came up with “You are a happy-go-lucky person. Come up with a funny quip about the last thing the user said.” — which seems to work decently well.

Voicemail: Hey can you call me back? I need to know if you want to go to the game tonight.

Response: Sure, I’ll call you back faster than a baseball player stealing second base! Just make sure you don’t drop the call like a fly ball, okay?

Oh GPT, you always know JUST what to say

I returned this response to the user and displayed it on the page.

// In server.js
return res.json({ message: completion });

// In the client-side script: read the POST response
const json = await response.json();
alert(json.message);

3 — Notify me of a new message by email — We want the ability to be notified when a voicemail is received. There are many services that will send an email based on an API request. For this project, I decided to go with Mailgun for the simple reason that their JS and Python example requests were well written and the APIs are well documented (sometimes the fastest solution is the best!).

Using the mailgun.js library:

const formData = require("form-data");
const Mailgun = require("mailgun.js");

/**
 * Send an email notification
 * @param {string} text - Input of the user as a string
 * @param {string} completion - Completion response from OpenAI
 */
function sendNotificationEmail(text, completion) {
  const mailgun = new Mailgun(formData);
  const mg = mailgun.client({ username: "api", key: config.MAILGUN_EMAIL_API_KEY });
  const today = new Date().toISOString();

  mg.messages
    .create(config.MAILGUN_DOMAIN, {
      from: `RPi Message <mailgun@${config.MAILGUN_DOMAIN}>`,
      to: [config.MY_EMAIL],
      subject: `RPi Message Received ${today}`,
      text: "Text: " + text + "\n\nResponse: " + completion,
      html: "<p>Text: " + text + "</p><p>Response: " + completion + "</p>",
    })
    .catch((err) => console.log(err));
}
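One thing to be aware of: the html field above is built by concatenating the transcription and the completion directly into markup, so anything a caller says (or anything the model returns) ends up in the email as raw HTML. A minimal sketch of escaping those values first is shown below. The escapeHtml and buildEmailHtml helpers are hypothetical and are not part of mailgun.js or of the original code.

```javascript
// Characters that have special meaning in HTML, and their entity replacements
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replace HTML-significant characters so the value renders as plain text.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Build the same two-paragraph body as the email above, with escaped values.
function buildEmailHtml(text, completion) {
  return `<p>Text: ${escapeHtml(text)}</p><p>Response: ${escapeHtml(completion)}</p>`;
}

console.log(buildEmailHtml("<b>call me</b>", "Sure & soon!"));
```

The result of buildEmailHtml could then be passed as the html value, while the plain text field can stay as it is.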

Note that some email services will mark automated email servers as spam. You can follow the instructions here to whitelist an address in Gmail.

After all of this, you should be able to record voice messages and save them when you open the site through localhost. As mentioned at the beginning, browsers will not let you access mediaDevices (including the client’s microphone) unless the page is served from localhost or over HTTPS. Therefore, the following steps detail how you can host the site publicly and secure it with TLS, so that friends and family can leave messages.

Step 3: Set Up NGINX

We will use nginx as a reverse proxy so that incoming requests are passed to our Express server. I mostly followed the instructions in this video, but the highlights are:

sudo apt update
sudo apt install nginx
sudo /etc/init.d/nginx start

Then run:

sudo nano /etc/nginx/sites-available/default

And add configuration to allow requests to be served by localhost:

server {
    ...
    location / {
        # First attempt to serve request as file, then
        # as directory, then fall back to displaying a 404.
        proxy_pass http://localhost:5000;
        try_files $uri $uri/ =404;
        client_max_body_size 25M;
    }
    ...

client_max_body_size 25M; allows for larger upload size, which may be necessary for uploading large wav files.

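The configuration above is all you need if your application only serves the home page, because try_files returns a 404 for any path that is not a file or directory on disk. If you have other routes, you can either remove the try_files line so that location / proxies every request, or add rules for those routes manually, or use regex like I did here: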

server {
    ...
    # Home page routing
    location / {...}

    # Proxy routes made of letters, digits, dots, slashes,
    # colons, hyphens, and question marks to my application.
    location ~ ^/([a-zA-Z\.\/\:\-0-9\?]+)$ {
        proxy_pass http://localhost:5000/$1;
    }
    ...
}
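If you want to see which paths that pattern accepts before touching nginx, you can try the same expression in Node. This is only an approximation: it is a JavaScript translation of the nginx (PCRE) pattern, and the proxiedPath helper is hypothetical. It returns the part of the path that would be captured as $1, or null when the location would not match.

```javascript
// JavaScript translation of the nginx pattern ^/([a-zA-Z\.\/\:\-0-9\?]+)$
const routePattern = /^\/([a-zA-Z./:\-0-9?]+)$/;

// Returns the captured group (what nginx exposes as $1), or null if no match.
function proxiedPath(uri) {
  const match = routePattern.exec(uri);
  return match ? match[1] : null;
}

console.log(proxiedPath("/record")); // "record"
console.log(proxiedPath("/messages/a.wav")); // "messages/a.wav"
console.log(proxiedPath("/")); // null: handled by the home page location instead
console.log(proxiedPath("/my_route")); // null: underscores are not in the character class
```

Two things stand out from experimenting with it: routes containing characters outside the class (such as underscores) will not be proxied, and as far as I understand nginx matches locations against the path without the query string, so the question mark in the class is unlikely to ever come into play.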

After making any changes to the default file, you will need to restart nginx with:

sudo /etc/init.d/nginx restart

In step 5 (Secure Your Domain), changes will automatically be made to this file, so you won’t have to edit it directly for that step.

Step 4: Register Domain

I used noip.com, which is free for a single domain. To get your public IP address, you can look in your ISP’s admin console (for me, on Verizon, I had to click on ‘manage your internet’ and scroll down to where it listed my IPv4 address). You will need to use this value when registering a No-IP domain so it knows where to forward requests.

After this step you should be able to view the site on the world wide web!

Step 5: Secure Your Domain

There are a few ways to go about securing your website, but if you already have a web server up and running, the best and easiest option is certbot. I found it very easy to use, but note that it requires you to allow a bit of software to run on your server at all times.

Assuming you are running Raspberry Pi with the Raspbian operating system, you’ll likely need to choose Debian as your OS, since Raspbian is a variant of Debian. You can follow the instructions here. Be sure to run the command specified for nginx:

sudo certbot --nginx

After this step, you should automatically see changes to your /etc/nginx/sites-available/default file.

One important step that is not mentioned in the certbot steps is ISP configuration. You will need to visit your ISP’s admin console and add HTTPS port forwarding if it is not already present. In my case, I had to set up IPv4 port forwarding through Verizon. I selected my Raspberry Pi as the device, and specified HTTPS, TCP1 -> 443. For certbot to work, you may also need to forward HTTP requests on port 80 to your Pi. Furthermore, as time goes on your ISP may occasionally change your IP address, so you may have to re-open your device to HTTP on port 80 for certbot to work its magic and renew your certificate.

That’s it! After restarting the server and reloading your domain, you should be able to see the lock icon when you visit the site. From here, you won’t have problems recording audio.

A couple of last suggestions:

  1. Set NODE_ENV to production. This is not strictly necessary for such a lightweight app, but it may make logging a bit cleaner. You can do this by executing: export NODE_ENV=production
  2. Run server on startup. This is useful because if you ever want to restart the server, all you have to do is turn the Pi off and on. This can be done by adding the following lines to the very end of your .bashrc file. Note that .bashrc runs whenever an interactive shell starts, so this relies on the Pi opening a shell when it boots:
echo "Starting nginx and server..."
# Run the server
cd /path/to/server/
node server.js

And that’s it! Now you can automatically respond to all your friends’ messages without taking your eyes off TikTok. Next steps are to make sure your code is written to be fault tolerant, and to work on preventing attacks like DDoS and injection. Oh yeah, and limiting requests per day so that nobody is running up your tabs on OpenAI and Mailgun!
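As a starting point for that last item, here is a minimal sketch of a per-day request limiter that keeps its counts in memory. It is not part of the project as it stands: createDailyLimiter is a hypothetical helper, the counts reset at midnight UTC, and everything is lost whenever the server restarts.

```javascript
// Hypothetical in-memory limiter: allows up to maxPerDay calls per key
// (for example a client IP) for each UTC calendar day.
// The optional now argument makes the clock injectable for testing.
function createDailyLimiter(maxPerDay, now = () => new Date()) {
  const counts = new Map();
  let currentDay = null;

  return function allow(key) {
    const day = now().toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
    if (day !== currentDay) {
      // A new day has started: forget yesterday's counts
      counts.clear();
      currentDay = day;
    }

    const used = counts.get(key) || 0;
    if (used >= maxPerDay) {
      return false;
    }
    counts.set(key, used + 1);
    return true;
  };
}

// Example: two uploads per day per key
const allowUpload = createDailyLimiter(2);
console.log(allowUpload("203.0.113.7")); // true
console.log(allowUpload("203.0.113.7")); // true
console.log(allowUpload("203.0.113.7")); // false
```

In the /upload handler, the idea would be to call the limiter before contacting OpenAI or Mailgun and respond with a 429 status when it returns false. Keep in mind that behind nginx every request appears to come from the proxy unless the original client address is forwarded, so choosing the key needs some care.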
