How We Got Alexa to Join Our Band at TechCrunch Disrupt

Capital One Tech · Dec 14, 2017

By Ardon Bailey, Software Engineer; Nagkumar Arkalgud, Software Engineer;
Timothy Street, Associate Software Engineer, Capital One

Music production in a Capital One blog? Absolutely. We have a number of software engineers at Capital One who are also passionate about using their technical skills to produce great music.

Earlier this year, we (Tim Street, Nagkumar Arkalgud, and Ardon Bailey) attended the TechCrunch Disrupt Hackathon in San Francisco in hopes of designing something that allows recording artists to create music using, wait for it…only their voice. The result was Odis, a platform that interfaces with Amazon Alexa and converts natural language instructions into MIDI (Musical Instrument Digital Interface) commands. With Odis, music producers can record, play back, and edit tracks (songs) in studio recording software in real time.

Odis was so well received at the hackathon that it not only was featured on TechCrunch.com, it also won the Amazon Alexa award.

There Has to Be a Better Way

Before Ardon Bailey became a software engineer at Capital One, he was a DJ/Tablist who regularly competed in DJ battles and performed at concerts and parties. When creating certain tracks, he found it awkward to record with an instrument in hand. For example, when playing guitar, the steps to record a sound are as follows:

  1. Hold guitar, ready to play
  2. Begin recording by physically triggering record in recording software
  3. Reposition as quickly as possible to play guitar
  4. Play until you’re satisfied with the sound
  5. Press stop recording as quickly as possible

The back-and-forth routine between the computer and the instrument can be quite tedious, especially during long sessions. And if that wasn’t enough, the instrument often physically gets in the way. Unless there’s someone around to lend a helping hand, you have to do all of these steps yourself.

Enter the idea for Odis. As we brainstormed our hackathon project, we knew we wanted to make the track creation process as easy as possible. Our number one priority: make it more convenient for artists to create content in their chosen environment. To satisfy this constraint, we needed something that was easy to integrate into existing recording setups. So our solution was a hands-free, voice-controlled recording assistant.

Using Odis, recording sounds on a guitar is simplified:

  1. Hold guitar, ready to play
  2. Say: “Alexa, ask Odis to start recording”
  3. Play until you’re satisfied with the sound
  4. Say: “Alexa, ask Odis to stop recording”

Because the recording session is entirely voice controlled, artists can focus on what matters: the music. Control of the studio recording software is literally taken out of the artists’ hands and offloaded onto Alexa, who does all the heavy lifting.

Side note: Odis also implements other common features seen in studio recording software, such as sound playback, adding and removing MIDI effects, etc. Theoretically, anything supported in the MIDI library is fair game.

The Architecture

The architecture behind Odis is simple, and can be implemented across many different DAWs (Digital Audio Workstations). Since Odis’ Alexa voice commands resolve down to MIDI instructions, it is easily integrated with most DAWs.

Sketch of the Odis Architecture

Odis has three main components:

1. Alexa Skill (Python)
We decided to use a Lambda function triggered by the Alexa Skills Kit as our interface to Odis. The Lambda is written in Python and handles interpreting natural language instructions, formatting HTTP request payloads, and then sending them to the client application (running on the user’s computer) using Pusher.
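
To make that flow concrete, here is a minimal sketch of what such a handler could look like. The intent names and the publish_command() helper are illustrative assumptions on our part, not the actual Odis source:

```python
# Minimal sketch of an Alexa Skills Kit-triggered Lambda handler (Python 3.6).
# Intent names and publish_command() are hypothetical placeholders.

# Map Alexa intents to the commands the MIDI client understands.
INTENT_TO_COMMAND = {
    "StartRecordingIntent": {"action": "record", "state": "start"},
    "StopRecordingIntent": {"action": "record", "state": "stop"},
    "PlaybackIntent": {"action": "playback"},
}


def lambda_handler(event, context):
    request = event["request"]
    if request["type"] != "IntentRequest":
        return build_response("Welcome to Odis. What would you like to do?")

    command = INTENT_TO_COMMAND.get(request["intent"]["name"])
    if command is None:
        return build_response("Sorry, Odis doesn't know that command yet.")

    publish_command(command)  # hand the payload to Pusher (see the next section)
    return build_response("OK.")


def build_response(speech_text):
    # Standard Alexa response envelope: speak the text and end the session.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": True,
        },
    }
```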

2. Pusher
Pusher was a central part of the architecture behind Odis. At a high level, it translates RESTful web requests into messages over a WebSocket. Because Pusher handles all the logic associated with opening and closing WebSocket connections, all we had to do was add calls to Pusher in the Alexa Skills Kit Lambda and give the MIDI client the ability to listen for incoming requests over the Internet. Because Pusher is free and does most of the work for us, it was an easy decision to make it our communication tool between Alexa and the client.
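
On the Lambda side, publishing a command comes down to a single call to Pusher’s REST API. Here is a minimal sketch using the official pusher package for Python (the channel and event names are placeholders of ours, not necessarily what Odis uses):

```python
import pusher  # official Pusher server SDK for Python

# Credentials come from the Pusher dashboard; these values are placeholders.
pusher_client = pusher.Pusher(
    app_id="PUSHER_APP_ID",
    key="PUSHER_KEY",
    secret="PUSHER_SECRET",
    cluster="us2",
    ssl=True,
)


def publish_command(command):
    # One trigger() call turns the Lambda's HTTPS request into a WebSocket
    # message that the MIDI client receives on the "odis" channel.
    pusher_client.trigger("odis", "midi-command", command)
```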

3. MIDI Client (Node.js)
The MIDI client runs alongside the studio recording software on the user’s computer. The client wraps the MIDI library and listens over a WebSocket for incoming requests from the Alexa Skills Kit Lambda. When a request is received, the client maps its content to a specific MIDI instruction and sends it to whatever recording software is running alongside it.
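
The client itself is written in Node.js and isn’t reproduced here, but the core idea, turning an incoming command into a MIDI message the DAW has been mapped to respond to, can be sketched in a few lines. For illustration only, here is what that mapping might look like in Python with the mido library (the control-change numbers are placeholders you would MIDI-map inside the DAW):

```python
import mido  # Python MIDI library, used here purely for illustration

# Hypothetical mapping: each command from Pusher becomes a MIDI control-change
# message that the DAW has been MIDI-mapped to (e.g. CC 117 toggles record).
COMMAND_TO_CC = {
    ("record", "start"): (117, 127),
    ("record", "stop"): (117, 0),
    ("playback", None): (118, 127),
}

port = mido.open_output()  # virtual/loopback MIDI port routed into the DAW


def handle_command(command):
    key = (command.get("action"), command.get("state"))
    if key not in COMMAND_TO_CC:
        return
    control, value = COMMAND_TO_CC[key]
    port.send(mido.Message("control_change", control=control, value=value))
```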

Overcoming Challenges

We faced a number of roadblocks when creating Odis.

1. Creating a deployment package for AWS Lambda
While Amazon’s documentation on creating deployment packages for Lambda seems simple and straightforward, it wasn’t quite thorough enough for us. We had trouble including some of the third-party libraries and configurations necessary to run Odis the way we wanted. Thankfully, we were able to contact a colleague who had already built deployment packages for Lambda in production. We called him up and he was happy to help us crush this roadblock. Thanks Chey!

2. SSL issue in Python 2.x for AWS Lambda
When we created the deployment package and tried to send a push notification via Pusher (i.e., send a RESTful HTTPS request from Lambda), the request could not be completed. After troubleshooting, we found that Lambda uses its own security modules instead of the ones uploaded in the deployment package. We solved this issue by converting our Python 2.x codebase to 3.6, which follows different standards for dependencies in deployment packages. After the change, everything “magically” worked!

3. Add cool
In our opinion, one of the coolest features in Odis is the ability to add predefined effects to previously recorded music. We wanted it to work on the command “Alexa, ask Odis to add effect,” but no matter how much we tested it, the trigger for that phrase wouldn’t fire. As it turns out, Alexa’s language detection gets confused between the pronunciations of ‘affect’ and ‘effect,’ so at the last minute we changed the phrase to “Alexa, ask Odis to add cool.”

The Future of Odis

Moving forward, we hope to explore other DAWs, and possibly build a desktop application that runs a script to configure the MIDI control mappings for different DAWs.

These opinions are those of the author. Unless noted otherwise in this post, Capital One is not affiliated with, nor is it endorsed by, any of the companies mentioned. All trademarks and other intellectual property used or displayed are the ownership of their respective owners. This article is © 2017 Capital One.
