Back in late 2017, when Amazon launched their C++ SDK for Alexa, we thought it would be fun to port the SDK to the ESP32. Not only would it showcase the powerful development tools and execution environment that the ESP32 currently has, it would also be a great way to run Alexa on the ESP32.
We recently launched a beta preview of this Alexa on ESP32 port on GitHub.
The Alexa C++ SDK is targeted at microprocessors, and is quite heavy to run on microcontrollers. We wanted to see what kind of load it would generate on the ESP32. Thankfully, the hardware (ESP WROVER module) and the software development framework (IDF) were robust enough to support such a performance-intensive use case. Here are some details:
In a normal Wi-Fi connected state, the ESP32 typically runs about 13 threads for its operation. These include threads for Wi-Fi, the network stack and the application, among other things.
In normal Alexa operation, the SDK runs a whopping 47 threads (inclusive of the 13 threads above) to get the job done. All these threads merrily coordinate with each other on the ESP32’s two cores, performing audio record, transmit, receive, decode and playback operations.
All these threads need to have their stacks in memory. Additionally, we need fairly large ring buffers for audio record and playback. And then there are two TLS connections (one for HTTP/2, used for the primary Alexa communication, and the other for HTTP/1.1, used for managing OAuth).
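To make the idea of an audio ring buffer concrete, here is a minimal sketch of a fixed-capacity byte ring buffer. The class name AudioRingBuffer and its methods are hypothetical and chosen for illustration; this is not the buffer used in the SDK port. It is also single-threaded for brevity, whereas a real record or playback path would need locking or a lock-free scheme between the producer and consumer threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-capacity byte ring buffer (illustration only,
// not the implementation used in the Alexa port).
class AudioRingBuffer {
public:
    explicit AudioRingBuffer(std::size_t capacity)
        : buf_(capacity), head_(0), tail_(0), size_(0) {
        assert(capacity > 0);
    }

    // Copies up to len bytes into the buffer; returns how many were accepted.
    // Bytes that do not fit are left with the caller.
    std::size_t write(const std::uint8_t *data, std::size_t len) {
        const std::size_t n = std::min(len, buf_.size() - size_);
        for (std::size_t i = 0; i < n; ++i) {
            buf_[head_] = data[i];
            head_ = (head_ + 1) % buf_.size();  // wrap around at the end
        }
        size_ += n;
        return n;
    }

    // Copies up to len of the oldest bytes out; returns how many were read.
    std::size_t read(std::uint8_t *out, std::size_t len) {
        const std::size_t n = std::min(len, size_);
        for (std::size_t i = 0; i < n; ++i) {
            out[i] = buf_[tail_];
            tail_ = (tail_ + 1) % buf_.size();  // wrap around at the end
        }
        size_ -= n;
        return n;
    }

    std::size_t available() const { return size_; }

private:
    std::vector<std::uint8_t> buf_;
    std::size_t head_;  // next write position
    std::size_t tail_;  // next read position
    std::size_t size_;  // bytes currently stored
};
```

In a sketch like this, the recorder would call write() with freshly captured samples and the network or playback side would call read(); a short return value from write() tells the producer that the consumer is falling behind.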
The SPIRAM (external SPI memory) is heavily used for many of these buffers and thread stacks. Although it is accessed over SPI (and is hence slower than the main memory), the caches on the ESP32 ensured that we did not see any end-user visible degradation.
In terms of memory, we try to keep around 15–20KB of main memory free, and the SPIRAM is about half full (2MB).
Given the footprint constraints of the platform and the size of the Alexa C++ SDK, we had to know how much each component added to the static memory footprint, and then optimize the components that added too much. The idf_size.py utility in IDF was a very important tool for identifying which libraries add to the static footprint, and what can be optimized out.
A bountiful path
Additionally, we made a few minor modifications to better support some of the use cases. We list them here in case you find them useful.
Alexa requires an HTTP/2 connection with their cloud, so we started with the HTTP/2 client (nghttp2) that is part of IDF. The nghttp2 library is very flexible with its multitude of callbacks. Because of that flexibility, though, it is easy to miss the forest for the trees once you start using it. So we created a tiny layer on top of it called sh2lib (simple-http2 library). As with any simplifying layer, it offers simplicity at the cost of flexibility. But with this simplification we could keep the code more organised, as in this example. That simplicity-flexibility tradeoff may not be for everyone, so it is kept in IDF’s examples/ section for now.
The next stop was TLS. We created a layer, esp-tls, on top of mbedTLS. This layer encodes the common tasks of setting up a TLS session and exchanging data over that session. Apart from simplicity, the layer tries to ensure that it chooses secure default configurations with minimal scope for error. This is to avoid situations like "Oh, I forgot to perform server certificate validation" or "Oh, I didn’t set up CN verification". This layer is also now a part of IDF.
IDF already includes C++ development support. The Alexa C++ SDK makes extensive use of the C++ features that IDF supports (C++11 included), such as threads, shared locks, smart pointers, futures and lambda expressions.
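As a rough illustration of the kind of C++11 code this involves, the sketch below combines a thread, a lambda, a smart pointer, a future and a mutex. The function names sum_on_worker and count_with_threads are hypothetical and are not taken from the Alexa SDK; the code uses only the standard library, so it can be tried on a desktop compiler as well.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: sums the samples on a worker thread and hands
// the result back to the caller through a promise/future pair.
int sum_on_worker(std::shared_ptr<std::vector<int>> samples) {
    assert(samples);  // the sketch expects a non-null buffer
    std::promise<int> result;
    std::future<int> pending = result.get_future();

    // The lambda copies the shared_ptr, so the data stays alive
    // for as long as the worker needs it.
    std::thread worker([samples, &result]() {
        int total = 0;
        for (int s : *samples) {
            total += s;
        }
        result.set_value(total);
    });

    const int total = pending.get();  // blocks until the worker is done
    worker.join();
    return total;
}

// Hypothetical helper: several threads increment one counter,
// with a mutex guarding every update.
int count_with_threads(int num_threads, int increments) {
    int counter = 0;
    std::mutex lock;
    std::vector<std::thread> workers;
    for (int i = 0; i < num_threads; ++i) {
        workers.emplace_back([&counter, &lock, increments]() {
            for (int j = 0; j < increments; ++j) {
                std::lock_guard<std::mutex> guard(lock);
                ++counter;
            }
        });
    }
    for (auto &t : workers) {
        t.join();
    }
    return counter;
}
```

On a desktop toolchain this typically needs the threading library at link time (for example -pthread with GCC or Clang).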
All in all, the hardware and software platforms have been robust and comprehensive enough to meet the demands thrown at them. We will continue to improve them even further. It’s been an exciting project to work on.