One order of NLP, to go!

Kat Judge · Published in Slalom Build
Jun 5, 2020 · 8 min read

Natural Language Processing: Voice ordering with Amazon Transcribe and Amazon Lex

Slalom was recently approached by a Quick Serve Restaurant (QSR) client that wanted to leverage voice-to-text and Natural Language Processing (NLP) technology in a way that would increase customer throughput. The environment was expected to be a high-traffic flagship store, and we had to integrate this new technology without sacrificing customer experience. Oh, and it had to be done in five weeks.

The Slalom team was beyond excited (perhaps even a little blinded by said excitement) to take on this HUGE challenge. These are our experiences over those weeks.

The builders

As the Solution Owner, it was my job to quickly align the team on how we would tackle the next five weeks. With a stout team that included a Solution Architect, Experience Designer, Senior Engineer, Cloud, DevOps & Security Engineer, and Quality Engineer, we were up and running. And fast! Given the size and scope of this project, the team had to assemble, assess, and execute within the given timeframe, using our Product Engineering Methodology (PEM) to deliver a great proof of concept.

The stakeout

The core team of Experience Designer, Solution Architect and Solution Owner settled in at a few restaurant locations to quickly assess the current customer experience. As we observed, from entry to exit in the store, the customer decision time was the longest part of the process. Once the customer decided what they wanted, they would approach the counter and place an order. An employee would then fill that order and pass it to the cashier to confirm and complete the sale. What we learned pretty quickly was that the average order time was 3 minutes 45 seconds, with some orders taking a staggering 6 minutes to complete! The scope of our work didn’t allow us to address store operations, signage, or even in-store flow management, so we had to be hyper-focused on how the employee could place the order verbally to speed up ordering time. The on-site observations were critical to ensure our proof of concept fit within their current operational processes.

The work

Our charter was clear: use Amazon Lex and Transcribe to produce an end-to-end voice-to-text proof of concept application that would reduce order time for our client. We time-boxed the build to three weeks and began tackling the work.

And our solution was simple, yet complex. The simple: use Amazon Lex and Transcribe in our solution. The complex: enable the order taker to repeat the customer’s order into a noise-cancelling headset; Transcribe would then ‘hear’ and process the order. The customer and order picker would see a visual representation of the order, and on a submit command, the order would queue in the cashier’s point of sale terminal for tender.

“We had to consider a lot of nuance in the user interactions. After our observational visits, we created a journey map and user flows that synthesized the barriers around the order process and identified critical interaction points from logging in and pairing their headset, to adding and editing an order and deploying it to the cashier.”

Samantha Ingram, Experience Design Lead

Samantha’s big area of focus was to solve the problem of providing feedback to both the customer and the order taker in a meaningful way.

“The answer was a thoughtful yet non-intrusive visual interface and conversational design,” said Ingram. “Both employee and customer had similar visual interfaces, which needed to be glanceable and allow a real-time view of the order. For administrative tasks, we designed a different interface for activities like logging in, adding headsets, and reviewing logs and sales stats. Focusing on the order taking process, the most critical work was around keeping the language simple enough for a variable user base and thinking through order logic. Some of our largest challenges included combining items and editing, deleting, and scrolling solely by voice — while ensuring the companion UI reflected all these events clearly to the customer.”

All told, Samantha’s ability to understand the multiple human interactions and the technology being used was critical to creating visually engaging representations of customer orders.

Our Engineering team, led by Andrew Duncan, had the audacious task of building an architecture from multiple disparate AWS services, creating a language processing model, developing a user interface and connecting to the point of sale terminal. Spoiler Alert: If you think standing up the solution was the hard part, think again. But more on that later!

Duncan led the effort by identifying the technology stack. He chose Angular 8 / TypeScript for the front end and .NET Core with AWS Lambda, using the Serverless Framework to manage the deployment lifecycle.
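As a rough sketch of what such a Serverless Framework service definition might look like for .NET Core Lambdas (the service, function, and handler names here are invented for illustration, not the project’s actual ones):

```yaml
# Illustrative serverless.yml sketch; names and region are assumptions.
service: voice-order-poc

provider:
  name: aws
  runtime: dotnetcore3.1
  region: us-west-2

# .NET Lambdas deploy as a prebuilt zip artifact.
package:
  artifact: bin/Release/netcoreapp3.1/deploy-package.zip

functions:
  validationCodehook:
    # Handler format: AssemblyName::Namespace.ClassName::MethodName
    handler: VoiceOrder::VoiceOrder.Handlers::Validate
  fulfillment:
    handler: VoiceOrder::VoiceOrder.Handlers::Fulfill
```

With a definition like this, `serverless deploy` packages and provisions both functions in one step, which is what makes the framework attractive for managing the deployment lifecycle.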

Figure 1: Example AWS Architecture

“Things get interesting when we look at the design of the backend, beginning with the API Gateway. The implementation behind this design uses a websocket from the client to the Amazon Transcribe real-time streaming API. The results from Amazon Transcribe are sent back to the client and piped to Amazon Lex. Finally, based on a specific intent, Lex uses the fulfillment lambda to send the order to the client’s point of sale system.”

Andrew Duncan, Solution Principal
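The piping step Andrew describes can be sketched as a small filter: Transcribe’s streaming API emits both partial (in-flight) and final results, and only finalized utterances should be forwarded to Lex. The shapes below mirror the streaming API’s result fields but are simplified for illustration.

```typescript
// Minimal shapes echoing Amazon Transcribe's streaming result format
// (simplified for illustration).
interface TranscribeAlternative {
  Transcript: string;
}

interface TranscribeResult {
  IsPartial: boolean;
  Alternatives: TranscribeAlternative[];
}

// Keep only finalized, non-empty utterances from a batch of streaming
// results; partials are dropped so Lex only ever sees complete phrases.
function extractFinalTranscripts(results: TranscribeResult[]): string[] {
  return results
    .filter((r) => !r.IsPartial && r.Alternatives.length > 0)
    .map((r) => r.Alternatives[0].Transcript.trim())
    .filter((text) => text.length > 0);
}
```

Each string this returns is a candidate utterance to post to Lex, which then matches it against an intent.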

Senior Engineer Lynn Kitchner tackled the build out and integration of Lex, starting, of course, with a fair amount of research. Like most engineers, she hit the blogs to help formulate her strategy for the build out, reading everything she could in a short amount of time. Then, as is customary at Slalom, she enlisted her fellow engineers to expound on whatever she was still lacking.

Lynn says “It was after talking to other Slalom engineers that it became clearer how we could design our Lex Bot to be used on the backend. It was at this point in the process when I felt most excited about this build.”

Because the bot needed to distinguish between transactional and non-transactional conversation, the team discussed how to create commands that would work from Transcribe to Lex. We landed on a fairly rigid, intentional set of commands, ensuring that any communication spoken to Transcribe and sent to Lex was clearly an order command, not just conversation with the customer.
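A minimal sketch of that kind of gatekeeping, using a hypothetical command list (the real set was tuned to the client’s menu and workflow):

```typescript
// Hypothetical command prefixes for illustration only.
const ORDER_COMMANDS = ["start order", "add", "remove", "submit order", "cancel order"];

// Returns the matched command, or null when the utterance is ordinary
// conversation that should never be forwarded to Lex.
function matchCommand(utterance: string): string | null {
  const text = utterance.toLowerCase().trim();
  for (const cmd of ORDER_COMMANDS) {
    // Match the command alone ("submit order") or as a prefix
    // ("add two burgers").
    if (text === cmd || text.startsWith(cmd + " ")) {
      return cmd;
    }
  }
  return null;
}
```

Anything that returns `null` is treated as customer chatter and dropped before it ever reaches the bot.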

Lynn continues, “I very quickly learned that Lex needed to be configured with two lambdas per command to Lex — also known as an intent. It was labor intensive, but setting up those intents would enable Lex to pick up which intent the order picker was giving and execute a codehook lambda. This lambda would then validate the user input that existed within the command, ensuring that both a quantity and a product were indicated. If this returned as complete, the fulfillment lambda would execute and process the data. From there I built Lex into a state machine, moving through the order-processing states to get to the final step: sending the order to the POS system.”
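As a rough illustration of the validation codehook Lynn describes, here is a sketch against a simplified version of the Lex (V1) Lambda event format; the intent and slot names are assumptions, not the project’s actual ones.

```typescript
// Simplified shapes based on the Lex V1 Lambda event/response format;
// slot names ("Quantity", "Product") are illustrative.
interface LexEvent {
  currentIntent: {
    name: string;
    slots: { [name: string]: string | null };
  };
}

interface LexResponse {
  dialogAction:
    | { type: "Delegate"; slots: { [name: string]: string | null } }
    | {
        type: "ElicitSlot";
        intentName: string;
        slots: { [name: string]: string | null };
        slotToElicit: string;
      };
}

// Validation codehook: make sure the order taker supplied both a
// quantity and a product before Lex proceeds to fulfillment.
function validateOrderIntent(event: LexEvent): LexResponse {
  const slots = event.currentIntent.slots;
  for (const required of ["Quantity", "Product"]) {
    if (!slots[required]) {
      // Ask Lex to re-prompt for the missing slot.
      return {
        dialogAction: {
          type: "ElicitSlot",
          intentName: event.currentIntent.name,
          slots,
          slotToElicit: required,
        },
      };
    }
  }
  // Everything present: hand control back to Lex to move on.
  return { dialogAction: { type: "Delegate", slots } };
}
```

Only when this codehook delegates does the second, fulfillment lambda run and push the order along.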

Quality Engineer Jason Hill took the reins with Amazon Transcribe. The location of the store that would use this technology was the driving force behind how we worked with Transcribe. Specifically, the customers would have varying speech patterns, accents, and speech speeds (think a slow southern “hey y’all” to a speedy “here’s my order”). Jason quickly assembled a sample order script that would cover 90% of the words a customer would utter during an order — including product and quantity variables — and we recruited fellow Slalomites from across the country to record themselves reading from the script. Jason was also able to tune the model with a custom vocabulary and keyword filtering that mitigated Lex and Transcribe’s inclination to err when there were long pauses in the ordering process.

This is where our excitement turned into a lot of impromptu desk meetings complete with whiteboard hieroglyphics. The deeper we got into how the application processes language, the more we realized that there was significant work required around tuning the model. That work required more time and synapses than we had for the five-week POC, so we addressed what we could and logged the rest. Engineers typically aren’t satisfied with good enough, but for now they found peace with it.

A non-software complication that we knew we’d have to contend with was the headsets. While the inclination from the client was that we’d need fancy high-tech headsets, what we found was that $3 spent at a local store delivered better accuracy than the $100+ headsets we tested. What’s more, the design of the headset mattered! Because the end state of the order was visual (remember, speak the order first, then confirm it visually), the $3 headset was less obstructive. Ultimately, the production headset would have a higher price point, but because we were intentional about the variety we tested, we were able to manage operational cost along the way.

Last but not least, we were fortunate to have Tyler Sims, our Cloud, DevOps and Security Solution Architect, lead the way on implementing CI/CD and security while ensuring our infrastructure was secure and flexible. Tyler used Terraform as the infrastructure-as-code solution because of how easy it was to handle and import resources, as well as its ability to query attributes of existing resources. For security scanning, we needed to check our Terraform templates for any potential issues, which also allowed us to perform many of the recommended security checks before building our resources in AWS. Finally, we used GitHub for version control and CodeBuild to build and test our code, since it integrates easily with other AWS services.
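To make that concrete, here is a minimal Terraform sketch of the kinds of resources involved — a websocket API and a Lambda behind it. All names, the runtime, and the values are assumptions for illustration, not the project’s actual configuration.

```hcl
# Illustrative sketch only; names and values are assumptions.
resource "aws_apigatewayv2_api" "voice_ws" {
  name                       = "voice-order-websocket"
  protocol_type              = "WEBSOCKET"
  route_selection_expression = "$request.body.action"
}

resource "aws_iam_role" "lambda_exec" {
  name = "voice-order-lambda-exec"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

resource "aws_lambda_function" "fulfillment" {
  function_name = "voice-order-fulfillment"
  runtime       = "dotnetcore3.1"
  handler       = "VoiceOrder::VoiceOrder.Handlers::Fulfill"
  filename      = "deploy-package.zip"
  role          = aws_iam_role.lambda_exec.arn
}
```

Because templates like this are plain declarative files, they can be scanned for misconfigurations in CI before `terraform apply` ever touches the AWS account — which is exactly the check-before-build workflow described above.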

The conclusion

We did it! We built a solid proof of concept: a voice-to-text application using AWS services, a third-party point of sale system, and a lot of ingenuity and grit. The highlight of the five weeks was the in-person demo we facilitated with our client. We turned our test lab into a fully functional storefront (minus the food, sadly) to provide an immersive experience with the technology. Simply talking through the technology wouldn’t be enough, so we had our clients give and take orders to experience all that the proof of concept was expected to do. We held our breath as the CTO began to order, and smiled broadly as his order began to display on the screen. In the end, the client loved the work. As with any demo, there were of course hiccups along the way; from the nuances of how to repeat an order to the always fun “whoa, where did this error come from,” we managed our way through. When the executives left, we lifted a whiskey in celebratory fashion: not only did we put this thing together, it worked.

(Oh, and that spoiler alert? The hardest part of this proof of concept wasn’t building the app, but getting the intents, conversational language, and the always-on nature of the application to work together in harmony. If I never hear “Start Order” again, I’ll be ok.)
