“Okay, Google. Build a voice app.”
Taking a look at the process behind building a conversational interface designed to make the web more accessible for all.
Not long ago we wrote about a proof-of-concept we worked on in collaboration with CNIB (formerly the Canadian National Institute for the Blind) that would empower users with a voice-activated calendar application. Today, we’re going to break down some of the more technical aspects of how we created the prototype and discuss some of the limitations we encountered along the way.
The CNIB Foundation is a non-profit organization delivering innovative programs and powerful advocacy, dedicated to assisting Canadians impacted by blindness to live the lives they choose.
CNIB has a Drupal-based website that is used by its employees. One part of some employees’ jobs is to update the CNIB calendar, which has details about all the events that CNIB hosts across Canada. These events are then published on CNIB’s website, where members can access them from desktop or mobile devices.
Myplanet partnered with CNIB to run a series of experiments with the objective of developing a two-way prototyped solution that would both enhance the experience of employees trying to create events in Drupal and improve a member’s experience when using mobile to access information related to CNIB events.
Having previously built other products that offer workplace voice assistance, we were keen to tackle a new challenge in building accessible workplace products.
CNIB employees, some of whom are blind or partially sighted, are responsible for content entry on the CNIB website. One key part of content entry for the team is creating events via the Drupal admin interface. These events go through approval and are then published on their website, which can be accessed by community members. With the aim of improving the experience and making it more accessible, we leveraged voice technology to build a more accessible solution for content creators.
For the prototype to be effective, we needed users to be able to publish events to the production site without moderation, which meant that we needed a voice assistant with an extremely high level of accuracy when it comes to natural language processing (NLP).
For our solution, we mapped out the event creation process, developed a conversational flow, and then converted it into an Alexa skill. The skill allows CNIB employees to create an event and add details such as a title, a brief description, start and end dates, start and end times, and a classification of the type of event being created. All of this can be entered and confirmed using voice alone.
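To make that concrete, here is a minimal sketch (with hypothetical slot names) of how values collected by such a skill might be validated and assembled into an event record. Alexa delivers AMAZON.DATE slots as ISO dates ("2019-06-14") and AMAZON.TIME slots as 24-hour times ("14:00"), which makes validation straightforward:

```python
from datetime import datetime

def event_from_slots(slots):
    """Validate collected slot values and assemble an event record.

    Slot names (eventTitle, startDate, ...) are illustrative; the real
    names come from the skill's interaction model.
    """
    start = datetime.fromisoformat(f"{slots['startDate']}T{slots['startTime']}")
    end = datetime.fromisoformat(f"{slots['endDate']}T{slots['endTime']}")
    if end <= start:
        # The skill would re-prompt the user instead of raising here.
        raise ValueError("event must end after it starts")
    return {
        "title": slots["eventTitle"],
        "summary": slots["eventSummary"],
        "start": start.isoformat(),
        "end": end.isoformat(),
        "event_type": slots["eventType"],
    }
```

In the real skill this logic would live inside an intent handler, with a confirmation prompt reading the assembled values back to the employee before anything is saved.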
The next step in building out our test solution was to integrate the Alexa skill with Drupal. We enabled the JSON:API module on CNIB’s Drupal instance, which exposes site content over a RESTful interface. Other fields, like the event image and event coordinator, were filled in automatically based on the event type the employee selected.
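A sketch of what that integration could look like: Drupal’s JSON:API module accepts POST requests with an `application/vnd.api+json` body. The field machine names below (`field_event_start` and so on) are illustrative, since the real names depend on how the event content type is configured:

```python
import json
from urllib import request

def build_event_payload(event):
    """Shape an event dict into a Drupal JSON:API document."""
    return {
        "data": {
            "type": "node--event",  # assumes a content type named "event"
            "attributes": {
                "title": event["title"],
                "body": {"value": event["summary"]},
                "field_event_start": event["start"],
                "field_event_end": event["end"],
                "field_event_type": event["event_type"],
            },
        }
    }

def create_event(base_url, event, auth_header):
    """POST the event to Drupal's JSON:API endpoint (not invoked here)."""
    req = request.Request(
        f"{base_url}/jsonapi/node/event",
        data=json.dumps(build_event_payload(event)).encode(),
        headers={
            "Content-Type": "application/vnd.api+json",
            "Accept": "application/vnd.api+json",
            "Authorization": auth_header,
        },
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

The skill’s backend would call something like `create_event` once the employee confirms the collected details.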
The employee-side solution was enabled through AWS Alexa for Business, which allowed the skill to be added to a shared device that any employee can use (and which therefore needs to be physically secured), or enabled on an individual employee’s personal device.
For employees, we built a solution that allows them to add event information via Alexa which is then stored in Drupal.
We found the system did a great job with structured fields like date, time, and trained keywords. But to be sure we were using the right natural language processor for our needs, we knew we needed to run an additional experiment using sample free-form speech input for both the event title and summary fields.
For this, we scheduled 30-minute sessions with multiple internal users and asked each of them to speak every sample input to all of the voice assistants we wanted to test, accounting for different speaking styles, speech rates, distances from the smart speaker, and high-background-noise scenarios. For a production application we can take this a step further through our friends at Pulse Labs, who specialize in fielding larger and more diverse groups of sample users for voice interfaces.
What we found was that prediction accuracy varied from 60% to 90%, depending on the voice assistant being used. Similar variation appeared in how each assistant handled proper names, capitalization, punctuation, and so on, which posed a serious challenge for creators wanting to publish without moderation.
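For reference, one common way to quantify this kind of transcription accuracy is word error rate: the word-level edit distance between the reference sentence and the transcript, divided by the reference length. A minimal sketch:

```python
def word_error_rate(reference, hypothesis):
    """Edit distance between word sequences, normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard Levenshtein DP, over words rather than characters.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),  # match / substitution
                d[i - 1][j] + 1,  # deletion
                d[i][j - 1] + 1,  # insertion
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

A 90%-accurate transcript corresponds roughly to a word error rate of 0.1, and even a single mis-heard word in an event title is enough to require moderation before publishing.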
For the other half of the solution — the event attendee side — we created a similar flow to the employee one. We mapped it out, created a conversational design, and then converted it into a Google Action.
The member-side solution is built on Google Assistant, which fetches event information through the same JSON:API interface. Members authenticate to Google Assistant on their phones (Android or iOS) and enable the CNIB action.
Once the action is enabled, they can ask the Assistant what events are happening by specifying a single date or a date range. The Assistant fetches this information from Drupal via JSON:API and reads back all the events happening on the specified dates.
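The date-range lookup maps naturally onto JSON:API’s collection filters. A sketch of how the action’s webhook might build that request (the `field_event_start` field name is an assumption; the condition-filter syntax with the `BETWEEN` operator is standard Drupal JSON:API):

```python
from urllib.parse import urlencode

def events_query(base_url, start_date, end_date):
    """Build a JSON:API URL fetching events within a date range."""
    params = {
        # "window" is just a label for this filter group.
        "filter[window][condition][path]": "field_event_start",
        "filter[window][condition][operator]": "BETWEEN",
        "filter[window][condition][value][]": [start_date, end_date],
        "sort": "field_event_start",  # read events back in order
    }
    return f"{base_url}/jsonapi/node/event?{urlencode(params, doseq=True)}"
```

The webhook would issue a GET against this URL and turn the returned `data` array into the spoken list of events.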
Members can register for an event just as they could on the CNIB website, but through voice interaction alone. Once registered, an automated confirmation email is sent to the member, and another is sent to the event program lead informing them of the updated registration information.
This Google Assistant integration with Drupal via JSON:API was a feature we were especially excited about: it gives users a natural, conversational way to find out what events are happening on a specific date or across a date range.
Outcomes and Next Steps
Partnering with CNIB, Myplanet believes that greater accessibility for both content creators and content users is possible through voice interaction. The complexity of event creation made this a fascinating project to work on: from the complications of calendar accessibility (like date and time ranges, and how best to convey that information by voice) to the ongoing evolution of voice interfaces themselves, particularly in NLP and the technology behind these systems, there were several opportunities to apply pieces of our previous work to a greater whole in creating this prototype.
One feature under consideration for the future is user authentication when interacting with Alexa. Ideally, users would authenticate to a shared device by voice; this is not currently available to Alexa third-party developers, though both Alexa for Business and Microsoft are expanding the authentication options available for voice experiences. Alternatively, a two-factor flow could be used, in which the employee speaks a code to Alexa that was sent to their mobile device or generated by an authenticator app.
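The authenticator-app variant could be built on standard time-based one-time passwords (TOTP, RFC 6238), which need nothing beyond a shared secret and the system clock. A minimal sketch using only the Python standard library:

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32, at=None, step=30, digits=6):
    """Compute an RFC 6238 time-based one-time password."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int((time.time() if at is None else at) // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def verify(secret_b32, spoken_code, at=None, window=1, step=30):
    """Accept the code for the current step, plus or minus `window` steps,
    to tolerate the time it takes to speak the code to Alexa."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret_b32, at=now + drift * step), spoken_code)
        for drift in range(-window, window + 1)
    )
```

The skill backend would hold each employee’s secret and call `verify` on the digits the employee speaks; the same secret enrolled in any standard authenticator app produces matching codes.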
Using Alexa for Business for the content creator experience and Google Actions for the consumer side, we were able to spin up an interaction that showed how both creating and accessing event content could be easier and more accessible for all through the power of voice interactions. As natural language processing and voice interfaces advance, these kinds of experiences will continue to evolve and make the web more accessible for all.
Take a look at the solution we created in action in the video below: