Beyond responsive design: approaches to render a website for a smart speaker

Several months before Amazon released the Echo, the first smart speaker with Alexa, I had already designed a paper prototype of such a device. I was very surprised that most of the features I had imagined were implemented by Amazon. The most impressive feature is indeed the ability to extend Alexa with new skills. Hell yeah!

But there is no skill for browsing the Internet. You can teach your website to talk to Alexa, but you need to learn how to program an interaction with the user through the Alexa API. It is not as simple as writing responsive HTML code. And what about adding support for Siri, Google Assistant, and Cortana, each of which requires learning yet another API?

Let me ask you a question: how often do you use a mobile device to browse the web? And what if I had asked this question ten years ago? See the difference? Mobile traffic reached 52% in 2017, but it was only 1% in 2007. I am sure a similar trend awaits voice traffic, i.e. traffic generated by humans using voice assistants.

There are many moments in your life when you could use a voice assistant: while driving a car, walking, taking a shower, or just chilling on your sofa. The problem right now is that there is no good way to browse a website using this piece of technology. Most of the people I have talked to admitted that they had thought about such a use case, but they were not sure whether browsing a website with the help of a voice assistant was even possible.

Well, there are several ways to do this! But which one is good?

Why do screen readers suck?

The first thought I had was to integrate a screen reader with Alexa, but I quickly realized it was a bad idea, and I can explain why. The modern tendency in accessibility is to make the whole page available to a screen reader. However, it would take ages to navigate through a page this way. I don’t think this is a good usability solution.

By contrast, a responsive website made for a mobile device keeps the amount of content on a page minimal. A smartphone screen is small, so the user should only see what is essential. All the information that is not directly relevant to the page is hidden. This approach is the exact opposite of the one used in screen readers, which try to read everything that appears on the page.

I think we should reduce the amount of content that is played to the user through a voice assistant rather than expose it all.

Second shot: REST API to voice

Sometimes I think that all good ideas in this world are already taken. After some very quick research I found the ApiToBot service, which provides an easy way to convert your web application into an Alexa skill or a Google Assistant bot.

I like this approach. Many web content management systems, such as Drupal or WordPress, provide REST APIs that can easily be used to create a bot for your site. However, this solution has some limitations: a) it does not scale well across a website or across the web; b) it is further from web design and closer to chatbot development.
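For instance, here is a rough Python sketch of how a bot backend could pull the latest posts from a WordPress site through its standard REST API (the domain is a placeholder):

```python
import requests

# Fetch the latest posts from a WordPress site via its built-in REST API.
# /wp-json/wp/v2/posts is the standard WordPress endpoint for posts;
# "example.com" is a placeholder.
response = requests.get(
    "https://example.com/wp-json/wp/v2/posts",
    params={"per_page": 5},
    timeout=10,
)
response.raise_for_status()

for post in response.json():
    # Each post carries a rendered title that a bot could speak back.
    print(post["title"]["rendered"])
```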

Actually, I want web developers to keep their current jobs rather than turn into chatbot developers, so I started researching a better solution.

What about reading RSS feeds?

RSS is a simple, minimalistic way to define data feeds, and it is supported by almost every website. It should be quite easy to make a voice assistant read an RSS feed.

The most common use case, listening to updates from your favorite websites, is automatically covered by this approach. So… profit!
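Here is a minimal sketch of the idea in Python, assuming the feedparser library and a placeholder speak() function standing in for the assistant’s text-to-speech call:

```python
import feedparser

def speak(text):
    # Placeholder for the assistant's text-to-speech call.
    print(text)

# Parse the feed and read the latest items aloud.
feed = feedparser.parse("https://example.com/rss")
speak(f"Here are the latest updates from {feed.feed.get('title', 'this site')}.")
for entry in feed.entries[:5]:
    speak(entry.title)
    speak(entry.get("summary", ""))
```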

However, this approach won’t work well if an RSS feed contains short descriptions instead of full stories. A solution to this problem would be to make the voice assistant visit the website and read the page itself. For inspiration, we can take a look at how reader mode works in Safari.
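As a rough sketch, a library like readability-lxml (an open-source take on the same reader-mode idea) could extract the main story from the linked page; the URL below is a placeholder:

```python
import requests
from lxml import html as lxml_html
from readability import Document

# Download the article page the feed item links to.
page = requests.get("https://example.com/full-story", timeout=10)

# Document() applies reader-mode-style heuristics to isolate the main story.
doc = Document(page.text)

# summary() returns cleaned-up HTML; flatten it to plain text for speech.
text = lxml_html.fromstring(doc.summary()).text_content()
print(doc.title())
print(text.strip())
```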

The only downside of the RSS-to-voice approach is that it does not solve the problem of navigating through the website.

So let’s use this approach as a starting point and develop a more complete solution.

Bingo! Navigate HQL: a human query language for navigating RSS feeds

Let’s say you are playing an RSS feed on your device. By default the feed returns a single page with ten items, but what if you want to play the next page?

Or what if a site has multiple RSS feeds for different categories? How do you tell a voice assistant to choose the right one?

Or what if you want to run a complex query to filter items by multiple criteria?

The solution would be to use natural language processing tools like Dialogflow from Google to understand human requests and return the proper RSS links.

To simplify the natural language processing, we can introduce a structured way to make queries, Navigate HQL, so that it is consistent across the web. Remember this term; I hope it will be the next big thing soon!
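Navigate HQL does not exist yet, so the grammar below is pure speculation on my part, but a spoken request normalized into a small set of verbs might look something like this:

```python
# A speculative sketch of what parsed Navigate HQL queries might look like.
# The verbs and fields are invented for illustration.
import re

def parse_hql(query):
    """Map a spoken query to a structured Navigate HQL request."""
    query = query.lower().strip()
    if match := re.match(r"(next|previous) page", query):
        return {"verb": "paginate", "direction": match.group(1)}
    if match := re.match(r"open (?:the )?(\w+) (?:feed|category)", query):
        return {"verb": "select_feed", "category": match.group(1)}
    if match := re.match(r"filter by (.+)", query):
        return {"verb": "filter", "criteria": match.group(1)}
    return {"verb": "search", "text": query}

print(parse_hql("next page"))              # {'verb': 'paginate', 'direction': 'next'}
print(parse_hql("open the reviews feed"))  # {'verb': 'select_feed', 'category': 'reviews'}
```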

Technically, it may work as an endpoint defined in the website header. This endpoint can accept a GET request with two parameters:

  • The URL of the current RSS feed, to provide context
  • The text of the human’s query

As a result, it outputs either a text answer or a link to an RSS feed.
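A minimal sketch of such an endpoint, written here with Flask; the /navigate-hql path, the parameter names, and the response shape are all my own invention:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# The path and parameter names below are invented for illustration.
@app.route("/navigate-hql")
def navigate_hql():
    feed_url = request.args.get("feed", "")  # current RSS feed, for context
    query = request.args.get("query", "")    # the human's spoken request

    if "next page" in query.lower():
        # Respond with a link to the next page of the current feed.
        return jsonify({"type": "feed", "url": feed_url + "?paged=2"})

    # Fall back to a plain text answer the assistant can read aloud.
    return jsonify({"type": "text", "text": f"Sorry, I can't answer: {query}"})

if __name__ == "__main__":
    app.run()
```

Returning either a text answer or a feed link keeps the assistant side simple: it either speaks the text or starts playing the new feed.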

Man does not live by RSS alone

More formats to support:

  • Atom
  • JSON Feed
  • Attributed HTML

Attributed HTML is regular HTML with specific attributes telling the voice assistant what exactly to read on the page. It is similar to ARIA labels, but more focused on the content that users really need.
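The attribute name below is hypothetical, but the idea could look like this: mark up the essential content, and let the assistant read only the marked parts (sketched with BeautifulSoup):

```python
from bs4 import BeautifulSoup

# A page marked up with a hypothetical data-voice attribute that flags
# the elements a voice assistant should read.
html = """
<article>
  <h1 data-voice="title">Beyond responsive design</h1>
  <nav>Home | About | Contact</nav>  <!-- skipped: no attribute -->
  <p data-voice="body">Smart speakers need a lighter way to browse the web.</p>
</article>
"""

soup = BeautifulSoup(html, "html.parser")

# Read only the attributed elements, in document order, and skip the rest.
for element in soup.find_all(attrs={"data-voice": True}):
    print(element.get_text(strip=True))
```

This is the same reduction principle as before: instead of reading everything like a screen reader, the page author decides what deserves to be spoken.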

Search engine capabilities

And yeah, there should be a good way to search across websites that support RSS feeds and provide a Navigate HQL endpoint. And I am talking about voice search.

Cross-platform is key

And of course, this approach should work on many platforms: Alexa, Google Assistant, Siri, and Cortana. So the best way to do it is through a new app that runs on every platform.

This year I am planning to build a proof-of-concept app for Alexa and potentially extend it to other platforms. People need something simple to start with.