Hey Google, I would like to buy a book at Bol.com

Willem Veelenturf
Published in Flock. Community
6 min read · Mar 15, 2018

Since the rise of the virtual assistant, developers have new ways of connecting existing APIs to conversational interfaces. In this blog post we want to share what we learned by using the Google Assistant in conjunction with the Bol.com public API. We used the standard tools offered by the Google Home Mini and its online documentation, in addition to the resources available on the Bol.com developer portal. The experience described here is not limited to Google or Bol.com, and we hope to inspire fellow software engineers.

Virtual Assistants

Interactions between humans and machines have mostly been driven by the capabilities of the machine. With the invention of buttons and levers came the need to learn how to operate them, and with the birth of modern peripherals like the mouse and keyboard, people had to gain experience before they could become effective users. Not anymore. Since the rise of the virtual assistant we can interact with our devices in a more natural way: speech, the way humans have communicated for thousands of years. Even though voice recognition software has been around for a few decades, deep learning algorithms for natural language processing are now disrupting the market. Devices like the Amazon Echo and Google Home, and assistants like Siri and Cortana, are expected to understand us when we talk to them.

When using these devices you almost forget that you are talking to a machine, unless of course it answers with the typical “I’m sorry, I don’t understand this yet”. Notwithstanding the obvious failure scenarios, this way of human-machine interaction feels far more natural. Since companies like Amazon and Google need a large adoption rate, augmenting the capabilities of their virtual assistants should be a breeze. This opens the way for other companies to piggyback on the service and offer customers new ways of interacting with their products. Since virtual assistants are extensible by design, we can add functionality tailored to our clients’ needs.

For our exercise we developed an extension for the Google Assistant that interacts with the open API provided by Bol.com; its documentation can be found at https://developers.bol.com.

Bol.com API

In order to connect our Google Home to a service, we opted for the Bol.com open API. Bol.com is an online retailer in the Netherlands that encourages third parties to connect to its platform through APIs. This requires outstanding documentation, which makes it a great candidate for our experiment. We used the API that exposes the catalog for us to search through. In this way we can add a natural language interface to an API that is already well suited for machine-to-machine interaction. We explored different ways of writing an app that integrates well. Here we will share our lessons learned and how to connect the Google Assistant to Bol.com, or to any API of choice.
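As a minimal sketch of what such a catalog lookup could look like, the snippet below searches the Bol.com catalog over HTTPS. The endpoint path, query parameters and response fields shown here are assumptions based on the public v4 catalog API as we remember it; check https://developers.bol.com for the current contract.

```typescript
// Minimal sketch: search the Bol.com catalog for a book title.
// NOTE: endpoint, parameters and response shape are assumptions; verify
// them against the official documentation at https://developers.bol.com.
import fetch from "node-fetch";

const API_KEY = process.env.BOL_API_KEY ?? ""; // hypothetical env var

interface CatalogProduct {
  title?: string;
  summary?: string;
  offerData?: { offers?: { price?: number }[] };
}

interface SearchResponse {
  totalResultSize?: number;
  products?: CatalogProduct[];
}

export async function searchBooks(query: string): Promise<SearchResponse> {
  const url =
    "https://api.bol.com/catalog/v4/search" +
    `?q=${encodeURIComponent(query)}&apikey=${API_KEY}&format=json`;
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Bol.com search failed: ${response.status}`);
  }
  return (await response.json()) as SearchResponse;
}

// Example usage:
// const result = await searchBooks("Ready Player One");
// console.log(`Found ${result.totalResultSize} results`);
```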

Lessons Learned

Actions SDK vs DialogFlow

With new technology and when developing software, failure lurks around the corner. But when failure is embraced, we can learn and share. To this end we want to give a few pointers on how we solved some of the problems we encountered. We started out with the Actions SDK, as it gives the developer full control. For one, ‘intents’ are stored in a JSON file, making them easy to share with other developers through version control. However, the Google Assistant never returned our custom-defined intents (e.g. ‘buy a book’); it always returned a default text intent. Moreover, the Actions SDK does not let you take advantage of Natural Language Processing (NLP). Alternatively, DialogFlow can be used: there, intents are entered in a web app and NLP is available. Since we wanted to build an app with the added benefit of NLP, we switched to DialogFlow.
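To make the custom intent concrete, here is a rough sketch of a fulfillment webhook for DialogFlow, assuming the v1 webhook request/response format we worked against. The intent name buy_a_book and the bookTitle parameter are hypothetical examples, not the exact ones from our app.

```typescript
// Sketch of a DialogFlow fulfillment webhook (v1 request/response format).
// Field names such as result.metadata.intentName and the 'bookTitle'
// parameter are assumptions for illustration only.
import express from "express";

const app = express();
app.use(express.json());

app.post("/fulfillment", (req, res) => {
  const result = req.body.result ?? {};
  const intentName: string = result.metadata?.intentName ?? "";
  const params = result.parameters ?? {};

  let speech = "Sorry, I don't understand this yet.";
  if (intentName === "buy_a_book") {
    // 'bookTitle' is a hypothetical parameter defined on the intent.
    speech = `Searching Bol.com for ${params.bookTitle}...`;
  }

  // DialogFlow v1 expects 'speech' (spoken) and 'displayText' (shown).
  res.json({ speech, displayText: speech });
});

app.listen(8080);
```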

From No Response to State Machine

Besides the intents and NLP, we encountered some problems with the different versions of the DialogFlow API. It turned out that many features were not yet supported in version 2, and some features from the documentation simply did not work, for example methods such as askWithList, which lets the user select from a list of options. We also observed the following with DialogFlow’s follow-up intents:

1) When the Google Assistant misheard the user three times, the app would return to the start and the user had to begin the conversation all over again. It turned out that contexts have a certain time to live, which fortunately can be increased.

2) ‘Yes’ and ‘no’ are intents in various flows and can mean different things depending on the context. That is why we switched to a flat intent structure with a rich context: an object that is passed between requests and keeps track of the state of the conversation. With this, the app became a state machine, where the context represents the state and the intents represent the transitions, as sketched below. The same intent can carry a different meaning in different states. For instance, a ‘yes’ input can semantically mean ‘I would like to order that book’ or ‘I would like you to list all books from the start’.
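A minimal sketch of that state machine idea, assuming the DialogFlow v1 webhook format and a hypothetical ‘conversation-state’ context; the state values and intent names are illustrative, not the exact ones from our app.

```typescript
// Sketch: context as state, intents as transitions.
// The 'conversation-state' context name, state values and intent names
// are hypothetical; the contextOut/lifespan fields follow the DialogFlow
// v1 webhook format as we used it.

type State = "ASKING_TITLE" | "LISTING_RESULTS" | "CONFIRMING_ORDER";

interface ConversationState {
  state: State;
  results?: string[]; // titles found on Bol.com
  selected?: string;  // title the user is about to order
}

function transition(intent: string, ctx: ConversationState): [ConversationState, string] {
  switch (ctx.state) {
    case "LISTING_RESULTS":
      if (intent === "yes") {
        // 'yes' here means: list all books from the start.
        return [ctx, `I found: ${ctx.results?.join(", ")}`];
      }
      break;
    case "CONFIRMING_ORDER":
      if (intent === "yes") {
        // The same 'yes' intent now means: order that book.
        return [{ ...ctx, state: "ASKING_TITLE" }, `Ordering ${ctx.selected} for you.`];
      }
      break;
  }
  return [ctx, "Sorry, I don't understand this yet."];
}

// In the webhook response the state travels along as a context with an
// increased lifespan, so it survives a few misheard utterances:
// res.json({
//   speech,
//   contextOut: [{ name: "conversation-state", lifespan: 10, parameters: newCtx }],
// });
```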

Structuring the Conversation

The first version of our app would ask which book you would like to buy. After you provided a book title, it would list all the information about the books it found on Bol.com, one after another. We quickly decided that the assistant should talk less and understand more: instead of listing all the information, the user should be able to ask for it. For example, when the assistant says ‘Found 34 results for Ready Player One’, one could respond with ‘OK, give me the cheapest one’ or ‘Tell me more about the first one’. This feels better than: ‘Found 34 results for Ready Player One. The first one is Ready Player One by Ernest Cline as Paperback in English published in 2012 for 9.99 euros. Do you want to buy this one?’, where the user can only respond with ‘yes’ or ‘no’, followed by the next long-winded result. A valuable insight here is to avoid having the assistant ask closed questions and to keep the dialog as open as possible.
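As an illustration of that ‘ask for it’ style, a handler for a hypothetical cheapest_result intent could simply reduce over the results already held in the conversation context. The product fields below follow the same assumed shape as in the search sketch earlier.

```typescript
// Sketch: answer 'give me the cheapest one' from results kept in the
// conversation context. Field names (title, offerData.offers[0].price)
// are assumptions about the Bol.com catalog response.
interface Offer { price?: number }
interface Product { title?: string; offerData?: { offers?: Offer[] } }

function cheapestResult(products: Product[]): string {
  if (products.length === 0) {
    return "I have no results to compare yet.";
  }
  const priced = products
    .map(p => ({ title: p.title ?? "unknown title", price: p.offerData?.offers?.[0]?.price }))
    .filter(p => typeof p.price === "number");
  if (priced.length === 0) {
    return "None of the results have a price I can compare.";
  }
  const cheapest = priced.reduce((a, b) => (a.price! <= b.price! ? a : b));
  return `The cheapest one is ${cheapest.title} for ${cheapest.price} euros.`;
}
```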

A problem specific to the Google ecosystem, but perhaps even more pertinent in an Amazon environment, is the first intent that activates the app. This can be done implicitly by saying ‘I would like to buy a book’. However, that triggers Google to redirect the user to its own shopping space or to your payment plans. We can mitigate this by asking for the specific app by name, but ideally we would be routed to a default app that we could define in our settings.

Data Quality

When adding a natural language interface to any API, interesting new scenarios can arise, since APIs are traditionally designed for machine-to-machine interaction. For instance, when we connected DialogFlow to the Bol.com API we found an, in hindsight, obvious issue: a lot of books and other items are available in different versions or languages. When asking for ‘The Lord of the Rings’, it also returned ‘The Silmarillion’ (because ‘The Lord of the Rings’ is also mentioned in its description). Traditionally, from an API point of view, the backend should provide a rich response that the client can easily filter. Yet in the case of a voice assistant the response should be clear and to the point. Data quality is always important, but accuracy in the response is paramount for a good natural language experience.
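One pragmatic mitigation, sketched below under the same assumed product shape as above, is to filter the results in the fulfillment layer so that only items whose title actually mentions the requested query are read out.

```typescript
// Sketch: keep only results whose title mentions the requested query,
// so description-only matches such as 'The Silmarillion' are dropped
// before the assistant reads the results out loud.
interface Product { title?: string }

function filterByTitle(products: Product[], query: string): Product[] {
  const needle = query.trim().toLowerCase();
  return products.filter(p => (p.title ?? "").toLowerCase().includes(needle));
}

// Example: filterByTitle(results, "The Lord of the Rings") would drop
// 'The Silmarillion' even if the API returned it as a match.
```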
