"OK Google, scroll down." How we integrated Google Assistant VUI with a website
With so much buzz around Voice User Interfaces (VUI), I decided to give them a go and experiment a little. The idea was to use a Voice UI to accomplish something unusual, yet accessible to a wide audience at the same time. So, as much as I love all the IoT experiments (such as this one), I had to give up on them, as they require hardware to play with.
It had to be something available online, so the decision was made to use our company’s website as a playground.
If you just want to play with it and not read about how it’s made, skip to the bottom of this article now.
What if you could navigate the website with voice?
Oh yeah. Sit back and talk to your screen. What a fantastic way to showcase what VUI can do! Gimmick? Yes. But as a digital agency, we have a right (if not a duty) to play with the latest technology, and who knows at what point it will flourish and monetise our efforts.
And yes, things like that already exist: using the Speech API, you only need HTML and JS skills to build it. But as it quickly turned out, browser support for the Speech API is very poor at the moment.
Google Assistant to the rescue
Having to look for an alternative solution, I moved my research towards smart speakers. Recently at our agency, we’ve been experimenting a lot with conversational interfaces (chatbots), and a natural extension of those is voice assistants (Google Assistant, Alexa, Siri). It didn’t take long to pick Google Assistant as the primary voice platform for the project: it’s available on every Android device, allows for building conversations with a voice and visual interface at the same time, comes with great SDKs and documentation, and is really fun to play with.
So, what do we want to achieve? For starters, let’s consider a classic user journey on a corporate website:
- User lands on a homepage
- Scrolls down a little, to see what’s going on
- Browses to areas of interest, clicking on navigation items.
- Reads a bit more about a particular area of interest, perhaps plays some videos.
- Finally, wants to get in touch and fills in the contact form.
Awesome, let’s do all that with voice.
Now it’s getting techy. Here’s what we used:
- Dialogflow — to create conversational entry points and intents, and manage everything nicely in Actions on Google.
- Node.js app — to create custom functions and handle complex queries that are otherwise impossible to do in Dialogflow.
- Heroku — to host our Node.js app. A paid plan is needed so the app doesn’t fall asleep.
- Website — ideally built as a Single Page Application — to allow for smooth page transitions and make session handling easier.
- Socket.io — to connect the VUI client and the website and allow for real-time communication.
It didn’t take long to put everything together and create the first function triggered with voice. The most time-consuming part is testing and error handling. It’s incredible how many new functions, journeys, intents and flows you will create simply by observing people using your apps.
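The core of such a function is just translating a recognised intent into an event the website can act on. Here’s a minimal sketch of that mapping — the intent names, parameters and event shapes are my own illustration, not the actual ones we used:

```javascript
// Translate a Dialogflow intent (plus its parameters) into an event the
// website frontend can handle. All names here are hypothetical examples.
function intentToEvent(intentName, params = {}) {
  switch (intentName) {
    case 'scroll':
      // "OK Google, scroll down" would arrive as the 'scroll' intent
      return { event: 'scroll', direction: params.direction || 'down' };
    case 'navigate':
      return { event: 'navigate', page: params.page || '/' };
    default:
      return { event: 'unknown' };
  }
}

console.log(intentToEvent('scroll', { direction: 'down' }));
```

In the real setup, the Node.js webhook would run a function like this and push the resulting event to the website over socket.io.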
One of the challenges was to introduce a syncing mechanism to pair two devices together and prevent events from being broadcast to other clients. Thankfully, Stack Overflow is full of good people giving away snippets of code for pretty much every use case. A randomly generated number (created server-side), checked against combinations already in use and served back to the website frontend, did the trick. The voice agent then asks the user to provide this unique number to sync the devices, and upon a successful match, socket.io welcomes the agent and the website into the same room.
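The server-side part of that pairing can be sketched in a few lines — function and variable names here are my own, not the actual implementation:

```javascript
// Generate a random numeric pairing code, retrying until it doesn't
// collide with a code already in use by another session.
function generatePairingCode(occupied, digits = 4) {
  const max = 10 ** digits;
  let code;
  do {
    // zero-padded numeric code, e.g. "0472"
    code = String(Math.floor(Math.random() * max)).padStart(digits, '0');
  } while (occupied.has(code));
  occupied.add(code);
  return code;
}

// The website displays this code; the voice agent asks the user to read it
// back, and both sockets then join a socket.io room named after it.
const occupied = new Set();
console.log(generatePairingCode(occupied));
```

Scoping events to that room is what keeps one user’s “scroll down” from scrolling everyone else’s browser.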
Publish your Action
To make our action discoverable, we had to go through the approval process. It takes around 1–2 days for Google to do that.
Attention! Every time you want to publish even the smallest change made within Dialogflow, you must go through the approval process again. It’s quite annoying, to be honest, but hey, quality comes first.
The biggest trick was to make one intent discoverable by saying: “OK Google, ask Greenwood Campbell to do something amazing”. That way, we could take the user on our special journey and use it as a context in the Dialogflow setup. Context also serves as a kind of memory for the voice agent.
As it turned out later, Google also indexed our intent with an implicit invocation: “OK Google, control a website with voice”. You don’t have to mention our agency name, Greenwood Campbell, at all (it’s almost like getting a website domain for free!). More on Action discovery and invocation types here.
You can still trigger the Greenwood Campbell Action by saying “OK Google, talk to Greenwood Campbell”, but that way you will never find the website controller, because our experiment was designed to begin with the user interacting with the website first.
Miś, not meeshh
The most fun (and when I say fun, I mean we struggled badly…) was with the contact form.
First, we let users say their name to fill in the fields on the form. But if your name is not English, or not very common, Google Assistant either didn’t get it or horribly misspelled it. My last name is Miś (Polish for “teddy bear”), but Google never got it right. And I wasn’t happy sending wrong data through the contact form.
But — every Google Assistant user is somehow identified in Google’s database. And there’s potentially all sorts of data to retrieve from your account, right?
Well, not every kind of data, and the user must allow Google Assistant to share it. But it works and provides accurate results. Check out here what you can get access to when building your Actions. Contact form problem solved.
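The idea boils down to: once the user grants the name permission, read the name from the account profile instead of from speech recognition. A minimal sketch — the payload shape below roughly mirrors what the actions-on-google client library exposes after a name permission grant, but treat the field names as an assumption rather than the exact API:

```javascript
// Prefill the contact form from the Assistant user profile instead of
// relying on speech recognition. Field names are an assumption.
function prefillContactForm(user) {
  if (!user || !user.name || !user.name.display) {
    // No profile data yet: the agent should ask for permission first.
    return { name: '', needsPermission: true };
  }
  return { name: user.name.display, needsPermission: false };
}

// Illustrative profile: "Miś" arrives spelled correctly from the account,
// rather than being misheard as "meeshh".
console.log(prefillContactForm({ name: { display: 'Jan Miś' } }));
```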
Launching this experiment live boosted traffic on our website by a few hundred percent, along with a massive decline in bounce rate and a great improvement in average time spent on site. It also brought us some recognition in website galleries (and awards!). And we now have a great demo of VUI capabilities, even if it’s just a little playful thing. The best takeaway, though, is what we learned across a variety of systems, and the confidence to deliver Actions on Google for our clients.
Here it is: https://www.greenwoodcampbell.com/
Use a decent browser and a laptop/desktop screen to initiate the VUI controller. You will need a phone with Google Assistant or a Google Home nearby to interact with it. Have fun :)
If you too are experimenting with VUI and integrations, let me know about it in the comments. I’m always happy to hear about exciting ideas. Cheers!
EDIT: since this integration is no longer part of the greenwoodcampbell.com website, I thought a video of how it worked would be useful. Here it is (the sound isn’t great, sorry!)