My First Google Home App: Prague Public Transport

I’ve had my Google Home device since November and I was curious how to build apps for it. The Actions on Google API is made just for that. It’s been open to developers since December, and it will eventually work not just on Google Home, but also on phones, watches, and in cars. I wanted to build the simplest possible app to learn the process, but it also had to be useful in our home. I settled on catching a tram in Prague: we are often in a hurry, and it’s a hassle to look up trams on your phone. Why not just ask?

Let’s see the finished product in this 30s video:

The whole solution is open-sourced here:

https://github.com/davidvavra/prague-public-transport

How did I do it?

First, I researched whether I could get data about trams in Prague. I was surprised to find that the infamous data provider CHAPS s.r.o. has a REST API documented in Apiary! You can even play with the API in the Apiary console, and it’s free without registration for Prague public transport and Czech trains, which is enough for my use case.
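Calling such a REST API from a server is just HTTP plus JSON parsing. Here is a rough Kotlin sketch; the endpoint URL and the query parameter are made-up placeholders, the real ones are in the Apiary documentation:

```kotlin
import java.net.HttpURLConnection
import java.net.URL
import java.net.URLEncoder
import org.json.JSONObject

// Rough sketch of querying a departures endpoint. The URL and the "stop"
// parameter are hypothetical placeholders; see the Apiary docs for the real API.
fun fetchDepartures(stopName: String): JSONObject {
    val encoded = URLEncoder.encode(stopName, "UTF-8")
    val url = URL("https://transit-api.example.com/departures?stop=$encoded")
    val connection = url.openConnection() as HttpURLConnection
    connection.requestMethod = "GET"
    return connection.inputStream.bufferedReader().use { JSONObject(it.readText()) }
}
```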

Then, I read the documentation for Actions on Google and decided to use API.AI for building my conversation action. It simplifies development a lot and introduces the concepts in a user-friendly console. It supports easy export to Google Assistant, but also to other systems like Slack or Facebook Messenger. The best part, though, is the built-in machine learning. You never specify the exact phrases the user needs to say; you just provide some examples, and it works even when the user phrases things differently. It even extracts parameters like numbers, dates or custom data from the sentences. If the machine learning doesn’t match something the user said, you can correct it and it will do a better job next time. The finished API.AI agent is open-sourced here.

If your app needs information from the internet, you need to create a webhook. API.AI simply sends a POST request, and your webhook should reply with JSON. That means you can host the webhook anywhere and write it in any language. Most examples are written in Node.js, since Google has an official Node.js library that helps with this. I wanted to host it on Google Cloud Platform, which has the advantage of lower latency to other Google services. But Node.js has to run in the App Engine Flexible Environment, which has no free tier, so I used the App Engine Standard Environment instead. My server comfortably fits into the free tier. For the actual code I used my favorite Kotlin, which is fully compatible with the App Engine Java environment. I couldn’t use any SDK, but it’s just reading and writing JSON, so that’s not a big deal. I only had to study the webhook request/response format.
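To make that concrete, here is a minimal sketch of such a webhook as a plain servlet on the App Engine Standard Environment. The ‘stop’ parameter and the reply text are placeholders, and the field names follow the v1 API.AI webhook format, so double-check them against the documentation (the real handler is in the repository linked above):

```kotlin
import javax.servlet.http.HttpServlet
import javax.servlet.http.HttpServletRequest
import javax.servlet.http.HttpServletResponse
import org.json.JSONObject

// Mapped to the webhook URL in web.xml on the App Engine Standard Environment.
class WebhookServlet : HttpServlet() {

    override fun doPost(req: HttpServletRequest, resp: HttpServletResponse) {
        // API.AI sends the parsed conversation as JSON in the POST body.
        val request = JSONObject(req.reader.readText())

        // "stop" is a placeholder for a parameter extracted by API.AI.
        val stop = request.getJSONObject("result")
            .getJSONObject("parameters")
            .optString("stop", "your stop")

        // Here the real app would call the transit API and format the departures.
        val answer = "The next tram from $stop leaves in four minutes."

        // "speech" is what the Assistant reads out loud, "displayText" is shown on screens.
        val reply = JSONObject()
            .put("speech", answer)
            .put("displayText", answer)

        resp.contentType = "application/json"
        resp.characterEncoding = "UTF-8"
        resp.writer.write(reply.toString())
    }
}
```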

I want to highlight one issue with building an app outside the US. Google Home supports only English, which is fine. But tram stop names are in Czech, and when English speech synthesis reads Czech names, they can be unrecognizable. I solved it by mapping Czech characters to an English pronunciation:
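At its core it’s just a character table, something like this simplified sketch:

```kotlin
// Rough English phonetic equivalents for Czech characters, so the English
// speech synthesis pronounces stop names more or less correctly.
// A simplified sketch; less common characters are omitted.
private val czechToEnglish = mapOf(
    'á' to "aa", 'é' to "e", 'ě' to "ye", 'í' to "ee",
    'ú' to "oo", 'ů' to "oo", 'y' to "i", 'ý' to "ee",
    'č' to "ch", 'š' to "sh", 'ž' to "zh", 'ř' to "rzh",
    'ď' to "d", 'ť' to "t", 'ň' to "n"
)

fun toEnglishPronunciation(stopName: String): String =
    stopName.lowercase()
        .map { czechToEnglish[it] ?: it.toString() }
        .joinToString("")
```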

Say hello to ‘naadrazhee visochani’! It’s not 100% accurate, but much better than just removing accents. I hope Google Home will support other languages properly in the future.

Building the first prototype was pretty fast; Google Home has a web emulator which speeds things up. But then I got stuck on getting the user’s location. Google Home can ask the user for permission to share things like their name or location, and I needed the location to find nearby tram stops. The documentation was pretty scarce if you weren’t using the Node.js SDK. I contacted support and we figured out the correct JSON response. The permission request worked, but I couldn’t get back into my app after the user granted the permission. After some head-banging, I found out that API.AI’s “smalltalk” feature was the culprit. When the user grants the permission, they say something like “Yes” or “It’s OK”, and that’s a common enough phrase for smalltalk to take over. When I disabled smalltalk, everything started to work.
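For the record, asking for the permission is just another piece of JSON in the webhook response, roughly shaped like the sketch below. The field names are from the v1 format of that era and the prompt text is a placeholder, so verify both against the current documentation:

```kotlin
import org.json.JSONArray
import org.json.JSONObject

// Sketch of a webhook response that asks the Assistant for the device location.
// Field names follow the v1 Actions on Google payload of that era;
// double-check them against the current documentation.
fun buildLocationPermissionRequest(): JSONObject {
    val permissionsRequest = JSONObject()
        .put("opt_context", "To find tram stops near you") // placeholder prompt
        .put("permissions", JSONArray(listOf("DEVICE_PRECISE_LOCATION")))

    val google = JSONObject()
        .put("expect_user_response", true)
        .put("permissions_request", permissionsRequest)

    return JSONObject()
        // The speech is not read out; the Assistant asks the permission question itself.
        .put("speech", "PLACEHOLDER_FOR_PERMISSION")
        .put("data", JSONObject().put("google", google))
}
```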

An app works on your Google Home for just 30 minutes of testing. If you want anybody else to use it, you have to go through an approval process. And it’s very thorough, which is good, because conversational UI is very different from visual UI. They really test it by talking to your app, specifically covering the edge cases where the user says something wrong or irrelevant, or stays silent. The process made me rework the dialog logic several times. In the beginning, I wanted just a simple dialog where the user launches the app, gets what they need, and it finishes. But you should really create a “persona” that the user talks to. It even has a different voice, so users know they are not talking to the Assistant itself. The persona should greet the user, ask what they need, give them the relevant information and say goodbye.

The hardest part was picking the invocation name, the trigger that launches your app: “OK Google, let me talk to <invocation name>”. It needs to be unique, not too general, tied to your brand and easily pronounceable. You should really talk to your device and test whether it matches reliably. A good tool for that is Google My Activity, which records everything you said and how it was transcribed into text. I discovered that longer names work better, and saying a few more words is not such a big deal (as opposed to visual UIs, where we tend to make app names as short as possible). My invocation name evolved as follows:

  • “Next tram” — denied as “too broad”
  • “Czech public transport” — denied, because it often transcribes into “Check public transport”
  • “Prague public transport”

When publishing your Action, you also have to provide some promotional text and icons so users can discover your app. That was not that hard; here is the result:

That’s it: my first Google Home app is live, and if you are one of the few Google Home owners in Prague, you can try it. I would be glad to receive some feedback. It’s great to live in the future where we can control apps by voice, and to build your own Star Trek computer in your living room!
