How I Built a Google Home Voice App

Diogo A. Rodriguez
Journalism Innovation
Jun 10, 2019

One of the most fun courses in the Tow-Knight Center for Entrepreneurial Journalism program (I was a fellow there in 2019) was the prototyping class, led by the journalist, maker, and teacher John Keefe. For a semester, we explored new ways of using machines, code, and technology to do journalism and to create useful and cool products.

Keefe talked about APIs, data collection, chatbots, sensors, Arduino, and voice interfaces. Throughout the semester, our mission was to use some of the tools taught in class to create a project, a prototype. It could be related to the product each of us was developing within the fellowship, but it didn't have to be.

I am the creator of Me Explica?, an explanatory journalism platform that contextualizes the news in a light and fun way, like a chat. When I began to think about my final assignment, I soon concluded that it would have to be something that created the same closeness and conversational feel that Me Explica has. I decided to make an application for voice assistants.

You can see the final result in the video below. The rest of the post is about the journey that got me there.

Voice assistants are growing in Brazil and around the world

Although mine was just a prototype, I learned that this kind of product makes sense in Brazil. Android runs on 89% of smartphones there, and 68% of people who own such a device say they use voice assistants.

This market has grown a lot around the world. Google announced earlier this year that more than a billion devices use its assistant. According to Amazon, more than 100 million Echo devices (which come equipped with Alexa) are in use. In all, around the world, there were 2.5 billion devices using this type of technology by the end of 2018, and a British research firm estimates that we will reach 8 billion by 2023.

The prototype’s first step: interaction roadmap

To start developing my application, I needed to decide how it would work. At first, I thought of a very simple interaction model: (1) the person invokes the app by saying "open Me Explica"; (2) the app responds with a welcome and says what theme it will explain that week; (3) then the app starts to speak; (4) as soon as the first passage ends, it gives the person the option of stopping or going deeper; (5) after two or three rounds, the app closes itself.
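It can help to think of that first model as a short linear script. Here is a minimal sketch in TypeScript, purely an illustration with names I made up, not anything Dialogflow-specific:

```typescript
// A hypothetical sketch of the first, linear interaction model.
// Each step records what the app says and which step "go deeper" leads to.
type Step = "welcome" | "passage1" | "passage2" | "goodbye";

interface Turn {
  say: string;   // what the assistant reads aloud
  next?: Step;   // where "go deeper" takes the user (absent = conversation ends)
}

const flow: Record<Step, Turn> = {
  welcome:  { say: "Welcome to Me Explica! This week's topic is...", next: "passage1" },
  passage1: { say: "First short explanation. Want to go deeper?", next: "passage2" },
  passage2: { say: "A deeper passage. One more?", next: "goodbye" },
  goodbye:  { say: "Thanks for listening. See you next week!" },
};
```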

I did the first test in Botsociety. My first lesson was that the text was too long: when a robotic voice reads three or four sentences, the result is a tedious explanation. So the first thing I did was edit the text, setting a maximum of two very short sentences per response.

The edit was a great improvement, but I was still not satisfied. I wanted to reproduce as much as possible the experience of reading Me Explica and watching its videos. To do that, I would need to make the interaction a little more complex.

Creating a Basic Framework for Dialogue

I explained to Professor John Keefe what I wanted to do. He told me it was possible to build what I imagined, but the process would be a bit more complicated: it would be necessary to connect the speech-processing platform to a database holding the various possible responses. We were using Dialogflow, an excellent platform that works with several types of text and voice bots. To store the responses, however, I would need to use Airtable.

Dialogflow works in a reasonably intuitive way. You program "intents," which are the triggers for the actions the application performs. For example, to open the Me Explica voice application, you need to create an initial intent that starts the whole process.
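Intents themselves are configured in the Dialogflow console rather than in code, but conceptually each one is just a name, a set of trigger phrases, and a response. As an illustration only (this is not Dialogflow's export format), my initial intent boils down to something like:

```typescript
// Illustration only: a plain object mirroring what the initial intent holds
// in the Dialogflow console (not Dialogflow's actual export format).
const welcomeIntent = {
  displayName: "welcome",
  trainingPhrases: ["open Me Explica", "talk to Me Explica", "start Me Explica"],
  response: "Hi! This week I can explain two stories. Which one do you want to hear?",
};
```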

Each menu of intents in Dialogflow can branch into endless sequences. At each new "crossroads" the user encounters, they need to trigger a different intent, and it is the creator's mission to try to predict what the person might say in each case.

The structure I wanted to create was not very complex: Me Explica would welcome the user, who could choose between two different news stories. Each story would have its own branches, letting the user go deeper into the news with a simple "yes" or "no."

A database for text: Airtable and Glitch

But to make this work, I would need to link each intent to a different row in an Airtable spreadsheet, and to chain the responses to one another in Dialogflow; only then could I make sure the explanations came in the correct order. The great thing about this method is that you can update the spreadsheet in real time, which means you can create new content or update information on the go, something very useful for developing stories.
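Airtable exposes every base through a simple REST API, which is what makes the "edit the spreadsheet, update the app" workflow possible. Below is a minimal sketch of reading one row, assuming Node 18+ where fetch is global; the base ID, the `Responses` table, and the `intent`/`text` field names are placeholders of mine, not the real names from my base:

```typescript
// Sketch: fetch the response text for a given intent from Airtable's REST API.
// BASE_ID, AIRTABLE_KEY, the "Responses" table and the "intent"/"text" fields
// are placeholders, not the real names from my base.
async function getResponseFor(intentName: string): Promise<string | undefined> {
  const formula = encodeURIComponent(`{intent} = "${intentName}"`);
  const url = `https://api.airtable.com/v0/${process.env.BASE_ID}/Responses?filterByFormula=${formula}`;

  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${process.env.AIRTABLE_KEY}` },
  });
  const data = await res.json();
  return data.records?.[0]?.fields?.text; // first matching row, if any
}
```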

There was a problem midway: you cannot plug an Airtable table directly into Dialogflow. A workaround was needed for the two to communicate, and at this moment (not only this one) John Keefe's expertise was key. He used Glitch to create a small web app that pulls the information from Airtable and serves it to Dialogflow. See below:
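I don't have the exact code from John's Glitch app to share here, but the general shape of such a web app is a small Express server that Dialogflow calls as a fulfillment webhook: it reads the matched intent's name from the request and answers with the text found in Airtable. A hedged sketch, reusing the hypothetical `getResponseFor` helper from above:

```typescript
// Sketch of a Dialogflow fulfillment webhook running on Glitch (Express).
// Dialogflow POSTs the matched intent here; we answer with text pulled from Airtable.
import express from "express";

const app = express();
app.use(express.json());

app.post("/webhook", async (req, res) => {
  const intentName = req.body.queryResult?.intent?.displayName ?? "";
  const text = await getResponseFor(intentName); // the Airtable lookup sketched above
  res.json({
    fulfillmentText: text ?? "Sorry, I don't have that explanation yet.",
  });
});

app.listen(Number(process.env.PORT) || 3000);
```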

Once that was done, the app worked perfectly. With Dialogflow's natural language processing, you open up the possibility for people to interact with your app in a much more natural and intuitive way. For example, when the application gives you two options, you can choose the second one by saying "Sri Lanka," "the second option," "number two," and so on.

Another important lesson is to create "fallback intents." They are the wildcards, triggered when the person says something that is not on your list of expected phrases. You can program the assistant to say "I did not understand what you said" or "Can you repeat that?" Or you can create a fallback intent that terminates the application every time someone says "Goodbye," "Enough," "No more," and so on.
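Fallback intents are also created in the Dialogflow console, but if they hit the same webhook you can decide there what to say back. An illustrative extension of the handler sketched above ("Default Fallback Intent" is Dialogflow's default name; "goodbye" is a placeholder of mine):

```typescript
// Illustrative branches inside the same webhook handler.
// "Default Fallback Intent" is Dialogflow's default name; "goodbye" is my placeholder.
if (intentName === "Default Fallback Intent") {
  return res.json({ fulfillmentText: "I did not understand. Can you repeat that?" });
}
if (intentName === "goodbye") {
  return res.json({ fulfillmentText: "Ok, see you next time!" });
}
```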

Conclusion: I’m Not Going to Stop There!

After I finished the project, I presented it to my classmates, some professors, and the dean of the Newmark Journalism School, Sarah Bartlett. Here are the slides:

I want to invest more time in translating this application into Portuguese and launching it as one of Me Explica's products. It can be updated every week, every day, or even every minute, depending on the need. Another thing I want to do is study how to use a recorded human voice for the answers. The robotic voice is great when the text is short (giving the weather forecast, responding to commands for Netflix), but talking about the news takes nuance.

If you want to know more about how I made this application or any other subject, feel free to contact me: diogorodriguez.com.
