Google Home App with Node.js — A Song of API-ce and Firebase
Shortly after I joined an IT consulting company, one of my first clients was a very well known and established newspaper company in France. Since the famous Google Home was FINALLY coming to our country, my client wanted an app for it, and applied to have theirs be one of the first apps released with the device.
The main concepts were simple. The app had to be able to:
- Play an audio news flash created by journalists
- Give stories of the day and milestones for every day of the year
- Allow the user to play a trivia game about famous personalities.
Nothing too fancy, but indeed some fun to come. I was dispatched to this client after the project definition, so a Proof of Concept had already been produced based on Google’s boilerplate app.
The base setup looks similar to the one in the article “How to create a custom private Google Home Action with API.AI and Google App Engine.”
The app is an Actions on Google Assistant app, linked to a Dialogflow (the new name of API.AI) project. Our client’s project, however, didn’t use App Engine, but used Firebase. Thus, we had a Firebase Function where the Node.js code lives, directly linked with a Firebase real-time database.
At this stage, the projects were set up, and a basic Node.js app was deployed on Firebase that could return responses containing the news audio file URL or the “history of the day” stories, or let the user play one game. That was a great base, but every other action made it crash, and chaining actions was not possible.
The other main problem was that the code had been produced from a good boilerplate, but was incomplete. It wasn’t fully modularized, made little use of callbacks, and didn’t have enough error handling. Finally, the codebase was only a proof of concept, so we had to complete it to make it work as intended.
And that’s where I come in. *cue triumphant jingle*
It’s off to work we go!
First, don’t get me wrong here. The original developer on the project had done a great job. They simply had to move to another project where their expertise was needed. Since they had moved on to other work, though they remained an advisor on the project, I’ll be telling this story in the first person.
While refactoring the Node part of the app, I started reading the various documentation for Actions on Google, Dialogflow and Firebase. It appeared that the design and workflow of the app were of crucial importance. As usual for applications, good planning saves a lot of time. So, after having properly refactored the app and checked that it suffered no regressions, I took a step back and started to think. Let me tell you, it was hard.
Advice from Google
The first attempt at making all these services work together implied the following workflow:
- User starts the app
- Actions on Google calls Dialogflow
- Dialogflow calls the Firebase hook
- The Firebase Function returns the proper welcoming screen, based on the number of times the user had already called the app, including that day
- The user is prompted to ask the app for what they want it to do
- The corresponding action is started.
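The welcome step above can be sketched as a tiny function. This is a hypothetical illustration, not the project’s actual code: the function name, the message wording, and the two counters are all assumptions; only the idea of branching on the user’s visit counts comes from the workflow.

```javascript
// Pick a welcome prompt based on how often the user has opened the
// app overall, and whether they already opened it today.
// (Illustrative sketch; names and wording are assumptions.)
function welcomeMessage(totalVisits, visitsToday) {
  if (totalVisits === 0) {
    // First contact: list everything the app can do.
    return 'Welcome! I can play the news flash, tell you about ' +
           'this day in history, or start a trivia game. Which would you like?';
  }
  if (visitsToday > 0) {
    // Returning within the same day: keep it short.
    return 'Welcome back again! News, history, or trivia?';
  }
  // Returning user, first visit of the day.
  return 'Good to hear from you! Would you like the news, ' +
         'a history story, or a round of trivia?';
}
```

Note that every branch ends with an explicit question, which matters for the review process discussed later.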
The main problem was with the trivia game. The user was given a series of four or five clues, and had to guess whom they referred to in the same number of tries. But whether the user had guessed correctly or not, the app continued to issue clues until the last one, and wasn’t responding after returning the answer. We had tried setting up more intents with the proper parameters and contexts in Dialogflow, but each attempt was unsuccessful.
I had a hunch that the segregation of responsibilities wasn’t clear enough in our project and we should think about microservices. Google France was monitoring our progress closely and ready to provide help. We organized a call with them and they confirmed my feeling was the right one: there was a flaw in our architecture.
This may be the most important piece of advice of this article: in your APIs like in your code, keep your logic separate. Keeping this in mind, all the logic related to the game and dealing with user tries and solutions would have to be in the Node part. That’s all there is to it, and that makes maintenance easier, as it usually does.
- Actions on Google was our controller, receiving requests and delivering responses
- Dialogflow was our dispatcher, matching user speech with intents
- The Node.js Firebase Function was to be our brain: the collection of microservices, each one attached to a particular intent and encapsulating all the logic related to that action.
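The “one microservice per intent” idea boils down to a handler map inside the Function. Here is a minimal sketch of that pattern; the intent names and reply strings are made up for the example, and the real handlers would of course talk to the Firebase database rather than return literals.

```javascript
// One handler per Dialogflow intent: each one owns all the logic
// for its action. (Intent names and replies are illustrative.)
const handlers = {
  'play.news':     () => 'Here is today\'s news flash.',
  'history.today': () => 'On this day in history...',
  'trivia.start':  () => 'First clue: this person was born in Paris. Who is it?',
};

// The webhook entry point only routes; it holds no game logic itself.
function dispatch(intentName) {
  const handler = handlers[intentName];
  if (!handler) {
    return 'Sorry, I did not get that. News, history, or trivia?';
  }
  return handler();
}
```

Keeping the dispatcher this dumb is exactly the separation Google recommended: Dialogflow matches speech to an intent name, and everything after that lives in Node.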
Do not try to put some deep logic in Dialogflow. Especially when you know how to do what you want with Node. Remember, Keep It Simple, Smarty. We always want to try out new toys. But sometimes, it’s better to leave well enough alone.
User question history
Invigorated by this little chat with myself and the awesome team at Google, I started working right away. And with this new pattern, I was able to implement a clearer way to deal with users’ answers, good or bad, and to allow chaining questions if the user stated so.
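The fixed game loop can be sketched like this: the Node code, not Dialogflow, decides on every turn whether to accept the answer, reveal the next clue, or end the round. The shape of the `state` object and the reply wording are assumptions made for the example.

```javascript
// Handle one user guess in the trivia game.
// state: { answer, clues, clueIndex } — a hypothetical shape.
// Returns a reply plus either updated state or a "done" flag.
function handleGuess(state, guess) {
  if (guess.trim().toLowerCase() === state.answer.toLowerCase()) {
    // Correct guess: end the round immediately instead of
    // marching on to the next clue.
    return { done: true, reply: 'Correct! Want another question?' };
  }
  const next = state.clueIndex + 1;
  if (next >= state.clues.length) {
    // Out of clues: reveal the answer and offer to chain.
    return { done: true,
             reply: `No, it was ${state.answer}. Play again?` };
  }
  // Wrong guess, clues remain: advance to the next clue.
  return { done: false,
           state: { ...state, clueIndex: next },
           reply: `Not quite. Next clue: ${state.clues[next]}` };
}
```

Both terminal branches end with a question, which is what lets the user chain into another round.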
There were still a few features to implement, but the process was quite straightforward: I had to implement a user history, and deal with edge-case specs (mainly the “pass” and “stop” intents).
There are a few things to keep in mind: the Dialogflow app SDK can keep records of any data between calls, by means of contexts and/or the app.data API. This allows you to easily retain state and keep track of the user’s progression, but you also have to remember to “flush” it, emptying whatever you stored, every time you switch to another intent.
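To make the flush point concrete, here is a sketch where a plain object stands in for the SDK’s app.data store (which persists across turns of a conversation). The handler names and the `game` key are assumptions; the takeaway is clearing stale state when the user switches intents.

```javascript
// `appData` stands in for the SDK's per-conversation app.data store.
const appData = {};

function startTrivia() {
  // Store the game state so the next turn can pick it up.
  appData.game = { answer: 'Marie Curie', clueIndex: 0 };
  return 'First clue coming up!';
}

function switchToNews() {
  // Flush the stale game state before handling the new intent,
  // otherwise the next answer could be matched against a dead game.
  delete appData.game;
  return 'Here is the news flash.';
}
```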
After a few tries, I finally had the trivia game working properly, and users were never asked the same question twice. Yes, I’m particularly fond of the “trial and error” approach, and since I’m relatively new to Node.js, I try a lot and fail a lot. But by doing so, I learn something every time.
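The no-repeats guarantee reduces to filtering the question pool against the user’s history before drawing. This is an illustrative take, not the production code; the question IDs and the storage shape are assumptions.

```javascript
// Pick a random question the user has not been asked yet.
// allQuestions: [{ id, ... }], askedIds: array of ids already seen.
function pickQuestion(allQuestions, askedIds) {
  const fresh = allQuestions.filter(q => !askedIds.includes(q.id));
  if (fresh.length === 0) {
    return null; // everything has been asked — caller decides what to do
  }
  return fresh[Math.floor(Math.random() * fresh.length)];
}
```

In the real app the `askedIds` history would live in the Firebase real-time database, keyed by user.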
At this point, there wasn’t much left for me to do: minor tinkering and small wording corrections. On this subject, please note that spelling and syntax rules still apply, at least if you want your speech synthesis to be smooth and sound nearly human.
And then, we had to submit the app to Google for review.
There are in fact two reviews: a Quality Assessment review and a Policy review. While the first one is quite simple and consists of a simple “using the app” test, the second one depends on various criteria. First, the rules applied to your app differ depending on whether it includes in-app purchases. Then, there are specific rules dealing with the fact that your app will open a mic and listen to the user. That’s where most reviews fail.
Your app does indeed have to specifically and openly ask the user something each time you wait for input. You simply can’t have your app say “Hi, user!” and wait for them to state their wishes. You have to ask a real question. Otherwise, your app will be rejected, and that’s a really dumb way to lose time. This is the rule for every intent in your Dialogflow agent and for every phrase returned by app.ask() or app.tell() in your Node code.
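One way to guard against this rule slipping through is a tiny wrapper around the ask call. This helper is hypothetical, not something from our actual codebase, and the trailing-question-mark check is a crude proxy for “asks a real question”, but it catches the “Hi, user!” mistake before a reviewer does.

```javascript
// Refuse to send any prompt that doesn't end in an explicit question.
// `app` is whatever object exposes ask(); here it's just duck-typed.
function safeAsk(app, prompt) {
  if (!prompt.trim().endsWith('?')) {
    throw new Error(`Prompt must ask a question: "${prompt}"`);
  }
  app.ask(prompt);
}
```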
Don’t think we left the app alone while waiting for the review. We came to a simple realization during this time. We were extensively testing our app, mainly because we were deeply enjoying ourselves and having fun, but we came across a little snag: the app had trouble understanding some of our statements.
Of course, speech recognition isn’t always as good as we want it to be. It would have been easy to put the blame on Google’s algorithm, if we had the nerve to think ourselves that good. So we dug into the training tab in Dialogflow to see what it had heard when it didn’t understand us. And that’s a crucial point: nobody has perfect diction or a nice quiet room to talk in. The best way to tackle this particular problem, then, was to define more phrases matching our intents, including slang and familiar ways of speaking.
Long story short, our app was approved and we received a nice notification about it. I have to say, this is a nice feeling. This app may not be perfect, far from it, but it was the first “real” (as in “put in production and used by a real person”) Node.js project I worked on at this scale.
So, thank you for reading, and I truly hope you’ll have learned a thing or two, or that you’ll say “if he did it, I can do it too”. To every one of you who wishes to make an app for Google Home but is afraid to try, I’ll say this: go, try, and succeed! And then email me your success; I’ll be more than happy to hear from you.