Using Google Maps Places API to “validate” locations in a conversation with a virtual agent
Less than a couple of months ago I wrote an article describing how the Dialogflow CX system entities @sys.date and @sys.time can be combined into custom composite entities to work around a quirk of @sys.date-time when used with the Italian language. Today I would like to take you with me on a trip around the globe! Unfortunately I won’t be assisted by system entities like @sys.geo-city and @sys.location, as they are not yet available in non-US and non-global regions. This means that agents deployed outside the US cannot use them to match things like a zip code, a full address, street names, cities, countries, airport names, etc. So, how do we work this out? Well, this is where the Places API of Google Maps comes into play, so sit back and enjoy the ride!
What are we trying to achieve?
Hi, are there any trains to San Franc tonight?
Short answer: we want to be able to extract the string “San Franc” from the sentence above. Actually, let me take a step back: to help our user get to San Francisco tonight, our agent should first resolve the intent and, as part of the same step, extract any meaningful pieces of information such as “San Franc” and “tonight”. In Natural Language Understanding these are called entities. You can think of them as subject matter experts or simply as slots that we need to fill with values taken from specific parts of the end-user input.
Entity types are used to control how data from end-user input is extracted. Dialogflow provides predefined system entities that can match many common types of data. For example, there are system entities for matching dates, times, colors, email addresses, and of course locations. If we had to build an agent in Dialogflow CX to help users find and book trains, we would start by creating the intent head.findTrains. To create an intent we must provide a set of training phrases (also called utterances in NLU), which are example phrases for what end-users might type or say. When end-user input resembles one of these phrases, Dialogflow matches the intent. You control how end-user data is extracted by annotating parts of your training phrases. When you annotate parts of a training phrase, Dialogflow recognizes that these parts are just examples of actual values that will be provided by end-users at runtime. For an end-user input like “Are there any trains to San Franc tonight?”, Dialogflow would extract the date-time parameter from "tonight" and the destination parameter from "San Franc".
Easy … if @sys.location or @sys.geo-city were available in non-US and non-global regions as well ;-) Unfortunately, at the time of writing they are only available for agents deployed in the US. So that’s exactly the issue we are trying to solve: we need to extract “San Franc” and validate that it is a legitimate city. By the way, have you noticed that I didn’t say “San Francisco” but used the Aussie version, “San Franc”? Similarly, I expect inputs like “NYC” or “San Dieeego” not to be ignored just because they don’t exactly match the original names of the cities of New York City and San Diego. They must be extracted from the user sentence and resolved into their actual names (New York City and San Diego). There is more though: I was born in “Ascoli Piceno”, a lovely city in the centre of Italy which is normally referred to as “Ascoli”. To make things slightly more complicated, there is another place in Italy called “Ascoli Satriano”. For an end-user input like “Are there any trains to Ascoli tonight?”, the ideal user experience, driven by the cooperative principle, has the agent follow up with a question like “Did you mean Ascoli Piceno or Ascoli Satriano?”
As usual there isn’t just one way to solve the problem; we can think of at least three ways to address all these requirements. Option 1: we could create a complex custom entity from the list of all cities in the US (don’t forget synonyms…). Option 2: once we have extracted the location from the end-user input, we could perform a “loose search” against a system of record (just keep in mind this is natural language, so string matching doesn’t work well!). Option 3: we could leverage what the Geography system entities actually use under the hood… Google Maps!
Introducing the Places API
The Google Maps Platform offers an astonishing set of SDKs and APIs for Maps, Routes, and Places. Specifically, the Places API and SDKs allow developers to integrate Google’s Place details, search, and autocomplete into their applications. The Places API is a service that returns information about places using HTTP requests. Places are defined within this API as establishments, geographic locations, or prominent points of interest. Google Maps Platform products are secured from unauthorized use by restricting API calls to those that provide proper authentication credentials. These credentials take the form of an API key, a unique identifier that authenticates requests associated with your project for usage and billing purposes. Check out this link to learn how you can generate your own API key and include it with every Places API request.
The Place Autocomplete service is a web service that returns place predictions in response to an HTTP request. The request specifies a textual search string and optional geographic bounds. The Place Autocomplete service can match on full words and substrings, resolving place names, addresses, cities, codes, etc. You can instruct the Place Autocomplete service to return only geocoding results, rather than business results. You may restrict results to a certain type by passing the types parameter. Among the types supported in Place Autocomplete requests we also find the (cities) type collection, which instructs the service to return results that match locality or administrative_area_level_3. This specific type (and a few more) is not supported by other services like Find Place, hence my decision to use the Place Autocomplete service for my use case.
Let’s look at an example, a request for cities containing the string “Ascoli”:
https://maps.googleapis.com/maps/api/place/autocomplete/json
?input=Ascoli
&types=(cities)
&language=it
&components=country:it
&key=YOUR_API_KEY
The input parameter is required to initiate a Place Autocomplete request and it is the text string on which to search. The Place Autocomplete service will return candidate matches based on this string and order results based on their perceived relevance. There are other optional parameters like language in which to return results or the list of countries if you need to restrict your results to places within certain countries. In this example I am restricting results to be cities in Italy that match the string “Ascoli”.
Here is a fragment of the response returned in JSON format (you can choose between JSON and XML). For simplicity I have only included the first two of the three predictions and I have removed some sections from the response. As expected, the Place Autocomplete service returns “Ascoli Piceno” followed by “Ascoli Satriano”.
{
  "predictions" : [
    {
      "description" : "Ascoli Piceno, Province of Ascoli Piceno, Italy",
      "matched_substrings" : [
        {
          "length" : 6,
          "offset" : 0
        }
      ],
      ....
      "types" : [ "locality", "political", "geocode" ]
    },
    {
      "description" : "Ascoli Satriano, Province of Foggia, Italy",
      "matched_substrings" : [
        {
          "length" : 6,
          "offset" : 0
        }
      ],
      ....
    }
  ],
  "status" : "OK"
}
You can be as detailed as you want and add the region, the state or the postcode of the city you are looking for (e.g. “San Francisco California”, “Ascoli 63100”). You can use acronyms like “NYC” or colloquial names like “San Franc”, or misspell the word: these are all possible inputs when using natural language, and our conversational system must be able to resolve and validate them. If the search is successful but no matches are found, then the service will return the ZERO_RESULTS status code.
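To make the status handling a bit more concrete, here is a minimal sketch of how a response body could be reduced to a plain list of descriptions. The helper name extractDescriptions is my own and purely illustrative (it is not part of the Places API), and the sample body only mimics the fragment shown earlier. I am assuming that any status other than OK and ZERO_RESULTS should be treated as a failure.

```javascript
// Sketch only: extractDescriptions is a hypothetical helper, not a Places API function.
function extractDescriptions(body) {
  if (body.status === 'ZERO_RESULTS') {
    // The search worked, there is simply nothing to return.
    return [];
  }
  if (body.status !== 'OK') {
    // Assumption: every other status (e.g. INVALID_REQUEST, REQUEST_DENIED,
    // OVER_QUERY_LIMIT, UNKNOWN_ERROR) is treated as a technical failure.
    throw new Error('Place Autocomplete failed with status ' + body.status);
  }
  // Keep only the human-readable description of each prediction.
  return (body.predictions || []).map(function (prediction) {
    return prediction.description;
  });
}

// Sample body that mimics the shape of the response fragment.
const sampleBody = {
  status: 'OK',
  predictions: [
    { description: 'Ascoli Piceno, Province of Ascoli Piceno, Italy' },
    { description: 'Ascoli Satriano, Province of Foggia, Italy' }
  ]
};

console.log(extractDescriptions(sampleBody));
console.log(extractDescriptions({ status: 'ZERO_RESULTS', predictions: [] }));
```

Keeping “no results” separate from “something went wrong” matters for the conversation: the first should lead to a re-prompt, the second to an apology.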
The Webhook
I have chosen to implement the webhook invoked by the agent using Node.js and to run it on Google Cloud Functions. A quick introduction: webhooks are services that host your business logic. During a session, webhooks allow you to use the data extracted by Dialogflow’s natural language processing to generate dynamic responses, validate collected data, or trigger actions on the backend.
What does this webhook do? It invokes the Places API Autocomplete service, passing as input the destination extracted by Dialogflow from the end-user input (we will look at how to extract the location from the user input later on). From the webhook’s perspective, “destination” is a parameter set by Dialogflow in the webhook request message as part of the call. Parameters are used to capture and reference values that have been supplied by the end-user during a session. Each parameter has a name and an entity type. Intents use parameters to extract data provided by end-users when intents are matched. In this particular scenario, when an end-user input matches the “head.findTrains” intent at runtime, any parameter used by an annotation in the associated training phrases is set by Dialogflow. For this intent we can reasonably expect the destination where the user wants to go, the departure place, and the date and time. For the scope of this PoC I will focus on the destination only; the logic that applies to the destination would also apply to the departure place.
At run-time the fulfilment for the intent route sets the “destination” parameter value that the function receives and reads from the webhook request:
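// Reject requests that carry no intent parameters at all.
if (!(req.body.intentInfo && req.body.intentInfo.parameters)) {
  return res.status(404).send({ error: 'Not enough information.' });
}
// Read the text exactly as the user typed it.
let destination = req.body.intentInfo.parameters['destination'].originalValue;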
We can now initiate a Place Autocomplete request as an HTTP URL of the form below. As is standard in URLs, all parameters are separated using the ampersand (&) character. Note that we are narrowing down the search to cities in the US only, and don’t forget to replace the key in this example with your own API key in order for the request to work in your application!
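var config = {
  method: 'get',
  // encodeURIComponent keeps spaces and special characters in the user input from breaking the URL
  url: 'https://maps.googleapis.com/maps/api/place/autocomplete/json?input=' + encodeURIComponent(destination) + '&types=(cities)&language=en&components=country:us&key=YOUR_API_KEY',
  headers: {}
};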
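As an alternative to concatenating strings, the query string can be assembled with the standard URLSearchParams class, which takes care of encoding every value. This is only a sketch under my own assumptions: buildAutocompleteUrl and its option names are hypothetical, and the default language and country simply mirror the values used in this post.

```javascript
// Sketch only: buildAutocompleteUrl is a hypothetical helper.
const AUTOCOMPLETE_ENDPOINT =
  'https://maps.googleapis.com/maps/api/place/autocomplete/json';

function buildAutocompleteUrl(options) {
  const params = new URLSearchParams({
    input: options.input,                                // text extracted by Dialogflow
    types: '(cities)',                                   // only return cities
    language: options.language || 'en',                  // language of the descriptions
    components: 'country:' + (options.country || 'us'),  // restrict to one country
    key: options.apiKey
  });
  // toString() percent-encodes each value and joins the pairs with "&".
  return AUTOCOMPLETE_ENDPOINT + '?' + params.toString();
}

const exampleUrl = buildAutocompleteUrl({
  input: 'San Franc & co',
  apiKey: 'YOUR_API_KEY'
});
console.log(exampleUrl);
```

The ampersand typed by the user ends up encoded inside the input value instead of being read as a parameter separator, which is exactly the kind of surprise free text tends to produce.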
I have used Axios to make the HTTPS request from the Node application. Axios is a promise-based HTTP client for Node.js and the browser. It is isomorphic, meaning it can run in the browser and in Node.js with the same codebase. On the server side it uses the native Node.js http module, while on the client (browser) it uses XMLHttpRequests. To use Axios with Node I first installed it using npm install and then imported it with require() using the following approach:
const axios = require('axios');
To perform the GET request to the Place Autocomplete service using Axios I have used this approach:
axios(config)
  .then(function (response) {
    // handle success (including ZERO_RESULTS)
  })
  .catch(function (error) {
    // handle error
  })
  .then(function () {
    // always executed
  });
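If you prefer async/await, the same three branches (success, error, always) map naturally onto try, catch and finally. The sketch below is not the code of my webhook: fetchPredictions and fakeGet are hypothetical names, and the HTTP client is passed in as a parameter so that the example runs without a network call. In a real function you would pass axios.get instead of the fake.

```javascript
// Sketch only: httpGet stands in for axios.get (a function returning a promise
// that resolves to an object with a "data" property).
async function fetchPredictions(httpGet, url) {
  try {
    const response = await httpGet(url);
    // Success, including ZERO_RESULTS (an empty predictions array).
    return response.data.predictions || [];
  } catch (error) {
    // Network problems or non-2xx answers end up here.
    console.error('Place Autocomplete request failed: ' + error.message);
    return null; // assumption: null signals a technical failure to the caller
  } finally {
    // Always executed, whatever happened above.
    console.log('Place Autocomplete request finished');
  }
}

// Fake client returning a canned body, used here instead of a real HTTP call.
const fakeGet = async function () {
  return { data: { status: 'OK', predictions: [{ description: 'Boston, MA, USA' }] } };
};

fetchPredictions(fakeGet, 'https://example.invalid/autocomplete')
  .then(function (predictions) {
    console.log(predictions.length + ' prediction(s) received');
  });
```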
If the invocation to the API is successful and the service finds matches for the given input I use a for-loop to iterate through the array of predictions returned in the response and create a second array containing just the description of each prediction.
axios(config)
  .then(function (response) {
    // handle success (including ZERO_RESULTS)
    if (response.data.predictions.length > 0) {
      for (var i in response.data.predictions) {
        var prediction = response.data.predictions[i].description;
        predictions.push(prediction);
        ...
Why do I create a second array? Well, the explanation is a bit long but I promise, I will get there! To start with, there are three possible outcomes that we need to handle if the service doesn’t return an error:
Case 1: no match. ZERO_RESULTS indicates that the search was successful but returned no results, which may occur if the search was passed a location that doesn’t exist. What should the user experience be? “Conversation repair is the practice of fixing misunderstandings, mishearings, and misarticulations to resume a conversation. Repairing a conversation can help build a user’s trust by showing that the agent is listening to their request”. Repeat the question to the user, but rephrase it in a shorter way to indicate the information that is missing. I think these are all valid prompts for the first no match: “Sorry, which city?”, “Sorry, you’re traveling to which city?”.
Case 2: single match. The service returned a single location (I have noticed this happens when the search is passed the zip code or the state where the city is located in addition to the city itself). What should the user experience be? A single match here is like an ace in tennis ;-) The Places API has validated that the destination is legitimate, so as a next step the agent will provide the customer with a timetable of trains to fulfill the request.
Case 3: multiple matches (multiple predictions are returned). This third scenario is in my opinion the most likely one. According to the Autocomplete service documentation:
“The Place Autocomplete service will return candidate matches based on this string and order results based on their perceived relevance.”
Based on that, we can say with a high degree of confidence that the first prediction in the list is very likely the city where the user wants to go. What if it is not? What if by saying “Ascoli” the user didn’t mean “Ascoli Piceno” (first prediction in the list) but “Ascoli Satriano” (second prediction)? What would a good user experience be in this third path? I think the agent should ask the user an explicit and actionable question like “Just to confirm, did you mean Ascoli Piceno?”. The user could confirm the agent’s guess, contradict it by saying “No, Ascoli Satriano” (this second location matches the second item in the array, so the loop is closed), or just say “nope” (in which case the agent should cooperate with the user by presenting the other possible matches). Either way the array of predictions helps move the conversation forward. This is why the webhook creates a second array and writes it to the session as a session parameter, which we can reference at any point during the conversation with the user.
To summarise, depending on the output returned by the Place Autocomplete service the webhook adds the predictions array to the session info, sends a different message back to the agent (in the fulfilment response) and transitions to a different target page.
res.status(200).send({
sessionInfo: {
parameters: {
predictions: predictions
}
},
fulfillmentResponse: {
messages: [{
text: {
text: [agentMessage]
}
}]
},
targetPage: targetPage
});
For example, if the service returns more than one prediction, the flow will transition to the page “Multiple Matches” and the message below will be shown to the user.
targetPage = "projects/diaologflow-cx-playground/locations/us-central1/agents/5b0d10a9-8bca-49f2-bb17-50144df939e0/flows/00000000-0000-0000-0000-000000000000/pages/9d9b5bb1-2979-4462-b4bf-115513b1d6d5";
agentMessage = "Just to confirm, did you mean " + predictions[0] + "?";
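Putting the three cases together, the decision could be sketched as a single function that picks the target page and the message from the number of predictions. Treat this as an illustration rather than the actual webhook code: buildFulfillment and the PAGES map are hypothetical, the page values are placeholders that would have to be replaced with the full page resource names of your own agent, and the wording of the single-match message is my assumption.

```javascript
// Sketch only: placeholder page identifiers, to be replaced with the full
// resource names ("projects/.../pages/...") of the pages in your own agent.
const PAGES = {
  noMatch: 'PAGE_NAME_NO_MATCH',
  singleMatch: 'PAGE_NAME_SINGLE_MATCH',
  multipleMatches: 'PAGE_NAME_MULTIPLE_MATCHES'
};

// Hypothetical helper: maps the predictions array to a webhook response body.
function buildFulfillment(predictions) {
  let targetPage;
  let agentMessage;
  if (predictions.length === 0) {
    // Case 1: no match, so re-prompt with a short question.
    targetPage = PAGES.noMatch;
    agentMessage = 'Sorry, which city?';
  } else if (predictions.length === 1) {
    // Case 2: single match, the destination is validated (assumed wording).
    targetPage = PAGES.singleMatch;
    agentMessage = 'Looking for trains to ' + predictions[0] + '.';
  } else {
    // Case 3: multiple matches, confirm the most relevant one first.
    targetPage = PAGES.multipleMatches;
    agentMessage = 'Just to confirm, did you mean ' + predictions[0] + '?';
  }
  return {
    sessionInfo: { parameters: { predictions: predictions } },
    fulfillmentResponse: { messages: [{ text: { text: [agentMessage] } }] },
    targetPage: targetPage
  };
}

const example = buildFulfillment(['Boston, MA, USA', 'New Boston, TX, USA']);
console.log(example.fulfillmentResponse.messages[0].text.text[0]);
```

The returned object has the same shape as the body passed to res.status(200).send(...) earlier, so it could be sent as is.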
The Agent
How do we blend this into Dialogflow CX? Earlier I explained that what is sent to the Places API is what Dialogflow has extracted from the end-user input when it matched the “head.findTrains” intent at runtime. Any parameter used by an annotation in the associated training phrases of the intent is set by Dialogflow and added to the list of intent parameters for the webhook. Check out this link to see how in Dialogflow CX you can annotate parts of your training phrases and configure the associated parameters. When you annotate parts of a training phrase, Dialogflow recognizes that these parts are just examples of actual values that will be provided by end-users at runtime. For an end-user input like “Are there any trains that can take me to NY tomorrow?”, Dialogflow would extract the date-time parameter from "tomorrow" and the destination parameter from "NY". When building an agent with the console, most annotations are automatically created for you when you add training phrases that contain parts that can be matched to an existing entity type. These parts are highlighted in the console for you. The great thing, though, is that you can annotate words as needed: even though “NY” can’t be matched to the existing @sys.geo-city system entity (because it’s not available outside US regions), you can still instruct Dialogflow to recognise this part of the sentence. By providing a reasonable number of training phrases and annotating them accordingly, you will train the NLU engine to recognise not just date-time but also destination as keywords and to extract them from the end-user input at runtime.
The simulator in Dialogflow CX is a great way to test things out. So let’s see what the engine extracts from an end-user input like “Are there any trains to Boston tonight?”. Ignore what the agent says for now; we will look at that in a bit. What I want to highlight here are the two intent parameters (date-time and destination) that Dialogflow has identified and extracted. They are now ready to use ;-)
When the end-user input matches the “head.findTrains” intent at runtime, we want to trigger a request to the webhook, and all the magic will follow as per the implementation of the cloud function :-)
So what happens now? When the webhook receives the request from Dialogflow, it first reads the destination parameter value, trims it, removes any invalid characters and lower-cases it, and then calls the Place Autocomplete service. In our example the destination entered by the user is Boston, and it turns out that there are five places called Boston in the US. The webhook parses the Place Autocomplete response and finds five predictions, as shown by the predictions array. We are in the “Multiple Matches” scenario, and while there’s a great chance that the user is referring to the city of Boston in Massachusetts, we should not assume it and therefore we want to check. Once done, the webhook sends a 200 OK fulfilment response back to Dialogflow with the following instructions: transition to the page “Multiple Matches” and reply to the user with the message “Just to confirm, did you mean predictions[0]?” (which is rendered as “Just to confirm, did you mean Boston, MA, USA?”).
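The clean-up step mentioned above (trimming, removing invalid characters, lower-casing) could look roughly like the sketch below. What counts as an “invalid character” is my assumption here: I keep letters and digits of any alphabet, whitespace, and a few punctuation marks that legitimately appear in place names, and I replace everything else with a space. The function name normalizeDestination is hypothetical.

```javascript
// Sketch only: normalizeDestination is a hypothetical helper and the set of
// allowed characters is an assumption, not a rule imposed by the Places API.
function normalizeDestination(rawValue) {
  return String(rawValue)
    // Replace anything that is not a letter, a digit, whitespace or ' . , - with a space.
    .replace(/[^\p{L}\p{N}\s'.,-]/gu, ' ')
    // Collapse runs of whitespace into a single space.
    .replace(/\s+/g, ' ')
    .trim()
    .toLowerCase();
}

console.log(normalizeDestination('  San  Franc?! '));
console.log(normalizeDestination('Ascoli 63100'));
```

The Unicode property escapes (\p{L}, \p{N}) keep accented letters intact, which matters as soon as the agent serves cities outside the US.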
While user inputs are never predictable, we can try to think about what the user is most likely to say at this stage of the conversation. “Multiple Matches” needs to handle (at least) a positive answer, a follow-up from the customer, a negative answer and the unresolved intent. A positive answer will call a backend service to fulfill the user request. In my opinion, other inputs like “New Boston Texas” or “No, New Boston Texas” should match the “findTrains” intent again. In the example below the second invocation of the Place Autocomplete service has returned a single match.
On the other hand if the user response is negative with no further information provided I think it’s a good idea to be cooperative and display the other predictions without transitioning to a different page. Ultimately the goal is to help disambiguate and move the conversation forward to fulfill the user intent.
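Since the predictions array was written to the session, a later webhook call can read it back from the session parameters of the request (sessionInfo.parameters) and offer the remaining candidates. Here is a minimal sketch of that idea; remainingMatchesMessage is a hypothetical helper and the wording of the prompts is my assumption.

```javascript
// Sketch only: builds a follow-up prompt from the predictions stored in the session.
// "sessionParameters" is expected to be req.body.sessionInfo.parameters.
function remainingMatchesMessage(sessionParameters) {
  const predictions = (sessionParameters && sessionParameters.predictions) || [];
  // The first prediction was already proposed and rejected by the user.
  const others = predictions.slice(1);
  if (others.length === 0) {
    return 'Sorry, which city?';
  }
  if (others.length === 1) {
    return 'Did you mean ' + others[0] + '?';
  }
  // Descriptions contain commas, so the candidates are separated with semicolons.
  const last = others[others.length - 1];
  return 'Did you mean ' + others.slice(0, -1).join('; ') + ' or ' + last + '?';
}

console.log(remainingMatchesMessage({
  predictions: [
    'Ascoli Piceno, Province of Ascoli Piceno, Italy',
    'Ascoli Satriano, Province of Foggia, Italy'
  ]
}));
```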
Conclusions
If you’ve come this far (unless you skipped the previous paragraphs!) I should really thank you for sticking with me till the end of this quite long blog post. I hope you enjoyed reading it as much as I enjoyed writing it, and if you have any questions or feedback please comment or reach out. If you’re keen to check out the webhook implementation, the code is available on my GitHub. I look forward to hearing from you!