Detecting addresses with Watson Assistant

Dan O'Connor
IBM watsonx Assistant
7 min read · Jul 1, 2021

One of the struggles of building a chatbot is capturing user addresses. While any teenager can understand that “45 Pond Lane, Springfield, MA” is an address with street and city information, it’s surprisingly hard for a chatbot to discern the same.

Why is it hard? Consider this address: “4310 Illinois Ave, Somerville, MA”. Because the street name is also the name of a state (“Illinois”), a rules-based algorithm would likely stumble when trying to parse it correctly.

In an earlier article, Detecting names and locations with Watson Assistant, I introduced the concept of contextual entities in Watson Assistant and illustrated how they can capture user names or locations. While location detection and address detection may seem similar, training a system to detect and extract the elements of an address is a different challenge. Over the years I’ve seen customers tackle this problem in several ways. There’s no perfect way to do it, but in this post I’ll explain several approaches and describe the implementation difficulty and reliability of each.

Collect and parse

This pattern provides the most conversational feel. We are all familiar with situations where someone asks, “What is your address?” and we respond:

1320 Maine Blvd (pause)
Springfield (pause)
Connecticut (pause)
03114

This interaction is so common that we instinctively know when to pause to let the person on the other end record the information. We also know when we’ll need to spell things out. Unfortunately, virtual assistants have a harder time parsing the information when humans act this way. Thankfully, Watson Assistant has the features and capabilities to make this relatively straightforward — once you know how.

Let’s return to contextual entities (described in detail in this post). Contextual entities allow the author to annotate spans of text to give them special meaning in context. In the address detection use case, once a sufficient number of annotations are provided, the system begins to learn what an address looks like. The system starts to understand that users typically provide the street address first, and that a street address usually begins with digits. The system learns that users regularly enter two upper-case letters to communicate the state, and so on.

For example, in the case of “4090A Charlton Road, Sturbridge MA 15466” you will want to tell the system that “4090A Charlton Road” is the street address, “Sturbridge” is the town, “MA” is the state, and “15466” is the customer’s postal code.
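For readers curious what such an annotation looks like under the hood, here is a sketch of how one annotated example is represented in an exported skill’s JSON, rendered as a Python dict. The entity names follow this article; treat the exact structure as approximate, and note that the location values are zero-based character offsets.

```python
# One annotated training example, roughly as it appears in an exported
# skill JSON. Each mention ties an entity to a [start, end) character
# span within the example text.
annotated_example = {
    "text": "4090A Charlton Road, Sturbridge MA 15466",
    "mentions": [
        {"entity": "street", "location": [0, 19]},   # "4090A Charlton Road"
        {"entity": "city",   "location": [21, 31]},  # "Sturbridge"
        {"entity": "state",  "location": [32, 34]},  # "MA"
        {"entity": "zip",    "location": [35, 40]},  # "15466"
    ],
}
```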

As with most machine learning techniques, the results from contextual entities are only as good as the training data provided. If all of the annotated examples have the address in a similar form (for example, if we only provide three-digit house numbers), the machine learning model will stumble when it encounters a five-digit house number. Similarly, if we only provide street addresses that end in “St”, the system may not accommodate something that ends in “Rd”. You also need plenty of examples with unusual spellings and errant punctuation (“120A Olde Towne Ct..”).

This means it takes a lot of training data, with all the variations we encounter in the real world, for the model to perform well.
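If real examples are scarce, one way to broaden coverage is to synthesize some of that variation yourself. Below is a minimal sketch of that idea; the street, suffix, and city lists are made up for illustration, and a real training set would need far more variety.

```python
import random

# Illustrative lists only; a real training set needs much wider coverage.
STREETS = ["Charlton", "Libby", "Olde Towne", "Illinois", "Pond"]
SUFFIXES = ["St", "St.", "Rd", "Road", "Ave", "Blvd", "Ct.."]
CITIES = [("Sturbridge", "MA"), ("Braintree", "MA"), ("Springfield", "CT")]

def random_address() -> str:
    """Build an address with varied house-number lengths and suffixes."""
    number = random.randint(1, 99999)        # 1- to 5-digit house numbers
    street = random.choice(STREETS)
    suffix = random.choice(SUFFIXES)
    city, state = random.choice(CITIES)
    zipcode = f"{random.randint(0, 99999):05d}"
    return f"{number} {street} {suffix}, {city}, {state} {zipcode}"

print(random_address())  # e.g. "7 Illinois Ave, Braintree, MA 02144"
```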

Luckily, there are a number of publicly available datasets for training. In my case, I downloaded the examples published here by Eric Kelly. I converted these examples to the CSV format preferred by Watson Assistant and imported the content to create my own intent. Once they were imported into Watson Assistant, I annotated all of the segments in the addresses, making sure not to annotate any punctuation characters that are not part of the address itself. In total I annotated 139 addresses, telling the system which part is the ‘street’, ‘city’, ‘state’, and ‘zip’.
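The CSV conversion itself is easy to script. Here is a minimal sketch that writes the two-column layout Watson Assistant accepts for intent import (example text first, then the intent name); “provide_address” is an intent name I made up for illustration.

```python
import csv

# Two raw examples stand in for the full downloaded dataset.
raw_addresses = [
    "140 Libby Way, Braintree, MA 02140",
    "4090A Charlton Road, Sturbridge MA 15466",
]

with open("address_examples.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for address in raw_addresses:
        # Column 1: the user example; column 2: the intent it belongs to.
        writer.writerow([address, "provide_address"])
```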

These contextual entities work very well when the user enters their entire address on one line, e.g. “140 Libby Way, Braintree, Ma 02140”. However, experience tells us that users often omit parts of their address, whether accidentally or unwittingly. In such situations it is necessary to use a feature called slots to prompt the user for the omitted pieces. For instance, if the user enters “140 Libby Way, Braintree”, we make the system prompt for the state and postal code. This hybrid approach (machine learning parsing, plus deterministic prompts) leads to a robust, pleasant user experience.
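Within Watson Assistant, slots do this prompting inside the dialog itself. Purely to illustrate the logic, here is a small Python sketch of the same check performed on the client side; the response shape follows the v2 /message API, and the entity names follow the annotations above.

```python
# Entity names follow the annotations described in this article.
REQUIRED_PARTS = ("street", "city", "state", "zip")

def missing_address_parts(message_response: dict) -> list:
    """Return the address entities the model did not detect."""
    detected = {e["entity"] for e in message_response["output"].get("entities", [])}
    return [part for part in REQUIRED_PARTS if part not in detected]

# For "140 Libby Way, Braintree" the model might detect only street and
# city, so the bot would prompt for the remaining parts:
example_response = {"output": {"entities": [
    {"entity": "street", "value": "140 Libby Way"},
    {"entity": "city", "value": "Braintree"},
]}}
print(missing_address_parts(example_response))  # ['state', 'zip']
```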

It’s not without drawbacks, however. This solution relies on the training data provided, and there will inevitably be situations where the user’s address sends mixed signals to the model; the street number may be confused with the zip code, or vice versa, for example. There are techniques that can remedy this, but they can require quite a bit of trial and error, and at some point the bot author must ask whether the added overhead of a “perfect” solution is worth the time and effort.

Use a form

As the pattern name suggests, the chatbot author takes the least risky approach and simply implements a form. While this might seem like “cheating” in a machine learning context, it has several merits. Even as user interfaces have evolved radically since the earliest days of computing, users remain very familiar with entry fields for address components. We instinctively know the layout of these forms and have the muscle memory to jump from one field to the next. Better yet, modern browsers populate some of these fields automatically, making it even easier. This approach also significantly limits user error: fields can be individually validated, certain fields can be required, and the user can be prevented from advancing to the next phase of the chat without the required details.

With the form approach, the chatbot is designed to return specific metadata at certain points in the conversation. In the case of Watson Assistant, this metadata is returned as an “action” defined in the JSON response for a given dialog node. When the chatbot author knows they need to prompt the user for their address, the author returns an action that the client application is “listening” for. Once the client application encounters a specific action, e.g. collect_address, in the response, it shows an address entry form to the user.

In most applications the address fields (street_address, city_address, etc.) are stored as context variables in the session. Watson Assistant’s /message API can also fill context variables as the conversation progresses, so with the form pattern the client application can directly set the appropriate context variables based on the fields the user fills in; a sketch of this flow appears at the end of this section.

The drawback, of course, is that this approach removes the conversational element of the chatbot, forcing the user back to “old school” data entry. For chatbot authors who wish to design a conversational experience, there are other options available, as outlined below.
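Before moving on, here is a hedged sketch of that client-side flow using the ibm-watson Python SDK. The collect_address action name comes from the example above, but placing it under output.user_defined, the show_address_form helper, and the state_address and zip_address variable names are my illustrative choices, not product requirements.

```python
from ibm_watson import AssistantV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholder credentials; substitute your own service details.
authenticator = IAMAuthenticator("YOUR_APIKEY")
assistant = AssistantV2(version="2021-06-14", authenticator=authenticator)
assistant.set_service_url("YOUR_SERVICE_URL")

ASSISTANT_ID = "YOUR_ASSISTANT_ID"
SESSION_ID = "YOUR_SESSION_ID"

def show_address_form() -> dict:
    # Hypothetical UI hook: a real client would render an address form
    # here and block until the user submits it.
    return {"street": "140 Libby Way", "city": "Braintree",
            "state": "MA", "zip": "02140"}

response = assistant.message(
    assistant_id=ASSISTANT_ID,
    session_id=SESSION_ID,
    input={"message_type": "text", "text": "I need to update my address"},
).get_result()

# If the dialog signalled the form, render it and hand the fields back
# to the assistant as context variables on the next turn.
if response["output"].get("user_defined", {}).get("action") == "collect_address":
    form = show_address_form()
    assistant.message(
        assistant_id=ASSISTANT_ID,
        session_id=SESSION_ID,
        input={"message_type": "text", "text": ""},
        context={"skills": {"main skill": {"user_defined": {
            "street_address": form["street"],
            "city_address": form["city"],
            "state_address": form["state"],
            "zip_address": form["zip"],
        }}}},
    )
```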

Explicit parsing

With this pattern the chatbot author prompts the end user for each individual component of an address: street number, street name, street name part 2, city, and so on. This approach requires the least effort from the chatbot author, who can use a mix of contextual entities and regular expression entities to ensure that the user enters the required information correctly.

With this approach the bot is configured to prompt for each piece of information separately and explicitly. For example, the author can easily validate that the user has entered one of the accepted street suffixes, like “St”, “Ave”, or “Blvd”, or match a ZIP code against the USPS database.

If the user is asked to enter their entire address at once, it becomes difficult for the chatbot author to effectively validate the input. From an implementation point of view this pattern is relatively straightforward: each dialog node checks a specific validation criterion. For the street address the author can use a pattern entity similar to ^(\d+)\s+(.*)\s+(st|blvd|ave|rd)$; for the city the user’s input can be stored verbatim; for the state a dictionary entity can be used; and so on.
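As a sketch, the per-node checks might look like the following in Python. The street pattern is the one above; the state set and ZIP pattern are illustrative stand-ins for a full dictionary entity or a USPS lookup.

```python
import re

# Street pattern from this article: house number, street name, known suffix.
STREET_RE = re.compile(r"^(\d+)\s+(.*)\s+(st|blvd|ave|rd)$", re.IGNORECASE)
# Illustrative subset; a real bot would use a full dictionary entity.
STATES = {"MA", "CT", "NH", "VT", "RI", "ME"}

def valid_street(text: str) -> bool:
    return STREET_RE.match(text.strip()) is not None

def valid_state(text: str) -> bool:
    return text.strip().upper() in STATES

def valid_zip(text: str) -> bool:
    # Five digits, with an optional ZIP+4 extension.
    return re.fullmatch(r"\d{5}(-\d{4})?", text.strip()) is not None

print(valid_street("1320 Maine Blvd"))        # True
print(valid_street("140 Libby Way"))          # False: "Way" is not a listed suffix
print(valid_state("ct"), valid_zip("03114"))  # True True
```

Note how strict this pattern is: anything outside the enumerated suffixes is rejected, which is exactly why prompting for one field at a time makes validation tractable.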

While this pattern gives the chatbot author a relatively straightforward implementation, it provides the least pleasant experience of the three patterns outlined in this article. The bot doesn’t appear intelligent, only capable of collecting one piece of information at a time. However, it is likely the easiest and quickest to implement.

As is the case with any software application, starting small and expanding is often better than trying to solve every problem in the first version, so this is a good place to start.

As I’ve shown in this article, there are several approaches to collecting address information from chatbot users. There are pros and cons to each, and it’s up to you as the chatbot author to decide the best fit. In preparing this article, I built skills (code patterns) to support each approach. You can find the skills in this GitHub repo; feel free to download and use the supplied reference implementations.
