A bottom-up approach to NLU
An important aspect of conversation design is understanding your customers’ intents. What are your customers asking? What problems do they have?
Answering these questions requires access to real conversational data. Without it, you’re playing a guessing game: you can brainstorm the most common intents with your team, but correctly addressing the long tail specific to your domain is next to impossible.
However, access to conversational data isn’t enough: without proper tooling, you’ll find yourself manually sifting through conversation transcripts with no idea where to start, when to stop, or which utterances constitute a valid intent and which are just noise.
The typical approach to this problem has been to apply unsupervised clustering techniques.
There are two clear problems with unsupervised clustering as an approach to discovery and training of intents:
- The first, obvious problem is that clusters will often overlap (see image above) and represent similar or identical intents, requiring manual intervention to disambiguate them.
- A less obvious but more fundamental problem is that unsupervised clustering techniques say nothing about how abstract or specific the intent generated from a given cluster should be.
For example, a cluster with utterances similar to “how can I transfer funds to my checking account?” could be assigned any one of these three labels, from most abstract to most specific:
- Has a question
- Has a question > about bank account
- Has a question > about bank account > transfers
Determining which label to apply is a non-trivial problem, as the right level of abstraction for any given intent depends on whether there is sufficient data to accurately train the intent at that level of abstraction.
This is a classic chicken-and-egg problem: you need labeled data in order to correctly label your data.
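To make the dependency concrete, here is a minimal Python sketch of how the “right” level of abstraction could be picked once labels exist. It assumes labels are written as paths joined by “ > ” (as in the list above) and that “sufficient data” means a simple minimum number of examples; the function names and the threshold are illustrative assumptions, not part of any particular tool. Note that the check can only run on utterances that are already labeled, which is exactly the chicken-and-egg problem.

```python
from collections import Counter

SEPARATOR = " > "  # assumed path separator, matching the labels shown above


def prefixes(label):
    """Return every ancestor path of a label, from most abstract to most specific."""
    parts = label.split(SEPARATOR)
    return [SEPARATOR.join(parts[:i]) for i in range(1, len(parts) + 1)]


def count_by_level(labels):
    """Count how many labeled utterances fall at or below each level of abstraction."""
    counts = Counter()
    for label in labels:
        counts.update(prefixes(label))
    return counts


def most_specific_supported(label, counts, min_examples):
    """Return the deepest level of `label` backed by at least `min_examples` utterances.

    Falls back to the top-level intent when no level has enough data.
    """
    levels = prefixes(label)
    supported = [p for p in levels if counts[p] >= min_examples]
    return supported[-1] if supported else levels[0]


# Hypothetical labels for six already-labeled utterances.
transfers = "Has a question > about bank account > transfers"
fees = "Has a question > about bank account > fees"
credit = "Has a question > about credit card"
labels = [transfers] * 2 + [fees] * 3 + [credit]

counts = count_by_level(labels)
chosen = most_specific_supported(transfers, counts, min_examples=3)
```

With only two “transfers” examples and a threshold of three, the sketch backs off to “Has a question > about bank account”; lowering the threshold to two keeps the full, most specific label.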
Bottom-up approach to intent discovery & data labeling
Bottom-up labeling applies the tried and tested divide-and-conquer approach to this problem, with great success. Instead of expecting a human or unsupervised algorithm to correctly “predict” what intents and abstractions exist in the data, it provides a simple framework to iteratively discover this information.
The bottom-up “algorithm” is simple:
- Step 1: Identify a few very high-level intents that can capture most (if not all) of the meaning in your data (in our experience, “has a question” and “has a problem” are great starting points).
- Step 2: Label your conversation / utterance data, assigning each utterance to one of these high-level intents (the cognitive load at this step is minimal, since the decision boils down to picking one of a handful of existing high-level intents).
The outcome of this step is very valuable in itself, as it provides high-quality and domain-specific training data to classify users who “have a question” or “have a problem”.
- Step 3: For every intent (e.g., “has a question”), identify more specific “sub-intents” that its training examples can fall into (e.g., “has a question > about credit account”, “has a question > about account settings”).
- Step 4: Re-assign the top-level intents’ training data to the more specific sub-intents you’ve just created.
- Repeat steps 3 & 4 (i.e., divide and conquer).
Every step produces training data for classifiers that can recognize increasingly specific intents: this is one of the major advantages of this approach.
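The steps above can be sketched as a small tree of intents, where splitting an intent moves its examples down into sub-intents. This is a minimal illustration under stated assumptions rather than a description of any specific product: the `Intent` class, its `split` and `training_data` methods, and the keyword-based `by_keyword` function (a stand-in for a human labeling decision) are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Intent:
    name: str
    examples: List[str] = field(default_factory=list)
    children: Dict[str, "Intent"] = field(default_factory=dict)

    def split(self, assign: Callable[[str], Optional[str]]) -> None:
        """Steps 3 & 4: move examples into the sub-intent named by `assign`.

        Examples for which `assign` returns None stay on this intent.
        """
        remaining = []
        for utterance in self.examples:
            sub_name = assign(utterance)
            if sub_name is None:
                remaining.append(utterance)
                continue
            if sub_name not in self.children:
                self.children[sub_name] = Intent(sub_name)
            self.children[sub_name].examples.append(utterance)
        self.examples = remaining

    def training_data(self) -> List[str]:
        """All examples at or below this intent, usable to train it at this level."""
        data = list(self.examples)
        for child in self.children.values():
            data.extend(child.training_data())
        return data


# Steps 1 & 2: utterances already assigned to a high-level intent.
question = Intent("has a question", [
    "how can I transfer funds to my checking account?",
    "how do I change my password?",
    "what are your opening hours?",
])


def by_keyword(utterance: str) -> Optional[str]:
    # Stand-in for a human labeling decision (an assumption for this sketch).
    if "transfer" in utterance:
        return "about bank account"
    if "password" in utterance:
        return "about account settings"
    return None  # not specific enough yet: keep it at the parent level


# Steps 3 & 4: create sub-intents and re-assign the parent's examples.
question.split(by_keyword)
```

After the split, the parent intent still covers all three utterances through `training_data()`, while each sub-intent holds only its own examples. Repeating steps 3 & 4 amounts to calling `split` again on any child that has accumulated enough examples.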
What’s the catch?
If this solution to labeling and training data seems too obvious, it’s because it is: divide-and-conquer has been used to break down problems into manageable chunks for a long time; it just hasn’t been made easily available for data labeling and intent discovery use cases yet.
The main reason is tooling and resources: the labeling and refactoring workflows required to make this efficient and manageable at scale are costly to build, and only the more sophisticated companies have done so. These companies are able to charge customers thousands of dollars to build and train intents from unstructured data.
There are, however, some solutions out there focused on democratizing this approach: HumanFirst is one of them, and provides one of the first out-of-the-box bottom-up labeling and intent discovery solutions. In our next article, we’ll explore how machine learning and semantic search can accelerate this bottom-up approach. Stay tuned!