Hybrid AI and Machine Learning: Letting Computers Talk Back

By Martin Reddy

Artificial intelligence is a heavily overloaded term that means different things to different people. As Benedict Evans notes, AI is often used to describe seemingly magical behavior that we don’t yet understand, but once we do we just call it computation. In recent years, the emerging field of machine learning has become very popular and has demonstrated a lot of promise, with some considering it to be a subset of AI and others considering it to be a distinct discipline. For more background on the related topics of AI and machine learning, I recommend watching Frank Chen’s fantastic primer.

In this article, I talk about how artificial intelligence and machine learning techniques can be applied to the challenge of human–computer conversation.

Two Approaches to Computer Conversation

At PullString, we’ve developed a broad technology platform that lets you create computer conversation experiences that users can interact with by typing or speaking in natural language. At the core of this system lies the conversational AI engine.

This area of AI is largely divided into two major camps. There’s the traditional rule-based approach to AI where we use pattern matching, natural language processing, and structured ontologies to map the user’s input into a known set of responses. Then there’s machine learning, and in particular deep learning, where we can use lots of data to figure out how to map user inputs to their high-level intents without explicitly writing code to do so. Both techniques have their pros and cons.

  • Rule-Based AI: In this approach you write explicit rules using a pattern-matching syntax to model the things that a user might say. For example, if you have an intent for “add two numbers”, you might write rules like “what is [number] (plus, and, +) [number]” or “add [number] and [number].” This means that you have to think ahead about all of the ways a user may express a particular concept and then write rules for each case. This can be quite time-consuming and you often have to become skilled in the nuances of the particular pattern-matching language, but on the other hand you have full control and precision over how your bot reacts.
  • Machine Learning: In this case, you capture lots of data from users actually typing into your system and then use machine learning techniques to infer patterns automatically. For machine-learned intents, this is normally done by providing a set of positive and, sometimes, negative examples of user input. While many machine learning algorithms can work in an unsupervised fashion (i.e., they require no manual labeling of data), deducing user intent from example inputs requires a human in the loop to update the set of example inputs when the system responds incorrectly and then retrain the model. Also, the advice you tend to receive about how many examples you should provide for an intent is often unsatisfying: give it enough so it knows what you’re talking about, but not too many or it will just match everything. This is the Goldilocks Zone problem of machine-learned intents.
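
To make the rule-based bullet concrete, here is a minimal sketch of the “add two numbers” rules written as Python regular expressions. This is not PullString’s pattern-matching syntax; the names NUMBER, ADD_RULES, and match_add_intent are hypothetical and exist only for illustration.

```python
import re

# Assumption: a "number" is an optionally signed integer or decimal.
NUMBER = r"(-?\d+(?:\.\d+)?)"

# Hypothetical regex versions of the two example rules from the text.
ADD_RULES = [
    re.compile(rf"what is {NUMBER}\s*(?:plus|and|\+)\s*{NUMBER}", re.IGNORECASE),
    re.compile(rf"add {NUMBER} and {NUMBER}", re.IGNORECASE),
]


def match_add_intent(text):
    """Return the two captured numbers if any rule matches, otherwise None."""
    for rule in ADD_RULES:
        match = rule.search(text)
        if match:
            return float(match.group(1)), float(match.group(2))
    return None


print(match_add_intent("What is 2 plus 3?"))    # (2.0, 3.0)
print(match_add_intent("sum up 2 and 3"))       # None
```

The second call shows the cost described above: a phrasing nobody wrote a rule for simply does not match until an author adds one.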

Combining the Best of Both Worlds

There’s been a lot of excitement recently around applying machine learning techniques to computer conversation problems, and here at PullString our platform offers several state-of-the-art machine learning capabilities to help you build great chat interactions. For example:

  • We take positive and negative examples of how users phrase various intents (such as “I want to order a pizza”) and use these labeled data to infer the underlying rules or classifiers for you, so that you don’t have to write them yourself.
  • We provide the ability to learn new synonyms or related concepts using unsupervised machine learning techniques on large corpora of human chat logs, such as automatically figuring out that “thx” is synonymous with “thanks” in current chat vernacular.
  • And we use machine learning techniques in our analytics offerings to be able to surface interesting trends in the way your users interact with your content, such as suggesting new intents that you may not have considered.
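
As a rough illustration of the first bullet, the sketch below learns a single intent from positive and negative example phrases using a tiny naive Bayes classifier with add-one smoothing. The technique, the IntentClassifier class, and the example phrases are all assumptions made for illustration; the article does not say which algorithms PullString uses.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class IntentClassifier:
    """Hypothetical two-class naive Bayes over bag-of-words counts."""

    def __init__(self, positives, negatives):
        self.counts = {
            True: Counter(t for s in positives for t in tokenize(s)),
            False: Counter(t for s in negatives for t in tokenize(s)),
        }
        self.totals = {label: sum(c.values()) for label, c in self.counts.items()}
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        n = len(positives) + len(negatives)
        self.log_prior = {
            True: math.log(len(positives) / n),
            False: math.log(len(negatives) / n),
        }

    def _log_score(self, tokens, label):
        v = len(self.vocab)
        score = self.log_prior[label]
        for t in tokens:
            if t in self.vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((self.counts[label][t] + 1) / (self.totals[label] + v))
        return score

    def probability(self, text):
        """Estimated probability that the text expresses the intent."""
        tokens = tokenize(text)
        pos = self._log_score(tokens, True)
        neg = self._log_score(tokens, False)
        return 1 / (1 + math.exp(neg - pos))


order_pizza = IntentClassifier(
    positives=["I want to order a pizza", "can I get a large pizza",
               "order pizza please", "I'd like a pepperoni pizza"],
    negatives=["what is the weather today", "tell me a joke",
               "I want to book a flight", "play some music"],
)
print(order_pizza.probability("I would like to order a pizza") > 0.5)   # True
print(order_pizza.probability("tell me a joke about the weather") > 0.5)  # False
```

With only a handful of examples the model leans heavily on individual words such as “pizza”, which is one reason the number and choice of examples matter so much in practice.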

However, we also believe that rule-based techniques offer important benefits, because they give you direct and precise control over how your bots respond to your users. Machine learning is not always right, so it’s important to be able to correct it when you need to. As a result, we’ve adopted a hybrid approach to give you the best of both worlds: you can always write rules to express exactly what you want, but you can also use machine learning to automatically learn and fill in the gaps that your rules don’t cover.
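
One way to picture a hybrid of this kind is a resolver that checks authored rules first and only falls back to a learned model when no rule matches. The sketch below assumes that precedence order and a confidence threshold of 0.8; both are illustrative choices rather than documented PullString behavior, and toy_classifier is a hypothetical stand-in for a trained model.

```python
import re


def hybrid_intent(text, rules, classifier, threshold=0.8):
    """Return (intent, source). Authored rules win; otherwise ask the model."""
    for intent, pattern in rules:
        if pattern.search(text):
            return intent, "rule"
    intent, confidence = classifier(text)
    if confidence >= threshold:
        return intent, "model"
    return None, "none"


# Hypothetical authored rule for a pizza-ordering intent.
RULES = [("order_pizza", re.compile(r"\border\b.*\bpizza\b", re.IGNORECASE))]


def toy_classifier(text):
    """Stand-in for a trained model: returns (intent, confidence)."""
    confidence = 0.9 if "hungry" in text.lower() else 0.1
    return "order_pizza", confidence


print(hybrid_intent("I'd like to order a pizza", RULES, toy_classifier))  # ('order_pizza', 'rule')
print(hybrid_intent("I'm so hungry", RULES, toy_classifier))              # ('order_pizza', 'model')
print(hybrid_intent("hello there", RULES, toy_classifier))                # (None, 'none')
```

Putting rules first means an author can always override a wrong model prediction by adding one rule, without retraining anything.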

Example Intent in the PullString Author Intent Dialog.

The Benefits of a Hybrid Approach

Unless you’re a technology giant, you likely have no user data when you first start out (or you may have privacy constraints that limit the data you can collect), so a rule-based approach may be your best, or perhaps only, option. After all, if you don’t have data, then machine learning isn’t going to help you very much. For context, Facebook’s AI Research (FAIR) team estimated that they would need many millions of queries to optimize a single restaurant booking intent.

As a result, you often see the strategy of bootstrapping machine learning systems. For example, some companies will release experiences that have real humans answering questions behind the scenes until they’ve collected enough data to start relying on machine-generated responses. Another approach is to pay people on microworking platforms like Mechanical Turk or CrowdFlower to manually label your data for you before releasing your system to the general public.

However, there are also situations where these approaches are not feasible. For example, we produced an experience on Facebook Messenger for the Call of Duty: Infinite Warfare reveal. This had to be developed in a few short weeks and was only meant to be live for a couple of days. PullString’s rule-based capabilities were critical to being able to deliver this exceptionally engaging experience, which saw an impressive 6 million interactions over the first 24 hours.

We’ve found that you can do a pretty good job writing your own rules for intents. For example, we found that we were getting over 80% accuracy with rules designed by authors in our Jessie Humani experience on Facebook and Skype. However, once you’re live in market and are able to collect data from actual users, you can train the machine learning algorithms to get that number into the high 90s.

Machine-learned synonym suggestion within PullString Author.

Putting Control Into Creative Hands

Another part of PullString’s philosophy toward AI is that we believe creating great experiences with appealing personalities requires the hand of a creative author in the process. Unique companies and brands don’t want cookie-cutter interactions; they will want to craft a voice and persona that represents their brand and appeals to their customers. As the field of computer conversation grows, we believe there’s a need for a “Photoshop” or “Word” content creation tool for the space. That’s why we’ve spent over 50 person-years of engineering effort building a professional authoring environment for conversation, called PullString Author: to let talented nontechnical writers craft the characters, personalities, and story arcs that will resonate with users.

The PullString Author application for creating computer conversations.

This creative-driven philosophy extends to our use of AI too. We will use machine learning to help you improve your intents and give you analytical insights into your users’ experiences, but we always put the content creator in the driver’s seat: if the machine learning system doesn’t do quite the right thing, you the writer can always craft a rule to fix it right there and then. In a pure machine learning system, you just have to hope it will eventually learn the right thing after you feed it even more data. That level of control is incredibly important. We’ve all seen some of the things that can go wrong when state-of-the-art machine learning algorithms are left to modify content or infer the user’s intent without human moderation. Different brands may also want to customize the language and tone that they understand and respond with (Coke doesn’t want to sound like Pepsi, and Uber doesn’t want to sound like Lyft), so putting the author in the loop lets companies control their own unique and specific voice.

There’s no doubt that writing rules for each of your intents can be a cumbersome task, so to make that part of our hybrid offering easier to use, we ship a default intent library with PullString Author that covers many common intents and synonyms out of the box. This covers things like yes/no answers, common greetings, parsing out names or numbers, detecting swearing, and so on. In this case, we’ve spent the time tuning and honing these intents and synonyms to take the grunt work out of your effort to build great chatbots and conversational agents.

Conversation Is More Than Understanding Intent

Creating engaging conversational experiences is much more than just figuring out the intent of the user. The flow and cadence of your bot’s responses are just as important. That’s why PullString also focuses on providing extensive dialog management capabilities. Context is important, so you can define intents at any depth in the conversation and you can compose multiple intents to detect more specific cases. As any comedian knows, timing is crucial, so we let you define time-based prompts and schedule timers in your content. Since conversations don’t follow strict linear paths, we support interjections that can switch to different topics and segues that can take you back to previous topics. A complex experience may have multiple stages or levels, so we provide the concept of activities, each with its own conversational context. General question answering requires access to knowledge bases, so we provide a web services feature that lets you connect to external APIs. Character responses should be rich and relevant, so we allow multiple responses at any point and provide the ability to customize responses to the user’s environment. And despite our proficiency in natural language interactions, we recognize that goal-oriented experiences often benefit from more efficient interactions, so we also provide integrations with cards, menus, and structured messages on messaging platforms like Facebook, Microsoft, and Slack.
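
The interjection and segue behavior described above can be pictured as a stack of topics: an interjection pushes a new topic, and a segue pops back to the previous one. The sketch below is only a mental model under that assumption; the DialogState class and its method names are hypothetical and do not reflect how PullString implements dialog management.

```python
class DialogState:
    """Hypothetical topic stack: interjections push a topic, segues pop back."""

    def __init__(self, opening_topic):
        self._topics = [opening_topic]

    @property
    def current_topic(self):
        return self._topics[-1]

    def interject(self, topic):
        """Switch to a new topic while remembering where we came from."""
        self._topics.append(topic)

    def segue_back(self):
        """Return to the previous topic; the opening topic is never removed."""
        if len(self._topics) > 1:
            self._topics.pop()
        return self.current_topic


state = DialogState("order_pizza")
state.interject("store_hours")   # user asks an unrelated question mid-order
print(state.current_topic)       # store_hours
print(state.segue_back())        # order_pizza
```

A stack keeps nested digressions in order, so several interjections in a row unwind back to the original topic one step at a time.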

Putting this all together, the PullString Platform gives you all the precision of pattern matching with the ease of use of machine learning; it gives you a professional authoring environment that puts you in full control of your experiences; it gets you up and running quickly with a default intent library and gives you rich analytics for insight into your data; and it offers complex dialog management features to support intricate nonlinear branching. If you’ve ever wanted to create your own artificial text or audio characters, then now you have the power in your own hands!