Conversational AI: How Do Chatbots Work?

Anna Prist
10 min read · Jun 10, 2019


There is ample evidence that Artificial Intelligence simplifies many routine things and daily tasks, changing our lives for the better. A growing number of businesses have caught on to the buzz and taken an interest in the technology. And those who put their trust in the broad variety of AI possibilities some time ago can now benefit from better performance and a stronger competitive position, particularly in the case of chatbots and conversational AI. In this article we’ll explain why these technologies are in high demand and what a well-designed dialog platform looks like.

Historical outline

As you probably know, Artificial Intelligence is a term with a broad meaning. It includes computer vision, predictive parsing, machine translation, and many other areas. Natural Language Understanding (NLU) and Natural Language Generation (NLG) are among the most promising of them, with the highest growth rates. According to a ResearchAndMarkets forecast, the global NLP market is expected to reach $28.6 billion by 2026.

Creating computers that can understand Natural Language has always been an extremely difficult challenge and today NLU is considered to be an AI-hard problem, meaning that the difficulty of these computational problems is equivalent to that of solving the central artificial intelligence problem — making computers as intelligent as people.

Natural Language Processing (NLP) is a field within data science that describes the interactions between computers and human languages. Its main challenge is to program computers to process and analyze large amounts of natural language data.

Natural Language Understanding (NLU) basically deals with structuring raw data into a form that machines can understand. Usually, NLU covers intent recognition and named entity recognition tasks. To put it simply, NLU is the ability of a machine to actually understand what the user says.
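To make the two NLU tasks concrete, here is a deliberately tiny sketch in Python. The intents, keywords, and city list are invented for illustration; a real NLU module would use a trained classifier and a proper entity extractor instead of keyword lookups:

```python
import re

# Hypothetical intents with trigger keywords; a production NLU model
# would be trained on labeled examples instead of a keyword list.
INTENT_KEYWORDS = {
    "order_pizza": ["pizza", "order"],
    "get_weather": ["weather", "forecast"],
}

# A toy gazetteer for named entity recognition.
CITIES = {"paris", "london", "berlin"}

def understand(utterance: str) -> dict:
    """Return the recognized intent and named entities for an utterance."""
    tokens = re.findall(r"[a-z]+", utterance.lower())
    intent = next(
        (name for name, kws in INTENT_KEYWORDS.items()
         if any(kw in tokens for kw in kws)),
        "unknown",
    )
    entities = {"city": t for t in tokens if t in CITIES}
    return {"intent": intent, "entities": entities}

print(understand("What is the weather in Paris?"))
# → {'intent': 'get_weather', 'entities': {'city': 'paris'}}
```

Intent recognition answers "what does the user want?", while entity recognition pulls out the parameters ("paris") needed to actually fulfill that want.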

The first chatbots and development systems emerged quite a while ago. In short, it all started with Alan Turing in the 1950s, the ELIZA program in the 1960s, and linguistics and machine learning research in the 1990s, when scientists began creating programs that let computers analyze large amounts of data and draw conclusions, or ‘learn’, from the results.

And then came another big thing in modern history: in 2001 Richard Wallace developed AIML (Artificial Intelligence Markup Language) and built the A.L.I.C.E. (Artificial Linguistic Internet Computer Entity) chatbot upon it.

This methodology has been termed the ‘rule-based approach’, and over the next ten years all efforts to build chatbots were essentially re-engineering and improving it. In essence, meaningful parts of phrases are identified and coded, and a scripting language that enables conversation scenarios is created. Most smart assistants use that approach today. The newest development frameworks are complex systems, which include:

· NLU part, which contains intent recognition and named entities recognition

· Linguistic modules (e.g. morphological analysis, spell-checking, etc.)

· Dialog management modules with local and global context-keeping

· Integrations and external APIs
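The rule-based approach can be sketched in a few lines. The snippet below mimics AIML-style categories, where a pattern with a wildcard is paired with a reply template; the patterns and replies are invented for the example, not taken from A.L.I.C.E.:

```python
import re

# AIML-style categories: a pattern (with capture groups acting as
# wildcards) and a templated reply. First matching category wins.
CATEGORIES = [
    (r"MY NAME IS (.+)", "Nice to meet you, {0}!"),
    (r"WHAT IS YOUR NAME", "My name is Alice."),
    (r"(.*)", "I am not sure I understood that."),  # catch-all, like AIML's *
]

def respond(utterance: str) -> str:
    """Normalize the utterance, then match it against categories in order."""
    text = utterance.upper().strip(" ?!.")
    for pattern, template in CATEGORIES:
        m = re.fullmatch(pattern, text)
        if m:
            return template.format(*(g.title() for g in m.groups()))
    return ""

print(respond("my name is John"))  # → Nice to meet you, John!
```

Everything the bot can say has to be anticipated by a rule author, which is exactly why the approach is labor-intensive, as discussed below.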

We have to admit that a great deal of conversational solutions based on that approach are quite labor-intensive. You have to put a lot of effort into a chatbot to make it converse on a broad range of topics or to cover a specific discipline with profound knowledge.

These days the situation in that area has changed notably, thanks to the development of semantic similarity algorithms and machine learning solutions. That, in turn, has made text categorization and NLU model training fast and convenient. Dialogs that require access to a great amount of external data, recognize hundreds of thousands of named entities, and integrate with external information systems still take a lot of effort, but the process of developing a complex chatbot has become much easier, and intent recognition accuracy has improved considerably. The expansion of messengers and web chats, along with perceptible progress in voice synthesis and recognition technologies, led to rapid growth of NLU adoption in 2015–2019.
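Semantic similarity can be illustrated with a stdlib-only toy: the bag-of-words cosine similarity below. Real platforms compare sentence embeddings from trained models rather than raw word counts, so treat this as the shape of the idea, not the technique itself:

```python
from collections import Counter
from math import sqrt

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two texts (0.0 to 1.0)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (sqrt(sum(c * c for c in va.values()))
            * sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

# Similar requests score high, unrelated ones score low.
print(cosine_similarity("book a table for two", "please book a table"))
print(cosine_similarity("book a table for two", "cancel my flight"))
```

Scoring a new utterance against labeled examples this way is what makes training an intent classifier from a handful of sample phrases so quick.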

Why are these technologies so popular right now?

Because there are a few value drivers enabling its market growth:

1. Call centers

This is the best market for implementing NLU algorithms (Gartner says 25% of customer service operations will be using virtual customer assistants by 2020). Thousands of companies, from banks and large retailers to SMBs, use call center services; this way they can serve their customers with just 2 or 3 support managers. A vast number of routine operations is delegated to Artificial Intelligence:

· chatbots answer typical questions (on the FAQ principle)

· in a ‘call steering’ mode they reroute the user to the proper department through smart IVR

· they work as advisors, helping call center operators with smart hints and tips

Source: Twilio

All these measures help reduce staff expenses and increase a call center’s traffic capacity without hiring. The bot+operator alliance is the most efficient setup, however: an agent picks up the phone only when a customer has a complex analytical question, and so can devote as much time as needed to solving it.

2. Talking devices

Three years ago, Amazon Echo appeared and made everything a little more comfortable: now Alexa can gently wake you up in the morning, turn on music, look up interesting facts and news, control smart home devices, call a cab, and order a pizza. It was the first mass-market device with good voice recognition and the ability to actually hear a query even over loud background noise. Then Google announced Google Home, and the two have since divided the market roughly 3:1 (with Amazon in the lead). Meanwhile, China’s market leaders play the game at a much tougher level: every giant Internet company has released its own smart speaker, including Baidu, Xiaomi, Alibaba, Tencent, and JD.com.

This market is not limited to smart speakers: robots, toys, in-car devices, smartwatches, and smart home appliances are all coming, and we will surely see a lot of new and cool tech applications. As a matter of fact, Just AI has 5 such projects in progress right now.

3. Intelligent Virtual Assistants (IVA)

Amazon Alexa, Google Assistant, Apple Siri, Microsoft Cortana, and others all recognize a user’s intents and run commands. Virtual assistants are built into many different smart devices, the most widespread being smartphones and smart speakers. Voice assistants are the most promising product category. Last year smart speaker shipments reached 78 million units worldwide, 125% more than the year before. That makes the smart speaker the device with the highest year-over-year growth, with a total installed base estimated at 120 million devices. That’s why the conversational AI market holds intriguing possibilities for business too: virtual assistants can take on support automation and become an essential business-customer contact point.

So how does Conversational AI work?

The user-chatbot interaction pattern may be presented like this:

First of all, the user sends a query through one of the available channels: a smartwatch, speaker, phone, toy, etc. Behind each query lies an intent, the user’s wish to get a correct answer, a service, a product, or some content like music or video.

Then additional processing or message format conversion may take place. Dialog platforms work with recognized text, while some channels accept voice only. A spoken dialog system comprises an automatic speech recognizer (ASR), a text-to-speech (TTS) synthesizer, and integration systems. In some cases it may be necessary to identify a person by voice; that’s when biometrics platforms are used. Some channels or assistants support both natural speech and visual interactive elements like buttons or cards that can be tapped. Working with these requires integration with the relevant APIs.
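This conversion step can be pictured as reducing every channel’s input to one internal message format before the dialog platform sees it. A minimal sketch, with class and function names invented for illustration (a real ASR service would supply the transcript):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InboundMessage:
    """Unified internal format that every channel adapter converts to."""
    channel: str
    text: Optional[str]            # None for non-textual inputs like button taps
    payload: Optional[str] = None  # channel-specific data, e.g. button callback

def from_voice(transcript: str) -> InboundMessage:
    # In a real system the transcript would come from an ASR component.
    return InboundMessage(channel="voice", text=transcript)

def from_button(callback_data: str) -> InboundMessage:
    # A button tap can skip NLU entirely: the payload already encodes the intent.
    return InboundMessage(channel="chat", text=None, payload=callback_data)

print(from_voice("play some jazz"))
```

Downstream components then only ever deal with `InboundMessage`, regardless of whether the query arrived as speech, typed text, or a tap.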

The query, converted to text, is then transmitted to the dialog platform. The goal of the platform is to capture the core semantics of the given sequence of words: to get the intent, handle it properly, and give the correct answer or perform the action. To do that, dialog platforms use a set of technological processes such as text normalization, morphological and syntactic analysis, semantic analysis, hypothesis ranking, named entity recognition, and query generation through APIs to external databases and information systems. Examples of such external systems are CRM systems, contact databases, or services like Deezer or Google Play Music. After the data is received, the dialog platform generates the answer: a text or a voice message (by means of TTS). Then it starts content streaming or reports an action done (e.g. an order placed in an e-shop). If the original inquiry doesn’t provide enough data to perform the operation, the NLU platform starts a clarification dialog to collect the missing criteria and clear up any ambiguity.
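The whole round trip, normalization, intent handling, an external service call, and a clarification fallback, can be compressed into a sketch. The `music_api` stand-in is invented here; a real integration would call something like Deezer’s API:

```python
def normalize(text: str) -> str:
    """Tiny text normalization step: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())

def handle_query(text: str, music_api) -> str:
    """Sketch of the platform loop: normalize, detect intent, call a backend."""
    query = normalize(text)
    if "play" in query:  # naive substring intent check, for illustration only
        artist = query.split("play", 1)[1].strip() or None
        if artist is None:
            return "Who would you like to listen to?"  # clarification dialog
        track = music_api(artist)                      # external system call
        return f"Now playing: {track}"
    return "Sorry, I cannot help with that yet."

# A stand-in for a music service integration; invented for the example.
fake_music_api = lambda artist: f"Top hit by {artist.title()}"

print(handle_query("Play  Miles Davis", fake_music_api))
# → Now playing: Top hit by Miles Davis
```

Note how an underspecified query ("play") triggers the clarification question rather than a failed API call; that branch is the clarification dialog described above.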

How does query processing logic in a dialog platform work?

Illustrated with the Just AI Conversational Platform

Since we’re using our platform as an example, it is worth noting that at the top level most platforms have almost the same main features (and here we are talking about business skills, not just “chatting for fun” bots). A common pattern of our platform’s work may be represented like this:

The main loop of processing a client’s request consists of the following actions:

1. The system receives the client’s request in a dialog control module, the DialogManager.

2. The DialogManager fetches the dialog context from the Dialog State database.

3. The client’s query (along with the context) goes to the NLU module, where the user’s intent is identified. For non-textual inputs (buttons, etc.) this step may be skipped.

4. Drawing on the dialog scenario and the extracted data, the DialogManager selects the state (box, screen, dialog page) that fits the client’s utterance best.

5. Business logic (scripts) is executed according to the chatbot scenario.

6. External information systems are called (if the business logic requires them).

7. A text reply is generated using macro replacement and word agreement functions.

8. The context and settings of the dialog are saved in the Dialog State DB for processing further queries.

9. The response is sent to the client.
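The nine steps above can be condensed into a code sketch. Everything here is stubbed for illustration: the NLU function, scenario handlers, and the in-memory dict standing in for the Dialog State DB are all invented, not the platform’s real API:

```python
class DialogManager:
    """Condensed sketch of the nine-step request loop described above."""

    def __init__(self, nlu, scenario, state_db):
        self.nlu = nlu            # callable: (message, context) -> intent name
        self.scenario = scenario  # dict: intent name -> handler function
        self.state_db = state_db  # dict standing in for the Dialog State DB

    def handle(self, client_id: str, message: str) -> str:
        context = self.state_db.get(client_id, {})          # step 2: load context
        intent = self.nlu(message, context)                 # step 3: NLU
        handler = self.scenario.get(intent,
                                    self.scenario["fallback"])  # step 4: select
        reply, context = handler(message, context)          # steps 5-7: scripts
        self.state_db[client_id] = context                  # step 8: save context
        return reply                                        # step 9: respond

# A toy scenario with one intent and a fallback handler.
def greet(message, context):
    context["greeted"] = True
    return "Hello!", context

def fallback(message, context):
    return "Could you rephrase?", context

dm = DialogManager(
    nlu=lambda message, context: "greet" if "hello" in message.lower() else "unknown",
    scenario={"greet": greet, "fallback": fallback},
    state_db={},
)
print(dm.handle("user-1", "Hello there"))  # → Hello!
```

Because the context is persisted per client after every turn (step 8), the next query from the same user starts from where the conversation left off.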

Dialog control, where the overall context is determined, is an important part of the system. Because of it, one and the same phrase may be understood differently depending on who said it and what extra data is available (e.g. the user’s location). In some systems, the DialogManager controls slot-filling: filling the context with the necessary data, which may be drawn from the client’s phrases or previous context, or requested from the customer. In our system these functions are exposed at the scenario level, so they are under the full control of the bot’s developer.
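Slot-filling itself is a small loop: merge whatever the user just provided into the context, then ask for the first required piece still missing. The slot names below belong to a hypothetical weather skill, invented for the example:

```python
# Slots a hypothetical weather skill needs before it can answer.
REQUIRED_SLOTS = ["city", "date"]

def fill_slots(entities: dict, context: dict) -> dict:
    """Merge newly extracted entities into the dialog context."""
    return {**context, **entities}

def next_question(context: dict):
    """Ask for the first required slot that is still missing."""
    for slot in REQUIRED_SLOTS:
        if slot not in context:
            return f"Please tell me the {slot}."
    return None  # all slots filled, the request can be executed

ctx = fill_slots({"city": "Berlin"}, {})
print(next_question(ctx))  # → Please tell me the date.
```

The dialog keeps asking until `next_question` returns `None`, at which point the scenario has everything it needs to call the backend.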

The most complicated stage here is the analysis of the utterance itself. This process is called Natural Language Understanding (NLU), which literally means understanding the meaning of a phrase. In its simplified form, the process of understanding language consists of these big steps:

· text pre-processing

· intent recognition

· named entity recognition
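The first of those steps, text pre-processing, is the least glamorous but runs on every single query. A minimal stdlib version (real platforms add spell-checking, lemmatization, and morphological analysis on top):

```python
import re

def preprocess(text: str) -> list:
    """Minimal pre-processing: lowercase, strip punctuation, tokenize."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation becomes whitespace
    return text.split()

print(preprocess("Hi! What's the weather in Berlin?"))
# → ['hi', 'what', 's', 'the', 'weather', 'in', 'berlin']
```

The resulting token list is what the intent recognizer and the named entity recognizer actually consume.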

And that’s where the significant differences between platforms are hidden. Some use classic machine learning or deep learning techniques, some are fine with regular expressions and formal grammars, and some rely on third-party services.

What should a dialog platform consist of?

A modern, complex dialog platform (Conversational Platform) should include multiple functional and processing modules. In simplified form they may be represented as:

The more integrations a platform has, the less time and effort it takes to build a new skill on it. The availability of a rule-based syntax speeds up chatbot development; besides, certain dialog control tasks are simply not realizable without formal rules. Classifiers and machine learning accelerate the process because they can analyze a great number of log files in a short time. When integrated into one unified system, different development methods can be combined within one project.

Visual skill design tools help speed up skill development, simplify debugging, and visualize the future user-system conversation flow. Dialog platforms also have a few very important, though not really obvious, characteristics: sentiment analysis, rich and deep analytics, special filters (e.g. for expletives), multilanguage support, context keeping, algorithm accuracy, performance, scalability, and stability. These characteristics should be taken into account when creating a smart chatbot.

The voice era is now

According to a Capgemini study, 40% of consumers will choose a voice assistant over a mobile app or website; large companies and SMEs striving to acquire new customers are investing intensely in conversational AI. From supply chain to customer support, conversational AI is emerging as the new go-to technology for enhancing efficiency, productivity, and customer experience.

Developers embrace new platforms and services to build, train and host AI-powered chatbots and skills for smart assistants. Thanks to well-designed tools the process of voice assistant development is quite simple; some of these platforms don’t even require coding skills. And the more people involved in technology development, the faster and more intriguing it will evolve. That means pretty soon there will be even more interesting skills and ways to interact with the digital world, making 2019 the year of opportunity, challenge, and change.
