How To Build Simple AI Assistant With DeepPavlov Dream

Published in DeepPavlov, Sep 3, 2020 (9 min read)

Authors: Vasily Konovalov, Darya Moroz, Daniel Kornev

You have probably already heard about our open-source NLP framework, the DeepPavlov Library. It contains a set of essential NLP components for building basic dialogue systems. Several months ago, together with one of our customers, Intersvyaz, we published a blog post describing how to develop such a basic dialogue system using the DeepPavlov Library. However, when you need finer control over the dialogue flow, you want a system specifically tailored for building advanced dialogue systems comparable to Amazon Alexa or Google Assistant.

Fortunately, designing a platform for these kinds of dialogue systems has been our goal at DeepPavlov. We call this platform DP DREAM. It is an AI Assistant accompanied by a number of developer tools, and it is built on top of our conversational multiskill framework called DP Agent. While it shares DeepPavlov’s modular, configurable approach to NLP pipelines, it is created specifically for dialogue systems. DP Agent introduces a special conversational skill orchestrator that controls the entire dialogue with the user. In this article, we will describe basic and advanced dialogue systems, show how to begin building an advanced dialogue system using DP Agent, and explain how it all relates to the Alexa Prize.

Overview

Dialogue systems have recently become a standard in human-machine interaction, with chatbots appearing in almost every industry to simplify the interaction between people and computers. They can be integrated into websites, messaging platforms, and devices. Chatbots are on the rise, and companies are choosing to delegate routine tasks to chatbots rather than humans, thus providing huge labor cost savings. Unlike humans, chatbots are capable of processing multiple user requests at a time and are always available.

Chatbots can be placed in one of three categories: goal-oriented, chit-chat (open-domain), and mixed chatbots.

Goal-oriented chatbots behave like a natural language interface for function calls, where the chatbot asks for and confirms all required parameter values, and then executes a function. A chatbot on an e-commerce website that can help you to order a product is a typical example of a goal-oriented bot. By using them you can ask about the weather for a specific location, or create a new calendar entry. Purely goal-oriented single-domain chatbots can be considered relatively basic.
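The idea of a "natural language interface for function calls" can be sketched as a small slot-filling loop. The example below is only an illustration: the function `create_calendar_entry`, the slot names, and the helper `next_action` are hypothetical and are not part of DeepPavlov.

```python
from typing import Dict, List

# Hypothetical target function that the bot calls once all parameters are known.
def create_calendar_entry(title: str, date: str, time: str) -> str:
    return f"Created '{title}' on {date} at {time}"

# Parameters the bot must collect before it may execute the function.
REQUIRED_SLOTS: List[str] = ["title", "date", "time"]

def next_action(slots: Dict[str, str], confirmed: bool) -> str:
    """Decide the next bot move from the slots collected so far."""
    for name in REQUIRED_SLOTS:
        if name not in slots:
            return f"ask:{name}"   # a required parameter is still missing
    if not confirmed:
        return "confirm"           # everything is known, ask the user to confirm
    return "execute"

def run_dialogue(user_answers: Dict[str, str]) -> str:
    """Simulate a dialogue in which the user answers every question and confirms."""
    slots: Dict[str, str] = {}
    confirmed = False
    while True:
        action = next_action(slots, confirmed)
        if action.startswith("ask:"):
            name = action.split(":", 1)[1]
            slots[name] = user_answers[name]   # the user supplies the value
        elif action == "confirm":
            confirmed = True                   # assume the user says "yes"
        else:
            return create_calendar_entry(**slots)
```

Calling `run_dialogue({"title": "Lunch", "date": "Friday", "time": "12:00"})` walks through three questions and one confirmation before the function is executed.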

Chit-chat and mixed chatbots, however, can be seen as more advanced systems. They require conversational support for multiple domains in the same dialogue, which is a challenging problem.

Pure chit-chat or open-domain bots engage users in a conversation about popular topics such as entertainment, sports, politics, technology, and fashion. This can involve throwing in random trivia, puns, or even memes. Their conversation usually has no goal other than maintaining the conversation itself. A good example of a chit-chat bot is the classic ELIZA chatbot.

However, the most advanced chatbots are hybrid: they solve particular user problems through their goal-oriented components and also support open-ended conversation by adding elements of chit-chat bots. Great examples include Google Assistant, Amazon Alexa, Microsoft Cortana, XiaoIce, and others.

Designing these mixed chatbots is a challenging problem. Not only do they require support for multiple classes of intents that cover different user needs, but they also have to remember some of the user’s information and be able to discuss it, talk about world news and information from encyclopedias and other sources, react emotionally to users’ utterances, and so on.

The balance between supporting open-domain conversations and goal-oriented scenarios defines the purpose of the chatbot. On the one hand, Google Assistant and Amazon Alexa are mostly focused on helping users with practical activities, adding just a relatively small number of chit-chat components. XiaoIce, on the other hand, is more concerned with the user’s well-being, so its capabilities are shifted towards open-domain conversations rather than being purely practical like its counterparts from Google and Amazon.

ChatBot’s High-Level Architecture

Regardless of their complexity, all chatbots share a similar high-level architecture. First, a chatbot needs to understand utterances in natural language. The Natural Language Understanding (NLU) module translates a user query from natural language into a labeled semantic representation. For example, the utterance “What is the weather in Seattle” will be translated into a machine-understandable form like weather(Seattle). Then the bot has to decide what is expected of it. The Dialogue Manager (DM) keeps track of the dialogue state and decides what to answer to the user. At the last stage, the Natural Language Generator (NLG) translates a semantic representation back into human language. For example, rent_price(Seattle)=3000 USD translates to “The average rent price in Seattle is around $3,000.” The picture below shows a typical high-level dialogue system architecture.
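The three stages can be illustrated with a deliberately tiny Python sketch. Everything here is an assumption made for illustration: the regular expression stands in for a real NLU model, `FAKE_FORECASTS` stands in for a real weather service, and none of the function names come from DeepPavlov.

```python
import re
from typing import Dict, List, Optional, Tuple

Frame = Tuple[str, str]  # (intent, argument), e.g. ("weather", "Seattle")

# Hypothetical lookup table standing in for a real weather service.
FAKE_FORECASTS: Dict[str, str] = {"Seattle": "rainy"}

def nlu(utterance: str) -> Optional[Frame]:
    """Toy NLU: turn text into a semantic frame such as weather(Seattle)."""
    match = re.search(r"weather in ([A-Za-z ]+)", utterance, re.IGNORECASE)
    if match:
        return ("weather", match.group(1).strip())
    return None

def dialogue_manager(frame: Optional[Frame], state: List[Optional[Frame]]) -> Dict[str, str]:
    """Toy DM: record the turn in the dialogue state and pick a system action."""
    state.append(frame)
    if frame is not None and frame[0] == "weather":
        forecast = FAKE_FORECASTS.get(frame[1])
        if forecast is not None:
            return {"act": "inform_weather", "city": frame[1], "forecast": forecast}
    return {"act": "fallback"}

def nlg(action: Dict[str, str]) -> str:
    """Toy NLG: render the chosen action as a sentence using templates."""
    if action["act"] == "inform_weather":
        return f"The weather in {action['city']} is {action['forecast']}."
    return "Sorry, I did not understand that."

def respond(utterance: str, state: List[Optional[Frame]]) -> str:
    return nlg(dialogue_manager(nlu(utterance), state))
```

With an empty list as the state, `respond("What is the weather in Seattle", state)` passes through all three stages and leaves one frame in the state for later turns.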

Figure 1. High-Level Dialogue System Architecture

DeepPavlov Dream: AI Assistant Platform

Modern virtual assistants such as Amazon Alexa and Google Assistant integrate and orchestrate different conversational skills to address a wide spectrum of user tasks. What if your chatbot should integrate different skills, for example, retrieving the weather forecast for a specific region and getting the average rent prices? This is where our new project, DeepPavlov Dream, comes into the picture. DeepPavlov Dream is a platform for the development of scalable and production-ready multi-skill virtual assistants.

The key features of the architecture are scalability and reliability in a high load environment. Also, ease of adding and orchestrating conversational skills enables us to test different configurations of the dialogue system. The shared memory allows all the components to share a dialogue state.

DeepPavlov Dream combines the following types of services:

Annotator is a preprocessing service in a pipeline. Depending on the task, it can perform simple text preprocessing, such as sentence segmentation, or more complex analysis, such as named-entity recognition or sentiment classification.
Skill is a component in a pipeline that is responsible for a particular piece of logic, for example, retrieving a weather forecast, news, or prices, or responding to a question. Skills can be implemented in any programming language.
Skill Selector is a component that selects a subset of the skills for producing candidate responses. By default, it uses all available skills.
Response Selector is a component that selects the best response among all produced responses. By default, the response with the highest score is selected.
Postprocessor is the final component in the pipeline, and it’s responsible for postprocessing the selected response. For example, it can add emojis, personal names, etc.
Dialogue State preserves the state of the dialogue and all past dialogue turns between the user and the conversational agent. The state allows stored information to be shared across the services.
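The way these services cooperate during one dialogue turn can be sketched in plain Python. This is not the DP Agent API: the two skills, their confidence values, and all function names are hypothetical, and only the default behaviors described above (use all skills, pick the highest score) are modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Candidate = Tuple[str, float]          # (response text, confidence score)
Skill = Callable[[str], Candidate]

@dataclass
class DialogueState:
    """Keeps every past turn so that all services could read it."""
    turns: List[Dict[str, str]] = field(default_factory=list)

def weather_skill(utterance: str) -> Candidate:
    if "weather" in utterance.lower():
        return ("It is rainy in Seattle.", 0.9)
    return ("", 0.0)

def chitchat_skill(utterance: str) -> Candidate:
    return ("Tell me more!", 0.5)

SKILLS: Dict[str, Skill] = {"weather": weather_skill, "chitchat": chitchat_skill}

def select_skills(utterance: str, skills: Dict[str, Skill]) -> List[str]:
    return list(skills)                # default behavior: use every skill

def select_response(candidates: Dict[str, Candidate]) -> Tuple[str, str]:
    # Default behavior: the candidate with the highest score wins.
    name = max(candidates, key=lambda n: candidates[n][1])
    return name, candidates[name][0]

def postprocess(text: str) -> str:
    return text + " :)"                # e.g. decorate the final response

def handle_turn(utterance: str, state: DialogueState) -> str:
    names = select_skills(utterance, SKILLS)
    candidates = {name: SKILLS[name](utterance) for name in names}
    active_skill, text = select_response(candidates)
    final = postprocess(text)
    state.turns.append({"user": utterance, "bot": final, "active_skill": active_skill})
    return final
```

Swapping `select_skills` or `select_response` for a smarter implementation changes the assistant’s behavior without touching the skills themselves, which is the point of keeping these roles separate.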

The components of the DeepPavlov DREAM AI Assistant are depicted in Figure 2.

Figure 2. The components of the DeepPavlov DREAM AI Assistant.

Let’s get back to our example of asking about the weather. The user input is the utterance “What is the weather in Seattle”. First, the utterance goes through the annotators. In our case, it’s reasonable to apply DeepPavlov NER to identify the token Seattle as a named entity (of course, you can use a custom NER component). Then, based on the annotated user input, the Skill Selector selects the skills that should be executed. The weather skill fetches the weather forecast from the server based on the place and time entities. Next, the candidate annotator annotates the skill responses (for example, to filter out toxic comments and other undesirable responses that may appear when responses are fetched from social data). The Response Selector then selects the best response, usually based on model confidence. Finally, the response annotator composes the final output.
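The role of the annotators in this flow can be illustrated with a short sketch. It makes strong simplifying assumptions: a tiny gazetteer (`KNOWN_PLACES`) stands in for DeepPavlov NER, a word blocklist (`BLOCKLIST`) stands in for a toxicity classifier, and the function names are hypothetical rather than taken from DP Agent.

```python
from typing import Dict, List

KNOWN_PLACES = {"seattle", "moscow"}   # toy gazetteer standing in for a NER model
BLOCKLIST = {"stupid", "idiot"}        # toy stand-in for a toxicity classifier

def annotate_user_utterance(text: str) -> Dict:
    """User-input annotator: attach place entities to the utterance."""
    tokens = [token.strip("?.!,") for token in text.split()]
    places = [token for token in tokens if token.lower() in KNOWN_PLACES]
    return {"text": text, "entities": {"place": places}}

def annotate_candidate(candidate: Dict) -> Dict:
    """Candidate annotator: flag a skill response that contains blocked words."""
    words = {word.strip("?.!,").lower() for word in candidate["text"].split()}
    return {**candidate, "toxic": bool(words & BLOCKLIST)}

def select_response(candidates: List[Dict]) -> str:
    """Pick the most confident candidate among those not flagged as toxic."""
    safe = [c for c in candidates if not c["toxic"]]
    if not safe:
        return "Sorry, I have nothing to say about that."
    return max(safe, key=lambda c: c["confidence"])["text"]

annotated = annotate_user_utterance("What is the weather in Seattle")
raw_candidates = [
    {"text": "It is rainy in Seattle.", "confidence": 0.8},
    {"text": "That is a stupid question.", "confidence": 0.95},
]
best = select_response([annotate_candidate(c) for c in raw_candidates])
```

Here the second candidate has the higher confidence but is flagged by the candidate annotator, so the weather answer is selected instead.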

As a reminder, DP Dream is powered by DP Agent and the DP Library. More technical details can be found in the DP Library and DP Agent docs.

Deploy Simple Skill

Now you know enough theory to deploy your first simple AI Assistant derived from the bigger DP Dream AI Assistant. As a demo skill, we will use the Valentine’s Day skill that was adapted from it. To begin, clone the repo with the demo skill.

git clone https://www.github.com/deepmipt/dp-dream-demos.git
cd dp-dream-demos

Run docker-compose

docker-compose -f docker-compose.yml up --build

Now everything is up and ready to receive requests.

curl --location --request POST 'localhost:4242' --header 'Content-Type: application/json' --data-raw '{"user_id": "name", "payload": "what is love"}'

{"dialog_id": "0b552baaef9360619ca9140301354e09", "utt_id": "dd54d34244e9f568f362277bec263334", "user_id": "name", "response": "Baby don't hurt me. Don't hurt me. No more", "active_skill": "", "debug_output": []}

curl --location --request POST 'localhost:4242' --header 'Content-Type: application/json' --data-raw '{"user_id": "name", "payload": "who do you love"}'

{"dialog_id": "0b552baaef9360619ca9140301354e09", "utt_id": "a3b8484d2166bfb934dfb7f3499373c4", "user_id": "name", "response": "This is a big secret, I can say that he is cute man.", "active_skill": "", "debug_output": []}
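If you prefer to talk to the running demo from code rather than curl, the same request can be sent with the Python standard library. This is a minimal sketch: the port and the `user_id`/`payload`/`response` field names are copied from the curl calls above, while the helper names `build_request` and `ask` are our own and not part of DP Agent.

```python
import json
import urllib.request

AGENT_URL = "http://localhost:4242"  # same host and port as in the curl calls

def build_request(user_id: str, text: str, url: str = AGENT_URL) -> urllib.request.Request:
    """Build the same JSON POST request that the curl command sends."""
    body = json.dumps({"user_id": user_id, "payload": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(user_id: str, text: str, url: str = AGENT_URL) -> dict:
    """Send one utterance to the running assistant and return the parsed reply."""
    request = build_request(user_id, text, url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))

# Example usage (requires the docker-compose services to be running):
# reply = ask("name", "what is love")
# print(reply["response"])
```

Keeping the same `user_id` across calls matters: in the sample output above, both replies share one `dialog_id`, which suggests that the assistant groups turns by user.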

The diagram of the demo system is shown in Figure 4. Here the Skill Selector by default passes the input to all defined skills, the Response Selector by default picks the response with the highest score, and there are no annotators.

Figure 4. The Diagram Of The Simple AI Assistant Demo System

The next step will be to incorporate this installation into your product and to continue building skills and individual components to enhance the product’s quality. In the coming months, we will continue sharing and open-sourcing more and more of our DeepPavlov Dream AI Assistant skills and components with you. Stay tuned!

DREAM Team on Alexa Prize Socialbot Challenge 3

For the last three years, Amazon has been testing the capabilities of the voice platform known as Alexa. The Alexa Prize Socialbot Grand Challenge is a competition for student teams dedicated to the development of conversational intelligence.

The Alexa Prize Socialbot Grand Challenge is part of Amazon’s mission to make the voice assistant smarter and more talkative so that it becomes more useful and exciting for users. But Amazon is not the only company in this business: Google, Apple, and Samsung are also working on their assistants.

In the challenge, the team had to deal with:

Infrastructure tasks: to work, the bot has to be deployed and tested somewhere, and it is also necessary to load and run programs and models.
Research tasks: the most creative part of the whole process, where it is necessary to come up with new models that will be integrated into the dialogue with the user. The full cycle ranges from finding a problem and formulating it as a research problem, through collecting data or searching for existing datasets, to creating a baseline model or improving existing models and comparing them by metrics.
Business tasks: work aimed at keeping the user interested during a conversation.
Data collection tasks: parsing news sites for the most interesting and popular news, and collecting movies together with their ratings and reviews.

Students from DeepPavlov participated in the competition as the DREAM Socialbot team. The team used DeepPavlov Agent, which makes it possible to design a multiskill socialbot. Participation in the competition required that the DREAM Socialbot not disappoint users, say stupid or nonsensical things, or insult the user. The bot also had to refrain from fat jokes, hate speech, and homophobic content. Instead, it had to be mindful and respectful when handling controversial topics and language.

This competition helps to introduce a new generation of computer scientists and engineers to the development of complex dialogue systems. In addition, we developed DeepPavlov Agent with the competition requirements in mind, aiming for the highest standards of conversational AI.

Conclusion

In this article, we described what a dialogue system is and how you can develop a simple AI Assistant using the DeepPavlov Dream platform. You can find out more about DeepPavlov Dream in the announcement blog post and on its page on our site; you can learn more about the underlying DeepPavlov Agent in the project docs and on our site. The details of our participation in the Alexa Prize 2019 can be found here. And do not hesitate to ask any questions concerning the framework in our forum.