The King is Dead — Long Live The King, or welcome to Deepy 3000!

Daniel Kornev
Dec 8, 2020

Authors: Daniel Kornev, Fedor Ignatov, Alexander Dmitrievsky, Oleg Serikov, Dilyara Baymurzina, Mikhail Burtsev

Winter is here. We are writing this post with both sadness and excitement. At the beginning of September we launched a public demo of our AI Assistant called Dream, built on top of our original Alexa Prize 2019 DREAM socialbot. Today, we’re sad because we had to pull the plug and disable this public demo. The king is dead.

Yes, the king is dead, and for a good reason: we are participating in Alexa Prize (again!).

While the bot we’re building for the contest will eventually evolve a lot, right now it is based on the one we had running as a public demo. We simply can’t keep the same bot running both publicly and as an anonymous bot in the Alexa Prize 2020 challenge, even if the two will later differ a lot. However, this doesn’t stop us from making Dream 1.0 available to you; it only accelerates our plans for open-sourcing it. Although we can’t use Dream 1.0 as is (it has a specific and somewhat recognizable personality), we can and will continue shipping its common components.

As we planned earlier, some general-purpose components such as annotators will be included in the DeepPavlov Library, with Intent Catcher being the first, hopefully coming in the next release. Skills will be published in an open source repository on our GitHub, and you’ll be able to clone it, pick the config and the skills you need, and run them in your environment. For this repository, we will use our trusty Deepy 3000 as a simple Multiskill AI Assistant. This also means that Deepy 3000, a Moonbase A.I. Assistant originally featured at NVIDIA GTC Fall 2020, becomes our public demo!

Long live the king! Check out the trailer below and play with the demo right now!

Deepy 3000 Demo: Build Your Own Moonbase A.I. Assistant with DeepPavlov Dream!

What Is Deepy 3000?

Deepy 3000, or simply Deepy, is currently a very simple multiskill AI Assistant demo with just two skills (a goal-oriented one, written using our Go-Bot framework, and a chit-chat one, written using AIML) and a few annotators.

Scenarios

Deepy 3000 was designed to imitate Gerty 3000, the Moonbase AI assistant from the movie Moon. Why? We wanted to dream up something likeable, and among modern sci-fi movies Gerty is one of the most lovable AIs.

Gerty 3000 as seen in the “Moon Movie” by Duncan Jones. © Sony Classics, 2009.

So we took this idea to Duncan Jones, the creator of Moon. To our surprise, he gave us his blessing:

Now, unlike Gerty, our Deepy doesn’t have a physical body and only lives in the cloud (and it can run on your PC, too!). But like Gerty, Deepy can help with solving tasks and having simple conversations.

Of course, for the purposes of this demo the number of tasks and conversations has been significantly reduced. After all, although we’d love to build a Moonbase AI assistant, we don’t have a Moonbase; at least just yet!

Harvesters Maintenance Skill

This is a goal-oriented skill. We have two versions of it: the first is a handcrafted skill with simple intent detection and slot filling, and the second is implemented using our Go-Bot framework.

Below is a list of some top intents we have defined for the Harvesters Maintenance Skill:

  • Tell the status of harvesters (illustrates search for data in the database),
  • Tell the status of a given harvester (illustrates slot filling),
  • Prepare rover for a trip (illustrates a call to an action in the external system),
  • and others.
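
To give a feel for what "simple intent detection and slot filling" can look like, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the code of the actual skill: the keyword rules, the `HARVESTERS` table, the intent names, and the response texts are all hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical in-memory "database" of harvester statuses.
HARVESTERS = {1: "working", 2: "stalled", 3: "broken"}


def detect_intent(utterance: str) -> Tuple[str, Optional[int]]:
    """Return (intent, harvester_id) using simple keyword rules."""
    text = utterance.lower()
    # Slot filling: look for a harvester number such as "harvester 2".
    match = re.search(r"harvester\s+(\d+)", text)
    if match and "status" in text:
        return "harvester_status", int(match.group(1))
    if "status" in text and "harvesters" in text:
        return "all_harvesters_status", None
    if "prepare" in text and "rover" in text:
        return "prepare_rover", None
    return "unknown", None


def respond(utterance: str) -> str:
    """Map the detected intent (and slot) to a canned response."""
    intent, harvester_id = detect_intent(utterance)
    if intent == "all_harvesters_status":
        # Illustrates a lookup over the whole "database".
        return ", ".join(
            f"harvester {i} is {status}" for i, status in sorted(HARVESTERS.items())
        )
    if intent == "harvester_status":
        status = HARVESTERS.get(harvester_id)
        if status is None:
            return f"I don't know harvester {harvester_id}."
        return f"Harvester {harvester_id} is {status}."
    if intent == "prepare_rover":
        # A real skill would call an external system here.
        return "Preparing the rover for a trip."
    return "Sorry, I can't help with that."
```

For example, `respond("What is the status of harvester 2?")` fills the harvester slot with `2` and answers from the table, while an utterance that matches no rule falls through to the fallback message.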

Chit-Chat Skill

For the Chit-Chat Skill, we decided to use AIML. While our DREAM socialbot was designed to support complex chit-chat, with factoid lookups and so on, a very simple chit-chat solution can be created using AIML. Check out Mitsuku as an example of a considerably more complex chatbot made using AIML.

In our Chit-Chat Skill, we wanted to support a number of scenario-driven conversations, discussion of the bot’s profile, and a few other things:

  • Where am I? (scenario-driven conversation),
  • What do I do here? (scenario-driven conversation),
  • Who are you? (bot profile),
  • Who made you? (bot profile),
  • and others.
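
To show the idea behind AIML categories, here is a toy sketch that loads two categories and answers by exact pattern match. Everything in it is an assumption made for illustration: the AIML snippet and its answers are hypothetical, and the matcher ignores wildcards, `<srai>`, and the other features a real AIML interpreter provides.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical AIML categories; the answers are made up for this example.
AIML = """
<aiml version="2.0">
  <category>
    <pattern>WHO ARE YOU</pattern>
    <template>I am Deepy 3000, a Moonbase A.I. assistant.</template>
  </category>
  <category>
    <pattern>WHERE AM I</pattern>
    <template>You are on the Moonbase.</template>
  </category>
</aiml>
"""


def load_categories(aiml_text: str) -> Dict[str, str]:
    """Parse AIML and return a {pattern: template} dictionary."""
    root = ET.fromstring(aiml_text.strip())
    return {
        category.findtext("pattern").strip(): category.findtext("template").strip()
        for category in root.iter("category")
    }


def normalize(utterance: str) -> str:
    """Upper-case the utterance and drop punctuation, as AIML patterns expect."""
    return re.sub(r"[^A-Z0-9 ]", "", utterance.upper()).strip()


def reply(categories: Dict[str, str], utterance: str) -> Optional[str]:
    """Return the template of the exactly matching pattern, or None."""
    return categories.get(normalize(utterance))
```

With these categories loaded, `reply(categories, "Who are you?")` returns the bot-profile answer, and an utterance with no matching pattern returns `None`, which a skill could report as "no candidate response".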

Emotions

The original Gerty 3000 is well known for its unemotional voice and a screen that shows Gerty’s emotions.

For Deepy 3000, we wanted to imitate this emotional connection between the assistant and its user. We used all emotional reactions provided by Gerty 3000’s original designer, Mr. Gavin Rothery, from his website:

Gerty 3000: Emotions

Initially we wanted to use our Emotion Classification annotator from DREAM socialbot, but while it’s useful for chit-chat messages, it’s useless when neutral answers are provided by the goal-oriented skill.

We ended up developing a somewhat more sophisticated solution:

Assistant Level

  • Some key phrases in the Chit-Chat Skill were additionally annotated with emojis; these emojis are then used to change Deepy’s face,
  • Emotion Classification annotator’s data (example):
{
  "anger": 0.46746790409088135,
  "fear": 0.3528013229370117,
  "joy": 0.3129902184009552,
  "love": 0.2804321050643921,
  "sadness": 0.35413244366645813,
  "surprise": 0.19576209783554077,
  "neutral": 0.9979490041732788
}

UI Demo Level

  • Some key responses from the Harvesters Maintenance Skill were post-annotated in the UI demo app we have built for the demo,
  • The final decision on which emotion to show is made at the UI level, just before the output is shown.
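
The two levels above can be combined roughly as follows. This is a minimal sketch, assuming a priority order (UI post-annotation first, then emoji, then the classifier) and a `threshold` value that are our illustration rather than the exact logic of the demo app; the function and parameter names are hypothetical.

```python
from typing import Dict, Optional

NEUTRAL = "neutral"


def pick_emotion(
    scores: Dict[str, float],
    emoji_emotion: Optional[str] = None,
    ui_override: Optional[str] = None,
    threshold: float = 0.5,
) -> str:
    """Choose the emotion to display on the assistant's face."""
    # 1. A UI-level post-annotation of a goal-oriented response wins.
    if ui_override:
        return ui_override
    # 2. An emoji attached to a chit-chat phrase comes next.
    if emoji_emotion:
        return emoji_emotion
    # 3. Otherwise fall back to the top-scoring classifier label,
    #    showing a neutral face when the classifier is not confident.
    label, score = max(scores.items(), key=lambda item: item[1])
    return label if score >= threshold else NEUTRAL
```

With the example scores shown earlier, the classifier alone yields `neutral`, which is exactly why the neutral answers of the goal-oriented skill need an override to get an emotional reaction.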

Note: in one of the upcoming versions of Deepy 3000, we will add a custom Candidate Annotator that will properly use these annotations to provide a coherent response to the user without any additional “hacks” on the UI Layer.

ASR & TTS

For the purposes of our demo we wanted to show that you can use our system with the built-in ASR & TTS NeMo library coming from our partner, NVIDIA.

To further imitate Gerty 3000, we have taken a few voice recordings from the movie and applied them in our adaptation of the “Real-Time Voice Cloning” project built by Corentin Jemine.

To further enhance the output of the NeMo ASR module, we have added a spell checking preprocessing module taken from our DeepPavlov DREAM Socialbot.

Deepy 3000 Architecture

The architecture of Deepy 3000 is very similar to that of the original DREAM Socialbot: we use the same DeepPavlov Agent as the mechanism for Conversation Orchestration, and we use the same pipeline. However, as mentioned above, the number of components has been significantly reduced to illustrate the idea and to make the system available to you.

To build your own multi-skill AI assistant with DeepPavlov Conversational AI technology stack, you need to define what kind of a system you want to build:

  • goal-oriented (multiple goal-oriented skills),
  • chit-chat (multiple chit-chat skills),
  • hybrid (mix of different goal-oriented and chit-chat skills).

Note: you can easily build a working system using just one custom skill. We illustrated it in our first demo of DeepPavlov Dream technology in the blogpost “How To Build Simple AI Assistant With DeepPavlov Dream”. While it won’t be a multi-skill one, it can still be beneficial to you if you want to use DeepPavlov Agent’s pipeline to properly augment user’s input and skill’s candidate responses using our annotators, and use your own approach to filter out candidate responses based on the candidate annotations.

Skills

In the case of Deepy 3000, we wanted to illustrate a simple hybrid multiskill AI assistant by adding both goal-oriented and chit-chat skills to the system:

  • Harvesters Maintenance Skill,
  • Chit-Chat Skill (based on AIML).

We also added a last chance service and a timeout service (not shown in the picture below) to keep the system responsive in case both skills fail to provide a candidate response.
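
The idea behind these two fallbacks can be sketched with plain `asyncio`. This is only an illustration of the concept: in Deepy these are services built into DeepPavlov Agent, while the `collect_candidates` helper, the `LAST_CHANCE` text, and the skill signature below are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple

Candidate = Tuple[str, float]  # (response text, confidence)
Skill = Callable[[str], Awaitable[Candidate]]

# Hypothetical fallback used when no skill produced a candidate.
LAST_CHANCE: Candidate = ("Sorry, something went wrong. Could you repeat that?", 0.0)


async def collect_candidates(
    skills: List[Skill], utterance: str, timeout: float = 1.0
) -> List[Candidate]:
    """Call all skills concurrently; never return an empty candidate list."""

    async def call(skill: Skill) -> Optional[Candidate]:
        try:
            # Timeout: do not wait for a slow skill longer than `timeout` seconds.
            return await asyncio.wait_for(skill(utterance), timeout)
        except Exception:  # a timeout or any skill failure
            return None

    results = await asyncio.gather(*(call(skill) for skill in skills))
    candidates = [result for result in results if result is not None]
    # Last chance: if every skill failed or timed out, answer anyway.
    return candidates or [LAST_CHANCE]
```

A skill that crashes or exceeds the timeout simply contributes nothing, and the user still gets the last chance reply instead of silence.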

Slide taken from Daniel’s Conversations AI 2020 talk

For comparison, here’s the architecture of the original DREAM socialbot we built for the Alexa Prize 3:

Slide taken from Daniel’s Conversations AI 2020 talk

Note: As you can see, in DREAM socialbot (technical report) we had about 25 skills of different types and about 10 annotators. We annotated both the user input and the candidate responses coming from the skills, and we used a custom skill selector to pick the best skills for execution as well as a custom response selector to pick the best response. We also used response annotators to further improve the output before returning the response to the user (via our Alexa skill or Telegram chatbot).

After defining which skills you want, you need to figure out which annotations you need.

Note: As you can see in the original DREAM Socialbot’s architecture, we used a large number of annotators across the entire system. To detect high-level intents we built our own Intent Catcher (that we will soon release in our DeepPavlov Library); to obtain mentioned entities we used our DeepPavlov NER component; etc. Given that we used some retrieval chit-chat skills, we also employed annotators that identify hate speech and toxicity in the response candidates.

For Deepy 3000 we needed just a handful of annotators. We used a Spell Checking Preprocessing Annotator from DREAM socialbot to improve the incoming speech or text from the user. As said above, we also picked the Emotion Classification annotator from DREAM socialbot to give emotional reactions based on the bot’s responses.

Deepy 3000: Docker Config

Once we’ve got all of the components we needed, we outlined their list in the Docker Config:

Slide taken from Daniel’s Conversations AI 2020 talk

Above is a high-level perspective. In your Docker Config (`docker-compose.yml`) you should list all of the services you need; they include components of your Multiskill AI Assistant (skills, selectors, annotators), DeepPavlov Agent itself (with dependencies), and other supporting services.

In the case of Deepy 3000, we added:

  • 2 skills (harvesters_maintenance_skill as a goal-oriented skill, and program_y as a chit-chat skill),
  • 2 annotators (emotion_classification and spell_checking),
  • custom rule-based response selector,
  • DeepPavlov Agent (and its MongoDB as a dependency),
  • NeMo ASR,
  • NeMo TTS,
  • Custom Voice Cloning TTS (not shown on the picture above).

Here’s a link to the complete Deepy 3000 docker-compose.yml file with the handcrafted goal-oriented skill.

Here’s a link to the complete Deepy 3000 docker-compose.yml file with the Go-Bot-based goal-oriented skill.
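
The custom rule-based response selector listed above does not need to be complicated. Below is a minimal sketch of what such a selector could look like; the two rules (drop empty candidates, prefer a confident goal-oriented answer) and the `GOAL_SKILL` and `GOAL_THRESHOLD` values are assumptions for illustration, not the rules used in Deepy.

```python
from typing import List, Tuple

Candidate = Tuple[str, str, float]  # (skill name, response text, confidence)

GOAL_SKILL = "harvesters_maintenance_skill"
GOAL_THRESHOLD = 0.8  # hypothetical confidence cut-off


def select_response(candidates: List[Candidate]) -> Candidate:
    """Pick one candidate response using a couple of hand-written rules."""
    # Rule 1: ignore candidates with an empty response text.
    usable = [c for c in candidates if c[1].strip()]
    if not usable:
        raise ValueError("no usable candidate responses")
    # Rule 2: a confident goal-oriented answer beats chit-chat.
    for candidate in usable:
        if candidate[0] == GOAL_SKILL and candidate[2] >= GOAL_THRESHOLD:
            return candidate
    # Rule 3: otherwise take the most confident candidate.
    return max(usable, key=lambda c: c[2])
```

Keeping the selector rule-based makes its decisions easy to explain and debug, which is handy for a two-skill demo.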

Deepy 3000: DeepPavlov Agent Pipeline

Now that we added all of the components to our docker-compose.yml, it’s time to add the core ones (skills, annotators, response selector) to the DeepPavlov Agent’s Pipeline.

Logically, its pipeline can be seen as built from left-to-right (as shown in the Architecture slides in the previous sections) or from top-to-bottom (as shown in the DeepPavlov Agent’s docs):

DeepPavlov Agent Architecture

Agent’s pipeline_conf.json is a representation of this architecture; it includes all of the services, marks the dependencies between them, specifies how each of these components can be accessed by the system, and defines what is coming in and what is coming out.

Here’s an excerpt from the slides showcasing Deepy 3000’s Agent Pipeline:

Slide taken from Daniel’s Conversations AI 2020 talk

Connectors: Basics

You can add a component to DeepPavlov Agent either as Python code or as an HTTP endpoint. For that, you’ll need to learn a few things about Agent’s so-called Connectors:

Definition: Connector represents a function, where tasks are sent in order to process. Can be implementation of some data transfer protocol or model implemented in python. Since agent is based on asynchronous execution, and can be slowed down by blocking synchronous parts, it is strongly advised to implement computational heavy services separate from agent, and use some protocols (like http) for data transfer.

By default, DeepPavlov Agent supports two types of Connectors: HTTP and Python. However, given it’s an open source system, you can always extend it by adding support for other types like Sockets.

While sometimes it is useful to use Python directly (especially for debugging purposes), we strongly recommend running components as independent HTTP services. This will make your solution more robust and easily scalable.

In our case, we used HTTP connectors for all of the services except the Last Chance and Timeout services, which are built in.

Note: You can learn more about all types of supported services in the DeepPavlov Agent’s docs here: Services HTTP API.

Components: Working With Dialog State

Each such component receives a Dialog State (or a part of it) and changes it as the result of its execution. For example, a skill can write its list of hypotheses, and a user input annotator can write its annotations (e.g., extracted entities or sentiment analysis results).

Note: Each component can obtain the entire Dialog State, or just a piece of it (e.g., the user’s current utterance). It is useful to get the entire Dialog State in a skill, while it might be more practical to send only the user’s current utterance to an annotator. Ultimately, the decision depends on what the given component actually needs. For example, a custom skill might need to know about the user’s and/or the bot’s profile, past conversation history, etc. In this case, having access to the entire Dialog State makes sense. In contrast, a custom annotator that extracts mentioned entities using a NER model only needs the user’s current utterance, so it makes sense to write a custom Dialog State formatter that provides only that utterance to the given annotator.
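
To make the difference concrete, here is a minimal sketch of two Dialog State formatters. The function names, the `"sentences"` key, and the `"text"` field of an utterance are assumptions made for this example; only the `human_utterances` list and the `dialogs` payload key are taken from the skill code shown later in this post.

```python
from typing import Any, Dict

Dialog = Dict[str, Any]


def full_dialog_formatter(dialog: Dialog) -> Dict[str, Any]:
    """For skills: pass the entire Dialog State."""
    return {"dialogs": [dialog]}


def last_utterance_formatter(dialog: Dialog) -> Dict[str, Any]:
    """For annotators such as NER: pass only the user's current utterance."""
    last_utterance = dialog["human_utterances"][-1]
    # "text" is assumed to hold the raw utterance string.
    return {"sentences": [last_utterance["text"]]}
```

The annotator then receives a small payload with a single sentence, while the skill still sees the whole conversation, including profiles and past annotations.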

Components: Putting Them All Together

In each component (or service, in Agent’s terminology) you should also specify:

  • how the Dialog State is provided to it (via a dialog formatter),
  • how the response is formatted prior to giving it to the Agent (via a response formatter),
  • which method of Agent’s State Manager is used to save the result (see docs),
  • (optional) a list of individual components (services), or groups of them, that it depends upon.
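
As a rough illustration of these settings, the sketch below describes two services as Python dictionaries and runs a couple of sanity checks over them. Treat every key name, formatter path, URL, and State Manager method name here as an assumption made for the example; the authoritative format is the linked pipeline_conf.json and the Agent docs.

```python
from typing import Any, Dict, List, Tuple

# Fields assumed to be required for every service in this sketch.
REQUIRED_FIELDS = (
    "connector",
    "dialog_formatter",
    "response_formatter",
    "state_manager_method",
)

# Hypothetical service descriptions (names, URLs, and paths are made up).
services: Dict[str, Dict[str, Any]] = {
    "spell_checking": {
        "connector": {"protocol": "http", "url": "http://spell-checking:3000/respond"},
        "dialog_formatter": "formatters:last_utterance_formatter",
        "response_formatter": "formatters:annotation_formatter",
        "state_manager_method": "add_annotation",
        "previous_services": [],
    },
    "harvesters_maintenance_skill": {
        "connector": {"protocol": "http", "url": "http://harvesters-skill:3001/respond"},
        "dialog_formatter": "formatters:full_dialog_formatter",
        "response_formatter": "formatters:skill_formatter",
        "state_manager_method": "add_hypothesis",
        # The skill reads the spell-checked text, so it runs after the annotator.
        "previous_services": ["spell_checking"],
    },
}


def missing_fields(service: Dict[str, Any]) -> List[str]:
    """Return the required fields that a service description lacks."""
    return [field for field in REQUIRED_FIELDS if field not in service]


def unknown_dependencies(all_services: Dict[str, Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Return (service, dependency) pairs that point to undefined services."""
    return [
        (name, dependency)
        for name, service in all_services.items()
        for dependency in service.get("previous_services", [])
        if dependency not in all_services
    ]
```

Checks like these are cheap to run before starting the containers and catch typos in dependency names early.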

Here’s a link to the complete Deepy 3000 pipeline_conf.json file with the handcrafted goal-oriented skill.

Here’s a link to the complete Deepy 3000 pipeline_conf.json file with the Go-Bot-based goal-oriented skill.

Deepy 3000: Goal-Oriented Skill

To implement the Goal-Oriented Skill we used two different approaches. Below is the one which uses our Go-Bot framework:

Slide taken from Daniel’s Conversations AI 2020 talk

This is a Flask-based Python web app. It has one main POST endpoint (`/respond`) and a few others. Agent uses this endpoint to “talk” to the skill. As it is a skill, we provide it with full access to the Dialog State:

dialogs = request.json["dialogs"]

Given that we have used Spell Checking annotator for user’s utterances, here we obtain not the original human’s utterance, but the results of the Spell Checking processing:

sentence = dialog["human_utterances"][-1]["annotations"].get("spelling_preprocessing")

As this is a Goal-Oriented Skill, we use a GoBot Wrapper we have built for this demo to obtain the id of the skill’s response along with the corresponding confidence:

uttr_resp, conf = gobot(sentence)
response = gobot.getNlg(uttr_resp)

After that, we return both the generated response and its confidence to the Agent:

responses.append(response)
confidences.append(conf)
return jsonify(list(zip(responses, confidences)))
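
Put together, the body of the endpoint looks roughly like the sketch below. To keep it runnable without Flask or a trained model, the request handling is reduced to a plain function and the GoBot wrapper is replaced by a hypothetical `StubGoBot`; the stub's response ids and texts, as well as the fallback to the raw `"text"` field, are assumptions for this example.

```python
from typing import Any, Dict, List, Tuple


class StubGoBot:
    """Hypothetical stand-in for the GoBot wrapper used in the demo."""

    NLG = {
        "harvesters_status": "All harvesters are working.",
        "fallback": "Sorry, I did not get that.",
    }

    def __call__(self, sentence: str) -> Tuple[str, float]:
        # Return (response id, confidence) for the given sentence.
        if "status" in sentence.lower():
            return "harvesters_status", 0.9
        return "fallback", 0.1

    def getNlg(self, response_id: str) -> str:
        # Turn a response id into the text shown to the user.
        return self.NLG[response_id]


gobot = StubGoBot()


def respond(payload: Dict[str, Any]) -> List[Tuple[str, float]]:
    """Core of the /respond endpoint: one (response, confidence) pair per dialog."""
    responses, confidences = [], []
    for dialog in payload["dialogs"]:
        last_utterance = dialog["human_utterances"][-1]
        # Prefer the spell-checked text; fall back to the raw text if it is missing.
        sentence = last_utterance["annotations"].get(
            "spelling_preprocessing"
        ) or last_utterance.get("text", "")
        uttr_resp, conf = gobot(sentence)
        responses.append(gobot.getNlg(uttr_resp))
        confidences.append(conf)
    return list(zip(responses, confidences))
```

In the Flask app, the same logic sits inside the `/respond` handler, with `payload` taken from `request.json` and the result wrapped in `jsonify`.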

Deepy 3000: Chit-Chat Skill

To implement the Chit-Chat Skill we used an AIML-based approach:

Slide taken from Daniel’s Conversations AI 2020 talk

What’s Next?

Now that you’ve got to know Deepy, we welcome you to check out its brand new repository:

Feel free to read its Wiki, clone the repository itself, explore it, and run it on your own hardware!

Roadmap

We have outlined our tentative roadmap for Deepy. Check it out in our wiki: Roadmap.

Distributions

We have provided a number of pre-defined configs for 3 different distributions:

  • deepy_base: basic Deepy distribution comprising two skills (simple goal-oriented and chit-chat) and two annotators (Emotion Classification and Spell Checking),
  • deepy_gobot_base: Go-Bot-based Deepy distribution comprising two skills (Go-Bot-based goal-oriented and chit-chat) and two annotators (Emotion Classification and Spell Checking),
  • deepy_adv: more advanced Deepy distribution which, in addition to the deepy_gobot_base components, includes a few more annotators such as Entity Linking, Intent Catcher, and Sentence Segmentation.

In this post, we’ve explored the first two distributions. The third one, deepy_adv, is currently running on our Demo Web Site.

Wrap up

We welcome you to go and try out Deepy 3000, and let us know what you think!

P.S. If you have any questions, feel free to join our Telegram Group, as well as join us on our monthly Community Calls! And don’t forget DeepPavlov has a dedicated forum where all kinds of questions concerning the framework and the models are welcome.

DeepPavlov

An open-source library for Conversational AI