DeepPavlov: An open-source library for end-to-end dialogue systems and chatbots

Sep 18, 2019 · 9 min read

A guest post by Vasily Konovalov

DeepPavlov images compilation

Dialogue systems have recently become a standard in human-machine interaction, with chatbots appearing in almost every industry to simplify the interaction between people and computers. They can be integrated into websites, messaging platforms, and devices. Chatbots are on the rise, and companies are choosing to delegate routine tasks to chatbots rather than humans, thus providing huge labor cost savings. Unlike humans, chatbots are capable of processing multiple user requests at a time and are always available.

However, many companies don’t know where to start when developing a bot to meet their business needs. Historically, chatbots can be divided into two large groups: rule-based and data-driven. The former relies on predefined commands and templates. Each of these commands should be written by a chatbot developer using regular expressions and textual data analysis. By contrast, data-driven chatbots rely on machine learning models pretrained on dialogue data.

In this article, I will explain how to develop chatbots with DeepPavlov, and why TensorFlow is an indispensable tool in doing so. DeepPavlov is built and maintained by the Neural Networks and Deep Learning Lab of the Moscow Institute of Physics and Technology. DeepPavlov is a winning submission to the #PoweredByTF 2.0 Challenge. The code for this article can be accessed on Google Colab.

Architecture of dialogue systems

To keep things simple, let us start with the most basic elements of dialogue systems. First, a chatbot needs to understand utterances in a natural language. The Natural Language Understanding (NLU) module translates a user query from natural language into a labeled semantic representation. For example, the utterance “Please set an alarm for 8am” will be translated into a machine-understandable form like set_alarm(8 am). Then the bot has to decide what is expected of it. The Dialogue Manager (DM) keeps track of the dialogue state and decides how to respond to the user. At the last stage, the Natural Language Generator (NLG) translates a semantic representation back into human language. For example, rent_price(Atlanta)=3000 USD translates to “The rent price in Atlanta is around $3,000.” The picture below shows a typical dialogue system architecture.
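The three stages can be sketched in a few lines of Python. This is a toy, rule-based illustration of the NLU, DM, and NLG roles described above, not how DeepPavlov implements them; the function names, the regular expression, and the reply templates are all assumptions made for this example.

```python
import re
from typing import Dict, Optional


def nlu(utterance: str) -> Dict[str, Optional[str]]:
    """Toy NLU: map an utterance to an intent and a time slot with a regex."""
    match = re.search(r"set an alarm for (\d{1,2}\s?(?:am|pm))", utterance.lower())
    if match:
        return {"intent": "set_alarm", "time": match.group(1)}
    return {"intent": "unknown", "time": None}


def dialogue_manager(frame: Dict[str, Optional[str]], state: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Toy DM: update the dialogue state and choose the next system action."""
    if frame["intent"] == "set_alarm" and frame["time"] is not None:
        state["alarm"] = frame["time"]
        return {"action": "confirm_alarm", "time": frame["time"]}
    return {"action": "ask_rephrase", "time": None}


def nlg(action: Dict[str, Optional[str]]) -> str:
    """Toy NLG: turn the chosen action back into a sentence."""
    if action["action"] == "confirm_alarm":
        return f"Your alarm is set for {action['time']}."
    return "Sorry, I did not understand that."


state: Dict[str, str] = {}
reply = nlg(dialogue_manager(nlu("Please set an alarm for 8am"), state))
print(reply)
```

In a data-driven system, each of these hand-written functions would be replaced by a trained model, while the flow of data between the stages stays the same.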

Building dialogue systems with DeepPavlov

The open-source conversational AI framework DeepPavlov offers a free and easy-to-use solution for building dialogue systems. DeepPavlov comes with several predefined components for solving NLP-related problems. The framework allows you to train and test models, as well as fine-tune hyperparameters. It supports Linux and Windows platforms, Python 3.6, and Python 3.7. You can install DeepPavlov by running:

pip install -q deeppavlov

The DeepPavlov models are defined in separate configuration files under the config folder. A config file consists of five main sections: dataset_reader, dataset_iterator, chainer, train, and metadata. The dataset_reader defines the dataset’s location and format. After loading, the data is split between the train, validation, and test sets according to the dataset_iterator settings. The chainer section of the configuration files consists of three subsections. The in and out sections define input and output to the chainer, whereas the pipe section defines a pipeline of the required components to interact with the models. The metadata section describes the model requirements along with the model variables.
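To make the layout concrete, here is a minimal sketch of such a configuration written as a Python dictionary and printed as JSON. Only the five top-level sections and the in, pipe, and out subsections come from the description above; every class_name, variable, and parameter value is a hypothetical placeholder, so consult a real config file for the actual component names.

```python
import json

# Illustrative skeleton only: all class_name entries and parameter values are
# hypothetical placeholders, not components shipped with DeepPavlov.
config = {
    # Where the dataset lives and how it is read.
    "dataset_reader": {
        "class_name": "my_dataset_reader",
        "data_path": "./my_data",
    },
    # How the loaded data is split into train, validation, and test sets.
    "dataset_iterator": {
        "class_name": "my_dataset_iterator",
    },
    # The pipeline itself: inputs, an ordered list of components, outputs.
    "chainer": {
        "in": ["x"],
        "pipe": [
            {"class_name": "my_tokenizer", "in": ["x"], "out": ["x_tokens"]},
            {"class_name": "my_classifier", "in": ["x_tokens"], "out": ["y_pred"]},
        ],
        "out": ["y_pred"],
    },
    # Training settings (placeholder values).
    "train": {"epochs": 5, "batch_size": 32},
    # Model requirements and variables.
    "metadata": {
        "variables": {"MODEL_PATH": "./my_model"},
        "requirements": ["requirements.txt"],
    },
}

print(json.dumps(config, indent=2))
```

Each component in pipe consumes names produced earlier in the chain, which is why the out of the tokenizer matches the in of the classifier in this sketch.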

You can interact with the models defined in the configuration files via the command-line interface (CLI). However, before using any model you should install all its requirements by running it with the install command. The model’s dependencies are defined in the requirements part of the configuration file.

python -m deeppavlov install <config_path>

Here, <config_path> is the path to the chosen model’s config file.

To get predictions from a model interactively through the CLI, run:

python -m deeppavlov interact <config_path> [-d]

Here, -d downloads the required data, such as pretrained model files and embeddings.

You can train a model by running it with the train parameter. The model will be trained on the dataset defined in the dataset_reader section of the configuration file.

python -m deeppavlov train <config_path>

The DeepPavlov framework allows you to test all the available models on your data to identify the one that performs best. To test a model, specify the dataset split along with the split fields in the dataset_iterator section of the configuration file.

python -m deeppavlov test <config_path>

In addition, you can run a server with API access to a model by executing DeepPavlov with the riseapi command:

python -m deeppavlov riseapi <config_path>

You can find the other available commands in our docs.
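If you want to script these steps, the commands above can be assembled programmatically. The helper below is a small sketch: the function name and the MODES tuple are introduced for this example, and it only builds the argument list for the five commands shown in this section without running anything.

```python
import sys
from typing import List

# The five commands covered in this section.
MODES = ("install", "interact", "train", "test", "riseapi")


def deeppavlov_command(mode: str, config_path: str, download: bool = False) -> List[str]:
    """Build the argument list for `python -m deeppavlov <mode> <config_path> [-d]`."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    command = [sys.executable, "-m", "deeppavlov", mode, config_path]
    if download:
        # The article shows -d with the interact command.
        command.append("-d")
    return command


cmd = deeppavlov_command("interact", "squad_bert", download=True)
print(" ".join(cmd[1:]))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once DeepPavlov is installed.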

An indispensable tool

TensorFlow is an end-to-end open-source platform for machine learning. TensorFlow was an indispensable tool when developing DeepPavlov.

Starting from TensorFlow 1.4.0, Keras has been part of the core API. Keras is a high-level API that lowers the barrier to getting started with deep learning. It provides an abstraction layer over TensorFlow so that we can focus more on the problem and on hyperparameter tuning. Most text classification models of DeepPavlov are implemented using Keras abstractions, which let us prototype quickly and try various neural network architectures.

In addition, the flexibility of TensorFlow allows us to build any neural network architecture we can think of, including, but not limited to, models for sequence tagging and question answering. Specifically, we use TensorFlow for seamless integration with BERT-based models. We’ve already implemented BERT-based English and multilingual models for text classification, named entity recognition, and question answering (more on that in the upcoming sections). Moreover, this flexibility enables us to train BERT on our own data; this is how we trained BERT on conversational data, which led to better performance on social network input.

Another great advantage of TensorFlow is TensorBoard. You can use TensorBoard to visualize your TensorFlow graph, plot metrics, and show additional data. TensorBoard allows us to inspect models and make appropriate changes while debugging them. This can be useful for gaining a better understanding of machine learning models.

DeepPavlov delivers results

DeepPavlov comes with several predefined components powered by TensorFlow and Keras for solving NLP-related problems, including text classification, named entity recognition, question answering, and many others. Nowadays, state-of-the-art results in many tasks have been achieved by applying BERT-based models. The release of BERT (Bidirectional Encoder Representations from Transformers) [research paper] made the year 2018 an inflection point for the Natural Language Processing community. BERT is a transformer-based technique for pretraining language representations. We integrated BERT into three downstream tasks: text classification, named entity recognition (and sequence tagging in general), and question answering. As a result, we achieved substantial improvements in all these tasks. In the following sections, I will describe in detail how to use the BERT-based models of DeepPavlov. The code can be accessed on Google Colab.

BERT for Text Classification

Let’s demonstrate the DeepPavlov BERT-based text classification models using the insult detection problem. It involves predicting whether a comment posted during a public discussion is considered insulting to one of the participants. This is a binary classification problem with only two classes: Insult and Not Insult.

Any pretrained model can be used for inference via both the command-line interface (CLI) and Python. Before using the model, make sure that all the required packages are installed using the command:

python -m deeppavlov install insults_kaggle_bert
python -m deeppavlov interact insults_kaggle_bert -d

You can interact with the model via Python code.
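A minimal sketch of that Python interaction is shown below. It assumes DeepPavlov is installed and that the config is reachable as configs.classifiers.insults_kaggle_bert; the wrapper function classify_comments and its optional model argument are introduced here only so the snippet can also be tried with a stand-in model.

```python
from typing import Callable, List, Optional, Sequence


def classify_comments(comments: Sequence[str], model: Optional[Callable] = None) -> List[str]:
    """Return one label per comment, loading the pretrained model if none is given."""
    if model is None:
        # Imported lazily so this function can be exercised without DeepPavlov.
        from deeppavlov import build_model, configs

        # download=True fetches the pretrained weights on first use.
        model = build_model(configs.classifiers.insults_kaggle_bert, download=True)
    # The model is called on a batch (a list) of texts.
    return list(model(list(comments)))
```

Calling classify_comments(["hey, how are you?", "You are so dumb!"]) should return one label per input comment, drawn from the two classes described above.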

You can train the BERT-based text classification model on your own data. In order to do so, modify the data_path parameter in the dataset_reader section of the configuration file.

Then train the model in CLI:

python -m deeppavlov train my_text_classification_config.json

Or via Python:
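The sketch below shows one way to do this from Python, assuming the read_json helper from deeppavlov.core.common.file and the train_model function exported by the deeppavlov package. The helper names with_data_path and train_on_own_data, and the "./my_data" path in the usage note, are placeholders introduced for this example.

```python
import copy
from typing import Any, Dict


def with_data_path(config: Dict[str, Any], data_path: str) -> Dict[str, Any]:
    """Return a copy of the config whose dataset_reader points at data_path."""
    updated = copy.deepcopy(config)  # leave the original config untouched
    updated["dataset_reader"]["data_path"] = data_path
    return updated


def train_on_own_data(data_path: str):
    """Train the insult classifier config on data stored under data_path."""
    # Imported lazily so with_data_path can be used without DeepPavlov installed.
    from deeppavlov import configs, train_model
    from deeppavlov.core.common.file import read_json

    base_config = read_json(configs.classifiers.insults_kaggle_bert)
    return train_model(with_data_path(base_config, data_path))
```

For example, train_on_own_data("./my_data") would train on files in that folder, provided they follow the format expected by the dataset reader of the chosen config.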

You can read more about BERT-based text classification models here, and you can test them in our demo.

BERT for Named Entity Recognition

In addition to the text classification models, DeepPavlov contains BERT-based models for named-entity recognition (NER). This is one of the most common tasks in NLP and can be formulated as follows: Given a sequence of tokens (words, and possibly punctuation marks), provide a tag from a predefined tag set for each token in the sequence. NER has a variety of business applications. For example, it can extract the important information from resumes to facilitate their evaluation by HR professionals. Moreover, NER can be used to identify the relevant entities in customer requests, such as product specifications, corporate names, or company branch details.
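To illustrate the task formulation, the sketch below groups per-token tags into entities. It assumes the common BIO tagging scheme, where B- marks the first token of an entity, I- a continuation, and O a token outside any entity; the tokens and tags are hand-written for illustration and are not model output.

```python
from typing import List, Optional, Tuple


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity text, entity type) pairs."""
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts: close the previous one, if any.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the entity currently being built.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent tag: close the current entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


tokens = ["Acme", "Corp", "opened", "a", "branch", "in", "Atlanta"]
tags = ["B-ORG", "I-ORG", "O", "O", "O", "O", "B-LOC"]
print(extract_entities(tokens, tags))
# → [('Acme Corp', 'ORG'), ('Atlanta', 'LOC')]
```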

We trained our NER model on the OntoNotes English-language corpus, which has 19 types in the markup schema, including PER (person), LOC (location), ORG (organization), and many others. In order to interact with the model, first install its requirements.

python -m deeppavlov install ner_ontonotes_bert_mult
python -m deeppavlov interact ner_ontonotes_bert_mult [-d]

In addition, you can interact with the model via Python code.

The multilingual BERT (M-BERT) model enables zero-shot transfer between languages, which means you can test the model on non-English sentences, even though it was trained on English OntoNotes.
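Here is a minimal sketch of the Python interaction with the NER model. It assumes the config is reachable as configs.ner.ner_ontonotes_bert_mult and that the model returns a batch of token lists together with a batch of tag lists; the wrapper tag_sentences and its optional model argument are introduced for this example.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def tag_sentences(
    sentences: Sequence[str], model: Optional[Callable] = None
) -> List[List[Tuple[str, str]]]:
    """Return (token, tag) pairs for every sentence in the batch."""
    if model is None:
        # Imported lazily so this function can be exercised without DeepPavlov.
        from deeppavlov import build_model, configs

        model = build_model(configs.ner.ner_ontonotes_bert_mult, download=True)
    # Assumed output layout: [batch of token lists, batch of tag lists].
    tokens_batch, tags_batch = model(list(sentences))
    return [list(zip(tokens, tags)) for tokens, tags in zip(tokens_batch, tags_batch)]
```

Because the model is multilingual, the same call can be tried on a non-English sentence, for example tag_sentences(["Die Curling-Weltmeisterschaft findet in Antananarivo statt"]).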

You can read more about NER models in the article. In addition, you can check out the NER models in our demo.

BERT for Question Answering

Context-based question answering is the task of finding an answer to a question over a given context (e.g., a paragraph from Wikipedia), where the answer to each question is a segment of the context. For example, the context, question, and answer below form a valid triplet for the context-based question answering task.


Context: In meteorology, precipitation is any product of the condensation of atmospheric water vapor that falls under gravity. The main forms of precipitation include drizzle, rain, sleet, snow, graupel, and hail. Precipitation forms as smaller droplets coalesce via collision with other raindrops or ice crystals within a cloud. Short, intense periods of rain in scattered locations are called “showers.”

Question: Where do water droplets collide with ice crystals to form precipitation?

Answer: within a cloud
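Since the answer is always a segment of the context, it can be described by a start position and a length. The short sketch below checks this property on a shortened version of the example; the function name locate_answer is introduced for illustration only.

```python
from typing import Tuple


def locate_answer(context: str, answer: str) -> Tuple[int, int]:
    """Return the (start, end) character offsets of the answer inside the context."""
    start = context.find(answer)
    if start == -1:
        raise ValueError("the answer is not a segment of the context")
    return start, start + len(answer)


context = (
    "Precipitation forms as smaller droplets coalesce via collision with "
    "other raindrops or ice crystals within a cloud."
)
answer = "within a cloud"

start, end = locate_answer(context, answer)
# Slicing the context with the offsets recovers the answer exactly.
print(context[start:end])
```

An extractive QA model predicts these offsets instead of generating free text, which is why its answers can always be traced back to the source passage.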

A question answering system can automate a lot of processes in your business. For example, it can help your employees get answers based on your internal company documentation. In addition, it can help you check the reading comprehension of your students in tutoring. Recently, the context-based question answering task has attracted a lot of attention in academia. One of the major milestones in this field was the release of the Stanford Question Answering Dataset (SQuAD), a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. SQuAD has given rise to numerous approaches to question answering. One of the most successful is the BERT-based question answering model, which outperforms the others and currently delivers results bordering on human performance.

In order to use the BERT-based QA model with DeepPavlov, first install its requirements.

python -m deeppavlov install squad_bert

Then you can interact with the model as follows:

python -m deeppavlov interact squad_bert -d

In addition, you can use the model via Python code.
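A minimal sketch of that Python usage follows. It assumes the config is reachable as configs.squad.squad_bert and that the model takes a batch of contexts and a batch of questions; the wrapper answer_questions and its optional model argument are introduced for this example.

```python
from typing import Callable, Optional, Sequence


def answer_questions(
    contexts: Sequence[str], questions: Sequence[str], model: Optional[Callable] = None
):
    """Ask one question per context and return the raw model output."""
    if len(contexts) != len(questions):
        raise ValueError("contexts and questions must have the same length")
    if model is None:
        # Imported lazily so this function can be exercised without DeepPavlov.
        from deeppavlov import build_model, configs

        model = build_model(configs.squad.squad_bert, download=True)
    # Both arguments are batches: the i-th question is asked about the i-th context.
    return model(list(contexts), list(questions))
```

With the precipitation paragraph as the only context and "Where do water droplets collide with ice crystals to form precipitation?" as the only question, the first element of the output holds the extracted answers.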

The Multilingual BERT model enables building a multilingual QA system simply by training it on the English SQuAD dataset. The multilingual QA supports all of the 104 languages that were used to train M-BERT. You can use it as follows
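The sketch below pairs a French context with an English question. The config name squad_bert_multilingual_freezed_emb is an assumption that should be checked against the configs shipped with your DeepPavlov version; the French sentence is our own translation of part of the precipitation paragraph, and the helper ask_across_languages is introduced for this example.

```python
from typing import Callable, Optional, Sequence, Tuple


def ask_across_languages(
    pairs: Sequence[Tuple[str, str]], model: Optional[Callable] = None
):
    """Run (context, question) pairs, possibly in different languages, through the model."""
    contexts = [context for context, _ in pairs]
    questions = [question for _, question in pairs]
    if model is None:
        # Imported lazily so this function can be exercised without DeepPavlov.
        from deeppavlov import build_model, configs

        # Assumed config name; verify it exists in your installation.
        model = build_model(configs.squad.squad_bert_multilingual_freezed_emb, download=True)
    return model(contexts, questions)


context_fr = (
    "Les précipitations se forment lorsque de petites gouttelettes fusionnent "
    "par collision avec d'autres gouttes de pluie ou des cristaux de glace "
    "à l'intérieur d'un nuage."
)
question_en = "Where do water droplets collide with ice crystals to form precipitation?"
pairs = [(context_fr, question_en)]
```

Calling ask_across_languages(pairs) sends the French context and the English question to the model as a batch of size one.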

As you can see, we call the model by providing the batch of contexts and the batch of questions, and as an output, the model returns the batch of extracted results from the contexts with their start positions. This code snippet demonstrates that the multilingual QA model, while being trained on an English dataset, is capable of extracting answers from a French context even when the question is asked in a different language.

The detailed comparison of cross-language transferability of the multilingual QA model can be found in a dedicated article.


We hope this was helpful and that you’ll be eager to use DeepPavlov for your own natural language understanding use cases. You can read more about us in our official blog. Also, feel free to test our BERT-based models by using our demo. And don’t forget DeepPavlov has a dedicated forum, where any questions concerning the framework and the models are welcome.

Moscow Institute of Physics and Technology