Knowledge-Grounded Response Generation in the DREAM Socialbot

Alsu Sagirova
DeepPavlov
Mar 2, 2023

Nowadays, pre-trained language models successfully handle a wide range of NLP tasks, and dialogue generation is one of their popular applications. To sustain user engagement and keep the conversation interesting and memorable, a socialbot must introduce new information while staying relevant to the current discussion topic. One way to incorporate knowledge is to insert raw facts retrieved from knowledge bases directly into the socialbot’s utterances. Such statements often come from encyclopedia-like articles; although formally correct, they may make the socialbot sound robotic or even irrelevant. To tackle these challenges, we employ a neural conversational model in the knowledge-grounded response generation setting.

The pre-trained neural conversational model attempts to maintain a meaningful dialogue with the user and stay consistent with the current discussion topic. The knowledge grounding approach aims to generate relevant utterances that contain, where suitable, new information from the given knowledge facts. This helps evolve the discussion and encourages the user to continue the dialogue with the chatbot.

In DREAM Socialbot [1], knowledge-grounded response generation is implemented with the knowledge grounding skill, which aggregates the conversation history and the factual knowledge available at the moment, uses the collected information as the language model input, and infers the response with the skill-associated service.

Conversational model tuning

To allow the open-domain chatbot model to incorporate knowledge into its answers, we fine-tuned the Blender 90M model [2] on the Topical Chat Enriched dataset [3]. The language model input sequence contained the user utterance, the conversation history, and a paragraph of knowledge.
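As a minimal sketch of how such an input sequence might be assembled, the function below concatenates the knowledge paragraph and the dialogue context into one string. The `__knowledge__` separator token and the exact ordering are illustrative assumptions; the actual format depends on how the model was fine-tuned.

```python
def build_model_input(history, user_utterance, knowledge,
                      knowledge_sep="__knowledge__", turn_sep="\n"):
    """Concatenate a knowledge paragraph and the dialogue context into a
    single input sequence for the conversational model.

    `knowledge_sep` is a hypothetical special token marking the start of
    the grounding paragraph; `turn_sep` delimits dialogue turns.
    """
    context = turn_sep.join(history + [user_utterance])
    return f"{knowledge_sep} {knowledge}{turn_sep}{context}"

example = build_model_input(
    history=["Do you like football?", "Yes, I watch it every weekend."],
    user_utterance="Who is your favorite player?",
    knowledge="Lionel Messi has won a record number of Ballon d'Or awards.",
)
```

The knowledge-first ordering lets the model attend to the grounding paragraph while decoding the response; other layouts (knowledge last, per-turn speaker tags) are equally plausible.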

We tested short one-sentence knowledge facts and long facts consisting of three sentences. The resulting scores are presented in the table below.

Perplexity and token accuracy scores before and after fine-tuning the ParlAI Blender 90M model on the Topical Chat Enriched dataset, for one-sentence and three-sentence knowledge grounding. Scores are provided for the validation rare set, which contains entities that were infrequently or never seen in the training set.

In each experiment, fine-tuning was performed until the validation perplexity stopped improving.
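This stopping criterion can be sketched as a simple patience check on the validation perplexity history. The patience value below is an illustrative assumption, not the setting used in the experiments.

```python
def should_stop(val_ppl_history, patience=3):
    """Return True once validation perplexity has not improved over the
    last `patience` evaluations compared to the best earlier value.

    `patience=3` is an illustrative assumption for this sketch.
    """
    if len(val_ppl_history) <= patience:
        return False  # not enough evaluations yet
    best_before = min(val_ppl_history[:-patience])
    # stop if none of the recent evaluations beat the earlier best
    return min(val_ppl_history[-patience:]) >= best_before
```

In practice, frameworks such as ParlAI expose an equivalent early-stopping mechanism through validation settings, so this logic usually does not need to be written by hand.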

The next table compares the raw-fact retrieval approach with the predictions of the neural knowledge-grounded model for three example conversations from the Topical Chat Enriched test set. We also provide the dataset’s golden responses to assess the quality of the fine-tuned model’s predictions.

An example conversation with golden labels and responses from two versions of the fine-tuned Blender 90M. The parts of the model responses that match the grounding information are highlighted in bold, showing the model’s ability to incorporate the given facts into its utterances.

The fine-tuned model does not guarantee the lexical or syntactic correctness of the generated text; rather, it attempts to preserve the context of the current conversation while presenting new information from the knowledge facts to the user.

Knowledge-Grounded Conversational Skill

In DREAM Socialbot, the knowledge for the neural conversational model is gathered from three types of sources:

  • news descriptions from the News Skill. This source helps continue a news discussion and can be used to retrieve knowledge about recently mentioned entities;
  • facts from CoBotQA and Fact Retrieval, if any are available for the currently discussed topic or entity;
  • hand-crafted facts on popular conversation topics (games, movies, sports, science, music, food, emotions, relationships, weather, activities, celebrities, children, travel, art, jokes). These facts are used when a user wants to change the topic of the conversation without specifying a new subject, so the skill generates a prompt based on the pre-selected list of facts.
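The aggregation over these three sources can be sketched as a priority-ordered lookup. The ordering, function name, and the curated facts below are illustrative assumptions, not DREAM’s actual implementation or fact list.

```python
import random

# Illustrative placeholders, not DREAM's actual curated facts
CURATED_FACTS = {
    "movies": ["The first drive-in movie theater opened in 1933."],
    "music": ["The longest officially released song runs over 13 hours."],
}

def select_knowledge(news_description=None, retrieved_facts=None, topic=None):
    """Pick a knowledge paragraph from the available sources.

    The priority order (news, then retrieved facts, then curated facts)
    is an assumption made for this sketch.
    """
    if news_description:
        return news_description
    if retrieved_facts:
        return retrieved_facts[0]
    if topic in CURATED_FACTS:
        # topic change without a specified subject: prompt from curated facts
        return random.choice(CURATED_FACTS[topic])
    return None
```

A fallback chain like this keeps the skill usable even when no external knowledge is available for the current turn.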

Model response post-processing

All chatbot utterances are generated as text and then converted to speech. The greatest strength and, at the same time, the biggest weakness of a neural conversational model is that its generation cannot be fully controlled. The resulting utterance might contain abbreviations or acronyms that sound misleading when spoken. Another issue is that the model can generate greetings or farewells in the middle of a discussion; such responses may confuse the user or even discourage further conversation. Finally, the model can produce a response that is too short or too long for the user to react to. To tackle these problems, we assign each generated candidate utterance a confidence score: utterances containing the unwanted patterns described above receive a lower score, which lowers their chance of being selected as the socialbot’s response.
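A minimal sketch of such candidate scoring is shown below. The regular expressions, length thresholds, and penalty values are illustrative assumptions, not DREAM’s actual settings.

```python
import re

# Crude pattern detectors; real systems would use richer heuristics
GREETINGS = re.compile(r"\b(hello|hi there|goodbye|bye|see you)\b", re.I)
ABBREVIATION = re.compile(r"\b[A-Z]{2,}\b")  # runs of capitals, e.g. acronyms

def score_candidate(utterance, base_confidence=0.9,
                    min_words=3, max_words=40, penalty=0.5):
    """Assign a confidence score to a generated candidate utterance,
    penalizing the unwanted patterns described above. All thresholds
    and penalty values here are illustrative assumptions.
    """
    confidence = base_confidence
    n_words = len(utterance.split())
    if n_words < min_words or n_words > max_words:
        confidence *= penalty  # too short or too long to react to
    if GREETINGS.search(utterance):
        confidence *= penalty  # greeting/farewell mid-dialogue
    if ABBREVIATION.search(utterance):
        confidence *= penalty  # acronym that may sound odd in speech
    return confidence
```

Because the penalties only lower the score rather than discard the candidate outright, a penalized utterance can still be selected when no better candidate exists.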

References

[1] Baymurzina, D., Kuznetsov, D., Evseev, D., Karpov, D., Sagirova, A., Peganov, A., … & Burtsev, M. (2021). DREAM Technical Report for the Alexa Prize 4. 4th Proceedings of Alexa Prize.

[2] Roller, S., Dinan, E., Goyal, N., Ju, D., Williamson, M., Liu, Y., Xu, J., Ott, M., Shuster, K., Smith, E. M., Boureau, Y., & Weston, J. (2020). Recipes for building an open-domain chatbot. arXiv preprint arXiv:2004.13637.

[3] Hedayatnia, B., Kim, S., Liu, Y., Gopalakrishnan, K., Eric, M., & Hakkani-Tur, D. (2020). Policy-Driven Neural Response Generation for Knowledge-Grounded Dialogue Systems. arXiv preprint arXiv:2005.12529.
