Uber Open Sourced Plato to Perform Natural Language Research at Scale
The framework facilitates research across different NLP areas from language generation to translation.
I recently started an AI-focused educational newsletter that already has over 70,000 subscribers. TheSequence is a no-BS (meaning no hype, no news, etc.) ML-oriented newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers and concepts. Please give it a try by subscribing below:
Natural language understanding (NLU) is one of the areas of artificial intelligence (AI) that has seen the greatest adoption into mainstream applications. From basic chatbots to sophisticated digital assistants, conversational applications are becoming a common trend in the software industry. Consequently, the market has seen an explosion in the number of tools and platforms that enable the implementation of conversational applications. While the process of building simple, domain-specific chatbots has gotten much easier, building large-scale, multi-agent conversational applications remains a massive challenge. A few months ago, the Uber engineering team open sourced the Plato Research Dialogue System, the framework powering conversational agents across Uber’s different applications.
Uber might be considered one of the world’s prototypical environments for the implementation of conversational agents. From customer service agents and interactions with drivers and riders to future scenarios in self-driving vehicles, Uber is exposed to one of the widest landscapes of conversational applications in the world. While, at first glance, it might seem surprising that Uber decided to implement its own conversational AI platform, it all makes sense once you factor in the scale and complexity of the scenarios it is trying to address.
The Challenges of Large Scale Conversational Applications
The space of conversational applications has seen an explosion of tools and platforms in the last few years. From the solutions provided by the major cloud players such as Microsoft LUIS, Watson Assistant or Google Cloud Natural Language to a number of standalone frameworks such as Olympus, PyDial, ParlAI, the Virtual Human Toolkit, Rasa, DeepPavlov or ConvLab, there seems to be a stack for every conversational app scenario. Despite the sophistication of some of those platforms and frameworks, they have marked limitations when applied at scale:
· Single-Agent System: Most conversational AI frameworks are designed for dialogues between one agent and one user and don’t support multi-agent scenarios.
· Preset Architecture: Mainstream conversational AI frameworks are based on prebuilt models that can’t be changed regardless of their performance in specific scenarios.
· Offline Training: Learning from online interactions is a key capability of large scale conversational applications. However, the process of training a conversational AI agent in most frameworks is based on offline interactions.
· Limited Extensibility: Replacing the core architecture of a dialogue system with new models or incorporating pre-trained agents is not possible in most conversational AI frameworks.
The Plato Research Dialogue System
Uber built the Plato Research Dialogue System (PRDS) to address the challenges of building large scale conversational applications. Conceptually, PRDS is a framework to create, train and evaluate conversational AI agents in diverse environments. From a functional standpoint, PRDS includes the following building blocks:
- Speech recognition (transcribe speech to text)
- Language understanding (extract meaning from that text)
- State tracking (aggregate information about what has been said and done so far)
- API call (search a database, query an API, etc.)
- Dialogue policy (generate abstract meaning of agent’s response)
- Language generation (convert abstract meaning into text)
- Speech synthesis (convert text into speech)
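The text-based stages of that pipeline can be sketched as a chain of interchangeable components. The class and stage names below are illustrative stand-ins, not Plato’s actual API; speech recognition and synthesis are omitted for brevity:

```python
# A minimal sketch of a dialogue turn: NLU -> state tracking -> policy -> NLG.
# All names are hypothetical, not Plato's actual API.

class DialoguePipeline:
    """Chains the core text-based stages of a single dialogue turn."""

    def __init__(self, nlu, tracker, policy, nlg):
        self.nlu = nlu          # language understanding
        self.tracker = tracker  # dialogue state tracking
        self.policy = policy    # dialogue policy
        self.nlg = nlg          # language generation

    def respond(self, user_utterance):
        meaning = self.nlu(user_utterance)   # extract meaning from text
        state = self.tracker(meaning)        # aggregate dialogue state
        act = self.policy(state)             # choose an abstract response act
        return self.nlg(act)                 # convert the act into text

# Toy stand-ins for each stage:
pipeline = DialoguePipeline(
    nlu=lambda text: {"intent": "greet"} if "hello" in text else {"intent": "unknown"},
    tracker=lambda meaning: {"last_intent": meaning["intent"]},
    policy=lambda state: "greet_back" if state["last_intent"] == "greet" else "clarify",
    nlg=lambda act: {"greet_back": "Hello! How can I help?",
                     "clarify": "Could you rephrase?"}[act],
)

print(pipeline.respond("hello there"))  # -> Hello! How can I help?
```

Because each stage is just a callable with a fixed interface, any single stage can be swapped for a learned model without touching the rest of the chain.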
PRDS was designed with modularity in mind in order to incorporate state-of-the-art research in conversational systems as well as continuously evolve every component of the platform. In PRDS, each component can be trained either online (from interactions) or offline and incorporated into the core engine. From the training standpoint, PRDS supports interactions with human and simulated users. The latter are common for jumpstarting conversational AI agents in research scenarios, while the former are more representative of live interactions. Each individual component of the platform can be trained online or offline using the machine learning frameworks supported by PRDS, which include TensorFlow, PyTorch and Uber’s own Ludwig.
One of the key contributions of PRDS is the support for multi-agent conversational interactions in which a large number of agents can learn from each other. This capability can facilitate research in multi-agent learning, where agents need to learn how to generate language in order to perform a task. In PRDS, the agents can communicate over speech, text, or structured information (dialogue acts) and each agent has its own configuration. The dialogue principles define what each agent can understand (an ontology of entities or meanings; for example: price, location, preferences, cuisine types, etc.) and what it can do (ask for more information, provide some information, call an API, etc.).
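The dialogue principles described above can be sketched as a small ontology plus a set of allowed acts. The slot names and schema below are illustrative, not Plato’s actual configuration format:

```python
# Hypothetical sketch of an agent's dialogue principles: an ontology of
# entities it understands and the acts it can perform. The schema is
# illustrative, not Plato's actual configuration format.

restaurant_ontology = {
    "informable": ["price", "location", "cuisine"],  # slots the user can specify
    "requestable": ["phone", "address"],             # slots the user can ask about
}

agent_acts = ["request", "inform", "api_call"]       # what the agent can do

def can_understand(slot, ontology=restaurant_ontology):
    """Check whether a slot falls inside the agent's ontology."""
    return slot in ontology["informable"] or slot in ontology["requestable"]

print(can_understand("cuisine"))   # -> True
print(can_understand("weather"))   # -> False
```

In a multi-agent setting, each agent would carry its own ontology and act set, which is what lets heterogeneous agents negotiate over shared entities.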
Extensibility and customization are some of the other areas in which PRDS excels. Each stage of the PRDS NLU pipeline can be split into different or jointly-trained components such as text-to-dialogue state or text-to-text. This facilitates the incorporation of new conversational AI research techniques into PRDS without sacrificing its core architecture.
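One way to picture that extensibility: as long as a replacement component exposes the same interface, a jointly-trained text-to-dialogue-state model can be dropped in where a rule-based NLU stage used to sit. The interfaces below are hypothetical, not Plato’s actual component contracts:

```python
# Sketch of the pluggable-component idea: any object implementing the same
# callable interface can replace a pipeline stage. Names are hypothetical.

class RuleBasedNLU:
    """A simple rule-based language-understanding stage."""
    def __call__(self, text):
        return {"intent": "order"} if "book" in text else {"intent": "other"}

class JointTextToState:
    """A jointly-trained text-to-dialogue-state component exposing the same
    interface, so it can be swapped in without architectural changes."""
    def __call__(self, text):
        return {"intent": "order", "slots": {"raw": text}}

def run_turn(nlu_component, text):
    # The surrounding system only depends on the callable interface.
    return nlu_component(text)

print(run_turn(RuleBasedNLU(), "book a table"))
print(run_turn(JointTextToState(), "book a table"))
```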
To execute conversational agents, PRDS relies on a central coordinator known as the Controller. PRDS uses a Controller class to orchestrate the conversation between the agents. The Controller will instantiate the agents, initialize them for each dialogue, pass input and output appropriately, and keep track of statistics.
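A minimal sketch of that orchestration loop, assuming each agent exposes `initialize()` and `respond()` methods (the names are hypothetical, not Plato’s actual Controller API):

```python
# Hypothetical Controller-style orchestrator: instantiates agents, initializes
# them per dialogue, routes messages between them, and tracks statistics.

class Controller:
    def __init__(self, agents):
        self.agents = agents
        self.turns = 0  # simple statistic tracked across the dialogue

    def run_dialogue(self, opening, max_turns=4):
        for agent in self.agents:
            agent.initialize()           # reset each agent for a new dialogue
        message, transcript = opening, []
        for turn in range(max_turns):
            agent = self.agents[turn % len(self.agents)]  # alternate speakers
            message = agent.respond(message)              # route I/O
            transcript.append(message)
            self.turns += 1
        return transcript

class EchoAgent:
    """Toy agent that just acknowledges what it heard."""
    def __init__(self, name):
        self.name = name
    def initialize(self):
        pass
    def respond(self, msg):
        return f"{self.name} heard: {msg}"

controller = Controller([EchoAgent("A"), EchoAgent("B")])
transcript = controller.run_dialogue("hello", max_turns=2)
print(transcript)  # -> ['A heard: hello', 'B heard: A heard: hello']
```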
The models for training conversational agents are, arguably, one of the most important contributions of PRDS. The platform supports the training of agents’ internal components in an online (during the interaction) or offline (from data) manner. More importantly, PRDS records every single learning experience using a structure known as the Dialogue Episode Recorder (DER), which contains information about previous dialogue states, actions taken, current dialogue states, utterances received and utterances produced, rewards received and other relevant information. The DER plays a key role in facilitating the interpretability and debugging of conversational AI agents in PRDS.
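The recorder concept can be sketched as a per-turn log of states, actions, utterances and rewards. The field names below are illustrative, not the DER’s actual schema:

```python
# Sketch of an episode recorder in the spirit of the Dialogue Episode
# Recorder: it logs state, action, utterances, and reward per turn.
# Field names are illustrative, not Plato's actual DER schema.

from dataclasses import dataclass, field

@dataclass
class TurnRecord:
    prev_state: dict      # dialogue state before the action
    action: str           # action taken by the agent
    new_state: dict       # dialogue state after the action
    utterance_in: str     # utterance received
    utterance_out: str    # utterance produced
    reward: float         # reward received for this turn

@dataclass
class EpisodeRecorder:
    turns: list = field(default_factory=list)

    def record(self, **kwargs):
        self.turns.append(TurnRecord(**kwargs))

    def total_reward(self):
        # Aggregate reward over the episode, e.g. for offline policy training.
        return sum(t.reward for t in self.turns)

recorder = EpisodeRecorder()
recorder.record(prev_state={}, action="request(cuisine)",
                new_state={"asked": "cuisine"},
                utterance_in="I want food", utterance_out="What cuisine?",
                reward=-1.0)
recorder.record(prev_state={"asked": "cuisine"}, action="inform(name)",
                new_state={"done": True},
                utterance_in="Italian", utterance_out="Try Roma's.",
                reward=20.0)
print(recorder.total_reward())  # -> 19.0
```

A log structured this way supports both debugging (replaying a dialogue turn by turn) and offline training (replaying recorded experiences to a learner).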
PRDS represents one of the most advanced and extensible frameworks for the implementation of large scale conversational agents. PRDS’ core architecture can be used to train conversational agents across different deep learning frameworks without introducing any modifications to the core models. PRDS joins the outstanding number of contributions that Uber has made to the machine learning ecosystem which include projects such as Pyro, Horovod, Ludwig, Manifold, PyML and many others.