
Google AI: Introducing the Schema-Guided Dialogue Dataset for Conversational Assistants
This research summary is one of many distributed weekly on the AI scholar newsletter.
Conversational assistants are among the most interesting recent advances in AI. They are increasingly becoming a meaningful part of our personal lives, and businesses are using them to improve customer service. The future of these assistants looks exciting: the smart virtual assistant market is estimated to grow at a CAGR of more than 26% and exceed $12 billion by 2024.
AI engineers across the globe are actively working on the next generation of conversational AI capabilities, including reading and understanding human emotions. First, though, they must overcome existing challenges such as a shortage of suitable data. Existing datasets for multi-domain task-oriented dialogue do not sufficiently capture a number of challenges that arise when building scalable virtual assistants.
Towards Scalable Multi-Domain Conversational Agents
Google AI recently introduced the Schema-Guided Dialogue (SGD) dataset, a task-oriented dialogue corpus. With over 18,000 dialogues in the training set spanning 26 services belonging to 17 domains, it now stands as the largest publicly available annotated task-oriented dialogue dataset.
The annotations comprise the active intents and dialogue states for each user utterance and the system actions for each system utterance. SGD is the first dataset to cover such a wide variety of domains and provide multiple APIs per domain.
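To make the annotation layers concrete, here is a minimal sketch of what one annotated user turn and one annotated system turn might look like. The class names, field names, and example values are hypothetical and chosen for illustration; they are not taken from the released dataset files.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class UserTurn:
    """A user utterance with its annotations (hypothetical layout)."""
    utterance: str
    active_intent: str
    # Dialogue state: slot name -> values the user has given so far.
    dialogue_state: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class SystemTurn:
    """A system utterance with its annotated actions (hypothetical layout)."""
    utterance: str
    # Each action is an (act, slot) pair, e.g. ("REQUEST", "time").
    actions: List[Tuple[str, str]] = field(default_factory=list)


# Illustrative values only; not real dataset content.
user = UserTurn(
    utterance="Find me a table for two in San Jose.",
    active_intent="ReserveRestaurant",
    dialogue_state={"city": ["San Jose"], "party_size": ["2"]},
)
system = SystemTurn(
    utterance="What time would you like the reservation?",
    actions=[("REQUEST", "time")],
)
```

The user side carries the intent and the accumulated state, while the system side carries only the actions behind its reply, mirroring the split described above.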

Google also proposes a schema-guided approach for building virtual assistants to address these challenges. The approach uses a single model across all services and domains, with no domain-specific parameters.
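The idea of one model shared across services can be sketched as a function that receives the service's schema as input rather than having the service baked into its parameters. In the toy example below, a simple word-overlap scorer stands in for a learned model, and the schemas, intent names, and descriptions are all made up for illustration. It is not Google's implementation.

```python
import re
from typing import Dict, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def predict_intent(utterance: str, schema: Dict[str, str]) -> str:
    """Return the intent whose description shares the most words with the utterance.

    The schema (intent name -> natural-language description) is an input,
    so the same function serves any service without service-specific code.
    """
    words = tokenize(utterance)
    return max(schema, key=lambda intent: len(words & tokenize(schema[intent])))


# Hypothetical schemas for two unrelated services.
flights_schema = {
    "SearchFlight": "search for a flight to a destination",
    "BookFlight": "book a ticket for a flight",
}
restaurants_schema = {
    "FindRestaurant": "find a restaurant serving a cuisine in a city",
    "ReserveTable": "reserve a table at a restaurant",
}

print(predict_intent("I want to reserve a table for two", restaurants_schema))
print(predict_intent("search for a flight to Paris", flights_schema))
```

Adding a new service here means supplying a new schema dictionary and leaving the function unchanged, which is the property the schema-guided approach aims for.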
Potential Uses and Effects
The SGD dataset should go a long way toward addressing many real-world challenges that are not adequately captured by existing datasets. It also encourages scalable modeling approaches for virtual assistants by simplifying the integration of new services and APIs with large-scale virtual assistants.
The dataset is also designed to serve as an effective testbed for intent prediction, slot filling, state tracking, and language generation, among other tasks in large-scale virtual assistants.
Read more: The Schema-Guided Dialogue Dataset

