Meta-Learning in Dialog Generation
Learning to learn
Unlike well-known benchmark datasets, real-life problem domains often come with only a small labeled dataset, which may not be enough to train a good model. Data augmentation is one way to tackle this problem by generating synthetic data, while meta-learning is another.
In this series of stories, we will go through different meta-learning approaches. One motivation for this line of work is that even children can recognize an object given just one example. The model does not learn to classify a specific category; instead, it learns patterns that distinguish inputs. This series on meta-learning will cover zero-shot learning, one-shot learning, few-shot learning, and meta-learning in NLP.
In this story, we will go through approaches that apply meta-learning to dialogue generation. In the customer service field, a company needs to employ customer service representatives to support customers' needs. As the business grows, the CS department needs to scale out linearly. Therefore, dialogue systems are introduced to solve this problem. How can we build a dialogue system that can "chat" with customers automatically?
As part of the meta-learning series, we will cover the usage of meta-learning in dialogue generation. Several methods will be covered, including:
Domain Adaptive Dialog Generation via Meta Learning (Qian and Yu, 2019)
Personalizing Dialogue Agents via Meta-Learning (Lin et al., 2019) and
Memory-Augmented Recurrent Networks for Dialogue Coherence (Donahue et al., 2019)
Domain Adaptive Dialog Generation via Meta Learning
Qian and Yu proposed a domain-adaptive dialog generation method based on meta-learning (DAML), which extends Model-Agnostic Meta-Learning (MAML) (Finn et al., 2017). DAML is trained by leveraging multiple single-domain dialog datasets and adapts to a new domain with only a few training examples.
Model-Agnostic Meta-Learning (MAML) was proposed by Finn et al. in 2017. It is a model-agnostic framework, meaning it is not tied to any specific model architecture. Finn et al. evaluated the framework on regression, classification, and reinforcement learning problems, with promising results. You may visit this story for more detail if you are not familiar with it.
To set up the experiment, the authors used data from three domains (i.e., restaurant, weather, and bus information search) to train the initial model, then fine-tuned it on the target domain, movie information search.
The training procedure follows the practice of MAML (Finn et al., 2017). First, the loss is calculated (#2) and the local gradient of a temporary model is updated (#3). For every batch of data, this loss calculation and local gradient update are repeated. After a batch of data is finished, a final loss is calculated (#5) to update the global gradient (#7).
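The inner/outer loop above can be sketched on a toy regression problem. This is a minimal first-order MAML sketch in NumPy, not the authors' dialog model: each "domain" is a hypothetical linear task with a different slope, the inner step is the local update of a temporary model, and the outer step is the global gradient update.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_and_grad(w, X, y):
    """Mean-squared-error loss and its gradient for a linear model y = X @ w."""
    err = X @ w - y
    return np.mean(err ** 2), 2 * X.T @ err / len(y)

def sample_task(slope):
    """One toy 'domain': noisy samples from y = slope * x."""
    X = rng.normal(size=(16, 1))
    y = slope * X[:, 0] + 0.01 * rng.normal(size=16)
    return X, y

w = np.zeros(1)                    # shared meta-initialization
inner_lr, outer_lr = 0.1, 0.05
for step in range(200):
    meta_grad = np.zeros_like(w)
    for slope in (1.0, 2.0, 3.0):                # three source domains
        X_tr, y_tr = sample_task(slope)
        _, g = loss_and_grad(w, X_tr, y_tr)      # loss on this domain (#2)
        w_task = w - inner_lr * g                # local update, temporary model (#3)
        X_val, y_val = sample_task(slope)
        _, g_val = loss_and_grad(w_task, X_val, y_val)  # final loss after adaptation (#5)
        meta_grad += g_val                       # first-order approximation
    w -= outer_lr * meta_grad / 3                # global gradient update (#7)
```

After meta-training, a few gradient steps on a small sample from an unseen "domain" (a new slope) already reduce its loss, which is the fast-adaptation property DAML relies on.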
Personalizing Dialogue Agents via Meta-Learning
Lin et al. (2019) proposed applying MAML (Finn et al., 2017) to the problem of personalizing dialogue agents. Persona-Agnostic Meta-Learning (PAML) is trained to adapt to new personas by leveraging only a few dialogue samples from the same user.
The model input is a persona description (a few sentences per person) and a dialogue (a set of utterances), and the output is the response. The setup is similar to DAML, except that PAML includes the persona description.
The training procedure follows the practice of MAML (Finn et al., 2017). The major difference between MAML and normal training is steps 4 to 8: the model evaluates a batch of data and updates the optimizer afterward (i.e., step 9).
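In this setup, each persona plays the role of a MAML "task", so each persona's dialogues must be split into a small set for the inner adaptation step and a held-out set for the meta-loss. A minimal sketch of that split, with hypothetical field names and data not taken from the PAML paper:

```python
import random

def make_persona_task(dialogues, k_support=1, seed=None):
    """Split one persona's dialogues into a support set (used for the inner,
    persona-specific adaptation step) and a query set (used to compute the
    meta-loss), mirroring MAML's per-task split.
    Hypothetical helper: names and structure are illustrative only."""
    pool = list(dialogues)
    random.Random(seed).shuffle(pool)
    return pool[:k_support], pool[k_support:]

# Toy persona with a description and three (history, response) samples.
persona = {
    "description": ["I love hiking.", "I have two dogs."],
    "dialogues": [
        (["Hi! What do you do for fun?"], "I go hiking most weekends."),
        (["Do you have pets?"], "Yes, two dogs. They hike with me."),
        (["Where do you live?"], "Near the mountains, handy for hiking."),
    ],
}
support, query = make_persona_task(persona["dialogues"], k_support=1, seed=0)
```

During meta-training, the model would take the inner gradient step on `support` and compute the meta-loss on `query`; at test time, `support` is the handful of dialogue samples available for a new user.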
Memory-Augmented Recurrent Networks for Dialogue Coherence
Donahue et al. do not extend the MAML (Finn et al., 2017) approach; instead, they apply memory-augmented meta-learning to the dialogue system problem. They proposed two architectures, which are
Memory-augmented dialogue with dual NTMs (D-NTMS) and
Single-NTM language model dialogue system (NTM-LM).
Neural Turing Machines (NTM) were introduced by Graves et al. in 2014. A quick summary: the model relies on both internal memory (i.e., RNN hidden states) and external memory (i.e., a memory bank outside the neural network) to decide the output. You may visit this story for more detail if you are not familiar with it.
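To make the external-memory idea concrete, here is a minimal sketch of content-based reading and writing over a memory bank, in the spirit of an NTM. It is a hypothetical toy, not Graves et al.'s model: a real NTM also has location-based addressing, gating, and a learned controller producing the keys.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class TinyMemory:
    """Toy external memory bank with content-based (cosine-similarity)
    addressing. Illustrative sketch only, far simpler than a real NTM."""

    def __init__(self, slots=8, width=4):
        self.M = np.zeros((slots, width))   # the external memory bank

    def _address(self, key):
        # attention weights over memory rows, by cosine similarity to the key
        norms = np.linalg.norm(self.M, axis=1) * np.linalg.norm(key) + 1e-8
        return softmax(self.M @ key / norms)

    def write(self, key, add_vec):
        w = self._address(key)
        self.M += np.outer(w, add_vec)      # additive write, weighted by attention

    def read(self, key):
        w = self._address(key)
        return w @ self.M                   # attention-weighted blend of rows
```

Reading with a key similar to an earlier write retrieves a blend dominated by the stored vector, which is what lets the dialogue model recall information from many turns back instead of relying only on the RNN hidden state.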
Intuitively, we should handle speakers separately, as speakers have different roles, backgrounds, and other attributes. Because of these differences, handling all speakers in a single model (a single Neural Turing Machine in this case) may hurt performance. This inspired the dual-NTM architecture: each speaker's utterances feed into a dedicated NTM for reading and updating external memory.
However, Donahue et al. found that the aforementioned model may have difficulty exchanging dialogue information between the two speaker-specific memories, leading to poor performance. Therefore, they proposed a second approach, which leverages a GRU to handle the utterance sequences and uses only one NTM for external memory operations.
From the experimental results, NTM-LM outperforms both the traditional Seq2Seq baseline and the D-NTMS architecture.
- DAML is trained on multiple resource-rich domains and aims to learn new domains quickly.
- PAML is trained across persona data and aims to learn new personas quickly.
- Handling speakers' utterances separately may not lead to better results; you may want to run further experiments on your target dataset.
Like to learn?
I am a Data Scientist in the Bay Area, focusing on the state of the art in data science, especially NLP, data augmentation, and platform-related topics. Feel free to connect with me on LinkedIn or GitHub.
- Model-Agnostic Meta-Learning explanation
- Memory-Augmented Meta-Learning explanation
- DAML implementation (PyTorch)
- PAML implementation (PyTorch)
- C. Finn, P. Abbeel, and S. Levine. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. 2017.
- K. Qian and Z. Yu. Domain Adaptive Dialog Generation via Meta-Learning. 2019.
- Z. Lin, A. Madotto, C. S. Wu, and P. Fung. Personalizing Dialogue Agents via Meta-Learning. 2019.
- D. Donahue, Y. Meng, and A. Rumshisky. Memory-Augmented Recurrent Networks for Dialogue Coherence. 2019.