Synopsis of LeakGAN: Long Text Generation via Adversarial Training with Leaked Information

Combining Reinforcement Learning with Generative Adversarial Networks has been applied to many problems, such as drug discovery and text generation. When generating long text, however, the generator receives no training signal for the words it has already produced until the entire sequence has been generated and judged by the discriminator. LeakGAN addresses this with a Manager module, in the spirit of Hierarchical RL, that leaks the discriminator's latent feature representations to the generator so that it can use them when generating the next word.

Generative Adversarial Networks address the discrepancy between training and inference in text generation. Most Recurrent Neural Network (RNN)-based text generators are trained to mimic the probability distribution over the vocabulary given the previous ground-truth words. Those ground-truth words are not available during inference, where the model must condition on its own earlier predictions instead (a mismatch often called exposure bias). This motivates extending Generative Adversarial Networks from the continuous domain of images in Computer Vision to the discrete domain of text in Natural Language Processing.
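To make the training/inference mismatch concrete, here is a toy sketch. The lookup table `NEXT_WORD` is a hypothetical stand-in for an RNN's next-word prediction; it is not taken from the paper. Under teacher forcing every prediction sees the ground-truth previous word, while at inference each prediction is fed back as the next input.

```python
# Hypothetical toy "model": for each previous word, the single word it
# predicts next. It stands in for an RNN's next-word distribution.
NEXT_WORD = {
    "<s>": "the",
    "the": "cat",
    "cat": "sat",
    "sat": "down",
    "dog": "ran",
    "ran": "away",
}


def predict_next(prev_word):
    """Return the model's prediction, or an end token for unseen contexts."""
    return NEXT_WORD.get(prev_word, "</s>")


def teacher_forced_predictions(ground_truth):
    """Training-time behaviour: each prediction is conditioned on the
    ground-truth previous word, never on the model's own earlier output."""
    contexts = ["<s>"] + ground_truth[:-1]
    return [predict_next(word) for word in contexts]


def free_running(length, start="<s>"):
    """Inference-time behaviour: each prediction is fed back as the next
    input, so an early mistake affects every later step."""
    output, prev = [], start
    for _ in range(length):
        prev = predict_next(prev)
        output.append(prev)
    return output


reference = ["the", "dog", "ran", "away"]
print(teacher_forced_predictions(reference))  # ['the', 'cat', 'ran', 'away']
print(free_running(len(reference)))           # ['the', 'cat', 'sat', 'down']
```

Under teacher forcing, the wrong guess "cat" at step 2 is harmless because step 3 still sees "dog"; when running freely, the same mistake pulls the rest of the sentence off course.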

Modeling text generation as a Markov Decision Process makes it possible to use Reinforcement Learning to help Generative Adversarial Networks process text sequentially, and thus to carry adversarial training over from the continuous domain to the discrete domain, where gradients cannot be passed directly through the generator's word choices. A Markov Decision Process consists of States, Actions, transitions to a Next State, and Rewards, and the agent chooses its actions according to a Policy. Here, the State is the sequence of previously generated words, the Action is the next word to generate, and the Policy is the Generator, which maps the current state to a probability distribution over the discrete action space (the vocabulary). The Reward reflects how realistic the Discriminator judges the generated text to be.
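The sketch below casts a tiny generator as a policy and trains it with the REINFORCE policy-gradient estimator, chosen here for illustration. It is not the paper's implementation: the vocabulary, the tabular policy (whose state is only the last word rather than the full prefix), and `toy_discriminator` are hypothetical simplifications. The key point is that the reward arrives once, for the finished sequence, and is used to weight the log-probability gradient of every action taken.

```python
import math
import random

# Hypothetical toy action space: the vocabulary the Generator chooses from.
VOCAB = ["good", "bad", "<eos>"]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TabularPolicy:
    """Generator as a policy: maps a state to a distribution over VOCAB.
    Simplification: the state is only the last generated word, not the
    full prefix an RNN would encode."""

    def __init__(self):
        self.logits = {}  # state -> one logit per vocabulary word

    def probs(self, state):
        return softmax(self.logits.setdefault(state, [0.0] * len(VOCAB)))

    def sample(self, state, rng):
        return rng.choices(range(len(VOCAB)), weights=self.probs(state))[0]


def toy_discriminator(words):
    """Hypothetical stand-in for the Discriminator: scores a finished
    sequence by the fraction of non-<eos> words that are "good"."""
    content = [w for w in words if w != "<eos>"]
    return sum(w == "good" for w in content) / len(content) if content else 0.0


def reinforce_step(policy, rng, max_len=5, lr=0.5):
    """Generate one sequence, score it once at the end, and apply a
    REINFORCE update: logits += lr * reward * grad log pi(action | state)."""
    state, trajectory, words = "<s>", [], []
    for _ in range(max_len):
        action = policy.sample(state, rng)
        trajectory.append((state, action, policy.probs(state)))
        words.append(VOCAB[action])
        if VOCAB[action] == "<eos>":
            break
        state = VOCAB[action]  # next state after taking the action
    reward = toy_discriminator(words)  # a single reward for the whole sequence
    for state, action, probs in trajectory:
        logits = policy.logits[state]
        for a in range(len(VOCAB)):
            # Gradient of log softmax w.r.t. logit a: 1[a == action] - p(a).
            logits[a] += lr * reward * ((1.0 if a == action else 0.0) - probs[a])
    return reward


policy, rng = TabularPolicy(), random.Random(0)
for _ in range(2000):
    reinforce_step(policy, rng)
print(policy.probs("<s>"))  # probability mass shifts toward "good"
```

A real LeakGAN-style generator replaces the table with a neural network and the toy scorer with a trained discriminator, but the shape of the update stays the same.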

However, this approach falls short on long text, because the reward signal used for backpropagation is both sparse (it arrives only after the whole sequence has been generated) and uninformative about the contribution of individual words, their representations, or their semantics. Some solutions match the latent feature representations of the Generator and the Discriminator to obtain a more informative training signal, while others model the hierarchy of the text using Parts-of-Speech and related semantic structures to alleviate the sparsity of the signal. The latter draw on Hierarchical RL, which manually specifies a hierarchical structure for the agent: several low-level sub-tasks, each with its own micro-policy, plus a macro-policy that chooses which sub-task to solve. Yet the former still struggle with long text, and the latter rely on data sets manually annotated by domain experts, which is infeasible for large corpora.
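A short sketch of why a terminal-only reward is indiscriminate: if the Discriminator scores only the finished sentence, the return credited to each word is the same final score (undiscounted) or merely a function of position (discounted), regardless of which words helped. The function names and the example rewards are hypothetical.

```python
def returns_to_go(rewards, gamma=1.0):
    """Discounted return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def terminal_only_rewards(num_words, final_score):
    """Sparse setting: no reward for any word until the Discriminator
    scores the finished sequence at the last step."""
    return [0.0] * (num_words - 1) + [final_score]


# Every word in a 5-word sentence receives the same credit, whether it
# helped or hurt the final score.
print(returns_to_go(terminal_only_rewards(5, 0.8)))  # [0.8, 0.8, 0.8, 0.8, 0.8]

# Hypothetical per-word rewards, by contrast, can tell the words apart.
print(returns_to_go([0.2, 0.0, 0.5]))
```

Denser, word-level guidance is exactly what the leaked Discriminator features in LeakGAN aim to provide.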

LeakGAN deliberately departs from the usual adversarial game, in which the Generator receives only the Discriminator's scalar reward: a Manager LSTM encodes the latent feature representations of the Discriminator and passes them to a Worker module. The Worker module is another LSTM that encodes the previously generated words. The outputs of both are then combined to predict the next word. LeakGAN shows promising results in terms of BLEU score and even in a human Turing test.

The Bilingual Evaluation Understudy Score, or BLEU for short, is a metric for evaluating a generated sentence against a reference sentence. A perfect match results in a score of 1.0, whereas a complete mismatch results in a score of 0.0. The Turing test, developed by Alan Turing in 1950, is a test of a machine’s ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human.

To play with analogies: the generator is a forger, the discriminator is a police detective, and the manager is a spy who leaks the detective's information to the forger.
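The following is a simplified sketch of how the Manager and Worker outputs might be combined, based on our reading of the paper and not a definitive implementation. It assumes that the Worker emits one k-dimensional embedding per vocabulary word, that the Manager's recent goal vectors are summed and linearly projected into that space, and that the next-word distribution is a softmax over the resulting dot products. The LSTMs themselves are omitted, and all names and dimensions (`worker_embeddings`, `w_psi`, and so on) are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_word_distribution(worker_embeddings, recent_goals, w_psi, temperature=1.0):
    """Combine Worker and Manager outputs into a next-word distribution.

    worker_embeddings: |V| x k matrix, one k-dim row per vocabulary word
                       (a stand-in for the Worker LSTM's output).
    recent_goals:      list of d-dim goal vectors from the last few steps
                       (stand-ins for the Manager LSTM's outputs, which it
                       computes from the leaked Discriminator features).
    w_psi:             k x d linear projection from goal space to the
                       Worker's embedding space (hypothetical name).
    """
    summed_goal = [sum(parts) for parts in zip(*recent_goals)]  # d-dim
    goal_embedding = matvec(w_psi, summed_goal)                 # k-dim
    scores = matvec(worker_embeddings, goal_embedding)          # one per word
    return softmax([s / temperature for s in scores])


# Toy example: 3-word vocabulary, k = d = 2, goals pointing along dimension 0.
worker = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
goals = [[1.0, 0.0], [1.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(next_word_distribution(worker, goals, identity))
```

In this toy setup the goal points along the first embedding dimension, so the word whose Worker embedding is aligned with it receives the most probability: the leaked information steers the choice without the Worker having to see the Discriminator directly.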

To counter mode collapse, a common failure of adversarial training in which the generator learns a single pattern and keeps repeating it because it reliably fools the discriminator, the authors propose a technique named “Interleaved Training”. It alternates between adversarial training and supervised training every 15 epochs. This works well in practice, and the supervised phases implicitly stabilize the generator by anchoring it to the ground-truth training data.
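A minimal sketch of the interleaved schedule as described here. Which phase comes first, and whether both phases last the same number of epochs, are assumptions; the code that actually runs an epoch of each kind is left out.

```python
PHASES = ("adversarial", "supervised")


def training_phase(epoch, period=15, first="adversarial"):
    """Return which kind of training to run in a 0-indexed epoch,
    switching between the two phases every `period` epochs."""
    if first not in PHASES:
        raise ValueError(f"first must be one of {PHASES}")
    offset = PHASES.index(first)
    return PHASES[(offset + epoch // period) % 2]


schedule = [training_phase(e) for e in range(45)]
print(schedule[0], schedule[15], schedule[30])  # adversarial supervised adversarial
```

Keeping the schedule in a small pure function makes it easy to change the period or the starting phase without touching the training loop.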

To counter vanishing gradients, where the training signal becomes too small to meaningfully update the generator's weights, the authors use a trick from RankGAN called “Bootstrapped Rescaled Activation”: at each timestep, the rewards of the sequences in a mini-batch are rescaled according to their rank within that mini-batch. Because the rescaled values depend only on ranks, their expectation and variance stay constant across mini-batches, and the rescaling accelerates convergence by counteracting the vanishing gradient problem.
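A sketch of rank-based reward rescaling, following our reading of the paper: at timestep t, the reward of sequence i in a mini-batch of size B is replaced by σ(δ · (0.5 − rank(i)/B)), where σ is the sigmoid, rank 1 is the highest reward, and δ is a hyperparameter. Treat the exact formula as an assumption to check against the paper; the default δ below is arbitrary.

```python
import math


def rescale_rewards(batch_rewards, delta=12.0):
    """Rank-based rescaling of the rewards of B sequences at one timestep.

    Returns sigmoid(delta * (0.5 - rank / B)), where rank 1 is the highest
    reward. Ties are broken by position in the batch (sorted() is stable).
    The default delta is an arbitrary choice for illustration.
    """
    b = len(batch_rewards)
    order = sorted(range(b), key=lambda i: batch_rewards[i], reverse=True)
    ranks = [0] * b
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return [1.0 / (1.0 + math.exp(-delta * (0.5 - r / b))) for r in ranks]


def rescale_per_timestep(rewards, delta=12.0):
    """rewards[t][i] is the reward of sequence i at timestep t; each
    timestep is rescaled independently across the mini-batch."""
    return [rescale_rewards(step, delta) for step in rewards]


batch = [0.1, 0.9, 0.5, 0.3]
print(rescale_rewards(batch))
```

Because the output depends only on ranks, multiplying every reward by 1000 leaves it unchanged, and any batch of four distinct rewards maps to the same four values. That is what keeps the expectation and variance of the rescaled rewards constant across mini-batches.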


  • Long Text Generation via Adversarial Training with Leaked Information, Jiaxian Guo, Sidi Lu, Han Cai, Weinan Zhang, Yong Yu, Jun Wang, 2017

Machine Learning Engineer