We thank Yoav for his comments on our VAE generation paper (titled “Controllable Text Generation” in the preliminary version), and appreciate many of the points. Natural language understanding and generation are challenging, and far from fully addressed. Most papers, including ours, try to make steps forward and have room for improvement. That’s why we appreciate and encourage open discussions like Yoav’s that facilitate solid work in the field.
We’d like to clarify some points of our paper to avoid potentially misleading messages it may send:
- The paper was not meant to fully address the challenging NLG task, and we used the word “text” in the title, a relatively weak term in this context, to avoid over-claiming. We agree that a title like “Towards controlled generation of text” would have been better, and thank Yoav for raising concerns about the original title. Indeed, the paper placed more emphasis on the interpretability of generation (and, again, we did not claim a full solution to the interpretability issue). We attempted to learn disentangled hidden representations in the VAE, so that people can apply their intentions or structured knowledge to the semantically meaningful latent code and obtain desired outputs. As discussed at the end of the paper, interpretability provides an interface that connects the black-box neural model with conventional structured models/representations such as linguistic knowledge. It is precisely because we acknowledge the complexity of natural language (and other modalities) that we believe structured knowledge remains helpful for generation and other tasks, even with large data and powerful neural models. This idea is consistent with our previous work on enhancing neural networks with logic rules (https://arxiv.org/pdf/1603.06318.pdf). We encourage advances on both the DL and conventional NLP sides, as well as their combination.
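To make the disentanglement idea above concrete, here is a minimal sketch of the latent-code structure: an unstructured part z capturing residual variation, concatenated with a structured, interpretable part c that a user can set directly. The dimensions, the single binary “sentiment” attribute, and the function names are our illustrative assumptions, not the paper’s actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(dim_z=8):
    # unstructured part of the code: captures residual variation
    return rng.standard_normal(dim_z)

def make_code(z, sentiment):
    # structured, interpretable part c (here a hypothetical binary
    # sentiment attribute) concatenated with the unstructured z;
    # the decoder would consume this combined code
    c = np.array([float(sentiment)])
    return np.concatenate([z, c])

# the same z under different attribute settings yields decoder inputs
# that differ only in the controlled aspect
z = sample_latent()
code_pos = make_code(z, sentiment=1)
code_neg = make_code(z, sentiment=0)
```

Flipping only c while holding z fixed is what lets a user express an intention (e.g., a desired sentiment) without disturbing the rest of the representation.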
- Another contribution of the paper is on the modeling side. Briefly, we enhanced the vanilla VAE with holistic discriminator-based metrics that are learned jointly. Alternatively, the model can be seen as a combination of the VAE with the classic wake-sleep algorithm. We found this perspective particularly interesting, and our follow-up analysis along this line led to our recent work attempting to unify deep generative models (https://arxiv.org/pdf/1706.00550.pdf). Overall, our emphasis on the methodological foundation and algorithmic details makes the paper a better fit for ML venues.
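As a rough illustration of the objective described above, the generator loss can be sketched as the vanilla VAE terms (reconstruction plus KL) augmented with a jointly learned discriminator term that asks the generated sample to carry its intended attribute code. This is a toy sketch only: the squared-error reconstruction stand-in, the single binary attribute, the weight `lam`, and all function names are our assumptions, not the paper’s actual losses.

```python
import numpy as np

def kl_gaussian(mu, logvar):
    # KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=-1)

def recon_loss(x, x_hat):
    # squared-error reconstruction (a stand-in for token log-likelihood)
    return np.sum((x - x_hat) ** 2, axis=-1)

def disc_loss(c_target, c_logit):
    # binary cross-entropy: the discriminator should recover the
    # attribute code c from the generated sample (hypothetical
    # single-attribute case)
    p = 1.0 / (1.0 + np.exp(-c_logit))
    return -(c_target * np.log(p) + (1.0 - c_target) * np.log(1.0 - p))

def generator_loss(x, x_hat, mu, logvar, c_target, c_logit, lam=1.0):
    # vanilla VAE ELBO terms plus the discriminator-based signal,
    # combined with an assumed weighting coefficient lam
    return (recon_loss(x, x_hat)
            + kl_gaussian(mu, logvar)
            + lam * disc_loss(c_target, c_logit))
```

The discriminator itself is trained in alternation to predict c, which is where the wake-sleep reading comes in: generated samples serve as training data for the discriminator, and the discriminator in turn shapes the generator.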