Why It’s Not Difficult to Train a Neural Network with a Dynamic Structure Anymore!

TL;DR: The open source community has finally addressed the demand for dynamic structure in neural networks. The last three months have seen three major library releases that support dynamic structures:

  1. TensorFlow Fold (Google)
  2. DyNet (CMU)
  3. PyTorch (Facebook, Twitter, NVIDIA, SalesForce, ParisTech, CMU, Digital Reasoning, INRIA, ENS)

At Innoplexus, we compile information from structured, unstructured and semi-structured sources to help our customers make real-time decisions. To achieve this speed, we convert natural-language text from unstructured sources into properly structured representations. Because speed is a major bottleneck, our NLP systems have been built on recurrent models of language: tools for them are readily available, and their computation can be distributed over multiple machines.

Over time, we realized the limitations of recurrent approaches such as LSTMs and GRUs, which try to fit recursively structured natural language into a sequential framework. This causes syntactic information to be lost in information-processing tasks. Unfortunately, implementing recursive neural networks from scratch can turn into a nightmare, because it involves writing complex backpropagation code that has to be exactly right.

Most ML libraries, such as TensorFlow, Torch or Theano, build a static network, which prevents the structure of the network from changing as a function of the input. This is a significant limitation in Natural Language Processing/Understanding, where syntactic information is encoded in a parse tree that varies with the input text. Many applications, such as syntactic parsing, machine translation and sentiment analysis, need syntactic information as well as semantic information. Because no framework supported this, developers used to end up implementing training procedures in NumPy. That is a tedious and very error-prone task that has to be done with high precision.
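The following minimal pure-Python sketch shows the forward pass of a recursive neural network over a binary parse tree. It makes the contrast concrete. The 2-d embeddings (`EMBED`), the weights `W` and `B`, and the `compose`/`encode` helpers are hypothetical illustrations, not Innoplexus’s model. The number and order of `compose` calls follow the tree, so two sentences with different parses produce differently shaped computations. Gradients are deliberately left out, because that backprop code is exactly what dynamic-graph libraries take off your hands.

```python
import math

# Hypothetical 2-d word embeddings; a real model would learn these.
EMBED = {
    "the": [0.1, 0.0],
    "movie": [0.4, 0.2],
    "was": [0.0, 0.1],
    "not": [-0.5, 0.3],
    "good": [0.6, -0.1],
}

# Hypothetical composition weights: W is 2x4 (applied to [left; right]), B is 2-d.
W = [[0.5, -0.2, 0.3, 0.1],
     [0.1, 0.4, -0.3, 0.2]]
B = [0.0, 0.1]


def compose(left, right):
    """Combine two child vectors into a parent vector: tanh(W [left; right] + B)."""
    x = left + right  # list concatenation -> 4-d input
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W, B)]


def encode(tree, trace):
    """Recursively encode a binary parse tree given as nested 2-tuples of words.

    The sequence of compose() calls (the computation graph) is dictated by the
    shape of the tree, so it differs from one input sentence to the next.
    """
    if isinstance(tree, str):  # leaf: look up the word vector
        return EMBED[tree]
    left, right = tree
    h = compose(encode(left, trace), encode(right, trace))
    trace.append(tree)  # record each composed node in post-order
    return h


trees = [
    (("the", "movie"), ("was", "good")),
    (("the", "movie"), ("was", ("not", "good"))),
]
for tree in trees:
    trace = []
    vec = encode(tree, trace)
    print(f"{len(trace)} compositions, root vector {[round(v, 3) for v in vec]}")
```

The first parse needs three composition steps and the second needs four. In both cases the order is a post-order walk of that sentence’s own tree.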

We faced a similar problem while implementing an Entity Extractor at Innoplexus. It uses semantically united recursive neural nets, which have a tree-like structure. Since no framework supported dynamic structure, we ended up implementing it in TensorFlow. This put a heavy load on our computational graph and made training slow and memory-inefficient. Moreover, choosing how many examples to process before flushing the graph became a critical question for the training process. Just when we were about to rewrite the entire training procedure in NumPy to speed things up, we came across DyNet.

DyNet (formerly known as cnn) is a neural network library developed by Carnegie Mellon University and many others. It is written in C++ (with bindings in Python) and is designed to be efficient when run on either CPU or GPU, and to work well with networks that have dynamic structures that change for every training instance.

We ported our code to DyNet with only minor modifications to our TensorFlow code. DyNet isn’t as mature as TensorFlow in terms of available functions, so we ended up writing our own implementations of some functions that TensorFlow already provides. PyTorch is a more mature alternative, supported by a wider community. In PyTorch, the graph is created dynamically as operations execute, as illustrated below:

[Figure: PyTorch: Dynamic Graph Construction]
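PyTorch and DyNet follow a define-by-run approach. The toy scalar autodiff below illustrates that idea in pure Python. It is not PyTorch’s API, and the `Node` class and `forward` function are hypothetical. Each arithmetic operation records its inputs as it executes, so ordinary Python control flow decides the shape of the graph. Here that control flow is a loop whose length depends on the input. `backward` then walks whichever graph was actually built for that example.

```python
class Node:
    """Hypothetical scalar node that records how it was computed (define-by-run)."""

    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents          # nodes this value was computed from
        self.local_grads = local_grads  # d(self)/d(parent), one per parent
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Node(self.value * other.value, (self, other),
                    (other.value, self.value))

    def backward(self):
        """Reverse-mode autodiff over the graph recorded during this forward pass."""
        order, seen = [], set()

        def visit(node):  # depth-first topological sort of the recorded graph
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):  # visit outputs before their inputs
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += local * node.grad


def forward(xs, w):
    """Hypothetical model h <- h * w + x. The loop length, and therefore the
    shape of the recorded graph, depends on the length of this input."""
    h = Node(0.0)
    for x in xs:
        h = h * w + Node(x)
    return h


w = Node(2.0)
out = forward([1.0, 2.0, 3.0], w)  # records 3 multiply-add steps
out.backward()
print(out.value, w.grad)  # 11.0 6.0
```

Calling `forward` with a two-element list builds a shorter graph, and nothing has to be declared or compiled in advance. In PyTorch the same pattern is ordinary tensor code followed by a call to `.backward()` on the loss.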

Google recently launched TensorFlow Fold, which handles a wider array of Python objects than TensorFlow. It supports structured data such as nested lists, dictionaries and protocol buffers, which overcomes TensorFlow’s static-graph limitation. Its approach is entirely different from that of PyTorch and DyNet. It uses dynamic batching to parallelize the operations in the graphs of multiple input instances. Operations of the same type at the same depth are grouped across those graphs and executed together as a single batched operation. Look into it; it is kind of cool. In a nutshell, here is how it works:

[Figure: TensorFlow Fold: How it works]
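The sketch below is a simplified, pure-Python version of the dynamic-batching idea. It works in three steps:

- flatten several input trees into one node list;
- group the nodes into buckets by (depth, operation);
- run each bucket as one batched call, in order of increasing depth, so every node’s children are ready before it runs.

This is not Fold’s API. `flatten`, `dynamic_batch` and the toy batched operations are hypothetical stand-ins for vectorized kernels. The toy operations use word length as the "embedding" and addition as the "composition".

```python
from collections import defaultdict


def flatten(tree, nodes):
    """Append every node of `tree` to `nodes` in post-order.

    Each entry is (op, depth, payload): a leaf is ("embed", 0, word); an
    internal node is ("compose", depth, (left_index, right_index)).
    Returns (index, depth) of the subtree's root.
    """
    if isinstance(tree, str):
        nodes.append(("embed", 0, tree))
    else:
        (li, ld), (ri, rd) = flatten(tree[0], nodes), flatten(tree[1], nodes)
        nodes.append(("compose", 1 + max(ld, rd), (li, ri)))
    return len(nodes) - 1, nodes[-1][1]


def dynamic_batch(trees, embed_batch, compose_batch):
    """Evaluate many trees at once, with one batched call per (depth, op) bucket."""
    nodes, roots = [], []
    for tree in trees:
        roots.append(flatten(tree, nodes)[0])

    # Nodes with the same depth and op never depend on each other,
    # so each bucket can be executed as a single batched operation.
    buckets = defaultdict(list)
    for i, (op, depth, _) in enumerate(nodes):
        buckets[(depth, op)].append(i)

    results, calls = {}, 0
    for key in sorted(buckets):  # increasing depth: children are always ready
        idx = buckets[key]
        if key[1] == "embed":
            outs = embed_batch([nodes[i][2] for i in idx])
        else:
            pairs = [(results[left], results[right])
                     for left, right in (nodes[i][2] for i in idx)]
            outs = compose_batch(pairs)
        results.update(zip(idx, outs))
        calls += 1
    return [results[r] for r in roots], calls


def embed_batch(words):
    # Hypothetical "embedding": the word length as a scalar.
    return [float(len(w)) for w in words]


def compose_batch(pairs):
    # Hypothetical "composition": add the two child values.
    return [left + right for left, right in pairs]


trees = [
    (("the", "movie"), ("was", "good")),
    (("the", "movie"), ("was", ("not", "good"))),
]
print(dynamic_batch(trees, embed_batch, compose_batch))  # ([15.0, 18.0], 4)
```

For these two trees, 16 individual node evaluations collapse into 4 batched calls. In a real model, each call would be a single matrix operation over the whole bucket.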

In NLP, expressions come in varying lengths and structures, which makes dynamic computational graphs essential. Just consider how grammar is parsed. Parsing needs a stack, and therefore dynamic memory, and thus dynamic computation. Carlos E. Perez summarizes this significant development aptly in his post:
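A shift-reduce sketch shows why parsing calls for a stack. SHIFT pushes the next token, and REDUCE pops two items and pushes their composition. In a neural model, `compose` would be a learned layer and the transitions would come from a parser. Here the transitions are hand-written, and a string-bracketing function stands in for the layer so the resulting structure is visible. All names are hypothetical.

```python
def shift_reduce(tokens, transitions, compose):
    """Build a tree representation with a stack, driven by a transition sequence.

    SHIFT pushes the next token; REDUCE pops two items and pushes
    compose(left, right). The stack is the dynamic memory: how many compose
    calls happen, and on what, is only known at run time.
    """
    stack, buffer = [], list(reversed(tokens))  # pop() from the end yields tokens in order
    for t in transitions:
        if t == "SHIFT":
            if not buffer:
                raise ValueError("SHIFT with an empty buffer")
            stack.append(buffer.pop())
        elif t == "REDUCE":
            if len(stack) < 2:
                raise ValueError("REDUCE needs two items on the stack")
            right = stack.pop()
            left = stack.pop()
            stack.append(compose(left, right))
        else:
            raise ValueError(f"unknown transition: {t!r}")
    if len(stack) != 1 or buffer:
        raise ValueError("transitions did not produce a single tree")
    return stack[0]


def bracket(left, right):
    # Hypothetical stand-in for a learned composition layer.
    return f"({left} {right})"


tokens = ["the", "movie", "was", "good"]
S, R = "SHIFT", "REDUCE"
print(shift_reduce(tokens, [S, S, R, S, S, R, R], bracket))  # ((the movie) (was good))
print(shift_reduce(tokens, [S, S, S, S, R, R, R], bracket))  # (the (movie (was good)))
```

The same tokens with different transition sequences yield different trees, and therefore different sequences of `compose` calls. The computation is decided at run time by what sits on the stack.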

With this development, it would not be unreasonable to expect that Deep Learning architectures will traverse the same evolutionary path as traditional computation: from monolithic stand-alone programs to more modular programs. Introducing dynamic computational graphs is like introducing the concept of a procedure when all one previously had was “goto” statements. It is exactly the concept of procedure that lets us write our programs in a composable manner. One can of course argue that DL architectures have no need for a stack; however, one only needs to look at recent research on HyperNetworks and Stretcher networks. There are networks in research where context switching like a stack appears to be effective.

We are using these libraries to refactor our code, moving from recurrent to recursive systems with only minor modifications. This has given us a tremendous improvement in our existing model and has enabled us to solve problems that were previously out of reach. I hope this helps you make the same shift we did!

Happy Hacking :D