Ever since DeepMind's AlphaGo made its splash in the world of AI, one question has been begging to be answered: what is the general form of an architecture that fuses Deep Learning with more conventional AI search techniques?
I’m not usually a fan of research papers that explore Deep Learning for program induction. The goal seems overly ambitious to me, so I tend to be quite skeptical of any results. So when, last week, there was a lot of buzz in the press about Microsoft’s DeepCoder: Learning to Write Programs, I mostly ignored it as research receiving a lot of undeserved hype. Stephen Merity has a longer article analyzing the paper and the corresponding hype (see: “Stop saying DeepCoder steals code from StackOverflow”).
The research, however, is interesting in a way that is distinct from its effectiveness at writing programs. What is interesting is that the DeepCoder architecture appears to be very general, and the approach can be employed in many other problem contexts.
The architecture of DeepCoder consists of four components:
(1) A Domain Specific Language (DSL) and attributes. DeepCoder does not work with any kind of general-purpose programming language but rather with a more restricted DSL. In the research, the language examined was a subset of a query language. The attributes are an enumeration of the features of a specific program instance of the DSL.
(2) A DSL program generation capability. The function of this component is to generate programs based on the DSL and additional parameterization (e.g., input-output pairs). It is able to generate millions of programs with the DSL as its seed.
(3) A Deep Learning model. The model attempts to predict the attributes in (1) from the generated programs in (2). The trained DL model serves as a quick, approximate predictor of the viability of a generated program. In the case of DeepCoder, the model tries to predict the set of features (or operations) likely needed to produce the outputs from the inputs.
(4) Search. The aim of this component is to use the predictions of the trained model in (3) to guide a more conventional program search algorithm toward actual solutions: in this case, programs that satisfy the constraints of (2). In the research, the authors integrated three different search algorithms: depth-first search, a “sort and add” enumeration algorithm, and a program synthesis algorithm.
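To make the four components concrete, here is a minimal runnable sketch. The mini-DSL, its operation names, and the hand-written attribute “predictor” are all illustrative stand-ins (DeepCoder’s actual DSL and neural model are far richer); the point is only to show how attribute predictions can reorder a depth-first search over candidate programs.

```python
import itertools

# (1) A toy list-manipulation DSL; op names are illustrative, not DeepCoder's.
OPS = {
    "reverse": lambda xs: list(reversed(xs)),
    "sort":    lambda xs: sorted(xs),
    "double":  lambda xs: [2 * x for x in xs],
    "drop1":   lambda xs: xs[1:],
}

def run(program, xs):
    """Interpret a program (a sequence of op names) on an input list."""
    for op in program:
        xs = OPS[op](xs)
    return xs

def predict_attributes(examples):
    """Stand-in for the Deep Learning model in (3): score how likely each op
    is to appear, from crude input/output features.  In the paper this is a
    trained network; a hand-written heuristic keeps the sketch self-contained."""
    scores = dict.fromkeys(OPS, 0.1)
    for inp, out in examples:
        if out == sorted(out):
            scores["sort"] += 0.5
        if len(out) < len(inp):
            scores["drop1"] += 0.5
        if any(o not in inp for o in out):
            scores["double"] += 0.3
    return scores

def guided_search(examples, max_len=3):
    """(4) Depth-first enumeration of programs, trying high-scoring ops first."""
    scores = predict_attributes(examples)
    ordered = sorted(OPS, key=lambda op: -scores[op])
    for length in range(1, max_len + 1):
        for program in itertools.product(ordered, repeat=length):
            if all(run(program, i) == o for i, o in examples):
                return list(program)
    return None
```

With the single example `([3, 1, 2], [2, 4, 6])`, the predictor boosts `sort` and `double`, so the search finds the program `["sort", "double"]` among its very first length-2 candidates, far earlier than an unbiased enumeration would.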
In essence, the approach searches over generated programs to find ones that satisfy the initial constraints. It is a hybrid approach similar to AlphaGo’s in that it employs Deep Learning as a component for quick, approximate function evaluation.
Even though DeepCoder’s focus is the generation of programs, the framework can be applied to simpler, less expressive languages.
This framework, like AlphaGo’s, is a generalized bootstrapping framework for incrementally training more intelligent solutions. That’s because what is described is a learning approach that gets better with each iteration. AlphaGo was bootstrapped by training against a collection of previously recorded game play and then improved by training against itself. In the framework described here, the bootstrapping is performed using synthesized data. The trained Deep Learning model does not need to be highly accurate, because the stage that follows employs it only as the starting point for its own more comprehensive search. The self-improvement comes when the results are fed back to further train the Deep Learning model with even more data.
I would label these kinds of architectures that search for other architectures as meta-search architectures. To summarize:
(1) Define a language that you can use to generate data.
(2) Generate data by creating valid statements using varying combinations of the language.
(3) Train a Deep Learning model to learn the language.
(4) Use the trained model as a way to speed up search through more language examples.
(5) Iterate back into (2) to expand the training set with better language examples.
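The five steps above can be sketched as a single loop. Everything here is illustrative: the “language” is tiny arithmetic expressions, and the “model” is just a table of token weights standing in for a trained network, but the shape of the loop (generate, check, feed back) is the point.

```python
import random

# Step (1): define a tiny expression "language" (tokens are illustrative).
TOKENS = ["1", "2", "3", "+", "*"]

def generate(weights, rng, length=3):
    """Step (2): synthesize a candidate expression by sampling the language,
    biased by the current model's token weights."""
    nums = [t for t in TOKENS if t.isdigit()]
    ops = [t for t in TOKENS if not t.isdigit()]
    expr = [rng.choices(nums, [weights[n] for n in nums])[0]]
    for _ in range(length - 1):
        expr.append(rng.choices(ops, [weights[o] for o in ops])[0])
        expr.append(rng.choices(nums, [weights[n] for n in nums])[0])
    return "".join(expr)

def bootstrap(target, rounds=5, samples=200, seed=0):
    """Steps (2)-(5) iterated: the 'model' is a weight table (a stand-in for
    a trained network), updated from expressions that hit the target value."""
    rng = random.Random(seed)
    weights = dict.fromkeys(TOKENS, 1.0)   # step (3): the untrained "model"
    hits = []
    for _ in range(rounds):
        for _ in range(samples):
            expr = generate(weights, rng)
            if eval(expr) == target:       # step (4): check candidates
                hits.append(expr)
                for t in expr:             # step (5): feed solutions back
                    weights[t] += 0.5
    return hits
```

Calling `bootstrap(6)` accumulates expressions such as `"1+2+3"` or `"2*3*1"`, and each round the token weights shift toward the tokens that solutions actually use, so later rounds find solutions more often. In DeepCoder the feedback trains a neural network rather than a weight table, but the loop is the same.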
This hybrid approach is a fusion of intuition-based cognition and logic-based cognition. Conventional computation (i.e., logic-based) is used in stages (1), (2), and (4), with the purpose of training the intuition machine by either synthesizing new data or searching for new training data. You are going to see more and more of this interplay between intuition machines and rational machines. It is analogous to DeepMind’s research, which is inclined toward the equation AI = DL + RL, where RL is reinforcement learning.
DeepMind’s PathNet research is, incidentally, also a meta-search algorithm for new architectures. There, Reinforcement Learning and evolutionary algorithms are employed to search for better DL solutions. Rather than searching for combinations of a language (here, programs) to improve on a solution, PathNet searches for combinations of DL layers. DL layers are just a different kind of language, and therefore the same DeepCoder approach can apply to searching for DL architectures. This has in fact been done previously in research at Google and MIT (see: “Designing Neural Network Architectures using Reinforcement Learning” and “Neural Architecture Search with Reinforcement Learning”).
It is helpful to reflect on the standard approach to training Deep Learning systems as described in the following flowchart:
What we hope to achieve with a language-driven approach is a systematic way of synthesizing new data. Treating data generation as generating expressions or sentences by sampling from a synthetic language is conceptually appealing. Furthermore, we can bring to bear many of the computer science techniques that have been developed previously.
The approach of treating Deep Learning solutions as language comprehension problems is extremely compelling. This language approach was also used by experimental physicists in the paper “QCD-Aware Recursive Neural Networks for Jet Physics,” where experimental data was treated like a natural language with the intention of training a Deep Learning system to learn a synthetic language, something not too different from learning DSLs. One key takeaway from this research is that the synthetic language used had semantics derived from QCD theory.
We can contrast this approach with the Probabilistic Graphical Model (PGM) approach to ML. In the PGM approach, developers construct a probabilistic graph that defines the relationships between different variables, and Monte Carlo sampling is used to construct Bayesian-consistent distributions over those variables. In the language-driven approach, we similarly build up relationships between concepts, but we do so through a DSL. The DSL rules are much richer and more expressive than those of a graph model. The requirement, however, is that we can ‘sample’ the DSL so as to synthesize new data. We then use Deep Learning to learn from this synthesized data, and feed the system back into itself by employing more traditional search algorithms. I hope you see the appeal and potential power of this approach.
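As a small illustration of what “sampling the DSL” to synthesize labeled data can look like, the sketch below draws expressions from a miniature context-free grammar and labels each one with its evaluated value. The grammar and labeling are made up for illustration; in a real system the DSL and its semantics would come from the problem domain, as the QCD-derived grammar did in the physics paper mentioned above.

```python
import random

# A miniature context-free grammar serving as the "DSL" (rules illustrative).
GRAMMAR = {
    "EXPR": [["NUM"], ["EXPR", "OP", "NUM"]],
    "NUM":  [["1"], ["2"], ["3"]],
    "OP":   [["+"], ["-"]],
}

def sample(symbol, rng, depth=0):
    """Expand one grammar symbol into a string, recursively."""
    if symbol not in GRAMMAR:
        return symbol  # terminal token
    rules = GRAMMAR[symbol]
    # Bias toward the first (non-recursive) rule as depth grows,
    # so the recursion is guaranteed to terminate.
    rule = rules[0] if depth > 3 else rng.choice(rules)
    return "".join(sample(s, rng, depth + 1) for s in rule)

def synthesize_dataset(n, seed=0):
    """'Sample the DSL' to get labeled training pairs: expression -> value."""
    rng = random.Random(seed)
    return [(e, eval(e)) for e in (sample("EXPR", rng) for _ in range(n))]
```

Each pair, such as `("1+2-3", 0)`, is a ready-made training example, and the generator can emit as many as the Deep Learning stage needs, which is exactly the property the synthetic-data paper below exploits.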
A recently published research paper (March 2017) titled “Using Synthetic Data to Train Neural Networks is Model-Based Reasoning” examines the above idea in greater depth. The authors explore the technique in a “Captcha-breaking” architecture and summarize the uniqueness of the approach as follows:
approximate inference guided by neural proposals is the goal rather than training neural networks using synthetic data. A consequence of this is that there is no need to ever reuse training data, as “infinite” labeled training data can be generated at training time from the generative model.
This is indeed a remarkable approach to exploiting Deep Learning systems, one that demands greater focus.
Deep Learning systems are, in essence, systems that learn new languages. Just like human languages, they are built in multiple layers of abstraction. These machine-created languages are difficult to interpret or explain; however, that does not imply we should treat them differently from how we treat other languages.