Neural Compilation Using End-to-End Models and NMT

David Morley
Intel Student Ambassadors
Dec 26, 2019 · 6 min read

After recently completing an operating systems class this quarter, I couldn’t help but think about the different approaches that could be used to tackle compilation, and I came up with two distinct approaches that seemed worth exploring further. This article will first set up the motivation for neurally driven compilation models, and then explore potential implementations along with some research that has already been applied to this area.

Motivation

Figure: C/C++ compilation model

Current compilers are often layered one on top of the other, with higher-level languages such as Python being translated down through some combination of intermediate languages until eventually we reach something like C, and finally x86-64 machine code. Such a model opens up many possibilities for the propagation of errors and inefficiencies. When compilers nest so many levels of translation, it is doubtful that every level translates effectively, and it is likely that most compilers do a far worse job than what could theoretically be achieved. This issue has parallels to older pipelined approaches in machine translation, where separate networks would be trained to identify individual characteristics (perhaps sentiment, sentence structure, or part-of-speech tags) and those results would then be combined to give the desired output. In situations such as these, end-to-end models can often gain a couple of percentage points of performance, because they get rid of the optimizations made at each individual level and train the model to care only about the end result that actually matters. Thus, one logical avenue to explore for neural compilation is the research that has been done on end-to-end machine translation models. And since compilers have to convert between a large array of different code forms, it also makes sense that a machine translation model could be insightful.
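To make that layering concrete, here is a small sketch (assuming gcc is installed; the file names and toy program are arbitrary) that dumps two of the intermediate stages: the preprocessed C and the generated assembly (x86-64 on a typical desktop). Each stage is a separate translation step that an end-to-end model would instead fold into a single learned mapping.

```python
# A minimal illustration of the layered pipeline described above, assuming
# gcc is available on the system. -E stops after preprocessing and -S stops
# after code generation, so we can inspect two of the separate translation
# stages that an end-to-end model would collapse into one learned mapping.
import os
import subprocess
import tempfile

c_source = "int main(void) { return 2 + 3; }\n"

with tempfile.TemporaryDirectory() as workdir:
    src_path = os.path.join(workdir, "toy.c")
    asm_path = os.path.join(workdir, "toy.s")
    with open(src_path, "w") as f:
        f.write(c_source)

    # Stage 1: preprocessing only (macro expansion, #include resolution).
    preprocessed = subprocess.run(
        ["gcc", "-E", src_path], capture_output=True, text=True, check=True
    ).stdout

    # Stage 2: lower the preprocessed C to assembly; do not assemble or link.
    subprocess.run(["gcc", "-S", src_path, "-o", asm_path], check=True)
    with open(asm_path) as f:
        assembly = f.read()

print(preprocessed.splitlines()[-1])  # last line of the preprocessed source
print(assembly)                       # the generated assembly for main()
```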

Google’s Multilingual NMT System

Figure: visual representation of Google NMT

The first case that comes to mind when attempting to apply machine translation techniques to compilation is the work on Google’s multilingual neural machine translation system, which forms the current backbone of many of Google’s translation services [1]. In essence, the model revolves around one universal system with the ability to translate from any one language to any other, in any combination. A useful heuristic for understanding it is to imagine some intermediate golden language that every language is mapped into, so that any combination of translations can be achieved. This idea seems very similar to those used in compilation, but instead of chaining one translation into the next, much like the bootstrapping process in a master boot record, we can have each intermediate language be a different node on our language graph, with any combination of languages requiring only one encoder and one decoder.
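As a rough illustration of that shared-model idea (a toy PyTorch sketch, not Google’s actual architecture), the snippet below prepends a target-language tag such as <2x86> to the source tokens so a single encoder/decoder pair can serve every source/target combination; the vocabulary, tags, and sizes are made-up placeholders.

```python
# A toy sketch of the "one shared model plus a target-language tag" idea.
# Nothing here is the real Google NMT system; the tags, vocabulary, and
# dimensions are illustrative placeholders.
import torch
import torch.nn as nn

class SharedSeq2Seq(nn.Module):
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # src_ids begins with a tag like <2x86> or <2llvm>, so the same
        # parameters handle every source/target language combination.
        _, state = self.encoder(self.embed(src_ids))
        decoded, _ = self.decoder(self.embed(tgt_ids), state)
        return self.out(decoded)  # logits over target-language tokens

# Toy usage: ask for x86-64 output by prepending the <2x86> tag to a
# (heavily simplified) tokenised C snippet.
vocab = {"<2x86>": 0, "<2llvm>": 1, "int": 2, "main": 3, "ret": 4, "mov": 5}
model = SharedSeq2Seq(vocab_size=len(vocab))
src = torch.tensor([[vocab["<2x86>"], vocab["int"], vocab["main"]]])
tgt = torch.tensor([[vocab["mov"], vocab["ret"]]])
logits = model(src, tgt)  # shape: (1, 2, len(vocab))
```

Because the target is selected with a tag rather than with a separate per-pair model, the same parameters can in principle be asked for a pairing they never saw directly during training, which is the zero-shot behavior discussed next.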

Google has also found that one of the main benefits of the multilingual system is the ability to translate between unusual language combinations. While it is perhaps uncommon to translate from Japanese to German, a model that has learned both of these languages through their pairings with English should still have a reasonable chance at performing the translation. Similar parallels can be drawn between programming languages: it is likely uncommon for someone to transcribe their Ruby code directly into x86-64, but with such a translation model this shouldn’t pose much of an issue. One can see how these same translation techniques could be extended not only to compilation, but also to the more general task of programming language translation.

End-to-End Model Approaches

Figure: snapshot from an end-to-end neural coreference resolution model

Although end-to-end models may seem to oversimplify our network (abstracting away our control over the representations in the inner layers), they have found success in several areas of natural language processing and machine translation. In particular, end-to-end models have done very well on coreference resolution, an admittedly different problem from compilation, but one to which parallels can certainly be drawn [2]. Not only could such a model be applied to some aspects of translation, it is also conceivable that a well-trained model could figure out the necessary libraries to link while building the executables, helping to alleviate the headaches that can come from handling these issues yourself. Instead of the NMT network proposed in the prior paragraphs, a model could be trained specifically to translate from a given language all the way down to machine code, discarding the optimizations that, while helpful at each individual translation level, may not be beneficial by the time we reach the final output.
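As a rough sketch of what that end-to-end supervision could look like (the model is assumed to be something like the shared seq2seq sketch earlier, and the token pairs are hypothetical placeholders), the loss below is applied only to the final machine-code tokens, never to any intermediate representation:

```python
# A rough sketch of end-to-end training: the only supervision is the pairing
# of high-level source tokens with final machine-code tokens. The model and
# dataset are assumed placeholders, not an existing system.
import torch
import torch.nn as nn

def train_end_to_end(model, pairs, epochs=3, lr=1e-3):
    """pairs: list of (source_token_ids, machine_code_token_ids) 1-D LongTensors."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for src, tgt in pairs:
            # Teacher forcing: feed the gold machine code shifted by one step,
            # and score the model's prediction of each next token.
            logits = model(src.unsqueeze(0), tgt[:-1].unsqueeze(0))
            loss = loss_fn(logits.squeeze(0), tgt[1:])
            opt.zero_grad()
            loss.backward()
            opt.step()
```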

The Problem of Accuracy

One critical difference between compilers and machine translation is the need for absolute accuracy. While it may be acceptable for some connotation or nuance to be lost in speech translation, programs are expected to behave exactly according to their specifications, and a model that is 99.9% accurate likely wouldn’t be acceptable to most programmers. Thus the promise of neural compilation likely isn’t accuracy; it is closer in spirit to aggressive optimization flags (like -O3 in gcc) that accept a greater risk of unintended behavior in exchange for increased performance, with neural compilation being a more extreme point on that spectrum. Neural compilation could also provide close “guesses” at a program’s contents translated into another programming language, giving the programmer a guide, though not yet a solution, for the code in that language.

Other Differences Between NLP and Neural Compilation

Although compilation certainly poses additional obstacles for our NLP models, it also offers some additional tools. Sentiment and word meaning can be very hard to optimize for, since it is difficult to accumulate large training sets and we must rely on the collective opinion of experts; the behavior of programs is much more concrete. Existing compilers already produce programs that, as far as we know, behave correctly, even if they don’t perform their tasks optimally, which gives us a much easier baseline to optimize against. A possible implementation of the neural compilation model could go as follows (a rough sketch of these steps appears after the list):

  1. After training the model on various datasets, compile a piece of code with the NMT model
  2. Compare its behavior (output, etc.) with that of existing compilers at various optimization levels
  3. Optimize for accuracy first, creating some unique “tweak” parameters for this compilation, until the behavior is correct (to whatever degree the programmer specifies)
  4. Optimize for speed and code simplicity on top of the “tweak” parameters created for this compilation
  5. Save the “tweak” parameters and later decide whether they should be fed back into the main model
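A hypothetical sketch of steps 1 through 3 follows; gcc is used only as the behavioral reference, and the three callables (the neural compiler, the binary runner, and the tweak-tuning routine) are assumed placeholders rather than existing APIs.

```python
# A hypothetical sketch of the accuracy-first loop in steps 1-3. gcc provides
# the behavioural baseline; neural_compile, run_binary, and tune are assumed
# placeholder callables, not real APIs.
import os
import subprocess
import tempfile

def reference_outputs(c_source, test_inputs, opt_flag="-O2"):
    """Step 2 baseline: compile with gcc and record stdout (bytes) per test input."""
    with tempfile.TemporaryDirectory() as d:
        src_path = os.path.join(d, "prog.c")
        exe_path = os.path.join(d, "prog")
        with open(src_path, "w") as f:
            f.write(c_source)
        subprocess.run(["gcc", opt_flag, src_path, "-o", exe_path], check=True)
        return [subprocess.run([exe_path], input=x, capture_output=True).stdout
                for x in test_inputs]

def neural_compile_until_correct(neural_compile, run_binary, tune,
                                 c_source, test_inputs, max_rounds=10):
    """Steps 1-3: recompile with per-program 'tweak' parameters until the
    observed behaviour matches the reference compiler's."""
    expected = reference_outputs(c_source, test_inputs)
    tweaks = {}
    for _ in range(max_rounds):
        binary = neural_compile(c_source, tweaks)               # step 1
        actual = [run_binary(binary, x) for x in test_inputs]   # step 2
        if actual == expected:
            return binary, tweaks  # steps 4-5 (speed tuning, saving) come next
        tweaks = tune(tweaks, expected, actual)                 # step 3
    raise RuntimeError("behaviour never matched the reference compiler")
```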

Conclusion

Both end-to-end models and neural machine translation provide interesting insights into developing the compilers of the future. I hope you enjoyed the article; let me know if you have any ideas for how these techniques, or any others, could be applied to create better compilers!

References

[1] Aharoni, Roee, et al. “Massively Multilingual Neural Machine Translation.” Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), 2019, doi:10.18653/v1/N19-1388.

[2] Lee, Kenton, et al. “End-to-End Neural Coreference Resolution.” Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017, doi:10.18653/v1/D17-1018.
