Software writes Software — Program Synthesis 101

After reading this post you’ll know about — what’s Program Synthesis, how it works, what are the implications for a $700bn+ software development industry and what are the most interesting startups in this field. The 2nd post will be about my experience as an employee #2 at Synthetic Minds, a company that uses Program Synthesis to secure smart contracts.

For a long time, the Holy Grail of Computer Science (CS) has been to find a way to make programs write other programs. Academic achievements aside, the business opportunity for such solution is enormous. For instance, the amount of money that is spent yearly on software developers in the US alone is more than $700billion*. Studies have shown that 28% to 50% of the developers’ time can be saved if developers are augmented with the proper tools. Though developers cost much less in other countries, they are still relatively expensive for businesses. More than that, bad code is even more expensive as it might lead to breaches and both financial and reputational damages. Within the last 10 years, a lot has been done to automate software development and it significantly influenced almost every stage of the process — from quality control to integration and deployment. The actual code creation, the most labor intensive and creative part, hasn’t been impacted as much.

The Future of Work in all fields has become one of the hottest topics in the last two years. Would the core software development also be affected by automation or is it something that only humans could do?

Program Synthesis 101

Ideally, some programs could develop other programs without any human involvement. In its entirety, this is sci-fi now, but in part, it is already possible. James Bornholt describes the promise of Program Synthesis:

Wouldn’t it be more efficient for us to tell the computer what we want the program to do, and leave the details of how to the computer to figure out? It’s the ultimate abstraction: a programmer who only tells the computer what they want, rather than how to do it, is completely absolved from any implementation details.

Businesses might benefit from Program Synthesis in two ways. First, the software development cost could be reduced by empowering developers to write code more efficiently. Developers will be able to focus on the most creative parts of the process, while the routine work will be done by a machine. Second, business people will also become more productive as they will be able to cut their routine by developing scripts even though they might not have coding skills.

Among many CS professionals, Program Synthesis is a very hot computer science field, that fortunately (or unfortunately) hasn’t yet become a common buzzword that you see in every other pitch deck. There are only 10–15 startups in this field. One plausible reason is that it is one of the most complicated fields of machine intelligence and it is very hard to claim that products use Program Synthesis unless they really do.

Within modern techniques there are two ways to automatically generate programs:

  1. Symbolic Systems — use of abstract logical reasoning to create programs that satisfy a given specification. This is what most people mean when they say “Program Synthesis”.
  2. Neural Systems — use of neural-nets and massive amounts of data to fit the model to a given specification. You would hear “Neuro Program Synthesis” when referred to it.

What is Program Synthesis and how it works?

No matter what algorithm people use, the goal of Program Synthesis is to transform high-level specifications of what the program should do into a low-level program that implements it.

Below is a plain English description of what’s called Counterexample-Guided Inductive Synthesis (CEGIS). Though it sounds way too geeky, on a high level it is quite simple. For more technical elaboration on this and two other methods, you can read this amazing article and the first two chapters of this book. Credit to the authors for providing such a good description.

High-level representation of CEGIS

Within CEGIS, a synthesizer works in automatic iterative loops that consist of 4 steps: Specification, Synthesis, Verification, Feedback. In every run of the loop, the algorithm creates and then checks whether a candidate software program satisfies desired specifications. If it does — it’s a success; if it doesn’t — the algorithm learns the reason.

The simplest version of the process relies on Oracle-Guided Synthesis — a black box that can take an input and provide the corresponding output. We do not need to know what’s exactly in the box (in fact, the box could be a human providing the desired input-output mappings), but ultimately she only wants to reconstruct a piece of software that approximates its functionality. Black box (a.k.a. Oracle), in this case, is the source of ultimate truth and is used to generate information that guides the synthesizer.

  1. Specification = User defines what she wants the program to do. It is actually one of the hardest parts as it is not trivial to clearly and unequivocally define what the software should precisely do. A vivid example is a famous game called FlappyBird in which the player has to get the bird through a set of obstacles by making it flap every time the user clicks. If the user defines the goal as in the previous sentence then at best a synthesized program will be a boring game that has the same height of obstacles and is run at the same speed. Thus the user has to define all major specifications. E.g. the game should have varying obstacles, increasing pace, etc. This is somewhat simple for FlappyBird, but imagine writing specifications for Facebook? Given that user intent is undefined, writing such specifications might take more time than to create Facebook itself.
  2. Synthesis = CEGIS takes basic building blocks and creates a program that fits the desired specifications or input from the Oracle. These building blocks are libraries provided by the user and might contain anything from simple arithmetics operations (+, –, /, *) to complex sub-programs that execute specific functionality (write to the database, search in Google, etc.). The Synthesizer then arranges these blocks into a program that matches all information generated until that point. The key to scalable synthesis is coming up with programs 1) without explicitly enumerating all possibilities 2) and that are learning from previous attempts. This is where all the IP of companies working this space lies.
  3. Verification = CEGIS verifies that the candidate is right not only for finite information gotten from the Oracle/specification but is generalizable to any input. For that purpose, an SMT solver is used. To put it simply, an SMT solver deconstructs a program into a number of statements, a.k.a theorems, and proves them in a generalizable form. Within this generalized theorem proving setting, it is possible for the Verifier to do the following: test whether the generated program matches the information received so far, and whether it is possible to create a new input on which the program’s output disagrees with the Oracle’s version of the truth.
  4. Feedback = CEGIS uses this new input (on which the current synthesis output and Oracle disagree) and runs step 1 through 3 once again. It repeats this process as many times as required, and if the verification step cannot produce any disagreement, then the outcome is the desired program.

Of course, this is a very simplified representation of one possible way to do Program Synthesis. There are various other approaches, but this is enough to grasp how Synthesis is plausible.

FlashFill is used to fill all cells by an example from the first cell.

It is surprising but filling Excel cells by an example is a simple illustration of Program Synthesis. This feature is called FlashFill — based on only one example the user is proposed to fill all other cells with data that has the same format/style/logic. Under the hood Excel does all that by constructing millions of small programs, 10–20 lines of code each, and finding the best one to do the job. Though the result seems to be simple for a human, from a Computer Science point of view this is big progress. The user does not need to show Excel how to do this filling — she just has to show an example of what she wants to get. Sumit Gulwani who developed FlashFill at Microsoft describes:

It’s the difference between teaching someone how to make a pizza step by step and simply showing them a picture of a pizza and minutes later eating a hot pie.

How is it different from generally available AI, i.e. DNNs?

For people who are acquainted with AI/ML, an explanation described above might look very familiar. Indeed, on a high-level, this is similar to Generative Adversarial Networks (GANs).

A simple representation of GANs

GANs are a subset of deep neural-networks (DNNs), and are being used to produce content — primarily video or images that are already of decent quality and hard to spot as fake (e.g. Deepfake). GANs also have two subsystems — a generator, whose purpose is to create content to fool a discriminator, whose purpose is of course not to be fooled and to dismiss samples that look fake.

Both Program Synthesis and DNNs might serve the same purpose — generate something, be that audio, designs for new drugs or another program. And though on a high-level they might appear similar, there are a number of important differences.

  1. DNN, in essence, is a probabilistic model, Program Synthesis is not. Irrespective of the quality of a DNN model and irrespective of how good the training data is, the outcome is always a probabilistic prediction. Every time someone shops on Amazon she gets “recommended products”. Sometimes she likes those and sometimes she doesn’t. Ultimately, Amazon doesn’t care that much because if the user buys even 1 out 100 times that’s better than nothing, though Amazon always tries to be better at its predictions. Program Synthesis, in turn, should have a binary outcome — the program either satisfies the specification or it doesn’t; there should be no “good enough” case in between.
  2. DNN requires massive amounts of data, Program Synthesis doesn’t. The core of DNN philosophy is that “Data is the King”. And in general, the more good data the developer has the better predictions she could get. Program Synthesis works fundamentally differently. It operates with high-level representations and therefore requires just a few examples of the desired behavior. This certainly has advantages since the developer does not need huge data sets. But as was mentioned above — designing correct specifications might be very labor intensive.
  3. DNNs tend to be quite general, Program Synthesis is very specific in scope. As it is with FlashFill, every piece of code generated by Program Synthesis is small and is aimed at a specific small task — e.g. capitalize the first letter, combine first name and last name, etc. DNNs, in turn, aim to be generalizable models and tend to ignore domain specifics. This means that DNNs might be much less computationally intensive, but it also means that DNNs might miss important details. The latter would not happen with Program Synthesis if the specifications are properly designed.

Eventually, there is no right or wrong approach since both approaches have their own scope and pluses/minuses. I believe that to get to futuristic “software writes software” we need a combination of Neural (e.g. DNNs) and Symbolic Systems. In this case, the Neuro part will be responsible for “intuition” to define the user intent, while the Symbolic part will be responsible for “details”, i.e., implementing the intuition in a detailed program.

Startups

Though the promise of Program Synthesis is very appealing, there are very few startups applying it to real-world problems. The field itself is quite research intensive and therefore requires massive human and capital resources. Microsoft is one of the titans that is considered a thought leader in the space. Nevertheless, within 2016–2018 few companies emerged and they use Program Synthesis to solve various problems.

  • Synthetic Minds — 2-year old, San Francisco-based company that uses Program Synthesis to build a platform for automated smart contract security. In Oct-2018 they raised a $5.5m Seed round from Khosla Ventures and Pantera Capital.
  • Darwinai.ca — 2-year old, Waterloo, Canada-based company that uses AI to optimize design and creation of deep learning networks. In Sep-18 they raised a $3m Seed round led by Obvious Ventures and Inovia Capital.
  • Gamalon — 5-year old, Cambridge, MA-based company that aims to streamline the identification of ideas from large amounts of unstructured data by writing and rewriting own programs. In May-18 raised a $20m Series A led by Intel Capital.
  • Prodo.ai — 2-year-old, London, UK-based company that automates software development and helps developers write high-quality code. In April-18 they raised an undisclosed Seed round from S28 Capital.
  • Trifacta — 6–year old, San Francisco-based company that created a platform to radically improve the efficiency of wrangling data by making a series of small transformations similar to FlashFill. In Jan-18 raised $48m Series D round with Accel and Alphabet as investors among others. DataXFormer is/was an open-source tool with the same goal.
  • Diffblue — 2-year old, Oxford, UK-based developer of an automated testing software designed to improve code development by using automated reasoning and learning. In Jun–17 raised $22m Series A from Goldman Sachs.
  • DeepCode.ai — 2-year old, Zurich, Switzerland-based developer of a programming software designed to help developers construct, change or modify new coding programs. In May-18 raised $1m Seed from btov Partners and angels.
  • UIzard — 1-year old, Copenhagen, Denmark-based company that developed a platform that instantly transforms any wireframe or mockup into HTML, Android, or iOS apps. In May-18 raised Pre-seed $800k from LDV Ventures and others.
  • Micropsi-Industries–4-year old, Berlin, Germany-based company that develops a platform that enables robots to learn skills that would be hard or impossible to hand-engineer. In Oct-18 raised $6m Series A led by Project A Ventures.

Another interesting application, though not a startup, is called DeepCoderthat can build programs on the basis of snippets of code it finds on the Internet. Additionally, there is a list of open-source tools that in one way or another automate software development.

Challenges

A natural question is — if Program Synthesis is such a wonderful technology, why don’t we see a proliferation of applications built on top? Though the promise of fully automated software generation is very appealing, there are two main challenges that currently limit the applicability of Program Synthesis.

  1. Increasingly complex specification requirements. The more sophisticated software the user wants to create, the more labor intensive will be the Specification part. In some instances, it may even be impossible to write down the specifications, as the developer does not know the exact user intent. Thus, the usage of Program Synthesis is usually justifiable when existing applications are tied to narrow-focused fields with a well-defined set of objectives. FlashFill and AutoFill in Excel are good examples of such applications.
  2. Increasingly complex computations. The larger the space the developer wants to explore with Program Synthesis the more computation is required. In the general case, Program Synthesis is undecidable. Programmers use different tricks to narrow the search space and to reduce the amount of time/computation it takes to solve the problem. This eventually also narrows the applicability of Program Synthesis. The flip side of the narrow focus is that for certain problems, e.g. smart contract security, Program Synthesis provides qualities that are much sought after–full space exploration and therefore confidence that nothing was missed.

The 2nd post will describe in greater detail how Program Synthesis is applied to smart contract security. Stay tuned.


P.S. This post will be updated from time to time to include emerging startups in the field.

* Back of the envelope calculation — there is a total of >4m developers in the US at $100k median salary or $180k fully loaded cost that includes the overhead.