Generative AI for Experimental Design

Understanding ExPT: Synthetic Pretraining for Few-Shot
Experimental Design
[1] (NeurIPS, 2023)

Sudhanshu Agrawal
11 min read · Nov 8, 2023

A Thought Experiment

Suppose you were given some play-doh and a simple objective: design an ‘ant’ with legs and a body so that it can run very quickly.

[AI-generated image (Bing)]

You would probably decide to make its legs long enough to touch the ground, make them thick enough to support the weight of the body, make sure they are all the same length, make sure the body is not too heavy, and so on. There would be some trial and error, but eventually you’d get a reasonable looking ant.

Now imagine that every time you make a new version of a play-doh ant, it comes to life, runs from one end of the room to the other, and records its speed. You can then keep making new versions of the ants, let them run around, record their speeds, make improvements, let them run again, and so on until you get the perfect ant! This is an example of online experimental design, a type of black-box optimization.

Now suppose instead that someone before you made a set of play-doh ants, say a million of them, let them run around, and recorded the speed of each design. You could then apply a variety of techniques [2][3] to that data to design the perfect ant. A technique could be as simple as picking the best-performing design in the data, averaging the top 10 designs, or performing gradient ascent on the designs. This is an example of offline experimental design.
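
The two simplest offline strategies mentioned above can be sketched in a few lines. This is a minimal illustration on a toy dataset with a made-up hidden objective; the function names are mine, not from [1].

```python
import numpy as np

def best_design(X, y):
    """Return the single best-scoring design in the offline dataset."""
    return X[np.argmax(y)]

def top_k_mean_design(X, y, k=10):
    """Average the top-k scoring designs: a crude 'consensus' design."""
    top = np.argsort(y)[-k:]          # indices of the k highest scores
    return X[top].mean(axis=0)

# Toy offline dataset: 100 random 4-dimensional "ant designs" with scores.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = -np.sum((X - 1.0) ** 2, axis=1)   # hidden objective: peaks at the all-ones design

x_best = best_design(X, y)
x_avg = top_k_mean_design(X, y, k=10)
```

Note that neither strategy can ever do better than (roughly) the best design already present in the data, which is exactly the limitation ExPT aims to overcome.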

[AI-generated image (Bing)]

But let’s be honest: most people aren’t going to sit around making a million play-doh ants and recording each of their speeds. More realistically, someone collected data for about 100 play-doh ants. So we only have a few data points.
The real world looks much the same. Imagine you were optimizing the design of a nuclear reactor: most likely you have only around 100 designs with their corresponding performances. Similarly, if you’re optimizing the chemical reactivity of zinc-based compounds, you may know of millions of chemical compounds, but have reactivity measurements for far fewer.

This is the setting that ExPT operates in. ExPT requires only a few labeled designs in order to generate new, high-quality designs that outperform those in the given data. This is an example of few-shot offline experimental design.

The Setting

Let’s assume we’re operating in the domain of robotics (motivating the ant problem!). We are given some data:

  • An unlabeled dataset, just containing various designs of ants. These designs could be completely random or inefficient.
  • A small labeled dataset with designs of ants and their corresponding speeds. These would have been collected through experimentation and aren’t necessarily high performing.
Formally, D-unlabeled = {x_i} and D-few-shot = {(x_i, y_i)}, where x_i is the design of the ant and y_i is its speed.

Naturally, we expect the unlabeled dataset to be much larger than the few-shot labeled dataset.

Problem Statement

  • Suppose X represents the set of all possible designs (designs could represent the morphology of ant-like robots, the chemical formula of a compound, genetic code, etc.)
  • Then we can define a function f such that for a given design x, f(x) is some objective (like the speed of a robot, the reactivity of a compound, etc.). f is referred to as the black-box function: a function we cannot evaluate or inspect directly.
  • Our objective is to find the design x* = argmax over x in X of f(x)
  • i.e., the design of the ant that has the highest speed!

ExPT — The Intuition

So our objective is to find a design x in some domain X which, when input to an unknown function f, maximizes the value of f.

We also have a large number of unlabeled designs, D-unlabeled, and a few labeled designs, D-few-shot.

Ideally, we’d know what f is, or at least be able to try out different inputs on f. Then we could simply train a neural network on a large labeled dataset. But since we have no idea what f is, we can instead come up with a set of synthetic functions ~f that operate on the same domain X but represent arbitrary objectives that we do have access to. For example, if f = speed, the synthetic functions might represent objectives like jump height, energy consumed, and so on.

Then we’d be able to gain some practice maximizing these functions, and hope that, when given a few data points representing f (the objective we actually want to maximize), the model would be able to adapt.

ExPT — Formalized

ExPT uses a pre-training-adaptation approach inspired by this thought process:

  • Synthetic data generation: Create a large amount of synthetic data using the unlabeled data, D-unlabeled.
  • Pre-training: Create an encoder-decoder architecture with a transformer as the encoder and a VAE as the generative decoder. The transformer takes in labeled pairs (x, y) and treats them as context. The VAE then takes in this context and tries to generate designs x* that achieve a given y*. This architecture is trained on the synthetic dataset.
  • Adaptation: After pre-training, given a few (x, y) pairs from the few-shot labeled dataset representing some objective f, as well as a high target value y, the network is able to generate high-quality designs for this objective.

An added advantage of this approach is that after pretraining the network, it can be used to optimize for any downstream objective. So the synthetically trained model could be used to generate designs of ants that optimize for speed and then later be used to generate designs of ants that optimize for jump height — without any re-training necessary! This is an example of a foundation model — a model that is trained on a large unlabeled dataset in a particular domain with the ability to adapt to several downstream tasks.

This also falls into the category of few-shot learning, a technique that is also used while creating the GPT-x models; the LLM is trained on vast amounts of unlabeled data, and then at inference time, it can adapt to various downstream tasks with very little context information provided to it.

Let’s explore the above stages in detail.

ExPT — Synthetic Data Generation

As alluded to earlier, ExPT approaches pretraining by creating a set of synthetic functions which represent arbitrary objectives.

We require functions ~f on the domain X. [1] describes an approach to sample these functions from Gaussian Processes. Gaussian Processes represent distributions over functions, and have universal approximation properties [4], making them a good choice for a sampling distribution.

Gaussian Processes (GPs) are parameterized by a kernel K, and sampling a synthetic function amounts to drawing ~f ∼ GP(0, K).

In this way, we can very quickly get thousands of functions ~f on the domain of interest X. And unlike our hidden, black-box function f, we have complete access to the ~f’s. So to create a synthetic dataset, we can just input large numbers of designs x and compute their values ~f(x) for each of the synthetic functions.
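
As a minimal sketch of this sampling step, the snippet below draws synthetic functions from a zero-mean GP with an RBF kernel, evaluated at a set of unlabeled designs. The kernel choice, lengthscale, and dataset sizes here are illustrative assumptions, not the exact configuration used in [1].

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0):
    """Squared-exponential kernel: K[i, j] = exp(-||x_i - x_j||^2 / (2 l^2))."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * lengthscale ** 2))

def sample_synthetic_labels(X, n_functions, lengthscale=1.0, seed=0):
    """Draw n_functions samples ~f ~ GP(0, K), evaluated at the designs X.

    Returns an (n_functions, len(X)) array: row j holds ~f_j(x_i) for every design x_i.
    """
    rng = np.random.default_rng(seed)
    K = rbf_kernel(X, lengthscale) + 1e-6 * np.eye(len(X))  # jitter for numerical stability
    return rng.multivariate_normal(np.zeros(len(X)), K, size=n_functions)

# 50 unlabeled designs in a 3-dimensional design space, 1000 synthetic functions.
X_unlabeled = np.random.default_rng(1).normal(size=(50, 3))
Y_synth = sample_synthetic_labels(X_unlabeled, n_functions=1000)
```

Each row of `Y_synth` is one synthetic labeling of the unlabeled designs, i.e. one pretraining "task".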

[Figure: dataset mapping designs x_i to values generated by the synthetic functions ~f]

We now have a labeled dataset that was synthetically generated and can be used to pretrain our neural network.

ExPT — Pretraining a Transformer+VAE Encoder-Decoder

The objective is to operate in a few-shot setting. Therefore, we need a model capable of in-context learning. Transformers have recently performed extremely well at in-context learning in language tasks. Inspired by this, ExPT uses a transformer-based architecture as the encoder.

Transformer Encoder

The transformer encoder takes in labeled designs from the synthetic dataset, (x_1, y_1), …, (x_m, y_m), which are referred to as context points, along with target y values y_{m+1}, …, y_N, and outputs hidden vectors h_{m+1}, …, h_N.

Each hidden vector h_i encapsulates the information from all the context points and the target value y_i. Since the objective is to model the probability of a design under this context, we have that p(x_i | x_{1:m}, y_{1:m}, y_i) = p(x_i | h_i).
Going back to the ant analogy, this is like giving the model a set of designs of ants along with their respective speeds, and then giving it a desired speed and asking it to design an ant that would achieve this speed.

VAE Decoder

The transformer encoder produces hidden vectors h_i, one for each desired design. Since the desired designs x_i are high-dimensional (think of all the parameters you could change while designing the body of an ant!), we want to use a generative model to produce these designs.

Variational Autoencoders (VAEs) are popular choices among generative models because of their light hyperparameter tuning requirements and training stability. In [1], experiments show that VAEs also produce better empirical results in this problem setting as compared to diffusion models or generative transformers.

The VAE simply models the conditional distribution discussed above, p(x_i | h_i); i.e., sampling from this distribution produces a design x_i based on the hidden vector h_i.
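
To make the sampling step concrete, here is a toy Gaussian decoder that draws a design x from a distribution conditioned on a hidden vector h, using the reparameterization trick common to VAEs. The weight matrices are random stand-ins for trained decoder parameters, and the dimensions are made up; this illustrates only the shape of the computation, not ExPT's actual decoder.

```python
import numpy as np

rng = np.random.default_rng(0)
H_DIM, X_DIM = 16, 8                        # hidden-vector and design dimensions (illustrative)
W_mu = rng.normal(scale=0.1, size=(H_DIM, X_DIM))       # stand-in for a trained mean head
W_logvar = rng.normal(scale=0.1, size=(H_DIM, X_DIM))   # stand-in for a trained log-variance head

def sample_design(h, rng):
    """Sample x ~ p(x | h) = N(mu(h), diag(sigma(h)^2)) via reparameterization."""
    mu = h @ W_mu
    sigma = np.exp(0.5 * (h @ W_logvar))
    eps = rng.normal(size=X_DIM)            # noise makes the decoder generative
    return mu + sigma * eps

h = rng.normal(size=H_DIM)                  # hidden vector from the encoder (stand-in)
x = sample_design(h, rng)
```

Repeated calls with the same h yield different candidate designs, which is exactly why a generative decoder is useful here: many designs can plausibly achieve the same target value.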

Pretraining

This encoder-decoder architecture is trained purely on the synthetic dataset:

  • Randomly sample N labeled examples from the dataset
  • Of these, choose m to be the context points, and the rest to be the target points
  • ExPT is then trained to predict the designs associated with these target points
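
The sampling procedure in the steps above can be sketched as follows. The function name and array layout are my own; this only shows how a pretraining batch is split into context and target points.

```python
import numpy as np

def make_pretraining_batch(X, y, n_points, n_context, rng):
    """Sample N labeled examples, then split them into m context pairs and N - m targets.

    The model sees the context pairs (X_ctx, y_ctx) plus the target values y_tgt,
    and is trained to predict the corresponding target designs X_tgt.
    """
    idx = rng.choice(len(X), size=n_points, replace=False)
    ctx, tgt = idx[:n_context], idx[n_context:]
    return (X[ctx], y[ctx]), (y[tgt], X[tgt])

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))   # synthetic designs
y = rng.normal(size=1000)        # synthetic labels from some ~f
(X_ctx, y_ctx), (y_tgt, X_tgt) = make_pretraining_batch(X, y, n_points=64, n_context=48, rng=rng)
```

Repeating this with fresh samples and fresh synthetic functions gives the model endless "practice problems" before it ever sees real data.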

At this stage, we have a trained foundation model in the design domain that we’ve chosen! For example, a trained foundation model for the designs of ants. It can now be adapted to any downstream task of our choosing in this domain.

ExPT — Adaptation

After pretraining, the model is capable of taking in context points and a target y value and generating the design x that would result in that y value.

At this stage, we are finally allowed to use our real dataset, D-few-shot, which contains real design-value pairs. We can set a budget of m points that we are allowed to use as context.

Since the objective is to find a design that maximizes y, we can input the maximum possible y into the model and ask it to generate the corresponding design using the context!
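
The adaptation call can be sketched like this. The `toy_model` below is a deliberately silly stand-in (it just returns the context design whose label is closest to the requested value) used only to show the interface; the real ExPT generates genuinely new designs.

```python
import numpy as np

def adapt_and_generate(model, X_few, y_few, budget, rng):
    """Condition a pretrained model on up to `budget` few-shot pairs,
    then request a design for the highest observed y value."""
    idx = rng.choice(len(X_few), size=min(budget, len(X_few)), replace=False)
    X_ctx, y_ctx = X_few[idx], y_few[idx]
    y_star = y_few.max()               # ask for (at least) the best value seen
    return model(X_ctx, y_ctx, y_star)

def toy_model(X_ctx, y_ctx, y_star):
    """Stand-in model: return the context design whose label is nearest y_star."""
    return X_ctx[np.argmin(np.abs(y_ctx - y_star))]

rng = np.random.default_rng(0)
X_few = rng.normal(size=(20, 4))       # few-shot real designs
y_few = rng.normal(size=20)            # their measured objective values
x_star = adapt_and_generate(toy_model, X_few, y_few, budget=10, rng=rng)
```

Crucially, no gradient updates happen here: all adaptation is done in-context, just as with few-shot prompting of an LLM.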

Experiments

Most experiments were derived from Design-Bench [5], which contains several tasks in offline experimental design. These tasks were further adapted to create the few-shot setting described in [1]. Design-Bench defines several tasks in robotics, genetics, and chemistry, and provides design-value pairs as well as methods to evaluate new designs.

Ant

The ant design task, which has been our motivating example throughout, was also taken from Design-Bench! Design-Bench defines an Ant as having 4 legs and a body. The objective is to change the shapes of the body and legs to optimize for the speed of the ant.

On top of this, [1] also defined several other objectives and generated data for them. The paper defines Ant-vy, which attempts to maximize the vertical speed of the ant while maintaining horizontal movement, and Ant-energy, which attempts to minimize the energy used by the ant. These are alternate objectives to Ant, which attempts to maximize horizontal speed.

Following the steps above, synthetic data is generated using Ant designs and Gaussian Processes. An ExPT is then pretrained on this synthetic data to predict Ant designs. In [1], a total of 1,280,000 synthetic functions are used to generate the synthetic data. Note that during pretraining, the model has no idea what the final objective (speed, vertical speed, etc.) is!

During adaptation, the model is evaluated separately on each of the three objectives. In each case, the designs chosen to create the few-shot dataset, D-few-shot, correspond to only 1% of the total available data. This data is fed into the model as context in order to generate designs.

The designs below were generated by the model and were shown in [1] to perform better than the best designs in the dataset on each of the given objectives! The results for the 3 objectives are quite intuitive:

Ant

Ant: optimizing for speed

The optimal Ant has long legs and discovers that leaping forward helps maximize its speed.

Ant-vy

Ant-vy: optimizing for vertical speed while moving horizontally

The optimal Ant-vy has discovered that long hind legs allow it to ‘prop’ itself up and jump upwards.

Ant-energy

Ant-energy: minimizing energy used

Finally, the optimal Ant-energy has legs of different sizes so that it’s able to ‘sit down’ and conserve its energy!

Other Design-Bench Experiments

In addition to the Ant experiment, 3 other problems were chosen from Design-Bench: optimizing the shape of a robotic cat (D’Kitty) and optimizing for DNA sequences (TFBind-8, TFBind-10).

For each of these design settings, D-few-shot was constructed in two different ways: by considering the poorest 1% of the available data, and by considering a random 1% subset of the data. ExPT, along with several other common baselines in the domain of offline black-box optimization, was run on the 4 tasks in both the D-few-shot-random and D-few-shot-poorest settings.

[Table: average scores when the models are given access to the 1% poorest-performing examples]
[Table: average scores when the models are given access to a random 1% of the available examples]

This shows that ExPT is able to produce high performing designs even when exposed to examples that may not be of very high quality!

Takeaways

Data-efficient learning is a growing field. In particular, unsupervised pretraining followed by few-shot learning is a popular approach to overcoming a lack of labeled data. [1] proposes an approach to carrying out experimental design in a few-shot setting. It introduces a new technique for generating synthetic data using Gaussian Processes, which enables large-scale pretraining even in a data-scarce setting. It further formulates ExPT, a novel architecture designed to be a foundation model which, once pretrained on the synthetically generated dataset, can be adapted to any downstream optimization task in its design domain.

Acknowledgements

Thanks to my co-authors, Tung Nguyen and Aditya Grover for their inputs and feedback on this blogpost.

— Sudhanshu Agrawal

References

[1] Tung Nguyen, Sudhanshu Agrawal, and Aditya Grover. ExPT: Synthetic pretraining for few-shot experimental design. arXiv preprint arXiv:2310.19961, 2023. http://arxiv.org/abs/2310.19961

[2] Daniel James Lizotte. Practical Bayesian optimization. 2008.

[3] Marta Garnelo, Dan Rosenbaum, Christopher Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo Rezende, and SM Ali Eslami. Conditional neural processes. In International conference on machine learning, pages 1704–1713. PMLR, 2018

[4] Charles A Micchelli, Yuesheng Xu, and Haizhang Zhang. Universal kernels. Journal of Machine Learning Research, 7(12), 2006.

[5] Brandon Trabucco, Xinyang Geng, Aviral Kumar, and Sergey Levine. Design-Bench: Benchmarks for data-driven offline model-based optimization. In International Conference on Machine Learning, pages 21658–21676. PMLR, 2022.
