Empowering Efficient BO Transfer with Neural Acquisition Process (NAP)

Haitham Bou Ammar
6 min read · Jun 5, 2023


General Objectives & Results:

Our primary objective is to enhance the effectiveness of Bayesian Optimisation (BO) by leveraging meta-learning to transfer knowledge across different problem domains, thereby significantly improving sample efficiency.

In pursuit of this goal, we introduce the Neural Acquisition Process (NAP), an innovative end-to-end architecture based on Transformer models designed explicitly for BO. NAP learns acquisition functions directly and provides a comprehensive framework for optimising various tasks within the BO paradigm.

Through extensive experiments, we demonstrate the outstanding performance of NAP, achieving state-of-the-art results across diverse domains. Specifically, our framework exhibits remarkable success in antibody design, EDA logic synthesis sequence optimisation, and hyperparameter optimisation tasks, surpassing existing approaches in effectiveness and efficiency.

From Bayesian Optimisation to Meta-Bayesian Optimisation:

Bayesian optimisation, a widely recognised paradigm in machine learning for efficient optimisation of black-box functions, has gained significant traction in diverse domains. This approach relies on two fundamental components to achieve its objectives.

The initial component involves constructing a surrogate model, wherein a Gaussian process (GP) is commonly employed due to its probabilistic nature. GPs offer the advantage of generating predictions with calibrated uncertainties, making them highly suitable for capturing and representing complex patterns in the data. Additionally, GPs exhibit sample efficiency, enabling effective utilisation of limited data.

Once the surrogate model has been established using observed data, the following component determines the most promising regions to explore within the search space. This is accomplished by utilising an acquisition function, which considers the uncertainty estimated by the model. By effectively balancing the exploration-exploitation trade-off, the acquisition function guides the optimisation process towards regions of the search space likely to yield the most significant performance improvement.
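
To make these two components concrete, here is a minimal sketch of a classical BO loop: a GP surrogate fitted to the observed data and an Expected Improvement acquisition function that selects the next query. This illustrates vanilla BO rather than NAP; the toy objective, the search grid, and the scikit-learn surrogate are placeholders chosen for brevity.

```python
# Minimal sketch of a classical BO loop: GP surrogate + Expected Improvement.
# Illustrative only -- the objective, grid, and surrogate are toy stand-ins.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):                      # toy black-box function (assumed unknown)
    return -np.sin(3 * x) - x**2 + 0.7 * x

def expected_improvement(mu, sigma, best_y, xi=0.01):
    # EI balances exploration (high sigma) and exploitation (high mu).
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best_y - xi) / sigma
    return (mu - best_y - xi) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(3, 1))          # initial design
y = objective(X).ravel()

for _ in range(10):                          # BO iterations
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    candidates = np.linspace(-2, 2, 500).reshape(-1, 1)
    mu, sigma = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).item())

print("best value found:", y.max())
```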

Despite its success, the existing setup of Bayesian optimisation has certain limitations. Firstly, the Gaussian process (GP) model encounters well-known computational challenges (exact inference scales cubically with the number of observations) and often struggles to represent high-dimensional spaces adequately. Addressing these limitations typically requires problem-specific techniques and expert knowledge.

Secondly, the current approach treats the surrogate model and the acquisition function as separate entities, functioning independently. This disjoint treatment fails to fully exploit the potential synergies between the two components.

Lastly, Bayesian optimisation traditionally operates in a “tabula rasa” manner, meaning that each new problem starts from scratch without leveraging prior knowledge or experience.

Meta-Bayesian optimisation may be used to overcome some of those limitations. The objective of employing meta-learning is to acquire transferable knowledge from similar tasks, which classical GP models and acquisition strategies often struggle with. By learning from related tasks, the meta-learning approach aims to enhance the adaptability and efficiency of BO, enabling it to leverage existing knowledge and experience when confronted with new problems.

In Meta-Bayesian optimisation, we assume that we have observed data from previous optimisation tasks (source tasks) and are now confronted with a new function to optimise (test task). The related literature offers several approaches to tackling this problem:

  1. Learn a Meta-Model: FSBO learns a neural feature extractor shared across all source tasks. These features are then fed into a GP model, and a conventional acquisition function is used on top. This combination, known as a Deep Kernel GP, is well established in the literature (see the sketch after this list). The notable advantage lies in learning the deep kernel across the source tasks and applying it to the test task.
  2. Learn a Meta-Acquisition Function: In contrast, MetaBO retains a traditional GP model but replaces the acquisition function with a neural network, which is then trained using reinforcement learning (RL).
  3. Learn a Sequential Model: OptFormer takes a different route, training a sequential model to predict the next query point, dimension by dimension, and to directly predict the upcoming y values. This is accomplished with a notably large Transformer model trained exclusively with supervised learning. Modelling the optimisation trace sequentially lets OptFormer capture the patterns and dynamics underlying the data and produce predictions at each step.
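
To give a rough, illustrative picture of the Deep Kernel GP used in the first approach, the sketch below combines a small neural feature extractor with an RBF kernel and the GP marginal log-likelihood. The network sizes, kernel choice, and data are made-up placeholders, not the actual FSBO implementation.

```python
# Sketch of a Deep Kernel GP (the FSBO-style meta-model): a neural feature
# extractor shared across tasks, followed by an RBF-kernel GP on the features.
# All sizes and data below are illustrative placeholders.
import torch
import torch.nn as nn

class DeepKernelGP(nn.Module):
    def __init__(self, x_dim, feat_dim=8):
        super().__init__()
        self.features = nn.Sequential(            # meta-learned across source tasks
            nn.Linear(x_dim, 32), nn.ReLU(), nn.Linear(32, feat_dim))
        self.log_lengthscale = nn.Parameter(torch.zeros(()))
        self.log_noise = nn.Parameter(torch.tensor(-2.0))

    def kernel(self, a, b):
        za, zb = self.features(a), self.features(b)
        d2 = torch.cdist(za, zb).pow(2)
        return torch.exp(-0.5 * d2 / torch.exp(self.log_lengthscale) ** 2)

    def neg_marginal_log_likelihood(self, x, y):
        # 0.5 * y^T K^-1 y + log|K|^0.5  (constant term dropped)
        n = x.shape[0]
        K = self.kernel(x, x) + torch.exp(self.log_noise) * torch.eye(n)
        L = torch.linalg.cholesky(K)
        alpha = torch.cholesky_solve(y.unsqueeze(-1), L)
        return 0.5 * (y.unsqueeze(-1) * alpha).sum() + torch.log(torch.diagonal(L)).sum()

# Meta-training sketch: one gradient step on a batch drawn from a source task.
model = DeepKernelGP(x_dim=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(16, 4), torch.randn(16)        # placeholder source-task batch
loss = model.neg_marginal_log_likelihood(x, y)
opt.zero_grad(); loss.backward(); opt.step()
```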

These approaches excel in transferring information to new tasks and enhancing the sample efficiency of Bayesian optimisation. However, they still encounter challenges stemming from using a GP model and the disjoint nature of the two components.

Neural Acquisition Processes (NAP):

We now present our method, NAP, whose architecture offers the following advantages:

  1. Eliminates the need for a Gaussian process (GP).
  2. Utilises a Transformer architecture that encompasses both the model and acquisition components.
  3. Facilitates end-to-end differentiability, enabling seamless optimisation and learning throughout the entire framework.

Training: Since the datasets of the source tasks contain no acquisition-function labels, we use reinforcement learning (RL) to train our architecture, with the reward of a trajectory determined by the achieved regret. However, the rewards exhibit a logarithmic sparsity pattern, which hampers training effectiveness. To address this, we introduce an inductive bias in the form of an auxiliary loss that incorporates supervised information. Notably, our architecture thus has two losses through which the gradient flows back, enabling comprehensive end-to-end training.
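
As a highly simplified illustration of this two-loss setup (not the paper's exact objective), the snippet below combines a REINFORCE-style policy-gradient term, whose reward would be derived from the regret, with an auxiliary supervised term, and back-propagates through both. All tensors, names, and the weighting coefficient are placeholders.

```python
# Simplified sketch of combining an RL (policy-gradient) loss with an
# auxiliary supervised loss so gradients flow through both terms.
import torch

def combined_loss(log_probs, rewards, aux_pred, aux_target, aux_weight=0.5):
    # REINFORCE-style term: encourage queried points that earned high reward
    # (in NAP's setting the reward would come from the achieved regret).
    rl_loss = -(log_probs * rewards.detach()).mean()
    # Auxiliary supervised term: e.g. regressing the observed function values.
    aux_loss = torch.nn.functional.mse_loss(aux_pred, aux_target)
    return rl_loss + aux_weight * aux_loss

# Placeholder tensors standing in for one training trajectory.
log_probs = torch.randn(10, requires_grad=True)
aux_pred = torch.randn(10, requires_grad=True)
rewards, aux_target = torch.randn(10), torch.randn(10)

loss = combined_loss(log_probs, rewards, aux_pred, aux_target)
loss.backward()   # a single backward pass propagates through both losses
```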

Cool Properties:

NAP possesses desirable properties derived from its specific architecture based on Transformers.

Property 1: Invariance to History Order: Unlike a classical Transformer, NAP does not employ positional encoding. Consequently, we can treat the observed points’ history as an unordered set, where the sequence in which the points were observed becomes inconsequential. This aspect holds critical significance in Bayesian optimisation since predictions regarding the next point to query should not be influenced by the order in which the previous points were observed.
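
A quick way to see this property in isolation: a standard Transformer encoder layer applied without positional encodings is permutation-equivariant, so shuffling the history merely shuffles the corresponding outputs. The toy check below uses a generic PyTorch layer with made-up sizes, not NAP's actual architecture.

```python
# Toy check: without positional encodings, a Transformer encoder layer is
# permutation-equivariant, so the history can be treated as an unordered set.
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True).eval()

history = torch.randn(1, 5, 16)            # 5 observed (x, y) tokens, no pos. enc.
perm = torch.randperm(5)

with torch.no_grad():
    out = layer(history)
    out_perm = layer(history[:, perm, :])

# Permuting the inputs simply permutes the outputs in the same way.
print(torch.allclose(out[:, perm, :], out_perm, atol=1e-6))   # True
```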

Property 2: Query Invariance: Under NAP’s attention mask, each freshly explored point, referred to as a query point, attends only to itself and the observed history; the other query points are hidden from it. Consequently, the predictions are conditionally independent given the history and are unaffected by the order in which the queries are made. This aligns with the expected behaviour in Bayesian optimisation: the prediction for a new point must not be affected by which other points are being explored concurrently.
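
The attention pattern behind this property can be written down as a small boolean mask, as in the sketch below (sizes are made up; the convention here is that True means “may attend”): every token attends to the observed history, and each query token additionally attends only to itself.

```python
# Sketch of the attention mask behind query invariance (sizes are made up).
# Convention here: mask[i, j] == True means token i may attend to token j.
import torch

n_history, n_queries = 4, 3
n = n_history + n_queries
mask = torch.zeros(n, n, dtype=torch.bool)

# Every token may attend to the observed history.
mask[:, :n_history] = True
# Each query token may additionally attend to itself, but not to other queries.
for q in range(n_history, n):
    mask[q, q] = True

print(mask.int())
```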

Comparison to OptFormer: OptFormer is limited by its purely supervised training, which induces dependencies on both the order of the variables and the order of the dimensions within each variable. In contrast, our model is far more efficient: it uses only 10% of the original model’s capacity, 40% of its memory footprint, and a mere 2% of its compute time, while achieving regret results identical to OptFormer’s.

Results: Our method works remarkably well, achieving state-of-the-art results in hyperparameter tuning, EDA logic synthesis, MIP, and antibody design.

Check out the full paper!

Follow for more!
