Automatic Prompt Engineering: A New Approach to Fine-Tuning Large Language Models

Ali Razavi
May 17, 2023

--

In the burgeoning field of AI, large language models (LLMs) have demonstrated impressive capabilities as general-purpose computers. The performance of these models, however, depends significantly on the quality of the prompts used to guide their output. The most effective prompts are usually meticulously handcrafted by humans, a process that requires considerable expertise and time. To address this challenge, Zhou et al. (2022) proposed an innovative method called Automatic Prompt Engineer (APE) for automatic instruction generation and selection. This method not only reduces the need for human prompt engineering but also enhances the model’s performance on a variety of tasks. In this article, we will delve into how APE works and explore its potential implications and applications.

An Overview of Automatic Prompt Engineering (APE)

APE treats the instruction as the “program,” which is optimized by searching over a pool of instruction candidates proposed by an LLM so as to maximize a chosen score function. The quality of each candidate instruction is evaluated by the zero-shot performance of another LLM that follows it. This method has been shown to significantly outperform the prior LLM baseline and to achieve performance comparable to, or even better than, instructions crafted by human annotators.
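To make the score function concrete, here is a minimal sketch of one natural choice, zero-shot execution accuracy: the fraction of demonstrations the target model reproduces when it follows a candidate instruction. The `complete` helper is a hypothetical stand-in for whatever LLM API call you use; it is not part of the paper.

```python
# Minimal sketch of a score function: zero-shot execution accuracy.
# `complete(prompt) -> str` is a hypothetical helper wrapping an LLM API call.
def execution_accuracy(instruction, demos, complete):
    """Fraction of (input, output) demos the target model reproduces
    when following `instruction` zero-shot."""
    hits = 0
    for inp, expected in demos:
        prediction = complete(f"{instruction}\n\nInput: {inp}\nOutput:")
        hits += int(prediction.strip() == expected.strip())
    return hits / len(demos)
```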

This method represents a significant stride forward in the field of AI, as it leverages the capabilities of LLMs in a novel way to generate and select instructions automatically. It does so by framing instruction selection as a black-box optimization problem and using LLMs to generate and search over heuristically viable candidate solutions.
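In the paper’s notation, the objective is (roughly) to pick the instruction ρ that maximizes the expected per-example score f over held-out input-output pairs (Q, A); the exact form of f (execution accuracy, log probability, and so on) is a design choice:

```latex
\rho^{\star} = \arg\max_{\rho} f(\rho)
             = \arg\max_{\rho} \, \mathbb{E}_{(Q, A)}\!\left[ f(\rho, Q, A) \right]
```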

Let’s explore the workflow of APE, which breaks down into five main steps (a minimal end-to-end sketch in code follows the list):

  1. The LLM is given a task specified via input-output demonstrations.
  2. The LLM generates several instruction candidates, either through direct inference or through an iterative process that proposes variants semantically similar to the best candidates so far.
  3. The candidates are executed using the target model.
  4. An evaluation score (for example, zero-shot execution accuracy) is computed for each instruction from the target model’s outputs.
  5. The instruction with the highest score is selected.
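Putting the steps together, here is a minimal sketch of the selection loop under the same assumptions as above: a hypothetical `propose(demos, n)` helper that asks the proposal LLM for candidate instructions, the hypothetical `complete` helper from before, and the `execution_accuracy` scorer from the earlier snippet. The paper’s optional iterative search, which resamples instructions semantically similar to the current best candidates, would wrap around this loop.

```python
# Minimal sketch of the APE selection loop. `propose(demos, n) -> list[str]`
# and `complete(prompt) -> str` are hypothetical helpers around LLM API calls.
def ape_select(demos, propose, complete, n_candidates=20):
    # Step 2: the proposal LLM generates candidate instructions from the demos.
    candidates = propose(demos, n_candidates)
    # Steps 3-4: execute each candidate with the target model and score it.
    scored = [(execution_accuracy(c, demos, complete), c) for c in candidates]
    # Step 5: keep the highest-scoring instruction.
    best_score, best_instruction = max(scored)
    return best_instruction, best_score
```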

This process of automatic instruction generation and selection fundamentally alters the way we approach fine-tuning and prompt engineering in LLMs. By automating the process, APE significantly reduces the time and effort required to generate effective instructions.

Contributions and Findings

APE’s method for automatic instruction generation represents a notable shift from traditional manual prompt engineering. The method has achieved remarkable results, surpassing the performance of human-written instructions when using the InstructGPT model, as measured by the interquartile mean (IQM) across the 24 Instruction Induction NLP tasks introduced by Honovich et al. (2022).

Beyond that benchmark, APE has been shown to be an effective tool for enhancing the performance of LLMs across a variety of tasks. It has improved few-shot learning performance, identified better zero-shot chain-of-thought prompts, and effectively steered models toward truthfulness and informativeness.
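As a concrete illustration of the chain-of-thought result: the paper reports that APE discovered the zero-shot trigger “Let’s work this out in a step by step way to be sure we have the right answer.”, which outperformed the human-written “Let’s think step by step.” Here is a sketch of how such a trigger is applied, again assuming the hypothetical `complete` helper:

```python
# Appending an APE-discovered zero-shot chain-of-thought trigger to a question.
# The trigger string is the one reported by Zhou et al. (2022);
# `complete(prompt) -> str` is a hypothetical helper wrapping an LLM API call.
APE_COT_TRIGGER = (
    "Let's work this out in a step by step way "
    "to be sure we have the right answer."
)

def zero_shot_cot(question, complete):
    return complete(f"Q: {question}\nA: {APE_COT_TRIGGER}")
```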

Extensive experiments back these claims: automatically generated instructions outperformed the LLM baseline by a large margin and performed better than, or comparably to, instructions written by human annotators on 24/24 Instruction Induction tasks and 17/21 curated BIG-Bench tasks.

Implications and Applications

The advent of APE has far-reaching implications for the future of AI and machine learning. It stands as a significant milestone in the continuous journey to improve the performance of AI models and reduce human effort in model fine-tuning.

APE provides a new way to approach instruction generation, which could have significant implications for sectors reliant on LLMs. For instance, in education, an LLM guided by APE-selected instructions could generate high-quality responses to a wide range of prompts.

--


Ali Razavi

LLM specialist & prompt engineer. Passionate about AI innovation. Let's connect & collaborate: https://www.linkedin.com/in/razavi-ai