Evolving Language Model Prompting: Prompting Strategies for Enhanced Language Model Performance (part 2)

David Gutsch
8 min read · Oct 24, 2023



In part 1 of this series, “Tackling Language Model Limitations: A Dual Approach to Compositionality and Modular Reasoning,” we introduced some inherent shortcomings of language models and explored some of the methods by which we can augment models to overcome them. In this article I will introduce two more prompting strategies and their respective tradeoffs, with the intention of providing enough general context on prompting strategies to enable you to choose the right strategy for the LLM agents you’re building today!

In part one, the two main strategies for augmenting LLMs’ shortcomings were: 1) using external tools to enhance the language model’s access to information, and 2) elicitive prompting strategies such as Chain of Thought or Self-Ask, which provide a methodology by which we can improve the reasoning skills of LLM agents. Today we’ll take a look at the ReAct prompting strategy, which uses both of the aforementioned approaches to provide better answers to users. The second strategy we’ll introduce is Plan-and-Solve prompting, which only uses elicitive prompting, though I have a conjecture that it could also use external tools to improve on all three of the previous prompting strategies for narrow use cases.

ReAct

ReAct uses external tools to provide access to information the LLM itself does not have, and pairs them with a “reason to act, act to reason” strategy to synthesize answers to questions the LLM cannot answer on its own. In the words of the authors of the original article:

ReAct prompts LLMs to generate both verbal reasoning traces and actions pertaining to a task in an interleaved manner, which allows the model to perform dynamic reasoning to create, maintain, and adjust high-level plans for acting (reason to act), while also interact with the external environments (e.g. Wikipedia) to incorporate additional information into reasoning (act to reason).

What I find unique about this strategy is what the authors refer to as interleaving: the prompt asks questions, looks up information, then makes observations about the retrieved information before using both that information and its previous observations to ask another round of questions, starting the reason-to-act cycle again. This method gives the model a tighter interplay between information retrieval and observation over the retrieved information.

Let’s take a look at an example of ReAct in action so we can get a better understanding of how the method works.

Figure 1(d) from the ReAct paper [1]: an example of ReAct’s interleaved reasoning and acting (reason to act).

We see in the example above that the agent thinks, queries information, then makes observations on the new information before starting the cycle again. Once sufficient knowledge has been gathered and reasoning performed, the agent returns a final answer. The following describes how the agent is configured in the paper’s experiments:

Action Space: We design a simple Wikipedia web API with three types of actions to support interactive information retrieval: (1) search[entity], which returns the first 5 sentences from a corresponding entity wiki page, if it exists, or else suggests top-5 similar entities from the wiki search engine. (2) lookup[string] which would return the next sentence in the page containing string, simulating ctrl+F functionality on the browser. (3) finish[answer], which would finish the current task with an answer. The purpose is to simulate how humans would interact with Wikipedia, and force models to retrieve via explicit reasoning in language.
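To make that action space concrete, here is a minimal sketch of the three actions implemented over a toy in-memory corpus instead of the real Wikipedia API the authors used; the corpus contents, sentence splitting, and similarity heuristic below are purely illustrative.

```python
# A minimal sketch of the three-action interface quoted above, backed by a tiny
# in-memory corpus rather than the real Wikipedia API from the paper.
# TOY_WIKI and its contents are illustrative stand-ins, not real page text.

TOY_WIKI = {
    "Apple Remote": (
        "The Apple Remote is a remote control introduced in October 2005 by Apple. "
        "It was originally designed to control the Front Row media center program."
    ),
    "Front Row (software)": (
        "Front Row is a discontinued media center software application for Apple computers. "
        "It can be controlled by an Apple Remote or the keyboard function keys."
    ),
}

_state = {"page": "", "cursor": 0}  # the page the agent is currently "reading"

def search(entity: str) -> str:
    """search[entity]: return the first sentences of the page, or suggest similar titles."""
    page = TOY_WIKI.get(entity)
    if page is None:
        similar = [t for t in TOY_WIKI if any(w.lower() in t.lower() for w in entity.split())]
        return f"Could not find [{entity}]. Similar: {similar[:5]}"
    _state["page"], _state["cursor"] = page, 0
    return " ".join(page.split(". ")[:5])

def lookup(keyword: str) -> str:
    """lookup[keyword]: return the next sentence of the current page containing keyword."""
    sentences = _state["page"].split(". ")
    for i in range(_state["cursor"], len(sentences)):
        if keyword.lower() in sentences[i].lower():
            _state["cursor"] = i + 1
            return sentences[i]
    return f"No more results for '{keyword}'."
```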

For this experiment the agent was configured to mimic the strategy we humans use to browse the web to answer a question. In this case the agent uses Wikipedia to inform itself about the question it is answering; however, we could substitute any knowledge base over which we can define search and lookup operations, where “search” finds semantically similar content and “lookup” finds the next lexical occurrence of a keyword.

We humans do this without realizing it. We first find an article whose content most closely matches what we are trying to learn about (i.e. the semantic meaning). Then, once we have enough context on the subject, we narrow our search to a keyword that will better inform us on the specifics we need to learn (i.e. a lexical search).

In the example above, the ReAct agent performs a semantic search on “Apple Remote” and discovers it was designed to control the “Front Row (software)” program, which it proceeds to look up. Notice that when the agent is unable to find a Wikipedia match for “Front Row,” it receives a list of similar articles and then observes which one will be most useful for answering the initial question. This is a powerful strategy we humans use to reduce our time to solution when researching a new or unfamiliar concept.
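To make the reason → act → observe cycle itself concrete, here is a rough sketch of a driver loop. The `llm` callable, the Thought/Action text format, and the step budget are my own assumptions for illustration; the loop reuses single-argument tools like the `search` and `lookup` helpers sketched above, and it is not the paper’s actual harness.

```python
import re
from typing import Callable, Dict

# A rough sketch of a ReAct-style driver loop (not the paper's harness).
# Assumes `llm` returns the next "Thought: ...\nAction: tool[argument]" chunk
# given the transcript so far, and that each tool takes a single string.

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")

def react_loop(question: str,
               llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               max_steps: int = 8) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                 # model emits Thought + Action
        transcript += step + "\n"
        match = ACTION_RE.search(step)
        if match is None:                      # no parseable action: give up
            break
        action, arg = match.group(1).lower(), match.group(2)
        if action == "finish":                 # finish[answer] ends the episode
            return arg
        tool = tools.get(action)
        observation = tool(arg) if tool else f"Unknown action: {action}"
        transcript += f"Observation: {observation}\n"  # fed back on the next step
    return "No answer found within the step budget."
```

With `tools={"search": search, "lookup": lookup}` and a real model behind `llm`, the transcript grows by one Thought/Action/Observation triple per iteration, which mirrors the interleaving described above.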

Finally, let’s compare ReAct to some of the other prompting and information retrieval mechanisms we’ve looked at so far. The MRKL system we examined first is a technique aimed directly at information retrieval; while powerful for that purpose, it doesn’t have a reasoning mechanism akin to ReAct’s. Self-Ask is very similar to ReAct, but there is a defining feature that sets the two apart: in ReAct, each cycle of reasoning to act and acting to reason is grounded in the information store we have provided the model. The research in the ReAct article found that, compared to other prompting and information retrieval agents, ReAct keeps itself more grounded in the information provided. To be fair to Self-Ask, its authors did integrate a search engine for answering sub-questions in part of their study, but the core of their strategy was prompting rather than knowledge augmentation. There are important tradeoffs in play, but ReAct lends itself well to questions that need to be grounded in your information store. In the article the authors use a Wikipedia API, but that could be substituted with any knowledge retrieval system.

Limitations

Now that we know some of ReAct’s differentiating features, let’s take a look at its limitations.

While interleaving reasoning, action, and observation steps improves ReAct’s groundedness and trustworthiness, such a structural constraint also reduces its flexibility in formulating reasoning steps, leading to more reasoning error rate than CoT… We note that there is one frequent error pattern specific to ReAct, in which the model repetitively generates the previous thoughts and actions, and we categorize it as part of “reasoning error” as the model fails to reason about what the proper next action to take and jump out of the loop.

For ReAct, successfully retrieving information knowledge via search is critical. Non-informative search, which counts for 23% of the error cases, derails the model reasoning and gives it a hard time to recover and reformulate thoughts. This is perhaps an expected trade-off between factuality and flexibility, which motivates our proposed strategies of combining two methods.

ReAct’s need to ground every step in data from your knowledge store can limit its ability to reason; in such cases Self-Ask and even Chain of Thought (CoT) reasoning may be better strategies. If the agent cannot find the information it needs to inform its next set of observations and actions, it stands to reason that its ability to converge on the correct final answer will be limited. Given this limitation of ReAct, let’s take a look at another prompting strategy that can be more flexible.

Plan-and-Solve Prompting

Now on to our final prompting strategy. Plan-and-Solve prompting is an elicitive prompting strategy, different from all three of our previously introduced strategies in that it does not have access to external information. Let’s introduce the strategy as defined by its authors, then enumerate its advantages:

Plan-and-Solve Prompting consists of two components: first, devising a plan to divide the entire task into smaller subtasks, and then carrying out the subtasks according to the plan. Replace “let’s think step by step” with: “Let’s first understand the problem and devise a plan to solve the problem. Then, let’s carry out the plan and solve the problem step by step” (see figure 2(b)).

Figure 2(b) from the Plan-and-Solve paper [2]: an example of Plan-and-Solve prompting.

Sounds pretty similar to Self-Ask or Chain of Thought prompting, right? True, but there are some important improvements that make this prompting strategy useful. Today’s LLMs are sensitive to the phrasing and expressions in prompts, so in addition to planning and solving the problem, we can add more specific language to the prompt to better identify the specific sub-problems it needs to solve. In their article the authors were attempting to improve the ability of LLMs to perform arithmetic, so they added instructions like the following to help the LLM focus on the important intermediary steps required to do the math (a sketch of the resulting prompt follows the list):

  • “pay attention to calculation”
  • “extract relevant variables and the corresponding numerals”
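For illustration, here is a minimal sketch of what such a Plan-and-Solve prompt might look like for an arithmetic word problem. The trigger sentence paraphrases the wording reported in the paper; `build_prompt` and the sample question are my own illustrative additions.

```python
# A sketch of a Plan-and-Solve style prompt for an arithmetic word problem.
# The trigger sentence paraphrases the paper's wording; build_prompt and the
# sample question are illustrative, not taken from the paper.

PS_TRIGGER = (
    "Let's first understand the problem, extract relevant variables and their "
    "corresponding numerals, and devise a plan. Then, let's carry out the plan, "
    "calculate intermediate variables (pay attention to calculation), and solve "
    "the problem step by step."
)

def build_prompt(question: str) -> str:
    """Wrap a question in the Plan-and-Solve trigger sentence."""
    return f"Q: {question}\nA: {PS_TRIGGER}"

print(build_prompt(
    "A bakery sold 23 cupcakes in the morning and twice as many in the "
    "afternoon. How many cupcakes did it sell in total?"
))
```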

While this strategy is simple, it was devised to address two of the three error types that commonly plague Chain of Thought (CoT) reasoning [2]: calculation errors, where the calculation itself goes wrong and leads to an incorrect answer, and missing-step errors, where some intermediate reasoning step is skipped, especially when many steps are involved. These two kinds of errors compound: if the LLM leaves out relevant and important variables, it is more likely to miss relevant reasoning steps, and that is supported by the results of the paper. “It is observed that both variable definition and plan existences have a negative correlation with calculation errors and missing-reasoning-step errors” [2]. In other words, the existence of a plan and an explicit focus on defining useful variables in the prompt decrease the likelihood of both calculation errors and missing-reasoning-step errors.

Limitations

There are some limitations to this strategy. While it improves calculation errors and missing reasoning steps, this methodology cannot solve the problem of an LLM’s semantic misunderstandings. There is also an added cost to using this method: devising a prompt that helps the LLM focus on the details specific to the problem at hand. In my estimation this will limit the utility of the strategy to narrow problem sets, which is supported by the use case the authors chose for the article: performing arithmetic.

Why I’m excited about this method

As I’m sure you’ve noticed, Plan-and-Solve prompting is the only strategy I have introduced that is ONLY an elicitive prompt. Given the wonderful tools that langchain provides us, I believe one or more tools could be made available to this strategy to compensate for its lack of information access. Hopefully a question is percolating: “David, you’ve introduced three other valuable and generalizable strategies for improving the utility of LLM agents; why do I need another one?” Well, since we can write specific prompts to help an LLM focus on a narrow problem, I hypothesize that using Plan-and-Solve prompting in conjunction with the kind of information retrieval mechanism the other three strategies had access to will give us the ability to better solve specific, narrow problems that the other prompting strategies lack the specificity and focus to solve.
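To make that idea concrete, here is a purely hypothetical sketch of what the combination might look like. Nothing below comes from either paper; `retrieve`, `llm`, and the prompt wording are placeholders for whatever knowledge store, model, and phrasing you would actually use (for example, tools wired up through langchain).

```python
# Hypothetical sketch: Plan-and-Solve prompting augmented with a retrieval step.
# retrieve and llm are placeholders, not an API from either paper or langchain.

from typing import Callable, List

def plan_and_solve_with_retrieval(question: str,
                                  retrieve: Callable[[str], List[str]],
                                  llm: Callable[[str], str]) -> str:
    # 1. Pull supporting context from the knowledge store before planning.
    context = "\n".join(retrieve(question))
    # 2. Ask the model to plan, then solve, grounded in that context.
    prompt = (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Let's first understand the problem, extract the relevant variables and "
        "their corresponding numerals, and devise a plan using only the context "
        "above. Then, let's carry out the plan and solve the problem step by step."
    )
    return llm(prompt)
```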

I am currently working on a project to test this hypothesis and will report back soon with the results! As always, thank you for choosing to spend your time and attention with me; I will always do my utmost to ensure I’m providing maximal value in exchange for it!

If you enjoyed this series and have not read my three-part series on vector databases, I recommend you take a look!

References

  1. Yao et al., “ReAct: Synergizing Reasoning and Acting in Language Models.” https://arxiv.org/pdf/2210.03629.pdf
  2. Wang et al., “Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models.” https://arxiv.org/pdf/2305.04091.pdf
