
E10 : Plan-and-Solve Prompting

Praveen Thenraj
Research Papers Summarized
4 min read · Sep 30, 2023


Adding more detailed instructions to the zero-shot-CoT prompt helps elicit better reasoning capabilities in LLMs

Paper Name : Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models

Paper URL : https://arxiv.org/abs/2305.04091

Authors : Lei Wang, Wanyu Xu, Yihuai Lan, Zhiqiang Hu, Yunshi Lan, Roy Ka-Wei Lee, Ee-Peng Lim

Conference : ACL 2023

Please find the annotated paper here

Problem Statement :

  • Few-shot CoT prompting requires manually crafting exemplars along with their reasoning steps (rationales).
  • Zero-shot CoT eliminates the need for exemplars but, on the other hand, suffers a drop in performance compared to few-shot CoT.
  • This drop in accuracy is attributed to three types of errors - calculation errors, missing reasoning steps, and semantic misunderstanding.

Solution :

  • Design a prompting strategy that gives the LLM detailed instructions but, at the same time, does not require few-shot examples.
  • The zero-shot CoT prompt ("Let's think step by step") is replaced with a more detailed prompt, namely Plan-and-Solve (PS) prompting.
  • PS prompting has two variants - PS and PS+.
  • In PS prompting, the zero-shot CoT trigger is replaced with the instruction below to reduce errors caused by missing reasoning steps:
    "Let's first understand the problem and devise a plan to solve the problem. Then, let's carry out the plan and solve the problem step by step."
  • In PS+ prompting, the additional instructions below are added to improve the quality of the generated reasoning steps and of the calculations performed by the LLM (a minimal sketch of both templates follows the figure below):
    "extract relevant variables and their corresponding numerals"
    "calculate intermediate results (pay attention to calculation and commonsense)"
PS and PS+ prompts
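To make the difference between the triggers concrete, here is a minimal sketch of how a question can be wrapped with the Zero-shot-CoT, PS, and PS+ triggers. The PS+ wording is paraphrased from the description above rather than copied verbatim from the paper, and build_prompt and the example question are illustrative, not the authors' released code.

```python
# Minimal sketch: wrapping a question with the Zero-shot-CoT, PS, and PS+ triggers.
# The PS+ wording is a paraphrase of the description above; see the paper for the
# verbatim templates. build_prompt is an illustrative helper, not the authors' code.

ZERO_SHOT_COT_TRIGGER = "Let's think step by step."

PS_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve the problem. "
    "Then, let's carry out the plan and solve the problem step by step."
)

PS_PLUS_TRIGGER = (
    "Let's first understand the problem, extract relevant variables and their "
    "corresponding numerals, and devise a plan. Then, let's carry out the plan, "
    "calculate intermediate results (pay attention to calculation and commonsense), "
    "solve the problem step by step, and show the answer."
)

def build_prompt(question: str, trigger: str) -> str:
    """Zero-shot prompt: just the question followed by the chosen reasoning trigger."""
    return f"Q: {question}\nA: {trigger}"

# Example with a made-up arithmetic word problem:
print(build_prompt(
    "A baker made 48 cookies and sold 3 boxes of 9 cookies each. How many cookies are left?",
    PS_PLUS_TRIGGER,
))
```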

Experimentation :

  • The new prompting techniques (PS and PS+) were tested using the GPT-3 (text-davinci-003) model.
  • The model temperature was set to 0 for experiments without self-consistency, and to 0.7 with N (the number of responses sampled) set to 10 for experiments with self-consistency (see the decoding sketch after this list).
  • The prompting strategy was tested on 10 benchmark datasets spread across 3 reasoning tasks, namely arithmetic reasoning, commonsense reasoning, and symbolic reasoning.
  • Arithmetic Reasoning datasets - GSM8K, SingleEq, AQuA, AddSub, MultiArith, SVAMP
  • Commonsense Reasoning datasets - StrategyQA, CommonsenseQA
  • Symbolic Reasoning datasets - Last letter concatenation, Coin flip
  • The PS and PS+ variants were tested against 3 types of prompting baselines - zero-shot baselines (Zero-shot-CoT and Zero-shot-PoT), Manual Few-shot CoT, and Auto-CoT
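As a rough illustration of the two decoding settings, the snippet below uses the legacy openai Python SDK (pre-1.0) Completion endpoint that served text-davinci-003. The temperature and N values mirror the article; max_tokens is an assumed value, and this is a sketch rather than the authors' evaluation code.

```python
# Sketch of the two decoding settings described above, using the legacy
# openai Python SDK (< 1.0) Completion endpoint for text-davinci-003.
# Illustrative only - not the authors' released evaluation script.
import openai  # requires openai < 1.0 and OPENAI_API_KEY set in the environment

def complete(prompt: str, with_self_consistency: bool = False) -> list[str]:
    params = dict(model="text-davinci-003", prompt=prompt, max_tokens=256)
    if with_self_consistency:
        # Self-consistency: sample N = 10 diverse reasoning paths.
        params.update(temperature=0.7, n=10)
    else:
        # Greedy decoding: a single deterministic reasoning path.
        params.update(temperature=0.0, n=1)
    response = openai.Completion.create(**params)
    return [choice.text for choice in response.choices]
```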

Observations :

  • PS+ prompting outperformed the Zero-shot-CoT baseline on all 10 datasets, with an average accuracy gain of 2.5%.
  • This observation supports the hypothesis that LLMs can reason better when given more detailed instructions, even in a zero-shot setting.
  • PS+ prompting outperformed Manual Few-shot CoT prompting on 4 out of 10 tasks.
  • These 4 tasks include 3 from arithmetic reasoning and 1 from symbolic reasoning.
  • This is a promising result towards improving the reasoning capabilities of LLMs with more detailed instructions and fewer (or zero) examples.
  • PS+ prompting outperformed Zero-shot-PoT on 5 out of 6 arithmetic tasks. The results could not be compared on the commonsense and symbolic tasks, as Zero-shot-PoT does not work for them.
  • PS+ prompting outperformed Auto-CoT on 4 out of 6 arithmetic tasks.
  • A qualitative comparison of Zero-shot-CoT vs PS vs PS+ was done to understand the distribution of errors on 100 samples drawn from the GSM8K dataset.
  • Results show that PS+ prompting gives 39 wrong answers, out of which 5% are due to calculation errors, 7% to missing reasoning steps, and 27% to semantic misunderstanding. This compares with 44 wrong answers for Zero-shot-CoT, distributed as 7%, 12%, and 27% across the same error types respectively.
Prompting method vs type of error
  • Evaluation was done on the GSM8K and SVAMP datasets with and without self-consistency for Zero-shot-CoT and PS+ prompting, and in each setting PS+ prompting outperformed Zero-shot-CoT (a sketch of the self-consistency voting step follows this list).
Zero-shot-CoT (w/o SC and w/SC) Vs PS+ (w/o SC and w/SC)
  • PS+ prompting could not outperform Manual Few-shot CoT on any of the commonsense reasoning tasks. This implies that LLMs still require guidance in the form of examples (few-shot) to solve commonsense reasoning tasks.
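For the self-consistency runs mentioned above, the aggregation step boils down to extracting a final answer from each of the N sampled completions and taking a majority vote. The regex-based answer extraction below is an assumption made for illustration; the paper's actual answer-cleaning logic may differ.

```python
# Sketch of self-consistency aggregation: majority vote over N sampled answers.
# The regex-based final-answer extraction is an illustrative assumption.
import re
from collections import Counter

def extract_answer(completion: str) -> str | None:
    """Crudely take the last number in a completion as its final answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None

def majority_vote(completions: list[str]) -> str | None:
    """Aggregate the sampled reasoning paths into a single answer by majority vote."""
    answers = [a for a in (extract_answer(c) for c in completions) if a is not None]
    return Counter(answers).most_common(1)[0][0] if answers else None

print(majority_vote(["... so the total is 42.", "The answer is 42.", "I get 41."]))  # -> 42
```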

Limitations :

  • Designing a prompt that generalises well across problems is a time-consuming process.
  • Solving commonsense reasoning tasks with zero-shot prompts still lags behind solving them with few-shot CoT.

Conclusion :

  • The experiments clearly indicate that zero-shot prompts with detailed instructions can elicit better reasoning capabilities in LLMs.
  • With more detailed and clearer prompts, this is a good start towards eliciting the reasoning capabilities of LLMs without the need to curate the few-shot examples used in Manual-CoT.
