Let’s Think Step by Step: Advanced Reasoning in Business with Chain-of-Thought Prompting

Jerry Cuomo
7 min read · Aug 29, 2023
Think A.I. by Jerry Cuomo: Article Series 2023

In this article, we’ll investigate the increasingly researched topic of Chain-of-Thought prompting through the lens of business-focused examples. Then, building on my previous article, “Prompting for Business using Llama 2”, we’ll introduce a straightforward system prompt aimed at boosting your model’s reasoning and mathematical skills, with some studies indicating improvements of up to 70%.

The landscape of Natural Language Processing (NLP) is undergoing rapid transformation thanks to the advent of increasingly sophisticated large language models (LLMs) like Llama 2, GPT-4, and others. While these models perform exceptionally well on a range of tasks, from text generation to question answering, their performance on complex tasks like arithmetic and symbolic reasoning has been suboptimal. One way to improve performance has been fine-tuning, but this often requires specialized skills and resources that not all businesses possess.

A recent paper from Google Research titled “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models” (Wei et al., 2022) introduces a simple yet effective solution: chain-of-thought prompting. This approach offers an alternative to the expensive and specialized process of fine-tuning, and it can be particularly beneficial for everyday businesspeople, who can apply the technique themselves.

What is Chain-of-Thought Prompting?

In chain-of-thought prompting, the input question is followed by a series of intermediate natural language reasoning steps that lead to the final answer. Think of this as breaking down a complicated task into bite-sized, logical chunks. This approach has been found to significantly enhance the ability of LLMs to tackle complex arithmetic and commonsense reasoning tasks.

Color pencil drawing of a ‘Chain of Thought,’ created by DALL·E on August 28, 2023

We now turn our attention to a series of ‘before and after’ examples, inspired by the findings of Wei et al. These illustrations highlight the transformative impact of Chain-of-Thought (CoT) prompting on the accuracy and reasoning capabilities of the model in a business setting.

Standard Prompting without Chain of Thought:

In this example, we look at how the inventory of a warehouse changes after new shipments arrive. The calculation is done using a standard, one-shot prompting method without breaking down the logical steps.

Model Input
Q: Your warehouse has 5 pallets of widgets. You purchase 2 more shipments of widgets. Each shipment contains 3 pallets. How many pallets of widgets do you have now?
A: The answer is 11.
Q: Your finance department has $23,000 in the budget. If they allocate $20,000 for a marketing campaign and add $6,000 from other savings, how much is left in the budget?

Model Output
A: The answer is $27,000.
(Incorrect: the right answer is $9,000, as the next example shows.)

With Chain-of-Thought Prompting:

In this next example, we take a similar scenario but utilize the CoT prompting technique. This approach guides the model through the reasoning process, enhancing accuracy and transparency.

Model Input
Q: Your warehouse has 5 pallets of widgets. You purchase 2 more shipments of widgets. Each shipment contains 3 pallets. How many pallets of widgets do you have now?
A: The warehouse started with 5 pallets of widgets. 2 shipments of 3 pallets each is 6 pallets. 5 pallets + 6 pallets = 11 pallets. The answer is 11 pallets.
Q: Your finance department has $23,000 in the budget. If they allocate $20,000 for a marketing campaign and add $6,000 from other savings, how much is left in the budget?

Model Output
A: The finance department originally had $23,000. They allocated $20,000 for a marketing campaign, which leaves them with $23,000 - $20,000 = $3,000. They then added $6,000 from other savings, so the new total is $3,000 + $6,000 = $9,000. The answer is $9,000.

According to the Wei et al. paper, generating a chain of thought — i.e., a series of intermediate reasoning steps — significantly improves the ability of large language models to perform complex reasoning. Their experiments indicate that even with just eight CoT exemplars, a 540B-parameter language model achieves state-of-the-art accuracy on a range of arithmetic, commonsense, and symbolic reasoning tasks. This outperforms even fine-tuned versions of powerful models like GPT-3. In light of these findings, it’s safe to say that CoT prompting offers a compelling alternative to expensive and specialized fine-tuning processes, especially for businesses that seek both accuracy and transparent reasoning.
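For readers who want to wire this into an application, here is a minimal Python sketch of how a few-shot CoT prompt might be assembled before being sent to a model. The generate() call is only a placeholder for whatever inference API you happen to use (a hosted endpoint, a local Llama 2 instance, and so on), not a specific library function.

```python
# Minimal sketch: assembling a few-shot Chain-of-Thought prompt.
# `generate` is a placeholder for your model's inference call, not a real API.

COT_EXEMPLAR = (
    "Q: Your warehouse has 5 pallets of widgets. You purchase 2 more shipments "
    "of widgets. Each shipment contains 3 pallets. How many pallets of widgets "
    "do you have now?\n"
    "A: The warehouse started with 5 pallets of widgets. 2 shipments of 3 pallets "
    "each is 6 pallets. 5 pallets + 6 pallets = 11 pallets. The answer is 11 pallets.\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend the worked exemplar so the model imitates its reasoning steps."""
    return f"{COT_EXEMPLAR}\nQ: {question}\nA:"

prompt = build_cot_prompt(
    "Your finance department has $23,000 in the budget. If they allocate "
    "$20,000 for a marketing campaign and add $6,000 from other savings, "
    "how much is left in the budget?"
)
# answer = generate(prompt)   # call your LLM of choice here
print(prompt)
```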

But wait, there’s more.

Let’s Think, Step by Step

In a recent development in the field of large language models, the paper “Large Language Models are Zero-Shot Reasoners” (Kojima et al., 2022) introduced a significant innovation called Zero-Shot Chain-of-Thought (Zero-Shot CoT). This strategy essentially involves adding the phrase “Let’s think step by step” to the original prompt to guide the model’s reasoning.

With Zero-Shot Chain of Thought Prompting:

In this example, we introduce the Zero-Shot CoT technique to understand its impact on the model’s reasoning. We suspect that the phrase “Let’s think step by step” explicitly guides the model to follow a logical reasoning chain, thereby increasing the likelihood of an accurate answer.

Model Input
Q: Your department has a budget of $50,000. You spend $10,000 on new software licenses and allocate $15,000 for marketing. You later receive a budget increase of $5,000. What is your remaining budget?
Let’s think step by step.

Model Output
A: Initially, your department has a budget of $50,000.
You spend $10,000 on software licenses, leaving $40,000 in the budget.
You allocate another $15,000 for marketing, which brings the budget down to $25,000.
Finally, you receive a budget increase of $5,000, taking your remaining budget to $30,000.

According to the research by Kojima et al., the introduction of Zero-Shot CoT led to remarkable improvements across a variety of reasoning tasks. For instance, accuracy in arithmetic tasks such as MultiArith soared from 17.7% to an impressive 78.7%. In another test suite, the GSM8K benchmark, performance increased from 10.4% to 40.7% with a large InstructGPT model. These vast improvements demonstrate that Zero-Shot CoT is a legitimate tool for drawing out accurate, step-by-step reasoning from language models.

The phrase “Let’s think step by step” seems to unlock previously unexplored capabilities in large language models. Its simplicity yet profound impact offers a promising alternative to the complexities associated with model fine-tuning. These findings suggest a wealth of untapped potential that can be accessed through more direct and straightforward prompting techniques.
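To show just how little machinery Zero-Shot CoT requires, here is a small Python sketch that simply appends the trigger phrase to a question, mirroring the example above. As before, the generate() call is a stand-in for your own inference API, not a specific library function.

```python
# Minimal sketch: Zero-Shot Chain-of-Thought is just the question plus a trigger phrase.

def zero_shot_cot(question: str) -> str:
    """Append the 'Let's think step by step' trigger to elicit step-by-step reasoning."""
    return f"Q: {question}\nLet's think step by step."

prompt = zero_shot_cot(
    "Your department has a budget of $50,000. You spend $10,000 on new software "
    "licenses and allocate $15,000 for marketing. You later receive a budget "
    "increase of $5,000. What is your remaining budget?"
)
# answer = generate(prompt)   # send to your LLM of choice
print(prompt)
```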

Crafting a Customized System Prompt for Llama 2

Building on my previous article on prompt engineering with Llama 2, let’s connect the dots with the CoT techniques we’ve just studied. If CoT and Zero-Shot CoT can substantially improve reasoning and math performance in general-purpose language models, how can these techniques be applied in a business context using Llama 2?

In order to employ the best practices of CoT in Llama 2, one can create a specialized system prompt. The objective here is to set the model up for success in tasks requiring complex reasoning or mathematical computations. Here’s a sample system prompt:

System Prompt:

For tasks requiring reasoning or math, use the Chain-of-Thought methodology to explain your step-by-step calculations or logic.

Example:

Q: Your department starts with a budget of $20,000. You spend $4,000 on marketing and $3,000 on software licenses. Afterwards, you receive a budget increase of 15%. What is your new budget?
A: The initial budget is $20,000. After spending $4,000 on marketing and $3,000 on software licenses, the remaining budget is $20,000 - $4,000 - $3,000 = $13,000. A 15% increase on $13,000 would be 0.15 * $13,000, which is $1,950. Adding this to the remaining budget, the new budget becomes $13,000 + $1,950 = $14,950.

This approach is akin to setting guidelines for the Llama 2 model, steering it toward structured reasoning. Given that Llama 2 is already a robust system, the integration of CoT and Zero-Shot CoT techniques could provide businesses with a powerful tool for solving intricate problems without the need for manual intervention or extensive fine-tuning.
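If you are calling a Llama 2 chat model directly, the system prompt slots into the model’s [INST] / <<SYS>> chat template. Here is a minimal Python sketch of that wrapping, assuming a single-turn exchange; the actual inference call is again left as a placeholder rather than a specific library API.

```python
# Minimal sketch: wrapping the CoT system prompt in Llama 2's chat template.

SYSTEM_PROMPT = (
    "For tasks requiring reasoning or math, use the Chain-of-Thought methodology "
    "to explain your step-by-step calculations or logic."
)

def build_llama2_prompt(user_message: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Format a single-turn Llama 2 chat prompt with an embedded system prompt."""
    return (
        f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

prompt = build_llama2_prompt(
    "Your department starts with a budget of $20,000. You spend $4,000 on marketing "
    "and $3,000 on software licenses. Afterwards, you receive a budget increase of 15%. "
    "What is your new budget?"
)
# answer = generate(prompt)   # send to a Llama 2 chat model of your choice
print(prompt)
```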

Why It Matters for Business

CoT offers a significant advancement for businesses seeking cost-effective and accurate solutions for complex reasoning and mathematical tasks. The methodology not only improves the performance of large language models like Llama 2 but also offers a user-friendly approach that doesn’t require specialized skills or extensive resources.

While CoT prompting has shown promise, it is not without its limitations. For one, it only yields performance gains when used with models of approximately 100 billion parameters or more, such as GPT-3, Bloom, WuDao 2.0, MT-NLG, and Llama. Smaller models tend to produce illogical chains of thought, leading to lower accuracy than standard prompting. Furthermore, the performance boosts from CoT prompting are generally proportional to the size of the model.

The research of Kojima et al. and Wei et al. collectively illuminates the power of CoT techniques in improving the reasoning abilities of large language models. Kojima et al., for example, demonstrated a significant performance boost, raising accuracy on the MultiArith arithmetic benchmark from 17.7% to 78.7% simply by introducing a guiding phrase. These findings underscore that CoT offers a cost-effective yet highly accurate approach to complex problem-solving, effectively reducing the need for resource-intensive fine-tuning.

In closing, give my system prompt a try… and tell me, step by step, whether you think it’s “reasonable” (hah, a reasoning joke).

The Art of AI for Business Podcast

If you’ve enjoyed this article, it’s likely you will also enjoy my Art of AI for business podcast. Check it out here.

References

1. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E. H., Le, Q. V., & Zhou, D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.

2. Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large Language Models are Zero-Shot Reasoners.

3. Cuomo, J. (August 2023). Prompting for Business using Llama 2.


Jerry Cuomo

IBM Fellow & CTO. Innovator and instigator of AI, Automation, Blockchain, Cloud at IBM. Husband, Dad, and Bass Player.