CoT in Large Language Models: In-Context Learning

Michael X
9 min read · May 4, 2023


The field of artificial intelligence has made remarkable strides in recent years, particularly in language modeling. One of the latest advancements is Chain-of-Thought (CoT) prompting, which aims to enhance the reasoning ability of language models on complex tasks such as math and physics problems. CoT is applied in two main ways: in-context learning and fine-tuning the language model. In-context learning injects few-shot examples into the prompt, allowing the model to learn from the in-context demonstrations and produce outputs that follow the same reasoning pattern, all while the LLM's parameters remain frozen. Fine-tuning, on the other hand, leverages CoT data to update the parameters of the language model; with updated parameters, the model can better reason about complex problems and provide accurate solutions.

In-Context Learning of CoT

Zero-Shot CoT and Few-Shot CoT are the two basic forms of in-context learning with CoT. Zero-Shot CoT refers to the ability of LLMs to reason through a chain of thoughts without any demonstration examples, typically elicited by a trigger phrase such as “Let’s think step by step”, while Few-Shot CoT provides the model with a small number of worked examples to guide its reasoning. These methods represent significant advancements in artificial intelligence, as they allow LLMs to reason through complex problems in a more natural and human-like way, and they have the potential to transform many fields, including education, healthcare, and scientific research. However, challenges remain, including improving the accuracy and reliability of these models and ensuring that they are robust to different types of inputs and contexts. Examples are shown in Figure 2 and Figure 3.
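
To make the two styles concrete, here is a minimal sketch in Python. The `complete` placeholder stands in for whatever LLM completion API is used, and the question and demonstration are illustrative; none of this is a fixed part of the methods themselves.

```python
def complete(prompt: str) -> str:
    """Placeholder for an LLM completion endpoint (assumed, not a real API)."""
    raise NotImplementedError

QUESTION = (
    "A juggler has 16 balls. Half of the balls are golf balls, and half "
    "of the golf balls are blue. How many blue golf balls are there?"
)

# Zero-Shot CoT: no demonstrations, just a reasoning trigger phrase.
zero_shot_prompt = f"Q: {QUESTION}\nA: Let's think step by step."

# Few-Shot CoT: worked examples precede the test question so the model
# imitates the demonstrated reasoning pattern.
demonstration = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n\n"
)
few_shot_prompt = demonstration + f"Q: {QUESTION}\nA:"
```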

Self-Consistency CoT

Self-Consistency CoT[5] represents a significant advancement in language modeling, as it allows models to reason through complex problems in a more diverse and flexible way. It builds on the intuition that complex reasoning tasks typically admit multiple reasoning paths that reach the correct answer: self-consistency samples a diverse set of reasoning paths and determines the final answer by marginalizing out the sampled paths, choosing the most consistent answer in the resulting answer set. Compared to other decoding strategies, self-consistency avoids the repetitiveness and local optimality that plague greedy decoding while mitigating the stochasticity of a single sampled generation. The approach is entirely unsupervised, works off the shelf with pre-trained language models, requires no human annotation, and avoids any additional training, auxiliary models, or fine-tuning. Moreover, evaluations show that self-consistency improves reasoning performance by a significant margin across a variety of multi-step reasoning tasks, suggesting it can enable language models to handle tasks previously considered too challenging for artificial intelligence systems.

Figure 6 Self-Consistency CoT

As shown in Figure 6, the self-consistency method comprises three distinct steps: firstly, the language model is prompted using the chain-of-thought (CoT) method; secondly, the conventional “greedy decode” strategy in CoT prompting is substituted with sampling from the language model’s decoder to produce a diverse array of possible reasoning paths; and finally, the generated reasoning paths are marginalized, and the most consistent answer in the final answer set is chosen as the aggregated result.
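
A minimal sketch of those three steps, assuming a placeholder `complete` sampling call and a deliberately naive answer extractor (both stand-ins, not real APIs):

```python
from collections import Counter

def complete(prompt: str, temperature: float = 0.7) -> str:
    """Placeholder for an LLM sampling call (assumed, not a real API)."""
    raise NotImplementedError

def extract_answer(chain: str) -> str:
    """Naive extraction: take whatever follows the last 'The answer is'."""
    return chain.rsplit("The answer is", 1)[-1].strip(" .")

def self_consistency(prompt: str, n_samples: int = 10) -> str:
    """Sample diverse reasoning paths, then majority-vote over answers."""
    # Sampling with temperature > 0 replaces greedy decoding and yields
    # a diverse set of reasoning paths.
    answers = [extract_answer(complete(prompt)) for _ in range(n_samples)]
    # Marginalizing out the reasoning paths reduces to picking the most
    # frequent, i.e. most consistent, final answer.
    return Counter(answers).most_common(1)[0][0]
```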

Least-to-Most CoT

Least-to-Most CoT[6] proposes a new approach to the issue of easy-to-hard generalization in chain-of-thought prompting. While chain-of-thought prompting has shown great promise in natural language processing tasks, it struggles with compositional generalization, which requires the model to solve problems harder than those shown in its demonstration examples. Least-to-Most CoT addresses this by breaking a complex problem down into a list of easier subproblems and then solving them sequentially. The approach is implemented with few-shot prompting alone, without training or fine-tuning in either stage. The method borrows its name from educational psychology, where it describes a technique for teaching new skills through a progressive sequence of prompts. In the context of language modeling, Least-to-Most CoT teaches the model to generalize to more challenging problem-solving tasks.

Figure 7 Least-to-Most CoT

Least-to-most prompting solves a math word problem in two stages: problem reduction and problem-solving. In the first stage, the language model is asked to reduce the complex problem into a series of simpler subproblems that increase in complexity. The prompt provided to the language model in this stage includes constant examples that demonstrate the reduction process, followed by the specific question to be reduced. By breaking down the original problem into smaller, more manageable subproblems, the language model can better reason about and solve the problem at hand.

In the second stage of least-to-most prompting, the language model is tasked with sequentially solving the subproblems generated in the first stage. The prompt provided to the model in this stage consists of three parts: constant examples demonstrating how subproblems are solved, a potentially empty list of previously answered subquestions and generated solutions, and the question to be answered next. Starting from the first subproblem, the language model solves each subproblem in sequence, using the answers to previously solved subproblems to inform its reasoning and deduction process.

For example, in the problem-solving stage of the example shown in Figure 7, the language model is asked to solve each subproblem generated in the reduction stage, starting with the first subproblem “How long does each trip take?”. The answer generated by the model is then used to construct the next prompt, which includes the previously generated answer and the next subproblem to be solved. This process continues until the final answer to the original problem is obtained.
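
The two-stage procedure can be sketched as follows. `complete` is again a placeholder LLM call, and `REDUCE_EXAMPLES` and `SOLVE_EXAMPLES` are hypothetical stand-ins for the constant few-shot examples described above:

```python
def complete(prompt: str) -> str:
    """Placeholder for an LLM completion endpoint (assumed)."""
    raise NotImplementedError

REDUCE_EXAMPLES = "..."  # constant examples demonstrating problem reduction
SOLVE_EXAMPLES = "..."   # constant examples demonstrating subproblem solving

def least_to_most(question: str) -> str:
    # Stage 1: problem reduction. The model breaks the question into
    # simpler subproblems, assumed here to come back one per line.
    reduction = complete(f"{REDUCE_EXAMPLES}\nQ: {question}\nSubproblems:")
    subproblems = [s.strip() for s in reduction.splitlines() if s.strip()]

    # Stage 2: sequential solving. Each prompt carries the constant
    # examples, all previously answered subquestions with their
    # solutions, and the next subquestion.
    solved = ""
    answer = ""
    for sub in subproblems:
        answer = complete(f"{SOLVE_EXAMPLES}\n{solved}Q: {sub}\nA:")
        solved += f"Q: {sub}\nA: {answer}\n"
    # The answer to the last subproblem answers the original question.
    return answer
```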

Automatic CoT

Automatic CoT[7] is a promising approach to address the limitations of Manual-CoT and the naive Zero-Shot-CoT. The idea is to use machine learning techniques to automatically generate effective demonstrations consisting of questions and reasoning chains. Automatic CoT can be implemented in various ways, such as clustering similar questions and generating a representative reasoning chain for each cluster, or using a reinforcement learning framework to learn to generate coherent reasoning chains. Automatic CoT has shown great potential for improving the scalability and effectiveness of CoT prompting; for instance, its authors demonstrated that it matches or outperforms Manual-CoT on several reasoning tasks, including arithmetic and commonsense reasoning. Moreover, Automatic CoT can be combined with Retrieval CoT to further improve the quality and diversity of demonstrations. As LLMs continue to advance, Automatic CoT is expected to play an increasingly important role in enabling them to perform complex reasoning tasks with minimal human effort.

Figure 8 Automatic CoT

As shown in Figure 8, Automatic CoT consists of two main stages. The first is question clustering, which partitions a given set of questions into a small number of clusters based on their vector representations. The second is demonstration sampling, which selects a representative question from each cluster and generates its reasoning chain using Zero-Shot-CoT with simple heuristics. To select the representative questions, the authors sort the questions in each cluster in ascending order of their distance to the cluster center and preferentially pick the most typical question in each cluster. They then construct a demonstration for each cluster by feeding the selected question into an LLM with Zero-Shot-CoT to generate a reasoning chain consisting of a rationale and an extracted answer. The heuristics encourage sampling simpler questions and rationales. The generated demonstrations are then used to augment a test question for in-context learning, where the input is the concatenation of all demonstrations followed by the test question with a prompt.
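
A condensed sketch of both stages, assuming hypothetical `embed` (a sentence encoder) and `complete` (an LLM call) helpers; the paper's length-based heuristics are noted in comments but not implemented:

```python
import numpy as np
from sklearn.cluster import KMeans

def embed(question: str) -> np.ndarray:
    """Placeholder sentence encoder, e.g. Sentence-BERT (assumed)."""
    raise NotImplementedError

def complete(prompt: str) -> str:
    """Placeholder LLM call (assumed)."""
    raise NotImplementedError

def auto_cot_demonstrations(questions: list[str], k: int = 8) -> list[str]:
    # Stage 1: question clustering on vector representations.
    vectors = np.stack([embed(q) for q in questions])
    km = KMeans(n_clusters=k, n_init=10).fit(vectors)

    demos = []
    for c in range(k):
        # Pick the in-cluster question closest to the cluster center,
        # i.e. the most typical one. (The paper's length heuristics for
        # questions and rationales are omitted here.)
        idx = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(vectors[idx] - km.cluster_centers_[c], axis=1)
        rep = questions[int(idx[dists.argmin()])]

        # Stage 2: generate the rationale with Zero-Shot-CoT.
        rationale = complete(f"Q: {rep}\nA: Let's think step by step.")
        demos.append(f"Q: {rep}\nA: Let's think step by step. {rationale}")
    return demos
```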

PoT

In PoT[8], LLMs express their reasoning steps as Python programs, and these programs can implement arbitrary computation logic expressible in Python. This flexibility allows PoT to solve tasks that CoT cannot, such as solving differential equations or computing integrals. PoT also has the potential to be more efficient than CoT in cases where computation is the bottleneck. By delegating computation to an external interpreter, PoT can take advantage of the optimized numerical computation libraries available in Python to achieve faster and more accurate results. PoT represents a promising direction for future research in numerical reasoning with language models.
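
For illustration, a program of thoughts that a model might emit for a question such as "What is the integral of x^2 from 0 to 3?" could delegate the symbolic work to a library like SymPy. The specific program below is hypothetical, not an output from the paper:

```python
import sympy

# A hypothetical program of thoughts for the question:
# "What is the integral of x^2 from 0 to 3?"
x = sympy.symbols("x")
integrand = x ** 2
# Delegate the symbolic computation to SymPy instead of carrying it
# out in natural-language reasoning steps.
ans = sympy.integrate(integrand, (x, 0, 3))
print(ans)  # 9
```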

Figure 9 PoT

As shown in Figure 9, few-shot PoT prompting and zero-shot PoT prompting are the two modes of operation of the proposed PoT method. In the few-shot setting, a small number of (question, program of thoughts) input-output pairs are provided as demonstrations to teach the LLM how to generate programs of thoughts. In contrast, zero-shot PoT prompting provides only an instruction, without any demonstrations. Unlike zero-shot CoT, which requires an additional step to extract the answer from the chain of thoughts, zero-shot PoT can return the answer directly. However, for some problems that involve commonsense reasoning, PoT may first produce an intermediate result, which is then combined with the question to continue prompting the LLM for the final answer; in such cases an additional step of textual reasoning is required. The PoT method is particularly suitable for problems involving complex mathematical expressions, which LLMs may not be able to solve efficiently on their own.
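
The execution side can be sketched as follows, under the convention, assumed here, that the generated program stores its final result in a variable named `ans`; note that exec-ing model output is unsafe outside a sandbox:

```python
def run_program_of_thoughts(program: str) -> object:
    """Execute a generated Python program and read off its result.

    Warning: exec on model output is unsafe outside a sandboxed or
    restricted interpreter.
    """
    namespace: dict = {}
    exec(program, namespace)  # delegate computation to the interpreter
    return namespace.get("ans")  # 'ans' convention assumed for this sketch

# A hand-written program standing in for model output:
program = (
    "interest = 10000 * (1 + 0.03) ** 5 - 10000\n"
    "ans = round(interest, 2)"
)
print(run_program_of_thoughts(program))  # 1592.74
```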

Complex CoT

Complex CoT[9] refers to a new example selection scheme for chain-of-thought multi-step reasoning, which focuses on choosing examples with complex reasoning chains. Existing sample selection methods are often based on manual heuristic rules, optimization and search, or retrieval from a large training set. In contrast, complexity-based prompting in Complex CoT chooses examples with more reasoning steps as the prompt. This scheme has been shown to achieve better performance for large language models, such as GPT-3 175B, when they are provided with more complex prompts.

Moreover, Complex CoT extends the complexity-based selection criterion from the input space (prompts) to the output space (reasoning chains generated by the language model). Borrowing the idea of self-consistency, the method samples multiple reasoning chains from the model, which may lead to different answers, and then takes the majority answer among the top-K most complex generated chains. This process, known as complexity-based consistency, yields further performance gains on top of complexity-based prompting.

Figure 10 Complex CoT

As shown in Figure 10: (A) the intermediate reasoning steps toward the final answer constitute the chain of thoughts (CoT), highlighted in blue; for CoT prompting, a stack of a few (often 8) CoT cases is provided as input before a test question, and the language model then generates an output CoT for the test question. (B) Chains with more reasoning steps (such as the 9-step chain in subfigure B) are considered to be of harder reasoning complexity than chains with fewer steps (such as the 2-step chain in subfigure A). (C) During decoding, the method samples N reasoning chains from the language model (N = 5 in this case) and selects the majority answer over the K (K = 3 here) most complex generated chains.
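
Putting the pieces together, complexity-based consistency can be sketched as below, using the number of newline-separated steps as a crude proxy for reasoning complexity and a placeholder `complete` sampling call:

```python
from collections import Counter

def complete(prompt: str) -> str:
    """Placeholder for an LLM sampling call (assumed)."""
    raise NotImplementedError

def extract_answer(chain: str) -> str:
    """Naive extraction: take whatever follows the last 'The answer is'."""
    return chain.rsplit("The answer is", 1)[-1].strip(" .")

def complexity_based_consistency(prompt: str, n: int = 5, k: int = 3) -> str:
    # Sample N reasoning chains, as in self-consistency.
    chains = [complete(prompt) for _ in range(n)]
    # Rank chains by reasoning complexity, approximated here by the
    # number of newline-separated steps, and keep the K most complex.
    top_k = sorted(chains, key=lambda c: c.count("\n"), reverse=True)[:k]
    # Majority vote over the answers of the top-K complex chains.
    answers = [extract_answer(c) for c in top_k]
    return Counter(answers).most_common(1)[0][0]
```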

(TO BE CONTINUED)

If you are also interested in the first article of this series, please check the link below.
