A Short Summary of “ReWOO: Decoupling Reasoning from Observations
for Efficient Augmented Language Models”

Exploring the Potential of Decoupled Reasoning

Minh Le Duc
Jul 9, 2024

Introduction

Current Augmented Language Models (ALMs) are hitting a wall. Their reliance on external tools to gather observations enriches their reasoning but wastes time and resources. To truly unlock their potential, we need a new approach to reasoning.

ReWOO’s workflow: the Planner lays out interrelated plans for a given question before any tool is called. The plans direct Workers to use external tools and gather evidence. Finally, the plans and evidence are combined and passed to the Solver, which produces the answer.

ReWOO, which stands for Reasoning WithOut Observation, comes to save the day. It is a modular paradigm that decouples the reasoning process from external observations. By doing so, ReWOO greatly reduces token usage, repeated execution, and the computational overhead caused by duplicated prompts. As a result, ReWOO achieves higher accuracy than existing frameworks while being roughly five times more token-efficient on a multi-step reasoning benchmark.

One of the greatest things about ReWOO is its modular design. In system-design terms, the architecture gives each module a single responsibility, which pays off in adaptability, maintainability, and scalability. For example, one can quickly swap out a component (a search engine, a reasoning strategy, etc.) without harming the rest of the ReWOO paradigm. In addition, cutting token usage while maintaining the quality of the generated response makes ReWOO stand out against existing monolithic AI frameworks. ReWOO is therefore highly recommended for anyone who values elegant system design and is fascinated by AI.

Problem statement

Existing ALM systems support LLM reasoning by weaving in insights from different tools: a user’s prompt is wrapped with context prompts and exemplars before being fed into an LLM for reasoning. Because this happens at every step, the LLM receives the same context and exemplars many times over, leading to duplication.
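To see where the duplication comes from, here is a minimal Python sketch of such an observation-dependent loop; `llm` and `run_tool` are hypothetical stand-ins for a real model client and tool executor, not any particular library:

```python
def observation_dependent_loop(task, context, exemplars, llm, run_tool, max_steps=5):
    """ReAct-style loop: reason, act, observe, repeat."""
    history = ""
    for _ in range(max_steps):
        # Context and exemplars are re-sent on EVERY call, and the
        # history grows by one thought/action/observation triple per step.
        prompt = f"{context}\n{exemplars}\n{task}\n{history}"
        thought, action = llm(prompt)   # e.g. ("...", 'Search["query"]')
        if action == "Finish":
            return thought
        observation = run_tool(action)  # blocks until the external tool responds
        history += f"Thought: {thought}\nAction: {action}\nObservation: {observation}\n"
    return history
```

Every iteration pays the full cost of the context and exemplars again, which is exactly the redundancy ReWOO removes.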

Take the RAG system as an example. It supplies accurate, relevant information drawn from a personal document collection, reducing hallucinations, but feeding every retrieved document directly into the prompt causes prompt redundancy. Other frameworks, such as ReAct or RAT, use tools to improve reasoning based on external observations. However, those tools’ response times must be considered: a request can stall mid-pipeline, the tools can return duplicate results from the same searched sources, or, even worse, the connection can drop during the request/response phase and bring the whole pipeline down.

Therefore, we need a paradigm that is able to:

  • Reduce the token consumption caused by interacting with external tools, for instance, redundant prompts, repeated executions, etc.
  • Decouple each component so that if a single part of the process fails, the system can still function properly.
  • Maintain or improve the performance.

How ReWOO Works

ReWOO splits the core responsibilities of an ALM, namely step-wise reasoning, tool calls, and summarization, into three modules: Planner, Worker, and Solver. The Planner breaks a task into interdependent plans, each of which is assigned to a Worker. The Worker gathers external knowledge with tools to provide evidence. The Solver combines the plans and evidence into a complete solution to the original task.

More specifically, the three modules work as follows:

  • Planner uses the foreseeable reasoning of LLMs to create a solution blueprint. Concretely, the blueprint consists of sequential tuples (Plan, #E), where Plan is a descriptive message for the current step and #Es, indexed by step number s, is a placeholder token that the designated Worker[Instruction] for that step will later fill with evidence (see the example blueprint after this list).
  • Worker allows ReWOO to interact with the environment through tool-calls.
  • Solver examines all plans and evidence to compose a solution to the original task, such as answering a QA question or reporting the status of an action request.
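To make the (Plan, #E) format concrete, here is an illustrative blueprint the Planner might emit; the question, tool names, and wording are my own example, not taken from the paper:

```
Plan: Find which city hosted the 2024 Summer Olympics.
#E1 = Google["2024 Summer Olympics host city"]
Plan: Look up the population of that city.
#E2 = Wikipedia[#E1]
```

Note how the second plan’s instruction references #E1, so the Worker executing it can reuse the first step’s evidence without another LLM call.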
Difference between ReWOO and other monolithic frameworks. In (a) observation-dependent reasoning, the task requested by the user is first wrapped in context prompts and exemplars, then fed into an LLM to begin reasoning. The LLM produces a thought (T) and an action (A), then waits for an observation (O) from tools. The observation is appended to the prompt history to initiate the next LLM call. In ReWOO (b), the Planner generates a list of interconnected plans (P) and instructs Workers to retrieve evidence (E) from tools. The plans and evidence are combined with the task and passed to the Solver to produce the final answer. Note that in (a), the context and exemplars are fed into the LLM again and again, which quickly creates prompt redundancy.

ReWOO can handle complicated, multi-step tasks by referencing #Es from earlier steps in the instructions it gives to Workers, which is especially useful when later steps depend on earlier observations. Workers are responsible for populating the #Es with genuine evidence or observations according to the Planner’s blueprint. Prompting the Solver to use the provided plans and evidence “with caution” further improves ReWOO’s performance: the Solver can complete simple tasks on its own and partially compensate for Planner or Worker errors.
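Below is a minimal sketch of the Worker and Solver stages under stated assumptions: the blueprint has already been parsed into (description, evidence_id, tool, argument) tuples, and `call_tool` and `llm` are hypothetical helpers, not part of the paper’s code:

```python
import re

def run_workers(plans, call_tool):
    """Fill each #E placeholder with the evidence returned by its tool."""
    evidence = {}
    for description, eid, tool, raw_arg in plans:
        # Substitute earlier #E references (e.g. "#E1") with evidence that
        # has already been collected, so later steps can build on it.
        arg = re.sub(r"#E\d+",
                     lambda m: str(evidence.get(m.group(0), m.group(0))),
                     raw_arg)
        evidence[eid] = call_tool(tool, arg)
    return evidence

def solve(task, plans, evidence, llm):
    """Hand the task, all plans, and all evidence to the LLM exactly once."""
    body = "\n".join(f"Plan: {d}\n{eid} = {evidence[eid]}"
                     for d, eid, _, _ in plans)
    prompt = (f"Solve the following task. Use the plans and evidence "
              f"with caution.\nTask: {task}\n{body}\nAnswer:")
    return llm(prompt)
```

If a tool call fails, only the affected #E stays unfilled; the Solver can still attempt an answer from the remaining evidence, which fits the tool-failure robustness discussed below.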

Further developments and Conclusion

ReWOO is a modular ALM framework that solves multi-step reasoning problems efficiently by separating reasoning from tool calls and observations. Comprehensive studies on both public NLP benchmarks and curated tasks show that ReWOO outperforms other methods, improving performance while using far fewer tokens. A side study shows that ReWOO degrades gracefully when a tool fails. The authors’ analysis also points to the possibility of offloading general reasoning through instruction tailoring and specialization.

From my perspective, ReWOO should become the standard for building AI systems that prioritize maintainability and adaptability while preserving accuracy. By decoupling the reasoning process into several components, ReWOO lets those components run independently as services. This makes ReWOO pretty SOLID. These services can be represented as nodes, so the whole system can be viewed as a directed acyclic graph, which makes it observable and practical to operate.
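To make the DAG picture tangible, here is a tiny sketch (my own illustration, not from the paper): the dependency edges can be read straight off the #E references inside each plan’s instruction:

```python
import re

# Illustrative parsed blueprint: evidence id -> tool instruction.
plans = {
    "#E1": 'Google["2024 Summer Olympics host city"]',
    "#E2": "Wikipedia[#E1]",
}

# A plan that mentions #Ek depends on the plan that produces #Ek.
dag = {eid: re.findall(r"#E\d+", instr) for eid, instr in plans.items()}
print(dag)  # {'#E1': [], '#E2': ['#E1']}
```

Plans with no unmet dependencies could even be dispatched to Workers in parallel, which is one practical payoff of the graph view.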

Future directions for ReWOO-based ALM systems include modular LLM fine-tuning, tool representation learning, and system graph learning and optimization. The authors show that their work provides a strong foundation for these developments, bringing us closer to fully scalable AGI.

Thank you for reading this article; I hope it added something to your knowledge bank! Just before you leave:

👉 Be sure to clap and follow me. It would be a great motivation for me.

👉 More details at: Substack

👉Follow me: LinkedIn | Github

References

  1. Xu et al. — ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models — URL: https://arxiv.org/abs/2305.18323
  2. Lewis et al. — Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks — URL: https://arxiv.org/abs/2005.11401
  3. Yao et al. — ReAct: Synergizing Reasoning and Acting in Language Models — URL: https://arxiv.org/abs/2210.03629
  4. Wang et al. — RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation — URL: https://arxiv.org/abs/2403.05313v1
  5. Wei et al. — Chain-of-Thought Prompting Elicits Reasoning in Large Language Models — URL: https://arxiv.org/abs/2201.11903

Prompt Redundancy Reduction

The paper devotes an entire section to making this argument precise. We tend to use AI tools such as ChatGPT or Claude as black boxes, and for developers these services charge by the number of input/output tokens, so overlooking prompt redundancy can cost a fortune.
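As a rough, back-of-the-envelope illustration (the numbers below are invented for this sketch, not taken from the paper), compare the prompt tokens each paradigm consumes:

```python
# Toy token accounting; all four numbers are made up for illustration.
c, t, k, step = 1000, 100, 5, 80  # context+exemplars, task, steps, tokens per T/A/O triple

# Observation-dependent reasoning: context, exemplars, the task, and a
# growing history are re-sent on every one of the k LLM calls.
interleaved = sum(c + t + i * step for i in range(k))

# ReWOO: the Planner reads context + task once; the Solver reads the
# task plus all k plans and their evidence once.
rewoo = (c + t) + (t + k * step)

print(interleaved, rewoo)  # 6300 vs. 1600: roughly a 4x saving here
```

The gap widens with more steps and larger contexts, which is how ReWOO reaches the roughly fivefold token savings reported on the multi-step benchmark.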
