Agentic workflows vs. using LLMs in zero-shot mode

--

Current agentic-workflow design patterns:

  • Reflection: The LLM examines its own work to come up with ways to improve it.
  • Tool use: The LLM is given tools such as web search, code execution, or any other function to help it gather information, take action, or process data.
  • Planning: The LLM comes up with, and executes, a multistep plan to achieve a goal (for example, writing an outline for an essay, then doing online research, then writing a draft, and so on).
  • Multi-agent collaboration: Multiple AI agents work together, splitting up tasks and discussing and debating ideas, to arrive at better solutions than a single agent would. A multi-agent programming framework: https://microsoft.github.io/autogen/

In this post, we analyze recent papers on the first two of these design patterns:

Design pattern 1: Reflection

The LLM examines its own work to come up with ways to improve it.

1. Self-Refine: Iterative Refinement with Self-Feedback, by Madaan et al. (2023): https://arxiv.org/pdf/2303.17651.pdf

  • Concept: This paper presents the SELF-REFINE framework, which improves the output of large language models (LLMs) through iterative refinement using self-generated feedback, without needing external data or additional training.
  • Application and Impact: It demonstrates wide applicability across tasks such as dialogue generation, code optimization, and mathematical reasoning, showing significant improvements. For entrepreneurs, this approach offers a method to enhance the quality and reliability of AI-generated content in a cost-effective manner.
  • Significant Improvements in Output Quality: Across all tasks evaluated, outputs generated with SELF-REFINE are consistently preferred over direct generation from the same models, with approximately 20% absolute improvement in task performance on average (a minimal sketch of the refinement loop follows this list).
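
Concretely, the generate-critique-refine cycle can be expressed in a few lines. A hedged sketch: the generic `llm` callable stands in for whatever model you use, and the prompt wording and the STOP convention are our own illustrative choices, not the paper's exact prompts.

```python
from typing import Callable

def self_refine(task: str, llm: Callable[[str], str], max_iters: int = 3) -> str:
    """Iteratively improve a draft using the model's own feedback."""
    draft = llm(f"Complete the following task:\n{task}")
    for _ in range(max_iters):
        # Ask the same model to critique its own output.
        feedback = llm(
            f"Task: {task}\nDraft:\n{draft}\n\n"
            "Give concrete feedback on this draft. "
            "If nothing needs to change, reply exactly STOP."
        )
        if feedback.strip() == "STOP":
            break
        # Feed the critique back in and rewrite.
        draft = llm(
            f"Task: {task}\nDraft:\n{draft}\nFeedback:\n{feedback}\n\n"
            "Rewrite the draft so it addresses the feedback."
        )
    return draft
```

Note that no extra training is involved: the same frozen model plays both author and critic, which is what makes the approach cheap to adopt.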

2. Reflexion: Language Agents with Verbal Reinforcement Learning, by Shinn et al. (2023): https://arxiv.org/pdf/2303.11366.pdf

  • Concept: Reflexion introduces a novel reinforcement learning framework where LLMs generate natural language feedback as a form of self-correction, promoting better decision-making in interactive environments.
  • Application and Impact: The framework’s versatility across tasks including coding challenges and language reasoning highlights its potential for more adaptive and intuitive AI applications. This could be particularly valuable for startups building AI solutions that learn and evolve through verbal interaction, mimicking human learning (a minimal trial-and-reflect sketch follows).
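
As a rough illustration of that trial-and-reflect loop: the `evaluate` callable below is an assumed stand-in for whatever external success signal exists (unit tests for code, for example), and the prompts and memory format are ours, not the paper's.

```python
from typing import Callable

def reflexion(task: str,
              llm: Callable[[str], str],
              evaluate: Callable[[str], bool],
              max_trials: int = 3) -> str:
    """Retry a task, storing verbal self-reflections between failed trials."""
    reflections: list[str] = []  # episodic memory of lessons learned
    attempt = ""
    for _ in range(max_trials):
        lessons = "\n".join(reflections)
        attempt = llm(f"Lessons from earlier attempts:\n{lessons}\n\nTask: {task}")
        if evaluate(attempt):  # external success signal, e.g. unit tests
            return attempt
        # Verbal "reinforcement": describe the failure in natural language.
        reflections.append(llm(
            f"Task: {task}\nFailed attempt:\n{attempt}\n\n"
            "In one or two sentences, explain what went wrong and how to avoid it."
        ))
    return attempt
```

The "reinforcement" here is purely verbal: no weights are updated, only the natural-language memory that conditions the next trial.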

3. CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing, by Gou et al. (2024): https://arxiv.org/pdf/2305.11738.pdf

  • Concept: CRITIC enables LLMs to interact with external tools for validating and refining their outputs, addressing issues like inaccurate facts, flawed code, or offensive content. This process mirrors human methods of using resources for fact-checking and improvement.
  • Application and Impact: By demonstrating improvements in tasks such as question answering, code synthesis, and toxicity reduction, CRITIC offers a pathway to developing more reliable and accurate AI services. This framework could serve as a foundation for entrepreneurs to build AI applications that continuously improve through interaction with external tools, offering a competitive advantage in producing high-quality content (a verify-and-correct sketch follows).
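
Below is the verify-and-correct idea with web search as the external tool. A sketch under stated assumptions: the `search` callable and the OK convention are ours for illustration; the paper evaluates several tools, including search engines and code interpreters.

```python
from typing import Callable

def critic_correct(question: str,
                   llm: Callable[[str], str],
                   search: Callable[[str], str],
                   max_rounds: int = 2) -> str:
    """Draft an answer, fact-check it with an external tool, and revise."""
    answer = llm(f"Answer this question:\n{question}")
    for _ in range(max_rounds):
        # Let the model decide what to verify.
        query = llm(f"Question: {question}\nAnswer: {answer}\n"
                    "Write one search query to fact-check this answer.")
        evidence = search(query)  # external tool call, e.g. a web search API
        critique = llm(
            f"Question: {question}\nAnswer: {answer}\nEvidence:\n{evidence}\n\n"
            "Does the evidence support the answer? "
            "Reply exactly OK if it does; otherwise explain the error."
        )
        if critique.strip() == "OK":
            break
        answer = llm(
            f"Question: {question}\nFlawed answer: {answer}\n"
            f"Critique: {critique}\nEvidence:\n{evidence}\n\n"
            "Write a corrected answer."
        )
    return answer
```

The key difference from pure self-reflection is the `evidence` step: the critique is grounded in an external source rather than in the model's own beliefs.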

Design pattern 2: Tool Use

The LLM is given tools such as web search, code execution, or any other function to help it gather information, take action, or process data.

1. Gorilla: Large Language Model Connected with Massive APIs, by Patil et al. (2023): https://arxiv.org/pdf/2305.15334.pdf

  • Objective: Address the challenge of using large language models (LLMs) such as GPT-4 for tasks that require interacting with external software tools through Application Programming Interfaces (APIs).
  • Solution: Gorilla is fine-tuned on a dataset designed specifically for API calls, which significantly improves the accuracy of the generated calls compared to standard LLM outputs, including producing the correct input arguments and adhering to the syntax the API requires. Gorilla is paired with a document retriever that lets it consult the most current API documentation during generation.
  • Impact: Because it can adapt to changes in API documentation, the calls Gorilla generates stay up to date (a retrieval-aware sketch follows).
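
The retrieval-aware idea in a few lines. Two loudly labeled assumptions: `retrieve_docs` is a hypothetical stand-in for a document retriever (for example, a vector-store lookup over API docs), and the generic `llm` call simplifies Gorilla itself, which is a fine-tuned model rather than a prompted one.

```python
from typing import Callable

def generate_api_call(request: str,
                      llm: Callable[[str], str],
                      retrieve_docs: Callable[[str], str]) -> str:
    """Ground the generated call in current API docs, not the model's memory."""
    docs = retrieve_docs(request)  # e.g. top-k API doc snippets for the request
    return llm(
        "Using ONLY the API documentation below, write the exact call "
        "(function name and arguments) that satisfies the request.\n\n"
        f"Documentation:\n{docs}\n\nRequest: {request}"
    )
```

Because the documentation is fetched at generation time, a renamed parameter or deprecated endpoint shows up in the prompt instead of silently breaking the call.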

2. MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action, by Yang et al. (2023): https://arxiv.org/pdf/2303.11381.pdf

  • Objective: To enable ChatGPT to perform multimodal reasoning and take actions based on both text and visual inputs, addressing the limitation of text-only processing.
  • Solution: MM-REACT introduces a framework where ChatGPT interacts with a set of ‘vision experts,’ external models specialized in processing and interpreting visual data. Through this interaction, ChatGPT can understand and respond to prompts that include images, diagrams, or other visual elements.
  • Impact: This approach broadens the application of ChatGPT to tasks that require understanding beyond text, such as image description, visual question answering, and actions based on visual cues, enhancing its utility in more diverse and realistic scenarios.
One example: converting a YouTube video to text with relevant “screenshots.” A stripped-down expert-dispatch sketch follows.
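
In this sketch, the CALL/FINAL protocol, the `experts` dictionary, and the prompt wording are all illustrative assumptions; MM-REACT’s actual prompt design is more elaborate.

```python
from typing import Callable

def mm_react(question: str,
             image_path: str,
             llm: Callable[[str], str],
             experts: dict[str, Callable[[str], str]],
             max_steps: int = 5) -> str:
    """A text-only LLM answers a visual question by delegating to vision experts."""
    history = (
        f"You cannot see images. Available experts: {', '.join(experts)}.\n"
        "Reply 'CALL <expert>' to invoke one, or 'FINAL <answer>' when done.\n"
        f"Question about the image at {image_path}: {question}"
    )
    reply = ""
    for _ in range(max_steps):
        reply = llm(history).strip()
        if reply.startswith("FINAL"):
            return reply.removeprefix("FINAL").strip()
        if reply.startswith("CALL"):
            name = reply.removeprefix("CALL").strip()
            expert = experts.get(name, lambda _: f"no expert named {name}")
            # Fold the expert's textual output back into the conversation.
            history += f"\n{reply}\nObservation from {name}: {expert(image_path)}"
    return reply
```

An `experts` dictionary might map "caption" to an image-captioning model and "ocr" to a text-recognition model, both returning plain text the language model can reason over.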

3. Efficient Tool Use with Chain-of-Abstraction Reasoning, by Gao et al. (2024): https://arxiv.org/pdf/2401.17464.pdf

  • Objective: Many real-world tasks require multi-step reasoning, where the output of one step is the input to the next. For instance, solving a math problem might involve several calculations, each depending on the previous one, and finding a piece of information might require a series of searches. When tool-augmented LLMs such as Toolformer attempt such tasks, they may need to call external tools (APIs) for specific information or computations. Managing these tool calls efficiently within a multi-step reasoning process is challenging: LLMs may not plan the calls effectively, leading to redundant or unnecessary queries that waste computational resources and time.
  • Solution: The Chain-of-Abstraction (CoA) reasoning method trains LLMs to generate abstract reasoning chains with placeholders. Once the chain is defined, these placeholders are then filled with specific knowledge obtained from external tools, enabling the model to execute complex reasoning tasks effectively.
  • Impact: CoA reasoning allows LLMs to plan their use of external tools more strategically, reducing the computational cost and time delays associated with tool calls. It also enhances the accuracy of the LLMs’ outputs by ensuring that each step of the reasoning process is grounded in reliable external data (a placeholder-filling sketch follows this list).
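
Finally, the placeholder-filling sketch. The `[TOOL:name:query]` format and the `tools` dictionary are assumptions made for this illustration; the paper fine-tunes models to produce abstract chains rather than relying on prompting alone.

```python
import re
from typing import Callable

def chain_of_abstraction(question: str,
                         llm: Callable[[str], str],
                         tools: dict[str, Callable[[str], str]]) -> str:
    """Plan an abstract reasoning chain, then ground its placeholders with tools."""
    chain = llm(
        f"Question: {question}\n"
        "Write the reasoning steps, marking every fact or computation you need "
        "as [TOOL:<name>:<query>] instead of guessing the value."
    )

    def resolve(match: re.Match) -> str:
        name, query = match.group(1), match.group(2)
        tool = tools.get(name, lambda q: f"[no tool named {name}]")
        return tool(query)

    # Fill every placeholder, e.g. [TOOL:calculator:12*7], with a tool result.
    grounded = re.sub(r"\[TOOL:(\w+):([^\]]+)\]", resolve, chain)
    return llm(f"Question: {question}\nGrounded reasoning:\n{grounded}\n"
               "State the final answer.")
```

Because the whole chain is planned before any tool runs, the placeholder queries can be batched or executed in parallel, which is where the efficiency gains come from.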

--