Exploring the LLM Landscape: From Parametric Memory to Agent-Oriented Models

Deepak Babu P R
4 min read · Jun 28, 2023


LLMs are evolving fast: a new model appears on the leaderboards[1] almost every week, beating the previous SoTA on multiple NLP benchmarks. There are multiple architectures, each with its own nuances in training and dataset generation. This post attempts to broadly categorize LLMs, paint a picture of the different ways to adopt them for use-cases, and discuss the pros and cons of each approach. Note that the distinction among these categories is not crystal-clear; the boundaries blur. For example, you could have a parametric LLM acting as an agent, or a parametric LLM trained in the instruct format.

Parametric Memory LLMs: Self-Reliant Titans of Information

Parametric memory LLMs are akin to colossal knowledge repositories, encapsulating a world of information within their intricate neural network structures. These self-reliant models, devoid of external memory dependencies, store and retrieve knowledge through fixed weights within the network. This trait enables them to scale seamlessly with increasing parameter counts, achieving state-of-the-art accuracy on many NLU/NLI tasks.

Prominent examples of these titan models include Google’s PaLM (Pathways Language Model) with a staggering 540B parameters, GPT-3 housing 175B parameters, and the relatively lighter Chinchilla[2] carrying 70B parameters. Though their monumental size poses challenges for inference, the AI community has responded with dexterity. We’re witnessing a promising trend of smaller, yet powerful models that match their larger counterparts in performance, thanks to strategic techniques like self-instruct and parameter-efficient training. Nonetheless, these models do possess their unique quirks, such as their propensity to ‘hallucinate’ or fabricate compelling yet false facts — a challenge yet to be overcome.

👍 No dependency on external modules; knowledge stored in the parameters benefits reasoning.

👎 Tendency to hallucinate, since knowledge is hidden in the parameters with no way to verify it against a source.

👎 Updating the model with recent events and facts requires full pretraining, which is expensive.
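To make the inference challenge concrete, here is a back-of-the-envelope sketch (my own illustration, not from the original post) of the memory needed just to hold the weights of these models in fp16:

```python
def inference_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Rough memory to hold the weights alone (fp16 = 2 bytes/param),
    ignoring activations and KV-cache overhead."""
    return n_params * bytes_per_param / 1e9

# Weights-only footprint for the models mentioned above:
for name, n in [("PaLM-540B", 540e9), ("GPT-3-175B", 175e9), ("Chinchilla-70B", 70e9)]:
    print(f"{name}: ~{inference_memory_gb(n):.0f} GB in fp16")
```

Even at 2 bytes per parameter, PaLM-540B needs on the order of a terabyte of accelerator memory before any serving overhead, which is why smaller models and parameter-efficient techniques are so attractive.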

Non-Parametric or External Memory LLMs: Harnessing External Memory for Freedom

In contrast to their parametric counterparts, non-parametric LLMs ingeniously leverage external memory resources, liberating themselves from the constraints of their internal memory. This innovative approach allows these models to remain streamlined and current without necessitating constant retraining and gradient updates — a significant advantage that drastically reduces model hallucinations and ensures more reliable outputs.

However, every innovation brings its own set of challenges. In this instance, the added complexity of maintaining a supplementary retrieval model is an inevitable trade-off. We’re exploring various paradigms to manage this complexity, including ‘frozen’ LLM and plug-and-play KB techniques, and the ground-breaking Retrieval Augmented Generation (RAG) approach.

👍 Reduced hallucination. Can be smaller LLMs, since knowledge is outsourced to an external memory fetched at inference time.

👍 LLMs can preserve freshness without retraining, since knowledge is decoupled in the form of an external index.

👎 Makes the architecture more complex, as it needs retrieval from an external index.
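A minimal sketch of the RAG idea described above, with a toy in-memory corpus and naive word-overlap scoring standing in for a real retriever and vector index (the corpus, scoring, and prompt template are all illustrative assumptions, not any specific library's API):

```python
# Toy retrieval-augmented generation (RAG) sketch.
CORPUS = [
    "Chinchilla is a 70B-parameter LLM trained by DeepMind.",
    "PaLM is a 540B-parameter LLM trained by Google.",
    "RAG couples a retriever with a generator to ground answers.",
]

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank documents by naive word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(query: str) -> str:
    """Fetch supporting context at inference time and ground the LLM on it."""
    context = "\n".join(retrieve(query, CORPUS))
    return f"Answer using only the context below.\nContext: {context}\nQuestion: {query}"

print(build_prompt("Who trained PaLM"))
```

Because knowledge lives in the corpus rather than in the weights, updating a fact means editing a document in the index, not retraining the model.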

LLM as Agents: A New Age of Reasoning and Control

The AI research landscape is abuzz with an emerging and exciting prospect — LLMs as autonomous agents proficient in planning and control. The concept of an LLM agent capable of breaking down complex tasks into component questions and actions has unleashed a world of possibilities.

Innovations such as ReAct’s[3] LLM agent, or the various interpretations presented by Toolformer, WebGPT, and DSP, are illuminating the way forward. These trailblazers are setting the stage for LLMs that can emulate human-like reasoning, invoke complex tools like Python code interpreters or mathematical calculators, and align with human values for a more dependable and meaningful response.

👍 Human-like. Outsources what is hard for an LLM to do, significantly reducing hallucinations: calculators for math, a Python interpreter or other models for puzzles.

👍 Improved reasoning gives the LLM the ability to accomplish tasks with super-human performance, pushing LLMs towards AGI.

👎 Needs all tools to be available as APIs. With a large number of tools, the agent can run into context-length limitations, forcing supervised fine-tuning (SFT), for which data collection can be expensive.
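The agent loop can be sketched as follows. The stubbed “LLM”, the Action/Observation syntax, and the calculator tool are illustrative stand-ins of my own, not ReAct’s exact format:

```python
import re

def calculator(expr: str) -> str:
    # Restricted eval on arithmetic only; a real system needs a safer parser.
    if not re.fullmatch(r"[0-9+\-*/(). ]+", expr):
        raise ValueError("non-arithmetic input")
    return str(eval(expr))

TOOLS = {"calculator": calculator}

def stub_llm(transcript: str) -> str:
    # Stand-in for a real LLM policy: ask for one calculation, then answer.
    if "Observation:" not in transcript:
        return "Action: calculator[23 * 7]"
    obs = transcript.rsplit("Observation: ", 1)[1]
    return f"Final Answer: {obs}"

def run_agent(question: str) -> str:
    transcript = f"Question: {question}"
    for _ in range(5):  # cap the reason/act loop
        step = stub_llm(transcript)
        m = re.match(r"Action: (\w+)\[(.+)\]", step)
        if m:  # dispatch to the named tool, feed the result back
            obs = TOOLS[m.group(1)](m.group(2))
            transcript += f"\n{step}\nObservation: {obs}"
        else:
            return step.removeprefix("Final Answer: ")
    return "gave up"

print(run_agent("What is 23 * 7?"))  # → 161
```

The key idea is that the model never does the arithmetic itself: it emits a tool call, the controller executes it, and the observation is appended to the transcript for the next reasoning step.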

Instruct Models: Charting New Courses with Human Instructions

Instruct models[4], though not a distinct architecture, are causing quite a stir. They represent a novel data paradigm that is revolutionizing our interaction with LLMs. By formulating tasks as human instructions and fine-tuning LLMs to heed these directives, we have been able to create versatile models. These models not only generalize across tasks without explicit programming but also maintain alignment with human expectations and values — an exciting prospect for the future of AI.
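A sketch of what formulating a task as a human instruction might look like, loosely in the style of FLAN/Alpaca-type datasets (the field names and prompt template here are assumptions for illustration, not any specific dataset’s schema):

```python
def format_example(instruction: str, input_text: str, output: str) -> str:
    """Render one instruction-tuning record as a training prompt."""
    prompt = f"### Instruction:\n{instruction}\n"
    if input_text:  # some tasks have no separate input field
        prompt += f"### Input:\n{input_text}\n"
    prompt += f"### Response:\n{output}"
    return prompt

record = {
    "instruction": "Translate the sentence to French.",
    "input": "The cat sleeps.",
    "output": "Le chat dort.",
}
print(format_example(record["instruction"], record["input"], record["output"]))
```

Because every task (translation, summarization, QA, ...) is expressed in the same instruction format, one fine-tuned model can generalize across them without task-specific heads or programming.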

Needless to say, LLMs are changing reference architectures for search/recommendation engines, databases, and web and app development, with components like vector DBs, ensembles of LLMs, tools/APIs, and prompt engines becoming central pieces in the software stack of a production system. We will see this accelerate in the next few months as new reference architectures evolve. I anticipate new areas emerging around latency optimization, cost-per-token optimization, and training and inference efficiency, topics typically treated as an afterthought, taking centre stage.

What are you working on? What LLM and software architectures are you considering? Feel free to share in the comments.

[1] https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
[2] https://arxiv.org/abs/2203.15556
[3] https://arxiv.org/abs/2210.03629
[4] https://arxiv.org/abs/2109.01652


Deepak Babu P R

Principal Scientist | ML/AI, NLP, IR and speech | love travelling, reading, trekking and photography. https://prdeepakbabu.github.io/