LLM Bootcamp Notes — Part 2: Augmented Language Models

Anirudh Gokulaprasad
Sep 16, 2023


The LLM Bootcamp series by FullStackDeepLearning offers great insights into the world of Generative AI by taking a very structured approach to the topic. The series goes from introducing LLMs all the way up to approaches and recommendations for production-grade LLM & Generative AI solutions.

I have consolidated my learning notes in a series of 4 articles, distilling them into the 4 core focus areas for LLMs from the Bootcamp series — this part 2 article is on Augmented Language Models.

What can Language models do?

LLMs are designed for general reasoning, not specific knowledge. Therefore, when it comes to highly specialised or specific queries, they tend not to perform well — so what are they good and bad at?

What are LLMs good at?

  • Understanding language — this includes semantics, sentence patterns, entities and anything else linguistic
  • Following a defined instruction
  • Basic reasoning queries
  • Understanding a piece of code

What are LLMs bad at?

  • Up-to-date knowledge — an LLM’s knowledge is limited to its training data; anything that happened after that cutoff is unknown to the model
  • Knowledge of external data — any data the model was not trained on can be considered external data
  • Challenging reasoning — e.g., an aptitude or olympiad problem
  • Interacting with the real world — no native, live connection exists between the model and the real world

One way to improve on the weaknesses of LLMs is by leveraging their context window — the context window is the number of tokens the model can take in a single exchange.

The context window is a way to give the model relevant or up-to-date information to use when answering a specific question. However, the context window is of limited size. Even though window sizes have recently grown from 4K tokens all the way up to 100K tokens, this can still be very small relative to a corpus of tweets, emails etc.
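To get a feel for how quickly a window fills up, here is a small sketch that counts tokens with OpenAI's tiktoken tokenizer; the model name is just an illustrative choice.

```python
# Counting tokens to see how much of a context window a document consumes.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")  # illustrative model choice

document = "A long email thread or a batch of tweets... " * 500
num_tokens = len(enc.encode(document))

print(f"Document uses {num_tokens} tokens")
# With a 4K-token window, the prompt, the supporting documents and the answer
# must all fit, so a large corpus cannot simply be pasted into the prompt.
```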

How to make the most of a limited context window by augmenting the language model?

  • Retrieval Augmentation — augment with a data corpus to use as the baseline for the model
  • Chains — augment with more LLM calls
  • Tools — augment with outside sources

Retrieval Augmentation

Why retrieval augmentation? — If we want our models to have access to data about users, one way is to put the data directly into the query; if the number of users is huge, we instead retrieve the relevant users from a document or database based on the query.
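As a minimal sketch of the idea (the retrieve_users and call_llm helpers below are hypothetical placeholders, not part of the bootcamp material), the retrieved records are simply placed into the prompt:

```python
# Retrieval augmentation sketch: fetch only the relevant user records
# and place them in the prompt, instead of pasting the whole user table.

def retrieve_users(query: str, k: int = 3) -> list[str]:
    # Placeholder: in practice this would query a document store or database.
    return ["Alice: signed up 2021, premium plan",
            "Bob: signed up 2023, free plan"][:k]

def call_llm(prompt: str) -> str:
    return "..."  # placeholder for a real LLM API call

query = "Which users are on the premium plan?"
context = "\n".join(retrieve_users(query))
prompt = f"Use the following user records to answer.\n\n{context}\n\nQuestion: {query}"
answer = call_llm(prompt)
```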

Traditional information retrieval

  • Query: Formal statement of information needed
  • Object: Entity inside your content collection — EG: document
  • Relevance: Measure of how well an object satisfies the information need
  • Ranking: Ordering of the relevant results by desirability

Limitations of sparse traditional search:

  • Only models simple word frequencies — complex relationships are difficult to capture
  • Does not capture the semantics of the document or query, instead giving a heuristic-based answer (illustrated in the sketch below)
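A small sketch of sparse retrieval with scikit-learn's TfidfVectorizer shows the limitation: a document that is relevant but uses different words scores poorly.

```python
# Sparse, word-frequency-based retrieval with TF-IDF.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "How to reset your password",
    "Steps to change account credentials",
    "Weekly sales report for Q3",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)
query_vector = vectorizer.transform(["forgot my password"])

scores = cosine_similarity(query_vector, doc_vectors)[0]
print(scores)
# "Steps to change account credentials" scores near zero despite being
# relevant, because it shares no words with the query.
```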

AI-based retrieval using embeddings

Search and AI make each other better: search can put better information into the context, while AI can create better representations of the data being searched.

Embeddings:

Embeddings are abstract, dense, compact, fixed-size, learned representations of data. They can represent any kind of data — e.g., tabular, audio, text — and can be stored in a concise and maintainable way.

What makes a good embedding?

  • Utility for downstream tasks — useful for a specific task or across a broad benchmark
  • Similar things must be close together in vector space

Embeddings to know:

  • Word2Vec: the original embedding — tries to predict a word based on the context of its surrounding words
  • Sentence transformers: extracted from a transformer-based architecture, they give solid baseline embeddings and are cheap and fast. Eg: S-BERT embeddings (see the sketch after this list)
  • A multi-modal option: CLIP — a good option for multi-modal image-text use cases
  • Current preferred option: OpenAI embeddings — they give near state-of-the-art (SOTA) performance. Use “text-embedding-ada-002”
  • SOTA: Instructor — prepend the task description to the text and then embed it. At embedding time, describe your task to get task-specific embeddings
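As a minimal example of computing dense embeddings with the sentence-transformers library (the model name below is an illustrative choice, not one prescribed by the bootcamp):

```python
# Dense sentence embeddings with sentence-transformers.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

sentences = ["The cat sat on the mat", "A feline rested on the rug"]
embeddings = model.encode(sentences)  # fixed-size dense vectors, shape (2, 384)

print(embeddings.shape)
```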

Finding relevant objects with embeddings

We have all the embeddings required for the use case. But how do we know if the retrieved embeddings are relevant to the question or task at hand? That’s where similarity scores come in: a similarity score between two embeddings measures how close or far apart they are.
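A common choice is cosine similarity; a minimal sketch with NumPy:

```python
# Cosine similarity between two embedding vectors:
# values near 1 mean very similar, values near 0 mean unrelated.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

emb_query = np.array([0.1, 0.8, 0.3])
emb_doc = np.array([0.2, 0.7, 0.4])

print(cosine_similarity(emb_query, emb_doc))  # close to 1 -> semantically similar
```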

Embedding Databases

Naively searching over raw vectors is not great for production — therefore, we can use an embedding database to do the heavy lifting. A regular DB might work for simpler use cases but may not scale to complicated queries or the highest volumes.

The dream/ideal workflow is to:

  • Dump a bunch of data → Ask a query → Retrieve relevant documents

The challenges here are:

  • Dumping in data — scale & reliability of the DB, document splitting & embedding management
  • Asking a query — the language of the query
  • Retrieving relevant docs — which search algorithm to use

There are many purpose-built embedding/vector databases such as DeepLake, Pinecone, Weaviate, ChromaDB etc. that solve a lot of these pain points.
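As a small sketch of the dump/query/retrieve workflow using ChromaDB's in-memory client (relying on its default embedding model; an illustrative choice, not a bootcamp recommendation):

```python
# "Dump data -> ask a query -> retrieve relevant docs" with ChromaDB.
import chromadb

client = chromadb.Client()
collection = client.create_collection(name="notes")

# Dump a bunch of data; Chroma embeds it with its default model.
collection.add(
    documents=["Reset your password in account settings",
               "Q3 sales grew 12% year over year"],
    ids=["doc1", "doc2"],
)

# Ask a query and retrieve the most relevant documents.
results = collection.query(query_texts=["how do I change my password?"], n_results=1)
print(results["documents"])
```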

Chains

Retrieval-based models have a limitation — if the answer to the query is not within the retrieved documents, the model cannot give a relevant answer. Also, sometimes the best context for your LLM does not exist directly in the corpus; instead, the best context might come from the output of another LLM.

In such cases, chaining can be used, where the output of one LLM call becomes the input to another.

There are multiple patterns for building the chains:

  • The QA pattern — answer the question using documents retrieved via embedding similarity
  • Hypothetical Document Embeddings (HyDE) — generate a hypothetical base document that is used as the reference for the rest of the QA chain (see the sketch after this list)
  • Summarization — produce a summary of the retrieved documents as the output
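A minimal sketch of the HyDE pattern, where call_llm, embed and search_by_embedding are hypothetical placeholders rather than a specific library API:

```python
# HyDE sketch: one LLM call writes a hypothetical answer document, whose
# embedding is then used to retrieve real documents for a grounded QA call.

def call_llm(prompt: str) -> str:
    return "..."  # placeholder for a real LLM API call

def embed(text: str) -> list[float]:
    return [0.0]  # placeholder for an embedding model

def search_by_embedding(vector: list[float], k: int = 3) -> list[str]:
    return ["retrieved doc 1", "retrieved doc 2"]  # placeholder vector search

question = "What changed in our refund policy this year?"

# Step 1: generate a hypothetical document that might answer the question.
hypothetical_doc = call_llm(f"Write a short passage answering: {question}")

# Step 2: retrieve real documents that are close to the hypothetical one.
context = "\n".join(search_by_embedding(embed(hypothetical_doc)))

# Step 3: answer the question grounded in the retrieved documents.
answer = call_llm(f"Context:\n{context}\n\nAnswer the question: {question}")
```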

Tools

Tools are a powerful way of integrating existing purpose-built capabilities with LLMs. Different tools can be integrated with the LLM, like Wikipedia, PyPDF, a search engine etc.

LangChain is an excellent framework for building chaining-based and tool-based applications.

Chains Vs Plugins:

  • In a chaining approach, we design the interaction pattern between the language model and the tool by passing queries through a series of steps — chains are good for reliability and consistency in problem-solving
  • In a plugin-based approach, the language model itself decides whether or not to use a tool — plugins are good for flexibility and interactivity (see the sketch after this list)
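A minimal sketch of the plugin-style pattern, with call_llm and search_wikipedia as hypothetical placeholders: the model is told which tools exist and replies either with a final answer or with a request to call a tool.

```python
# Plugin-style sketch: the model decides whether to call a tool.

def search_wikipedia(query: str) -> str:
    return "Wikipedia summary for: " + query  # placeholder tool

def call_llm(prompt: str) -> str:
    # Placeholder: a real model would return either a final answer
    # or a directive such as "TOOL: wikipedia | <query>".
    return "TOOL: wikipedia | Eiffel Tower height"

question = "How tall is the Eiffel Tower?"
response = call_llm(
    f"You may answer directly or reply 'TOOL: wikipedia | <query>'.\n{question}"
)

if response.startswith("TOOL: wikipedia"):
    tool_query = response.split("|", 1)[1].strip()
    observation = search_wikipedia(tool_query)
    final = call_llm(f"Observation: {observation}\nNow answer: {question}")
else:
    final = response
```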

Footnotes

To read the part 1 article of this series — Click here

To read the part 3 article of this series — Click here

To read the part 4 article of this series — Click here

Note — This article is a distilled consolidation of my understanding of the topic. If you find any conceptual errors, please leave feedback so that I can fix them. Cheers!
