Unlocking the Potential of LLMs: Content Generation, Model Invocation and Training Patterns

Gopi Krishnamurthy
11 min read · Dec 29, 2023


Large language models (LLMs) are gaining tremendous popularity thanks to their uncanny ability to understand and generate human language. I have been working with multiple customer teams, and as adoption of LLMs accelerates, more teams are looking to integrate them into real-world applications.

However, developing with LLMs involves much more than just generating text from a single prompt. There are architectural patterns emerging around how to best leverage, customize, and apply LLMs. Understanding these patterns unlocks the true capability of language models.

In this blog post, we will explore the most salient LLM patterns in three key areas:

  1. Model Content Generation Patterns — Approaches to producing outputs from freeform text to structured data
  2. Model Invocation Patterns — Techniques for querying LLMs in dynamic and nuanced ways at runtime
  3. Model Training Patterns — Methods for adapting general LLMs into specialized experts

While the content generation patterns describe different ways LLMs produce content, the model invocation and model training patterns address different ways to infuse domain-specific knowledge into the models. Together, they address two major concerns with LLMs:

  1. LLMs are trained at a point in time. They are unaware of shifts in domain data that occur after training (e.g., current affairs).
  2. LLMs are trained on datasets that may not be reflective of your domain. They may not understand domain-specific terminology or concepts.

Model invocation patterns and training patterns address these concerns in different ways. The first injects the required data as part of the context; it involves no training and no changes to the model's weights. Training patterns extend the model through fine-tuning or further training and involve gradient updates. Training patterns carry a higher degree of complexity than invocation patterns.

Let's dive in!

Content Generation Patterns

LLMs are used by applications to generate a wide range of content, from fully creative output produced with minimal guidance to specific information extracted under a high degree of guidance. Let's look at the LLM content generation patterns.

Creative Text Composition

This category involves open-ended text generation, where an LLM is given minimal guidance or constraints, allowing it to unleash its creativity based on the input prompt. We let the LLM create (and even hallucinate) to a large extent, without narrow specifications on output length, format, style, or content genre. This is instrumental for tasks requiring open-ended content, such as blog posts, imaginative storytelling, or creative marketing copy. LLMs are trained on massive volumes of data cutting across every possible category: academia, the web, news, research, and more. They can generate content swiftly, with unconventional ideas or angles. However, the content might lack human finesse or nuanced understanding in certain contexts.

Content Compilation

In this category, the LLM digests the information contained in the provided context and documents, then compiles a synthesized summary reflecting the key aspects. This leverages the LLM's ability to analyze texts and extract salient points into a coherent digest. The LLM has a degree of freedom in how it compiles the content, as long as the semantic meaning is retained. Example applications are legal contract summaries, extractive summaries of academic papers or articles, and meeting-notes compilation. LLMs excel at sifting through large volumes of information swiftly and summarizing key points accurately. Compared to humans, LLMs can process and synthesize information from multiple sources more efficiently, providing concise summaries that save time and effort.

Targeted Information Retrieval

Here the LLM is focused on retrieving specific and precise information from a provided context; this category requires a high degree of guidance. The model analyzes contextual documents to extract the relevant information that directly addresses the query, generating an informative response. Examples are traditional Q&A capabilities, precise information extraction from texts (e.g., "Who is the author?"), classification tasks, and output in an expected format (e.g., JSON).
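To make this concrete, here is a minimal sketch of a targeted-extraction prompt that requests JSON output. The document text, field names, and the hard-coded response (standing in for a real model call) are illustrative assumptions, not any specific API.

```python
# A minimal sketch of targeted information extraction with JSON output.
# The document, field names, and the canned response are illustrative.
import json

document = "The Pragmatic Programmer was written by Andrew Hunt and David Thomas."

prompt = f"""Extract the following fields from the text below.
Respond with only a JSON object containing the keys "title" and "authors".

Text: {document}
"""

# response_text would come from your LLM client of choice.
response_text = '{"title": "The Pragmatic Programmer", "authors": ["Andrew Hunt", "David Thomas"]}'

result = json.loads(response_text)  # fails loudly if the model strayed from JSON
print(result["authors"])
```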

Cross-Domain Proficiency

This category involves generating content across diverse domains: translation, code generation, abstractive summarization, domain-specific content generation (e.g., generating a medical report from a patient's symptoms and lab records), and so on. Here the LLM creates content that carries the semantic meaning into a different knowledge domain from the original context. This is made possible, again, by the vast amount of training data spanning multiple disciplines, which lets the LLM adeptly navigate between subject matters, languages, or technical domains.

Interactive Generation

This is a unique category depicting Inversion of Control (IoC), where the LLM assumes the role of determining the flow of execution. The LLM is provided with interfaces (e.g., tools) to query the environment, make observations, and take a set of actions. This powers LLMs with more advanced back-and-forth interactions. The model does not just react to a single static prompt; it enables a dynamic dialog where it requests clarifying information before formulating responses tailored to the evolving context. Prompting techniques such as ReAct and Tree of Thoughts (ToT) exemplify this capability.
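To illustrate the inversion of control, below is a minimal ReAct-style loop in Python. It is a sketch: `call_llm` and the tool set are hypothetical stand-ins rather than any framework's actual API. The point is that the model, not the application, decides the next action at each step.

```python
# A minimal ReAct-style loop (a sketch; call_llm and the tool set are
# hypothetical stand-ins, not a specific framework's API).
def search_wiki(query: str) -> str:
    return "..."  # look up the query in some knowledge source

TOOLS = {"search_wiki": search_wiki}

def call_llm(transcript: str) -> str:
    """Ask the model for the next step: either 'Action: tool[input]' or 'Final: answer'."""
    raise NotImplementedError  # plug in your model client here

def react_loop(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = call_llm(transcript)          # the LLM decides the next action
        transcript += step + "\n"
        if step.startswith("Final:"):
            return step.removeprefix("Final:").strip()
        if step.startswith("Action:"):
            tool_name, _, tool_input = step.removeprefix("Action:").strip().partition("[")
            observation = TOOLS[tool_name.strip()](tool_input.rstrip("]"))
            transcript += f"Observation: {observation}\n"  # feed the result back to the model
    return "No answer within step budget"
```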

It should be noted that these categories can overlap in various combinations rather than being mutually exclusive. For example, a conversational response could incorporate open-ended text generation, reflecting parts of both Creative Text Composition and Interactive Generation. This allows for a more flexible, accurate framework.

Model Invocation Patterns

The patterns below cover model invocation. They all involve injecting domain data into the context.

Contextual Prompt Engineering

This pattern involves utilizing the LLM as is. The application injects domain-specific data into the prompt; however, this context data is independent of the user query and can be added either statically or dynamically. The application maintains the conversation state/history and can optionally add examples (one-shot/few-shot) to the prompt. This is suitable for tasks where the relevant domain data is known ahead of the user query. The domain-specific data can live in your transactional store, in unstructured form, or be sourced from third-party SaaS applications.

In AWS, based on your needs, LLMs can be hosted with Amazon SageMaker or consumed as a service with Amazon Bedrock.
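As a rough sketch of this pattern on AWS, the snippet below injects static domain context and a few-shot example into a prompt and invokes a model through the Bedrock runtime with boto3. The context text, example, and model choice are assumptions, and the request/response body schema varies by model family (the shape shown matches Anthropic's Claude v2 text-completion format on Bedrock).

```python
# A sketch of static context injection via Amazon Bedrock (boto3).
# The context, few-shot example, and model ID are illustrative assumptions;
# the request body schema differs per model family.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

DOMAIN_CONTEXT = "Our return policy allows refunds within 30 days of purchase."
FEW_SHOT = "Q: Can I return an item after 10 days?\nA: Yes, returns are accepted within 30 days."

def ask(question: str) -> str:
    prompt = f"{DOMAIN_CONTEXT}\n\n{FEW_SHOT}\n\nQ: {question}\nA:"
    body = {
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": 256,
    }
    response = bedrock.invoke_model(modelId="anthropic.claude-v2", body=json.dumps(body))
    return json.loads(response["body"].read())["completion"]
```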

Retrieval Augmented Generation (RAG)

This is one of the most popular patterns, introduced in the paper Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Lewis et al., 2020). The core idea of this architecture is to inject the necessary context into the prompt based on the user query. At runtime, the application retrieves the documents that are semantically similar to the user's question into the context. This is useful when you have a large knowledge base spread across multiple documents or data stores and you want to inject context selectively.

The architecture can be explained in three phases.

Ingestion happens independently of the application runtime. It splits the knowledge base into smaller parts called chunks; based on your application requirements, you can select the chunk size and chunking strategy. The chunks are converted to embeddings using an embedding model, and these embeddings are ingested into a vector store.

At runtime, the application queries the vector store for documents matching the user query (retrieval) and adds them to the prompt. This completed prompt is used to invoke the LLM, which uses the query and the retrieved context to produce the completion (generation).
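The sketch below compresses both phases into a few lines: naive fixed-size chunking plus embedding at ingestion time, and cosine-similarity retrieval plus prompt assembly at runtime. The `embed` function is a hypothetical stand-in for a real embedding model, and a production system would use a proper vector store rather than an in-memory array.

```python
# A minimal in-memory RAG sketch. embed() stands in for a real embedding
# model (e.g., one hosted on Bedrock); the chunking is deliberately naive.
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError  # call your embedding model here

# --- Ingestion (offline): chunk the knowledge base and index embeddings ---
def ingest(documents: list[str], chunk_size: int = 500):
    chunks = [doc[i:i + chunk_size] for doc in documents for i in range(0, len(doc), chunk_size)]
    vectors = np.stack([embed(c) for c in chunks])
    return chunks, vectors

# --- Runtime: retrieve the most similar chunks and build the prompt ---
def build_prompt(query: str, chunks, vectors, top_k: int = 3) -> str:
    q = embed(query)
    scores = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))  # cosine similarity
    context = "\n---\n".join(chunks[i] for i in np.argsort(scores)[-top_k:])
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```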

Amazon Bedrock provides integrated support for managed knowledge bases to build RAG workflows. You point it at your data location and specify an embedding model; it automatically creates a vector index and integrates the knowledge base with a model. You can define a custom chunking strategy and select a vector store; it supports OpenSearch Serverless, Pinecone, and Redis Enterprise Cloud.

Utilizing Agents

Agents are orchestration components for building applications with complex interactions. They automatically determine the sequence of actions to take. Typically, you empower agents by granting them access to a set of tools. These tools provide data to the agents or take actions by interfacing with external systems, data stores, or APIs. The agents observe the responses from tools and plan the next set of actions. Agents also have access to memory. Agent support is available in orchestration frameworks such as LangChain, LlamaIndex, and others.

Agents can be prompted with different techniques (e.g., Chain of Thought, ReAct, Reflexion). Amazon Bedrock provides native support for agents: they use reasoning to break down user-requested tasks into multiple steps, support traces for visibility into the agent's reasoning (CoT), and automatically create a prompt template with support for customization. Of course, you can also leverage open-source orchestration tools with Amazon Bedrock or Amazon SageMaker.
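Below is a minimal sketch of the orchestration loop an agent runs: the model returns a JSON plan naming a tool and its arguments, the orchestrator executes the tool, and the observation is appended to the agent's memory for the next planning step. `plan_with_llm` and the tool are hypothetical stand-ins, not Bedrock's or LangChain's actual interfaces.

```python
# A sketch of tool dispatch in an agent: the model returns a JSON "plan"
# naming a tool and its arguments, and the orchestrator executes it.
# plan_with_llm and the tool set are hypothetical stand-ins.
import json

def get_order_status(order_id: str) -> str:
    return f"Order {order_id} shipped yesterday."  # would call a real API

TOOLS = {"get_order_status": get_order_status}

def plan_with_llm(request: str, history: list[str]) -> str:
    """Ask the model which tool to call next, as JSON, or a final answer."""
    raise NotImplementedError

def run_agent(request: str, max_steps: int = 5) -> str:
    history: list[str] = []
    for _ in range(max_steps):
        decision = json.loads(plan_with_llm(request, history))
        if "answer" in decision:                       # the agent decides it is done
            return decision["answer"]
        tool = TOOLS[decision["tool"]]
        observation = tool(**decision["arguments"])    # act, then observe
        history.append(f"{decision['tool']} -> {observation}")  # agent memory
    return "Step budget exhausted"
```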

Enterprise architecture with Agents and RAG

Below is a reference architecture for an enterprise application that uses the above-mentioned patterns. Component selection should be based on your use case. This covers only the application architecture and doesn't address model development, hosting, or LLMOps aspects.

The prompt catalog organizes and stores prompt templates used to invoke models for different use cases. The orchestration layer runs the core logic of the application and coordinates workflows across the components to power an intelligent augmented search and conversation experience, while guardrails, monitoring, and automated ML pipelines enable governance and continuous improvement.

Model Training Patterns

Fine Tuning

This is a very popular pattern where a pre-trained model is fine-tuned to perform a specialized task or trained on domain-specific data. Fine-tuning is typically done with a limited labeled dataset (small in comparison with the original training dataset). The fine-tuning data goes through a data preparation process (ETL, data normalization, clean-up, etc.). During the fine-tuning process, the model weights learn the domain-specific context. It requires exploring different fine-tuning techniques, such as instruction fine-tuning and PEFT (LoRA, adapters, pruning).

In AWS, you can use Amazon SageMaker to perform fine-tuning and employ different strategies like PEFT. Amazon Bedrock also supports fine-tuning: you can build a custom model from a labeled custom dataset.
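As a concrete illustration of PEFT, here is a minimal LoRA fine-tuning setup using Hugging Face's peft library. The base model and hyperparameters are illustrative choices, not recommendations.

```python
# A sketch of parameter-efficient fine-tuning with LoRA via Hugging Face's
# peft library. The base model and hyperparameters are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")

config = LoraConfig(
    r=8,                 # rank of the low-rank update matrices
    lora_alpha=16,       # scaling factor
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)  # wraps the base model; only the adapters train
model.print_trainable_parameters()    # a tiny fraction of the total weights
# ...then train as usual (e.g., with transformers.Trainer) on your labeled data.
```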

Continued pre-training

This is conceptually similar to fine-tuning. In continued pre-training, a pre-trained model is updated with a domain-specific corpus: you train the LLM with a vast amount of unlabeled domain-specific data, extending the pre-trained model so that its weights learn the domain-specific context. You can then fine-tune the model further for specific tasks. This can be a cost-effective solution compared with training a model from scratch. However, the model can encounter catastrophic forgetting, where it forgets previously learned information due to the shift in data distribution.

RLHF

With Reinforcement Learning from Human Feedback (RLHF), you improve the model by aligning it with human preferences. Instead of providing human-curated prompt/response pairs (as in instruction tuning), a reward model provides feedback through its scoring mechanism about the quality and alignment of the model's responses. This mimics a human providing feedback, but in a cost-optimized way. The model generates a response to a prompt sampled from a distribution; the response is scored by the reward model, and based on the reward, an RL policy updates the weights of the model. The RL policy is designed to maximize the reward.

In addition to maximizing reward, another constraint is added to prevent excessive divergence from the underlying model's behavior. This is done by comparing the responses of the pre-trained model and the trained model with a KL divergence score and adding it to the objective function; a divergent model could otherwise generate inconsistent or incoherent text.
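In standard PPO-style RLHF, the combined objective therefore looks roughly like the following (notation assumed here: \pi_\theta is the policy being trained, \pi_ref the frozen pre-trained model, r_\phi the reward model, and \beta the KL penalty weight):

```latex
\max_{\theta} \; \mathbb{E}_{x \sim D,\; y \sim \pi_\theta(\cdot \mid x)}
\left[ r_\phi(x, y) \right]
\;-\; \beta \, \mathbb{D}_{\mathrm{KL}}\!\left( \pi_\theta(y \mid x) \,\|\, \pi_{\mathrm{ref}}(y \mid x) \right)
```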

Another method has been proposed, Direct Preference Optimization (DPO), which removes the need to train a reward model.

Training a LLM from scratch

Training an LLM from scratch is a significantly more complex task than the other patterns and is highly resource intensive. Model training can take months and requires large volumes of training data. You use an ML pipeline to source data, extract and clean it, develop the model architecture, select hyperparameters, and set up the training environment (CPU/GPU training instances).

The first step in training a large language model is gathering a substantial text dataset from various sources such as websites, books, and other text-rich resources. This data then undergoes preprocessing, which involves cleaning the data by removing irrelevant information as well as handling any missing or inconsistent data points. The preprocessed data is then split into separate training, validation, and test sets. Additional preprocessing techniques like tokenization are also applied at this stage.

After obtaining the preprocessed text corpus, the next step focuses on training a tokenizer which breaks down the text into smaller chunks known as tokens. Typically, large language models employ a subword tokenization algorithm. The tokenizer examines the corpus to identify which subword units occur most frequently. The exact methodology underpinning the tokenizer training is determined based on the specific type of tokenizer used.
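As an example, the snippet below trains a small BPE tokenizer with the Hugging Face tokenizers library. The corpus path, vocabulary size, and special tokens are assumptions.

```python
# A sketch of training a subword (BPE) tokenizer on the corpus with the
# Hugging Face tokenizers library; file path and vocab size are assumptions.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()  # split on whitespace before learning merges

trainer = BpeTrainer(vocab_size=32000, special_tokens=["[UNK]", "[PAD]"])
tokenizer.train(files=["corpus.txt"], trainer=trainer)  # learns the most frequent subword merges
tokenizer.save("tokenizer.json")

print(tokenizer.encode("Large language models").tokens)
```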

With dataset and tokenizer ready, a suitable pre-trained model architecture like GPT, BERT or others is chosen based on the task and data at hand. Decisions regarding optimal model size and other architectural details are also made. Key training hyperparameters like learning rate, batch size, number of epochs etc. are configured.

Before initiating model training, the training environment comprising hardware resources like GPUs, CPUs and storage is set up. Software frameworks and libraries facilitating distributed, parallel model training are utilized.

The model is now trained on the prepared dataset using next-word prediction, learning to predict each subsequent word from the words before it. Task-specific fine-tuning then adapts the model to the desired needs. Training progress and performance guide parameter adjustment.
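The next-word-prediction objective itself is simple to state in code. Below is a sketch of the causal language modeling loss in PyTorch: the logits at each position are scored against the token one position later.

```python
# A sketch of the next-word-prediction objective in PyTorch: predictions
# at position t are scored against the actual token at position t+1.
import torch
import torch.nn.functional as F

def causal_lm_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq_len, vocab_size); input_ids: (batch, seq_len)
    shift_logits = logits[:, :-1, :]   # predictions for positions 1..T-1
    shift_labels = input_ids[:, 1:]    # the actual next tokens
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )

# Example with random tensors standing in for a model's output:
logits = torch.randn(2, 16, 50257)
tokens = torch.randint(0, 50257, (2, 16))
print(causal_lm_loss(logits, tokens))
```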

After training, unseen test data evaluates model performance on metrics like quality, relevance and error checking. Comparative benchmarking against state-of-the-art models provides additional validation.

Finally, the trained model and its parameters are saved and exported to formats suitable for integration into target platforms and applications. This facilitates deployment for real-world usage.

Conclusion

LLMs are rapidly moving beyond academic curiosities into integral components across many industry verticals. However, careful selection of architectural patterns, and their integration into broader enterprise systems, is crucial for overcoming limitations, balancing constraints, and unlocking their full potential. I hope this framework of patterns helps you start building enterprise-grade solutions efficiently.

Thank you for taking the time to read and engage with this article. Your support in the form of following me and sharing the article is highly valued and appreciated. The views expressed in this article are my own and do not necessarily represent the views of my employer. If you have any feedback or topics you would like me to cover, please reach me at https://www.linkedin.com/in/gopinathk/



Gopi Krishnamurthy

Senior AI/ML Solutions Architect at AWS. Passionate about deep learning and serverless technologies.