Get Started with LangChain: Your Key to Mastering LLM Pipelines

The AI Espresso · May 30, 2023
Image from LangChain

❓ Have you heard about LangChain before?

LangChain rose to fame quickly with the boom from OpenAI’s release of GPT-3.5, becoming the go-to way to handle the new LLM pipeline thanks to its systematic approach to classifying the different processes in a generative AI workflow. Its popularity also comes from its integrations with other popular Large Language Models (LLMs) from companies such as Anthropic and Cohere.

In this blog, we will provide a quick but in-depth look at the popular LangChain pipeline and apply LangChain to a small, simple request example. In future posts, we will cover the hidden features of LangChain, all the way up to more complex service integrations.

What is LangChain?

On the first page of its documentation, LangChain states the purpose and goals of the framework:

  1. Data-aware: connect a language model to other sources of data
  2. Agentic: allow a language model to interact with its environment

At its core, LangChain provides an abstraction over all the different types of LLM services, combines them with other existing tools, and provides a coherent language for working with every part of the LLM-as-a-Service pipeline. A straightforward example is OpenAI’s “playground”.

Image from OpenAI

As you can see above, the playground takes an input and completes it as it sees fit. There are many uses for this platform; to name a few:

  1. General Completion: Provide some text and ask the model simply to continue it, such as completing a story.
  2. Prompt-based Zero-shot Completion: Provide an instruction to the model and ask it to return an output based on the input and whatever it sees fit given the instruction.
  3. Prompt-based Few-shot Completion: Provide an instruction with a few examples; then ask the model to return an output based on the input, following the instruction and examples.
  4. Chain-of-thought Completion: Provide a question, ask the model to answer it, and have it justify or explain the process of arriving at the answer.

From a testing perspective, a playground is a handy tool for quickly getting a response from a user’s instruction. From an application’s perspective, however, generalizing every function into a single “submit” limits the precision and readability of the application. It is also hard to connect this platform with other existing tools, such as public APIs.

This is where LangChain comes in. Based on the numerous possibilities of LLM services, LangChain breaks the entire pipeline down into the following modules (from least to most complex):

  • Models: Supported model types and integrations.
  • Prompts: Prompt management, optimization, and serialization.
  • Memory: Memory refers to the state that is persisted between calls of a chain/agent.
  • Indexes: Language models become much more powerful when combined with application-specific data — this module contains interfaces and integrations for loading, querying, and updating external data.
  • Chains: Chains are structured sequences of calls (to an LLM or to a different utility).
  • Agents: An agent is a Chain in which an LLM, given a high-level directive and a set of tools, repeatedly decides on an action, executes the action, and observes the outcome until the high-level directive is complete.
  • Callbacks: Callbacks let you log and stream the intermediate steps of any chain, making it easy to observe, debug, and evaluate the internals of an application.

How does it work?

That was a whole lot… Let’s jump right into an example as a way to talk about all these modules.

# llm
from langchain.llms import OpenAI
llm = OpenAI(temperature=0.7)

# prompt
from langchain.prompts import PromptTemplate
location_extractor_prompt = PromptTemplate(
    input_variables=["travel_request"],
    template="""
You are a travel agent AI that uses the chat_history to obtain the
theme to break down a {travel_request} into the start, pitstops,
and end location of the trip. The pitstops should follow the theme
of the trip if specified.
The output is a Python dictionary that contains the keys of 'start'
(a string), 'pitstops' (a list of strings), and 'end' (a string).
""",
)

# chain
from langchain.chains import LLMChain
location_extractor_chain = LLMChain(
    llm=llm,
    prompt=location_extractor_prompt,
)

# execution
location_extractor_chain.run("I want to go to Seattle from New York City.")

# output
# '\n trip = {\n "start": "New York City",
#  "pitstops": [],\n "end": "Seattle"\n }'

The above is an example of a simple LangChain program that acts as a location extractor. There are five parts to it: the LLM, prompt, chain, execution, and output. From a ChatGPT perspective, we can break the code down as follows:

By mapping the ChatGPT result to LangChain, we can see how LangChain systematically restructures the functions of ChatGPT into a more programming-driven framework. Now that you understand the general abstraction of LangChain, let’s talk about each part one by one:

LLM

LLM refers to the selection of models in LangChain. This corresponds to the simplest function in LangChain: selecting models from various platforms. The most common models are the OpenAI GPT-3 completion model (shown as OpenAI(temperature=0.7)) and the OpenAI ChatGPT model (shown as ChatOpenAI(temperature=0)).

Due to the difference in structure between a general completion model (GPT-3) and a chat model (ChatGPT), the chat model requires more information in the input, from defining human input to creating a conversation flow as a list (e.g., chat([HumanMessage(content="Translate this sentence from English to French. I love programming.")])).
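For reference, here is a minimal sketch of calling a chat model this way (the response shown is illustrative; import paths follow the LangChain version current at the time of writing):

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

# Chat models take a list of messages instead of a single string
chat = ChatOpenAI(temperature=0)
chat([HumanMessage(content="Translate this sentence from English to French. I love programming.")])
# -> AIMessage(content="J'aime programmer.")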

In addition, models accept different hyperparameters that define their behavior. For example, temperature controls the diversity of the results, with a higher temperature (close to 1) producing more diversity, and vice versa. And model lets you select a specific model from the LLM provider (e.g., gpt-3.5-turbo for the ChatGPT model). Here is a list of other common parameters:

  • presence penalty/frequency penalty: penalizes tokens that have already appeared / penalizes repeated tokens in proportion to how often they appear
  • max tokens/min tokens: limits the max/min number of tokens in the completion
  • num results: the number of completions generated
  • top P/top K: the total probability mass/count of tokens to consider at each step when deciding what to generate
  • streaming: returns the output token by token as it is generated, rather than waiting for the full completion.

One thing to note is that different providers use slightly different parameter names and may support fewer options than others; LangChain is limited by what the LLM providers expose.
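As a rough sketch, here is how a few of these parameters map onto the OpenAI wrappers (parameter availability and exact names vary by provider and version):

from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

# Completion model with explicit sampling parameters
llm = OpenAI(
    model_name="text-davinci-003",  # specific model from the provider
    temperature=0.7,                # diversity of the results
    max_tokens=256,                 # limit on the completion length
    top_p=1.0,                      # probability mass considered at each step
    frequency_penalty=0.0,          # penalize tokens by how often they appear
    presence_penalty=0.0,           # penalize tokens that already appeared
    n=1,                            # number of completions generated
)

# Chat model that streams tokens as they are generated
chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0, streaming=True)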

Prompt

Prompts refer to the instructions users give to the model. We can generally separate prompts into two types: zero-shot and few-shot prompts.

  • Zero-shot prompt refers to prompts that do not include any examples, just instructions. For example, the location extraction prompt above.
  • Few-shot prompt refers to prompts that include examples. This can help the model to know what type of response the users are looking for and give more relevant results with less diversity.

Each type of prompt has its own benefits and downsides. The general rule of thumb is that the more examples a prompt includes, the less diverse the results will be, independent of hyperparameters such as the model’s temperature. Therefore, if you want the output to follow a strict format, make sure to provide some examples in the prompt.

To handle the need for examples, LangChain provides the ExampleSelector tool to help with smart decisions on selecting examples.

❓ Why do we need example selectors? Isn’t the more examples, the better?

Although more examples do give you more consistent results, there is a limit to how many examples you can feed your model due to its input token limit. In the simplest terms, there is a limit to the number of words you can feed into your model without it breaking, because the model has a fixed context window determined when it was trained. Hence, the ExampleSelector offers a few variations to help you choose the best examples automatically:

  • Length-based: selects which examples to use based on length.
  • Maximal-Marginal Relevance: selects examples based on a combination of which examples are most similar to the inputs, while also optimizing for diversity.
  • N-Gram Overlap: selects and orders examples based on which examples are most similar to the input, according to an n-gram overlap score
  • Similarity: selects examples based on which examples are most similar to the inputs, based on a cosine similarity score against user-provided embeddings.

The construction of each selector is the same. Here is a summarized example from the LangChain documentation:

  1. You provide your list of examples in the following list-of-dictionaries format:

examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]

2. Next, you put your examples into an example selector class, along with the prompt and other parameters:

from langchain.prompts import PromptTemplate
from langchain.prompts.example_selector import LengthBasedExampleSelector

# Prompt format of your examples
example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)
# Example selector
example_selector = LengthBasedExampleSelector(
    # These are the examples it has available to choose from.
    examples=examples,
    # This is the PromptTemplate being used to format the examples.
    example_prompt=example_prompt,
    # Max length of the formatted examples (only applicable to the length-based selector)
    max_length=25,
)

3. Finally, you add this example selector to a special class of prompt template to create a prompt for your chain.

from langchain.prompts import FewShotPromptTemplate

prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Input: {adjective}\nOutput:",
    input_variables=["adjective"],
)

Finally, put your prompt into your chain as you would with any other prompt. The example selector will dynamically choose whichever examples it deems useful or relevant based on the metric you’ve chosen.
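To see the selector at work, you can format the prompt with a new input; a minimal sketch (the printed output is abbreviated):

# The length-based selector trims the example list to fit max_length
print(prompt.format(adjective="big"))
# Give the antonym of every input
# Input: happy
# Output: sad
# ... (remaining selected examples) ...
# Input: big
# Output: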

In addition, due to the structural differences in chat-based models, they require a different prompt structure, shown below:

  1. You first need to decide on the prompt templates for the system and the human individually.

from langchain.prompts.chat import (
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

# Prompt template for the system
template = """You are a helpful assistant that translates {input_language} to {output_language}."""
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
# Prompt template for the human
human_template = "{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

2. Then, you combine these prompts into a single chat prompt using the ChatPromptTemplate class:

from langchain.prompts.chat import ChatPromptTemplate

chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])

This can then be used as the prompt for a chat model.
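For example, continuing the code above, you can format the combined prompt into messages and pass them to a chat model (the response shown is illustrative):

from langchain.chat_models import ChatOpenAI

chat = ChatOpenAI(temperature=0)
messages = chat_prompt.format_prompt(
    input_language="English",
    output_language="French",
    text="I love programming.",
).to_messages()
chat(messages)
# -> AIMessage(content="J'aime programmer.")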

Chain

I would say chains are the fundamental unit of any LangChain application. Technically, a chain is a wrapper around a prompt and a model that produces a function taking in input text and returning output text. In other words, it is an abstraction over the complicated prompt pipeline that lets users create a function that takes in some text and outputs some text, similar to the OpenAI playground.

How I like to think about it is that we can imagine chains as cars. Regardless of a car’s complicated internals, its goal is to take in some input (e.g., steering, current speed, selected gear) and produce some output (e.g., turning left or right) based on complicated calculations on those input parameters.

Of course, we can break a car down into smaller systems that interact with each other. The same is true of LangChain chains.

The fundamental unit of these chains is the LLMChain. Given a prompt, an LLM, and an input, it predicts the output:

from langchain import PromptTemplate, OpenAI, LLMChain

# prompt
prompt_template = "What is a good name for a company that makes {product}?"
# llm
llm = OpenAI(temperature=0)
# chain
llm_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate.from_template(prompt_template),
)
# predict
llm_chain("colorful socks")
# output
# {'product': 'colorful socks', 'text': '\n\nSocktastic!'}
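Calling the chain like a function returns a dictionary with both the inputs and the generated text; if you only want the text, run is a convenient shortcut:

# run() returns only the generated text instead of a dictionary
llm_chain.run("colorful socks")
# '\n\nSocktastic!'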

However, LangChain allows you to do a lot more with this simple setup. For example, it lets you chain the chains! Similar to the numerous systems in a car, you can create a pipeline of different chains in order to complete a larger task. This can be done through the SimpleSequentialChain class:

# This is the overall chain where we run these two chains in sequence.
from langchain.chains import SimpleSequentialChain
overall_chain = SimpleSequentialChain(
    chains=[synopsis_chain, review_chain],
    verbose=True,
)

You take two chains and feed the output of the first as the input of the second. In this case, the first chain writes a synopsis of a play and the second generates a review of it, with the only input being the title of the play. The benefit is that the review chain gains a better understanding of the play by receiving the full synopsis as context, in contrast with knowing only the title.
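For completeness, here is one way the two chains could be defined; a sketch adapted from the pattern in the LangChain documentation, with illustrative prompts:

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SimpleSequentialChain

llm = OpenAI(temperature=0.7)

# First chain: title -> synopsis
synopsis_prompt = PromptTemplate(
    input_variables=["title"],
    template="You are a playwright. Given the title of a play, write a synopsis for it.\nTitle: {title}\nSynopsis:",
)
synopsis_chain = LLMChain(llm=llm, prompt=synopsis_prompt)

# Second chain: synopsis -> review
review_prompt = PromptTemplate(
    input_variables=["synopsis"],
    template="You are a play critic. Given the synopsis of a play, write a review of it.\nSynopsis: {synopsis}\nReview:",
)
review_chain = LLMChain(llm=llm, prompt=review_prompt)

# The combined chain takes the title as its only input
overall_chain = SimpleSequentialChain(chains=[synopsis_chain, review_chain], verbose=True)
overall_chain.run("Tragedy at Sunset on the Beach")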

In addition, you can output both the synopsis and the review at the same time by using the SequentialChain class, which supports multiple inputs and outputs:

# This is the overall chain where we run these two chains in sequence.
from langchain.chains import SequentialChain
overall_chain = SequentialChain(
    chains=[synopsis_chain, review_chain],
    # supports multiple inputs
    input_variables=["era", "title"],
    # Here we return multiple variables
    output_variables=["synopsis", "review"],
    verbose=True,
)

To execute this, you need to specify the inputs in the prompt of the first chain:

# This is an LLMChain to write a synopsis given the title of a play and the era it is set in.
template = """
You are a playwright. Given the title of a play and the era it is set in,
it is your job to write a synopsis for that title.
Title: {title}
Era: {era}
Playwright: This is a synopsis for the above play:"""
prompt_template = PromptTemplate(
    input_variables=["title", "era"],
    template=template,
)
synopsis_chain = LLMChain(
    llm=llm,
    prompt=prompt_template,
    output_key="synopsis",
)

One thing to note is that only the first chain’s prompt accepts multiple inputs; since the result of each LLMChain is simply a string, each subsequent chain receives only a single input variable.
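To run the multi-input version, you pass a dictionary containing all the input variables; the values below are illustrative:

overall_chain({
    "title": "Tragedy at Sunset on the Beach",
    "era": "Victorian England",
})
# -> {'title': ..., 'era': ..., 'synopsis': '...', 'review': '...'}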

Because chains have such a wide range of applications, we’ll dedicate a future blog post to uses of chains beyond the simplest sequential ones.

Output

The last section of our pipeline is the output. Sometimes we want to coerce the LLM’s output into a specific format for later use. For example, in our location extractor example, I specified the output format in the prompt as follows:

# prompt
"""
The output is a Python dictionary that contains the keys of
'start' (a string), 'pitstops' (a list of strings), and 'end' (a string).
"""
# output
# trip = '{
# "start": "New York City",
# "pitstops": [],
# "end": "Seattle"
#}'

This allows the LLM to return a response in the correct format. However, with a higher temperature, it is possible that the response is formatted incorrectly. For example, it might use single quotation marks rather than double quotation marks:

# output
# trip = "{
# 'start': "New York City",
# 'pitstops': [],
# 'end': "Seattle"
#}"

This would be a problem if we are decoding the output with a JSON decoder, since JSON only accepts double quotation marks around keys. To solve this, we can either rewrite the prompt OR use the OutputFixingParser class in LangChain.

The OutputFixingParser wraps a simple output parser that specifies the expected structure and uses another LLM, with specific instructions, to fix outputs that fail to parse.

# OutputFixingParser
from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import OutputFixingParser
new_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI())
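Here, parser is an ordinary output parser that describes the expected structure. Below is a minimal sketch assuming a Pydantic schema for the trip; the Trip class is our own illustration, not part of LangChain:

from typing import List
from pydantic import BaseModel, Field
from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import OutputFixingParser, PydanticOutputParser

# Hypothetical schema matching the location extractor's output
class Trip(BaseModel):
    start: str = Field(description="start location of the trip")
    pitstops: List[str] = Field(description="pitstops along the way")
    end: str = Field(description="end location of the trip")

parser = PydanticOutputParser(pydantic_object=Trip)
new_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI())

# The fixing parser asks the LLM to repair output the base parser rejects
new_parser.parse("{'start': 'New York City', 'pitstops': [], 'end': 'Seattle'}")
# -> Trip(start='New York City', pitstops=[], end='Seattle')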

For more information about the output parsers, refer to the documentation on LangChain here.

Conclusion

Starting from a single request in ChatGPT, we’ve introduced the basic functions of LangChain and how it processes the same request in a more programmatic fashion. We’ve also seen the high customizability and wide range of possible applications of LangChain from a simple request. However, there is a lot more that LangChain provides!

These are only the basics of what LangChain can do; it has the power to do a lot more, such as AutoGPT-style agents and question answering. Deep down, however, all of these features build on the “chain” structure introduced in this blog post, so this post will prepare you for our future ones.

Hopefully, after reading this post, if someone asks you how LangChain works, you can confidently tell them that “LangChain is just a ‘car’ that executes your goal.”
