First Steps in LangChain: The Ultimate Guide for Beginners (part 1)

Iryna Kondrashchenko
6 min read · Jun 4, 2023


What is LangChain?

LangChain is a framework built around large language models (LLMs). It includes integrations with a wide range of systems and tools. LangChain allows various modular components, such as a PromptTemplate and an LLM, to be chained together into a sequence, or Chain, that takes an input, processes it, and generates a response. This makes it a versatile tool that can be used in a variety of contexts and applications. Being an open-source project, LangChain benefits from the input and contributions of a large community of developers.

LangChain incorporates seven modules: Prompts, Models, Memory, Indexes, Chains, Agents, and Callbacks. In this article, I would like to describe three of them: Prompts, Models, and Memory.

Note: In order to use LangChain with the OpenAI wrapper, we need to install and import the LangChain and OpenAI packages, as well as generate an OpenAI API key. Please refer to my latest post to see how the API key can be generated.

# install the LangChain package
pip install langchain
# install the OpenAI package
pip install openai
# import langchain
import langchain
# import openai
import openai

# set openai_api_key as an environment variable
import os
os.environ["OPENAI_API_KEY"] = ""

Prompts

A prompt refers to the statement or question provided to the LLM to request information. Typically, it is a short text snippet or a few sentences that convey the user’s intent or query.

The Prompts module is responsible for the creation and management of prompts.

LangChain simplifies the creation of prompts by providing a PromptTemplate. This template can be thought of as a format or pattern for the prompts, with placeholders that can be filled with specific details or examples. This approach allows prompts to be reused, which becomes especially important as the prompt length increases.
There are two types of prompt templates: text prompt templates and chat prompt templates.

Text Prompt Templates take a string text as an input.

from langchain.prompts import PromptTemplate

# create a string template with `sample_text` input variable
template = """You will provided with the sample text. \
Your task is to rewrite the text to be gramatically correct. \
Sample text: ```{sample_text}``` \
Output:
"""
# create a prompt template using above-defined template string
prompt_template = PromptTemplate.from_template(
template=template
)
# specify the `sample_text` variable
sample_text = "Me likes cats not dogs. They jumps high so much!"
# generate a final prompt by passing `sample_text` variable
final_prompt = prompt_template.format(
sample_text=sample_text
)
print(final_prompt)
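Printing final_prompt should produce the filled-in template, roughly:

You will be provided with the sample text. Your task is to rewrite the text to be grammatically correct. Sample text: ```Me likes cats not dogs. They jumps high so much!``` Output: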

Chat prompt templates take a list of chat messages as an input. Each chat message is associated with a role (e.g., AI, Human, or System).

from langchain.prompts import SystemMessagePromptTemplate, HumanMessagePromptTemplate, ChatPromptTemplate

# create a string template for a System role with two input variables: `output_language` and `max_words`
system_template = """You will be provided with the sample text. \
Your task is to translate the text into {output_language} language \
and summarize the translated text in at most {max_words} words. \
"""
# create a prompt template for a System role
system_message_prompt_template = SystemMessagePromptTemplate.from_template(
    system_template
)
# create a string template for a Human role with `sample_text` input variable
human_template = "{sample_text}"
# create a prompt template for a Human role
human_message_prompt_template = HumanMessagePromptTemplate.from_template(human_template)
# create a chat prompt template out of one or several message prompt templates
chat_prompt_template = ChatPromptTemplate.from_messages(
    [system_message_prompt_template, human_message_prompt_template]
)
# generate a final prompt by passing all three variables (`output_language`, `max_words`, `sample_text`)
final_prompt = chat_prompt_template.format_prompt(
    output_language="English",
    max_words=15,
    sample_text="Estoy deseando que llegue el fin de semana.",
).to_messages()
print(final_prompt)
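The result is a list of chat messages rather than a single string. Printed, it should look roughly like this (the exact repr differs between LangChain versions):

[SystemMessage(content='You will be provided with the sample text. Your task is to translate the text into English language and summarize the translated text in at most 15 words. '), HumanMessage(content='Estoy deseando que llegue el fin de semana.')]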

Models

The Models module is a core component of LangChain. It consists of two types of models: Language Models and Text Embedding Models.

Language Models

Language Models are particularly well-suited for text generation tasks. There are two variations of language models: LLMs and Chat Models. LLMs take a text string as input and return a text string as output.

from langchain.llms import OpenAI

# initialize the GPT-3.5 model; the temperature parameter controls the randomness of the response
# (note: LangChain may warn that gpt-3.5-turbo is a chat model and suggest using ChatOpenAI instead)
llm = OpenAI(model_name="gpt-3.5-turbo", temperature=0)
template = """You will provided with the sample text. \
Your task is to rewrite the text to be gramatically correct. \
Sample text: ```{sample_text}``` \
Output:
"""
prompt_template = PromptTemplate.from_template(
template=template
)
sample_text = "Me likes cats not dogs. They jumps high so much!"
final_prompt = prompt_template.format(
sample_text=sample_text
)
# generate the output by calling GPT model and passing the prompt
completion = llm(final_prompt)
print(completion)

Chat Models

Chat Models are backed by language models. They take chat messages as input and return chat messages as output. This makes them particularly suitable for conversational AI applications.

from langchain.chat_models import ChatOpenAI
from langchain.prompts import SystemMessagePromptTemplate, HumanMessagePromptTemplate, ChatPromptTemplate

# initialize ChatGPT model
chat = ChatOpenAI(temperature=0)
system_template = """You will provided with the sample text. \
Your task is to translate the text into {output_language} language \
and summarize the translated text in at most {max_words} words. \
"""
system_message_prompt_template = SystemMessagePromptTemplate.from_template(
system_template)
human_template = "{sample_text}"
human_message_prompt_template = HumanMessagePromptTemplate.from_template(human_template)
chat_prompt_template = ChatPromptTemplate.from_messages(
[system_message_prompt_template, human_message_prompt_template])
final_prompt = chat_prompt_template.format_prompt(output_language="English", max_words=15,
sample_text="Estoy deseando que llegue el fin de semana.").to_messages()
# generate the output by calling ChatGPT model and passing the prompt
completion = chat(final_prompt)
print(completion)
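Note that the chat model returns a chat message (an AIMessage) rather than a plain string; the translated summary itself is available via completion.content.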

Text Embedding Models

Text embedding is used to represent text data in a numerical format that can be understood and processed by ML models. It is used in many NLP tasks, such as similarity measurement and text classification.

Text Embedding Models take text as input and return a list of floats. In the example below we will use a wrapper around the OpenAI embedding model.

from langchain.embeddings import OpenAIEmbeddings

# initialize OpenAI embedding model
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
# create a text to be embedded
text = "It is imperative that we work towards sustainable practices, reducing waste and conserving resources."
# generate the embedding by calling the OpenAI embedding model and passing the text
embedded_text = embeddings.embed_query(text)
# the embedding is a list of floats (1536 dimensions for text-embedding-ada-002); print the first few values
print(len(embedded_text), embedded_text[:5])
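To illustrate the similarity measurement use case mentioned above, here is a minimal sketch that compares two embeddings with cosine similarity using numpy (the two sentences are made up for illustration):

import numpy as np

# embed two semantically related sentences
first = embeddings.embed_query("The weather is lovely today.")
second = embeddings.embed_query("It is a beautiful sunny day.")
# cosine similarity: dot product of the vectors divided by the product of their norms
similarity = np.dot(first, second) / (np.linalg.norm(first) * np.linalg.norm(second))
# similar sentences should score close to 1.0
print(similarity)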

Memory

The Memory module in LangChain is designed to retain a notion of state throughout a user’s interactions with a language model. LLMs are stateless: the model treats each incoming query independently. However, in certain applications, such as chatbots, it is crucial to remember previous interactions at both the short-term and the long-term level. Passing the entire conversation as context to the language model is not an optimal solution, as the number of tokens can quickly grow and potentially exceed the allowed limit. The Memory module facilitates the storage of conversation history and addresses these challenges.

One of the simplest memory types is ConversationBufferMemory, which simply stores messages and then extracts them into a variable.

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
# saving conversation
memory.save_context(
    {"input": "Describe LSTM"},
    {"output": "LSTM is a type of recurrent neural network architecture that is widely used for sequential and time series data processing."},
)
# retrieving conversation from a memory
memory.load_memory_variables({})
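This should return the stored exchange as a single history string, roughly:

{'history': 'Human: Describe LSTM\nAI: LSTM is a type of recurrent neural network architecture that is widely used for sequential and time series data processing.'}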

Apart from ConversationBufferMemory, LangChain offers various other types of Memory. Please refer to the LangChain documentation to explore and determine which memory type best suits your specific requirements.
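For example, ConversationBufferWindowMemory keeps only the last k interactions, which bounds the size of the stored history; a minimal sketch:

from langchain.memory import ConversationBufferWindowMemory

# keep only the most recent interaction (k=1)
window_memory = ConversationBufferWindowMemory(k=1)
window_memory.save_context({"input": "Describe LSTM"}, {"output": "LSTM is a type of recurrent neural network."})
window_memory.save_context({"input": "Describe GRU"}, {"output": "GRU is a simplified variant of LSTM."})
# only the last exchange (about GRU) is retained
window_memory.load_memory_variables({})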

If you are not familiar with Vector Databases, I encourage you to read the excellent article from Pinecone about them.

It is worth mentioning that multiple types of memory can be used simultaneously. Additionally, it is possible to save the conversation in a database for long-term storage and retrieval.
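As a rough sketch of combining memories (CombinedMemory lives in langchain.memory; the memory_key and input_key values below are illustrative), a verbatim buffer and an LLM-generated summary can be tracked side by side:

from langchain.llms import OpenAI
from langchain.memory import CombinedMemory, ConversationBufferMemory, ConversationSummaryMemory

# verbatim recent history
buffer_memory = ConversationBufferMemory(memory_key="chat_history", input_key="input")
# running summary of the conversation, generated by an LLM
summary_memory = ConversationSummaryMemory(llm=OpenAI(temperature=0), memory_key="summary", input_key="input")
# expose both memories to the same chain under their respective keys
memory = CombinedMemory(memories=[buffer_memory, summary_memory])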

Conclusion

LangChain is a powerful framework that streamlines the development of AI applications. In this article, we have provided an overview of three key LangChain modules: Prompts, Models, and Memory. In the next article, we will delve into the remaining modules, expanding our understanding of the full capabilities of LangChain.
