LLM Study Diary: Comprehensive Review of LangChain — Part 2

Today, we’ll continue working through the remaining recipes from Greg Kamradt’s ‘The LangChain Cookbook: 7 Core Concepts’ as seen in his YouTube video. Our goal is to get these recipes running with the latest version of LangChain.

Text Embedding Model

The langchain.embeddings module imported in the original code below has been deprecated:

from langchain.embeddings import OpenAIEmbeddings

So, I rewrote it to use the langchain_openai module like this:

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

With this change, the following code also works without any issues:

text = "Hi! It's time for the beach"
text_embedding = embeddings.embed_query(text)
print (f"Here's a sample: {text_embedding[:5]}...")
print (f"Your embedding is length {len(text_embedding)}")

The output we get looks like this, showing that the text has been converted into vector data:

Here's a sample: [-0.0001933947418589633, -0.0030791846713453044, -0.00105463973203673, -0.019258900286671155, -0.015191653342505319]...
Your embedding is length 1536
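As a quick side note, one way to see what these vectors are good for is to compare two embeddings with cosine similarity: sentences with related meanings end up close together in vector space. Here is a minimal sketch of that idea, reusing the embeddings object from above (the second sentence is just one I made up for comparison, and numpy is assumed to be installed):

import numpy as np

# Embed the original sentence and a semantically related one
vec_a = embeddings.embed_query("Hi! It's time for the beach")
vec_b = embeddings.embed_query("Let's go swimming in the sea")

# Cosine similarity: dot product divided by the product of the norms
a, b = np.array(vec_a), np.array(vec_b)
similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"Cosine similarity: {similarity:.3f}")  # values nearer 1.0 mean more similar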

Prompts — Text generally used as instructions to your model

PromptTemplate is something we’re certain to use heavily when working with LangChain. The original code still worked with the gpt-3.5-turbo-instruct model once I changed the OpenAI import from the langchain.llms module to langchain_openai, but it isn’t compatible with newer models.

# from langchain.llms import OpenAI
from langchain_openai import OpenAI
from langchain import PromptTemplate

The completions-style OpenAI class doesn’t work with the v1/chat/completions endpoint that newer models use, so after some research I decided to abandon the OpenAI and PromptTemplate classes. Instead, I achieved similar results using the ChatOpenAI class and ChatPromptTemplate as follows:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

chat = ChatOpenAI(model_name="gpt-4o", openai_api_key=openai_api_key)

# Notice "location" below, that is a placeholder for another value later
human_template = """
I really want to travel to {location}. What should I do there?

Respond in one short sentence
"""

prompt = ChatPromptTemplate.from_messages([("human", human_template)])
chain = prompt | chat

result = chain.invoke(
    {
        "location": "Rome"
    }
)
print(result.content)

When executed, it provides recommended spots in Rome like this:

Explore the ancient Colosseum, the Vatican Museums, and enjoy authentic Italian cuisine in local trattorias.

The great thing about ChatPromptTemplate is that by simply changing the value we pass in for “location”, like this:

result = chain.invoke(
    {
        "location": "Tokyo"
    }
)
print(result.content)

We can easily get OpenAI to recommend places for a different location:

Explore the historic temples, bustling markets, and cutting-edge technology districts in Tokyo for a truly unique experience.

When building systems, there are many scenarios where we reuse the same template by just replacing keywords. Therefore, I think we’ll find ourselves using ChatPromptTemplate very frequently.
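As a small illustration of that kind of reuse, here is a sketch with two placeholders instead of one (the system message and the style variable are my own invention, not part of the original recipe):

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

chat = ChatOpenAI(model_name="gpt-4o", openai_api_key=openai_api_key)

# One template, two placeholders: both values can be swapped per request
template = ChatPromptTemplate.from_messages([
    ("system", "You are a travel guide who always answers in {style} style."),
    ("human", "I really want to travel to {location}. What should I do there?"),
])
chain = template | chat

print(chain.invoke({"style": "haiku", "location": "Rome"}).content)
print(chain.invoke({"style": "formal", "location": "Tokyo"}).content)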

Example Selectors

I had to fundamentally change this sample because the OpenAI class and FewShotPromptTemplate are no longer compatible with the latest models. While searching for an alternative to FewShotPromptTemplate, I found FewShotChatMessagePromptTemplate in the LangChain API reference.

Fortunately, that page includes two samples of Example Selectors, so I used them to recreate the same functionality. Even in those samples there were some parts that didn’t work, so I fixed those and got everything running.

Implementation without Using Similarity Search

A simpler implementation for studying Example Selectors is one that generates answers based on example sentences without using similarity search.

from langchain_core.prompts import (
    FewShotChatMessagePromptTemplate,
    ChatPromptTemplate,
)

examples = [
    {"input": "pirate", "output": "ship"},
    {"input": "pilot", "output": "plane"},
    {"input": "driver", "output": "car"},
    {"input": "tree", "output": "ground"},
    {"input": "bird", "output": "nest"},
]

example_prompt = ChatPromptTemplate.from_messages(
    [('human', '{input}'), ('ai', '{output}')]
)

few_shot_prompt = FewShotChatMessagePromptTemplate(
    examples=examples,
    # This is a prompt template used to format each individual example.
    example_prompt=example_prompt,
)

final_prompt = ChatPromptTemplate.from_messages(
    [
        ('system', 'You are a helpful AI Assistant'),
        few_shot_prompt,
        ('human', '{input}'),
    ]
)

print('==== check final prompt with input data ====')
print(final_prompt.format(input="fish"))
print('============================================')

from langchain_openai import ChatOpenAI
chat = ChatOpenAI(model_name="gpt-4o", openai_api_key=openai_api_key)

chain = final_prompt | chat
result = chain.invoke({"input": "fish"})
print(result.content)

This approach gives the AI ‘examples’ of pairs like pirate and ship, pilot and plane, driver and car, tree and ground, and bird and nest, then asks it to respond to user input following a ‘similar rule’. We provide examples and example_prompt to FewShotChatMessagePromptTemplate, which is used to build final_prompt, and combining that prompt with ChatOpenAI creates a chain. When we pass {"input": "fish"} to this chain, we get the result:

water

This shows that the AI understood the pattern from pirate->ship, pilot->plane, driver->car, tree->ground, bird->nest combinations and deduced water as the most appropriate word for fish.

Example Selectors were the hardest for me to grasp when I first studied LangChain. In the code above, this line outputs the final_prompt to the log using the format function:

print(final_prompt.format(input="fish"))

Using this format function, we can see what the final_prompt actually looks like. The output is as follows:

System: You are a helpful AI Assistant
Human: pirate
AI: ship
Human: pilot
AI: plane
Human: driver
AI: car
Human: tree
AI: ground
Human: bird
AI: nest
Human: fish

Looking at this, we can clearly see how LangChain provides the example sentences and the final user input to the OpenAI model, which should help clarify how it works.
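Incidentally, if you would rather inspect the prompt as a list of message objects instead of one long string, format_messages() should work here too. A minimal variation on the debugging line above:

# Same inspection, but as structured message objects
for message in final_prompt.format_messages(input="fish"):
    print(f"{message.type}: {message.content}")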

Implementation Using Similarity Search

This time, I tried to achieve the same thing using the SemanticSimilarityExampleSelector class with the same example sentences we used earlier. I based this on the second sample provided in the FewShotChatMessagePromptTemplate documentation, just like the previous example.

from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

examples = [
    {"input": "pirate", "output": "ship"},
    {"input": "pilot", "output": "plane"},
    {"input": "driver", "output": "car"},
    {"input": "tree", "output": "ground"},
    {"input": "bird", "output": "nest"},
]

to_vectorize = [
    " ".join(example.values())
    for example in examples
]
# embeddings = OpenAIEmbeddings()
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

vectorstore = Chroma.from_texts(
    to_vectorize, embeddings, metadatas=examples
)
example_selector = SemanticSimilarityExampleSelector(
    vectorstore=vectorstore  # selects the k most similar examples (k defaults to 4)
)

# from langchain_core import SystemMessage
from langchain.schema import HumanMessage, SystemMessage, AIMessage
# from langchain_core.prompts import HumanMessagePromptTemplate
from langchain_core.prompts import HumanMessagePromptTemplate, AIMessagePromptTemplate, SystemMessagePromptTemplate
from langchain_core.prompts.few_shot import FewShotChatMessagePromptTemplate

few_shot_prompt = FewShotChatMessagePromptTemplate(
    # Which variable(s) will be passed to the example selector.
    input_variables=["input"],
    example_selector=example_selector,
    # Define how each example will be formatted.
    # In this case, each example will become 2 messages:
    # 1 human, and 1 AI
    example_prompt=(
        HumanMessagePromptTemplate.from_template("{input}")
        + AIMessagePromptTemplate.from_template("{output}")
    ),
)
# Define the overall prompt.
final_prompt = (
    SystemMessagePromptTemplate.from_template(
        "You are a helpful AI Assistant"
    )
    + few_shot_prompt
    + HumanMessagePromptTemplate.from_template("{input}")
)
# Show the prompt
print(final_prompt.format_messages(input="fish"))

# Use within an LLM
# from langchain_core.chat_models import ChatAnthropic
# chain = final_prompt | ChatAnthropic()
from langchain_openai import ChatOpenAI
chat = ChatOpenAI(model_name="gpt-4o", openai_api_key=openai_api_key)
chain = final_prompt | chat  # build the runnable chain before invoking it
result = chain.invoke({"input": "fish"})
print(result.content)

The changes from the sample code include modifying the import sources due to changes in the modules of the following classes:

# from langchain_core.prompts import SemanticSimilarityExampleSelector
# from langchain_core.embeddings import OpenAIEmbeddings
# from langchain_core.vectorstores import Chroma
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

Additionally, we changed the modules for the classes below, and added import statements for some that were missing:

#from langchain_core import SystemMessage
from langchain.schema import HumanMessage, SystemMessage, AIMessage
# from langchain_core.prompts import HumanMessagePromptTemplate
from langchain_core.prompts import HumanMessagePromptTemplate, AIMessagePromptTemplate, SystemMessagePromptTemplate

We modified the OpenAIEmbeddings class initialization to pass the API key:

# embeddings = OpenAIEmbeddings()
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
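As an aside, if you would rather not pass the key explicitly, OpenAIEmbeddings can also pick it up from the OPENAI_API_KEY environment variable, so something like this should work as well:

import os

# OpenAIEmbeddings reads OPENAI_API_KEY when no key is passed explicitly
os.environ["OPENAI_API_KEY"] = openai_api_key
embeddings = OpenAIEmbeddings()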

Finally, we changed the model used from Anthropic to OpenAI and built the chain as follows:

# from langchain_core.chat_models import ChatAnthropic
# chain = final_prompt | ChatAnthropic()
from langchain_openai import ChatOpenAI
chat = ChatOpenAI(model_name="gpt-4o", openai_api_key=openai_api_key)
chain = final_prompt | chat

We then used the same example data as before:

examples = [
{"input": "pirate", "output": "ship"},
{"input": "pilot", "output": "plane"},
{"input": "driver", "output": "car"},
{"input": "tree", "output": "ground"},
{"input": "bird", "output": "nest"},
]

When we ran it with “fish” as input like this:

result = chain.invoke({"input": "fish"})

We got the same answer:

water

This time, as before, the AI understood the pattern from the example data and deduced “water” as the most appropriate word for “fish”, but the method is slightly different.

print(final_prompt.format_messages(input="fish"))

By outputting the final_prompt using the format_messages() function midway through, we can see that the content looks like this.

[SystemMessage(content='You are a helpful AI Assistant'), HumanMessage(content='bird'), AIMessage(content='nest'), HumanMessage(content='bird'), AIMessage(content='nest'), HumanMessage(content='bird'), AIMessage(content='nest'), HumanMessage(content='bird'), AIMessage(content='nest'), HumanMessage(content='fish')]

This time, you can see that during the creation of final_prompt, the example_selector searches the example data for the entries most semantically similar to ‘fish’ and uses only those. In this case, SemanticSimilarityExampleSelector extracts only ‘bird’, which is closest to ‘fish’, and provides the AI with just the bird -> nest example. (The pair appears four times because the selector returns four examples by default; in my environment the vectorstore had most likely accumulated duplicate entries from re-running Chroma.from_texts, so the same pair filled every slot.) Finally, it asks the AI what would come after ‘fish’.

The advantage of using SemanticSimilarityExampleSelector is that when you have a large dataset of examples with a lot of variation, you can make the AI consider only the examples that are semantically close to the user’s query.
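If you want to check which examples the selector picks before the full prompt is assembled, you can also call it directly. Here is a minimal sketch using the example_selector defined above (select_examples() comes from the example selector base interface):

# Ask the selector which stored examples are semantically closest to the input
selected = example_selector.select_examples({"input": "fish"})
for example in selected:
    print(example)  # e.g. {'input': 'bird', 'output': 'nest'}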

Thank you for reading. I hope you found this post informative. Your interest motivates me to continue sharing my knowledge in AI and language technologies.

For any questions about this blog or to inquire about OpenAI API, LLM, or LangChain-related development projects with our company (Goldrush Computing), feel free to reach out to me directly at:

mizutori@goldrushcomputing.com

As a Japanese company with native speakers, my company specializes in creating prompts and RAG systems optimized for Japanese language and culture. If you’re looking for expertise in tailoring AI solutions for the Japanese market or Japanese-language applications, we’re ideally positioned to assist. Please don’t hesitate to reach out for collaborations or projects requiring Japan-specific AI optimization.
