Skander Tlili
11 min read · Apr 27, 2024


LangChain + OpenAI | Models, Prompts and Output Parsers Tutorial

Prerequisites

To make the most of this post, you need to:

  • Have prior experience with the OpenAI API and your own API key
  • Be familiar with Jupyter notebooks
  • Install these dependencies:
! pip install openai
! pip install langchain

Overview

In this post, we’ll be covering models, prompts, and parsers.

  • Models: the language models underpinning much of what we do.
  • Prompts: the style of creating inputs to pass into the models.
  • Parsers: the opposite end; taking the output of these models and parsing it into a more structured format so that you can do things downstream with it.

So when you build an application using an LLM, the same pattern tends to recur: we repeatedly prompt a model and parse its outputs, and LangChain gives an easy set of abstractions for exactly this type of operation. So with that, let’s jump in and take a look at models, prompts, and parsers.

Problem Statement

Let’s import the packages we need for our development:

import os
import openai

# Prefer loading your key from an environment variable rather than hard-coding it
openai.api_key = os.getenv("OPENAI_API_KEY", "Your API key")

This is a helper function that calls ChatGPT (technically, the gpt-3.5-turbo model) and returns the model’s answer:

def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message["content"]
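As a quick sanity check, you can call the helper directly (the exact wording of the answer may vary from run to run):

get_completion("What is 1+1?")
'1+1 equals 2.'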

We want to automate the process of translating emails into plain English, in a specified style.

Let’s say you get an email from a customer in a language other than English. To keep this accessible, the “other language” I’m going to use is English pirate speak.

So we need to define the following variables:

# an example of an email to be translated
customer_email = """
Arrr, I be fuming that me blender lid \
flew off and splattered me kitchen walls \
with smoothie! And to make matters worse,\
the warranty don't cover the cost of \
cleaning up me kitchen. I need yer help \
right now, matey!
"""

# the style used for translation
style = """American English \
in a calm and respectful tone
"""

# the prompt containing the instructions passed to the model
prompt = f"""Translate the text \
that is delimited by triple backticks
into a style that is {style}.
text: ```{customer_email}```
"""

print(prompt)
Translate the text that is delimited by triple backticks 
into a style that is American English in a calm and respectful tone
.
text: ```
Arrr, I be fuming that me blender lid flew off and splattered me kitchen walls with smoothie! And to make matters worse,the warranty don't cover the cost of cleaning up me kitchen. I need yer help right now, matey!
```

So what we’ve done is ask this LLM to translate the text into American English in a calm and respectful tone, by setting style accordingly. If you’ve seen a little prompting before, you’ll notice that the prompt is specified as an f-string with the instruction “translate the text that is delimited by triple backticks into a style that is {style}”, and then the style and the email are plugged in. This generates the prompt shown above.

Let’s inspect the model response:

get_completion(prompt)
'I am quite upset that my blender lid came off and caused my smoothie to splatter all over my kitchen walls. Additionally, the warranty does not cover the cost of cleaning up the mess. Would you be able to assist me at this time, please? Thank you kindly.'

So if you have different customers writing reviews in different languages, you would have to generate a whole series of prompts like this one; using bare f-strings as illustrated above won’t be an effective way to do the task.

Note: LLMs do not always produce the same results. When executing the code in your notebook, you may get slightly different answers.

Chat API : LangChain

Let’s do the same task in a more convenient way using LangChain:

from langchain.chat_models import ChatOpenAI

# To control the randomness and creativity of the generated
# text by an LLM, use temperature = 0.0
chat = ChatOpenAI(temperature=0.0, model="gpt-3.5-turbo")

Prompt template

First of all, we need to define a template string that is going to be used for all examples.

This string contains the instruction passed to the model to perform the task, along with two variables, style and text, that depend on each example:

template_string = """Translate the text \
that is delimited by triple backticks \
into a style that is {style}. \
text: ```{text}```
"""

And to repeatedly reuse this template (which is the whole point), let’s instantiate a prompt_template object from LangChain’s ChatPromptTemplate:

from langchain.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_template(template_string)

From the prompt template, you can actually extract the original prompt:

prompt_template.messages[0].prompt
PromptTemplate(input_variables=['style', 'text'], output_parser=None, partial_variables={}, template='Translate the text that is delimited by triple backticks into a style that is {style}. text: ```{text}```\n', template_format='f-string', validate_template=True)

It recognizes that this prompt has two input variables, style and text, which were marked with the curly braces:

prompt_template.messages[0].prompt.input_variables
['style', 'text']

Now let’s specify the style we want the customer message to be translated to; I’ll call this customer_style, and reuse the same customer email as before. Creating customer_messages from these generates the prompt, which we’ll pass to the large language model in a moment to get a response. If you look at the types, customer_messages is actually a list, and its first element is more or less the prompt you would expect this to create. Lastly, we’ll pass this prompt to the LLM.

customer_style = """American English \
in a calm and respectful tone
"""

customer_email = """
Arrr, I be fuming that me blender lid \
flew off and splattered me kitchen walls \
with smoothie! And to make matters worse, \
the warranty don't cover the cost of \
cleaning up me kitchen. I need yer help \
right now, matey!
"""

customer_messages = prompt_template.format_messages(
    style=customer_style,
    text=customer_email)

Let’s look at the data structure of the customer_messages object:

print(type(customer_messages))
print(type(customer_messages[0]))
<class 'list'>
<class 'langchain.schema.HumanMessage'>
print(customer_messages[0])
content="Translate the text that is delimited by triple backticks into a style that is American English in a calm and respectful tone\n. text: ```\nArrr, I be fuming that me blender lid flew off and splattered me kitchen walls with smoothie! And to make matters worse, the warranty don't cover the cost of cleaning up me kitchen. I need yer help right now, matey!\n```\n" additional_kwargs={} example=False

Let’s pass these messages to our LLM:

# Call the LLM to translate to the style of the customer message
customer_response = chat(customer_messages)
print(customer_response.content)
I'm really frustrated that my blender lid flew off and made a mess of my kitchen walls with smoothie! To add insult to injury, the warranty doesn't cover the cost of cleaning up my kitchen. Can you please help me out, friend?

And the cool thing is that we can use the same template for multiple tasks; we just need to adjust the style and text parameters to our needs, as in the sketch below.
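For instance, here is a minimal sketch that reuses the same prompt_template in the other direction, translating a made-up customer-service reply into the customer’s pirate style:

service_reply = """Hey there customer, \
the warranty does not cover \
cleaning expenses for your kitchen \
because it's your fault that \
you misused your blender.
"""

service_style_pirate = """\
a polite tone \
that speaks in English Pirate\
"""

service_messages = prompt_template.format_messages(
    style=service_style_pirate,
    text=service_reply)

service_response = chat(service_messages)
print(service_response.content)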

Why templates ?

You may be wondering: why are we using prompt templates instead of an f-string?
The answer is that as you build sophisticated apps, prompts can get quite long, detailed, and repetitive.

And so prompt templates are a useful abstraction to help you reuse good prompts when you can.

Consider, for example, a relatively long prompt to grade a student’s submission for an online learning application. A prompt like this can ask the LLM to first solve the problem itself and then format its output in a specific way, and wrapping it in a LangChain prompt template makes it much easier to reuse.

Also, LangChain provides prompts for some common operations, such as summarization, question answering, or connecting to SQL databases or different APIs. So by using some of LangChain’s built-in prompts, you can quickly get an application working without needing to engineer your own prompts.
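For instance, here is a minimal sketch of one of those built-in prompts at work, through a summarization chain (assuming the same langchain version used above; the document text is a made-up example):

from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document

# A toy document to summarize; chain_type="stuff" uses a built-in summarization prompt
docs = [Document(page_content="LangChain bundles reusable prompts, models, and parsers into chains.")]
chain = load_summarize_chain(chat, chain_type="stuff")
print(chain.run(docs))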

Output Parsers

One other aspect of LangChain’s prompt libraries is that they also support output parsing, which we’ll get to in a minute. When you’re building a complex application using an LLM, you often instruct the LLM to generate its output in a certain format, such as using specific keywords.

A classic example is using an LLM to carry out chain-of-thought reasoning with the ReAct framework. Don’t worry about the technical details; the key idea is that Thought is what the LLM is thinking (by giving an LLM space to think, it can often reach more accurate conclusions), Action is a keyword to carry out a specific action, and Observation shows what it learned from that action, and so on. If you have a prompt that instructs the LLM to use these specific keywords, Thought, Action, and Observation, then this prompt can be coupled with a parser that extracts the text tagged with those keywords, as sketched below. Together, that gives a very nice abstraction to specify the input to an LLM and then have a parser correctly interpret the output that the LLM gives.
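To make that concrete, here is a minimal sketch (not LangChain’s actual ReAct parser) of how text tagged with those keywords could be pulled apart; the LLM output below is a made-up example:

import re

# Hypothetical LLM output using the Thought/Action/Observation keywords
llm_output = """Thought: I need to check the warranty terms first.
Action: search("blender warranty coverage")
Observation: The warranty covers parts but not cleaning costs."""

# Collect the text tagged with each keyword, one entry per line
pattern = r"^(Thought|Action|Observation): (.*)$"
parsed = dict(re.findall(pattern, llm_output, re.MULTILINE))
print(parsed["Action"])  # search("blender warranty coverage")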

And so with that, let’s take a look at how you can have an LLM output JSON and use LangChain to parse that output. The running example will be to extract information from a product review and format the output as JSON.

So here’s an example of how you would like the output to be formatted.

{
  "gift": False,
  "delivery_days": 5,
  "price_value": "pretty affordable!"
}

Here is an example of a customer review, as well as a template that tries to get to that JSON output:

customer_review = """\
This leaf blower is pretty amazing. It has four settings:\
candle blower, gentle breeze, windy city, and tornado. \
It arrived in two days, just in time for my wife's \
anniversary present. \
I think my wife liked it so much she was speechless. \
So far I've been the only one using it, and I've been \
using it every other morning to clear the leaves on our lawn. \
It's slightly more expensive than the other leaf blowers \
out there, but I think it's worth it for the extra features.
"""

review_template = """\
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? \
Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product \
to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price,\
and output them as a comma separated Python list.

Format the output as JSON with the following keys:
gift
delivery_days
price_value

text: {text}
"""

So the review template asks the LLM to take a customer review as input, extract these three fields, and format the output as JSON with the given keys.

So here’s how you can wrap this in LangChain:

from langchain.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_template(review_template)

Let’s create the messages to pass to the OpenAI endpoint:

messages = prompt_template.format_messages(text=customer_review)
chat = ChatOpenAI(temperature=0.0, model="gpt-3.5-turbo")
response = chat(messages)
print(response.content)
{
  "gift": true,
  "delivery_days": 2,
  "price_value": ["It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."]
}

But note that if we check the type, the response content is actually a string. It looks like JSON, with what appear to be key-value pairs, but it’s not a dictionary; it’s just one long string:

type(response.content)
str
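Because it is just a string, you can’t treat it like a dictionary; for example, the following call (commented out on purpose) would fail:

# response.content.get('gift')  # AttributeError: 'str' object has no attribute 'get'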

So let’s see how we would use LangChain’s output parser to do this.

Parse the LLM output string into a Python dictionary

We’re going to import ResponseSchema and StructuredOutputParser from LangChain:

from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser

And I’m going to tell it what I want to parse by specifying these response schemas:

gift_schema = ResponseSchema(
    name="gift",
    description="Was the item purchased as a gift for someone else? "
                "Answer True if yes, False if not or unknown.")

delivery_days_schema = ResponseSchema(
    name="delivery_days",
    description="How many days did it take for the product to arrive? "
                "If this information is not found, output -1.")

price_value_schema = ResponseSchema(
    name="price_value",
    description="Extract any sentences about the value or price, "
                "and output them as a comma separated Python list.")

response_schemas = [gift_schema,
                    delivery_days_schema,
                    price_value_schema]

output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

Now that I’ve specified the schemas, LangChain can actually give you the prompt itself: the output parser tells you what format instructions it wants you to send to the LLM.

format_instructions = output_parser.get_format_instructions()
print(format_instructions)
The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "\`\`\`json" and "\`\`\`":

```json
{
"gift": string // Was the item purchased as a gift for someone else? Answer True if yes, False if not or unknown.
"delivery_days": string // How many days did it take for the product to arrive? If this information is not found, output -1.
"price_value": string // Extract any sentences about the value or price, and output them as a comma separated Python list.
}
```

So as we can see, it’s a pretty precise set of instructions for the LLM that will cause it to generate an output the output parser can process.

So here’s a new review template that includes the format instructions LangChain generated. We can create a prompt from this review template and then create the messages that we’ll pass to the OpenAI endpoint:

review_template_2 = """\
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? \
Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product\
to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price,\
and output them as a comma separated Python list.

text: {text}

{format_instructions}
"""

prompt = ChatPromptTemplate.from_template(template=review_template_2)

messages = prompt.format_messages(text=customer_review,
                                  format_instructions=format_instructions)

If you want, you can take a look at the actual prompt; it gives the instructions to extract the fields gift, delivery_days, and price_value:

print(messages[0].content)

Let’s take a look at the response we get after calling the API endpoint:

response = chat(messages)
print(response.content)
```json
{
  "gift": true,
  "delivery_days": "2",
  "price_value": ["It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."]
}
```

And now, using the output parser we created earlier, we can parse this into an output dictionary:

output_dict = output_parser.parse(response.content)
output_dict
{'gift': True,
 'delivery_days': '2',
 'price_value': ["It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."]}

And notice that this is of type dict, not a string:

type(output_dict)
dict

This is why we can now extract the value associated with a key:

output_dict.get('delivery_days')
'2'

So this is a nifty way to take your LLM output and parse it into a Python dictionary, making the output easier to use in downstream processing.
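For instance, here is a hypothetical downstream use of the parsed dictionary (note that delivery_days was returned as a string by the model):

# Branch on the parsed fields directly
if output_dict.get('gift') is True:
    print("Route this review to the gifts team")

days = int(output_dict.get('delivery_days', -1))
print(f"Delivered in {days} days")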

Conclusion

With these tools, hopefully you’ll be able to reuse your own prompt templates easily, share prompt templates with the people you collaborate with, and even use LangChain’s built-in prompt templates. As you just saw, these can often be coupled with an output parser: the prompt instructs the LLM to produce output in a specific format, and the parser then parses that output into a Python dictionary or some other data structure that makes downstream processing easy. I hope you find this useful in many of your applications.


Skander Tlili

Easy Data Science! I try to explain Data Science concepts and techniques using simple code snippets.