3. How to invoke an LLM from LangChain

Terry Cho
Dec 20, 2023


Model

A Model is LangChain's abstraction over an underlying model such as ChatGPT or PaLM. Providers usually offer two kinds of models: an LLM model for completion-style tasks (answering questions, summarizing documents, generating ideas, etc.) and a chat model for conversational interaction between people and the model. LangChain provides an abstraction layer for both.
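For example, with OpenAI the two abstractions look roughly like the following minimal sketch (the OpenAI, ChatOpenAI, and HumanMessage imports reflect LangChain's Python API of this period; the prompt text is only illustrative):

from langchain.llms import OpenAI              # completion-style LLM abstraction
from langchain.chat_models import ChatOpenAI   # chat-style model abstraction
from langchain.schema import HumanMessage

# LLM model: takes a plain text prompt and returns a text completion
llm = OpenAI(openai_api_key="{YOUR_API_KEY}")
print(llm.invoke("Describe kimchi in one sentence."))

# Chat model: takes a list of messages and returns a message
chat = ChatOpenAI(openai_api_key="{YOUR_API_KEY}")
print(chat.invoke([HumanMessage(content="Describe kimchi in one sentence.")]))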

LLM

An LLM is a model that returns an answer according to the instructions in the input prompt.

LangChain provides abstraction objects for LLM models such as ChatGPT and the PaLM API.

How you create an LLM model object depends on the model provider, and in particular the tunable parameters differ from provider to provider. For example, with ChatGPT you can set the temperature value, while with Google's PaLM on Vertex AI you can set values such as temperature and Top-K/Top-P.
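As a rough sketch of how these provider-specific parameters are passed (the VertexAI class and its top_k/top_p parameter names are assumptions based on LangChain's Vertex AI integration of this period; check the integration page for your provider):

from langchain.llms import OpenAI, VertexAI

# OpenAI: temperature controls the randomness of the completion
openai_llm = OpenAI(openai_api_key="{YOUR_API_KEY}", temperature=0.7)

# Google PaLM on Vertex AI: temperature plus Top-K / Top-P sampling parameters
palm_llm = VertexAI(temperature=0.7, top_k=40, top_p=0.8)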

For the LLM models supported by LangChain, refer to the official documentation: https://python.langchain.com/docs/integrations/llms/

There are several ways to call an LLM object once it has been created. The simplest is to ask the LLM a question synchronously with the llm.invoke(prompt) method, as follows.

# Invoke

from langchain import PromptTemplate
from langchain.llms import OpenAI

# Create the LLM object with your OpenAI API key
llm = OpenAI(openai_api_key="{YOUR_API_KEY}")

prompt = "What is famous street foods in Seoul Korea in 200 characters"
# Synchronous call: blocks until the full completion is returned
llm.invoke(prompt)

In the previous example, a single prompt was called synchronously, but there are various other calling patterns, such as sending multiple prompts at once or calling asynchronously, and LangChain supports them. (Note: the supported call patterns vary by LLM; be sure to check them in advance at https://python.langchain.com/docs/integrations/llms/.)

Batch

First, with batch, if you need to issue multiple questions or commands, you can put the prompts in a Python list and send them all in a single batch call instead of looping over individual calls.

# Batch
prompts = [
    "What is top 5 Korean Street food?",
    "What is most famous place in Seoul?",
    "What is the popular K-Pop group?"
]
# Send all prompts in one batch call; returns a list of completions in the same order
llm.batch(prompts)

The following is the call result.

['\n\n1. Tteokbokki (Spicy Rice Cakes)\n2. Odeng (Fish Cake Skewers)\n3. Sundae (Korean Blood Sausage)\n4. Gimbap (Korean Sushi Roll)\n5. Jumping (Steamed Rice Cakes)',
'\n\nOne of the most famous places in Seoul is Gyeongbokgung Palace, the main royal palace of the Joseon Dynasty. It is considered one of the most iconic landmarks in the city and is a popular tourist destination.',
'\n\nThe most popular K-Pop group is BTS (Bangtan Boys). Other popular K-Pop groups include EXO, Blackpink, Twice, Red Velvet, and NCT.']

Streaming

Next, there are streaming calls. When the model is large or the prompt is long, the response time can be slow, so waiting for the entire response before printing anything can take a long time. To solve this, LangChain supports a streaming pattern: parts of the response are returned in real time as they are generated. It is mainly used in applications that require real-time interaction, such as chatbots.

The following is an example call. If you use stream() instead of invoke(), results keep being returned as they are produced. (If you run the example, you can see the sentence being printed piece by piece, like an animation.)

prompt = "What is famous street foods in Seoul Korea in 200 characters"
# stream() yields response chunks as they are generated
for chunk in llm.stream(prompt):
    print(chunk, end="", flush=True)

Asynchronous call

The three methods invoke, batch, and stream can each be called not only synchronously but also asynchronously.
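The asynchronous counterparts are named with an a prefix (ainvoke, abatch, astream). A minimal sketch, assuming the llm object created earlier and an async context such as a Jupyter notebook:

# Async batch: send several prompts concurrently and collect all results
results = await llm.abatch([
    "What is top 5 Korean Street food?",
    "What is most famous place in Seoul?",
])

# Async streaming: consume chunks as they arrive without blocking the event loop
async for chunk in llm.astream("What is famous street foods in Seoul Korea?"):
    print(chunk, end="", flush=True)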

Below is example code that compares call times by calling the same prompt 10 times asynchronously with ainvoke and 10 times synchronously with invoke. (For reference, the code below was written and executed in a Jupyter notebook, which is why await can be used at the top level.)

import asyncio
import time

from langchain.llms import OpenAI

llm = OpenAI(openai_api_key="{YOUR_API_KEY}")
prompt = "What is famous Korean food? Explain in 50 characters"

# Async call
async def invoke_async(llm):
    result = await llm.ainvoke(prompt)
    print(result)

async def invoke_parallel():
    # Launch 10 calls concurrently and wait for all of them to complete
    tasks = [invoke_async(llm) for _ in range(10)]
    await asyncio.gather(*tasks)

start_time = time.perf_counter()
await invoke_parallel()  # top-level await works inside a Jupyter notebook
end_time = time.perf_counter()
print("Async execution time:", (end_time - start_time))

# Sync call
start_time = time.perf_counter()
for i in range(10):
    result = llm.invoke(prompt)
    print(result)
end_time = time.perf_counter()
print("Sync execution time:", (end_time - start_time))

With the asynchronous API the calls are made in parallel, so the total time is faster: about 1.9 seconds compared with 5.8 seconds for the synchronous version.

Kimchi, a fermented cabbage dish, is one of the most famous Korean foods.


Korean BBQ, kimchi, bibimbap, and japchae are all famous Korean dishes.
.......

Kimchi, a spicy fermented cabbage dish, is the most famous Korean food.
Async execution time: 1.911825390998274

Kimchi, a spicy fermented cabbage dish, is a famous Korean food.


Kimchi, a fermented vegetable dish, is one of the most famous Korean foods.
........

Kimchi, a spicy fermented cabbage dish, is the most famous Korean food.
Sync execution time: 5.830645257010474

Most readers probably already know the concepts of synchronous and asynchronous calls, but to summarize: a synchronous call waits for the API's response before moving on to the next line of code. In other words, if 10 APIs are called, each response must be received before the next call can be made. An asynchronous call, on the other hand, does not wait for the response; while the 10 APIs are being called, execution proceeds to the next code immediately. In the code above, await (together with asyncio.gather) was used to wait until all the responses had come back.

<Figure. Concept of calling an LLM model synchronously vs. asynchronously>

Expressed graphically, the asynchronous call is the pattern on the left, which issues the 10 API calls at the same time, while the synchronous call is the pattern that calls each API one at a time and waits for its response.
