LangChain Officially Supports OpenVINO™ Now!

Published in OpenVINO-toolkit, May 17, 2024

Author: Ethan Yang

LangChain is a powerful framework designed to help developers build end-to-end applications using language models. It provides a suite of tools, components, and interfaces that simplify the process of creating applications powered by large language models (LLMs). With LangChain, developers can easily build a high-level application such as a RAG or Agent pipeline. Now we can directly call OpenVINO™-based components in LangChain, including LLM, Text Embedding, and Reranker. This integration will help to improve the performance of local RAG and Agent services.

Installation:

In addition to regular LangChain installation steps, to call OpenVINO™ in LangChain, you only need to install the Optimum-intel library. Optimum-intel already includes all dependencies of OpenVINO™ such as model converter, runtime, and NNCF.

pip install langchain 

pip install --upgrade-strategy eager "optimum[openvino,nncf]"

LLM

A large language model is the core model component of the LangChain framework: it generates the final answer in a RAG system, or makes plans and calls tools in an Agent system. OpenVINO™ has been added as a backend of the HuggingFace Pipeline and reuses its code directly, so developers can initialize an OpenVINO™ LLM through LangChain's HuggingFace Pipeline as follows, where model_id can be a HuggingFace model ID, a local PyTorch model path, or an OpenVINO model path:

from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline

# OpenVINO runtime options passed through to the model
ov_config = {"PERFORMANCE_HINT": "LATENCY", "NUM_STREAMS": "1", "CACHE_DIR": ""}

ov_llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    backend="openvino",  # select the OpenVINO backend
    model_kwargs={"device": "CPU", "ov_config": ov_config},
    pipeline_kwargs={"max_new_tokens": 10},
)

After creating the OpenVINO™ LLM object, we can run inference with it just like any other LLM component in LangChain.

from langchain_core.prompts import PromptTemplate 
 
template = """Question: {question} 
 
Answer: Let's think step by step.""" 
prompt = PromptTemplate.from_template(template) 
 
chain = prompt | ov_llm 
 
question = "What is electroencephalography?" 
 
print(chain.invoke({"question": question}))
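To make the `prompt | ov_llm` composition above more concrete, here is a minimal pure-Python sketch of the idea: the prompt step fills the template, and its output is fed to the model step. The `Step` class, `prompt_step`, and `fake_llm` are hypothetical names introduced only for illustration; this is not LangChain's implementation, and no real model is called.

```python
# Simplified stand-in for pipe-style composition; not LangChain's own code.
class Step:
    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value):
        return self.fn(value)


template = """Question: {question}

Answer: Let's think step by step."""

# Hypothetical prompt step: fills the template from a dict of variables.
prompt_step = Step(lambda variables: template.format(**variables))

# Hypothetical model step: a real LLM would generate text here.
fake_llm = Step(lambda text: f"<completion for a prompt of {len(text)} characters>")

chain = prompt_step | fake_llm

question = "What is electroencephalography?"
rendered = prompt_step.invoke({"question": question})
print(rendered)  # the text the model step receives
print(chain.invoke({"question": question}))
```

Printing `rendered` shows the exact text that would reach the model, which is a handy check when debugging a prompt template.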

If you want to deploy LLMs on Intel GPUs, specify model_kwargs={"device": "GPU"} to run inference on them. The model can also be exported to a local folder with Optimum-intel's command-line tool, which can export the model directly with INT4 weights.

optimum-cli export openvino --model gpt2 --weight-format int4 ov_model_dir
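As a rough intuition for what "INT4 weights" means, the sketch below quantizes a small group of floating-point weights to signed 4-bit integer codes plus one scale, then restores approximate values. It is a deliberately simplified, symmetric scheme written for illustration; the function names are hypothetical, and the actual compression performed by the export tool is more sophisticated than this.

```python
def quantize_int4(weights):
    """Symmetric 4-bit quantization of one weight group (illustrative only)."""
    max_abs = max(abs(w) for w in weights)
    # Signed 4-bit integers cover [-8, 7]; map the largest magnitude to 7.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int4(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


weights = [0.42, -0.17, 0.05, -0.70, 0.33]
codes, scale = quantize_int4(weights)
restored = dequantize_int4(codes, scale)

print(codes)                            # small integers, 4 bits each
print([round(r, 3) for r in restored])  # close to, but not equal to, the originals
```

Each weight is stored in 4 bits instead of 16 or 32, at the cost of a rounding error of at most half the scale per weight in this simplified scheme.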

More information about the OpenVINO™ LLM component and how to use it can be found at:

https://python.langchain.com/v0.1/docs/integrations/llms/openvino/

Text Embedding

The Text Embedding model converts text into feature vectors, which can then be used to build a retriever based on text similarity. It is widely used in RAG systems, where it is expected to produce the top-k candidate contexts for a query. A Text Embedding model can be exported with the feature-extraction task through Optimum-intel:

optimum-cli export openvino --model BAAI/bge-small-en --task feature-extraction

In LangChain, traditional BERT embedding models and BGE-based embedding models can be deployed through the OpenVINOEmbeddings and OpenVINOBgeEmbeddings classes, respectively. The following is a BGE embedding model example:

from langchain_community.embeddings import OpenVINOBgeEmbeddings

model_name = "BAAI/bge-small-en"
model_kwargs = {"device": "CPU"}
encode_kwargs = {"normalize_embeddings": True}
ov_embeddings = OpenVINOBgeEmbeddings(
    model_name_or_path=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
)

# Embed a single query string into a feature vector
embedding = ov_embeddings.embed_query("hi this is harrison")
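Once texts are embedded, a similarity-based retriever ranks documents by how close their vectors are to the query vector. The sketch below shows that step with cosine similarity in plain Python. The three-dimensional vectors and document IDs are made-up stand-ins for real embeddings, and a production retriever would normally rely on a vector store rather than this brute-force loop.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the IDs of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy vectors standing in for real embedding outputs (hypothetical data).
doc_vecs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.1],
}
query_vec = [1.0, 0.2, 0.0]

print(top_k(query_vec, doc_vecs, k=2))  # → ['doc_a', 'doc_c']
```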

More information about the OpenVINO™ Embedding component and how to use it can be found at:

https://python.langchain.com/v0.1/docs/integrations/text_embedding/openvino/

Reranker

A Reranker is a kind of text classification model that produces a similarity score between each candidate context and the query. After sorting the candidates by score, we can further filter the context in the RAG system. A Reranker model can be exported through the text-classification task in Optimum-intel:

optimum-cli export openvino --model BAAI/bge-reranker-large --task text-classification

During deployment, an OpenVINO™-based Rerank task can be created with the OpenVINOReranker class and called by ContextualCompressionRetriever to compress the retriever's search results. In the following example, we reorder the retriever's top-k search results and keep the four most similar to the query, which further reduces the input prompt length.

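from langchain.retrievers import ContextualCompressionRetriever
from langchain_community.document_compressors.openvino_rerank import OpenVINOReranker

model_name = "BAAI/bge-reranker-large"

# `retriever` is assumed to be an existing LangChain retriever created earlier
ov_compressor = OpenVINOReranker(model_name_or_path=model_name, top_n=4)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=ov_compressor, base_retriever=retriever
)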
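Conceptually, the rerank step scores every candidate against the query, sorts by score, and keeps only the top_n results. The sketch below mirrors that flow in plain Python. The `word_overlap_score` function is a hypothetical stand-in for the reranker model's relevance score, used only so the example runs without a model; the candidate texts are invented.

```python
def rerank(query, candidates, score_fn, top_n):
    """Sort candidates by relevance to the query and keep the best top_n."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]


def word_overlap_score(query, doc):
    # Hypothetical scoring function: fraction of query words found in the doc.
    # A real reranker would compute this score with a neural model instead.
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)


candidates = [
    "OpenVINO runs inference on Intel hardware",
    "LangChain can score a document with a retriever",
    "A reranker scores each candidate against the query",
    "Bananas are rich in potassium",
]
query = "how does a reranker score a candidate"

for doc in rerank(query, candidates, word_overlap_score, top_n=2):
    print(doc)
```

Keeping only the highest-scoring candidates is what shortens the prompt that is finally sent to the LLM.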

More information about the OpenVINO™ Reranker component and how to use it can be found at:

https://python.langchain.com/v0.1/docs/integrations/document_transformers/openvino_rerank/

Conclusion

OpenVINO™-based model tasks have been integrated into the LangChain framework, making it more convenient for developers to improve the inference performance of key model tasks in LangChain.

Resources

Example of RAG based on LangChain and OpenVINO:

https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-rag-langchain

Example of an agent based on LangChain and OpenVINO:

https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-agent-langchain

Notices & Disclaimers

Intel technologies may require enabled hardware, software, or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.