Ollama and LangChain: Run LLMs locally
Run open-source LLMs, such as Llama 2 and Mistral, locally
Introduction
In the realm of Large Language Models (LLMs), Ollama and LangChain emerge as powerful tools for developers and researchers. Ollama provides a seamless way to run open-source LLMs locally, while LangChain offers a flexible framework for integrating these models into applications. This article will guide you through the process of setting up and utilizing Ollama and LangChain, enabling you to harness the power of LLMs for your projects.
1. Setting Up Ollama
Installation and Configuration
To start using Ollama, you first need to install it on your system. For macOS users, Homebrew simplifies this process:
brew install ollama
brew services start ollama
After installation, Ollama listens on port 11434 for incoming requests. You can verify it is running by navigating to `http://localhost:11434/` in your browser. The next step is selecting the LLM you wish to run locally. For instance, to download a model such as `llama2` or `mistral`, execute:
ollama pull mistral
This command downloads the model and handles setup and configuration details for you, including GPU usage where available.
You can also download and install Ollama from the official site.
2. Running Models
To interact with your locally hosted LLM, you can use the command line directly or via an API. For command-line interaction, Ollama provides the `ollama run <name-of-model>` command. Alternatively, you can send a JSON request to the API endpoint of Ollama:
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt":"Why is the sky blue?"
}'
This flexibility allows you to integrate LLMs into various applications seamlessly.
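By default, the `/api/generate` endpoint streams its reply as newline-delimited JSON: one object per generated fragment, each carrying the text in a `response` field, with a final object marked `done: true`. A minimal sketch of parsing such a stream, using hardcoded sample lines instead of a live server:

```python
import json

def collect_response(ndjson_lines):
    """Concatenate the 'response' fragments from an Ollama-style
    newline-delimited JSON stream until a 'done' object appears."""
    text = []
    for line in ndjson_lines:
        obj = json.loads(line)
        text.append(obj.get("response", ""))
        if obj.get("done"):
            break
    return "".join(text)

# Sample fragments shaped like Ollama's streaming output
sample = [
    '{"model": "llama2", "response": "The sky ", "done": false}',
    '{"model": "llama2", "response": "is blue.", "done": false}',
    '{"model": "llama2", "response": "", "done": true}',
]
print(collect_response(sample))  # The sky is blue.
```

In a real client you would iterate over the HTTP response body line by line and feed each line to the same parsing logic.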
3. Integrating Ollama with LangChain
LangChain is a framework designed to facilitate the integration of LLMs into applications. It supports a wide range of chat models, including Ollama, and provides an expressive language for chaining operations. To get started, you’ll need to install LangChain and its dependencies.
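That "expressive language for chaining operations" composes steps with the `|` operator, as in `prompt | llm | parser`. The idea can be illustrated with a small pure-Python sketch; the `Step` class below is hypothetical, for illustration only, and is not LangChain's actual implementation:

```python
class Step:
    """Toy illustration of pipe-style chaining: each step wraps a
    function, and `a | b` builds a step that runs a, then b."""
    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # Compose: the new step feeds this step's output into the next
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)

# A prompt "template", a stand-in model, and a parser chained together
template = Step(lambda topic: f"Tell me a joke about {topic}.")
fake_llm = Step(lambda prompt: prompt.upper())  # stands in for a real model
parser = Step(lambda text: text.rstrip("."))

chain = template | fake_llm | parser
print(chain.invoke("cats"))  # TELL ME A JOKE ABOUT CATS
```

Each stage only needs to accept the previous stage's output, which is why prompts, models, and output parsers can be mixed and matched freely.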
Official Documentation
Using Ollama in LangChain
To use Ollama within a LangChain application, you first import the necessary modules from the `langchain_community.llms` package:
from langchain_community.llms import Ollama
Then, initialize an instance of the Ollama model:
llm = Ollama(model="llama2")
You can now invoke the model to generate responses. For example:
llm.invoke("Tell me a joke")
This code snippet demonstrates how to use Ollama to generate a response to a given prompt.
Advanced Usage
from langchain_community.llms import Ollama
llm = Ollama(model="mistral")
llm.invoke("The first man on the summit of Mount Everest, the highest peak on Earth, was ...")
LangChain also supports more complex operations, such as streaming responses and using prompt templates. For instance, you can stream responses from the model as follows:
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
llm = Ollama(
model="mistral", callback_manager=CallbackManager([StreamingStdOutCallbackHandler()])
)
llm.invoke("The first man on the summit of Mount Everest, the highest peak on Earth, was ...")
This approach is particularly useful for applications requiring real-time interaction with LLMs.
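Conceptually, a streaming handler like StreamingStdOutCallbackHandler just receives each new token as it is generated and writes it out immediately. The idea can be sketched without LangChain; the `TokenPrinter` class and the fake token source below are hypothetical, for illustration only:

```python
class TokenPrinter:
    """Toy counterpart of a streaming callback handler: it is called
    once per generated token and writes it out immediately."""
    def __init__(self):
        self.tokens = []

    def on_new_token(self, token):
        self.tokens.append(token)
        print(token, end="", flush=True)

def fake_generation(handler):
    # Stands in for a model emitting tokens one at a time
    for token in ["Edmund ", "Hillary ", "and ", "Tenzing ", "Norgay"]:
        handler.on_new_token(token)

handler = TokenPrinter()
fake_generation(handler)
print()
```

The real handler works the same way: the model calls it per token, so the user sees output appear incrementally instead of waiting for the full completion.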
4. Deploying with LangServe
For production environments, LangChain offers LangServe, a deployment tool that simplifies running your application.
LangServe is an open-source LangChain library that makes it easier to create API servers from your chains. It exposes remote APIs for core LangChain Expression Language methods such as invoke, batch, and stream.
from typing import List
from fastapi import FastAPI
from langchain_community.llms import Ollama
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate
from langserve import add_routes
import uvicorn
llm = Ollama(model="mistral")
template = PromptTemplate.from_template("Tell me a joke about {topic}.")
chain = template | llm | CommaSeparatedListOutputParser()
app = FastAPI(title="LangChain", version="1.0", description="The first server ever!")
add_routes(app, chain, path="/chain")
if __name__ == "__main__":
uvicorn.run(app, host="localhost", port=8000)
Run the code above to start your LangServe app, then head to
http://localhost:8000/chain/playground/
to access the playground and generate a joke on a particular topic. It's your turn now to test it with your custom prompt.
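Besides the playground, LangServe serves the chain at REST endpoints such as POST /chain/invoke, which expects the chain's input in a JSON body under an `input` key. A sketch of building such a request with the standard library (the actual call is commented out, since it needs the server from above running):

```python
import json
import urllib.request

# LangServe's invoke endpoint expects the chain input under "input";
# "topic" here matches the prompt template's variable name.
payload = {"input": {"topic": "programming"}}
body = json.dumps(payload).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:8000/chain/invoke",
    data=body,
    headers={"Content-Type": "application/json"},
)
# Requires the LangServe app to be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["output"])

print(body.decode())  # {"input": {"topic": "programming"}}
```

The same chain is also reachable at /chain/batch and /chain/stream, mirroring the batch and stream methods mentioned earlier.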
Conclusion
By integrating Ollama with LangChain, developers can leverage the capabilities of LLMs without the need for external APIs. This setup not only saves costs but also allows for greater flexibility and customization. Whether you’re building a chatbot, a content generation tool, or an interactive application, Ollama and LangChain provide the tools necessary to bring LLMs to life.