Ollama and LangChain: Run LLMs locally

Run open-source LLMs, such as Llama 2 and Mistral, locally

Abonia Sojasingarayar
4 min read · Feb 29, 2024
Ollama + LangChain — Demo

Introduction

In the realm of Large Language Models (LLMs), Ollama and LangChain emerge as powerful tools for developers and researchers. Ollama provides a seamless way to run open-source LLMs locally, while LangChain offers a flexible framework for integrating these models into applications. This article will guide you through the process of setting up and utilizing Ollama and LangChain, enabling you to harness the power of LLMs for your projects.

1. Setting Up Ollama

Installation and Configuration

To start using Ollama, you first need to install it on your system. For macOS users, Homebrew simplifies this process:

brew install ollama
brew services start ollama

After installation, Ollama listens on port 11434 for incoming requests. You can verify it is running by navigating to `http://localhost:11434/` in your browser. The next step is to select the LLM you wish to run locally. For instance, to download a model such as `mistral` (the same works for `llama2` and others), execute:

ollama pull mistral

This command downloads the model and handles the setup and configuration details, including GPU usage where available.

You can also download and install Ollama from the official site.

2. Running Models

To interact with your locally hosted LLM, you can use the command line directly or call Ollama's API. For command-line interaction, Ollama provides the `ollama run <name-of-model>` command. Alternatively, you can send a JSON request to Ollama's API endpoint:

curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt":"Why is the sky blue?"
}'

This flexibility allows you to integrate LLMs into various applications seamlessly.
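
If you prefer to call the API from code rather than curl, here is a minimal Python sketch (assuming the `requests` package is installed and Ollama is running on its default port; setting `stream` to false returns a single JSON object instead of a token stream):

import requests

# Ask the local Ollama server for a single, non-streamed completion
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",
        "prompt": "Why is the sky blue?",
        "stream": False,
    },
)
print(response.json()["response"])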

3. Integrating Ollama with LangChain

LangChain is a framework designed to facilitate the integration of LLMs into applications. It supports a wide range of chat models, including Ollama, and provides an expressive language for chaining operations. To get started, you'll need to install LangChain and its dependencies, for example with `pip install langchain langchain-community`.

Official Documentation

Using Ollama in LangChain

To use Ollama within a LangChain application, you first import the Ollama class from the `langchain_community.llms` module:

from langchain_community.llms import Ollama

Then, initialize an instance of the Ollama model:

llm = Ollama(model="llama2")

You can now invoke the model to generate responses. For example:

llm.invoke("Tell me a joke")

This code snippet demonstrates how to use Ollama to generate a response to a given prompt.

Advanced Usage

from langchain_community.llms import Ollama

# Load the locally hosted Mistral model through Ollama
llm = Ollama(model="mistral")
llm.invoke("The first man on the summit of Mount Everest, the highest peak on Earth, was ...")

LangChain also supports more complex operations, such as streaming responses and using prompt templates. For instance, you can stream responses from the model as follows:

from langchain_community.llms import Ollama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Stream tokens to stdout as they are generated
llm = Ollama(
    model="mistral",
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
)
llm.invoke("The first man on the summit of Mount Everest, the highest peak on Earth, was ...")

This approach is particularly useful for applications requiring real-time interaction with LLMs.
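
Prompt templates work with Ollama in the same way. As a minimal sketch (the template text and topic below are only illustrative), you can pipe a template into the model and an output parser using the LangChain Expression Language:

from langchain_community.llms import Ollama
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = Ollama(model="mistral")
prompt = PromptTemplate.from_template("Explain {concept} in one sentence.")

# Chain the template, the model, and a string parser with the pipe syntax
chain = prompt | llm | StrOutputParser()
print(chain.invoke({"concept": "retrieval-augmented generation"}))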

4. Deploying with LangServe

For production environments, LangChain offers LangServe, a deployment tool that simplifies running your LangChain application as a service.

LangServe is an open-source library from the LangChain ecosystem that makes it easier to create API servers from your chains. It exposes remote endpoints for the core LangChain Expression Language methods such as invoke, batch, and stream.

from fastapi import FastAPI
from langchain_community.llms import Ollama
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate
from langserve import add_routes
import uvicorn

# Local Mistral model served by Ollama
llm = Ollama(model="mistral")
template = PromptTemplate.from_template("Tell me a joke about {topic}.")
chain = template | llm | CommaSeparatedListOutputParser()

# Expose the chain as a REST API under /chain
app = FastAPI(title="LangChain", version="1.0", description="The first server ever!")
add_routes(app, chain, path="/chain")

if __name__ == "__main__":
    uvicorn.run(app, host="localhost", port=8000)

Run the code above to start your LangServe app, then head to

http://localhost:8000/chain/playground/

to access the playground and generate a joke on a particular topic. It's your turn now to test it with your own custom prompt.

Launch LangServe
Demo-Playground
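
Besides the playground, LangServe also lets you call the deployed chain from Python through a remote runnable. A minimal client sketch, assuming the server above is running on port 8000:

from langserve import RemoteRunnable

# Point the client at the /chain route exposed by add_routes above
remote_chain = RemoteRunnable("http://localhost:8000/chain")
print(remote_chain.invoke({"topic": "penguins"}))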

Conclusion

By integrating Ollama with LangChain, developers can leverage the capabilities of LLMs without the need for external APIs. This setup not only saves costs but also allows for greater flexibility and customization. Whether you’re building a chatbot, a content generation tool, or an interactive application, Ollama and LangChain provide the tools necessary to bring LLMs to life.

Connect with me on LinkedIn

Find me on GitHub

Visit my technical channel on YouTube

Thanks for Reading!
