Run your LLM Apps locally using Ollama and Debug with Langtrace

Published in Langtrace · 4 min read · Jul 3, 2024

By Yemi Adejumobi (Engineer)

In the rapidly evolving landscape of Artificial Intelligence, large language models (LLMs) have become increasingly powerful and ubiquitous. However, the costs and complexities associated with running these models in cloud environments can be prohibitive, especially for developers and small teams looking to experiment and innovate.

Enter Ollama, a game-changing tool that brings the power of LLMs to your local machine. This blog post explores how Ollama can simplify your development process by letting you run LLM applications easily and efficiently on your own hardware. It also introduces Langtrace, an open-source observability tool that complements Ollama by providing crucial insights into your LLM application’s performance and behavior. Let’s dive in.

What is Ollama?

Ollama is a tool for running large language models (LLMs) locally, providing a cost-effective setup for testing and development. You can experiment and refine your ideas without incurring significant production costs.

By running LLMs locally, you can:

  • Reduce cloud costs: Save on cloud computing expenses by running LLMs on your local machine.
  • Experiment faster: Quickly test and iterate on your ideas without relying on remote servers.
  • Improve data privacy: Keep your data on your own machine instead of sending it to a third-party service.

Setting up Ollama and running LLMs locally

For this step, we will be using Meta’s latest open-source model, Llama 3. For best performance with Ollama, make sure your laptop has at least 16GB of RAM. If it does, follow these steps:

  1. Download and install Ollama (this link is the macOS build): https://ollama.com/download/Ollama-darwin.zip
  2. Download the desired LLM (e.g., Llama 3 or another open-source model). For example, to pull and run llama3 locally, run the following in a terminal window:
ollama run llama3
  • This works much like a docker run command: it pulls the llama3 model if it is not already on your machine and then runs it.
  • Once the pull is done, you should have a terminal prompt where you can start chatting.

For further customization, such as using a Modelfile to create your own custom system prompt, refer to the Ollama documentation here.
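As a rough illustration of what such a customization can look like, the sketch below renders a small Modelfile from Python. The FROM, PARAMETER, and SYSTEM instructions are standard Modelfile syntax; the helper name render_modelfile, the temperature value, and the system prompt are assumptions chosen for this example, not recommendations from the Ollama documentation.

```python
def render_modelfile(base_model: str, system_prompt: str, temperature: float = 0.7) -> str:
    """Render Modelfile text: base model, one sampling parameter, and a system prompt."""
    return (
        f"FROM {base_model}\n"
        f"PARAMETER temperature {temperature}\n"
        f'SYSTEM """{system_prompt}"""\n'
    )


# Hypothetical system prompt for the customer service bot scenario.
modelfile_text = render_modelfile(
    "llama3",
    "You are a polite customer service assistant for a clothing store.",
)
print(modelfile_text)
```

Saving the printed text to a file named Modelfile and running ollama create support-bot -f Modelfile, followed by ollama run support-bot, should give you a chat prompt that uses the custom system prompt. The model name support-bot is just a placeholder.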

Instrumenting Ollama with Langtrace

Now that you have a local LLM, let’s say you are building a customer service bot and would like to view detailed traces of its LLM requests. This is where Langtrace shines. Langtrace provides a Python SDK that enables observability for Ollama, allowing you to trace LLM calls and gain valuable insights into your application’s performance. To instrument Ollama with Langtrace:

  • Generate an API key from langtrace.ai (you can also self-host).
  • Install the Langtrace Python or TypeScript SDK.
  • Import and initialize the SDK.
  • Start tracing!

Example code snippet:

from langtrace_python_sdk import langtrace, with_langtrace_root_span
import ollama
from dotenv import load_dotenv

# Load variables from a local .env file into the environment
load_dotenv()

# langtrace.init(write_spans_to_console=False)
langtrace.init(api_key='YOUR_API_KEY', write_spans_to_console=False)


# Group the LLM calls made inside this function under one root span
@with_langtrace_root_span()
def give_recs():
    response = ollama.chat(model='llama3', messages=[
        {
            'role': 'user',
            'content': 'You are an AI assistant with expertise in mens clothing. Help me pick clothing for a black tie dinner at work.',
        },
    ])
    print(response['message']['content'])


if __name__ == "__main__":
    print("Running fashionista bot...")
    give_recs()

Here is what the trace looks like in the Langtrace UI.

Here is a link to a reference cookbook for Ollama integration with Langtrace.

Tracing LLM calls

With Langtrace, you can now trace LLM calls and capture essential metadata, such as:

  • Input, output, and total tokens
  • Latency
  • Error rates

This data provides valuable insights into your application’s performance, helping you optimize and improve it over time.
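To make these metrics concrete, here is a small sketch of how token counts and latency can be derived from a single Ollama chat response. It is not how Langtrace computes them internally; it only illustrates the kind of data involved. The field names prompt_eval_count, eval_count, and total_duration (in nanoseconds) come from Ollama's API response, while the helper name summarize_usage and the sample numbers are invented for this example.

```python
def summarize_usage(response: dict) -> dict:
    """Derive token counts and latency from one Ollama chat response."""
    input_tokens = response.get("prompt_eval_count", 0)   # tokens in the prompt
    output_tokens = response.get("eval_count", 0)         # tokens generated
    latency_ms = response.get("total_duration", 0) / 1_000_000  # ns -> ms
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
        "latency_ms": latency_ms,
    }


# Hand-written sample values, not real measurements.
sample_response = {
    "prompt_eval_count": 26,
    "eval_count": 298,
    "total_duration": 5_191_566_416,
}
usage = summarize_usage(sample_response)
print(usage)
```

An observability tool records this kind of summary for every call, along with errors, so that you can compare requests over time instead of inspecting responses one by one.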

In the next blog in this series, we will cover how to use Langtrace to perform evaluations on your application’s accuracy and optimize its behavior.

Next steps

In conclusion, combining Ollama’s local LLM capabilities with Langtrace’s observability features unlocks a powerful toolset for building and optimizing LLM applications. By following the steps outlined in this post, you can leverage the benefits of running LLMs locally with Ollama, including reduced cloud costs, accelerated experimentation, and improved data privacy.

With Langtrace, you can gain valuable insights into your application’s performance, identify bottlenecks, and optimize its behavior. By integrating Ollama and Langtrace, you can build more efficient, effective, and innovative LLM applications. Try out Ollama and Langtrace today and discover the advantages of local LLM development and open-source observability for yourself!

Originally published at https://langtrace.ai on July 3, 2024.
