# AI chat app powered by Ollama and Llama 3.1:8b (local RAG based)

Chamin Hewage
Published in LiIDS MAGAZINE.am · Aug 2, 2024

(Image courtesy: Pixabay)

Hello world,

As you already know, I am now back in the industry, and I work for HPE. HPE is a great place to meet a lot of interesting people, and today's post is inspired by a quick chat I had with a colleague of mine, Saad Zaher.

During one of our casual conversations, I mentioned to Saad my interest in diving into Large Language Models (LLMs). Although I dabbled in deep learning before starting my Ph.D. in databases and data systems, I haven't kept up with all the exciting advancements happening in the AI field recently. I needed a quick overview of LLMs and a little external motivation to get back into the latest trends in AI. My colleague Saad was incredibly generous and took the time to walk me through an LLM-based app that he and his team have been developing.

This brief chat sparked the curiosity and interest I needed to dive into LLMs, ultimately inspiring me to build a very basic, yet fun AI chat app based on these models. Exploring the potential of LLMs through this project has been an exciting journey, reigniting my passion for AI and its endless possibilities.

In this post, I’m excited to share the full source code of my app. I won’t be diving into any technical details, as I believe the best way to start with LLMs is to experience just how cool they are and see for yourself how easily they can be built with just a few lines of code. Let’s embark on this journey together and discover the fun and simplicity of creating with LLMs.

This app, which I built and ran entirely on my local machine, was deliberately designed to work without using GPUs. I believe the steps I followed here should be sufficient for you to get started as well. You can easily tweak this code and use your own data to experiment with the app.

So without further ado, let me go into the simple technical details.

This project was done in a Python 3.12 environment.

1. First, install the required dependencies via pip.
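
A command along the lines of the following should cover the LangChain + Ollama stack assumed in the sketches below (your exact package list may differ):

pip install langchain langchain-community docx2txt chromadb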

2. Once the dependencies are installed, import them and create the LLM model.

As you can see, in this example I'm using the Llama 3.1 8-billion-parameter model.
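
As a minimal sketch, assuming LangChain's community Ollama wrapper and a local Ollama server with the model already pulled (ollama pull llama3.1:8b), this step can look like:

```python
# Minimal sketch: assumes Ollama is running locally and
# `ollama pull llama3.1:8b` has already been done.
from langchain_community.llms import Ollama

llm = Ollama(model="llama3.1:8b")

# Quick sanity check that the local model responds
print(llm.invoke("Say hello in one short sentence."))
```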

3. Load the document that you’d like to chat with and create document chunks.

In this example, I'm loading the call.docx Word document. This document contains a meeting transcript, based on a meeting I had with a few colleagues at HPE.
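
A minimal sketch of this step, assuming LangChain's Docx2txtLoader and RecursiveCharacterTextSplitter (the chunk sizes here are just illustrative), could look like:

```python
from langchain_community.document_loaders import Docx2txtLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the Word document (uses the docx2txt package under the hood)
loader = Docx2txtLoader("call.docx")
documents = loader.load()

# Split the transcript into overlapping chunks so each piece fits
# comfortably in the model's context window
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(documents)
print(f"Created {len(chunks)} chunks")
```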

If you'd like to use a PDF file, you will also have to install the required unstructured Python dependency package. If that's the case, install it by typing:

pip install "unstructured[pdf]"

4. Create Embeddings

This process may take some time.
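
As a sketch, assuming Ollama is also used to compute the embeddings and Chroma serves as the local vector store (a dedicated embedding model such as nomic-embed-text would work here as well):

```python
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# Embed each chunk locally via Ollama and index the vectors in Chroma.
# On a CPU-only machine this is the slowest part of the pipeline.
embeddings = OllamaEmbeddings(model="llama3.1:8b")
vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings)
```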

5. Do some more processing.

As stated, the objective of this post is not to learn what each instruction does, but to enjoy the “coolness” of LLMs. So I'm not going to explain this code snippet in detail.
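
For the curious, one common way to wire up this step is a retrieval QA chain over the vector store; treat this as a sketch of the idea rather than the exact code:

```python
from langchain.chains import RetrievalQA

# Turn the vector store into a retriever and combine it with the LLM,
# so answers are grounded in the chunks retrieved from call.docx
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
```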

With that, the document processing is done and you are ready to chat with your application. Let me share some of the questions I asked the agent. For privacy and security reasons, I have hidden certain information.

Image 2: Conversations I had with the LLM
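
If you wire things up as in the sketches above, the chat itself can be as simple as a small input loop:

```python
# Minimal interactive loop (illustrative)
while True:
    question = input("You: ")
    if question.strip().lower() in {"exit", "quit"}:
        break
    result = qa_chain.invoke({"query": question})
    print("Assistant:", result["result"])
```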

Ok folks, that’s it for this blog post. I hope you guys enjoyed it.
If so, please do show some love by hitting that “clap” button ;)

Chamin Hewage

I am a Data Systems scientist (PhD). I work as a senior Database Engineer at HPE. I aspire to bring the state of the art to the mainstream through innovation.