Exploring Local Large Language Models and associated key challenges

Raj Uppadhyay
Feb 13, 2024


Local LLMs

As I delved into the realm of Large Language Models (LLMs), I embarked on a quest to gain a comprehensive understanding of their various facets. My exploration encompassed an investigation into the security implications, accessibility considerations, and feasibility of employing generative AI. Additionally, I examined the consequences and limitations associated with its usage, as well as the ethical and legal dimensions surrounding it.

In my pursuit of knowledge, I compared offerings from prominent cloud platforms such as Google, Azure, AWS, and OpenAI. I evaluated their respective options for running LLMs on local environments, as well as in isolated internet environments, with a focus on maintaining privacy and bolstering security.

Throughout my journey, I’ve faced various challenges in configuring and implementing methods to handle recurring issues. Along the way, I found some solutions and even developed my own personal copilot (a Visual Studio Code Extension).

In the spirit of sharing my learnings and experiences with other passionate individuals, I’m covering this topic around Local LLMs in a series of posts on Medium. I hope you’ll find my insights valuable and applicable to your own work.

Let’s get started!

To keep things simple and easy to digest, I have broken this path into the following sections.

  1. Set up an LLM locally and understand some of its challenges
  2. Understand RAG in a bit more depth and try some hands-on exercises
  3. Try a personal Copilot and connect it with local LLMs

In this post we will cover part 1, i.e. setting up an LLM locally.

Prerequisites:

  1. LM Studio: One of the easiest ways to configure local LLM(s) and play around with them. Download it from: https://lmstudio.ai/
  2. The latest version of Python, installed and configured on your machine, along with a basic understanding of installing Python packages.
  3. Visual Studio Code
  4. The latest version of Node.js, installed and configured
  5. Basic understanding of any local LLM; e.g. we will go with Llama in this post (https://llama.meta.com/)
  6. At least 8 GB of RAM and 10–15 GB of free disk space for better performance. For more information on system requirements, refer to the model page: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF

Running LLM Locally:

1. Open LM Studio and Search for Your Preferred Local LLM:

  • Launch LM Studio on your machine.
  • On the landing page, search for your favorite local large language model (LLM).
  • As an example, let’s type “Llama” and click on “Go.”

2. Select the Recommended Model:

  • From the search results, select “TheBloke/Llama-2-7B-Chat-GGUF” from the left-hand side.
  • Then, choose “llama-2-7b-chat.Q5_K_S.gguf,” which is one of the recommended models on HuggingFace.

3. Wait for the Download to Complete:

  • Please be patient while the model downloads. It is approximately 4.5 GB in size and may take some time, depending on your internet speed.

4. Select the Downloaded Model in AI Chat:

  • On the left-hand panel in LM Studio, click on “AI Chat.”
  • From the top, select the model you just downloaded.

5. Enjoy Offline Interaction with Your LLM:

  • Congratulations! You’re all set.
  • To ensure a truly offline experience, turn off your internet connection.
    Now, you can interact with your offline LLM.
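
A quick tip: besides the chat UI, LM Studio can also expose the downloaded model through a local, OpenAI-compatible HTTP server (its “Local Server” feature), which lets you call the model from your own scripts. Below is a minimal Python sketch of such a call; it assumes the local server is running on LM Studio’s default port 1234, so adjust the URL, model name, and prompt to match your setup.

```python
# Minimal sketch: calling a local model served by LM Studio's
# OpenAI-compatible server. Assumes the server is started on the
# default port 1234; adjust the URL and model name to your setup.
import requests

LOCAL_SERVER_URL = "http://localhost:1234/v1/chat/completions"

payload = {
    "model": "TheBloke/Llama-2-7B-Chat-GGUF",  # name of the model you loaded
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise Retrieval Augmented Generation in one sentence."},
    ],
    "temperature": 0.7,
}

response = requests.post(LOCAL_SERVER_URL, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```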

Let’s explore some challenges with LLMs (especially local ones)

The capabilities of cutting-edge AI models, such as Google’s Gemini and OpenAI’s GPT series, are remarkable in generating text that closely resembles human writing. These LLMs leverage vast text datasets to learn intricate linguistic patterns and nuances. Yet, despite their impressive performance, LLMs may occasionally produce responses that are irrelevant or inaccurate, as their access to factual information is limited.

LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data up to a specific point in time that they were trained on. If you want to build AI applications that can reason about private data or data introduced after a model’s cutoff date, you need to augment the knowledge of the model with the specific information it needs.

Examples:

Following are a few examples from my experience and experimentation while developing some apps with my local LLMs (Llama 2), where I noticed these issues/limitations.

  1. I asked for the “current top 5 companies worldwide in terms of revenue”, and the figures in the answers were outdated (as the model was trained on data only up to 2022). I also struggled to find a source for more details around those facts and figures, but could not find one.
  2. Similarly, I asked it to generate some Playwright code, but the code it produced targeted an older version of the library, was not compatible with the latest version available, and hence did not work.
  3. When asked for the most recent winner of India’s Bigg Boss, it answered with the winner from 2020, even though the show is now in season 19 and the year is 2024.

Do you see some problems here?

The Problem:
Though these are just a few examples from my experiments, they illustrate challenges that LLMs will continue to face. Retraining the model is the obvious first solution, but with information changing every second, newly trained knowledge can become outdated almost immediately, and retraining is a time-consuming and expensive approach. Are there any other solutions?

The Solution:
The answer is yes. One such solution, the process of retrieving the appropriate information and inserting it into the model’s prompt, is known as Retrieval Augmented Generation (RAG).

The Power of Retrieval Augmented Generation (RAG):

Retrieval Augmented Generation addresses the limitations of LLMs by incorporating an information retrieval component. RAG leverages external knowledge sources, such as web documents or structured databases, to retrieve relevant information related to the input query. This retrieved information is then used to guide the LLM’s generation process, ensuring that the output is both coherent and factually grounded.
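
To make the RAG flow concrete, here is a deliberately simplified Python sketch, assuming a tiny in-memory knowledge base and a naive keyword-overlap retriever (real systems typically use embeddings and a vector store). The retrieved snippet is inserted into the prompt before it is sent to the LLM; `ask_local_llm` is a hypothetical placeholder for whatever client you use, such as the LM Studio call shown earlier.

```python
# Simplified illustration of the RAG pattern: retrieve the most relevant
# snippet from a (toy) knowledge base, then insert it into the prompt.

# Toy knowledge base with made-up example facts.
KNOWLEDGE_BASE = [
    "Acme Corp reported revenue of $12.3B in fiscal year 2024 (example data).",
    "Example Ltd reported revenue of $8.1B in fiscal year 2024 (example data).",
    "The Eiffel Tower is located in Paris, France.",
]

def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query: str, context: list[str]) -> str:
    """Insert the retrieved context into the prompt sent to the LLM."""
    context_block = "\n".join(context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}\nAnswer:"
    )

query = "What was Acme Corp's revenue in fiscal year 2024?"
prompt = build_prompt(query, retrieve(query, KNOWLEDGE_BASE))
print(prompt)
# answer = ask_local_llm(prompt)  # hypothetical call to your local model
```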

Benefits and Applications of RAG:

The combination of LLMs and information retrieval in RAG offers numerous benefits and opens up a wide range of applications:

  1. Enhanced Factual Accuracy: By grounding the generation process in real-world information, RAG reduces the likelihood of generating false or misleading responses. This makes it particularly valuable for tasks that require accurate and reliable information, such as question answering and knowledge-based dialogue systems.
  2. Improved Contextualization: RAG enables LLMs to better understand the context of the input query by providing relevant background information. This contextualization leads to more coherent and focused responses, making RAG ideal for applications such as document summarization.
  3. Knowledge-Intensive Tasks: RAG excels in knowledge-intensive tasks where LLMs alone may struggle. By accessing external knowledge sources, RAG can generate responses that are informed by up-to-date and comprehensive information. This capability makes RAG suitable for tasks like information retrieval, report generation, and customer support.
  4. Real-Time Information Integration: RAG’s ability to retrieve information in real-time makes it adaptable to dynamic and evolving environments. This enables the generation of responses that are not only accurate but also reflect the latest developments, making RAG invaluable for applications such as news reporting, financial analysis, and medical diagnosis.

To summarise, we explored Large Language Models (LLMs) and touched on their security implications, accessibility, and feasibility. We looked at offerings from prominent cloud platforms and evaluated options for running LLMs locally while maintaining privacy and security. I also shared challenges I faced, and solutions I found, while configuring local models and handling recurring issues. Finally, I introduced Retrieval Augmented Generation (RAG) as a solution to the limitations of LLMs and highlighted its benefits and applications.

In my subsequent post, I will delve deeper into RAG (Retrieval-Augmented Generation) and explore practical hands-on solutions to address the issues discussed in this article. Stay tuned for more insights and actionable strategies.


Raj Uppadhyay

Testing professional with 17+ years of IT experience, specializing in Software Quality, Test Automation and Performance Architecture.