Run GPT LLMs Locally with Just 8 Lines of Python: A Hassle-Free AI Assistant

5 min readSep 20, 2023

In the world of AI and machine learning, setting up models on local machines can often be a daunting task. Especially when you’re dealing with state-of-the-art models like GPT-3 or its variants. I’ve personally grappled with the frustrating process of setting up Llama2 on my machine. After numerous attempts, I stumbled upon a gem: GPT4All.

What is GPT4All?

GPT4All is an open-source platform that offers a seamless way to run GPT-like models directly on your machine. The beauty of GPT4All lies in its simplicity. Not only does it provide an easy-to-use interface, but it also offers downloadable models, making the entire process straightforward.

Having tried multiple methods to set up GPT-like models on my machine, GPT4All stands out for several reasons:

Open-Source: Being open-source, it offers transparency and the freedom to modify as per individual requirements.
Ease of Use: With just a few lines of code, you can have a GPT-like model up and running.
Downloadable Models: The platform provides direct links to download models, eliminating the need to search elsewhere.
No API Costs: While many platforms charge for API usage, GPT4All allows you to run models without incurring additional costs.

Setting Up GPT4All on Python

Here’s a quick guide on how to set up and run a GPT-like model using GPT4All on python.

I highly recommend to create a virtual environment if you are going to use this for a project. In this article you can look how to set up a virtual environment in python .

first install the libraries needed:

pip install langchain, gpt4all

I used this versions gpt4all-1.0.12 , langchain-0.0.296

Then you need to download the models that you want to try. In the same web page provided before (just scroll a little bit more). I recomend this two models if you have 16GB or more in RAM.

Sample Code and Response

Here’s the code I used (all credits to this youtube short video of Nicholas Renotte )

# Import dependencies
from langchain import PromptTemplate, LLMChain
from langchain.llms import GPT4All

# Specify model weights path
PATH='./nous-hermes-13b.ggmlv3.q4_0.bin'

# Create LLM Class
llm = GPT4All(model=PATH, verbose=True)

# Create a prompt template
prompt = PromptTemplate(
    input_variables=['instruction', 'input', 'response'],
    template="""
    ### Instruction:
    {instruction}
    ### Input:
    {input}
    ### Response:
    {response}
    """ )

chain = LLMChain(prompt=prompt, llm=llm)

# Run the prompt
# I used a childen story to test https://cuentosparadormir.com/infantiles/cuento/barba-flamenco-y-el-recortador-de-cuentos
# its about 783 words long!
chain.run(instruction="""Resume esta historia, hazlo en español""",
input="""[...story content...]""",
response='A: ')

The model’s response was insightful, but took 7 minutes to my computer. From a 783 words long input prompt to 75 words long summary, not bad!:

“No hay duda de que el papá aprendió una valiosa lección sobre la importancia de contar los cuentos completos y sin recortes para sus hijos. Al final, descubrió que era mucho más importante mantener la tradición familiar que ahorrarse unos minutos de tiempo. Aunque el viaje por el mundo de los cuentos fue peligroso e impredecible, el papá se dio cuenta de que las historias son un tesoro que no debe ser desperdiciado ni recortado.”

It’s really impresive comparing it to other models that i used before on hugging face.

The Power of Instruction and Input

At the heart of GPT4All’s functionality lies the instruction and input segments. These segments dictate the nature of the response generated by the model. The instruction provides a directive to the model, guiding its thought process. For instance, in the code provided, the instruction is to “Resume esta historia, hazlo en español” (Summarize this story, do it in Spanish). This tells the model the desired action and the language preference.

The input, on the other hand, is the content you wish the model to process. In our case, it’s a captivating tale of a father’s adventurous journey through a storybook world. The richness and length of the input directly influence the processing time.

Performance Insights

It’s essential to understand that the length and complexity of the prompt and instruction can impact the processing time. In my experiments, I aimed to use GPT4All to summarize extensive texts, including those in Spanish. I tested the model with a story sourced from a children’s story webpage. While the results were promising, the summary was slightly lengthy, taking about 7 minutes to process on a notebook equipped with a 3070 RTX, i9, and 32 GB RAM. Anyways i also tried shorter and easier tasks in english and it works just like a ChatGPT (arguibly better cause it doesn’t have censorchip in several models!)

This experience underscores the importance of fine-tuning the model and instructions for optimal performance. While the initial results were satisfactory, achieving efficiency might require multiple iterations.

Hardware Considerations

One of the standout features of GPT4All is its adaptability to various hardware configurations. While my notebook boasts high-end specifications, GPT4All can be easily implemented on any system with a decent hardware level. This flexibility makes it an attractive option for a wide range of users.

GUI Interface Alternative

If you want to interact directly with the interface like it was Chat GPT you can download it on an GUI interface , where you can choose and download the models you want to try (there are plenty to choose depeding on you computer capacity). And then interact with that interface like it was chat gpt (and even customize the window).

My Experience

Before discovering GPT4All, I spent countless hours trying to set up Llama2. The process was cumbersome, and I faced multiple roadblocks. In contrast, GPT4All offered a breath of fresh air. The entire setup was completed in minutes, and I was able to run GPT-like models without any hiccups.

Conclusion

In the ever-evolving world of AI, tools like GPT4All are a boon. They simplify the process, save time, and allow enthusiasts and professionals alike to focus on what truly matters: building and experimenting. If you’ve been struggling with setting up GPT-like models on your machine, I highly recommend giving GPT4All a try. It’s been the easiest solution I’ve tried so far, and I’m sure you’ll feel the same.