Utilizing Open Source LLMs on Colab

Vinit Jahagirdar
3 min read · Jan 22, 2024

Since the introduction of ChatGPT by OpenAI, Large Language Models (LLMs) have been stealing the spotlight, making waves across diverse applications. The open-source community has worked its charm yet again, offering some very capable LLMs that we can easily load and run on our own machines. Thanks to the Hugging Face Model Hub and its APIs, you can explore a wide variety of models and compare them on the Chatbot Arena Leaderboard hosted on Hugging Face Spaces.
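If you'd rather browse programmatically, the huggingface_hub client (installed in the next step) can search the Model Hub directly. A minimal sketch, assuming a recent version of huggingface_hub; the search term is just an example:

```python
from huggingface_hub import HfApi

# Search the Model Hub for GGUF builds of OpenHermes (example query).
api = HfApi()
for model in api.list_models(search="OpenHermes GGUF", limit=5):
    print(model.id)
```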

In this article, we’ll work with the OpenHermes-2.5-Mistral-7B model, specifically a quantized version of it. Opting for a quantized model is advantageous because it significantly reduces hardware requirements. These quantized versions are conveniently provided by TheBloke on Hugging Face. For additional usage notes and alternative download methods, see TheBloke’s OpenHermes-2.5-Mistral-7B-16k-GGUF page.

Install Required Libraries

Begin by installing the necessary libraries: LangChain, CTransformers (the CUDA build, so that layers can be offloaded to the GPU), huggingface-hub, Torch, Accelerate, and Sentence Transformers. Run the following commands in your Colab notebook:

!pip install langchain
!pip install ctransformers[cuda]  # the [cuda] extra is needed for GPU offloading via gpu_layers
!pip install huggingface-hub
!pip install torch
!pip install accelerate
!pip install sentence-transformers

Configure Colab for GPU

Make sure to change the Colab runtime to use the T4 GPU, and verify the GPU’s availability with the following code:

import torch
torch.cuda.is_available()  # should return True once the T4 runtime is attached

Download the LLM Model

Download the quantized gguf file of the OpenHermes-2.5-Mistral-7B model from TheBloke on Hugging Face. This can be done using the Hugging Face CLI with the command:

!huggingface-cli download TheBloke/OpenHermes-2.5-Mistral-7B-16k-GGUF openhermes-2.5-mistral-7b-16k.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False

Because of the `--local-dir .` flag, this command downloads the required gguf file directly into your current working directory rather than into the Hugging Face cache.
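If you prefer to stay in Python, the same download can be done with huggingface_hub’s hf_hub_download, equivalent to the CLI command above:

```python
from huggingface_hub import hf_hub_download

# Download the same quantized gguf file into the current directory.
hf_hub_download(
    repo_id="TheBloke/OpenHermes-2.5-Mistral-7B-16k-GGUF",
    filename="openhermes-2.5-mistral-7b-16k.Q4_K_M.gguf",
    local_dir=".",
)
```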

Import Libraries and Modules

Now, import the necessary libraries and modules for working with language models:

from langchain.llms import CTransformers
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from accelerate import Accelerator

Initialize the LLM Model

Initialize the Mistral LLM using LangChain and CTransformers. Adjust the configuration parameters based on your requirements:

accelerator = Accelerator()

# Note: Accelerator is optional here. CTransformers handles GPU offloading
# itself via 'gpu_layers', so prepare() mostly passes these objects through.
config = {'max_new_tokens': 10000, 'repetition_penalty': 1.1,
          'context_length': 16000, 'temperature': 0, 'gpu_layers': 50}
llm = CTransformers(model="./openhermes-2.5-mistral-7b-16k.Q4_K_M.gguf",
                    model_type="mistral", config=config)

llm, config = accelerator.prepare(llm, config)

print("LLM Initialized...")

Specify the path to the gguf file in the ‘model’ parameter of CTransformers.
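Before wiring the model into a chain, you can sanity-check it with a direct call. A quick illustrative test; the prompt here is arbitrary:

```python
# Smoke test: invoke the LLM directly with a throwaway prompt.
print(llm.invoke("Say hello in one short sentence."))
```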

Create a Prompt Template

Construct a prompt template to guide the LLM. In this example, a template is defined that prompts the model to act as a helpful assistant:

prompt = PromptTemplate(
    input_variables=["text_input"],
    template="""
You are a helpful assistant, who answers users' queries with helpful responses.

------------
{text_input}
------------

Do not miss any information
""",
)
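To preview exactly what the model will receive, you can render the template yourself with a sample input:

```python
# Render the template with a sample input to inspect the final prompt string.
print(prompt.format(text_input="What is quantization?"))
```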

Query the LLM and Generate Response

Write a query to the LLM and generate a response using LLMChain. Here’s an example query:

query = "Write a Python function to perform arithmetic operations on two numbers."

# Combine the prompt template and the LLM into a chain, then run the query.
llm_chain = LLMChain(llm=llm, prompt=prompt)
response = llm_chain.run(query)
print(response)

The model responds by restating the task and then providing an example implementation:

------------
The function should take in two numbers and an operation as parameters and return the result of that operation.
------------
The possible operations are: addition, subtraction, multiplication, division, and exponentiation.
------------
If the operation is invalid, the function should return None.
------------
You can assume that the input will always be valid.
------------
Here's an example of how you can test your function:

```python
def arithmetic_operation(a, b, op):
    if op == 'addition':
        return a + b
    elif op == 'subtraction':
        return a - b
    elif op == 'multiplication':
        return a * b
    elif op == 'division':
        return a / b
    elif op == 'exponentiation':
        return a ** b
```
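As a quick check (this snippet is ours, not part of the model’s output), you can exercise the generated function; note that an unrecognized operation simply falls through and returns None:

```python
print(arithmetic_operation(2, 3, 'addition'))        # 5
print(arithmetic_operation(2, 3, 'exponentiation'))  # 8
print(arithmetic_operation(2, 3, 'modulo'))          # None (invalid operation)
```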

And there you have it! We have successfully navigated the installation, setup, and use of an open-source Large Language Model (LLM) on Colab. This is just the beginning; further exploration can involve experimenting with different models and building something useful and interesting. Happy coding!

Additional Resources

https://python.langchain.com/docs/get_started/introduction

https://www.promptingguide.ai/

https://python.langchain.com/docs/integrations/providers/ctransformers
