Running Open Source LLMs in Jupyter Python Notebook as an Alternative to ChatGPT: Step by Step Guide on installing, setting up & running/inferencing Open Source LLMs

11 min read · Jul 6, 2023

Introduction

ChatGPT, the new AI chatbot from OpenAI, has taken the world by storm with its impressive conversational abilities and wide knowledge. However, many have raised concerns about its potential risks and biases.

If you’re looking for an open-source alternative to ChatGPT that you can run locally, large language models (LLMs) hosted in a Jupyter Notebook provide a powerful and customizable option.

In this blog post, I’ll walk you through how to install, set up and run openly available LLMs such as GPT-2 (the open predecessor of GPT-3, whose own weights are not publicly released), Grover, and BERT right in your Jupyter Notebook.

If you already have a basic understanding of how these models are trained, you can shorten your iteration cycles considerably.

Note

If you are looking to quickly set up and explore AI/ML & Python Jupyter Notebook Kit, Techlatest.net provides an out-of-the-box setup for AI/ML & Python Jupyter Notebook Kit on AWS, Azure, and GCP. Please follow the below links for the step-by-step guide to set up the AI/ML & Python Jupyter Notebook Kit on your choice of cloud platform.

For AI/ML KIT: AWS, GCP & Azure.

Why choose the Techlatest.net VM, AI/ML Kit & Python Jupyter Notebook?

In-browser editing of code
Ability to run and execute code in various programming languages
Supports rich media outputs like images, videos, charts, etc.
Supports connecting to external data sources
Supports collaborative editing by multiple users
Simple interface to create and manage notebooks
Ability to save and share notebooks

Step-by-Step Guide to Installing, Setting Up & Running/Inferencing Open Source Large Language Models (LLMs)

During VM selection, choose a GPU instance by going to the GPU tab and selecting the desired GPU instance type. A GPU instance can give roughly 10 to 15 times better performance than an equivalent CPU instance. However, GPU instances cost significantly more, so choose the instance type that fits your performance and budget requirements.

I cover all three platforms, AWS, GCP, and Azure, for your reference.

After setting up the VM, you can log in to JupyterHub. The step-by-step guide follows below.

Step 1

This VM comes with a default admin user named ubuntu. To access the Web UI and install additional packages, log in as the ubuntu user with the password you set during your first login to the Jupyter Notebook.

Step 2

Open a Terminal in your Jupyter Notebook and enter the below command to install the there package using pip.

sudo -E pip install there

Note: Don’t forget to use sudo in the above command.

Step 3

Choose an LLM model

Decide which open-source LLM model you want to use:

Several LLMs are described below, but we will only run one or two of them as examples.

GPT-3

A large autoregressive language model. Good for generating text.

GPT-3 is a large language model developed by OpenAI that exhibits a wide range of capabilities including natural language generation, text summarization, question answering, and translation. GPT-3 uses a Transformer-based architecture and was trained on hundreds of gigabytes of filtered Internet text. It has 175 billion parameters, which made it the largest language model at the time of its release in 2020. GPT-3’s main strengths lie in its ability to produce human-like text and responses, due to being trained on massive amounts of real-world data. However, like all large language models, GPT-3 also suffers from issues like bias, lack of context, and difficulty with complex reasoning. Note that GPT-3’s weights have not been publicly released; it is available only through OpenAI’s API, which is why the hands-on steps later in this guide load its openly available predecessor, GPT-2. Despite its limitations, GPT-3 has proven to be a groundbreaking model that has the potential to change the way we interact with and build AI systems.

Use Cases of GPT-3
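To make "autoregressive" concrete, here is a minimal sketch of the generation loop: the model predicts one token, appends it to the text, and repeats. The lookup table below is made-up toy data standing in for a trained model, and it only looks at the last token, whereas a real model such as GPT-2 or GPT-3 conditions on the whole preceding text.

```python
# Toy next-token probabilities standing in for a trained model.
# These words and numbers are invented for illustration only.
NEXT_TOKEN_PROBS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}


def generate_greedy(prompt, max_new_tokens=5):
    """Repeatedly append the most probable next token (greedy decoding)."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        candidates = NEXT_TOKEN_PROBS.get(tokens[-1])
        if not candidates:  # no known continuation: stop early
            break
        tokens.append(max(candidates, key=candidates.get))
    return tokens


print(" ".join(generate_greedy(["the"])))
```

Each pass through the loop feeds the model's own previous output back in as input, which is the defining property of an autoregressive model.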

Here are some common use cases of GPT-3, the large language model developed by OpenAI:

1. Text Generation — GPT-3 can generate human-like text and responses based on a given prompt. This makes it useful for applications like chatbots, story generators, content writing assistants, and more.

2. Question Answering — GPT-3 can accurately answer complex questions across many domains due to being trained on a vast amount of information. It can be used for building question-answering systems.

3. Summarization — GPT-3 can summarize long texts and documents in a concise and coherent manner. This allows it to be used for summarizing news articles, research papers, legal documents, and other texts.

4. Translation — GPT-3 is capable of performing text translation between different languages with decent accuracy. This makes it useful for machine translation systems.

5. Code Generation — GPT-3 has been shown to be able to generate basic code snippets in programming languages like Python and JavaScript given natural language descriptions. This opens up possibilities for code assistants and AI pair programming.

6. Conversational AI — Due to its ability to generate human-like responses, GPT-3 can be used to build more natural and engaging conversational AI systems like chatbots and virtual assistants.

7. Data Augmentation — GPT-3’s text generation capabilities allow it to be used for augmenting datasets by synthetically generating more training examples. This helps improve the performance of machine learning models.

In summary, as a large language model with broad capabilities, GPT-3 can be applied to a wide range of use cases involving natural language generation, understanding, and reasoning. However, like all AI systems, it also has limitations that need to be considered for real-world deployment.

Grover

A large left-to-right Transformer for generating and detecting machine-written news articles.

Grover is an open-source language model developed by researchers at the University of Washington and the Allen Institute for AI. It uses a GPT-2-style Transformer architecture, and its largest version, Grover-Mega, has about 1.5 billion parameters. Grover was trained on RealNews, a corpus of roughly 120 GB of news articles drawn from Common Crawl. Unlike general-purpose language models, Grover was built for a specific purpose: generating news articles conditioned on metadata such as the headline, author, and date, and detecting machine-written news. Its authors reported that Grover was also the strongest detector of its own generated articles. This combination of generation and detection makes it an interesting open-source option for text completion and for studying machine-generated misinformation.

Use Cases of Grover

Here are some common use cases of Grover, the language model developed at the University of Washington and the Allen Institute for AI. Grover was designed and evaluated for news generation and detection; the other items are tasks that a model of this kind could be adapted to through fine-tuning, not out-of-the-box capabilities:

1. Text Generation: Grover can generate human-like news text given a prompt or context such as a headline. This makes it useful for applications like story generators and content writing assistants.

2. Question Answering: Grover was not trained specifically for question answering, but like other generative models it can be fine-tuned to answer questions in a given domain.

3. Summarization: With fine-tuning, a generative model like Grover can be adapted to summarize long texts such as news articles and research papers.

4. Text Classification: Grover's best-documented classification task is telling machine-written news from human-written news. With fine-tuning, it could be adapted to tasks like sentiment analysis, topic classification, and spam detection.

5. Named Entity Recognition: With task-specific fine-tuning, Grover could be adapted to identify named entities like people, locations, and organizations in text, which is useful for information extraction from documents.

6. Relation Extraction: Similarly, a fine-tuned model could extract semantic relations between entities in text, which is useful for knowledge graph construction from unstructured data.

7. Conversational AI: Grover's ability to generate fluent text given a context could serve as a starting point for conversational systems, although it was trained on news articles and not on dialogue.

8. Data Augmentation: Grover's text generation capabilities enable it to synthetically generate more training examples, which can help improve the performance of machine learning models through data augmentation.

In summary, Grover's strengths in generating and detecting news-style text, combined with fine-tuning, allow it to be adapted to a variety of natural language processing use cases. However, like all large language models, it also faces challenges in terms of robustness, reliability, and safety for real-world deployments.

BERT

A Transformer-based model for language understanding tasks like question answering and sentiment analysis.

BERT stands for Bidirectional Encoder Representations from Transformers. It is a language model created by Google in 2018 that popularized pre-training deep bidirectional Transformer models on large text corpora. BERT uses a multi-layer Transformer encoder and is trained on two tasks: masked language modeling and next-sentence prediction. This pre-training approach allows BERT to learn contextual relationships between words that can be used for a wide range of downstream natural language processing tasks. After pre-training, BERT can be fine-tuned with a small amount of task-specific labeled data for various applications like question answering, text classification, sentiment analysis, named entity recognition, and more. BERT significantly improved state-of-the-art results on many NLP tasks and led to the development of many BERT-based models. It has become an important building block for many natural language processing applications.

Use Cases of BERT
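Masked language modeling is easier to picture with a small example. The sketch below is a simplified illustration, not BERT's actual data pipeline: it picks each token with 15% probability, then replaces 80% of the picked tokens with [MASK], swaps 10% for a random vocabulary word, and leaves 10% unchanged. The function name, the toy vocabulary, and the per-token coin flip are my own assumptions for illustration.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (masked_tokens, labels) for masked language modeling.

    labels[i] holds the original token at positions the model must predict
    and None everywhere else.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = token
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK_TOKEN          # 80%: replace with [MASK]
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)   # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return masked, labels


tokens = "the quick brown fox jumps over the lazy dog".split()
toy_vocab = ["apple", "river", "green", "walks"]  # hypothetical vocabulary
masked, labels = mask_tokens(tokens, toy_vocab, seed=0)
print(masked)
print(labels)
```

During pre-training, the model sees the masked sequence and is scored only on the positions where a label is present, which forces it to use context from both the left and the right.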

Here are some common use cases of BERT, the large language model developed by Google:

1. Question Answering — BERT can accurately answer complex questions by understanding the context of the question and identifying the relevant parts of the text to extract the answer. This allows it to be used for building question-answering systems.

2. Text Classification — BERT’s pre-trained representations enable it to perform text classification tasks like sentiment analysis, topic classification, and spam detection with high accuracy.

3. Named Entity Recognition — BERT can identify and classify named entities like people, locations, and organizations in text. This allows it to be used for information extraction from documents.

4. Relation Extraction — BERT can extract semantic relations between entities in text. This makes it useful for knowledge graph construction from unstructured data.

5. Semantic Similarity — BERT’s contextual word representations allow it to determine the semantic similarity between words and sentences. This can be used for applications like plagiarism detection, recommendation systems, etc.

6. Text Summarization — BERT can summarize long texts and documents in a concise and coherent manner by identifying the most important parts of the text. It can be used for summarizing news articles, research papers, legal documents, and other texts.

7. Sentiment Analysis — BERT’s pre-trained language representations enable it to accurately determine the sentiment of text, whether positive, negative, or neutral. This allows it to be used for sentiment analysis applications.

8. Language Inference — BERT can perform natural language inference tasks like determining whether one sentence entails or contradicts another sentence.

In summary, BERT has reshaped natural language processing by popularizing bidirectional pre-training of Transformer-based models. Its diverse use cases range from fundamental NLP tasks to more complex applications involving language understanding and reasoning.

For demo purposes, I will use only one or two of these LLMs.
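If you want to try BERT directly, one option is the fill-mask pipeline from the Transformers library, which asks the model to predict a masked word. The sketch below assumes that transformers and PyTorch are installed and that the bert-base-uncased checkpoint can be downloaded; the helper names top_tokens and run_fill_mask are my own.

```python
def top_tokens(predictions, limit=3):
    """Reduce fill-mask results to (token, rounded score) pairs."""
    return [(p["token_str"], round(p["score"], 3)) for p in predictions[:limit]]


def run_fill_mask(text, model_name="bert-base-uncased"):
    """Predict the [MASK] token in `text` using a pre-trained BERT model."""
    # Imported here so the helpers above work even without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_name)
    return top_tokens(fill_mask(text))


# Example call (downloads the model weights on first use):
#   run_fill_mask("The capital of France is [MASK].")
```

The pipeline returns one dictionary per candidate word, including the predicted token and its score, so top_tokens simply keeps the first few.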

1. Install Transformers

Transformers is a popular Python library for state-of-the-art natural language processing. We’ll use it to interface with the LLMs.

In a terminal or command prompt, run:

pip install transformers

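This installs Transformers and its core dependencies. The examples below also need PyTorch, which is not installed automatically; if it is missing, install it with pip install torch.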
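Before going further, it can help to confirm that the packages are actually visible to the notebook kernel. Here is a small sketch using only the Python standard library (Python 3.8 or later); the helper name installed_version is my own.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("transformers", "torch"):
    version = installed_version(name)
    print(f"{name}: {version or 'not installed'}")
```

If either package shows up as not installed, the kernel is probably using a different Python environment than the terminal where you ran pip.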

2. Choose an LLM model

Decide which open-source LLM model you want to use:

GPT-3: A large autoregressive language model. Good for generating text. Its weights are not publicly available, so the example below uses GPT-2 in its place.
Grover: A large left-to-right Transformer for generating and detecting machine-written news articles.
BERT: A Transformer-based model for language understanding tasks like question answering and sentiment analysis.

3. Load the model into a Jupyter Notebook

In a Jupyter Notebook, import Transformers and load your chosen LLM model. For example, for GPT-2, the openly available predecessor of GPT-3 (GPT-3 itself cannot be downloaded):

  from transformers import GPT2LMHeadModel, GPT2Tokenizer

  tokenizer = GPT2Tokenizer.from_pretrained('gpt2')  
  model = GPT2LMHeadModel.from_pretrained('gpt2')
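If you chose a GPU instance, the model only benefits from it once it has been moved to the GPU. Here is a minimal sketch of picking the device, assuming PyTorch is the backend; the helper name pick_device is my own, and it falls back to the CPU when PyTorch or a GPU is not available.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is missing, so nothing can run on a GPU anyway
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")

# With the model and inputs from this guide you would then call, for example:
#   model = model.to(device)
#   input_ids = input_ids.to(device)
```

The model and its input tensors must live on the same device, so remember to move both.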

4. Generate text or answer questions

Use the model to generate text continuations, answer questions, etc. For example:

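  # Prompt for the model to continue
  input_text = "Today I went to the park and "

  # Tokenize the prompt into a PyTorch tensor of token IDs
  input_ids = tokenizer.encode(input_text, return_tensors='pt')

  # Generate up to 100 tokens in total (prompt included); GPT-2 has no pad
  # token, so reuse the end-of-sequence token to avoid a warning
  output = model.generate(input_ids, max_length=100, pad_token_id=tokenizer.eos_token_id)

  # Convert the generated token IDs back into text
  print(tokenizer.decode(output[0], skip_special_tokens=True))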

You’ll get text output from the model continuing the input text.
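With default settings, generate picks the most likely token at every step, which often leads to repetitive text. Sampling usually gives more varied output. The sketch below builds a set of keyword arguments you could pass to model.generate; the helper name generation_kwargs and the specific values (top_k=50, top_p=0.95, temperature=0.8) are my own starting points, not tuned recommendations.

```python
def generation_kwargs(sample=True, max_new_tokens=50, pad_token_id=None):
    """Build keyword arguments for model.generate()."""
    kwargs = {"max_new_tokens": max_new_tokens}  # limit on newly generated tokens
    if sample:
        # Sample from the 50 most likely tokens, restricted to the smallest set
        # covering 95% of the probability mass, with a slightly sharpened
        # distribution (temperature below 1).
        kwargs.update(do_sample=True, top_k=50, top_p=0.95, temperature=0.8)
    else:
        kwargs.update(do_sample=False)  # greedy decoding
    if pad_token_id is not None:
        kwargs["pad_token_id"] = pad_token_id
    return kwargs


print(generation_kwargs())

# With the model, tokenizer, and input_ids from this guide:
#   output = model.generate(
#       input_ids, **generation_kwargs(pad_token_id=tokenizer.eos_token_id)
#   )
```

Because sampling is random, running the same cell twice will usually produce different continuations.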

Conclusion

In conclusion, setting up and running open-source large language models like GPT-2, Grover, and BERT in a Jupyter Notebook provides an ethical and customizable alternative to closed-source chatbots like ChatGPT. Some key takeaways:

• Open source LLMs allow you to have full control over the model, how it is used, and what data it has access to. This helps address issues of bias, toxicity, and lack of transparency in commercial models.

• With the right setup, you can fine-tune LLMs for your specific domain and tasks, achieving better performance than general-purpose models.

• Running LLMs locally allows you to keep your data private and secure, without exposing it to external APIs.

• Interfacing with LLMs through Jupyter Notebooks provides an interactive and flexible development environment, making it easy to experiment, test and improve the models.

• While open-source LLMs are still limited compared to commercial models like GPT-3, they are improving rapidly and becoming viable alternatives for many use cases.

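• Setting up LLMs in the cloud on GPU-enabled instances can provide the computational power required to fine-tune models and run inference at a reasonable speed; training large models from scratch generally requires far more hardware than a single instance.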

• Open source LLMs represent an important step towards democratizing access to the capabilities of large language models, helping researchers, developers, and the general public.

So in summary, setting up and running open-source LLMs in Jupyter Notebooks provides an ethical, customizable, and secure alternative to commercial chatbots. With continued development, open-source LLMs have the potential to match or exceed the capabilities of closed-source models while avoiding some of their risks and limitations.