Launch Your Own ChatGPT Clone for Free on Colab: Shareable and Online in Less Than 10 Minutes

Bartek Lewicz
8 min read · Mar 21, 2024


DALL·E 3 generated image. Prompt: A cartoon llama with a playful expression, joyfully jumping on a giant spider’s web, which is strung between two colorful trees. The scene is vivid and playful.

In my previous article (which you can find here!), I explained how to deploy a multimodal AI model in the cloud using Google Colab’s free tier, providing a step-by-step guide to building a notebook from scratch.

To engage with that model, we previously had to send requests via terminal commands, Python scripts, or other programming interfaces. But what if we want a more intuitive interaction, similar to what ChatGPT or Gemini offers? Or if we want to experiment with various models, downloading them on the fly? Perhaps even share a link to our own private GPT instance with other people.

Can all of this also be integrated and made accessible within the Google Colab free tier?

“Is such a thing even possible?” Source: https://giphy.com/

Yes, it is!

We will use Ollama together with one of the open-source ChatGPT-like UI interfaces and Localtunnel to make this happen.

Solution overview created with Python Diagrams library. Image by the Author.

Let’s break this down, step by step.

Getting started (again…)

Just as in the previous guide, accessing and running notebooks in Colaboratory (Colab) requires a Google account. For those yet to acquaint themselves with Google Colab, I recommend checking the “Getting Started” section in my previous article:

Step 1. Install & Run Ollama

Our initial step involves securing a platform capable of efficiently launching Large Language Models (LLMs). For this purpose, we will use an open-source tool called Ollama.

Ollama is a platform designed to make it easy to run open large language models locally, such as Llama 2, Mistral, Gemma, and LLaVA. It features a REST API that simplifies the execution and management of these models, making it a user-friendly solution for developers and researchers.
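To give a rough sense of what that REST API looks like under the hood (it is what our chat UI will be calling later on), here is a minimal example of a generation request. It assumes the service is already running and that a model such as Mistral has already been pulled, both of which we’ll get to in the steps below:

# Example request to Ollama's REST API on its default port 11434
# (assumes the mistral model has already been pulled).
!curl http://localhost:11434/api/generate -d '{"model": "mistral", "prompt": "Why is the sky blue?", "stream": false}'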

To install and run Ollama, add and run the following code block in your notebook:

# Pull Ollama install script and execute
!curl -fsSL https://ollama.com/install.sh | sh
# Run Ollama service in the background
!sudo ollama serve &>/dev/null&

The above block downloads the official Ollama installer shell script and executes it. The second command then launches the Ollama service in the background, redirecting all of its output to /dev/null. We therefore won’t see any output from it, but the service will be running in the background.
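Because the service’s output is hidden, a quick sanity check doesn’t hurt. The following optional cell waits a few seconds and then queries Ollama’s local API, which listens on port 11434 by default:

# Give the background service a few seconds to start.
!sleep 5
# The service should reply with "Ollama is running".
!curl http://localhost:11434
# List locally available models (empty until we pull one later).
!ollama list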

Step 2. Update Colab’s Node.js to the latest version

Before diving into integrating a user interface into our platform, we must first address certain dependencies.

For this tutorial, we’ll use the NextJS Ollama LLM UI developed by Jakob Hoeg Mørk (available on GitHub as jakobhoeg). Among the various UI options listed in the official Ollama GitHub project, Jakob’s version strikes a good balance between added functionality, such as downloading models directly from the interface, and ease of installation, requiring only minimal dependencies to run locally.

Jakob’s project requires Node.js version 18 or higher to function smoothly. As of this writing, Colab ships with Node.js v14.16.1 by default, so we’ll need to update our environment.

To achieve this, we will employ the official Node.js script to obtain and install the most recent version available:

# Download the install script for the current Node.js version and run it.
!curl -fsSL https://deb.nodesource.com/setup_current.x | sh
# Update the system's package index.
!sudo apt-get update
# Install new Node.js version.
!sudo apt-get install -y nodejs

In this step, we download and execute the installer for the latest version of Node.js, refresh our system’s package index, and install the updated Node.js version in our Colab environment.
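If you want to confirm the upgrade took effect before moving on, printing the installed versions is enough:

# Confirm the Node.js and npm versions now available in the environment.
!node -v
!npm -v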

Now, it’s time to launch our UI!

Step 3. Download and Run UI application

As previously mentioned, we’ll employ one of the Ollama-suggested user interfaces: NextJS Ollama LLM UI, created by Jakob Hoeg Mørk.

His Next.js application offers an experience similar to the ChatGPT interface in look, feel, and basic functionality. It also allows new models to be downloaded directly through the application itself, which will come in handy given the way we deploy Ollama on Colab.

To add this user interface to our setup, we will use the following block of code:

# Clone the repository from GitHub.
!git clone https://github.com/jakobhoeg/nextjs-ollama-llm-ui
# Change directory to the downloaded project directory
%cd nextjs-ollama-llm-ui
# Rename the .example.env file to .env to use as our default configuration
!mv .example.env .env
# Install dependencies using Node Package Manager
!npm install
# Start the web server, discard all outputs and run in the background.
!npm run dev &>/dev/null&

The above code snippet fetches the project repository from GitHub, sets up the default configuration for the application server, uses the NPM package manager to install its dependencies, and finally runs the app in the background. By default, the user interface runs on local port 3000; we will use this information later to make our server accessible on the web.
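Since the dev server’s output is discarded as well, you can optionally check that it is up and listening on port 3000 before exposing it to the internet:

# Give the Next.js dev server a moment to start.
!sleep 10
# Print only the HTTP status code returned on port 3000 (200 means the UI is up).
!curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000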

Step 4. Check the Colab instance’s public IP address

Next, we’ll determine the public IP address of our Google Colab instance, which will serve as the password for the security check implemented by the Localtunnel service. Remember, each time you initiate a new Colab runtime, your IP address may change.

To retrieve the public IP address of the Google Colab instance, add and run a block with the following command:

# Retrieve Public IP address of Google Colab's instance.
!curl ifconfig.me

In my example, the IP address was 34.135.155.176.

Step 5. Install Localtunnel and expose the UI to the internet

To make our application accessible over the internet, we need a mechanism that exposes Colab’s localhost interface, where our UI is running, through a public URL. For this purpose, we’ll use Localtunnel, a tool designed to bring local web services online for testing, demonstration, or development purposes, without the need for extra services or complex configuration. To install and start Localtunnel, simply add the following two commands to a new code block:

# Install localtunnel using NPM.
!npm install localtunnel
# Start localtunnel and expose port 3000 to the internet.
!npx localtunnel --port 3000

The above commands use the NPM package manager to install Localtunnel, and then start the service with the NPX package runner. Our interface accepts incoming requests on port 3000, which is the same port we specify when setting up the tunnel.
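Since the tunnel password is simply the instance’s public IP address, you can optionally combine Steps 4 and 5 into a single cell, so the password is printed right before the tunnel URL:

# Print the tunnel password (the instance's public IP), then start the tunnel on port 3000.
!echo "Tunnel password: $(curl -s ifconfig.me)" && npx localtunnel --port 3000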

As a result, we will be provided with a randomly generated URL, making our service accessible online. In my example, the URL provided was:

your url is: https://busy-kings-throw.loca.lt

Navigate to the link in your browser. On your first attempt, you will encounter a Tunnel security page, similar to the one below:

To gain access to our application, enter the public IP address we obtained in Step 4 as the Tunnel Password. This is necessary only on your first visit (and again every 7 days, although a Colab session won’t normally last that long). Note: any user you share the link with will also need this password for their initial access.

Step 6. Access Ollama UI, Download Models, and Start Chatting!

At last, we’re ready to access our very own ChatGPT-like interface. Upon your initial login, you’ll be prompted to enter a name, which the app will use to address you.

Afterward, we need to download a model for our application to function. This can be done by selecting the “pull model” option in the menu shown below:

Pull the LLM model using the UI. Image by the Author.

A modal window will appear, prompting you to enter the name of the model you would like to download.

You can explore available models by visiting the Ollama Model Library at: https://ollama.com/library

To start, I recommend base models such as Mistral, Llama 2, or Gemma. However, virtually any model that fits within Colab’s free tier limitations is viable.

Pulling the Mistral model via the UI. Image by the Author.

Be prepared to wait about 2–3 minutes while Ollama downloads the model in the background. After the download, you may need to refresh the page or select the “New chat” option so the application updates its list of available models.
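If you prefer, you can also pull a model from the notebook itself rather than through the UI; for example, the cell below downloads Mistral via the Ollama CLI, and the model then shows up in the interface after a refresh:

# Pull the Mistral model directly with the Ollama CLI instead of the UI.
!ollama pull mistral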

For a full walkthrough, refer to the recording in the GIF below:

UI Walkthrough. Image by the Author.

Closing Thoughts and Complete Code Notebooks

When using Google Colab, there are several factors to consider:

  • Colab’s free tier allows notebooks to run for up to 12 hours and supports only up to two simultaneous sessions.
  • During periods of high demand, access to the free tier’s T4 GPU may be limited. While it’s possible to execute the notebook using only a CPU, expect a noticeable drop in inference performance, meaning slower response times.
  • To make the most of resource units or active sessions, it’s recommended to operate the notebook only during the development or experimentation phase.
  • Colab offers storage capacity of roughly 80–100 GB, sufficient for downloading numerous models. However, larger models, due to their greater parameter counts, may exceed the available RAM/VRAM, resulting in slower performance even if they fit into storage (the snippet after this list shows how to check the resources of your current session).
  • The approach outlined here is designed purely for experimental and developmental use. It’s not suitable for deployment in production environments, even with Colab’s paid subscription options.
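For reference, here is a quick way to inspect the resources of your current session from a notebook cell:

# Show the GPU assigned to the session (fails on a CPU-only runtime).
!nvidia-smi
# Show available RAM and disk space.
!free -h
!df -h /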

If you’ve found these insights helpful or inspiring, please consider following and sharing.

Your engagement and support mean the world to me. It encourages me to continue sharing my knowledge and experiences, contributing to a community passionate about AI and technology. Let’s stay connected and keep the conversation going!

The complete Google Colab notebook used in this article is available on my GitHub:


#Colab #LLM #AI #Models #CloudDeployment #FreeTier #API #Localtunnel #Ollama #ChatGPT #GPTClone #Google #OpenModels #OpenSource #ExperimentalDevelopment #ColabNotebooks #ModelDeployment

“That’s all Folks!” Source: https://giphy.com/


Bartek Lewicz

Tech Leader specializing in enhancing product experiences with AI, product quality, and data-driven decision-making.