Llama 3 with Open WebUI and DeepInfra: The Affordable ChatGPT 4 Alternative

Kelvin Campelo
9 min read · Apr 30, 2024


Did you know that you can have your own private ChatGPT that is 30 times cheaper than GPT-4 (in output price) and even beats it on some complex tasks? Yes, you read that right!

In this article, we will reveal the secret to creating your own private conversational assistant using the latest artificial intelligence technologies and cutting-edge infrastructure, such as DeepInfra, which offers low-latency, cost-effective hosting for deep learning models. Get ready to revolutionize the way you interact with artificial intelligence!

In addition to saving resources, you will also have the freedom to customize your chat experience with Open WebUI, configuring advanced parameters and choosing from a variety of language models. And, to help you choose the right model for your needs, we will compare Llama 3 70B with the most popular models on the market, GPT-3.5 and GPT-4 Turbo. Get ready to discover how the power of artificial intelligence can be both accessible and customizable.

Here’s what you’ll learn:

  • What is Open WebUI and how it can be used to create a private ChatGPT experience
  • What is DeepInfra and how it offers low-latency, cost-effective infrastructure for deep learning models
  • How to create your own private ChatGPT using Open WebUI and the DeepInfra service
  • Which models can solve the logic challenge with the Magic Elevator

What is Open WebUI?

Imagine having a customizable, feature-rich, user-friendly interface that allows you to choose from a variety of language models and create your own private ChatGPT experience. That’s what Open WebUI offers! Compatible with LLM runners like Ollama and with OpenAI-compatible APIs, such as DeepInfra’s, Open WebUI is the key to a personalized and adaptable chat experience.

Advantages of Open WebUI

  • Intuitive interface: Open WebUI’s chat interface is inspired by ChatGPT, ensuring a user-friendly experience.
  • Refined control with advanced parameters: Users can adjust parameters such as temperature to control the creativity and randomness of responses, and save their own system prompts to personalize conversations.
  • Support for multiple language models: Open WebUI is compatible with a variety of language models.
  • Integration with OpenAI and Ollama: Easily integrate with OpenAI-compatible APIs and LLM runners like Ollama for a richer and more diverse chat experience.
  • Continuous updates: Enjoy regular updates and new features to keep your chat experience always up-to-date and improved.

DeepInfra: Low-Latency, Cost-Effective Inference Infrastructure

Imagine having access to low-latency, cost-effective infrastructure for your deep learning models. That’s exactly what DeepInfra offers! With DeepInfra, you can easily deploy the latest machine learning (ML) models to production without worrying about the complexity and cost of building high-performance processing infrastructure yourself.

Advantages of DeepInfra

  • Scalable and low-cost infrastructure: DeepInfra offers low-latency, cost-effective hosting for LLMs.
  • Support for multiple language models: Supports a variety of language models, including meta-llama/Meta-Llama-3-70B-Instruct, mistralai/Mixtral-8x22B-Instruct-v0.1, cognitivecomputations/dolphin-2.6-mixtral-8x7b, and many more.
  • Easy and efficient integration: Also provides an OpenAI-compatible API for all recent LLM and embedding models (see the example after this list).
  • Cutting-edge hardware: All models are run on H100 or A100 GPUs, optimized for inference performance and low latency.
  • Auto Scaling: The system automatically scales the model to more hardware based on your needs.
  • Flexible billing: Some models are billed by tokens, while others are billed by usage time. You receive $1.80 when you sign up and can set a spending limit to avoid surprises. Invoices are generated at the beginning of each month.
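
To make the OpenAI-compatible API concrete, here is a minimal sketch of a chat-completion request with curl. It assumes the standard OpenAI chat-completions format, and $DEEPINFRA_API_KEY is a placeholder for the key we’ll create later in this tutorial:

$ curl "https://api.deepinfra.com/v1/openai/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $DEEPINFRA_API_KEY" \
    -d '{
      "model": "meta-llama/Meta-Llama-3-70B-Instruct",
      "messages": [{"role": "user", "content": "Hello!"}]
    }'

If everything is in order, the response is a JSON object whose choices[0].message.content field contains the model’s reply, just as it would with OpenAI’s own endpoint.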

In this tutorial, you will learn how to create your own private ChatGPT using Open WebUI and the DeepInfra service.

The steps include:

  • Installing Docker: a development platform for creating, deploying, and running applications in containers.
  • Installing and configuring Open WebUI: a web user interface for interacting with artificial intelligence models.
  • Creating an account on DeepInfra and obtaining the API Key: a service that offers access to open-source models on cutting-edge infrastructure at a low cost.
  • Integrating the DeepInfra API Key with Open WebUI: to allow Open WebUI to access the models provided by DeepInfra.
  • Testing our private chat: experimenting with different models, such as Llama 3 70B, and comparing them with the famous GPT-3.5 and GPT-4 Turbo.

Get ready to create your own private conversational assistant, whether you’re a macOS or Linux user. Although we’ll be using an Ubuntu instance on AWS Lightsail in this tutorial, you’re free to run everything anywhere you like, including on your local machine.

Tutorial

Installing Docker

If you’re using macOS, you can find instructions on how to install Docker in the official Docker Desktop documentation.
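
If you prefer the command line and already have Homebrew installed, Docker Desktop can typically be installed with a single cask command (a convenience, not a requirement):

$ brew install --cask docker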

For Linux users, installing Docker is a straightforward process. Simply open a terminal and run the following commands:

$ curl -fsSL https://get.docker.com -o get-docker.sh
$ sudo sh get-docker.sh

After completing the installation, on Linux or macOS, use the following command to run a test image:

$ sudo docker run hello-world

This command downloads a test image and runs it in a container. If successful, it prints an informative message confirming that Docker is installed and working correctly.
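
An optional but convenient post-installation step on Linux: adding your user to the docker group lets you run Docker commands without sudo. This mirrors Docker’s documented post-install steps; you’ll need to log out and back in for it to take effect:

$ sudo usermod -aG docker $USER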

Now let’s create a folder for our project:

$ mkdir open-webui

And let’s access this folder:

$ cd open-webui

We need to create a file called docker-compose.yml:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: always
    ports:
      - "3000:8080"
    environment:
      ENV: production
      DEFAULT_USER_ROLE: pending
      ENABLE_SIGNUP: "True"
      OPENAI_API_BASE_URL: https://api.deepinfra.com/v1/openai
      OPENAI_API_KEY: <DeepInfra_API_KEY>
    volumes:
      - open-webui:/app/backend/data

volumes:
  open-webui: {}

To continue, we’ll need to replace <DeepInfra_API_KEY> with the secret key from your DeepInfra account:

  1. Sign in to the DeepInfra platform through the link: https://deepinfra.com/login (no separate registration needed; you can simply log in with your GitHub account).
  2. Then go to the API Keys section and create a new key by clicking the button in the top right corner of the page.
  3. Copy the key and paste it in place of the <DeepInfra_API_KEY> placeholder in the docker-compose.yml file (or see the .env alternative below).
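
As an aside, if you’d rather not hardcode the key in docker-compose.yml, Docker Compose supports variable substitution from a .env file placed in the same folder. A minimal sketch (DEEPINFRA_API_KEY is just a variable name chosen for this example):

# .env (same folder as docker-compose.yml)
DEEPINFRA_API_KEY=paste_your_key_here

Then reference it in docker-compose.yml like this:

OPENAI_API_KEY: ${DEEPINFRA_API_KEY}

This keeps the secret out of the compose file, which is handy if you ever commit it to version control.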

If you paste the key directly, your file should look like this after the replacement:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: always
    ports:
      - "3000:8080"
    environment:
      ENV: production
      DEFAULT_USER_ROLE: pending
      ENABLE_SIGNUP: "True"
      OPENAI_API_BASE_URL: https://api.deepinfra.com/v1/openai
      OPENAI_API_KEY: "!@#34_n40_3_um4_53cr3t_k3y_v4l1d4_56&"
    volumes:
      - open-webui:/app/backend/data

volumes:
  open-webui: {}

Save the file and run the following command:

$ sudo docker compose up -d
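
A quick aside: because the compose file tracks the :main image tag, updating Open WebUI later is just a matter of pulling the new image and recreating the container:

$ sudo docker compose pull
$ sudo docker compose up -d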

You’ll need to wait a few moments until the system is ready for use. You can check this by running the command:

$ sudo docker logs open-webui

If you see output similar to this, the system is ready:

No WEBUI_SECRET_KEY provided
Generating WEBUI_SECRET_KEY
Loading WEBUI_SECRET_KEY from .webui_secret_key
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)

This output tells us that the application started successfully. The address shown, http://0.0.0.0:8080, is internal to the Docker container; as we configured in the docker-compose.yml file, it is mapped to port 3000 on the host operating system.

Now that the system is ready, you can access Open WebUI through port 3000 of the computer serving the application. If you installed Open WebUI on your own computer, you can access it at http://localhost:3000. If you installed it on a Lightsail instance, like I did, you'll need to add a rule opening port 3000 in the instance's networking settings on AWS.
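
Before touching firewall rules, you can do a quick sanity check from the server itself; curl should get an HTTP response from the app on port 3000 (the exact status code and headers may vary):

$ curl -I http://localhost:3000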

When accessing Open WebUI for the first time, you’ll need to create an account. This will allow you to personalize your experience and manage your conversations more effectively.

Login Page of Open WebUI

To register, simply provide your name, email, and password. After logging in, you will have access to the Open WebUI interface, which is very similar to ChatGPT.

Open WebUI Home Page

We’re almost there! The last step before starting to chat with our private chat is to select the desired language model through the selection menu at the top. You can choose from a variety of models, each with its own characteristics and capabilities.

Model Selection in Open WebUI

Now we can select the new model released by Meta: meta-llama/Meta-Llama-3-70B-Instruct. But before we move forward and test this model, let’s hold on for a bit and first see how GPT-3.5 and GPT-4 Turbo handle our logic challenge. Afterwards, we’ll return to Open WebUI.

Logic Challenge with the Magic Elevator

The Logic Challenge with the Magic Elevator is a logic problem that tests a language model’s ability to solve a task involving reasoning and rule-following. The challenge presents a scenario where a magic elevator automatically returns to floor 1 whenever it stops on an even-numbered floor, and the goal is to determine which floor you end up on after a series of actions.

LLM models have difficulty solving this challenge because it requires a combination of natural language understanding, logical reasoning, and the ability to follow non-obvious rules. Additionally, the challenge contains traps and nuances that can confuse models, such as keeping track of the elevator’s position and the effect of each subsequent action.

In summary, the Logic Challenge with the Magic Elevator is a cognitive skills test for LLM models that requires a combination of language understanding, logical reasoning, and problem-solving abilities. Here is the prompt we’ll use:

“Imagine a high-rise building with a magical elevator. This elevator has a peculiar behavior: whenever it stops on an even-numbered floor, it automatically goes back to floor 1. Let’s say I start on floor 1 and take the magic elevator up 3 floors. When I exit the elevator, I then use the stairs to climb an additional 3 floors. The question is: which floor do I ultimately end up on?”
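
For readers who want to verify the expected answer mechanically, here is a tiny shell sketch of the rules as stated in the prompt (an assumption on my part: the elevator’s rule triggers when it stops, and the stairs don’t trigger it):

#!/bin/sh
floor=1
floor=$((floor + 3))                # take the elevator up 3 floors: now on floor 4
if [ $((floor % 2)) -eq 0 ]; then   # floor 4 is even, so the elevator goes back to floor 1
  floor=1
fi
floor=$((floor + 3))                # climb 3 more floors by the stairs
echo "Final floor: $floor"          # prints: Final floor: 4

So the correct answer is floor 4.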

Now that we have the magic elevator challenge, it’s time to test the skills of the most advanced language models on the market, starting with the famous GPT-3.5.

However, GPT-3.5 cannot solve this challenge and fails to arrive at floor 4, despite its reported size of 175B parameters, significantly larger than the Llama 3 70B we’ll use in our private solution. The question is: will OpenAI’s most advanced and expensive model, GPT-4 Turbo, be able to overcome the magic elevator challenge? Let’s put it to the test and find out.

Despite having an output cost currently 30 times higher than that of the Llama 3 70B offered through DeepInfra, GPT-4 Turbo was also unable to solve this task. Now, let’s stop beating around the bush and see what Llama 3 70B is capable of.

And, without much ceremony, Meta’s new LLM model comes on the scene and surprises everyone, outperforming OpenAI’s own models! It’s as if a new hero had emerged to solve the complex problem that everyone thought was unsolvable.

Llama 3 is a significant milestone in the development of open-source language models. With its ability to solve complex tasks and its free availability, it has the potential to revolutionize the way we interact with artificial intelligence. By allowing developers to customize the model for relevant use cases, Llama 3 encourages the adoption of best practices and strengthens the open ecosystem.

Furthermore, the combination of Open WebUI with the DeepInfra service offers a self-hosted, ChatGPT-like solution with access to open-source models on cutting-edge infrastructure at an affordable cost. This integration allows users to take advantage of advanced models without hosting them on their own hardware.

I hope you enjoyed this article, leave comments, and I’ll be back soon with more content :)
