Getting started with LLMs

Matt Proetsch
NightShift Codes
7 min read · Feb 16, 2024

Large language models (LLMs) are a new class of AI models which power modern, realistic chat experiences like ChatGPT. While OpenAI’s proprietary, state-of-the-art LLMs are accessible only through their website and API, many other organizations have made their LLMs freely available, and you can even run one on your computer and chat with it! In this article, we will walk through setting up your machine to run LLMs starting from zero. By the end of this post, you’ll be chatting with a fully-customizable open-source LLM running on your computer.

What’s an LLM?

An LLM, like OpenAI’s GPT-3.5/GPT-4 or Meta’s LLaMA 2, is an algorithm which reads some text (for instance, a conversational dialog) and generates some reasonable-looking output in order to continue the text (such as a response to the user’s most recent message).

We won’t get into the details of training and building an LLM, but let’s take a look at how to use them.

LLMs read the entire text input and then generate a single “token” as an output. Tokens are only a few characters long; a typical English word is made up of one or two tokens. The generated token is appended to the input, and the result is run through the LLM again. This process repeats until the LLM generates a special end-of-sequence token (such as </s>), which indicates the end of the output.

To see this in action, take a look at the diagram below. We have made the simplifying assumption that a single token represents a word. The colorful text at the end of each output shows the token generated by the LLM during that iteration, which we add to the end of the input and feed to the next iteration.

LLM generating a response token by token
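
In code, this generate-append-repeat loop looks roughly like the sketch below, written with HuggingFace Transformers. The model name (gpt2) and greedy decoding are illustrative assumptions to keep the example small; the environment we set up below handles all of this for you.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Any causal language model works here; gpt2 is small enough to run on a CPU.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

for _ in range(20):                                 # generate at most 20 new tokens
    logits = model(input_ids).logits                # run the whole sequence through the LLM
    next_id = logits[0, -1].argmax()                # pick the most likely next token (greedy)
    if next_id.item() == tokenizer.eos_token_id:    # stop at the end-of-sequence token
        break
    input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=-1)  # append and repeat

print(tokenizer.decode(input_ids[0]))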

Now, let’s set up your machine so that you can run one of these yourself!

First, we need to make sure your environment is set up with all the necessary deep learning libraries installed and able to use your accelerator (GPU). We’ll use PyTorch and HuggingFace Transformers.

Set up your environment

The method to follow will depend on your operating system (OS).

1. What’s your OS?

macOS (13.0 and greater)
Get started by installing brew, the community-driven package manager for macOS. Open a terminal and run the command below, which will install or update your brew installation.

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Next we’ll use brew to install pipx and then poetry, which will help us manage the environment containing the LLM libraries.

pipx provides a way to quickly run Python-based apps in a sandbox which isolates them from the rest of your system. Think npx, but for Python. We’ll use it to run poetry, a tool which lets us replace requirements.txt and the old distutils/setuptools method of package installation with a single pyproject.toml file containing our package installation instructions.

brew install pipx       # Install pipx to manage our poetry installation
pipx ensurepath # pipx installs to $HOME/.local/bin
source ~/.zshrc # Update $PATH in current shell without restart
pipx install poetry # Install poetry to manage environment

If everything installed correctly, try running poetry --version and you should see an output like Poetry (version 1.7.1) displayed to the console.

Next, we’ll finish building our environment. If you see a dialog box on this step asking to install command-line developer tools on MacOS, press Install and follow the instructions (defaults are fine).

git clone https://github.com/nightshift-codes/llm-environment.git
cd llm-environment
poetry install

This clones the repo https://github.com/NightShift-Codes/llm-environment and installs the dependencies like torch and transformers into a virtual environment, which is an isolated set of packages that can safely be installed without affecting the rest of your system.

Finally, we’re ready to boot up the environment!

./start.sh

This will start a Jupyter Server with everything installed and open a browser window into the Jupyter Lab environment. If you are prompted for a password, enter luna.

Keep this terminal app running to keep the Jupyter Server up. To stop it, close the terminal or press Ctrl+C twice.

Now you can jump down to the First Steps with an LLM section!

Linux/Windows with Nvidia GPU (or CPU-only)
For Linux and Windows users, we’ll run a Docker container with all the dependencies pre-installed to simplify setup. This lets us run the LLM code without affecting the rest of your system.

First, if you have an Nvidia GPU, make sure you have the latest Nvidia driver for your platform (CPU-only users can skip this step).

Next, install git :

  • Linux (Debian/Ubuntu): sudo apt update && sudo apt install -y git
  • Linux (RHEL/Fedora/CentOS): sudo dnf install -y git
  • Windows: Download the installer from git-scm.com

Lastly, install Docker by following the setup instructions for your platform.

After installing Docker, open a terminal and run the following command to pull the base image that we will be working with:

docker pull nightshiftcodes/llm-environment:intro

If you get an error like error during connect: or cannot connect to the Docker daemon at <...> :

Linux: Running systemctl status docker should show the service as active (running). If not, run systemctl start docker. You may also want to run systemctl enable docker so that Docker starts automatically at boot.

Windows: Make sure Docker is running in the system tray. If you don’t see the Docker icon, launch it from the Start menu.

Next, pull the companion repository for this post and cd into it:

git clone https://github.com/nightshift-codes/llm-environment.git

cd llm-environment

Finally, let’s run the image and connect to it in your browser.

On Linux bash with Nvidia GPU (omit the --gpus all if you have no GPU):

docker run -it --rm \
-v ./notebooks:/home/user/llm-environment/notebooks \
--network host \
--gpus all \
nightshiftcodes/llm-environment:intro

Or, on Windows cmd with Nvidia GPU (omit --gpus all if you have no GPU):

docker run -it --rm ^
-v .\notebooks:/home/user/llm-environment/notebooks ^
-p 8888:8888 ^
--gpus all ^
nightshiftcodes/llm-environment:intro --ip 0.0.0.0

If you get an error like docker: Error response from daemon: could not select device driver “” with capabilities: [[gpu]] :

Linux: Make sure you have the latest Nvidia driver, then install the Nvidia container toolkit and restart your system.

Windows: Make sure you have the latest Nvidia driver. Then, run:
wsl --update
wsl --set-default-version 2
wsl --shutdown
Then quit and restart Docker.

Now open a browser window and go to http://localhost:8888 and enter the password luna to log in and see the Jupyter Lab interface.

Now that you have Jupyter Lab configured and running, scroll down to First Steps with an LLM!

Linux with AMD GPU
PyTorch recently added support for AMD GPUs to accelerate neural network operations using ROCm. Instructions for installing PyTorch with ROCm support are provided on the PyTorch “Get Started” page (https://pytorch.org/get-started/locally/); in the selection grid, choose Stable, Linux, Pip, Python, ROCm. After installing PyTorch, clone the companion repo (https://github.com/nightshift-codes/llm-environment.git), cd into it, then run:

pip install transformers jupyterlab ipywidgets

Lastly, cd into notebooks/ and launch Jupyter Lab by running jupyter lab.

2. First steps with an LLM

After you log into Jupyter Lab (password: luna), you should see a screen that looks like the following:

Jupyter Lab

The files you see in the web UI are also available in the notebooks/ folder under the current directory of your terminal. Open the llm-test.ipynb notebook and run all the cells by selecting the first one and pressing Shift+Enter until you reach the “Try other models” section.

In the cell %run platform_settings.py, if you get the error “UserError: CUDA initialization: CUDA unknown error: …”, try restarting your machine.
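
Relatedly, if you want to confirm that your accelerator is visible to PyTorch, you can run a quick generic check in a new notebook cell (this is not part of the notebook itself):

import torch

print("CUDA available:", torch.cuda.is_available())         # Nvidia GPUs (and ROCm builds of PyTorch)
print("MPS available:", torch.backends.mps.is_available())  # Apple Silicon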

This will:

  1. Import the deep learning libraries
  2. Build a HuggingFace pipeline to run the LLM
  3. Start a new chat with the LLM and ask it some questions, for example:

bot.send_user_chat_message("What's the square root of 49?")

My model gave an output like:

The square root of 49 is 4.2861996813799.

(It’s not.)

Your outputs may be different! There is some randomness when sampling outputs from an LLM, controlled by a “temperature” parameter, which we’ll discuss in a later post.

Generating text with an LLM as part of a chat
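
For reference, here is roughly how temperature appears if you build a text-generation pipeline yourself with HuggingFace Transformers; the model name and values below are illustrative, not the notebook’s exact settings:

from transformers import pipeline

# Illustrative only: the companion notebook builds its own pipeline for you.
generator = pipeline("text-generation", model="gpt2")

# Higher temperature means more varied samples; lower means more deterministic output.
outputs = generator(
    "What's the square root of 49?",
    max_new_tokens=30,
    do_sample=True,
    temperature=0.8,
)
print(outputs[0]["generated_text"])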

Call bot.send_user_chat_message("...") with your own messages to change the inputs sent to the LLM and receive a response. It will remember your chat history until you reset it by calling bot.start_new_chat().
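
For example (the messages here are just placeholders):

# The bot keeps the conversation history, so follow-ups have context.
bot.send_user_chat_message("Are you sure? Please double-check your answer.")

# Wipe the history and start fresh.
bot.start_new_chat()
bot.send_user_chat_message("Write a haiku about the Moon.")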

We will come back to this environment in a later post. Your notebooks are saved under the notebooks/ folder on your machine, so they will still be there the next time you start the environment.

Leave a comment with your outputs, or with the results when you change the prompt! Then go back to the notebook and follow the instructions under the “Try other models” section. Do you get better results with another model? Keep in mind that if you’re using an accelerator, you may run out of memory when running larger models.
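
If you want to see how close you are to your GPU’s memory limit while trying larger models, here’s a quick Nvidia-only check you can run in a notebook cell (not part of the notebook itself):

import torch

if torch.cuda.is_available():
    used = torch.cuda.memory_allocated() / 1024**3                      # GiB currently allocated by PyTorch
    total = torch.cuda.get_device_properties(0).total_memory / 1024**3  # GiB available on the device
    print(f"GPU memory in use: {used:.1f} / {total:.1f} GiB")
else:
    print("No CUDA device detected")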

3. Ready to go!

In the next article, we’ll add code to allow asking questions from your documents (PDF, DOCX, text files, images, and more) using Retrieval-Augmented Generation (RAG):

  1. Tokenizing and embedding your files
  2. Storing them in a vector database
  3. Modifying the prompt to use context from the vector database (a toy sketch of this step follows below)
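
As a toy preview of that last step, here’s what stuffing retrieved context into a prompt looks like. Everything below is a stand-in: real retrieval uses an embedding model and a vector database rather than keyword overlap, and the documents are made up.

import re

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday through Friday, 9am to 5pm.",
]

def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, docs, k=1):
    # Placeholder "retrieval": rank documents by word overlap with the question.
    q = words(question)
    return sorted(docs, key=lambda d: len(words(d) & q), reverse=True)[:k]

question = "When can I get a refund?"
context = "\n".join(retrieve(question, documents))

# The retrieved context is prepended to the user's question before it goes to the LLM.
prompt = f"Use the context below to answer.\n\nContext:\n{context}\n\nQuestion: {question}"
print(prompt)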

After that, we’ll build this all into a web app and demonstrate deploying it to the cloud so that it can be used as a standalone chatbot, or used as part of another piece of software within your company.

Resources
Repo: https://github.com/NightShift-Codes/llm-environment
Docker images: https://hub.docker.com/r/nightshiftcodes/llm-environment

Matt Proetsch
NightShift Codes

Programmer, data enthusiast, co-founder of @nightshiftcodes