Install and Run Llama2 on a Windows/WSL Ubuntu distribution in 1 hour. Llama2 is a large language model (LLM) released by Meta (Facebook AI)
Welcome to our comprehensive guide on setting up Llama2 on your local server.
The original text is on our WordPress blog: https://altf1be.wordpress.com/2023/12/17/install-and-run-llama2-on-windows-wsl-ubuntu-distribution-in-1-hour-llama2-is-a-large-language-model-llm-released-by-meta-facebook-ai
This tutorial is designed to walk you through installing all the prerequisites needed to run Llama2 efficiently, leveraging the capabilities of an NVIDIA GeForce RTX 4080 GPU.
Let’s begin this journey to enhance your server’s capabilities with advanced GPU acceleration.
Need assistance? https://www.alt-f1.be
Table of contents
- Clone the Github repository Llama
- Download the Llama2 models
- Install Ubuntu on WSL2 on Windows 10 or Windows 11
- Install Conda on WSL
- Install PyTorch using Conda
- Update Conda packages and dependencies
- Install an accelerated version of Scikit-learn
- Create a Conda environment dedicated to LLama2
- Prepare yourself to run the models
- Run the Example Chat Completion on the llama-2-7b-chat model
- Run the Example Text Completion on the llama-2-7b model
- Server configuration
- Links
Clone the Github repository Llama
- Clone the repository : https://github.com/facebookresearch/llama
- /mnt/d/dev/gh/llama
The above-mentioned directory will be used throughout the tutorial. Use whichever directory suits you.
Download the Llama2 models
Follow the process to download the models on your hard disk drive [I]
[I] Journey towards the usage of Llama, a large language model (LLM) released by Meta-Facebook
After a few hours, each model directory contains the model weights in a file named “consolidated.00.pth”.
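Before moving on, it is worth checking that the download completed intact. A hedged sketch, assuming the 7B model landed in a llama-2-7b/ directory under the repository root and that the download shipped an md5 checklist next to the weights (as Meta’s download script does):

```shell
# Confirm the weights file is present and has a plausible size (~13 GB for 7B)
ls -lh llama-2-7b/consolidated.00.pth

# Verify the checksums shipped alongside the weights
cd llama-2-7b && md5sum -c checklist.chk && cd ..
```

If `md5sum -c` reports a failure, re-run the download for that model before continuing.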
Install Ubuntu on WSL2 on Windows 10 or Windows 11
Windows Subsystem for Linux is a feature of Windows that allows developers to run a Linux environment without the need for a separate virtual machine or dual booting.
See https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux
Follow the tutorial on the Ubuntu portal : https://ubuntu.com/tutorials/install-ubuntu-on-wsl2-on-windows-10#1-overview
Or, follow this shortcut:
- Run the command prompt as administrator :
- CTRL-ESC + input “cmd” + Enter
- Run the command :
- wsl --install
- Reboot the machine
- Start WSL
- CTRL+ESC + input WSL
- Set a login name
- Set a password
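The shortcut above boils down to a couple of commands. A sketch, run from an administrator command prompt on Windows:

```shell
# Installs WSL2 and the default Ubuntu distribution
wsl --install

# Reboot, start WSL (CTRL-ESC, type "WSL"), then set a login name and password.

# Afterwards, list installed distributions and confirm they run under WSL version 2
wsl -l -v
```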
Install Conda on WSL
Anaconda is a distribution of the Python and R programming languages for scientific computing that aims to simplify package management and deployment. See https://www.anaconda.com/
Read this tutorial if necessary : https://gist.github.com/kauffmanes/5e74916617f9993bc3479f401dfec7da
Here is a short version:
- Create a directory where to store Anaconda
- mkdir /mnt/d/dev/tools
- cd /mnt/d/dev/tools
- Choose the distribution that matches your CPU and operating system version
- https://repo.anaconda.com/archive/
- Download the latest version of Anaconda into /mnt/d/dev/tools
- wget https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-x86_64.sh
- Run the installation script
- bash Anaconda3-2023.09-0-Linux-x86_64.sh
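Once the installer finishes, open a new shell (or re-source your profile) and confirm that Conda is on your PATH. A quick sanity check; the exact version string will differ on your machine:

```shell
# Pick up the Conda initialization the installer added to your shell profile
source ~/.bashrc

# Print the installed Conda version, e.g. "conda 23.7.4"
conda --version

# List the available environments; "base" should appear
conda info --envs
```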
Install PyTorch using Conda
PyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing, originally developed by Meta AI and now part of the Linux Foundation umbrella. https://pytorch.org/
- Copy the command line generated by PyTorch
- https://pytorch.org/get-started/locally/
- Run the command based on the command line generated here above
- conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
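Once the install completes, you can check from the same shell that PyTorch was built with CUDA support and can see the RTX 4080. A hedged sketch; the exact output depends on your versions:

```shell
# Installed PyTorch version
python -c "import torch; print(torch.__version__)"

# True when the CUDA build is active and the GPU is visible
python -c "import torch; print(torch.cuda.is_available())"

# Name of the first GPU, e.g. "NVIDIA GeForce RTX 4080"
python -c "import torch; print(torch.cuda.get_device_name(0))"
```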
Update Conda packages and dependencies
- update the Conda package and its dependencies in the base environment
- conda update -n base -c defaults conda
Install an accelerated version of Scikit-learn
- Install the accelerated version, Scikit-learn-intelex
- conda install scikit-learn-intelex
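Scikit-learn-intelex accelerates stock scikit-learn through a patch that must be applied before the estimators are imported. A minimal sketch of how it is used:

```shell
python -c "
from sklearnex import patch_sklearn
patch_sklearn()                      # re-routes supported estimators to the Intel oneDAL backend
from sklearn.cluster import KMeans   # imported after patching, so it uses the accelerated path
print(KMeans)
"
```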
Create a Conda environment dedicated to LLama2
- Create a Llama2 environment running Python 3.11.5
- conda create -n llama2 python=3.11.5
- Activate the environment
- conda activate llama2
- Check that Python resolves to the Conda environment
- which python
Prepare yourself to run the models
- Move to the directory where you cloned the GitHub repository https://github.com/facebookresearch/llama
- cd /mnt/d/dev/gh/llama
- Install the Python dependencies (several GB)
- pip install -e .
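Afterwards you can confirm the editable install succeeded by importing the package. This assumes the llama2 Conda environment is active and that the repository’s package exposes the Llama class, as in the upstream README examples:

```shell
# Shows the editable ("development") install metadata
pip show llama

# Importing the top-level class proves the package and its dependencies resolved
python -c "from llama import Llama; print(Llama)"
```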
Now you should be ready to run the models!
Run the Example Chat Completion on the llama-2-7b-chat model
Run the command line described in the README.md of the Github repository : https://github.com/facebookresearch/llama/blob/main/README.md
torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir llama-2-7b-chat/ --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 6
It takes ~60 seconds to display the results.
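Note that the --nproc_per_node value must match the model’s model-parallel (MP) size, i.e. the number of checkpoint shards: 1 for the 7B models, 2 for 13B, 8 for 70B. A hedged sketch for the 13B chat model, assuming you also downloaded it into llama-2-13b-chat/:

```shell
torchrun --nproc_per_node 2 example_chat_completion.py \
    --ckpt_dir llama-2-13b-chat/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 512 --max_batch_size 6
```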
Run the Example Text Completion on the llama-2-7b model
Run the command line described in the README.md of the Github repository : https://github.com/facebookresearch/llama/blob/main/README.md
torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir llama-2-7b/ --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4
It takes ~50 seconds to display the results.
Server configuration
- Display Name NVIDIA GeForce RTX 4080 https://www.nvidia.com/fr-be/geforce/graphics-cards/40-series/rtx-4080/
- System Manufacturer ASUS
- Processor 13th Gen Intel(R) Core(TM) i9-13900K, 3000 MHz, 24 Core(s), 32 Logical Processor(s) [intel]
- BaseBoard Manufacturer ASUSTeK COMPUTER INC. [Asus]
- BaseBoard Product ROG MAXIMUS Z790 EXTREME [ROG]
- Disk Model Samsung SSD 870 EVO 2TB [Samsung]
- Disk Model Samsung SSD 980 PRO 2TB [Samsung]
Links
[1] Research paper: https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models
[2] LLaMA: Open and Efficient Foundation Language Models (Paper Explained) : https://www.youtube.com/watch?v=E5OnoYF2oAk
[3] Github repository: https://github.com/facebookresearch/llama
[4] Form to receive the token: https://forms.gle/jk851eBVbX1m5TAv5
[5] Model card: https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md
[6] Wikipedia: https://en.wikipedia.org/wiki/LLaMA
[7] Saving and loading checkpoints (basic) : https://lightning.ai/docs/pytorch/stable/common/checkpointing_basic.html
[8] “What is a checkpoint? When a model is training, the performance changes as it continues to see more data. It is a best practice to save the state of a model throughout the training process. This gives you a version of the model, a checkpoint, at each key point during the development of the model. Once training has completed, use the checkpoint that corresponds to the best performance you found during the training process.” Source: https://lightning.ai/docs/pytorch/stable/common/checkpointing_basic.html
[9] “Tokenizers are one of the core components of the NLP pipeline. They serve one purpose: to translate text into data that can be processed by the model. Models can only process numbers, so tokenizers need to convert our text inputs to numerical data.” Source: https://huggingface.co/learn/nlp-course/chapter2/4?fw=tf
[10] Code llama commercial license — extract of the email
You’re all set to start building with Code Llama.
The models listed below are now available to you as a commercial license holder. By downloading a model, you are agreeing to the terms and conditions of the license, acceptable use policy and Meta’s privacy policy.
Model weights available:
- CodeLlama-7b
- CodeLlama-13b
- CodeLlama-34b
- CodeLlama-7b-Python
- CodeLlama-13b-Python
- CodeLlama-34b-Python
- CodeLlama-7b-Instruct
- CodeLlama-13b-Instruct
- CodeLlama-34b-Instruct
[11] Llama 2 commercial license https://github.com/facebookresearch/llama/blob/main/LICENSE