Install and Run Llama2 on a Windows/WSL Ubuntu distribution in 1 hour. Llama2 is a large language model (LLM) released by Meta (Facebook AI)

Abdelkrim
5 min read · Dec 17, 2023


Welcome to our comprehensive guide on setting up Llama2 on your local server.
The original text is on our WordPress blog: https://altf1be.wordpress.com/2023/12/17/install-and-run-llama2-on-windows-wsl-ubuntu-distribution-in-1-hour-llama2-is-a-large-language-model-llm-released-by-meta-facebook-ai

This tutorial is meticulously designed to walk you through the process of installing all necessary prerequisites to efficiently run Llama2, leveraging the robust capabilities of an Nvidia RTX 4080 GPU.

Let’s begin this journey to enhance your server’s capabilities with advanced GPU acceleration.

Need assistance? https://www.alt-f1.be

Table of contents

  1. Clone the Github repository Llama
  2. Download the Llama2 models
  3. Install Ubuntu on WSL2 on Windows 10 — Windows 11
  4. Install Conda on WSL
  5. Install PyTorch using Conda
  6. Update Conda packages and dependencies
  7. Install an accelerated version of Scikit-learn
  8. Create a Conda environment dedicated to LLama2
  9. Prepare yourself to run the models
  10. Run the Example Chat Completion on the llama-2-7b-chat model
  11. Run the Example Text Completion on the llama-2-7b model
  12. Server configuration
  13. Links

Clone the Github repository Llama

  1. Clone the repository: https://github.com/facebookresearch/llama
  2. Place it in a directory such as /mnt/d/dev/gh/llama

The abovementioned directory will be used throughout the tutorial. Use the directory that suits you.
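The two steps above can be sketched as follows (assuming git is available inside WSL and /mnt/d/dev/gh is where you keep your repositories):

```shell
# Create a working directory and clone Meta's Llama repository
mkdir -p /mnt/d/dev/gh
cd /mnt/d/dev/gh
git clone https://github.com/facebookresearch/llama
cd llama
```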

Download the Llama2 models

Follow the process to download the models on your hard disk drive [I]

[I] Journey towards the usage of Llama, a large language model (LLM) released by Meta-Facebook

After a few hours, the directory structure will contain the model weights, e.g. a file named “consolidated.00.pth”.
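A minimal sketch of the download step, assuming you received the presigned download URL by email after filling in Meta’s request form:

```shell
# Run the download script shipped with the repository;
# it prompts for the presigned URL from the email
# and for the model sizes you want to fetch
cd /mnt/d/dev/gh/llama
bash download.sh

# Verify that the checkpoint is present (the 7B chat model stores
# its weights in a single consolidated.00.pth file)
ls -lh llama-2-7b-chat/consolidated.00.pth
```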

Install Ubuntu on WSL2 on Windows 10 — Windows 11

Windows Subsystem for Linux is a feature of Windows that allows developers to run a Linux environment without the need for a separate virtual machine or dual booting.
See https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux

Follow the tutorial on the Ubuntu portal : https://ubuntu.com/tutorials/install-ubuntu-on-wsl2-on-windows-10#1-overview

Or, follow this shortcut:

  1. Run the command prompt as administrator:
  2. CTRL+ESC + input “cmd” + Enter
  3. Run the command:
  4. wsl --install
  5. Reboot the machine
  6. Start WSL:
  7. CTRL+ESC + input “WSL”
  8. Set a login name
  9. Set a password
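The steps above can be sketched as the following commands, run from an elevated Windows command prompt (the default distribution installed by `wsl --install` is Ubuntu):

```shell
# Install WSL2 with the default Ubuntu distribution, then reboot
wsl --install

# After the reboot, list the installed distributions
# and confirm they run under WSL version 2
wsl --list --verbose

# Start the Ubuntu shell explicitly if it does not open automatically
wsl -d Ubuntu
```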

Install Conda on WSL

Anaconda is a distribution of the Python and R programming languages for scientific computing that aims to simplify package management and deployment. See https://www.anaconda.com/

Read this tutorial if necessary : https://gist.github.com/kauffmanes/5e74916617f9993bc3479f401dfec7da

Here is a short version:

  1. Create a directory where to store Anaconda:
  2. mkdir -p /mnt/d/dev/tools
     cd /mnt/d/dev/tools
  3. Choose the distribution that matches your CPU and Operating System version:
  4. https://repo.anaconda.com/archive/
  5. Download the latest version of Anaconda into /mnt/d/dev/tools:
  6. wget https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-x86_64.sh
  7. Run the installation script:
  8. bash Anaconda3-2023.09-0-Linux-x86_64.sh
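The steps above can be sketched as a single script (the installer version is an assumption; pick the latest one from the archive page):

```shell
# Download and install Anaconda under WSL
INSTALLER=Anaconda3-2023.09-0-Linux-x86_64.sh
mkdir -p /mnt/d/dev/tools
cd /mnt/d/dev/tools
wget https://repo.anaconda.com/archive/"$INSTALLER"
bash "$INSTALLER"

# Reload the shell profile so that conda is on the PATH, then verify
source ~/.bashrc
conda --version
```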

Install PyTorch using Conda

PyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing, originally developed by Meta AI and now part of the Linux Foundation umbrella. https://pytorch.org/

  1. Copy the command line generated by PyTorch:
  2. https://pytorch.org/get-started/locally/
  3. Run the command line generated here above:
  4. conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
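Once the install completes, you can check that PyTorch sees the GPU (a quick sanity check, assuming the conda environment is active and the Nvidia driver is exposed to WSL):

```shell
# Print the PyTorch version and whether CUDA is available (expect: True)
python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"

# Optionally, show the GPU detected by the Nvidia driver inside WSL
nvidia-smi
```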

Update Conda packages and dependencies

  1. Update the Conda packages and dependencies in the base environment:
  2. conda update -n base -c defaults conda

Install an accelerated version of Scikit-learn

  1. Install the accelerated version of Scikit-learn, scikit-learn-intelex:
  2. conda install scikit-learn-intelex
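To actually benefit from the acceleration, scikit-learn has to be patched at runtime; a minimal sketch (sklearnex is the import name of the scikit-learn-intelex package):

```shell
python - <<'EOF'
# Patch scikit-learn with the Intel-accelerated implementations,
# then use scikit-learn as usual
from sklearnex import patch_sklearn
patch_sklearn()  # replaces supported estimators with accelerated versions

from sklearn.cluster import KMeans  # now resolves to the accelerated KMeans
print("scikit-learn patched")
EOF
```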

Create a Conda environment dedicated to LLama2

  1. Create a Llama2 environment running Python 3.11.5:
  2. conda create -n llama2 python=3.11.5
  3. Activate the environment, then check that your Python comes from Conda:
  4. conda activate llama2
     which python

Prepare yourself to run the models

  1. Move to the directory where you cloned the GitHub repository
     https://github.com/facebookresearch/llama
  2. cd /mnt/d/dev/gh/llama
  3. Install the Python dependencies (several GBytes of downloads)
  4. pip install -e .

Now you should be ready to run the models!
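The preparation steps can be sketched as follows (the final import check is an assumption based on the repository’s example scripts, which use `from llama import Llama`):

```shell
# Install the repository and its Python dependencies in editable mode
cd /mnt/d/dev/gh/llama
pip install -e .

# Sanity check: the llama package should now import cleanly
python -c "from llama import Llama; print('llama package ready')"
```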

Run the Example Chat Completion on the llama-2-7b-chat model

Run the command line described in the README.md of the GitHub repository: https://github.com/facebookresearch/llama/blob/main/README.md

torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir llama-2-7b-chat/ --tokenizer_path tokenizer.model  --max_seq_len 512 --max_batch_size 6

It takes ~60 seconds to display the results.

Run the Example Text Completion on the llama-2-7b model

Run the command line described in the README.md of the GitHub repository: https://github.com/facebookresearch/llama/blob/main/README.md

torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir llama-2-7b/ --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4

It takes ~50 seconds to display the results.
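Larger models shard their checkpoints across several processes; per the repository README, `--nproc_per_node` must match the model-parallel (MP) value of the model: 1 for 7B, 2 for 13B, 8 for 70B. For example (a sketch, assuming the llama-2-13b weights were downloaded):

```shell
# 13B model: two processes, one per checkpoint shard
torchrun --nproc_per_node 2 example_text_completion.py \
    --ckpt_dir llama-2-13b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 4
```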

Server configuration

Links

[1] Research paper: https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models

[2] LLaMA: Open and Efficient Foundation Language Models (Paper Explained) : https://www.youtube.com/watch?v=E5OnoYF2oAk

[3] Github repository: https://github.com/facebookresearch/llama

[4] Form to receive the token: https://forms.gle/jk851eBVbX1m5TAv5

[5] Model card: https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md

[6] Wikipedia: https://en.wikipedia.org/wiki/LLaMA

[7] Saving and loading checkpoints (basic) : https://lightning.ai/docs/pytorch/stable/common/checkpointing_basic.html

[8] “What is a checkpoint? When a model is training, the performance changes as it continues to see more data. It is a best practice to save the state of a model throughout the training process. This gives you a version of the model, a checkpoint, at each key point during the development of the model. Once training has completed, use the checkpoint that corresponds to the best performance you found during the training process.” Source: https://lightning.ai/docs/pytorch/stable/common/checkpointing_basic.html

[9] “Tokenizers are one of the core components of the NLP pipeline. They serve one purpose: to translate text into data that can be processed by the model. Models can only process numbers, so tokenizers need to convert our text inputs to numerical data.” Source: https://huggingface.co/learn/nlp-course/chapter2/4?fw=tf

[10] Code llama commercial license — extract of the email

You’re all set to start building with Code Llama.

The models listed below are now available to you as a commercial license holder. By downloading a model, you are agreeing to the terms and conditions of the license, acceptable use policy and Meta’s privacy policy.

Model weights available:

  • CodeLlama-7b
  • CodeLlama-13b
  • CodeLlama-34b
  • CodeLlama-7b-Python
  • CodeLlama-13b-Python
  • CodeLlama-34b-Python
  • CodeLlama-7b-Instruct
  • CodeLlama-13b-Instruct
  • CodeLlama-34b-Instruct

[11] Llama 2 commercial license https://github.com/facebookresearch/llama/blob/main/LICENSE
