Install and Run Llama2 on a Windows/WSL Ubuntu distribution in 1 hour. Llama2 is a large language model (LLM) released by Meta (Facebook AI)
Welcome to our comprehensive guide on setting up Llama2 on your local server.
The original text is on our WordPress blog: https://altf1be.wordpress.com/2023/12/17/install-and-run-llama2-on-windows-wsl-ubuntu-distribution-in-1-hour-llama2-is-a-large-language-model-llm-released-by-meta-facebook-ai
This tutorial is designed to walk you through installing all the prerequisites needed to run Llama2 efficiently, leveraging the capabilities of an NVIDIA GeForce RTX 4080 GPU.
Let’s begin this journey to enhance your server’s capabilities with advanced GPU acceleration.
Need assistance? https://www.alt-f1.be
Table of contents
- Clone the Github repository Llama
- Download the Llama2 models
- Install Ubuntu on WSL2 on Windows 10 or Windows 11
- Install Conda on WSL
- Install PyTorch using Conda
- Update Conda packages and dependencies
- Install an accelerated version of Scikit-learn
- Create a Conda environment dedicated to LLama2
- Prepare yourself to run the models
- Run the Example Chat Completion on the llama-2-7b-chat model
- Run the Example Text Completion on the llama-2-7b model
- Server configuration
- Links
Clone the Github repository Llama
- Clone the repository : https://github.com/facebookresearch/llama
- /mnt/d/dev/gh/llama
The above-mentioned directory will be used throughout the tutorial. Use whichever directory suits you.
Download the Llama2 models
Follow the process to download the models on your hard disk drive [I]
[I] Journey towards the usage of Llama, a large language model (LLM) released by Meta-Facebook
After a few hours, each model directory contains the model weights in a file named “consolidated.00.pth”.
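Before moving on, it is worth checking that the download completed intact. A hedged sketch, assuming the 7B model landed in a llama-2-7b/ directory under the repository root and that the download shipped an md5 checklist next to the weights (as Meta’s download script does):

```shell
# Confirm the weights file is present and has a plausible size (~13 GB for 7B)
ls -lh llama-2-7b/consolidated.00.pth

# Verify the checksums shipped alongside the weights
cd llama-2-7b && md5sum -c checklist.chk && cd ..
```

If `md5sum -c` reports a failure, re-run the download for that model before continuing.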
Install Ubuntu on WSL2 on Windows 10 or Windows 11
Windows Subsystem for Linux is a feature of Windows that allows developers to run a Linux environment without the need for a separate virtual machine or dual booting.
See https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux
Follow the tutorial on the Ubuntu portal : https://ubuntu.com/tutorials/install-ubuntu-on-wsl2-on-windows-10#1-overview
Or, follow this shortcut:
- Run the command prompt as administrator :
- CTRL-ESC + input “cmd” + Enter
- Run the command :
- wsl --install
- Reboot the machine
- Start WSL
- CTRL+ESC + input WSL
- Set a login name
- Set a password
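The shortcut above boils down to a couple of commands. A sketch, run from an administrator command prompt on Windows:

```shell
# Installs WSL2 and the default Ubuntu distribution
wsl --install

# Reboot, start WSL (CTRL-ESC, type "WSL"), then set a login name and password.

# Afterwards, list installed distributions and confirm they run under WSL version 2
wsl -l -v
```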
Install Conda on WSL
Anaconda is a distribution of the Python and R programming languages for scientific computing that aims to simplify package management and deployment. See https://www.anaconda.com/
Read this tutorial if necessary : https://gist.github.com/kauffmanes/5e74916617f9993bc3479f401dfec7da
Here is a short version:
- Create a directory where to store Anaconda
- mkdir /mnt/d/dev/tools
- cd /mnt/d/dev/tools
- Choose the distribution that matches your CPU and operating system version
- https://repo.anaconda.com/archive/
- Download the latest version of Anaconda into /mnt/d/dev/tools
- wget https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-x86_64.sh
- Run the installation script
- bash Anaconda3-2023.09-0-Linux-x86_64.sh
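Once the installer finishes, open a new shell (or re-source your profile) and confirm that Conda is on your PATH. A quick sanity check; the exact version string will differ on your machine:

```shell
# Pick up the Conda initialization the installer added to your shell profile
source ~/.bashrc

# Print the installed Conda version, e.g. "conda 23.7.4"
conda --version

# List the available environments; "base" should appear
conda info --envs
```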
Install PyTorch using Conda
PyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing, originally developed by Meta AI and now part of the Linux Foundation umbrella. https://pytorch.org/
- Copy the command line generated by PyTorch
- https://pytorch.org/get-started/locally/
- Run the command based on the command line generated here above
- conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
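Once the install completes, you can check from the same shell that PyTorch was built with CUDA support and can see the RTX 4080. A hedged sketch; the exact output depends on your versions:

```shell
# Installed PyTorch version
python -c "import torch; print(torch.__version__)"

# True when the CUDA build is active and the GPU is visible
python -c "import torch; print(torch.cuda.is_available())"

# Name of the first GPU, e.g. "NVIDIA GeForce RTX 4080"
python -c "import torch; print(torch.cuda.get_device_name(0))"
```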
Update Conda packages and dependencies
- update the Conda package and its dependencies in the base environment
- conda update -n base -c defaults conda
Install an accelerated version of Scikit-learn
- Install the accelerated version, Scikit-learn-intelex
- conda install scikit-learn-intelex
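Scikit-learn-intelex accelerates stock scikit-learn through a patch that must be applied before the estimators are imported. A minimal sketch of how it is used:

```shell
python -c "
from sklearnex import patch_sklearn
patch_sklearn()                      # re-routes supported estimators to the Intel oneDAL backend
from sklearn.cluster import KMeans   # imported after patching, so it uses the accelerated path
print(KMeans)
"
```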
Create a Conda environment dedicated to LLama2
- Create a Llama2 environment running Python 3.11.5
- conda create -n llama2 python=3.11.5
- Activate the environment
- conda activate llama2
- Check that Python resolves to the Conda environment
- which python
Prepare yourself to run the models
- Move to the directory where you cloned the GitHub repository https://github.com/facebookresearch/llama
- cd /mnt/d/dev/gh/llama
- Install the Python dependencies (several GB)
- pip install -e .
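Afterwards you can confirm the editable install succeeded by importing the package. This assumes the llama2 Conda environment is active and that the repository’s package exposes the Llama class, as in the upstream README examples:

```shell
# Shows the editable ("development") install metadata
pip show llama

# Importing the top-level class proves the package and its dependencies resolved
python -c "from llama import Llama; print(Llama)"
```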
Now you should be ready to run the models!
Run the Example Chat Completion on the llama-2-7b-chat model
Run the command line described in the README.md of the Github repository : https://github.com/facebookresearch/llama/blob/main/README.md
torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir llama-2-7b-chat/ --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 6
It takes ~60 seconds to display the results.
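Note that the --nproc_per_node value must match the model’s model-parallel (MP) size, i.e. the number of checkpoint shards: 1 for the 7B models, 2 for 13B, 8 for 70B. A hedged sketch for the 13B chat model, assuming you also downloaded it into llama-2-13b-chat/:

```shell
torchrun --nproc_per_node 2 example_chat_completion.py \
    --ckpt_dir llama-2-13b-chat/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 512 --max_batch_size 6
```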
Run the Example Text Completion on the llama-2-7b model
Run the command line described in the README.md of the Github repository : https://github.com/facebookresearch/llama/blob/main/README.md
torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir llama-2-7b/ --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4
It takes ~50 seconds to display the results.
Server configuration
- Display Name NVIDIA GeForce RTX 4080 https://www.nvidia.com/fr-be/geforce/graphics-cards/40-series/rtx-4080/
- System Manufacturer ASUS
- Processor 13th Gen Intel(R) Core(TM) i9-13900K, 3000 MHz, 24 Core(s), 32 Logical Processor(s) [intel]
- BaseBoard Manufacturer ASUSTeK COMPUTER INC. [Asus]
- BaseBoard Product ROG MAXIMUS Z790 EXTREME [ROG]
- Disk Model Samsung SSD 870 EVO 2TB [Samsung]
- Disk Model Samsung SSD 980 PRO 2TB [Samsung]
Links
[1] Research paper: https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models
[2] LLaMA: Open and Efficient Foundation Language Models (Paper Explained) : https://www.youtube.com/watch?v=E5OnoYF2oAk
[3] Github repository: https://github.com/facebookresearch/llama
[4] Form to receive the token: https://forms.gle/jk851eBVbX1m5TAv5
[5] Model card: https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md
[6] Wikipedia: https://en.wikipedia.org/wiki/LLaMA
[7] Saving and loading checkpoints (basic) : https://lightning.ai/docs/pytorch/stable/common/checkpointing_basic.html
[8] “What is a checkpoint? When a model is training, the performance changes as it continues to see more data. It is a best practice to save the state of a model throughout the training process. This gives you a version of the model, a checkpoint, at each key point during the development of the model. Once training has completed, use the checkpoint that corresponds to the best performance you found during the training process.” Source: https://lightning.ai/docs/pytorch/stable/common/checkpointing_basic.html
[9] “Tokenizers are one of the core components of the NLP pipeline. They serve one purpose: to translate text into data that can be processed by the model. Models can only process numbers, so tokenizers need to convert our text inputs to numerical data.” Source: https://huggingface.co/learn/nlp-course/chapter2/4?fw=tf
[10] Code llama commercial license — extract of the email
You’re all set to start building with Code Llama.
The models listed below are now available to you as a commercial license holder. By downloading a model, you are agreeing to the terms and conditions of the license, acceptable use policy and Meta’s privacy policy.
Model weights available:
- CodeLlama-7b
- CodeLlama-13b
- CodeLlama-34b
- CodeLlama-7b-Python
- CodeLlama-13b-Python
- CodeLlama-34b-Python
- CodeLlama-7b-Instruct
- CodeLlama-13b-Instruct
- CodeLlama-34b-Instruct
[11] Llama 2 commercial license https://github.com/facebookresearch/llama/blob/main/LICENSE