Running Llama 2 on Your Local GPU

Lei Shang
3 min read · Jul 23, 2023

Meta and Microsoft introduced the next generation of Llama (Llama 2) on July 18, 2023, and it has since been integrated into the Hugging Face ecosystem. In this post, I'll guide you through the minimum steps to set up Llama 2 on your local machine, assuming you have a mid-range GPU such as an RTX 3090.

Hugging Face recommends a single Nvidia A10G for the Llama 7B models. Since I have an RTX 3090 on my desk, which is slightly more capable than an A10G, I expect it to handle the Llama 2 7B models well.
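As a rough sanity check (my own back-of-envelope math, not from the post): the 7B model's weights need about 2 bytes per parameter in half precision, so the weights alone fit comfortably in the 3090's 24 GB of VRAM, with headroom for activations and the KV cache.

```python
# Back-of-envelope VRAM estimate for Llama 2 7B weights.
# (This helper is my own illustration, not part of any library.)

def weight_memory_gib(params_billion: float, bytes_per_param: int) -> float:
    """Approximate memory needed just for the model weights, in GiB."""
    return params_billion * 1e9 * bytes_per_param / 1024**3

fp16_gib = weight_memory_gib(7, 2)  # half precision
int8_gib = weight_memory_gib(7, 1)  # 8-bit quantization

print(f"fp16: {fp16_gib:.1f} GiB, int8: {int8_gib:.1f} GiB")
# prints "fp16: 13.0 GiB, int8: 6.5 GiB"
```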

Get Access to the Llama models

First, sign up at https://ai.meta.com/resources/models-and-libraries/llama-downloads/ to get approval for the model download. Next, obtain a User Access Token from Hugging Face. Once you've completed both steps, you should have your User Access Token ready for the next stage.
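With the token in hand, you need to register it locally so `transformers` can download the gated model files. A minimal sketch, assuming you export the token in an environment variable (the name `HF_TOKEN` is my choice, not an official convention; `huggingface-cli login` works too):

```python
import os

# Read the token from the environment rather than hard-coding it.
token = os.environ.get("HF_TOKEN")

def looks_like_user_token(t) -> bool:
    """Rough sanity check: Hugging Face user access tokens start with 'hf_'."""
    return isinstance(t, str) and t.startswith("hf_")

if token and looks_like_user_token(token):
    # Registers the token locally so gated downloads are authorized.
    from huggingface_hub import login
    login(token=token)
```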

Run Llama 2 model on your local environment

My local environment:

OS: Ubuntu 20.04.5 LTS
Hardware:
CPU: 11th Gen Intel(R) Core(TM) i5-1145G7 @ 2.60GHz
Memory: 16GB
GPU: RTX 3090 (24GB)
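Before downloading anything, it's worth confirming that PyTorch can actually see the GPU. A small helper of my own for that check (not from the original post):

```python
def cuda_summary():
    """Return (available, device_name, total_mem_gib), or (False, None, None)
    when no usable CUDA device is found."""
    try:
        import torch
    except ImportError:
        return (False, None, None)
    if not torch.cuda.is_available():
        return (False, None, None)
    props = torch.cuda.get_device_properties(0)
    return (True, props.name, props.total_memory / 1024**3)

print(cuda_summary())
```

On my machine this reports the RTX 3090 with roughly 24 GiB of total memory.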

I followed James Briggs’ excellent YouTube video Llama 2 in LangChain — FIRST Open Source Conversational Agent! and made some modifications to the initial part, creating a minimum…
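The article is cut off here, but based on the setup above, a minimal loading sketch might look like the following. The model id, dtype, and `device_map` choices are my assumptions following common `transformers` usage, not necessarily the exact code from the video:

```python
# Gated repo: requires Meta approval plus a logged-in Hugging Face account.
MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"

def load_llama(model_id: str = MODEL_ID):
    """Load the tokenizer and the fp16 model onto the available GPU."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # ~13 GiB of weights, fits in 24 GB VRAM
        device_map="auto",          # place layers on the available GPU
    )
    return tokenizer, model

def generate(tokenizer, model, prompt: str, max_new_tokens: int = 64) -> str:
    """One-shot text generation from a plain-text prompt."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

Calling `load_llama()` will download the weights on first use (roughly 13 GB), so expect the initial run to take a while.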
