Meta and Microsoft introduced Llama 2, the next generation of Llama, on July 18, 2023, and it has since been integrated into the Hugging Face ecosystem. In this post, I’ll walk you through the minimum steps to set up Llama 2 on your local machine, assuming you have a medium-spec GPU such as an RTX 3090.
Hugging Face recommends a single Nvidia A10G for the 7B models. Since the RTX 3090 on my desk is slightly more capable than an A10G, I expect it to handle the Llama 2 7B models well.
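As a rough sanity check (my own back-of-the-envelope estimate, not a figure from Hugging Face), the 7B weights alone stored in fp16 come to about 13 GiB, which fits comfortably in the 3090’s 24 GB, leaving headroom for activations and the KV cache:

```python
# Back-of-the-envelope VRAM estimate for the model weights alone.
# Assumptions: 7e9 parameters stored in fp16 (2 bytes per parameter);
# activations and the KV cache need additional memory on top of this.
params = 7e9
bytes_per_param = 2  # fp16
weights_gib = params * bytes_per_param / 1024**3
print(f"~{weights_gib:.1f} GiB for weights")  # roughly 13 GiB, well under 24 GB
```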
Get Access to the Llama Models
First, sign up at https://ai.meta.com/resources/models-and-libraries/llama-downloads/ to get approval to download the models. Next, obtain a User Access Token from Hugging Face. Once you’ve completed all the steps, you should have your User Access Token ready for the next stage. Mine looks like this:
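One way to keep the token out of your code (a sketch of my own preference, not a requirement) is to export it as an environment variable; `huggingface_hub` picks up `HUGGING_FACE_HUB_TOKEN` automatically. The `hf_placeholder_token` fallback below is a placeholder, not a real token:

```python
import os

# Read the Hugging Face User Access Token from an environment variable
# instead of hard-coding it in a script. Real tokens start with "hf_";
# the fallback here is only a placeholder for illustration.
hf_token = os.environ.get("HUGGING_FACE_HUB_TOKEN", "hf_placeholder_token")
print("token loaded" if hf_token else "no token found")
```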
Run the Llama 2 Model in Your Local Environment
My local environment:
OS: Ubuntu 20.04.5 LTS
Hardware:
CPU: 11th Gen Intel(R) Core(TM) i5-1145G7 @ 2.60GHz
Memory: 16GB
GPU: RTX 3090 (24GB)
I followed James Briggs’ excellent YouTube video Llama 2 in LangChain — FIRST Open Source Conversational Agent! and made some modifications to the initial part, creating a minimum…
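One detail worth knowing for any minimal setup: the Llama 2 chat variants were trained on a specific prompt template with `[INST]` and `<<SYS>>` markers. Here is a small helper of my own (a sketch for single-turn prompts, not code from the video):

```python
def format_llama2_chat_prompt(system_prompt: str, user_message: str) -> str:
    """Build a single-turn prompt in the template the Llama 2 chat models expect."""
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system_prompt}\n"
        "<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

prompt = format_llama2_chat_prompt(
    "You are a helpful assistant.",
    "What GPU do I need for the 7B model?",
)
print(prompt)
```

Passing text in this shape to the chat model, rather than a bare question, is what keeps its answers well-behaved.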