Full-Parameter Finetuning of Mistral 7B v0.2 Instruct on 96 GB of GPU Memory with Accelerate and FSDP

Mustafa Alahmid
2 min read · Apr 8, 2024


In the rapidly advancing field of artificial intelligence, finetuning large language models has become a critical process for improving performance and tailoring knowledge to specific tasks. One significant option is full-parameter finetuning of the Mistral 7B v0.2 model. This approach, distinguished from lighter methods such as LoRA or QLoRA finetuning, is essential for tasks that demand deep integration and modification of the model’s knowledge base. Full-parameter finetuning allows comprehensive adjustments across the entire model, paving the way for substantial improvements in specialized capabilities and the integration of entirely new knowledge.

However, implementing such extensive finetuning presents notable challenges, primarily due to the hardware demands involved. High-performance computing resources, especially GPUs with plenty of memory, are crucial for handling the immense computational load. Given the scarcity and expense of such hardware, a single GPU with enough memory for full-parameter finetuning can be prohibitively costly and difficult to obtain for many organizations. To address this, we turn to Fully Sharded Data Parallel (FSDP), which shards model states across multiple devices and so reduces the memory required on each one. By employing FSDP across four A10G or L4 GPUs (24 GB each, 96 GB in total), we can distribute the workload and carry out full-parameter finetuning of Mistral 7B v0.2 without requiring a prohibitively expensive single piece of hardware.

So let’s begin.

I’m using the Lightning AI platform for this, which provides easy and affordable access to GPUs such as 4× A10G.

I’ll also use and modify the Zephyr 7B training implementation from the Hugging Face alignment-handbook.

1- Open a Studio in Lightning AI and choose 4 A10G GPUs.

PS: this method also works on 4 L4 GPUs (24 GB each).

2- Open a terminal and run:

git clone https://github.com/huggingface/alignment-handbook.git

then

cd alignment-handbook

and follow the installation guide in the repository’s README.

Now let’s create the FSDP configuration file, since it doesn’t exist in the repository yet.

Create the YAML file at the following path:

alignment-handbook/recipes/accelerate_configs/fsdp.yaml
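Here is a minimal sketch of what this file can look like. The keys follow Accelerate’s standard FSDP config format; values such as num_processes (one per GPU) and fsdp_offload_params are assumptions that you may need to adjust for your hardware and Accelerate version.

compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP   # wrap each decoder layer separately
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_forward_prefetch: false
  fsdp_offload_params: true                       # assumption: offload sharded params/optimizer states to CPU RAM for 24 GB cards
  fsdp_sharding_strategy: FULL_SHARD              # shard parameters, gradients and optimizer states across all GPUs
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: true
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 4                                  # one process per A10G/L4
rdzv_backend: static
same_network: true
use_cpu: false

Note that fsdp_offload_params trades speed for memory by keeping the sharded parameters and optimizer states in CPU RAM; if your run fits on the GPUs without it, setting it to false will be noticeably faster.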

Now edit the SFT training configuration at the following path:

alignment-handbook/recipes/zephyr-7b-beta/sft/config_full.yaml
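As a rough guide, these are the kinds of fields you will probably want to change. The keys already exist in the handbook’s Zephyr recipe, but the model name, dataset, batch sizes, and output directory below are illustrative assumptions, so adapt them to your own run.

# Model arguments: point the recipe at Mistral 7B v0.2 Instruct instead of the default base model
model_name_or_path: mistralai/Mistral-7B-Instruct-v0.2
torch_dtype: bfloat16

# Data arguments: swap in your own dataset (UltraChat is the handbook default)
dataset_mixer:
  HuggingFaceH4/ultrachat_200k: 1.0

# Trainer arguments: small per-device batch plus gradient accumulation to fit 24 GB cards
bf16: true
gradient_checkpointing: true
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
max_seq_length: 2048
num_train_epochs: 1
output_dir: data/mistral-7b-v0.2-sft-full

With a per-device batch size of 1, 8 accumulation steps, and 4 GPUs, the effective batch size works out to 32 sequences per optimizer step.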

Now let’s go back to the terminal and log in. Run this command with your Hugging Face token:

huggingface-cli login --token YOUR_TOKEN

Weights & Biases (wandb) is optional, for monitoring training:

wandb login YOUR_TOKEN

Now let’s start finetuning:

ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/fsdp.yaml scripts/run_sft.py recipes/zephyr-7b-beta/sft/config_full.yaml

The model should now start finetuning.

Notes:

  • This method can be slow (sharding adds inter-GPU communication overhead), so it works best when your dataset is not too large
  • You can see results after 1 epoch

For more details

Contact: LinkedIn


Mustafa Alahmid

Master’s degree in CS, interested in machine learning and deep learning