Full-parameter fine-tuning of Mistral 7B v0.2 Instruct on 96 GB of GPU memory with accelerate and FSDP
In the rapidly advancing field of artificial intelligence, fine-tuning large language models has become a critical process for enhancing model performance and tailoring knowledge to specific tasks. One significant development in this area is full-parameter fine-tuning of the Mistral 7B v0.2 model. This approach, as opposed to lighter methods such as LoRA or QLoRA fine-tuning, is essential for tasks that demand deep integration and modification of the model’s knowledge base. Full-parameter fine-tuning allows for comprehensive adjustments across the entire model, paving the way for substantial improvements in specialized capabilities and the integration of entirely new knowledge.
However, implementing such extensive fine-tuning presents notable challenges, primarily because of its hardware demands. High-performance computing resources, especially GPUs with large amounts of memory, are needed to handle the immense memory and compute load, and a single GPU big enough for full-parameter training of a 7B model is prohibitively costly and hard to obtain for many organizations. To address this, we turn to Fully Sharded Data Parallel (FSDP), which shards the model’s parameters, gradients, and optimizer states across multiple devices, making it possible to run such training on more accessible hardware. By employing FSDP across four A10G or L4 GPUs (24 GB each, 96 GB in total), we can distribute the workload and fully fine-tune Mistral 7B v0.2 without requiring prohibitively expensive single units of hardware.
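As a rough back-of-the-envelope estimate (illustrative figures, not measurements from this exact setup): a 7B-parameter model in bf16 needs about 14 GB for weights and another 14 GB for gradients, while AdamW’s two fp32 moment buffers add roughly 56 GB more, so around 85 GB of training state before activations, and more still if fp32 master weights are kept. That is why no single 24 GB card can hold the job on its own, but four of them sharded with FSDP, 96 GB in total, can.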
So let’s begin
I’m using the Lightning AI platform for this, which provides easy and cheap access to GPUs such as 4× A10G.
I’ll also use and modify the Zephyr 7B recipe from the Hugging Face alignment-handbook.
1- Open a Studio in Lightning AI and choose 4 A10G GPUs
PS: this method also works on 4 L4 GPUs (24 GB each)
2- Open a terminal and run:
git clone https://github.com/huggingface/alignment-handbook.git
then
cd alignment-handbook
and follow the installation guide in the repository’s README
Now let’s create the FSDP configuration file, since it does not exist there yet.
Create the YAML file at this path:
alignment-handbook/recipes/accelerate_configs/fsdp.yaml
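A minimal sketch of what this fsdp.yaml could contain, assuming a single machine with 4 GPUs and bf16 mixed precision. Exact key names vary a little between accelerate versions (running accelerate config will generate one matched to your install), and the values below, including parameter offloading, are illustrative assumptions rather than a definitive recipe:
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP  # wrap each transformer block as its own FSDP unit
  fsdp_sharding_strategy: FULL_SHARD             # shard parameters, gradients and optimizer states
  fsdp_offload_params: true                      # offload to CPU RAM; slower, but eases the 24 GB limit
  fsdp_state_dict_type: SHARDED_STATE_DICT       # checkpoints are saved as per-rank shards
  fsdp_sync_module_states: true
  fsdp_use_orig_params: true
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 4   # one process per GPU
use_cpu: false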
Now edit the SFT training configuration
at this path:
alignment-handbook/recipes/zephyr-7b-beta/sft/config_full.yaml
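A sketch of the fields you would typically change in config_full.yaml, assuming you want the v0.2 Instruct weights and need to fit four 24 GB cards; the values below are illustrative assumptions, and the remaining fields can stay as in the original Zephyr recipe:
model_name_or_path: mistralai/Mistral-7B-Instruct-v0.2   # swap the v0.1 base used by the Zephyr recipe
torch_dtype: bfloat16
dataset_mixer:                      # replace with your own dataset(s) if you have them
  HuggingFaceH4/ultrachat_200k: 1.0
per_device_train_batch_size: 1      # keep small to fit in 24 GB per GPU
gradient_accumulation_steps: 16     # raise this to recover a useful effective batch size
gradient_checkpointing: true       # recompute activations to save memory
num_train_epochs: 1
output_dir: data/mistral-7b-v0.2-sft-full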
Now let’s go back to the terminal and run this command with your Hugging Face token:
huggingface-cli login --token YOUR_TOKEN
wandb (optional, for monitoring training on Weights & Biases):
wandb login YOUR_TOKEN
Now let’s start fine-tuning:
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/fsdp.yaml scripts/run_sft.py recipes/zephyr-7b-beta/sft/config_full.yaml
The model should now start fine-tuning.
Notes:
- This method can be slow, so it works best if your dataset is not too big
- You can see results after 1 epoch
For more details, contact me on LinkedIn.