DoRA for LLM Fine-Tuning explained

An alternative to LoRA for fine-tuning LLMs

Mehul Gupta
Data Science in your pocket

--

Generative AI is at its peak right now, and one of the most sought-after topics is LLM fine-tuning, where firms and individuals release new LLM versions fine-tuned for their specific use cases.

In this post, we will discuss a technique recently introduced by NVIDIA for fine-tuning LLMs called DoRA (Weight-Decomposed Low-Rank Adaptation), an upgrade over LoRA (Low-Rank Adaptation) fine-tuning.

My debut book: LangChain in your Pocket (Packt) is out now

So, for a recap

The LoRA fine-tuning technique significantly reduces the number of trainable parameters by introducing two low-rank matrices, A and B, which are much smaller than the original weight matrix W. This allows for fine-tuning without updating the entire model, making the process far more computationally efficient while maintaining almost the same performance.
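To make the recap concrete, here is a minimal sketch in plain PyTorch. The dimensions d and k and the rank r are purely illustrative:

import torch

# Illustrative shapes: W is d x k, the low-rank factors use rank r << min(d, k)
d, k, r = 1024, 1024, 8

W = torch.randn(d, k)           # frozen pretrained weight (never updated)
A = torch.randn(r, k) * 0.01    # trainable low-rank matrix A
B = torch.zeros(d, r)           # trainable low-rank matrix B (zero-initialized)

# During fine-tuning only A and B receive gradients; the effective weight is:
W_effective = W + B @ A

# Parameter savings: d*r + r*k trainable values instead of d*k
print((d * r + r * k) / (d * k))  # ~0.016, i.e. roughly 1.6% of the full matrix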

For a detailed study, read here:

Issues with LoRA

Though it is the go-to method for fine-tuning, practitioners have noticed certain issues with it.

  1. Limited adjustment capabilities: LoRA tends to make simpler, proportional changes to the model’s weights, increasing or decreasing them in a fairly uniform way. This means it struggles to make the subtle, nuanced adjustments that full fine-tuning can achieve.
  2. Difficulty with complex learning: LoRA has trouble simultaneously learning two important aspects of weight updates: how much to change the weights (magnitude) and which way to change them (direction). This dual learning task appears to be too complex for LoRA’s approach.

DoRA (not the explorer) to the rescue

As noted above, LoRA struggles to handle both “magnitude” and “direction” of weight updates together. DoRA’s main focus is exactly this issue, thereby improving the learning capability of fine-tuned LLMs compared to LoRA.

How does DoRA work?

  1. DoRA first breaks the weight matrix (W), using matrix decomposition, into two parts:

Magnitude: how large the weights are

Direction: which way the weights are pointing

  2. Both parts are then fine-tuned separately:

The magnitude component is fine-tuned directly (similar to regular fine-tuning).

The direction component is fine-tuned using LoRA.

  3. Once fine-tuned separately, the two parts are merged back together: the updated direction is re-normalized and rescaled by the magnitude, and you get your weight matrix back! (See the sketch right after this list.)
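Here is a rough PyTorch sketch of that decompose / fine-tune / merge cycle. This is a simplification for intuition rather than NVIDIA’s actual implementation, and the shapes and rank are again illustrative:

import torch

d, k, r = 1024, 1024, 8

W0 = torch.randn(d, k)                      # frozen pretrained weight

# 1. Decompose: magnitude = per-column norm, direction = unit-norm columns
m = W0.norm(p=2, dim=0, keepdim=True)       # magnitude, shape (1, k); trained directly
V = W0 / m                                  # direction, columns of length 1

# 2. The direction receives a LoRA-style low-rank update via B @ A
A = torch.randn(r, k) * 0.01                # trainable
B = torch.zeros(d, r)                       # trainable, zero-initialized

# 3. Merge: re-normalize the updated direction, then rescale by the magnitude
V_updated = W0 + B @ A
W_merged = m * (V_updated / V_updated.norm(p=2, dim=0, keepdim=True))

# At initialization B is all zeros, so W_merged exactly equals W0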

So, to sum up: methods like DoRA break the learning process into two parts, the size of the updates and the direction of the updates. This gives a better handle on how the fine-tuning is progressing, and that information can be used to make more targeted tweaks that improve both the learning capacity and the stability of training. It’s a smart way to look at what’s happening under the hood and fine-tune things more precisely.

How can DoRA be implemented?

Hands down, Hugging Face has done tremendous work in providing SOTA algorithm implementations to everyday users with easy-to-use integrations. You barely have to do anything:

  • Copy the LoRA implementation code from the blog below as it is.
  • Add use_dora=True to your LoraConfig.

With DoRA, this particular snippet from the LoRA code becomes:

from peft import LoraConfig  # DoRA support requires a recent peft release

peft_config = LoraConfig(
    lora_alpha=16,        # scaling factor for the low-rank update
    lora_dropout=0.1,     # dropout applied to the LoRA layers
    r=64,                 # rank of the low-rank matrices A and B
    bias="none",          # keep bias terms frozen
    task_type="CAUSAL_LM",
    use_dora=True,        # the only change needed to switch from LoRA to DoRA
)
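To wire this config into a model, a minimal sketch could look like the following (the base model checkpoint here is just a placeholder; use whichever causal LM you are fine-tuning):

from transformers import AutoModelForCausalLM
from peft import get_peft_model

# Placeholder checkpoint; substitute your own base model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the magnitude and LoRA params are trainable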

That’s it! You can now fine-tune your LLM using DoRA.

And with this, we will wrap up this post. See you in the next one!
