What is Reflection Tuning for LLMs?
Improving LLM Fine-Tuning using Self Reflection
I’ve been in the Data Science industry for 5 years now and I’ve seen any topic evolve as fast as Generative AI. Almost every week, we’re having a major development coming up. This week’s major breakthrough is Reflection Tuning LLMs which has made Llama 3.1 70B model, the best open-sourced model so far by Fine-Tuning it using this technique.
Before we jump ahead,
What is Fine-Tuning?
Fine-tuning is the process of adapting a pre-trained LLMs to specific tasks or datasets by continuing its training on a smaller, specialized dataset.Here’s an example illustrating the process of fine-tuning for an LLM:
General fine-tuning sample
Input: 'Compose a narrative involving the theme of nature'
Output: '........'
Input: 'Identify elements in this passage: the house is blue'
Output: '......'
Reflection Fine-Tuning
An add on over general fine-tuning, Reflection Fine-Tuning
You are a world-class AI system, capable of complex reasoning and reflection.
Reason through the query inside <thinking> tags, and
then provide your final response inside <output> tags.
If you detect that you made a mistake in your reasoning at any point,
correct yourself inside <reflection> tags.
Hence, a final Reflection Fine-tuning prompt may look like (you can change the tags according to your wish, but these yielded the best results)
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a world-class AI system, capable of complex reasoning and reflection.
Reason through the query inside <thinking> tags, and
then provide your final response inside <output> tags.
If you detect that you made a mistake in your reasoning at any point,
correct yourself inside <reflection> tags.
<|eot_id|><|start_header_id|>user<|end_header_id|>
what is 2+2?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
LLM’s output after Reflection
Reasoning Phase:
- The model begins by generating its reasoning within
<thinking>
tags. This section contains the model's internal thought process as it analyzes the input query.
<thinking>
I need to determine the sum of 2 and 2.
This is a simple arithmetic problem.
</thinking>
Error Detection and Correction:
- Within the
<thinking>
section, the model may include<reflection>
tags if it identifies any mistakes in its reasoning. This indicates that the model is capable of recognizing errors and will attempt to correct them before finalizing its answer.
<thinking>
I initially thought the answer was 5, but that's incorrect.
The correct approach is to add the two numbers.
<reflection>
Correcting my previous thought: 2 + 2 = 4.
</reflection>
</thinking>
Final Output:
- Once the model is satisfied with its reasoning, it provides the final answer within
<output>
tags. This section presents the conclusion derived from the reasoning phase.
<output>
The answer is 4.
</output>
Hence, the final output is
<thinking>
I need to determine the sum of 2 and 2.
This is a simple arithmetic problem.
</thinking>
<thinking>
I initially thought the answer was 5, but that's incorrect.
The correct approach is to add the two numbers.
<reflection>
Correcting my previous thought: 2 + 2 = 4.
</reflection>
</thinking>
<output>
The answer is 4.
</output>
Reflection Fine Tuning is appearing to be a gun solution which has even made Llama3.1 70B the best open-sourced model so far by just reflection fine-tuning it. Do try out Reflection Tuning on other LLMs using unsloth and this prompt. You can check the Llama3.1 Reflection fine-tuned model below:
Hope this was useful, see you soon