What is Reflection Tuning for LLMs?

Improving LLM Fine-Tuning using Self Reflection

Mehul Gupta
Data Science in your pocket

--

Photo by Stephen Andrews on Unsplash

I’ve been in the Data Science industry for 5 years now and I’ve seen any topic evolve as fast as Generative AI. Almost every week, we’re having a major development coming up. This week’s major breakthrough is Reflection Tuning LLMs which has made Llama 3.1 70B model, the best open-sourced model so far by Fine-Tuning it using this technique.

Before we jump ahead,

What is Fine-Tuning?

Fine-tuning is the process of adapting a pre-trained LLMs to specific tasks or datasets by continuing its training on a smaller, specialized dataset.Here’s an example illustrating the process of fine-tuning for an LLM:

General fine-tuning sample


Input: 'Compose a narrative involving the theme of nature'
Output: '........'

Input: 'Identify elements in this passage: the house is blue'
Output: '......'

Reflection Fine-Tuning

An add on over general fine-tuning, Reflection Fine-Tuning

You are a world-class AI system, capable of complex reasoning and reflection. 
Reason through the query inside <thinking> tags, and
then provide your final response inside <output> tags.
If you detect that you made a mistake in your reasoning at any point,
correct yourself inside <reflection> tags.

Hence, a final Reflection Fine-tuning prompt may look like (you can change the tags according to your wish, but these yielded the best results)

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a world-class AI system, capable of complex reasoning and reflection.
Reason through the query inside <thinking> tags, and
then provide your final response inside <output> tags.
If you detect that you made a mistake in your reasoning at any point,
correct yourself inside <reflection> tags.
<|eot_id|><|start_header_id|>user<|end_header_id|>

what is 2+2?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

LLM’s output after Reflection

Reasoning Phase:

  • The model begins by generating its reasoning within <thinking> tags. This section contains the model's internal thought process as it analyzes the input query.
<thinking>
I need to determine the sum of 2 and 2.
This is a simple arithmetic problem.
</thinking>

Error Detection and Correction:

  • Within the <thinking> section, the model may include <reflection> tags if it identifies any mistakes in its reasoning. This indicates that the model is capable of recognizing errors and will attempt to correct them before finalizing its answer.
<thinking>
I initially thought the answer was 5, but that's incorrect.
The correct approach is to add the two numbers.
<reflection>
Correcting my previous thought: 2 + 2 = 4.
</reflection>
</thinking>

Final Output:

  • Once the model is satisfied with its reasoning, it provides the final answer within <output> tags. This section presents the conclusion derived from the reasoning phase.
<output>
The answer is 4.
</output>

Hence, the final output is

<thinking>
I need to determine the sum of 2 and 2.
This is a simple arithmetic problem.
</thinking>

<thinking>
I initially thought the answer was 5, but that's incorrect.
The correct approach is to add the two numbers.
<reflection>
Correcting my previous thought: 2 + 2 = 4.
</reflection>
</thinking>

<output>
The answer is 4.
</output>

Reflection Fine Tuning is appearing to be a gun solution which has even made Llama3.1 70B the best open-sourced model so far by just reflection fine-tuning it. Do try out Reflection Tuning on other LLMs using unsloth and this prompt. You can check the Llama3.1 Reflection fine-tuned model below:

Hope this was useful, see you soon

--

--