Fine-Tuning vs. Human Guidance: SFT and RLHF in Language Model Tuning

Viraj Shah
3 min read · Dec 21, 2023

This article offers a brief comparison of Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) as approaches to tuning large language models (LLMs).

Approach

  • RLHF takes an iterative approach: human feedback on the LLM’s outputs is used to train a reward model, which then guides further improvement of the LLM through reinforcement learning. It is the more intricate of the two, as it requires building and training a separate reward model, which is often challenging because it means reconciling varied human preferences and addressing biases.
  • SFT trains the LLM directly on a carefully curated dataset of annotated examples that illustrate the intended task or domain. It is the simpler method, requiring only labelled data and conventional supervised training; a minimal sketch follows this list.
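
To make the SFT recipe concrete, here is a minimal sketch using Hugging Face transformers and datasets: a small causal LM is fine-tuned on prompt/response pairs with an ordinary supervised next-token objective. The base model, prompt template, and toy examples below are placeholders rather than a prescribed setup.

```python
# Minimal SFT sketch: fine-tune a small causal LM on labelled
# prompt/response pairs using ordinary supervised training.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # placeholder; swap in the base LLM you are tuning
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# A (toy) curated dataset of annotated examples: prompt plus desired response.
examples = [
    {"prompt": "Summarise: The cat sat on the mat.", "response": "A cat sat on a mat."},
    {"prompt": "Translate to French: Good morning.", "response": "Bonjour."},
]

def to_features(ex):
    # Hypothetical prompt template; any consistent format works.
    text = f"### Instruction:\n{ex['prompt']}\n### Response:\n{ex['response']}"
    return tokenizer(text, truncation=True, max_length=512)

dataset = Dataset.from_list(examples).map(to_features, remove_columns=["prompt", "response"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-out", per_device_train_batch_size=2,
                           num_train_epochs=3, learning_rate=2e-5),
    train_dataset=dataset,
    # mlm=False gives standard causal (next-token) language-modelling labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```

The same recipe scales to larger base models and datasets, and purpose-built wrappers such as TRL’s SFTTrainer package this loop, but nothing beyond standard supervised training is required.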

Complexity

  • RLHF tends to be computationally expensive because of the resources needed to train and repeatedly query the reward model (a sketch of this extra reward-model stage follows this list). There is also a risk of instability: RL optimisation can be sensitive to inaccuracies in the reward model, potentially causing unexpected behaviour.
  • SFT, on the other hand, is computationally cheaper and generally faster to train than RLHF. It is also more stable: training directly on labelled data with a standard supervised objective leaves far less room for unexpected behaviour.
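
To illustrate the extra stage that drives much of this cost, below is a simplified sketch of reward-model training: a base model with a single scalar output head is fitted to human preference pairs (a chosen versus a rejected response) with a pairwise Bradley-Terry-style loss. The model name and data are placeholders, and a full RLHF pipeline would follow this with a PPO-style reinforcement-learning step (often via a library such as TRL), which this sketch omits.

```python
# Sketch of the stage SFT does not need: a reward model trained on
# human preference pairs (chosen vs. rejected responses).
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "gpt2"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
# num_labels=1 -> the model emits one scalar "reward" per sequence.
reward_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
reward_model.config.pad_token_id = tokenizer.pad_token_id

# Toy human-feedback data: annotators preferred one response over the other.
pairs = [
    {"prompt": "Explain RLHF in one sentence.",
     "chosen": "RLHF trains a reward model from human rankings, then optimises the LLM against it.",
     "rejected": "RLHF is when the model rewards humans."},
]

optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-5)

for pair in pairs:  # in practice: shuffled batches over a large preference dataset
    chosen = tokenizer(pair["prompt"] + "\n" + pair["chosen"], return_tensors="pt", truncation=True)
    rejected = tokenizer(pair["prompt"] + "\n" + pair["rejected"], return_tensors="pt", truncation=True)
    r_chosen = reward_model(**chosen).logits.squeeze(-1)
    r_rejected = reward_model(**rejected).logits.squeeze(-1)
    # Bradley-Terry pairwise loss: push the preferred response's reward above the other.
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

All of this sits on top of, not instead of, the supervised training that SFT already needs, which is where the extra compute and the extra sources of error come from.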

Outcome

  • RLHF offers the potential for closer alignment with human preferences: when the reward model genuinely captures human values, it yields more accurate and desirable outputs. However, it tends to narrow output diversity, producing less creative and surprising text as the LLM learns to maximise the reward signal (the standard objective shown after this list makes this trade-off explicit).
  • SFT, on the other hand, may underperform RLHF on certain tasks, particularly intricate ones, but it generally maintains higher output diversity. This diversity stems from the LLM’s inherent flexibility, which is preserved because the model is never optimised against a single reward signal, allowing it to generate a wider range of creative responses.
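
This trade-off is visible in the commonly used KL-regularised RLHF objective: the policy is trained to score well under the reward model, while a KL penalty keeps it close to a reference model (typically the SFT model), precisely so that it does not collapse onto a narrow set of high-reward outputs. In standard notation:

```
\max_{\pi_\theta} \; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}
  \big[ r_\phi(x, y) \big]
  \;-\; \beta \, \mathrm{KL}\big( \pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \big)
```

Here r_φ is the learned reward model, π_ref is the reference model, and β sets the strength of the penalty: a larger β preserves more of the reference model’s diversity, while a smaller β pushes the LLM harder towards whatever the reward model prefers.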

Some additional factors to consider

  • Data quality is the backbone of both approaches, but RLHF is especially sensitive to biases and inaccuracies that make their way into the reward model. High-quality, diverse labelled data is essential for both methods, and it is critical for RLHF, which relies on accurate human feedback to shape the reward model and the learning process that follows.
  • In applications demanding strict alignment with human values, such as drafting legal documents, RLHF may be the preferred method, because its feedback-driven reward model lets the LLM adhere closely to those values. Conversely, for tasks that prioritise creativity and diverse outputs, such as composing poetry or other open-ended writing, SFT may be more suitable, because it preserves the LLM’s inherent flexibility and allows more varied, imaginative results.
  • Recent research suggests that, given high-quality data, SFT can match or even outperform RLHF in certain scenarios, positioning it as a simpler and more efficient alternative in those cases.

Choosing between RLHF and SFT hinges on factors such as the nature of the task, the available resources, and the desired outcomes. Each method has its own strengths and weaknesses, so understanding their differences is essential for fine-tuning LLMs effectively. Weighing the specific requirements against the pros and cons of each approach leads to an informed decision tailored to the task at hand.

I hope this concise overview has clarified the differences between RLHF and SFT and helps you make the right choice for your use case.

Hey! Look, you made it to the end 🎉

Thank you for your time!
