Foundation Model Alignment with RAFTđŸ›¶ in LMFlow

OptimalScale
Apr 17, 2023

Introduction

General-purpose foundation models, especially large language models (LLMs) such as ChatGPT, have demonstrated extraordinary capabilities in performing various tasks that were once challenging. However, we believe that one model cannot rule them all. Further fine-tuning is necessary to achieve better performance in specialized tasks or domains. The standard approaches for fine-tuning these models include:

  • Continuous pretraining on specific domains so that LLMs can acquire knowledge in those domains
  • Task tuning on specific tasks so that LLMs can deal with downstream tasks
  ‱ Instruction tuning to endow LLMs with the ability to comply with specialized natural language instructions and complete the tasks those instructions require
  ‱ Alignment tuning to teach LLMs conversational skills in accordance with human preferences

Alignment, in particular, is crucial for ensuring the safety of LLMs before they are deployed in the real world. Today we introduce a new alignment algorithm, RAFT [1], which is more effective than traditional methods such as PPO. RAFT mitigates biases that can emerge in LLM responses: it disentangles unwanted biases from the LLM’s language production while consistently maintaining fluency.

Check out the paper https://arxiv.org/abs/2304.06767.

Its implementation is available from https://github.com/OptimalScale/LMFlow.

RAFT Alignment

Alignment is a critical aspect of training large language models (LLMs) like ChatGPT. One key benefit of alignment is that it helps the model conform to human language habits, improving its performance in tasks such as question answering.

A common approach to alignment is reinforcement learning from human feedback (RLHF), as described in InstructGPT [2]. In this approach, human-labeled preference data is used to train a reward model, and a reinforcement learning algorithm (e.g., PPO) then adjusts the model’s behavior according to that reward model. However, PPO and other reinforcement learning algorithms compute gradient updates from every sampled response, which results in high training costs and unstable optimization.
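For context, a reward model is typically just a sequence classifier that maps a prompt-response pair to a scalar score. Below is a minimal sketch of how such scoring might look with đŸ€— Transformers; the checkpoint name is a hypothetical placeholder, not part of InstructGPT or LMFlow.

```python
# Minimal sketch of a reward model scoring a prompt-response pair.
# "my-org/reward-model" is a hypothetical checkpoint with a single-logit
# sequence-classification head; any reward model trained on human
# preference data could be used in its place.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("my-org/reward-model")
reward_model = AutoModelForSequenceClassification.from_pretrained(
    "my-org/reward-model", num_labels=1
)

def reward(prompt: str, response: str) -> float:
    """Return a scalar reward for a prompt-response pair."""
    inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        score = reward_model(**inputs).logits[0, 0]
    return score.item()
```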

To address these issues, we proposed a new alignment algorithm called RAFT (Reward rAnked Fine-Tuning). RAFT uses reward-based ranking to select the most preferred samples from the model’s generations (i.e., the samples that best align with human values or objective facts) and fine-tunes on them, with the aim of training AI models that are more human-friendly.

This approach improves the quality of alignment while being more efficient and stable to train and easier to implement. We have tested RAFT on both large language models and diffusion models, verifying its effectiveness on question answering and text-to-image generation tasks.

Algorithm Details

Specifically, RAFT is composed of three core steps:

(1) Data collection: To collect candidate samples before ranking, we can simply use the generative model being trained as the generator. To improve the diversity of the generated data, we can also mix in samples from other pre-trained experts (e.g., LLaMA, ChatGPT, or even humans).

(2) Data ranking: As in RLHF, a classifier or regressor serves as the reward model, scoring each candidate against the target objective. Based on these rewards, we rank the candidate samples and select those with the highest scores, i.e., the ones that best meet human needs.

(3) Model fine-tuning: The highest-reward samples are then used to fine-tune the model with ordinary supervised training, so that the resulting model better matches human preferences.

Notably, RAFT does not require calculating gradients for every sampled data point. Given a fixed amount of fine-tuning data, RAFT performs more forward passes for sampling and then uses the reward function to filter out most of the low-quality data, which makes training more stable and robust. Moreover, because supervised fine-tuning is less sensitive to hyperparameters and converges more robustly, we found that in some cases, at the same reward level, RAFT achieves better perplexity (corresponding to better generation diversity and fluency).

The full algorithm simply iterates these three steps.
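As an illustration, here is a minimal, simplified Python sketch of one RAFT iteration. This is not LMFlow’s actual implementation; `generate`, `reward`, and `fine_tune` are caller-supplied placeholders for the model’s sampling routine, the reward model, and an ordinary supervised fine-tuning step.

```python
# Simplified sketch of one RAFT iteration (not LMFlow's implementation).
def raft_step(model, prompts, generate, reward, fine_tune,
              n_candidates=8, keep_fraction=0.125):
    """Sample candidates, rank them by reward, fine-tune on the best ones.

    generate(model, prompt, n) -> list of n candidate responses
    reward(prompt, response)   -> scalar score from the reward model
    fine_tune(model, pairs)    -> supervised fine-tuning on (prompt, response) pairs
    """
    # (1) Data collection: sample several candidate responses per prompt.
    candidates = [(p, r) for p in prompts for r in generate(model, p, n_candidates)]

    # (2) Data ranking: score every candidate and keep only the top fraction.
    candidates.sort(key=lambda pair: reward(*pair), reverse=True)
    selected = candidates[: max(1, int(len(candidates) * keep_fraction))]

    # (3) Model fine-tuning: standard supervised training on the selected samples.
    fine_tune(model, selected)
    return model

# The full algorithm repeats raft_step over fresh batches of prompts until the
# average reward (or validation perplexity) stops improving.
```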

We performed experiments on a range of tasks to evaluate the effectiveness of RAFT.

First, we evaluated performance on completing positive movie reviews. Before fine-tuning, LLaMA’s review completions were random in sentiment and occasionally negative. After fine-tuning with RAFT, however, the model excelled at generating positive, fluent movie reviews when given the opening sentence of a review. As shown in the figure below, the unaligned LLaMA produced a mix of positive and negative reviews, while both RAFT and PPO steered the model toward positive reviews.
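For this task, the reward can come from an off-the-shelf sentiment classifier. The sketch below uses the default đŸ€— sentiment-analysis pipeline as a stand-in; the reward model used in the paper may differ.

```python
# Sketch of a sentiment-based reward for the positive-movie-review task
# (uses the default đŸ€— sentiment-analysis pipeline as a stand-in reward model).
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")

def review_reward(review: str) -> float:
    """Reward = probability that the generated review is positive."""
    result = sentiment(review, truncation=True)[0]
    return result["score"] if result["label"] == "POSITIVE" else 1.0 - result["score"]

# Candidate completions with high review_reward are kept for fine-tuning,
# steering the model toward positive, fluent reviews.
```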

We also built a psychological companion chatbot based on Vicuna and simulated a conversation between the chatbot and a person who is feeling down after failing an exam. Before RAFT alignment (left image), the model claimed to have no emotions or feelings and refused to be friends with humans. After RAFT alignment (right image), the model’s empathetic abilities were significantly enhanced, and it repeatedly comforted the human, saying, “Although I am an AI, I will try my best to be your friend.”

Left: Vicuna-13B (before alignment). Right: RAFT-aligned Vicuna-7B (after alignment).

In addition to evaluating RAFT’s effectiveness on language models, we also tested its ability to improve text-to-image generation with diffusion models. The original Stable Diffusion is known to perform poorly at 256×256 resolution, and PPO cannot be applied directly to diffusion models, whereas RAFT offers a natural way to improve them. After fine-tuning with RAFT, Stable Diffusion generates good results at this resolution. This is a clear benefit for AIGC enthusiasts with limited computing resources, since generating at 256×256 takes only about 20% of the time of the original version. The figure below shows results before and after fine-tuning with RAFT: prior to fine-tuning, Stable Diffusion struggled to generate good 256×256 images, but image quality improved greatly after fine-tuning.

In addition to improving 256×256 generation, RAFT can also align the generated images with the prompts, enabling the model to produce images that better match the prompt description. As shown in the figure below, given the prompt “Monet style cat,” the original Stable Diffusion mostly generated images without a cat, producing other Monet-style scenes instead. This is because cats rarely appear in Monet’s works, and Stable Diffusion did not fully capture the meaning of the text. After fine-tuning with RAFT, however, Stable Diffusion understands the concept of a “cat,” and every generated image contains one.
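To give a sense of how the ranking step can work for text-to-image generation, the sketch below scores Stable Diffusion samples against the prompt with CLIP and keeps the best-matching images; the CLIP-based reward and the checkpoints are illustrative assumptions, not necessarily those used in the paper.

```python
# Sketch: ranking Stable Diffusion samples by CLIP text-image similarity.
# The checkpoints and the CLIP-based reward are illustrative assumptions.
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompt = "Monet style cat"
images = pipe(prompt, height=256, width=256, num_images_per_prompt=8).images

inputs = processor(text=[prompt], images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    scores = clip(**inputs).logits_per_image.squeeze(-1)  # one score per image

# Keep the images that best match the prompt as fine-tuning targets.
top_images = [images[i] for i in scores.topk(k=2).indices.tolist()]
```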

About LMFlow: An Extensible Toolkit for Fine-Tuning and Inference of Large Foundation Models

The LMFlow open-source project aims to establish a fully open research platform for large models that supports various experiments with limited machine resources. The platform also aims to improve data utilization and optimize algorithm efficiency in order to build a more efficient large-model training system. The ultimate goal of the project is to help everyone train specialized large models under limited resources. Researchers and developers who are interested in large models are welcome to help improve this open system. Please refer to the following link for the project code and evaluation results.

⭐ https://github.com/OptimalScale/LMFlow

LMFlow provides a complete fine-tuning workflow for large foundation models, enabling personalized training with limited computing resources. It offers the following essential features:

  • Continuous pretraining, task tuning, instruction tuning, and alignment tuning on datasets defined by the user.
  ‱ Parameter-efficient fine-tuning with LoRA [3].
  ‱ A new alignment algorithm, RAFT (Reward rAnked Fine-Tuning), which streamlines the alignment pipeline for generative models.
  • A straightforward and easily adaptable API for developers.
  • A simplified model inference framework.

Based on a 7-billion-parameter LLaMA model, it takes only one NVIDIA RTX 3090 GPU and five hours to train a personalized model. We used this framework to train a 33-billion-parameter version of LLaMA on a single machine and have released the model weights for academic research. The trained model weights can be used immediately for a question-answering service on our website (lmflow.com).
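As a rough idea of what the parameter-efficient setup looks like, here is a minimal LoRA sketch with đŸ€— Transformers and peft; the checkpoint path and hyperparameters are illustrative placeholders, not LMFlow’s exact configuration.

```python
# Minimal LoRA fine-tuning sketch with đŸ€— Transformers + peft
# (placeholder checkpoint path and illustrative hyperparameters).
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "path/to/llama-7b-hf"  # placeholder: a converted LLaMA-7B checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # low-rank update dimension
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections in LLaMA
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable

# The wrapped model can then be fine-tuned with a standard Trainer loop,
# which fits on a single consumer GPU such as an RTX 3090.
```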

Using LMFlow, anyone can train their own personalized model, choosing a model size appropriate to their available resources, for tasks such as question answering, companionship, writing, translation, and expert consultation in various fields. The larger the model and dataset and the longer the training, the better the results. We have currently trained a 33B model that achieves performance comparable to, or even better than, ChatGPT.

Tuning Workflow

LMFlow offers a complete solution for tuning large models. It is an extensible, convenient, and efficient toolbox for fine-tuning large machine learning models, designed to be user-friendly, fast, reliable, and accessible to the entire community. LMFlow has four key features:

  1. Extensible: LMFlow integrates seamlessly with đŸ€— Transformers, đŸ€— Accelerate, and DeepSpeed. It is extremely easy to plug into our pipeline, since most of the code builds on Hugging Face’s transformers library.
  2. Lightweight: With LoRA [3], training is extremely lightweight, and the resulting models are easy to share with others.
  3. Task-oriented: The workflow is targeted at specific downstream tasks.
  4. Open: The whole pipeline, including data, models, and tuning and inference methods, is open-source.

Acknowledgments

LMFlow draws inspiration from various prior studies and open-source projects.

Disclaimer

This package aims to provide a streamlined and user-friendly pipeline for large model tuning. Its functionalities serve as a reference and are intended for use by the user. However, it is important to note that the responsibility for the preparation of the data and pretrained models lies solely with the user. This package does not guarantee the accuracy, completeness, applicability, or legality of the components from the user’s preparation. Users must be aware of and assume all risks and liabilities associated with the preparation of the models and data, and obtain legal, commercial, and technical advice before utilizing this package. The pipeline shall not be held responsible for any direct, indirect, special, incidental, or consequential damages resulting from the user’s improper preparation of the data and pretrained models.

Our checkpoints, which include both English and Chinese versions, are provided solely for research purposes. The training data contained within these checkpoints includes generated results from the ChatGPT language model. We do not endorse or encourage the distribution or usage of these checkpoints for commercial purposes. Users of these checkpoints are solely responsible for ensuring that they are used correctly and appropriately.

It is also crucial to highlight that the results generated by the model are based on probabilistic models and not directly related to this pipeline. The accuracy, reliability, applicability, and legality of the results are not guaranteed by this pipeline. Therefore, users must also be aware of the risks and liabilities associated with the results and seek legal, commercial, and technical advice before relying on the model-generated outcomes. This pipeline shall not be accountable for any direct, indirect, special, incidental, or consequential damages resulting from the user’s reliance on the model-generated results.

References

[1] Dong, Hanze, et al. “RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment.” arXiv preprint arXiv:2304.06767 (2023). https://arxiv.org/abs/2304.06767

[2] Ouyang, Long, et al. “Training language models to follow instructions with human feedback.” Advances in Neural Information Processing Systems 35 (2022): 27730–27744.

[3] Hu, Edward J., et al. “LoRA: Low-Rank Adaptation of Large Language Models.” International Conference on Learning Representations (2022).
