
NeuralChat: Simplifying Supervised Instruction Fine-Tuning and Reinforcement Aligning

Easily Create High-Performance Chatbots

Intel(R) Neural Compressor | Intel Analytics Software | Sep 27, 2023


Kaokao Lv, Wenxin Zhang, and Haihao Shen, Intel Corporation

Chatbots have become prevalent and are widely used in assistant-based applications. There is increasing demand for open-source chatbot models based on high-performance pretrained large language models (LLMs) like MPT, Llama 2, and Falcon. Today, we released Intel Extension for Transformers v1.2 and officially announced NeuralChat, a unified framework that supports supervised instruction fine-tuning, reinforcement aligning, and customization of chatbots.

In this article, we describe how to perform supervised fine-tuning and reinforcement aligning. We also present the benchmarking results and instructions to accelerate inference using Intel Neural Compressor.

Supervised Instruction Fine-Tuning

We selected MPT-7B from MosaicML as the base model for supervised instruction fine-tuning because of its commercially friendly license. Let’s discuss the instruction dataset and training details, including hyperparameters and convergence.

Inspired by Alpaca, many active instruction-based fine-tuning efforts have emerged, and the instruction dataset is critical to achieving a high-quality model. We used popular datasets like databricks-dolly-15k, HC3, and other selected open-source datasets to give good coverage of domains and languages. In total, the dataset contains about 1.1M instruction samples and 326M tokens. Our instruction dataset is publicly available on Hugging Face: https://huggingface.co/datasets/Intel/neural-chat-dataset-v1-1.
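If you want to inspect the data yourself, the snippet below is a minimal sketch that loads it with the Hugging Face datasets library; the split and field names are whatever the published schema defines.

```python
# Minimal sketch: load the instruction dataset with the Hugging Face
# `datasets` library and inspect its splits, columns, and a sample record.
from datasets import load_dataset

dataset = load_dataset("Intel/neural-chat-dataset-v1-1")
print(dataset)              # available splits and column names
print(dataset["train"][0])  # one instruction sample (assumes a "train" split)
```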

We leveraged the fine-tuning pipeline provided by Intel Extension for Transformers to perform the training. We started from the hyperparameters used for MPT pretraining and derived an effective set for fine-tuning. The chart below shows the training loss.

The detailed hyperparameters are described in the model card. We could do further hyperparameter tuning using a sophisticated tool like SigOpt, but we’ll leave this for future work.
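For orientation, here is a generic sketch of what supervised instruction fine-tuning looks like with the plain Hugging Face Trainer. It is not the Intel Extension for Transformers pipeline used above, and every hyperparameter value below is a placeholder rather than a number from the model card.

```python
# Generic supervised instruction fine-tuning sketch with the Hugging Face
# Trainer. NOT the Intel Extension for Transformers pipeline; hyperparameter
# values are placeholders, and dataset field names are assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "mosaicml/mpt-7b"  # base model named in the article
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model, trust_remote_code=True)

dataset = load_dataset("Intel/neural-chat-dataset-v1-1", split="train")

def tokenize(example):
    # "instruction" and "output" are assumed field names; adjust to the schema.
    text = example["instruction"] + "\n" + example["output"]
    return tokenizer(text, truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # builds labels

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="neural-chat-sft",
        per_device_train_batch_size=4,   # placeholder
        gradient_accumulation_steps=8,   # placeholder
        learning_rate=1e-5,              # placeholder
        num_train_epochs=2,              # placeholder
        bf16=True,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```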

Direct Preference Optimization

To better align with human preferences, we apply the direct preference optimization (DPO) algorithm, which is stable and computationally lightweight. DPO replaces the reward model required by reinforcement learning from human feedback: it derives the probability of the human preference data under the optimal policy and formulates a maximum likelihood objective for a parameterized policy.
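As a concrete illustration, a minimal implementation of the DPO objective looks like the sketch below, assuming per-sequence log-probabilities have already been computed under the policy being trained and under a frozen reference policy (variable names are illustrative; libraries such as TRL ship a ready-made trainer built around this loss).

```python
# Minimal sketch of the DPO loss on a batch of preference pairs.
# Inputs are summed per-sequence log-probabilities of the chosen and
# rejected responses under the trained policy and a frozen reference policy.
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards are the beta-scaled log-probability ratios.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximum-likelihood objective on preference data: push the margin
    # between chosen and rejected rewards through a sigmoid.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```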

The preference dataset contains 12K examples selected from an Orca-style dataset, Open-Orca/OpenOrca. We leveraged both commercial chatbots (GPT-4/GPT-3.5) and open-source chatbots (llama-2-13b-chat) to generate the responses that form the preference pairs for alignment. See the preference dataset and DPO training code for more details.

Evaluating Results

We use the same evaluation metrics as open_llm_leaderboard, which relies on the EleutherAI Language Model Evaluation Harness (LM Eval Harness for short), a unified framework for testing generative language models on a large number of evaluation tasks. The following table shows the benchmark results for NeuralChat-7B and other popular models of similar size, where each result is either taken directly from open_llm_leaderboard or measured with LM Eval Harness. NeuralChat outperforms popular chat/instruct models such as MPT-7B-Chat and Falcon-7B-Instruct in both the average score and the majority of individual metrics.
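For reference, scoring a model with the harness looks roughly like the sketch below; the backend and task identifiers change between harness releases (and the leaderboard uses task-specific few-shot settings), so treat the exact arguments as assumptions rather than a recipe.

```python
# Rough sketch of evaluating a model with the LM Eval Harness Python API.
# Backend and task names differ across harness versions; adjust to the
# version you have installed.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",                                  # Hugging Face causal-LM backend
    model_args="pretrained=Intel/neural-chat-7b-v1-1",  # model under test
    tasks=["arc_challenge", "hellaswag", "truthfulqa_mc"],
    num_fewshot=0,  # the leaderboard uses task-specific few-shot counts
)
print(results["results"])
```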

We then evaluated the benefits of DPO on our fine-tuned NeuralChat-7B and on a Llama-2-7B model fine-tuned by a third party (Llama2-7B-3P for short), using the open_llm_leaderboard metrics. The following table shows that DPO significantly improves model performance, and it can even push the limits of the top-ranking models on the open_llm_leaderboard.

Inference and Deployment

Intel Extension for Transformers offers INT4 inference for LLMs by extending the Hugging Face Transformers API. Use the sample code below with the model name “Intel/neural-chat-7b-v1-1” to perform INT4 inference on Intel platforms.
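The snippet below follows the usage pattern documented for the extension; exact argument names can differ between releases, so check the version you have installed.

```python
# INT4 (weight-only 4-bit) inference following the pattern documented for
# Intel Extension for Transformers; treat argument names as release-dependent.
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "Intel/neural-chat-7b-v1-1"
prompt = "Once upon a time, there existed a little girl,"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

# load_in_4bit triggers INT4 weight-only quantization at load time
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=128)
```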

You can also refer to this notebook to learn how to deploy the chatbot on a Hugging Face Space. Below is a snapshot of the deployed chatbot running on 4th Gen Intel Xeon Scalable processors with Intel AMX.

Ethics Statement

We released the code, model, and dataset to the open-source community to advance LLM and chatbot development. Although we strive to address the risks of hallucination, toxicity, and other potential ethics issues during fine-tuning, NeuralChat-7B, like other LLMs, is not free from such issues. We also carefully performed low-precision quantization for inference acceleration to ensure the quantized model performs similarly to the baseline. We hope to collaborate with the community to address these issues and make AI beneficial for everyone.

Summary

We are excited to release NeuralChat to the open-source community, together with the dataset, model, and code.

We hope that NeuralChat will empower the creation of high-performance chatbots and deliver AI for the benefit of everyone. We encourage you to try NeuralChat and we look forward to hearing your feedback and suggestions.
