NeuralChat 7B: Intel’s Chat Model Trained with DPO
Distilled DPO by Intel
Nov 22, 2023
Intel’s new chat model, NeuralChat 7B, is now at the top of the Open LLM Leaderboard (among the 7B models).
To achieve this performance, Intel used a strategy similar to what Hugging Face did to train Zephyr 7B:
- Supervised fine-tuning (SFT) of Mistral 7B on a dataset generated by other LLMs
- DPO training, using the model trained with SFT as a reference model, on a dataset also generated by other LLMs
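The second step optimizes the DPO objective: push the policy to assign a higher relative log-probability than the SFT reference model to the “chosen” answer over the “rejected” one. A minimal sketch of that loss for a single preference pair, assuming the summed token log-probabilities have already been computed (the argument names and the default β are illustrative, not Intel’s actual hyperparameters):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected completion under the policy or the frozen SFT reference.
    """
    # Implicit reward of each completion: how much the policy favors it
    # relative to the reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) written as log(1 + exp(-x))
    return math.log1p(math.exp(-logits))
```

When the policy still equals the reference (all margins zero), the loss is log 2; it shrinks as the policy learns to prefer the chosen answer. In practice this is computed batched over tokens by a library such as TRL rather than by hand.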
The main differences from Zephyr 7B lie in the hyperparameters and the datasets. For the SFT and DPO steps, Intel used the following datasets:
- Open-Orca/SlimOrca
- Intel/orca_dpo_pairs: In this dataset, Intel systematically used ChatGPT’s output as the “chosen” answer and Llama 2 13b’s output as the “rejected” answer. In other words, they assumed that ChatGPT is always better than Llama 2 13b, which is not always the case. While I would expect this strategy to introduce some noise into the DPO training data (some of Llama 2 13b’s outputs are surely better than ChatGPT’s), it seems to have worked very well given NeuralChat’s performance.
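Building such a dataset requires no human annotation: each record is mapped to the prompt/chosen/rejected triple that DPO trainers expect, with one model’s answer hardcoded as preferred. A sketch of that mapping, with hypothetical field names (the actual column names in Intel/orca_dpo_pairs may differ):

```python
def to_dpo_pair(record):
    """Map one raw record to the {prompt, chosen, rejected} format
    expected by DPO trainers. Field names are illustrative."""
    return {
        "prompt": record["question"],
        # ChatGPT's answer is always assumed to be the better one...
        "chosen": record["chatgpt_answer"],
        # ...and Llama 2 13b's is always assumed to be the worse one.
        "rejected": record["llama2_13b_answer"],
    }

example = {
    "question": "What is DPO?",
    "chatgpt_answer": "Direct Preference Optimization is ...",
    "llama2_13b_answer": "DPO stands for ...",
}
pair = to_dpo_pair(example)
```

The noise the paragraph above mentions comes precisely from this hardcoding: whenever Llama 2 13b actually gave the better answer, the pair is mislabeled.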
If you want to train a similar model using these datasets, have a look at my tutorial and recipe for…