NeuralChat 7B: Intel’s Chat Model Trained with DPO
Distilled DPO by Intel
Nov 22, 2023
Intel’s new chat model, NeuralChat 7B, is now at the top of the Open LLM Leaderboard (among the 7B models).
To achieve this performance, Intel used a strategy similar to what Hugging Face did to train Zephyr 7B:
- Supervised fine-tuning (SFT) of Mistral 7B on a dataset generated by other LLMs
- DPO training, using the model trained with SFT as a reference model, on a dataset also generated by other LLMs
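The second step optimizes the DPO objective: push the policy to assign a higher relative log-probability than the SFT reference model to the “chosen” answer over the “rejected” one. A minimal sketch of that loss for a single preference pair, assuming the summed token log-probabilities have already been computed (the argument names and the default β are illustrative, not Intel’s actual hyperparameters):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected completion under the policy or the frozen SFT reference.
    """
    # Implicit reward of each completion: how much the policy favors it
    # relative to the reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) written as log(1 + exp(-x))
    return math.log1p(math.exp(-logits))
```

When the policy still equals the reference (all margins zero), the loss is log 2; it shrinks as the policy learns to prefer the chosen answer. In practice this is computed batched over tokens by a library such as TRL rather than by hand.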
The main differences from Zephyr 7B lie in the hyperparameters and the datasets. For the SFT and DPO steps, Intel used the following datasets:
- Open-Orca/SlimOrca
- Intel/orca_dpo_pairs: In this dataset, Intel systematically used ChatGPT’s output as the “chosen” answer and Llama 2 13b’s output as the “rejected” answer. In other words, they assumed that ChatGPT is always better than Llama 2 13b, which is not always the case. While I would expect this strategy to introduce some noise into the DPO training data (some of Llama 2 13b’s outputs are surely better than ChatGPT’s), it seems to have worked very well given NeuralChat’s performance.
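Building such a dataset requires no human annotation: each record is mapped to the prompt/chosen/rejected triple that DPO trainers expect, with one model’s answer hardcoded as preferred. A sketch of that mapping, with hypothetical field names (the actual column names in Intel/orca_dpo_pairs may differ):

```python
def to_dpo_pair(record):
    """Map one raw record to the {prompt, chosen, rejected} format
    expected by DPO trainers. Field names are illustrative."""
    return {
        "prompt": record["question"],
        # ChatGPT's answer is always assumed to be the better one...
        "chosen": record["chatgpt_answer"],
        # ...and Llama 2 13b's is always assumed to be the worse one.
        "rejected": record["llama2_13b_answer"],
    }

example = {
    "question": "What is DPO?",
    "chatgpt_answer": "Direct Preference Optimization is ...",
    "llama2_13b_answer": "DPO stands for ...",
}
pair = to_dpo_pair(example)
```

The noise the paragraph above mentions comes precisely from this hardcoding: whenever Llama 2 13b actually gave the better answer, the pair is mislabeled.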
If you want to train a similar model using these datasets, have a look at my tutorial and recipe for…