NeuralChat 7B: Intel’s Chat Model Trained with DPO

Distilled DPO by Intel

Benjamin Marie
2 min read · Nov 22, 2023

The new chat model released by Intel, NeuralChat 7B, is now at the top of the Open LLM Leaderboard (among 7B models).

To achieve this performance, Intel used a strategy similar to what Hugging Face did to train Zephyr 7B:

  1. Supervised fine-tuning (SFT) of Mistral 7B on a dataset generated by other LLMs
  2. DPO training, using the model trained with SFT as a reference model, on a dataset also generated by other LLMs
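
Below is a minimal sketch of this two-step recipe with Hugging Face TRL. It is not Intel's actual training code: the hyperparameters are placeholders, flattening SlimOrca's conversations into plain text is a simplification rather than a real chat template, and the exact SFTTrainer/DPOTrainer argument names vary across TRL versions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer, DPOTrainer

base_model = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# Step 1: supervised fine-tuning on Open-Orca/SlimOrca.
# SlimOrca stores ShareGPT-style "conversations"; joining the turns into one
# plain string per example is a simplification, not a real chat template.
def to_text(example):
    return {"text": "\n".join(turn["value"] for turn in example["conversations"])}

sft_dataset = load_dataset("Open-Orca/SlimOrca", split="train").map(to_text)

sft_trainer = SFTTrainer(
    model=AutoModelForCausalLM.from_pretrained(base_model),
    args=TrainingArguments(output_dir="sft", per_device_train_batch_size=4),
    train_dataset=sft_dataset,
    dataset_text_field="text",
    max_seq_length=1024,
    tokenizer=tokenizer,
)
sft_trainer.train()
sft_trainer.save_model("sft")

# Step 2: DPO, with a frozen copy of the SFT model as the reference model.
dpo_trainer = DPOTrainer(
    AutoModelForCausalLM.from_pretrained("sft"),  # policy to train
    AutoModelForCausalLM.from_pretrained("sft"),  # reference model (kept frozen)
    args=TrainingArguments(output_dir="dpo", per_device_train_batch_size=2),
    beta=0.1,  # placeholder value, not Intel's reported hyperparameter
    train_dataset=dpo_dataset,  # preference pairs; see the formatting snippet below
    tokenizer=tokenizer,
)
dpo_trainer.train()
```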

The main differences from Zephyr 7B are in the hyperparameters and the datasets. For steps 1 and 2, Intel used the following datasets:

  • Open-Orca/SlimOrca (step 1, SFT)
  • Intel/orca_dpo_pairs (step 2, DPO): In this dataset, Intel systematically used ChatGPT's output as the “chosen” response and Llama 2 13B's output as the “rejected” response. In other words, they assumed that ChatGPT is always better than Llama 2 13B, which is not necessarily the case. While I would expect this strategy to introduce some noise into the DPO training data (some of Llama 2 13B's outputs are surely better than ChatGPT's), it seems to have worked very well given NeuralChat's performance.
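
To feed Intel/orca_dpo_pairs to DPOTrainer, the pairs have to be mapped to the “prompt”/“chosen”/“rejected” columns it expects. The sketch below assumes the dataset's columns are named system, question, chosen, and rejected, and simply concatenates the system prompt and the question with a newline; this formatting is my own choice, not Intel's documented preprocessing.

```python
from datasets import load_dataset

# Intel/orca_dpo_pairs: "chosen" holds ChatGPT's answer, "rejected" Llama 2 13B's.
pairs = load_dataset("Intel/orca_dpo_pairs", split="train")

def to_dpo_format(example):
    # Prepend the system prompt to the question to build a single prompt string.
    prompt = (example["system"] + "\n" + example["question"]).strip()
    return {
        "prompt": prompt,
        "chosen": example["chosen"],
        "rejected": example["rejected"],
    }

dpo_dataset = pairs.map(to_dpo_format, remove_columns=pairs.column_names)
```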

If you want to train a similar model using these datasets, have a look at my tutorial and recipe for…
