The trilogy of Zefiro

Alessandro Ercolani
3 min read · Feb 21, 2024


The trilogy of Zefiro is a set of three open-source LLMs for the Italian language, built by following the Zephyr recipes from the Alignment Handbook by HuggingFace.

Open-source LLMs

  • Zefiro base: a continually pre-trained LLM for the Italian language based on Mistral. I used a random subset of data from oscar_it and wikipedia_it. Unfortunately I lost the code. The story is also funny: I left a runpod cluster of H100s running for more days than I planned and paid a very big bill. So once I got the model, I saved it on huggingface and stopped the cluster, losing some horrible code and the data. But I love the fact that now Zefiro base is a sort of alien: I don't know where it comes from.
  • Zefiro SFT: a supervised fine-tuned version of Zefiro base, built with the recipe from the Alignment Handbook using the QLoRA code from HuggingFace (see the sketch after this list). The dataset I used is ultrafeedback-ita, an English-to-Italian translation of the popular UltraChat dataset made with different translation tools. Zefiro SFT can be reproduced by adapting the recipe to the dataset; mainly I only changed the starting model and some parameters.
  • Zefiro DPO: an aligned version of Zefiro SFT; the code can be seen in this colab. It uses as its dataset the UltraFeedback preference dataset, translated English to Italian using argostranslate. I'm quite proud of the final result.
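For reference, here is a minimal sketch of the kind of QLoRA setup those recipes use, assuming a recent transformers/peft stack; the base model id and the LoRA hyperparameters below are illustrative, not the exact values from the recipe:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Assumed model id for illustration; the actual Zefiro checkpoints
# live under the giux78 namespace on the Hugging Face Hub.
base_id = "giux78/zefiro-7b-base-ITA"

# Load the base model quantized to 4-bit (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)

# Attach trainable LoRA adapters on the attention projections;
# rank and alpha are common Mistral-style defaults, not the
# exact recipe values.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights train

From here the model can be passed to a standard supervised fine-tuning loop on the translated dataset, and later reused in the same adapter setup for DPO training.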

Evaluation

For the evaluation we used lm-evaluation-harness from EleutherAI, which provides many pre-configured tasks. Since some tasks were missing for Italian, we contributed a series of PRs, mainly for the multilingual m_mmlu and arc_c tasks. We are very proud of this contribution to an open-source project. Below is the list of tasks supported for Italian and other languages:

  • xcopa_it
  • hellaswag_it
  • lambada_openai_mt_it
  • belebele_ita_Latn
  • arc_it
  • m_mmlu_it

The command can be launched as:

lm-eval --model hf --model_args pretrained=giux78/zefiro-7b-dpo-qlora-ITA-v0.7 --tasks xcopa_it,hellaswag_it,lambada_openai_mt_it,belebele_ita_Latn,arc_it,m_mmlu_it --device cuda:0 --batch_size 8
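The same evaluation can also be run from Python through the harness's simple_evaluate entry point (a minimal sketch, assuming lm-evaluation-harness v0.4+):

import lm_eval

# Programmatic equivalent of the CLI call above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=giux78/zefiro-7b-dpo-qlora-ITA-v0.7",
    tasks=["xcopa_it", "hellaswag_it", "lambada_openai_mt_it",
           "belebele_ita_Latn", "arc_it", "m_mmlu_it"],
    device="cuda:0",
    batch_size=8,
)
print(results["results"])  # per-task metrics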

In the next iteration we will also evaluate on an Italian version of MT-Bench, which we are working on.

We are also maintaining the Italian leaderboard.

Evaluation Results

To evaluate the Zefiro model evolutions we mainly used three tasks, arc_c, hellaswag and m_mmlu, because they are also used for the Italian language in the Mixtral paper.

As the table above shows, the Zefiro series is not far from the best 70-billion-parameter open-source models on Italian tasks: as you can see in the average column, it is better or very close on some tasks. I'm sure that with the help of the community we will be able to create an even better small model for the Italian language. If you want to help, join the mii-community on huggingface or our discord.
