TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.


FanFabler: Fine-Tuning Llama 3 to Be a Multilingual Fanfic Writing Assistant

How I used a custom training dataset and information retrieval for global storytelling. 好样的! Bravo! वाह! ¡Guau! 브라보!

Robert A. Gonsalves
Published in TDS Archive
21 min read · May 7, 2024


FanFabler: A Multilingual Fanfic Writing Assistant, Image created using an AI image creation program, DALL-E 3, edited by the Author

The rise of Large Language Models (LLMs) has ushered in a new era of text-based AI systems. Although these models are highly capable, their training focuses predominantly on English. The largest commercial LLMs generate text well in “low-resource” languages, while the smaller open-source models fare poorly with non-European languages.

However, Meta trained the new Llama 3 model on a wider variety of languages, as the company announced in a post when the model was released [1].

To train the best language model, the curation of a large, high-quality training dataset is paramount. In line with our design principles, we invested heavily in pretraining data. … To prepare for upcoming multilingual use cases, over 5% of the Llama 3 pretraining dataset consists of high-quality non-English data that covers over 30 languages. However, we do not expect the same level of performance in these languages as in English. — Meta

Five percent doesn’t sound like much, but it’s more than in previous versions of Llama [2] and in other small LLMs like Mistral [3]…


Written by Robert A. Gonsalves

Robert A. Gonsalves is an artist, inventor, and engineer who writes about the creative uses of AI. Ask questions at https://chat.openai.com/g/g-b1kqByRsT-robgonbot
