Fine-tune an LLM on your personal data: create a “The Lord of the Rings” storyteller

Jeremy Arancio
12 min read · May 23, 2023

OpenAI launched the most significant AI revolution with the release of ChatGPT. Everybody was amazed by the possibilities provided by this generative AI.

Organizations started to use this technology to accelerate their work and increase the value they bring to their customers: chatbots, writing assistants, task automation, etc.

However, using OpenAI models comes with a price not all organizations are ready to pay: the lack of data privacy. Indeed, the text provided by users can be used to improve the generative model.

The recent leak of confidential Samsung information drew attention to this major issue.

At the same time, with the success of this AI, we witnessed the emergence of open-source Large Language Models (LLMs), initiated by Meta with LLaMA: Vicuna, Alpaca, GPT4All, …

If you’re interested in the topic, check out my article where I introduce these models, which you can run on your laptop!

However, even though LLaMA’s weights leaked after its release, allowing anybody to use the pre-trained version (which cost around $5M to train), it’s important to remember that any


Jeremy Arancio

NLP Engineer & AI-ndependant - I help companies leverage text using Machine Learning! - Website: https://linktr.ee/jeremyarancio