Company-Specific AI: Getting Started with Fine-Tuning Large Language Models

Sai Kumar Devulapelli
Odicis-Data-Engineering
7 min read · Sep 4, 2023

In the dynamic landscape of Natural Language Processing (NLP), we are currently experiencing nothing short of a renaissance. Spearheaded by breakthroughs in deep learning and the inception of transformative models, the capability of large language models to mirror human-like text comprehension and generation has reached unprecedented heights. Today, we at Odicis cast the spotlight on a remarkable protagonist of this story: the Llama2 model.

Before we delve into the technical aspects, let’s shed light on some fundamental questions and the vibrant ecosystem that is paving the way for a new era in language modeling.

Will open-source AI models like LLaMA 2 radically transform the way companies operate?

The Boom of Open-Source Large Language Models

Embracing open-source Large Language Models (LLMs) like Llama2 can be a game-changer for organizations. Firstly, self-hosting the model alleviates data security concerns, as sensitive information remains confined within the company’s premises.

Furthermore, companies can fine-tune the model on their internal documents, moulding it to understand and cater to their specific use cases, thus unlocking a treasure trove of opportunities, from enhancing customer service to automating content creation, and much more.

Unveiling the Llama2 Model

Llama2 is one of the newest members of the large language model family. It is known for its ability to understand and generate text in a manner remarkably close to human comprehension and articulation. What sets it apart is not just its size, but also the advanced training techniques and optimizations that make it a powerhouse for NLP tasks. Fine-tuning this model for specific tasks can unlock a plethora of applications, from content creation to information retrieval, and much more.

Meta’s Game-Changer: The Future of Personalized AI Chatbots

On July 18, 2023, the tech giant Meta made waves in the technology world with the unveiling of LLaMA 2, a shiny new version of their Large Language Model (LLM). I can’t help but feel that this is a huge step forward in changing the landscape of AI technology. It seems like in the near future, we’ll see more personalized, fine-tuned chatbots taking the center stage, instead of the general chatbots we are accustomed to.

During the first half of 2023, Meta rolled up their sleeves and got to work, shaping LLaMA 2 with a training regimen on a staggering 2 trillion tokens. This impressive process has led to superior performance, making it a star player in various tests including reasoning and coding proficiency.

To ensure everyone finds something that suits their needs, Meta is offering LLaMA 2 in three different sizes: 7B, 13B, and a massive 70B version. The best part? These models are available for everyone to use, whether for business or research, and all in English!

Now, armed with an understanding of the Llama2 model, let’s jump into the fine-tuning process step by step.

Delving Deeper into Fine-Tuning

In the ever-evolving world of language models, a universal solution is elusive. Different applications such as content generation, sentiment analysis, and others have unique nuances and requirements.

Fine-tuning emerges as a pivotal process here, tailoring the extensive knowledge of the Llama2 model to cater to specific tasks, optimizing its efficiency and output dynamically. Essentially, it’s like honing a specialized skill, aligning more closely with your project’s objectives and enhancing performance markedly.
We wrote an article that provides guidance on whether you should use fine-tuning or prompt engineering.

Why Hugging Face?

Opting for the Hugging Face platform is a tactical move grounded in convenience and collaborative innovation. Known for its extensive repository of pre-trained models and datasets, it has become a nurturing ground for machine learning enthusiasts and professionals to work together and forge new paths. Its user-centric interface and community-driven ethos make it a favored destination for fostering and deploying language models, offering a seamless, integrated experience from inception to deployment.

Diving into the Dataset: A Real-World Example

Before we dive deep into the intricacies of fine-tuning the Llama2 model, let’s familiarize ourselves with the kind of dataset that we would be using in this tutorial. We will be using the “timdettmers/openassistant-guanaco” dataset as a sandbox to explore and understand the processes involved. This dataset is an excellent starting point for beginners to get a taste of working with real data without getting overwhelmed.

About the Dataset

For this tutorial, we have chosen the “timdettmers/openassistant-guanaco” dataset, which serves as a playground to illustrate the fine-tuning process vividly. As you venture into actual project scenarios, you would ideally choose a dataset that aligns seamlessly with the objectives of your project, thereby enabling the model to learn and adapt to your specific use case more efficiently.

Before we delve further, let’s examine a snippet from our dataset to give you a glimpse of what lies ahead:

Example Entry:

### Human: 
Can you write a short introduction about the relevance of the term "monopsony" in economics? Please use examples related to potential monopsonies in the labour market and cite relevant research.
### Assistant: 
"Monopsony" refers to a market structure where there is only one buyer for a particular good or service. In economics, this term has a significant role, particularly in the labour market, where a monopsony employer has considerable sway over wages and working conditions. A monopsony can potentially lead to lower wages and fewer employment opportunities, as there's little incentive for the employer to offer better conditions or higher pay.
Recent studies have pinpointed potential monopsonies in sectors like retail and fast food, where a handful of large corporations dominate a significant market share (Bivens & Mishel, 2013). Here, workers often encounter low wages, scarce benefits, and diminished bargaining power, creating a dependency on the employers for their livelihoods. This dependency can spiral into further wage suppression and deteriorating working conditions.
Understanding the concept of monopsony is crucial in grasping the dynamics of labour markets and comprehending the market power's impact on workers. Continuous research is essential to gauge the extent of monopsonies in the economy and to forge policies that can address these issues effectively.

References:
Bivens, J., & Mishel, L. (2013). The Pay of Corporate Executives and Financial Professionals as Evidence of Rents in Top 1 Percent Incomes. Journal of Economic Perspectives, 27(3), 57-78.

### Human:
Now explain it to a dog.

This dataset comprises an extensive collection of dialogues, roughly 9,850 entries in total. Each entry is a rich resource of information and serves as an excellent base for fine-tuning our model towards a more natural and informed response system.
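Before moving on, it can help to peek at a few raw entries yourself. The snippet below is a minimal sketch using the datasets library we install later in this tutorial; it assumes the dataset exposes a single "text" column, which is the case for timdettmers/openassistant-guanaco.

# Minimal sketch: load the dataset and inspect the first entry
from datasets import load_dataset

preview = load_dataset("timdettmers/openassistant-guanaco", split="train")
print(preview)                    # number of rows and column names
print(preview[0]["text"][:500])   # first 500 characters of the first dialogue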

Now, with the data in hand, let's venture into the Llama2 model itself and walk through the fine-tuning workflow that opens up these possibilities.

Step 1: Setting Up the Environment

Before we dive into the coding part, let’s set up our environment with all the necessary dependencies. Execute the following commands to install them.

# Installing the dependencies
pip install accelerate==0.21.0
pip install peft==0.4.0
pip install bitsandbytes==0.40.2
pip install transformers==4.31.0
pip install trl==0.4.7

These libraries streamline the fine-tuning process and let us leverage advanced features such as 4-bit quantization and LoRA adapters.

Step 2: Logging into Hugging Face

Next, we will log in to the Hugging Face platform, a hub hosting a plethora of pre-trained models and datasets crucial for our task.

# Log in to Hugging Face with your credentials
from huggingface_hub import notebook_login
notebook_login()

This command will prompt you for an access token, granting access to various resources from the Hugging Face Hub. Note that the Llama2 weights are gated, so you will need to have accepted Meta’s license on the model page before you can download them.

Step 3: Loading Necessary Libraries and the Dataset

Once logged in, it’s time to load the necessary libraries and the dataset using the load_dataset method.

# Importing the required libraries
import torch
from trl import SFTTrainer
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments

# Loading the dataset
data = load_dataset("timdettmers/openassistant-guanaco", split="train")

Step 4: Initializing the Tokenizer

Our next step is to initialize the tokenizer which will convert our textual data into a format that the model can understand.

# Initializing the tokenizer for the pretrained model
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.eos_token  # Llama2 has no pad token, so reuse EOS for padding
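To see what the tokenizer actually produces, you can run a quick check like the sketch below; the prompt string is just an illustrative example.

# Illustrative only: encode a sample prompt and decode it back
sample = "### Human: What is fine-tuning?### Assistant:"
encoded = tokenizer(sample, return_tensors="pt")

print(encoded["input_ids"].shape)                 # (1, number_of_tokens)
print(tokenizer.decode(encoded["input_ids"][0]))  # round-trip back to text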

Step 5: Configuring the Model with BitsAndBytes and Lora

We then configure quantization with BitsAndBytes, which loads the model weights in 4-bit precision to cut memory usage, and LoRA, which trains only a small set of adapter weights instead of the full model.

# Configurations of the model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                  # load the base model in 4-bit precision
    bnb_4bit_quant_type="nf4",          # NormalFloat4 quantization
    bnb_4bit_compute_dtype="float16",   # compute in float16 for speed
    bnb_4bit_use_double_quant=True      # also quantize the quantization constants
)

peft_config = LoraConfig(
    r=64,                  # rank of the LoRA update matrices
    lora_alpha=32,         # scaling factor for the LoRA updates
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

Step 6: Loading and Configuring the Model

Next, we load and configure our model with various parameters for an optimized training process.

# Loading the model and configuring its params
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map={"": 0}                 # place the whole model on GPU 0
)
model.config.use_cache = False         # disable the KV cache during training
model.config.pretraining_tp = 1
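As an optional sanity check, you can confirm how much memory the 4-bit model actually occupies. This step is purely illustrative and safe to skip.

# Optional: approximate memory footprint of the 4-bit model, in GB
print(f"Model memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")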

Step 7: Setting Up Training Arguments

In this step, we set up the training arguments to dictate the training dynamics.

# Training arguments
training_arguments = TrainingArguments(
    output_dir="llama2-test-sample",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,     # effective batch size of 32
    optim="paged_adamw_8bit",          # memory-efficient 8-bit optimizer
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    save_strategy="epoch",
    logging_steps=10,
    num_train_epochs=1,
    max_steps=20,                      # short demo run; remove for a full epoch
    fp16=True,
    push_to_hub=True
)

Step 8: Initializing and Running the Trainer

Finally, we initialize and run the trainer with the model, dataset, and training arguments.

# Starting the training with the above params
trainer = SFTTrainer(
    model=model,
    train_dataset=data,
    peft_config=peft_config,
    args=training_arguments,
    tokenizer=tokenizer,
    packing=True,                  # pack multiple short samples into one sequence
    dataset_text_field="text",     # the column holding the dialogues in this dataset
    max_seq_length=512
)

# Training the model and pushing it to the Hub
trainer.train()
trainer.push_to_hub()
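Once training completes, you can try out the fine-tuned adapter. The sketch below is a minimal example, assuming the adapter was pushed to the Hub under the hypothetical repo id your-username/llama2-test-sample; replace it with the repo that push_to_hub created for your account.

# Minimal inference sketch, assuming the LoRA adapter was pushed to the Hub
# under "your-username/llama2-test-sample" (replace with your own repo id).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
    device_map={"": 0},
)
model = PeftModel.from_pretrained(base, "your-username/llama2-test-sample")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

prompt = "### Human: Explain fine-tuning in one sentence.### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))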

Conclusion

By following this guide, you have successfully stepped into the fascinating world of large language models, specifically by fine-tuning the Llama2 model, a beacon of the new era in language modeling. As we stand at the threshold of unprecedented advancements in NLP, harnessing the power of models like Llama2 opens up new avenues for innovation and discovery.

Feel free to experiment with the script, tweaking parameters and exploring the vast potential that fine-tuning Llama2 brings to your NLP projects. Happy coding from Odicis, and here’s to ushering in a new era of language modeling!

Thank you for being a part of our community! Before you go:

  • Be sure to clap and follow the writer! 👏
  • You can find even more content at Odicis.com 🚀
  • Follow us on LinkedIn.
