Fine-Tuning Llama 2 with Amazon SageMaker JumpStart: A Step-by-Step Guide

Tony Esposito
4 min read · Dec 22, 2023

Introduction:

If you’re passionate about machine learning and generative AI, you’re in for a treat. In this tutorial, we’ll dive into fine-tuning Meta’s Llama 2 language model with Amazon SageMaker JumpStart. Whether you’re a data engineer, a machine learning enthusiast, or just someone looking to explore cutting-edge AI, you’ll find this guide insightful.

Prerequisites:
Before we start, make sure you have an AWS account and some familiarity with SageMaker. You should also have the necessary data ready for fine-tuning. I ran this notebook in us-west-2 on a SageMaker Studio notebook using the Data Science 2.0 image.

Setting Up the Environment:
In this section, we’ll import the required libraries, set up our SageMaker session, and define constants such as the S3 bucket for data storage.

!pip install --upgrade sagemaker datasets
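Below is a minimal sketch of that setup, assuming you are running inside SageMaker Studio; the role, region, and default S3 bucket come straight from the session.

import sagemaker

# SageMaker session; the execution role and region are picked up from the
# Studio/notebook environment.
sagemaker_session = sagemaker.Session()
aws_role = sagemaker.get_execution_role()
aws_region = sagemaker_session.boto_region_name

# Default S3 bucket, used later for the training data and prompt template.
output_bucket = sagemaker_session.default_bucket()
print(f"Role: {aws_role}\nRegion: {aws_region}\nBucket: {output_bucket}")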

Deploy Pre-trained Model

First, we will deploy the Llama 2 7B model as a SageMaker endpoint. To train/deploy the 13B or 70B models instead, change model_id to "meta-textgeneration-llama-2-13b" or "meta-textgeneration-llama-2-70b" respectively.

from sagemaker.jumpstart.model import JumpStartModel

model_id, model_version = "meta-textgeneration-llama-2-7b", "2.*"

pretrained_model = JumpStartModel(model_id=model_id, model_version=model_version)
pretrained_predictor = pretrained_model.deploy()
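By default, deploy() picks a suitable GPU instance for the 7B model. As a rough sketch (keyword support depends on your SageMaker SDK version, and the instance type here is an assumption), you can also set the instance type yourself and accept the Llama 2 EULA at deployment time:

# Optional variation: choose the instance type explicitly and accept the EULA
# when deploying (accept_eula is supported in recent SageMaker SDK versions).
pretrained_predictor = pretrained_model.deploy(
    instance_type="ml.g5.2xlarge",  # assumption: a GPU instance that fits the 7B model
    accept_eula=True,
)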

Invoke the endpoint

Next, we invoke the endpoint with some sample queries. Later in this notebook, we will fine-tune this model with a custom dataset and carry out inference using the fine-tuned model. We will also show a comparison between the results obtained from the pre-trained and the fine-tuned models.

def print_response(payload, response):
    print(payload["inputs"])
    print(f"> {response[0]['generation']}")
    print("\n==================================\n")


payload = {
    "inputs": "I believe the meaning of life is",
    "parameters": {
        "max_new_tokens": 64,
        "top_p": 0.9,
        "temperature": 0.6,
        "return_full_text": False,
    },
}
try:
    response = pretrained_predictor.predict(payload, custom_attributes="accept_eula=true")
    print_response(payload, response)
except Exception as e:
    print(e)

Dataset preparation for fine-tuning

You can fine-tune the model on a dataset in either domain-adaptation format or instruction-tuning format. In this demo, we will use a subset of the Dolly dataset in instruction-tuning format. The Dolly dataset contains roughly 15,000 instruction-following records across categories such as question answering, summarization, and information extraction, and is available under the Apache 2.0 license. We will select the summarization examples for fine-tuning.

Training data is formatted in JSON lines (.jsonl) format, where each line is a dictionary representing a single data sample. All training data must be in a single folder, although it can be split across multiple .jsonl files. The training folder can also contain a template.json file describing the input and output formats.
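As a concrete illustration of this format (the text in the record below is made up, but the field names match the Dolly columns used in the following code), each line of train.jsonl is a single JSON object:

import json

# Illustrative placeholder record; real rows come from the filtered Dolly dataset.
example = {
    "instruction": "Summarize the following paragraph.",
    "context": "Amazon SageMaker JumpStart provides pre-trained models that can be fine-tuned and deployed ...",
    "response": "JumpStart offers pre-trained models you can fine-tune and deploy.",
}
print(json.dumps(example))  # one such object per line in train.jsonl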

from datasets import load_dataset

dolly_dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

# To train for question answering or information extraction instead, change the filter in the
# next line to example["category"] == "closed_qa" or "information_extraction".
summarization_dataset = dolly_dataset.filter(lambda example: example["category"] == "summarization")
summarization_dataset = summarization_dataset.remove_columns("category")

# We split the dataset into two where test data is used to evaluate at the end.
train_and_test_dataset = summarization_dataset.train_test_split(test_size=0.1)

# Dumping the training data to a local file to be used for training.
train_and_test_dataset["train"].to_json("train.jsonl")
train_and_test_dataset["train"][0]

Next, we create a prompt template for formatting the data as instruction/input pairs for the training job (since we are instruction fine-tuning the model in this example), and for querying the deployed endpoint at inference time.

import json

template = {
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": " {response}",
}
with open("template.json", "w") as f:
    json.dump(template, f)
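To see what the model is actually trained on, the short sketch below fills the template placeholders with the first training record and prints the resulting prompt/completion text; this is roughly what the training job does with each line of train.jsonl.

# Preview the final training text for one record: prompt followed by completion.
sample = train_and_test_dataset["train"][0]
prompt_text = template["prompt"].format(
    instruction=sample["instruction"], context=sample["context"]
)
completion_text = template["completion"].format(response=sample["response"])
print(prompt_text + completion_text)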

Upload dataset to S3

We will upload the prepared dataset to S3, where it will be used for fine-tuning.

import sagemaker
from sagemaker.s3 import S3Uploader

output_bucket = sagemaker.Session().default_bucket()
local_data_file = "train.jsonl"
train_data_location = f"s3://{output_bucket}/dolly_dataset"
S3Uploader.upload(local_data_file, train_data_location)
S3Uploader.upload("template.json", train_data_location)
print(f"Training data: {train_data_location}")

Train the model

Next, we fine-tune the Llama 2 7B model on the summarization subset of the Dolly dataset. The fine-tuning scripts are based on the scripts provided by this repo.

from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id=model_id,
    model_version=model_version,
    environment={"accept_eula": "true"},
    disable_output_compression=True,  # For Llama-2-70b, add instance_type = "ml.g5.48xlarge"
)
# By default, instruction tuning is set to false, so we enable it to use the instruction-tuning dataset.
estimator.set_hyperparameters(instruction_tuned="True", epoch="5", max_input_length="1024")
estimator.fit({"training": train_data_location})

Deploy the fine-tuned model

Next, we deploy the fine-tuned model and compare its performance with that of the pre-trained model.

finetuned_predictor = estimator.deploy()

Evaluate the pre-trained and fine-tuned model

Next, we use the test data to evaluate the performance of the fine-tuned model and compare it with the pre-trained model.

import pandas as pd
from IPython.display import display, HTML

test_dataset = train_and_test_dataset["test"]

inputs, ground_truth_responses, responses_before_finetuning, responses_after_finetuning = (
    [],
    [],
    [],
    [],
)


def predict_and_print(datapoint):
    # For instruction fine-tuning, we insert a special key between input and output
    input_output_demarkation_key = "\n\n### Response:\n"

    payload = {
        "inputs": template["prompt"].format(
            instruction=datapoint["instruction"], context=datapoint["context"]
        )
        + input_output_demarkation_key,
        "parameters": {"max_new_tokens": 100},
    }
    inputs.append(payload["inputs"])
    ground_truth_responses.append(datapoint["response"])
    # Please change the following line to "accept_eula=true"
    pretrained_response = pretrained_predictor.predict(
        payload, custom_attributes="accept_eula=false"
    )
    responses_before_finetuning.append(pretrained_response[0]["generation"])
    # Please change the following line to "accept_eula=true"
    finetuned_response = finetuned_predictor.predict(payload, custom_attributes="accept_eula=false")
    responses_after_finetuning.append(finetuned_response[0]["generation"])


try:
    for i, datapoint in enumerate(test_dataset.select(range(5))):
        predict_and_print(datapoint)

    df = pd.DataFrame(
        {
            "Inputs": inputs,
            "Ground Truth": ground_truth_responses,
            "Response from non-finetuned model": responses_before_finetuning,
            "Response from fine-tuned model": responses_after_finetuning,
        }
    )
    display(HTML(df.to_html()))
except Exception as e:
    print(e)
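Once you have finished comparing the two models, remember to delete the endpoints so you are not billed for idle GPU instances. A minimal cleanup sketch:

# Clean up: delete both models and endpoints to stop incurring charges.
pretrained_predictor.delete_model()
pretrained_predictor.delete_endpoint()
finetuned_predictor.delete_model()
finetuned_predictor.delete_endpoint()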

Conclusion

Fine-tuning language models is an exciting and challenging endeavor, and with SageMaker JumpStart and Llama 2 you have a powerful toolkit at your disposal. We hope this step-by-step guide helps you on your journey to building generative AI applications. Feel free to adapt this tutorial to your needs and explore what is possible.

That concludes our guide to fine-tuning Llama 2 with Amazon SageMaker JumpStart. We hope you found this tutorial informative and inspiring. Now it’s your turn to unleash your creativity and build amazing AI applications!
Happy Holidays!!!


Tony Esposito

Generative AI SME, Conversational AI systems specialist