Turn Boring Headlines into Satirical Gems: Fine-Tuning Llama2 for Witty News Titles

Incorporating Sarcasm into Llama2 using generated data: Watch Out, The Onion!

SriRam Govardhanam
14 min read · Sep 10, 2023

Fine-tuning LLMs has never been more accessible. This new wave of using LLMs for specific use cases, such as adding certain behaviors or transferring specific knowledge, is both exciting and daunting. Let's ride this wave by working on some projects and exploring the potential of AI-woven products.

Objective

We can fine-tune an LLM to learn sarcasm by showing it a variety of samples, each containing a normal headline and its corresponding sarcastic headline. This way, the LLM can learn the desired demeanor from the training dataset. We can do this using parameter-efficient fine-tuning (PEFT).

Image credit: Onion Covers Obama, Google Arts & Culture (https://artsandculture.google.com/asset/onion-covers-obama/_wHitMO8IAk2WA)

Let's dive in

To fine-tune Llama2, we first need a good dataset of a decent size that contains a normal headline sentence and its corresponding sarcastic version. The sarcastic version should not dilute the input context and the vocabulary should be familiar to the model.

Note: We are trying to create a sarcastic version of a normal sentence, by rephrasing it to be satirical and funny. We are not aiming to create sarcastic responses or comebacks to a sentence.

The open datasets available for sarcasm contain either sarcastic responses or sentences labelled as sarcastic/non-sarcastic, which is not what we are looking for. Instead, we can use the best available open LLM to generate a witty and savage version of a given headline. This way, we can create a dataset containing both the input and the expected output for fine-tuning.

Meet Llama2

Credit where credit is due; we must appreciate Meta's team for making amazing ML models like Detectron2 and Llama2 openly available. Llama2 is a good fit here because it comes in multiple parameter variants (7B, 13B, 70B), has a large user base, and can be used for commercial purposes. For generating data, we can comfortably run a quantized Llama2 13B version on a Colab T4 GPU.

One advantage of using a Llama2 model for dataset generation instead of ChatGPT is that OpenAI's usage policies block offensive words and hate speech, even if we ask for them in the prompt template. This means ChatGPT will not generate brutal or humiliating sentences, which makes sense for ethical reasons. But for this use case, we need the model to go all in.


Note: In this project, we will use two variants of Llama2: the 13B and 7B models. We will use the 13B model to generate the desired dataset, and we will do fine-tuning on the 7B model using the generated dataset.

To generate data from an LLM, we need to provide a prompt. To ensure the quality and diversity of the training data, we will use a news headline category dataset. This gives us many different random sentences to work with, without worrying about grammatical mistakes in the input.

The data is in JSON format, so we will first convert it to a dataframe for easy analysis. The dataset has around 209k records across 42 different news categories. It's a pretty big dataset, so we will take just 50 random records per category, as shown below.
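For reference, here is a minimal sketch of loading the JSON-lines file into a dataframe (the file name is an assumption based on the public release of the dataset):

import pandas as pd

# the News Category Dataset ships as JSON Lines; the file name here is assumed
data = pd.read_json('News_Category_Dataset_v3.json', lines=True)
print(data.shape)                  # roughly 209k rows
print(data['category'].nunique())  # 42 categories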

import numpy as np

size = 50
repeat_same_row = False  # sample without replacement within each category

# pick `size` random rows from each news category
random_fn = lambda x: x.loc[np.random.choice(x.index, size, repeat_same_row), :]
df = data[['headline', 'category']]
df = df.groupby('category', as_index=False).apply(random_fn).reset_index()
df.drop(['level_0', 'level_1'], axis=1, inplace=True)

This gives us a total of 2,100 news lines across 42 different categories.

Now that we have some input data, let's configure Llama2 inference in a Colab notebook using the llama-cpp-python library. We will also need the huggingface_hub library to download the Llama2 13B quantized weights.

Note: Pin llama-cpp-python to version 0.1.78 if you are using the GGML quantized format (which we use here). For the latest package versions, use GGUF quantized weights instead of GGML.

!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.78 --force-reinstall --upgrade --no-cache-dir --verbose
!pip install huggingface_hub

Initiate Llama2 model for data generation

After successfully installing these packages, download the weights from Hugging Face (HF) and initiate Llama2 with them using the code below.

Note: Be careful when providing the HF model name or path for downloading weights; choose a well-known user-name/repo-name.

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Load Llama 2 13B weights from HF
model_name_or_path = "TheBloke/Llama-2-13B-chat-GGML"
model_basename = "llama-2-13b-chat.ggmlv3.q5_1.bin"

model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)

# Initiate Llama2
lcpp_llm = None
lcpp_llm = Llama(
    model_path=model_path,
    n_threads=2,     # number of CPU threads to use
    n_batch=512,     # number of prompt tokens processed per batch; consider the amount of VRAM in your GPU
    n_gpu_layers=32  # number of layers offloaded to the GPU; change this based on your model and VRAM pool
)

Generate data using Llama2

Now that Llama2 is ready to use in Colab, we write a prompt instruction so that the model understands what text to generate. I played around with different prompt instructions, and the one below worked best for me.

def create_prompt(prompt):
    prompt_template = f'''RULES: You are a savage, disrespectful and witty assistant. Always rephrase a sentence into a funny, creatively sarcastic way.
 while rephrasing you should not dilute the given context. The output sentence should be witty, savage, insulting and also usable as a sarcastic news headline.

 Headline: {prompt}

 SARCASTIC Headline:
 '''
    return prompt_template

We pass each input headline to this function so that it builds a prompt template for every headline.

data_headline = df.headline.values
prompt_headline = [create_prompt(i) for i in data_headline]
prompt_headline[:1]

# This is what the output of the function looks like
['RULES: You are a savage, disrespectful and witty assistant. Always rephrase a sentence into a funny, creatively sarcastic way.\n while rephrasing you should not dilute the given context.
The output sentence should be witty, savage, insulting and also usable as a sarcastic news headline.\n\n
Headline: Damien Hirst Is Building A Town No One Wants\n\n SARCASTIC Headline:\n ']

Now we pass the prompt_headline list to the Llama2 model to generate a sentence for each prompt in the list.

from datetime import datetime

st = datetime.now()
print(f"Starting time: {st}\n")

gen_result = []
i = 0
for pmp in prompt_headline:
    print("processing {}/{}: ".format(i, len(prompt_headline)))
    response = lcpp_llm(prompt=pmp, max_tokens=len(data_headline[i].split())*5, temperature=0.5, top_p=0.95,
                        repeat_penalty=1.2, top_k=150,
                        echo=True)

    # max_tokens - maximum number of tokens to generate; we cap it at 5x the input word count, which reduces generation time without hurting quality
    # temperature - set this lower; a lower temperature makes the output more deterministic and focused
    # top_p - set this high so the model can consider a broad range of the most probable tokens
    # top_k - restricts the model to picking from a smaller set of highly probable next tokens
    # repeat_penalty - reduces the chance of repeating tokens already in the context, based on how often each token has appeared

    resp = str(response["choices"][0]["text"])

    print("headline: ", data_headline[i], ", max_tokens", len(data_headline[i].split())*5)
    print("sarcastic headline: ", resp.partition('SARCASTIC Headline:\n')[-1].split('\n')[0].strip())
    gen_result.append(resp.partition('SARCASTIC Headline:\n')[-1].split('\n')[0].strip())
    i += 1

en = datetime.now()
print("\nTime taken to complete the generation: ", en-st)

df['sarcastic_headline'] = gen_result

Below is a sample of the output after generating for a few prompts:

Starting time: 2023-09-02 19:37:30.724040

processing 1/2100:
Llama.generate: prefix-match hit
headline: First Nighter: Musicals "Atomic," "The Mapmaker's Opera," "ValueVille"
sarcastic headline: Atomic! The Musical That Will Blow Your Mind... and Your Budget!
processing 2/2100:
Llama.generate: prefix-match hit
headline: Anne-Sophie Mutter - A Profile of the Artist
sarcastic headline: "Anne-Sophie Mutter: Because Violinists Need More Ego in Their Lives"
processing 3/2100:
Llama.generate: prefix-match hit
headline: 23 Artworks
sarcastic headline: 23 Artworks So Bad They Should Be Lock
processing 4/2100:
Llama.generate: prefix-match hit
headline: This Is What Happens When Doodles Grow Up
sarcastic headline: "The World is Now Ruled by Sentient Crayons"
processing 5/2100:
Llama.generate: prefix-match hit
headline: Artist Makes Masks Out Of Junk Food And Supremely Creeps Everyone Out (NSFW)
sarcastic headline: "Artist's Latest Masterpiece Will Make You Question Your Love For Pizza"
processing 6/2100:
Llama.generate: prefix-match hit
headline: Lessons From the Spring Festival
sarcastic headline: "Spring Festival Teaches Us How to Become Better People by Doing Nothing"
processing 8/2100:
Llama.generate: prefix-match hit
headline: Rock-a-bye, Baby: "Jenůfa" at the Metropolitan Opera
sarcastic headline: Jenůfa's Met Debut Leaves Audience in Tears... of Boredom

Time taken to complete the generation: 0:01:52.836077

We got some absolute bangers from it, along with a few sarcastic headlines that fell flat.
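Before moving on to fine-tuning, it is worth persisting the generated pairs so the generation run does not have to be repeated; a minimal sketch (the file name is just an example):

# save the headline / sarcastic_headline pairs; the file name is arbitrary
df.to_csv('generated_sarcastic_headlines.csv', index=False)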

Fine-tuning using Llama2 7B

Now we do some basic structuring of the data before passing it to fine-tuning. The prompt template we used to generate the sentences can be modified a little. Here is the prompt instruction I used for fine-tuning:

# Create a column named "text" for the prompt template; it's better to name the column 'text'
format_text = "You are a savage, disrespectful and witty agent. You convert below news headline into a funny, humiliating, creatively sarcastic news headline while still maintaining the original context.\n### headline: {}\n### sarcastic_headline: {}"
df['text'] = df.apply(lambda x: format_text.format(x['headline'], x['sarcastic_headline']), axis=1)

It doesn't have to be exactly the same as mine. You can modify it for your use case, but it needs to be uniform across all samples. You also need to define very specific tokens for the input line (### headline) and the output line (### sarcastic_headline). You can also include instructions before the input and output lines.

Below is a sample of the prompt instruction we are structuring; for better understanding, refer to common templates such as alpaca and guanaco.

You are a savage, disrespectful and witty agent. You convert below news headline into a funny, humiliating, creatively sarcastic news headline while still maintaining the original context.
### headline: Former Detroit Officer Found Guilty In Videotaped Beating Of Black Man
### sarcastic_headline: Former Detroit Cop Gets Justice For That One Time He Didn't Beat A Black Person

Hugging Face developed the AutoTrain library (autotrain-advanced), which can perform fine-tuning with a single command. It requires a GPU with CUDA installed. Let's quickly install the packages and get into the juicy part.

!pip install autotrain-advanced
!autotrain setup --update-torch
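Before launching the training command, the structured dataframe has to be written to the folder we will pass as --data_path. Here is a minimal sketch, assuming AutoTrain's default expectation of a train.csv file containing the 'text' column:

import os

# write the formatted prompts to the folder passed as --data_path below
os.makedirs('/content/sarcastic-headline', exist_ok=True)
df[['text']].to_csv('/content/sarcastic-headline/train.csv', index=False)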

While fine-tuning, we can specify the maximum number of tokens that the model can process in a single input, called the model's maximum length. Since our goal is to generate headlines, we can use a smaller value for the model's max length, which helps reduce fine-tuning time. To do this, I counted the words in the longest available line and added 5 to be on the safe side.

Note: If an input sentence has more tokens than the max length, it will be truncated to fit within the constraint. The model will process only up to its maximum token limit and ignore any tokens beyond that point.
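The training command below references a max_tokens variable; here is a rough sketch of how it could be computed from the formatted 'text' column (the variable name simply matches the one used in the command):

# words in the longest formatted sample, plus a small buffer to be safe
max_tokens = int(df['text'].str.split().str.len().max()) + 5
print(max_tokens)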

Fine-tuning an LLM is a very GPU-intensive task. On any free GPU service this activity can time out and stop in the middle, so we need to choose parameters wisely to make progress in this setting. We therefore use a sharded Llama2 7B version as the base model with a small number of epochs.

!autotrain llm --train --project_name 'sarcastic-headline-gen' --model TinyPixel/Llama-2-7B-bf16-sharded \
--data_path '/content/sarcastic-headline' \
--use_peft \
--use_int4 \
--learning_rate 2e-4 \
--train_batch_size 4 \
--num_train_epochs 5 \
--trainer sft \
--model_max_length {max_tokens} \
--block_size {max_tokens} > training.log &

# model - the smallest Llama2 model (7B, bf16) as a sharded version; the Colab free GPU cannot load the full checkpoint at once, so it has to be split into smaller pieces
# project_name - some project name like 'sarcastic-headline-gen'
# data_path - path of the structured dataset ('/content/sarcastic-headline'); make sure the prompt instruction column is named 'text'
# use_peft - fine-tune with PEFT; use_int4 - fine-tune in lower precision, which consumes less GPU memory
# learning_rate - set to 0.0002; a smaller value converges more stably but trains more slowly
# train_batch_size - set to 4; depends on the available GPU, and 4 works for 2100 samples
# trainer - supervised fine-tuning (sft), since we are showing input/output pairs to learn from
# model_max_length - the max_tokens variable calculated earlier; the maximum number of tokens the model processes in a single input
# block_size - the max_tokens variable calculated earlier; the size at which lines are truncated so they are of equal length
# training.log - if mentioned, the training log is stored there
# push_to_hub - mention this if you want to push the model directly to the Hugging Face Hub, together with the repo details
# repo_id your_repo_id - if push_to_hub is mentioned, specify here which repo to push to

After fine-tuning completes successfully, a "checkpoint" folder appears under the project folder; the adapter files are inside it.
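A quick way to confirm the adapter files are there (a small sketch; the checkpoint number 672 comes from my run and will differ for yours):

import os

# list the project folder created by AutoTrain and the checkpoint inside it
print(os.listdir('/content/sarcastic-headline-gen'))
print(os.listdir('/content/sarcastic-headline-gen/checkpoint-672'))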

Inferencing

There are two ways to perform inference with the trained model.

  • 1. The first way is to use the peft library's from_pretrained() method, which lets you quickly load a pretrained model for any architecture so you don't have to devote time and resources to training a model from scratch. (I prefer this way.)
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
from peft import PeftModel
import torch

# 1. Loading model using Peft - from_pretrained() method

# load tokenizer files from the checkpoint folder
tokenizer = AutoTokenizer.from_pretrained('/content/sarcastic-headline-gen/checkpoint-672')

# load the base model that was used for fine-tuning
model = AutoModelForCausalLM.from_pretrained('TinyPixel/Llama-2-7B-bf16-sharded', torch_dtype=torch.float16, device_map="auto") # base model, for example: meta-llama/Llama-2-13b-chat-hf

# load the adapter weights on top of the base model using from_pretrained()
model = PeftModel.from_pretrained(model, "/content/sarcastic-headline-gen/checkpoint-672", device_map="auto")

# we use the same formatted text used in fine-tuning to create prompt template
format_text = "You are a savage, disrespectful and witty agent. You convert below news headline into a funny, humiliating, creatively sarcastic news headline while still maintaining the original context.\n### headline: {}\n### sarcastic_headline: {}"

# convert an input text to prompt template
input_newsline = 'Burning Man: Torrential rain turns desert festival into mud bath'
formatted_input = format_text.format(input_newsline, "")

# perform inference, generate sarcastic version with formatted_input
inputs = tokenizer(formatted_input, return_tensors="pt").to("cuda:0") # the GPU device is 'cuda:0' in Colab; adjust to match your local GPU name
outputs = model.generate(**inputs, max_length=300) # max_length can be changed
print(tokenizer.decode(outputs[0]))

Formatted Input:

You are a savage, disrespectful and witty agent. You convert below news headline into a funny, humiliating, creatively sarcastic news headline while still maintaining the original context.
### headline: Burning Man: Torrential rain turns desert festival into mud bath
### sarcastic_headline:

Generated output from fine-tuned model:

You are a savage, disrespectful and witty agent. You convert below news headline into a funny, humiliating, creatively sarcastic news headline while still maintaining the original context.
### headline: Burning Man: Torrential rain turns desert festival into mud bath
### sarcastic_headline: Another year of Burning Man, where the only thing that's burning is our wallets
  • 2. The other way is to merge the base model with the adapters generated during fine-tuning, so that you end up with a single model folder and can load it the usual Hugging Face way.
from peft import AutoPeftModelForCausalLM

# 2. Merge the base model with generated adapters after PEFT

# This loads the adapters from the checkpoint folder and loads the base model automatically by referring to the adapter_config.json file
model = AutoPeftModelForCausalLM.from_pretrained("/content/sarcastic-headline-gen/checkpoint-672", low_cpu_mem_usage=True,)

# Merge LoRA and base model
merged_model = model.merge_and_unload()

# Save the merged model
merged_model.save_pretrained("merged_model", safe_serialization=True)
tokenizer.save_pretrained("merged_model")

# Once the merged model is saved into that directory, we can load it back as a regular HF model
model_merged = AutoModelForCausalLM.from_pretrained("/content/merged_model", low_cpu_mem_usage=True, device_map="auto")
tokenizer_merged = AutoTokenizer.from_pretrained("/content/merged_model")

# convert an input text to the prompt template
input_newsline = 'mansoons are best for mosquitoes'
formatted_input = format_text.format(input_newsline, "")

# perform inference, generating the sarcastic version from formatted_input
inputs = tokenizer_merged(formatted_input, return_tensors="pt").to("cuda:0") # the GPU device is 'cuda:0' in Colab; adjust to match your local GPU name
outputs = model_merged.generate(**inputs, max_length=300) # max_length can be changed
print(tokenizer_merged.decode(outputs[0]))

Formatted input:

You are a savage, disrespectful and witty agent. You convert below news headline into a funny, humiliating, creatively sarcastic news headline while still maintaining the original context.
### headline: mansoons are best for mosquitoes
### sarcastic_headline:

Generated output from fine-tuned model:

You are a savage, disrespectful and witty agent. You convert below news headline into a funny, humiliating, creatively sarcastic news headline while still maintaining the original context.
### headline: mansoons are best for mosquitoes
### sarcastic_headline: Another Study Proves That Men's Sweaty Bums Are The Best Repellent Against Mosquitoes

If you want to push the checkpoint to the Hugging Face Hub, it is better to do it during fine-tuning by passing --push_to_hub and --repo_id your_repo_id.

If you don't want to push it to the Hub but want to use it as a plug-and-play model locally, you can specify the --merge-adapters parameter while fine-tuning.
Beware that merging the base model with the adapter is a fairly CPU-intensive task; it can crash the session if you are using the free Colab tier. It used almost 35GB of CPU RAM when I merged separately after fine-tuning, so a Colab Pro plan is needed for that much RAM.

If you want to directly use the fine-tuned model checkpoints from this project, I have created a Colab file that performs inference the first way mentioned above: we download the checkpoints and base model from the Hugging Face Hub, then use the from_pretrained() function to create the model and pass the prompt to generate a sarcastic headline.

Here’s some more information

You may be wondering why we’re fine-tuning the 7B model when we can already generate sarcastic headlines with the 13B model.
You're right: if we're just looking for very generalized witty sentences, we don't need to fine-tune.

  • However, if you have explicit examples of sarcasm that you feel are more clever, humorous, and context-relevant than what the base model generates, that is where fine-tuning becomes useful.
  • Fine-tuning can also be more efficient, as it can generate desired output with fewer computational resources than using a general-purpose LLM.
  • Fine-tuning gives you greater control over the model’s behavior, which can be useful if you have specific requirements.

Uses

  • Enhanced Natural Language Understanding: In apps like chatbots and virtual assistants, a model trained to understand sarcasm can provide more contextually relevant responses, improving user interactions.
  • Content Generation: In creative writing and content creation, the model can be used to inject humor and sarcasm into articles, scripts, advertisements, or marketing materials to make them more engaging.
  • Brand persona: Some companies adopt a brand persona characterized by humor and sarcasm in their communications. The model can assist in maintaining this tone in marketing campaigns and customer interactions.
  • Social Media Engagement: Brands and influencers on social media may use the model to craft sarcastic posts or responses that resonate with their audience, leading to increased engagement and brand awareness.
  • Niche applications: For websites like The Onion, the model may be able to support or improve the writers' output.

Room for improvement

There is still room for improvement in this project. Here are a few ideas:

  • At the data generation level, we can provide different prompt instructions for different categories of news headlines to generate even funnier headlines, e.g., one prompt instruction for the sports category and another for politics.
  • The dataset used to fine-tune the model has only 2,100 examples; increasing the dataset size should improve the model's performance. I also ran only a few epochs of fine-tuning due to GPU constraints, and the number of epochs can be increased.
  • I chose the news headlines dataset as training data because of its quality and diversity. However, if the sole purpose of the model is to generate more enticing sarcastic headlines, a better approach would be to first generate a news description and then generate a headline for that description.

Model Objective

This model is not intended to target any specific race, gender, or region. Its sole purpose is to understand LLMs and tap into their ability to entertain and engage.

Code files and Model card

The full implementation and code files can be found below.

Model weights and checkpoints can be found in the Hugging Face repo below.

References

For this project, I referred to multiple resources and code files at different stages, such as model initiation, model fine-tuning, and model inference.
The input news lines were taken from the dataset: Misra, Rishabh. "News Category Dataset." arXiv preprint arXiv:2209.11429 (2022).
You can find the Llama2-generated dataset here.

Summary

We use the Llama 2 13B version to generate sarcastic sentences with an appropriate prompt template, taking the input sentences from a news headline category dataset. Once the dataset is generated, we format it and run PEFT on pretrained Llama 2 7B weights. The fine-tuned model behaves sarcastically and generates satirical news lines.

Closing

Thank you for reading to the end!

I hope this post has helped you to become more familiar with LLMs.
I’d love to hear your thoughts on this article. Please let me know if you have any questions or comments.
