Flan-T5: sweet results with the smaller, more efficient LLM
Flan-T5 offers outstanding performance for a range of NLP applications, even compared to very large language models. Try now on Paperspace, powered by IPUs
Author: Harry Mellor, AI Engineer at Graphcore
In the world of AI language models, there’s no one-size-fits-all solution.
Commercial users are increasingly coming to the realisation that Ultra-Large Language Models, while broadly capable, are AI overkill for many applications.
The penny (or dollar) usually drops when they receive an outsize bill from the owners of their preferred proprietary model, or from their cloud compute provider. That’s assuming they can even secure GPU availability for the A100 and H100 systems needed to run advanced models.
Instead, many are looking to more efficient, open-source alternatives to the likes of GPT-3/4.
Flan T5
In December 2022, Google published Scaling Instruction-Finetuned Language Models, in which they perform extensive fine-tuning for a broad collection of tasks across a variety of models (PaLM, T5, U-PaLM).
Part of this publication was the release of Flan-T5 checkpoints, “which achieve strong few-shot performance” with relatively modest parameter counts “even compared to much larger models” like the largest members of the GPT family.
In this blog, we will show how you can use Flan-T5 running on a Paperspace Gradient Notebook, powered by Graphcore IPUs. Flan-T5-Large can be run on an IPU-POD4, using Paperspace’s six hour free trial, while Flan-T5-XL can be run on a paid IPU-POD16.
We will look at a range of common NLP workloads and consider the following:
- How good is Flan-T5, really?
- How do I run Flan-T5 on IPUs?
- What can I use Flan-T5 for?
- Why would I move up to Flan-T5-XL?
How good is Flan-T5, really?
Let’s start by looking at some performance numbers from the Google-authored paper:
These results are astounding. Notice that:
- Flan-T5 performs ~2x better than T5 in MMLU, BBH & MGSM
- In TyDiQA we even see the emergence of new abilities
- Flan-T5-Large is better than all previous variants of T5 (even XXL)
This establishes Flan-T5 as an entirely different beast to the T5 that you may know. Now let’s see how Flan-T5-Large and Flan-T5-XL compare to other models in the MMLU benchmark:
Noting that Flan-T5 had MMLU held out from training, this table shows that:
- Flan-T5-Large and Flan-T5-XL (with 0.8B and 3B parameters respectively) perform similarly to other models with significantly more parameters, for example GPT-3 (175B parameters) and Galactica (120B parameters).
- GPT-3 needs to be fine-tuned for the benchmark task in order to beat Flan-T5-XL.
- Flan-T5 outperforms smaller versions of more recent LLMs like PaLM and LLaMA (while also being multiple times smaller).
How do I run Flan-T5 on IPUs?
Since the Flan-T5 checkpoints are available on Hugging Face, you can use Graphcore’s Hugging Face integration (🤗 Optimum Graphcore) to easily run Flan-T5 with a standard inference pipeline.
If you already have an existing Hugging Face-based application that you’d like to try on IPUs, then it is as simple as:
- from transformers import pipeline
+ from optimum.graphcore import pipeline
- text_generator = pipeline("text2text-generation", model="google/flan-t5-large")
+ text_generator = pipeline("text2text-generation", model="google/flan-t5-large", ipu_config="Graphcore/t5-large-ipu")
text_generator("Please solve the following equation: x^2 - 9 = 0")
[{'generated_text': '3'}]
Now let’s define a text generator of our own to use in the rest of this notebook. First, make sure that your Python virtual environment has the latest version of 🤗 Optimum Graphcore installed:
%pip install "optimum-graphcore>=0.6.1, <0.7.0"
The location of the cache directories can be configured through environment variables or directly in the notebook:
import os
executable_cache_dir = os.getenv("POPLAR_EXECUTABLE_CACHE_DIR", "./exe_cache/")
num_available_ipus = int(os.getenv("NUM_AVAILABLE_IPU", 4))
Next, let’s import pipeline from optimum.graphcore and create our Flan-T5 pipeline for the appropriate number of IPUs:
from optimum.graphcore import pipeline

size = {4: "large", 16: "xl"}
flan_t5 = pipeline(
    "text2text-generation",
    model=f"google/flan-t5-{size[num_available_ipus]}",
    ipu_config=f"Graphcore/t5-{size[num_available_ipus]}-ipu",
    max_input_length=896,
)
flan_t5.model.ipu_config.executable_cache_dir = executable_cache_dir
Now, let’s ask it some random questions:
questions = [
    "Solve the following equation for x: x^2 - 9 = 0",
    "At what temperature does nitrogen freeze?",
    "In order to reduce symptoms of asthma such as tightness in the chest, wheezing, and difficulty breathing, what do you recommend?",
    "Which country is home to the tallest mountain in the world?",
]
for out in flan_t5(questions):
    print(out)
Graph compilation: 100%|██████████| 100/100 [05:20<00:00]
Graph compilation: 100%|██████████| 100/100 [02:56<00:00]
{'generated_text': '3'}
{'generated_text': '-32 °C'}
{'generated_text': 'ibuprofen'}
{'generated_text': 'nepal'}
Note that some of these answers may be wrong; retrieving facts from the model’s own parameters is not what Flan-T5 is designed for. However, if you use Flan-T5-XL, they are less likely to be wrong (come back to this notebook with an IPU-POD16 to see the difference!).
What can I use Flan-T5 for?
Flan-T5 has been fine-tuned on thousands of different tasks across hundreds of datasets. So no matter what your task might be, it’s worth seeing if Flan-T5 can meet your requirements. Here we will demonstrate a few of the common ones:
Sentiment Analysis
sentiment_analysis = (
    "Review: It gets too hot, the battery only can last 4 hours. Sentiment: Negative\n"
    "Review: Nice looking phone. Sentiment: Positive\n"
    "Review: Sometimes it freezes and you have to close all the open pages and then reopen where you were. Sentiment: Negative\n"
    "Review: Wasn't that impressed, went back to my old phone. Sentiment:"
)
flan_t5(sentiment_analysis)[0]["generated_text"]
Negative
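Writing these few-shot prompts by hand gets repetitive. As a sketch of one way to assemble them programmatically (the helper below is our own illustration, not part of 🤗 Optimum Graphcore), you could build the same prompt from a list of labelled examples:

```python
def build_few_shot_prompt(examples, query, input_label, output_label):
    """Assemble a few-shot prompt from (input, output) example pairs.

    `examples` is a list of (text, label) tuples; `query` is the new
    input we want the model to label.
    """
    lines = [
        f"{input_label}: {text} {output_label}: {label}"
        for text, label in examples
    ]
    # End with an unanswered example for the model to complete
    lines.append(f"{input_label}: {query} {output_label}:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    examples=[
        ("It gets too hot, the battery only can last 4 hours.", "Negative"),
        ("Nice looking phone.", "Positive"),
    ],
    query="Wasn't that impressed, went back to my old phone.",
    input_label="Review",
    output_label="Sentiment",
)
# Pass `prompt` to the pipeline as before:
# flan_t5(prompt)[0]["generated_text"]
```

The same helper works for the intent-classification and keyword-extraction prompts later in this notebook; only the labels change.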
Advanced Named Entity Recognition
The following snippets are adapted from the Wikipedia pages corresponding to each mentioned company.
advanced_ner = """Microsoft Corporation is a company that makes computer software and video games. Bill Gates and Paul Allen founded the company in 1975
[Company]: Microsoft, [Founded]: 1975, [Founders]: Bill Gates, Paul Allen
Amazon.com, Inc., known as Amazon, is an American online business and cloud computing company. It was founded on July 5, 1994 by Jeff Bezos
[Company]: Amazon, [Founded]: 1994, [Founders]: Jeff Bezos
Apple Inc. is a multinational company that makes personal computers, mobile devices, and software. Apple was started in 1976 by Steve Jobs and Steve Wozniak."""
flan_t5(advanced_ner)[0]["generated_text"]
[Company]: Apple, [Founded]: 1976, [Founders]: Steve Jobs, Steve Wozniak
Question Answering
The following snippet came from the SQuAD dataset.
context = 'Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24-10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi\'s Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the "golden anniversary" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as "Super Bowl L"), so that the logo could prominently feature the Arabic numerals 50.'
question = "Which NFL team represented the AFC at Super Bowl 50?"
# The correct answer is Denver Broncos
flan_t5(f"{context} {question}")[0]['generated_text']
Denver Broncos
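If you want to score such answers against a reference (as SQuAD evaluation does), a simplified exact-match check might look like the sketch below. Note this omits SQuAD’s full normalisation rules (article and punctuation stripping), so treat it as illustrative only:

```python
def exact_match(prediction, answer):
    """Simplified exact-match scoring: normalise whitespace and case,
    then compare. Real SQuAD evaluation also strips articles and
    punctuation before comparing."""
    return prediction.strip().lower() == answer.strip().lower()

exact_match("Denver Broncos", "denver broncos")  # → True
```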
Intent Classification
intent_classification = """[Text]: I really need to get a gym membership, I'm exhausted.
[Intent]: get gym membership
[Text]: What do I need to make a carbonara?
[Intent]: cook carbonara
[Text]: I need all these documents sorted and filed by Monday.
[Intent]:"""
flan_t5([intent_classification])[0]["generated_text"]
file documents
Summarization
The following snippets came from the XSum dataset.
summarization = """
Document: Firstsource Solutions said new staff will be based at its Cardiff Bay site which already employs about 800 people.
The 300 new jobs include sales and customer service roles working in both inbound and outbound departments.
The company's sales vice president Kathryn Chivers said: "Firstsource Solutions is delighted to be able to continue to bring new employment to Cardiff."
Summary: Hundreds of new jobs have been announced for a Cardiff call centre.
Document: The visitors raced into a three-goal first-half lead at Hampden.
Weatherson opened the scoring with an unstoppable 15th-minute free-kick, and he made it 2-0 in the 27th minute.
Matt Flynn made it 3-0 six minutes later with a fine finish.
Queen's pulled a consolation goal back in stoppage time through John Carter.
Summary: Peter Weatherson netted a brace as Annan recorded only their second win in eight matches.
Document: Officers searched properties in the Waterfront Park and Colonsay View areas of the city on Wednesday.
Detectives said three firearms, ammunition and a five-figure sum of money were recovered.
A 26-year-old man who was arrested and charged appeared at Edinburgh Sheriff Court on Thursday.
Summary:
"""
flan_t5(summarization)[0]["generated_text"]
A man has been arrested after a firearm was found in a property in Edinburgh.
Text Classification
text_classification_1 = """A return ticket is better value than a single.
topic: travel cost
You can start from the basic stitches, and go from there.
topic: learning knitting
The desk which I bought yesterday is very big.
topic: furniture size
George Washington was president of the United States from 1789 to 1797.
topic:"""
flan_t5(text_classification_1)[0]["generated_text"]
George Washington presidency
text_classification_2 = """FLAN-T5 was released in the paper Scaling Instruction-Finetuned Language Models - it is an enhanced version of T5 that has been finetuned in a mixture of tasks.
keywords: released, enhanced, finetuned
The IPU, or Intelligence Processing Unit, is a highly flexible, easy-to-use parallel processor designed from the ground up for AI workloads.
keywords: processor, AI
Paperspace is the platform for AI developers, providing the speed and scale needed to take AI models from concept to production.
keywords:"""
flan_t5(text_classification_2)[0]["generated_text"]
paperspace, AI, scale
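The pipeline returns the keywords as a single comma-separated string. A small post-processing step (our own sketch, not part of the pipeline API) turns it into a Python list for downstream use:

```python
def parse_keywords(generated_text):
    """Split a comma-separated keyword string into a clean list,
    stripping whitespace and dropping empty entries."""
    return [kw.strip() for kw in generated_text.split(",") if kw.strip()]

parse_keywords("paperspace, AI, scale")  # → ['paperspace', 'AI', 'scale']
```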
Why would I move up to Flan-T5-XL?
As we saw earlier when looking at the results from the paper, Flan-T5-XL is roughly 40% better (on average) than Flan-T5-Large across its validation tasks. Therefore, when deciding whether Flan-T5-XL is worth the cost for you, ask yourself the following questions:
- Does my data need greater linguistic understanding for the task to be performed?
- Is my task too complicated for a model as small as Flan-T5-Large and too easy for a model as large as GPT-3?
- Does my task require the longer output sequences that Flan-T5-XL is better equipped to generate?
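One practical consideration either way: we built the pipeline with max_input_length=896, so longer prompts need to fit within that budget. As a rough sketch, you can estimate token counts from word counts with a crude multiplier (the 1.3 tokens-per-word figure below is an assumption, not a property of the T5 tokenizer; the authoritative check is to tokenize the prompt with the real tokenizer from 🤗 Transformers):

```python
def roughly_fits(prompt, max_input_length=896, tokens_per_word=1.3):
    """Crude length check: estimate the token count from the word
    count. Heuristic only; use the actual T5 tokenizer for a
    definitive answer."""
    estimated_tokens = int(len(prompt.split()) * tokens_per_word)
    return estimated_tokens <= max_input_length

roughly_fits("Solve the following equation for x: x^2 - 9 = 0")  # → True
```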
To demonstrate, let us now look at an example of a task where the answer to all of the above questions is yes. Let’s say you have a customer service AI that you use to answer basic questions in order to reduce the workload of your customer service personnel. This needs:
- Strong linguistic ability to both parse and generate medium-sized chunks of text.
- An LLM that is able to learn well from context, but doesn’t have all of human history embedded in its parameters.
- The ability to produce multiple-sentence responses, but not much longer than this.
Looking at the code below, we see some context about Graphcore provided in the input, as well as a primer for a conversational response from the model. As you can see from the example, Flan-T5-XL was able to understand the information provided in the context and provide useful and natural answers to the questions it was asked.
from IPython.display import clear_output


class ChatBot:
    def __init__(self, model, context) -> None:
        self.model = model
        self.initial_context = context
        self.context = self.initial_context
        # The last two lines of the context name the two speakers
        self.user, self.persona = [x.split(":")[0] for x in context.split("\n")[-2:]]

    def ask(self, question):
        # Make sure the question ends with punctuation
        question += "." if question[-1] not in [".", "?", "!"] else ""
        x = f"{self.context}\n{self.user}: {question}\n{self.persona}: "
        # print(f"\n{x}\n")  # uncomment to inspect the full prompt
        y = self.model(x)
        response = y[0]["generated_text"]
        # Append the exchange so the conversation accumulates context
        self.context = f"{x}{response}"
        return response

    def session(self):
        print("Starting session", flush=True)
        prompt = input()
        while prompt != "":
            if prompt == "reset":
                clear_output()
                print("Starting session", flush=True)
                self.context = self.initial_context
                prompt = input()
            print(f"{self.user.title()}: {prompt}", flush=True)
            answer = self.ask(prompt)
            print(f"{self.persona.title()}: {answer}", flush=True)
            prompt = input()
        print("Ending session", flush=True)
context = f"""This is a conversation between a [customer] and a [virtual assistant].
The [virtual assistant] works at Graphcore. Here is some information about Graphcore:
- Graphcore is located in Bristol.
- Graphcore invented the intelligence processing unit (IPU). It is purpose built for AI applications.
- The currently available IPU models are: Classic IPU, Bow IPU, C600.
- IPUs are available on: Paperspace, Gcore Cloud and Graphcloud.
[virtual assistant]: Hello, welcome to Graphcore, how can I help you today?
[customer]: I'd like to ask some questions about your company.
[virtual assistant]: Ok, I can help you with that."""
chatbot = ChatBot(flan_t5, context)
chatbot.session()
Starting session
[Customer]: What is an IPU?
[Virtual Assistant]: The Intelligence Processing Unit (IPU) is a computer chip that is used to process artificial intelligence.
[Customer]: Who makes it?
[Virtual Assistant]: Graphcore is the manufacturer of the IPU.
[Customer]: Can I use them?
[Virtual Assistant]: Yes, I'm sure you can.
[Customer]: Where?
[Virtual Assistant]: The IPU is available on Paperspace, Gcore and Graphcloud.
Ending session
Finally, detach the model from the IPU so the hardware is released for other workloads:
flan_t5.model.detachFromDevice()
Conclusion
In summary, the answers to the questions we posed in the introduction are:
How good is Flan-T5, really?
A: Twice as good as T5 and on par with GPT-3 according to the MMLU benchmark.
How do I run Flan-T5 on IPUs?
A: Change one import and add one keyword argument to your pipeline instantiation.
What can I use Flan-T5 for?
A: Given its wide variety of fine-tuned tasks, almost anything.
Why would I move up to Flan-T5-XL?
A: For an approximately 40% performance increase over Flan-T5-Large, enabling more demanding tasks.
If you’d like to learn more about how we got T5 to work properly in Float16, see our technical blog on the subject.
You can also try other variations of T5 on IPUs:
- Zero-Shot Text Classification on IPUs using MT5-Large — Inference
- Machine Translation on IPUs using MT5-Small — Fine-tuning
- Summarization on IPU using T5-Small — Fine-Tuning
If you’d like to continue exploring NLP on the IPU, take a look at our GPT-J Fine-Tuning blog and corresponding notebook.