Introducing Falcon LLM: Leading the Way in Language Generation

Yash Bhaskar
5 min read · Jun 12, 2023


Source: https://falconllm.tii.ae/

In the world of natural language processing, the Falcon models have emerged as groundbreaking causal large language models (LLMs). Developed and trained by the Technology Innovation Institute (TII) of Abu Dhabi, these models have garnered immense attention for their strong performance and diverse applications. In this article, we will explore the intricacies of the Falcon models and their training methodology, discuss code implementation, delve into their use cases, and look at the differences between Falcon-40B, Falcon-7B, and their instruct versions.

Refer to Github : https://github.com/yash9439/Falcon-Local-AI-Model

Understanding the Falcon Models:

Falcon-40B is an expansive language model with 40 billion parameters, specifically designed as a causal decoder-only model.

A causal language model predicts the next token in a sequence, attending only to the context to its left. Trained on large-scale datasets to learn language patterns, it generates coherent text one token at a time and excels at tasks requiring contextually appropriate text generation.
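
To make that concrete, here is a minimal sketch of a greedy next-token loop with the Hugging Face transformers API. It is illustrative only: the checkpoint name is just one possible choice, and in practice you would simply call model.generate() rather than writing the loop yourself.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any causal LM on the Hub works the same way; Falcon-7B is used here as an example.
name = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True)

prompt = "The Falcon models were trained on"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Greedy decoding: each step conditions only on the tokens to the left.
with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids).logits           # shape: (batch, seq_len, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1)  # most likely next token
        input_ids = torch.cat([input_ids, next_id.unsqueeze(-1)], dim=-1)

print(tokenizer.decode(input_ids[0]))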

These models excel at a wide range of language tasks, including question-answering, reasoning, and generating human-like text. Falcon models, such as Falcon-40B and Falcon-7B, surpass the performance of other notable language models like LLaMA, StableLM, RedPajama, and MPT. The key to Falcon’s success lies in its architecture and training methodology.

The Falcon models incorporate FlashAttention, an attention implementation that reduces GPU memory traffic and speeds up the attention computation. This allows the Transformers to be trained and run more efficiently than with standard attention implementations.

Training Data:

The Falcon models owe their capabilities to the Falcon RefinedWeb dataset, a large, multimodal-friendly web corpus crafted by TII. This dataset, combined with curated corpora from various sources, serves as the foundation for training the Falcon models. The training process used a staggering 1,000 billion (one trillion) tokens, the bulk of them drawn from RefinedWeb.

# Note: install the "datasets" library first with `pip install datasets` if it is not already installed.
from datasets import load_dataset

# The public RefinedWeb extract is very large (hundreds of GB), so this download takes a while.
rw = load_dataset("tiiuae/falcon-refinedweb")
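
If you only want to inspect a few samples without downloading the whole corpus, the datasets library also supports streaming. A minimal sketch, assuming the text column is named "content" (as in the public dataset card):

from datasets import load_dataset

# Stream the dataset lazily instead of downloading it in full.
rw_stream = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)

sample = next(iter(rw_stream))
print(sample["content"][:500])  # first 500 characters of the first document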

Position on the OpenLLM Leaderboard:

At the time of writing, the instruct version of Falcon-40B is ranked first on the OpenLLM leaderboard.

The OpenLLM leaderboard evaluates the performance of LLMs on four tasks (a short sketch of how a k-shot prompt is assembled follows the list):

  1. AI2 Reasoning Challenge (25-shot): Tests grade-school science questions, assessing a model’s ability to reason and provide accurate answers.
  2. HellaSwag (10-shot): Evaluates commonsense inference, a challenging task for models, measuring their ability to make contextually consistent responses.
  3. MMLU (5-shot): Measures multitask accuracy across 57 diverse tasks, testing a model’s ability to generalize and perform across various domains.
  4. TruthfulQA (0-shot): Assesses a model’s truthfulness in generating accurate answers, ensuring reliability and factual accuracy.
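
The shot counts above refer to how many worked examples are prepended to the prompt before the question being evaluated. As a rough illustration (not the exact format the leaderboard harness uses), a k-shot prompt can be assembled like this:

# Illustrative only: build a k-shot prompt from worked question/answer pairs.
examples = [
    ("Which gas do plants absorb from the atmosphere?", "Carbon dioxide"),
    ("What force pulls objects toward the Earth?", "Gravity"),
]

def build_k_shot_prompt(examples, question):
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

# A "2-shot" prompt for a new question:
print(build_k_shot_prompt(examples, "What is the boiling point of water at sea level?"))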

Instruct Versions

The instruct versions of Falcon-40B and 7B, namely Falcon-40B-Instruct and Falcon-7B-Instruct, exhibit even greater performance.

The training process involved fine-tuning the models on a dataset of roughly 250 million tokens. This dataset combined chat and instruct data from sources including Baize, GPT4all, and GPTeacher, together with about 13 million tokens from the RefinedWeb corpus.

It’s important to note that Baize is a dataset generated with ChatGPT, and OpenAI places restrictions on how its model outputs may be reused. Consequently, when considering the use of the instruct versions of Falcon models in commercial applications, caution is advised.

A practical drawback of Falcon-40B and Falcon-40B-Instruct is that inference is painfully slow without substantial GPU resources, whereas Falcon-7B and Falcon-7B-Instruct generate output much faster.
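
One common way to squeeze the 40B model onto less hardware (at some cost in speed and quality) is 8-bit quantization with bitsandbytes. A minimal sketch, assuming your GPU(s) can hold roughly 40-50 GB of quantized weights:

from transformers import AutoModelForCausalLM, AutoTokenizer

name = "tiiuae/falcon-40b-instruct"
tokenizer = AutoTokenizer.from_pretrained(name)

# load_in_8bit quantizes the weights with bitsandbytes at load time;
# device_map="auto" spreads the layers across the available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    name,
    load_in_8bit=True,
    device_map="auto",
    trust_remote_code=True,
)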

Implementing the Falcon Models

To utilize the power of the Falcon models, the community can leverage the pretrained checkpoints TII has published on the Hugging Face Hub. By accessing the Falcon models, developers and researchers can harness their capabilities for a wide range of applications, including chatbots, language generation, and knowledge retrieval. The exact implementation details may vary, but the Hugging Face transformers library (on top of PyTorch), combined with the code snippets below, is enough to get started.

!pip install -q transformers einops accelerate langchain bitsandbytes

from langchain import HuggingFacePipeline
from transformers import AutoTokenizer, pipeline
import torch

model = "tiiuae/falcon-7b-instruct"  # or "tiiuae/falcon-40b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model)

pipeline = pipeline(
    "text-generation",              # task
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,     # half-precision weights to save memory
    trust_remote_code=True,         # Falcon ships custom modeling code
    device_map="auto",              # place the model on the available GPU(s)
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)

llm = HuggingFacePipeline(pipeline=pipeline, model_kwargs={"temperature": 0})

from langchain import PromptTemplate, LLMChain

template = """
You are an intelligent chatbot. Help the following question with brilliant answers.
Question: {question}
Answer:"""
prompt = PromptTemplate(template=template, input_variables=["question"])

llm_chain = LLMChain(prompt=prompt, llm=llm)

question = "Explain what Artificial Intelligence is as a nursery rhyme."

print(llm_chain.run(question))
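
If you prefer not to pull in LangChain, the same transformers pipeline can be called directly. A small usage sketch:

# Call the transformers pipeline directly with a formatted prompt.
sequences = pipeline(
    "You are an intelligent chatbot. Question: What is Falcon-7B? Answer:",
    max_length=200,
    do_sample=True,
    top_k=10,
)
for seq in sequences:
    print(seq["generated_text"])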

Exploring Use Cases and Model Variants

The Falcon models offer immense potential across various domains and use cases. Their ability to perform tasks such as question-answering, language translation, and document summarization makes them invaluable in fields like customer support, content generation, and research assistance. The Falcon-40B and Falcon-7B models serve as robust options for different hardware and fine-tuning requirements. Additionally, the instruct versions of Falcon-40B and Falcon-7B provide specialized models fine-tuned on chat/instruct datasets.

Fine-Tuning

Fine-tuning a Falcon model means training it further on specific data to customize its capabilities. Publicly available fine-tuning datasets can be used to adapt the model to particular requirements. Fine-tuning a decoder-only model like Falcon does require a GPU with ample memory and a suitable prompt/response template, but it is a rewarding process that yields more tailored results. Getting started can feel daunting at first, but once you delve into it, it becomes much easier to navigate.
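
As one illustration of a parameter-efficient route (not the only way to fine-tune Falcon), here is a minimal sketch using LoRA via the peft library. The hyperparameters are placeholders; the target module name "query_key_value" corresponds to Falcon's fused attention projection.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

name = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    trust_remote_code=True,
    device_map="auto",
)

# LoRA adds small trainable low-rank matrices to the chosen projection layers,
# so only a tiny fraction of the parameters is updated during fine-tuning.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["query_key_value"],  # Falcon's fused QKV projection
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# From here, the wrapped model can be trained with the transformers Trainer
# (or trl's SFTTrainer) on your instruction/chat dataset.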

Conclusion

Falcon models offer a wide range of possibilities in various domains, and their different variants cater to different needs. By fine-tuning these models, you can tailor them to address specific tasks and achieve more accurate and personalized results.

Connect with me : https://www.linkedin.com/in/yash-bhaskar/
More Articles like this: https://medium.com/@yash9439
