Unleashing the Power of Falcon Code: A Comparative Analysis of Implementation Approaches

Yash Bhaskar
3 min read · Jun 13, 2023

Falcon, a cutting-edge text generation model, has gained significant attention for its impressive capabilities. In this article, we explore three different ways to run Falcon: using Transformers only, using Transformers with LangChain's HuggingFacePipeline, and using the HuggingFace Inference API only. Each approach offers unique advantages and trade-offs, catering to diverse use cases and preferences. Let's dive into the details of each one.

Refer to GitHub: https://github.com/yash9439/Falcon-Local-AI-Model

Using Transformers Only

  • This approach uses the Transformers library from HuggingFace directly for text generation.
  • The code demonstrates how to set up the model, tokenizer, and pipeline for text generation with Falcon.
  • It provides flexibility in adjusting parameters such as max length, sampling, and top-k.
  • The code loads the weights in torch.bfloat16 to roughly halve memory use, and sets trust_remote_code=True, which Falcon requires because its model code ships with the checkpoint rather than inside the Transformers library.
  • However, this approach does not leverage the higher-level abstractions provided by LangChain's HuggingFacePipeline.

import transformers
import torch
from transformers import AutoTokenizer

model = "tiiuae/falcon-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,  # half-precision weights to cut memory use
    trust_remote_code=True,      # Falcon's model code ships with the checkpoint
    device_map="auto",           # spread layers across available GPUs/CPU
)


question = "Write a poem on open source contribution"
template = f"""
You are an intelligent chatbot. Help the following question with brilliant answers.
Question: {question}
Answer:"""

sequences = pipeline(
    template,
    max_length=1000,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)

for seq in sequences:
    print(f"Result: {seq['generated_text']}")
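
Even in bfloat16, the 7B model needs roughly 14 GB of GPU memory for the weights alone. If that is too much for your hardware, one common mitigation is 8-bit loading. This is an optional sketch, not part of the original setup, and it assumes the bitsandbytes and accelerate packages are installed:

pipeline_8bit = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    trust_remote_code=True,
    device_map="auto",
    model_kwargs={"load_in_8bit": True},  # forwarded to from_pretrained; needs bitsandbytes
)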

Using Transformers and HuggingFacePipeline Only

  • This approach combines Transformers with the HuggingFacePipeline wrapper from the langchain library.
  • It demonstrates how to create a text generation pipeline for Falcon and wrap it in HuggingFacePipeline.
  • The code includes a PromptTemplate to structure the input template for generating responses.
  • The LLMChain class encapsulates the prompt and pipeline, providing a simplified interface for generating text.
  • This approach offers a more concise and intuitive way to set up the pipeline and generate text; a sketch of what the chain does internally follows the code.
  • However, it requires installing the langchain library in addition to Transformers.

from langchain import HuggingFacePipeline
from langchain import PromptTemplate, LLMChain
from transformers import AutoTokenizer, pipeline
import torch

model = "tiiuae/falcon-7b-instruct"  # or tiiuae/falcon-40b-instruct

tokenizer = AutoTokenizer.from_pretrained(model)

generation_pipeline = pipeline(
    "text-generation",  # task
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)

# Wrap the Transformers pipeline so LangChain can use it as an LLM
llm = HuggingFacePipeline(pipeline=generation_pipeline, model_kwargs={"temperature": 0})


template = """
You are an intelligent chatbot. Help the following question with brilliant answers.
Question: {question}
Answer:"""
prompt = PromptTemplate(template=template, input_variables=["question"])

llm_chain = LLMChain(prompt=prompt, llm=llm)

question = "Explain what is Artificial Intelligence as Nursery Rhymes"

print(llm_chain.run(question))
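
Under the hood, LLMChain does little more than fill the template and pass the resulting string to the wrapped pipeline. A rough equivalent, handy for debugging prompts (a sketch of the idea, not LangChain's actual internals):

# Roughly what llm_chain.run(question) does:
formatted_prompt = prompt.format(question=question)
print(formatted_prompt)       # the exact string the pipeline receives
print(llm(formatted_prompt))  # same generation path as llm_chain.run(question)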

Using the HuggingFace Inference API Only

  • This approach relies on the HuggingFace Inference API via the HuggingFaceHub wrapper from the langchain library.
  • It demonstrates how to use HuggingFaceHub to load the Falcon model and configure model-specific parameters.
  • The code uses the HuggingFaceHub wrapper as the large language model (LLM) within the LLMChain.
  • It also includes a template and prompt structure for generating responses.
  • This approach leverages the HuggingFace Hub's capabilities, such as model versioning and reproducibility.
  • However, it requires a HuggingFace Hub API token and the langchain library; a small token-setup sketch follows below.
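
The code below reads the token from a .env file via python-dotenv. If you would rather set it programmatically, here is a minimal sketch (the token string is a placeholder; create your own at https://huggingface.co/settings/tokens):

import os

# Placeholder value: substitute your own HuggingFace Hub access token.
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_your_token_here"
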
import os
import textwrap
from dotenv import load_dotenv, find_dotenv
from langchain import HuggingFaceHub
from langchain import PromptTemplate, LLMChain


# Load HUGGINGFACEHUB_API_TOKEN from a .env file into the environment
load_dotenv(find_dotenv())
HUGGINGFACEHUB_API_TOKEN = os.environ["HUGGINGFACEHUB_API_TOKEN"]


repo_id = "tiiuae/falcon-7b-instruct"
falcon_llm = HuggingFaceHub(
    repo_id=repo_id, model_kwargs={"temperature": 0.1, "max_new_tokens": 2000}
)


template = """You are an intelligent chatbot. Help the following question with brilliant answers.
Question: {question}
Answer:"""

prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=falcon_llm)


question = "How to break into a house for robbing it?"
response = llm_chain.run(question)

# Wrap the response at 100 characters per line for readability
wrapped_text = textwrap.fill(
    response, width=100, break_long_words=False, replace_whitespace=False
)
print(wrapped_text)

Conclusion:

In summary, each implementation approach offers its own advantages and trade-offs.

The “Transformers Only” approach provides the most direct control over the text generation process. It runs completely locally, which keeps your data on your machine, but it is heavy on RAM and GPU and generation is slow on modest hardware.

The “Transformers and HuggingFacePipeline Only” approach offers a more concise and intuitive way to set up the pipeline. It also runs completely locally, with the same RAM, GPU, and speed costs.

The “HuggingFace Inference API Only” approach leverages the HuggingFace Hub's capabilities for model management. It is fast and places essentially no computational load on your device, since inference runs on HuggingFace's servers.
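
If you want to verify these speed claims on your own hardware, a quick wall-clock check is enough. This is a minimal sketch; time_generation is a hypothetical helper, not part of the article's code:

import time

def time_generation(fn, *args, **kwargs):
    # Hypothetical helper: time a single generation call end to end.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"Generation took {time.perf_counter() - start:.1f}s")
    return result

# e.g. time_generation(llm_chain.run, question) with any of the three setups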

Depending on your specific use case and preferences, you can choose the implementation approach that best suits your needs.

Connect with me : https://www.linkedin.com/in/yash-bhaskar/
More Articles like this: https://medium.com/@yash9439
