Hugging Face Pipeline: Quick tour

Pallavi Padav
Women in Technology
6 min read · Aug 6, 2024

To get started with Hugging Face Transformers, it is necessary to understand what a pipeline is and what advantages it offers. Consider ready-to-eat food products, which are in high demand: these convenient options are ideal for busy individuals or those looking for quick and hassle-free meal solutions. Pipelines serve the same purpose for model inference.

What is a pipeline?

The pipelines are a great and easy way to use pre-trained models for inference. The Hugging Face pipeline provides a simple API for directly using transformer models for various NLP tasks like text classification, named entity recognition, and more.

The pipeline() function returns an end-to-end object that works out of the box for many tasks across different modalities; the supported tasks are listed in the Transformers quick tour:

https://huggingface.co/docs/transformers/en/quicktour
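
As a quick illustration of that end-to-end behaviour, here is a minimal sketch. The default checkpoint and the exact score depend on your transformers version, so treat the output shown in the comment as illustrative only.

from transformers import pipeline

# A task name is enough: the pipeline downloads a default pre-trained
# model and tokenizer for that task and wires them together.
classifier = pipeline("sentiment-analysis")

# One call runs tokenization, the model forward pass, and post-processing.
classifier("Ready-to-eat meals are a lifesaver on busy days.")
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]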

Pipelines are made of three parts:

  • Tokenizer, which maps raw textual input to tokens. The Hugging Face tokenizer efficiently transforms raw text into a format suitable for transformer models, facilitating various NLP tasks. Its integration with pre-trained models ensures compatibility and optimizes the preprocessing pipeline for effective model training and inference.

E.g., ['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.'] is tokenized into the input IDs [101, 7327, 19164, 2446, 2655, 2000, 17757, 2329, 12559, 1012, 102]. The tokenizer automatically adds special tokens such as [CLS] (classification token) at the beginning and [SEP] (separator token) at the end of each sequence: 101 is [CLS] and 102 is [SEP]. (A tokenizer sketch follows this list.)

  • Models that are used to make predictions from the inputs.
  • Post-processing for enhancing the model’s output (which is optional).
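
To make the tokenizer step concrete, here is a small sketch using AutoTokenizer; the bert-base-uncased checkpoint is an assumption (the article does not name one), but it produces BERT-style IDs with 101 for [CLS] and 102 for [SEP].

from transformers import AutoTokenizer

# bert-base-uncased is an assumed checkpoint, used here only for illustration
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

encoding = tokenizer("EU rejects German call to boycott British lamb .")
print(encoding["input_ids"])                                    # starts with 101 ([CLS]), ends with 102 ([SEP])
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))   # the corresponding word pieces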

Parameters of the pipeline() function:

https://huggingface.co/docs/transformers/v4.43.2/en/main_classes/pipelines#transformers.pipeline.returns

pipeline() is the most general abstraction, wrapping all the other task-specific pipelines.

task (str): The task defining which pipeline will be returned; the currently accepted tasks are listed in the documentation linked above.

model (str or PreTrainedModel or TFPreTrainedModel, optional): The model that will be used by the pipeline to make predictions. This can be a model identifier or an actual instance of a pretrained model inheriting from PreTrainedModel (for PyTorch) or TFPreTrainedModel (for TensorFlow). If not provided, the default for the task will be loaded.
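
To show how the task and model parameters fit together, here is a hedged sketch: the first pipeline relies on the library's default checkpoint for the task, while the second passes a model identifier explicitly (the distilbert checkpoint below is simply an illustrative choice; any compatible model id from the Hub works).

from transformers import pipeline

# Let the library pick the default model for the task
default_clf = pipeline(task="sentiment-analysis")

# Or name a model identifier (or pass a PreTrainedModel instance) explicitly
custom_clf = pipeline(
    task="sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)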

Examples:

1. Sentiment Analysis

It classifies sentences into POSITIVE or NEGATIVE labels, each with a probability score, based on the sentiment or emotion expressed in the sentence.

from transformers import pipeline

text1 = '''Descriptions of the characters were all very well done.
The story line too was good. Looking forward to reading more of Elin.'''
text2 = '''The story is stupid, but not more than certain TV series.
The characters are cartoon-like. It's a book just to pass time.'''

# Load the default sentiment-analysis pipeline
sentiment_analysis_pipeline = pipeline("sentiment-analysis")

result1 = sentiment_analysis_pipeline(text1)
result2 = sentiment_analysis_pipeline(text2)
result1, result2

2. Text generation

You can generate text using a pre-trained transformer model by providing a starting text prompt to the model and letting it predict the subsequent words based on the learned language patterns.

text_gen_pipeline = pipeline('text-generation', model='gpt2')
prompt = "First boil the water and add tea powder"
text_gen_pipeline(prompt, max_length=30)

3. Question Answering

text = """For the second time in 3 weeks, foreign ministers of India and China, S Jaishankar and Wang Yi respectively, met Thursday 
agreeing significantly on the need for a “strong guidance” to complete the disengagement process in eastern Ladakh, an issue
blocking return of normalcy to bilateral ties.
Jaishankar’s remark stating the same in a post on X seemed to improve upon the outcome of their previous meeting in Kazakhstan
earlier this month where he called for redoubling efforts to achieve complete disengagement and both leaders agreed prolongation
of the border situation was not in the interest of either side. """

# Load the default question-answering pipeline
QA = pipeline("question-answering")

question = "Who is the foreign minister of India?"
Ans1 = QA(question=question, context=text)

question = "What is the outcome of the meeting?"
Ans2 = QA(question=question, context=text)

Ans1, Ans2

4. Text Summarization

ARTICLE = """The Apollo program, also known as Project Apollo, was the third United States human spaceflight program carried out by the National Aeronautics and Space Administration (NASA), which accomplished landing the first humans on the Moon from 1969 to 1972.
First conceived during Dwight D. Eisenhower's administration as a three-man spacecraft to follow the one-man Project Mercury which put the first Americans in space,
Apollo was later dedicated to President John F. Kennedy's national goal of "landing a man on the Moon and returning him safely to the Earth" by the end of the 1960s, which he proposed in a May 25, 1961, address to Congress.
Project Mercury was followed by the two-man Project Gemini (1962–66).
The first manned flight of Apollo was in 1968.
Apollo ran from 1961 to 1972, and was supported by the two-man Gemini program which ran concurrently with it from 1962 to 1966.
Gemini missions developed some of the space travel techniques that were necessary for the success of the Apollo missions.
Apollo used Saturn family rockets as launch vehicles.
Apollo/Saturn vehicles were also used for an Apollo Applications Program, which consisted of Skylab, a space station that supported three manned missions in 1973–74, and the Apollo–Soyuz Test Project, a joint Earth orbit mission with the Soviet Union in 1975.
"""

# Load the default summarization pipeline
summarizer = pipeline("summarization")

summary = summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False)[0]
summary['summary_text']

While this post focuses on NLP tasks, the pipeline API is not limited to them; it also covers other modalities, with tasks such as image classification, object detection, and automatic speech recognition.

Fine-tuning in Hugging Face involves taking a pre-trained transformer and further training it on a specific dataset or task so that its parameters adapt and its performance improves for that task. The Hugging Face pipeline can also be used with fine-tuned pre-trained transformer models.

Example:

fine_tuned_pipeline = pipeline("ner", model=model_fine_tuned, tokenizer=tokenizer)
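
For context, here is a hedged sketch of where model_fine_tuned and tokenizer could come from; the checkpoint path below is a hypothetical placeholder for wherever your fine-tuned NER model was saved (a local directory or a Hub id).

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Hypothetical path/Hub id of a fine-tuned token-classification (NER) checkpoint
checkpoint = "path/to/fine-tuned-ner-model"

model_fine_tuned = AutoModelForTokenClassification.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

fine_tuned_pipeline = pipeline("ner", model=model_fine_tuned, tokenizer=tokenizer)
fine_tuned_pipeline("EU rejects German call to boycott British lamb.")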

EndNote:

Thanks for reading the blog. Have thoughts or questions? We’d love to hear from you! Feel free to leave a comment below.

Looking forward to staying in touch through LinkedIn. Mail me here for any queries.

Stay tuned for more exciting content. Till then, happy reading!

I believe in the power of continuous learning and sharing knowledge with the community. Your contributions are invaluable in helping me create meaningful content and resources that benefit everyone. Join me on this journey of exploration and innovation in the fascinating world of data science by donating to Buy Me a Coffee.
