LLM by Tooling — Hugging Face and Transformers Introduction

MB20261
4 min read · Jan 19, 2024


Who is Hugging Face?

Hugging Face is a thriving open-source community focused on developing tools that empower users to create, train, and deploy machine learning models using open-source code and technologies. With its toolkit, Hugging Face facilitates seamless sharing of tools, models, model weights, and datasets among fellow practitioners.

Hugging Face provides the following major features:

  • Transformers:
    - State-of-the-art machine learning libraries developed for NLP, Vision, Audio, Multimodal, etc.
    - Easy-to-use tokenizers, Evaluate APIs, pre-defined tasks and pipelines, AutoTrain APIs, and Safetensors
    - Optimum, for optimizing performance on target hardware architectures
  • Hugging Face Hub:
    - A community hub for sharing models and datasets; Hugging Face also publishes its own contributions there.
    - A client library for Hugging Face Hub in both Python and JavaScript
    - Open LLM leaderboard
  • Spaces, managed compute, and Inference Endpoints
    - Free or paid hosting service for models, datasets, visualizations, training, etc.
  • Enterprise solutions

What is Transformers?

Transformers are a cutting-edge neural network architecture for processing sequential data such as text, introduced by Google researchers in the 2017 paper “Attention Is All You Need.” They differ from traditional RNNs and CNNs by efficiently capturing long-range dependencies through self-attention, enabling a superior understanding of context. Because they also train well in parallel, transformers have become the foundation of large language models such as GPT and BERT, powering applications from machine translation to text generation. Hugging Face’s Transformers library packages this architecture, along with thousands of pre-trained models, behind a simple API, making transformers a fundamental technology for advancing AI’s interaction with human language.

How to Install Transformers?

The Hugging Face website provides very detailed installation steps.

We are not going to repeat that information here, but if you are new to AI and want to experiment yourself, here are a few tips:

  • First of all, you need a playground. If you want to start on your personal computer, check out the articles below on how to set one up.
  • Although programming knowledge is not strictly necessary to gain some experience with LLMs, picking up at least some Python scripting is recommended; it will make your journey much smoother. In case you need them, below are a few articles that will help you.
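Once you have followed the official installation steps (typically `pip install transformers` plus a backend such as PyTorch), a quick sanity check like the following should run without errors:

```python
# Sanity check: confirm the transformers package is importable and that
# the high-level pipeline() API (used throughout this article) is available.
import transformers
from transformers import pipeline

print(transformers.__version__)  # e.g. "4.x.y"
print(callable(pipeline))        # True
```

If the import fails, revisit the installation guide and make sure a supported backend (PyTorch, TensorFlow, or Flax) is installed in the same environment.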

Use Cases

Transformers provides a set of easy-to-use APIs (called pipelines) for common use cases, including:
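Most of the use cases below are driven through the same `pipeline()` entry point: you name a task, and the library picks a default pre-trained checkpoint (which may change between library versions). A minimal sketch, using sentiment analysis as the task:

```python
from transformers import pipeline

# With no explicit model argument, pipeline() downloads a small default
# checkpoint for the task from the Hugging Face Hub on first use.
classifier = pipeline("sentiment-analysis")

result = classifier("Hugging Face makes working with transformers easy!")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```

Swapping the task string (and optionally passing `model="..."`) is usually all that is needed to move between the use cases listed below.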

Computer Vision

  • Depth Estimation: Predict the depth of objects inside an image
  • Image Classification: Classify images into pre-trained labels
  • Image Segmentation: Separate an image into pixel-level regions corresponding to objects or classes
  • Image-to-Image: Transform one image into another by changing style, color, resolution, etc.
  • Mask Generation: Generate segmentation masks for the objects in an image
  • Object Detection: Identify and locate objects within a given image
  • Video Classification: Classify videos into pre-trained labels
  • Unconditional Image Generation: Generate entirely new images without any conditioning input, rather than transforming existing ones
  • Zero-Shot Image Classification: Classify an image using candidate labels the model was not explicitly trained on
  • Zero-Shot Object Detection: Detect objects in an image using classes the model was not explicitly trained on
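As a sketch of the vision pipelines, here is image classification. Assume the default checkpoint is a ViT model trained on ImageNet (the exact model may change between library versions); a generated placeholder image keeps the example self-contained, but in practice you would pass a file path, URL, or photo:

```python
from PIL import Image
from transformers import pipeline

# Image-classification pipeline with the library's default checkpoint.
classifier = pipeline("image-classification")

# Solid-colour placeholder image; real usage would load an actual photo.
image = Image.new("RGB", (224, 224), color=(120, 180, 90))

predictions = classifier(image)  # top predictions with labels and scores
for pred in predictions:
    print(f"{pred['label']}: {pred['score']:.3f}")
```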

Natural Language Processing

  • Conversational: Generate relevant responses based on the given text input (called prompts)
  • Fill-Mask: Predict masked words in the middle of a sentence to complete it
  • Question Answering: Retrieve answers from a given text for a given question
  • Sentence Similarity: Determine how similar in meaning two texts are
  • Summarization: Generate a shorter version of the given text input
  • Table Question Answering: Answer questions from a table of data
  • Text Classification: Classify text input into labels
  • Text Generation: Generate new text that continues the given text input
  • Token Classification: Tag individual tokens in the input with labels (e.g., named-entity recognition)
  • Translation: Convert text from one language to another
  • Zero-Shot Classification: Categorize text using candidate labels supplied at inference time, without task-specific training
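The NLP pipelines follow the same pattern. A minimal fill-mask sketch (the default checkpoint is a distilled RoBERTa model at the time of writing; reading the mask token from the tokenizer keeps the example model-agnostic, since BERT-style models use "[MASK]" while RoBERTa-style models use "<mask>"):

```python
from transformers import pipeline

# Fill-mask pipeline: predicts the token hidden behind the mask marker.
unmasker = pipeline("fill-mask")

# Build the prompt with the model's own mask token rather than hard-coding it.
prompt = f"Hugging Face is a {unmasker.tokenizer.mask_token} for machine learning."

results = unmasker(prompt)  # candidate fillers, highest score first
for r in results:
    print(f"{r['token_str'].strip()}: {r['score']:.3f}")
```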

Audio

  • Audio Classification: Assign labels to an audio clip
  • Audio-to-Audio: Enhance speech or separate sources in an audio recording
  • Automatic Speech Recognition (ASR/STT): Transcribe speech audio into text
  • Text-to-Speech (TTS): Generate speech from text input
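Audio pipelines accept either a path to an audio file or a raw waveform with its sampling rate. A minimal ASR sketch, assuming the library's default speech-recognition checkpoint (a one-second silent waveform keeps the example self-contained; real input would be recorded speech):

```python
import numpy as np
from transformers import pipeline

# Automatic-speech-recognition pipeline with the default checkpoint.
asr = pipeline("automatic-speech-recognition")

# One second of silence at 16 kHz; replace with a real waveform or a
# path to a .wav/.mp3 file in practice.
waveform = np.zeros(16000, dtype=np.float32)

result = asr({"raw": waveform, "sampling_rate": 16000})
print(result)  # {'text': ...}; silence typically yields an empty string
```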

Multimodal

  • Document Question Answering: Answer questions based on document images
  • Feature Extraction: Extract vector representations (embeddings) from text or images for downstream ML tasks
  • Image-to-Text: Generate text descriptions (captions) of images
  • Text-to-Image: Generate images from text input
  • Text-to-Video: Generate videos from text input
  • Visual Question Answering: Answer questions about image(s)
  • Text-to-3D: Generate 3D assets from text inputs
  • Image-to-3D: Generate 3D assets from 2D images
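Multimodal pipelines combine modalities in the same way. A minimal image-to-text (captioning) sketch, assuming the library's default captioning checkpoint (again using a generated placeholder image so the example is self-contained; captions for a blank image will not be meaningful):

```python
from PIL import Image
from transformers import pipeline

# Image-to-text pipeline: produces a caption for an input image.
captioner = pipeline("image-to-text")

image = Image.new("RGB", (384, 384), color=(30, 60, 200))  # placeholder image

captions = captioner(image)
print(captions)  # [{'generated_text': ...}]
```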

Have Fun!
