Deep Learning with Hugging Face models

Avinash
6 min read · Mar 5, 2024


Hugging Face is a platform and community that provides natural language processing (NLP) tools, libraries, and pretrained models. It is known for its contributions to the field of NLP, including easy access to state-of-the-art Transformer-based models such as BERT and GPT. Hugging Face offers pretrained models, fine-tuning capabilities, and a collaborative environment for researchers and developers to explore and advance NLP technology.

Components in Hugging Face:

  1. Tokenizers: Tokenizers are tools used to convert raw text into tokens, the basic units of text processing. They segment text into individual words, subwords, or characters, making it easier for models to understand and process language. Hugging Face provides various tokenizers optimized for different tasks and languages, including byte pair encoding (BPE) and WordPiece tokenization.
from transformers import BertTokenizer

# Initialize the tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# See how many tokens are in the vocabulary
print(tokenizer.vocab_size)
# 30522

# Tokenize the sentence
tokens = tokenizer.tokenize("I heart Generative AI")

# Print the tokens
print(tokens)
# ['i', 'heart', 'genera', '##tive', 'ai']

# Show the token ids assigned to each token
print(tokenizer.convert_tokens_to_ids(tokens))
# [1045, 2540, 11416, 6024, 9932]
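
For comparison, GPT-2 uses a byte pair encoding (BPE) tokenizer instead of WordPiece. A minimal sketch of the same idea (the exact splits it produces will differ from the BERT tokens above):

from transformers import GPT2Tokenizer

# Initialize a BPE-based tokenizer (GPT-2)
bpe_tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# Tokenize the same sentence and show the token ids
bpe_tokens = bpe_tokenizer.tokenize("I heart Generative AI")
print(bpe_tokens)
print(bpe_tokenizer.convert_tokens_to_ids(bpe_tokens))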

To read more about the BERT tokenizer, see:

https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertTokenizer

You can refer to my complete blog on BERT for more details here:

2. Models: Models in Hugging Face refer to pretrained neural network architectures designed for various natural language processing tasks, such as text classification, named entity recognition, and text generation. These models are trained on large datasets and fine-tuned for specific tasks, allowing users to leverage state-of-the-art NLP capabilities without the need for extensive training. Hugging Face offers a wide range of pretrained models, including BERT, GPT, RoBERTa, and many others.

import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Load a BERT model fine-tuned on the IMDB sentiment dataset
model_name = "textattack/bert-base-uncased-imdb"
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)

tokenizer = BertTokenizer.from_pretrained(model_name)

inputs = tokenizer("I love Football and Cricket", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs).logits
    probabilities = torch.nn.functional.softmax(outputs, dim=1)
    predicted_class = torch.argmax(probabilities)

print(probabilities)
print(predicted_class)

if predicted_class == 1:
    print(f"Sentiment: Positive ({probabilities[0][1] * 100:.2f}%)")
else:
    print(f"Sentiment: Negative ({probabilities[0][0] * 100:.2f}%)")

Output:

tensor([[0.1659, 0.8341]])
tensor(1)
Sentiment: Positive (83.41%)

import torch
from transformers import BertForSequenceClassification, BertTokenizer

model_name = "textattack/bert-base-uncased-imdb"
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)

tokenizer = BertTokenizer.from_pretrained(model_name)

inputs = tokenizer("I hate Football and Cricket", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs).logits
    probabilities = torch.nn.functional.softmax(outputs, dim=1)
    print(probabilities)
    predicted_class = torch.argmax(probabilities)
    print(predicted_class)

if predicted_class == 1:
    print(f"Sentiment: Positive ({probabilities[0][1] * 100:.2f}%)")
else:
    print(f"Sentiment: Negative ({probabilities[0][0] * 100:.2f}%)")

Output:

tensor([[0.5498, 0.4502]])
tensor(0)
Sentiment: Negative (54.98%)
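
The same inference can also be done with the high-level pipeline API, which handles tokenization, the forward pass, and softmax for you. A minimal sketch reusing the same IMDB checkpoint (note that the label names it returns depend on the checkpoint's config):

from transformers import pipeline

# Wrap the fine-tuned checkpoint in a text-classification pipeline
classifier = pipeline("text-classification", model="textattack/bert-base-uncased-imdb")

# Returns a list of dicts such as [{'label': ..., 'score': ...}]
print(classifier("I love Football and Cricket"))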

3. Datasets: Datasets are collections of labeled or unlabeled text data used for training and evaluating NLP models. Hugging Face provides access to a diverse array of datasets spanning different languages, domains, and tasks. These datasets are curated, annotated, and often benchmarked, making them valuable resources for researchers and practitioners working on NLP tasks. Users can easily download, preprocess, and use datasets from the Hugging Face library in their projects.

!pip install datasets

from datasets import load_dataset

# Load a dataset from the Hugging Face Hub
dataset = load_dataset("imdb")

# Print dataset information
print(dataset)

# Access dataset splits (e.g., train, test)
train_data = dataset["train"]
test_data = dataset["test"]

# Iterate through the first 5 examples
for example in train_data.select(range(5)):
    print(example)
import torch
from torch.utils.data import Dataset, DataLoader
from datasets import load_dataset


# Wrap a Hugging Face dataset so it can be used with a PyTorch DataLoader
class HFDataset(Dataset):
    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        return self.dataset[idx]


hf_dataset = load_dataset("imdb")

custom_dataset = HFDataset(hf_dataset["train"])

batch_size = 32
shuffle = True
num_workers = 2
data_loader = DataLoader(custom_dataset, batch_size=batch_size, shuffle=shuffle, num_workers=num_workers)

for batch in data_loader:
    # Process each batch
    inputs = batch['text']
    labels = batch['label']
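
Another common pattern (just a sketch, assuming the same IMDB dataset and a BERT tokenizer) is to let the datasets library return PyTorch tensors directly via set_format, so the tokenized columns plug straight into a DataLoader without a wrapper class:

from datasets import load_dataset
from transformers import BertTokenizer
from torch.utils.data import DataLoader

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], padding="max_length", truncation=True)

imdb = load_dataset("imdb")
tokenized = imdb["train"].map(tokenize, batched=True)

# Return PyTorch tensors for the model-relevant columns only
tokenized.set_format(type="torch", columns=["input_ids", "attention_mask", "label"])

loader = DataLoader(tokenized, batch_size=32, shuffle=True)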

4. Trainers: Trainers are components used to facilitate the training and fine-tuning of NLP models on custom datasets. Hugging Face provides trainers that streamline the training process by handling tasks such as batching, optimization, and evaluation. These trainers support various training strategies, including distributed training across multiple GPUs or TPUs. Additionally, trainers in Hugging Face often come with preconfigured settings and hyperparameters, making it easier for users to train models efficiently.

from transformers import (DistilBertForSequenceClassification,
                          DistilBertTokenizer,
                          TrainingArguments,
                          Trainer
                          )
from datasets import load_dataset

model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)


dataset = load_dataset("imdb")
tokenized_datasets = dataset.map(tokenize_function, batched=True)

training_args = TrainingArguments(
    per_device_train_batch_size=64,
    output_dir="./results",
    learning_rate=2e-5,
    num_train_epochs=3,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
)
trainer.train()

# Custom callback

from transformers import TrainerCallback

class CustomCallback(TrainerCallback):
    def on_epoch_end(self, args, state, control, **kwargs):
        # state.log_history holds the metrics logged so far;
        # report the most recent training loss at the end of each epoch
        losses = [log["loss"] for log in state.log_history if "loss" in log]
        if losses:
            print(f"Epoch {int(state.epoch)} Training Loss: {losses[-1]:.4f}")

We can then pass the custom callback to the Trainer:

# Define Trainer with the custom callback
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    callbacks=[CustomCallback()]  # Add custom callback
)
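
The Trainer also handles evaluation. As a rough sketch (compute_metrics here is a hypothetical helper we define ourselves, reusing the model, training_args, and tokenized_datasets from above), you can report accuracy on the test split like this:

import numpy as np
from transformers import Trainer

# Hypothetical metric helper: the Trainer passes the raw logits
# and the true label ids for the evaluation set
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": (predictions == labels).mean()}

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    compute_metrics=compute_metrics,
)

# Returns a dict including eval_loss and eval_accuracy
print(trainer.evaluate())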

Pre-Trained Models and Transfer Learning

Pre-trained models and transfer learning have revolutionized the field of deep learning, enabling efficient and effective utilization of neural networks in various tasks.

Pre-trained models are neural network architectures that have been trained on large datasets for specific tasks, such as image classification, natural language processing (NLP), or object detection. These models have learned to extract meaningful features from the input data and make accurate predictions. Pre-training typically involves training the model on a massive dataset, such as ImageNet for image classification or Wikipedia for language understanding, using supervised or self-supervised learning approaches.

Transfer learning, on the other hand, is a machine learning technique where knowledge gained from solving one task is applied to a different but related task. In the context of deep learning, transfer learning involves fine-tuning a pre-trained model on a new dataset or task. Instead of training a neural network from scratch, transfer learning allows us to leverage the knowledge encoded in the pre-trained model and adapt it to new tasks with smaller datasets.

The benefits of pre-trained models and transfer learning include:

Faster Training: Pre-trained models have already learned useful representations from vast amounts of data, reducing the time and computational resources required for training.

Improved Performance: Transfer learning enables models to achieve better performance on new tasks, especially when the target dataset is small or similar to the dataset used for pre-training.

Effective Feature Extraction: Pre-trained models can serve as powerful feature extractors, capturing high-level features from different modalities like images, text, or audio.

Domain Adaptation: Transfer learning facilitates domain adaptation by transferring knowledge from a source domain to a target domain, even when the datasets differ in distribution.

Generalization: Pre-trained models often learn generalizable features that are applicable across a wide range of tasks, domains, and datasets.

In summary, pre-trained models and transfer learning offer a practical and scalable approach to leverage the power of deep learning for various applications, enabling researchers and practitioners to build accurate and efficient models with less effort and data.
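
The example below puts this into practice: it fine-tunes a pre-trained MobileNetV3 (small) from torchvision on a custom image-classification dataset, replacing the final classification layer with one sized to the new classes.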

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader

# Define data transformations
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

train_dataset = datasets.ImageFolder(root='train_dir', transform=transform)
test_dataset = datasets.ImageFolder(root='test_dir', transform=transform)

# Define data loaders
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32)

# Load pre-trained MobileNetV3 model
model = models.mobilenet_v3_small(pretrained=True)

# Replace the final classification layer with a new fully connected layer
# sized to the number of classes; CrossEntropyLoss expects raw logits,
# so no Softmax layer is added here
num_classes = len(train_dataset.classes)
model.classifier[3] = nn.Linear(model.classifier[3].in_features, num_classes)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 10
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * images.size(0)
    epoch_loss = running_loss / len(train_loader.dataset)
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {epoch_loss:.4f}')

# Evaluation
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = correct / total
print(f'Test Accuracy: {accuracy:.4f}')
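
If you only want to use the pre-trained network as a frozen feature extractor (the "Effective Feature Extraction" benefit listed above), a common variant, sketched here under the same setup, is to freeze the backbone and train only the new head:

# Freeze every parameter in the pre-trained backbone
for param in model.features.parameters():
    param.requires_grad = False

# Optimize only the parameters that still require gradients (the new head)
optimizer = optim.Adam((p for p in model.parameters() if p.requires_grad), lr=0.001)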

Remember, learning is a journey, and every step you take brings you closer to your goals. Keep exploring, keep growing, and never stop smiling! If you have any questions or need assistance along the way, feel free to reach out.

Happy reading, and see you in the next blog!😊
