Transfer Learning and Fine Tuning

Fatima Mubarak
Published in Tech Blog · Apr 28, 2024

In machine learning and deep learning, two common ways to make use of pre-trained models are transfer learning and fine-tuning. Both let you borrow the knowledge of an existing model so that your own model starts from what others have already learned, rather than from scratch.

Difference between Transfer learning and Fine Tuning (by Elven Kim)

In this article, we’ll go through both methods, explaining how they work and what makes them different, along with Python code examples that show how to apply them. By the end, you’ll have a solid understanding of how to put these techniques into action in your own projects.

Transfer Learning

Transfer learning is the reuse of a pre-trained model on a new, related task. It is particularly beneficial when the new task has limited labeled data and computational resources. The term is most common in deep learning, where training a deep neural network from scratch is expensive, but the idea also applies to traditional machine learning models. This is very useful because most problems do not have enough labeled data to train such complex models from scratch.

Introducing Transfer Learning (by Harley Davidson Regua)

In transfer learning, we use the knowledge acquired by a pre-trained machine learning model on a related task. For example, if we’ve trained a model to recognize cars, we can leverage the learned features and patterns to aid in the identification of trucks. The model acquires general features from the initial task (recognizing cars) that can prove beneficial for the subsequent task (identifying trucks).

Technically, transfer learning means taking a pre-trained model and applying the following steps:

  • We use the pre-trained model as a fixed feature extractor.
  • Then we remove the final layers responsible for classification and replace them with new layers that are specific to our task.

Basically, we freeze the weights of the hidden, pre-trained layers. This means that during training on our new task, these layers remain fixed and their parameters are not updated. Freezing them preserves the features learned from the original task and removes the risk of losing these valuable features by overfitting them to the new data.

After freezing the pre-trained layers, we add new layers on top of the pre-trained model to adapt it to the new task. These new layers, referred to as the “classifier,” are responsible for making predictions specific to our task (e.g., classifying different types of flowers). Initially, these new layers have random weights. During training, we feed the input data through the pre-trained layers to extract features. These extracted features are then passed to the new classifier layers, which learn to map them to the correct output for the new task. The weights of these new layers are updated during training using backpropagation and gradient descent, based on the error between the predicted outputs and the true labels. By training the new classifier on top of the fixed, pre-trained layers, we effectively transfer the knowledge learned from the original task to the new task.
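To make this concrete, here is a minimal sketch of the freeze-and-replace pattern using a Keras VGG16 base (the input shape and the 10-class head are placeholder assumptions; the full runnable example follows in the code snippet below):

import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D

# Pre-trained convolutional base without its original classifier
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(32, 32, 3))
base_model.trainable = False  # freeze all pre-trained weights

# New classifier head with randomly initialized weights
inputs = tf.keras.Input(shape=(32, 32, 3))
x = base_model(inputs, training=False)  # extract features with the frozen base
x = GlobalAveragePooling2D()(x)
outputs = Dense(10, activation='softmax')(x)  # e.g., 10 target classes
model = tf.keras.Model(inputs, outputs)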

Code Snippets

import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

# Load CIFAR-10 dataset
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0

# One-hot encode the labels
train_labels = to_categorical(train_labels, num_classes=10)
test_labels = to_categorical(test_labels, num_classes=10)

# Load pre-trained VGG16 model without the top (classification) layer
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(32, 32, 3))

# Freeze the weights of the pre-trained layers
for layer in base_model.layers:
    layer.trainable = False

# Create a new model on top of the pre-trained base model
model = Sequential([
    base_model,
    Flatten(),
    Dense(256, activation='relu'),
    Dense(10, activation='softmax')  # Output layer with 10 classes
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(train_images, train_labels, epochs=10, batch_size=64, validation_data=(test_images, test_labels))

# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels)
print("Test Accuracy:", test_acc)

Fine Tuning

Fine-tuning is a type of transfer learning. It involves taking a pre-trained model, one that has been trained on a large dataset for a general task such as image recognition or natural language understanding, and making small adjustments to its internal parameters so that it performs better on the new task.

Fine Tuning (Dive into Deep Learning)

Let’s discuss a step-by-step approach to fine-tuning a model effectively:

  1. Select a Pre-trained Model: Choose a pre-trained model that aligns with your task and dataset.
  2. Understand Model Architecture: Study the architecture of the pre-trained model, including the number of layers, their functionalities, and the specific tasks they were trained on.
  3. Determine Fine-tuning Layers: Decide which layers of the pre-trained model to fine-tune. Typically, earlier layers capture low-level features, while later layers capture more high-level features. You may choose to fine-tune only the top layers, a larger portion of the network, or the entire model (see the sketch after this list).
  4. Freeze Pre-trained Layers: Freeze the weights of the pre-trained layers that you do not want to fine-tune. This ensures that you prevent these layers from being updated during training.
  5. Add Task-specific Layers: Add new layers on top of the pre-trained model to adapt it to your specific task. These layers, referred to as the “classifier,” will be responsible for making predictions relevant to your task.
  6. Configure Training Parameters: Set the hyperparameters for training, including the learning rate (typically small for fine-tuning), batch size, and number of epochs. These parameters may need to be adjusted based on the size of your dataset and the complexity of your task.
  7. Train the Model: Train the model on your dataset using a suitable optimization algorithm, such as stochastic gradient descent (SGD) or Adam. During training, the weights of the unfrozen layers will be updated to minimize the loss between the predicted outputs and the ground truth labels.

By following this step-by-step approach, you can effectively fine-tune a pre-trained model.
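
For steps 3 and 4, it helps to check exactly which layers will be updated before training. Here is a minimal sketch, assuming the same VGG16 base used in the snippet below and unfreezing its last four layers (the final convolutional block):

from tensorflow.keras.applications import VGG16

base_model = VGG16(weights='imagenet', include_top=False, input_shape=(32, 32, 3))

# Freeze everything except the last four layers (VGG16's final convolutional block)
for layer in base_model.layers[:-4]:
    layer.trainable = False

# Verify which layers will actually be fine-tuned
for layer in base_model.layers:
    print(layer.name, "trainable" if layer.trainable else "frozen")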

Code Snippets

import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

# Load CIFAR-10 dataset
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0

# One-hot encode the labels
train_labels = to_categorical(train_labels, num_classes=10)
test_labels = to_categorical(test_labels, num_classes=10)

# Load pre-trained VGG16 model without the top (classification) layer
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(32, 32, 3))

# Freeze all layers except the last four (these will be fine-tuned)
for layer in base_model.layers[:-4]:
    layer.trainable = False

# Create a new model on top of the pre-trained base model
model = Sequential([
    base_model,
    Flatten(),
    Dense(256, activation='relu'),
    Dense(10, activation='softmax')  # Output layer with 10 classes
])

# Compile the model
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(train_images, train_labels, epochs=10, batch_size=64, validation_data=(test_images, test_labels))

# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels)
print("Test Accuracy:", test_acc)

Conclusion

In summary, transfer learning and fine-tuning are valuable tools in machine learning and deep learning. Both let you build on the knowledge of existing models instead of training from scratch: transfer learning reuses a pre-trained model as a fixed feature extractor with a new classifier on top, while fine-tuning goes a step further and also updates some of the pre-trained weights for the new task.
As we reflect on these techniques: what other methods could be developed to optimize model performance in the field of artificial intelligence?
