
Courage to Learn ML: Tackling Vanishing and Exploding Gradients (Part 1)

Amy Ma
13 min read · Feb 5, 2024


Image created by the author using ChatGPT.

Can you illustrate backpropagation and gradient descent in PyTorch?

import torch
import torch.nn as nn

# Note: CustomizedModel, X_train, and y_train were not defined in the
# original snippet; the placeholders below are illustrative stand-ins
# so the loop runs end to end. Substitute your own model and data.
class CustomizedModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 1)

    def forward(self, x):
        return self.linear(x)

X_train = torch.randn(64, 10)  # dummy features
y_train = torch.randn(64, 1)   # dummy targets

# Define the model
model = CustomizedModel()
# Define the loss function
loss_fn = torch.nn.L1Loss()
# Define the optimizer
optimizer = torch.optim.SGD(params=model.parameters(),
                            lr=0.01, momentum=0.9)

epochs = 10
for epoch in range(epochs):
    # Step 1: Set the model to training mode
    model.train()

    # Step 2: Forward pass - make predictions
    y_pred = model(X_train)

    # Step 3: Calculate the loss
    loss = loss_fn(y_pred, y_train)

    # Step 4: Backpropagation - compute gradients of the loss
    # with respect to the model parameters
    optimizer.zero_grad()  # clear old gradients from the previous step
    loss.backward()        # backpropagate to populate .grad on each parameter

    # Step 5: Gradient descent - update parameters using the gradients
    optimizer.step()

Can you explain the concepts of vanishing and exploding gradients in neural networks? What are the primary causes of these issues?
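
A quick way to see the vanishing half of the problem is to measure it directly. The sketch below is my own illustration, not from the article: a deliberately deep stack of sigmoid layers, where backpropagation multiplies the gradient by each layer's local derivative (at most 0.25 for a sigmoid), so gradient norms shrink layer by layer toward the input.

import torch
import torch.nn as nn

# A deliberately deep stack of small sigmoid layers. During backprop,
# each layer multiplies the incoming gradient by its local derivative
# (<= 0.25 for sigmoid), so gradients shrink toward the early layers.
layers = []
for _ in range(20):
    layers += [nn.Linear(16, 16), nn.Sigmoid()]
net = nn.Sequential(*layers)

x = torch.randn(8, 16)
net(x).sum().backward()

# Gradient norms decay roughly geometrically from output to input.
for i, layer in enumerate(net):
    if isinstance(layer, nn.Linear):
        print(f"layer {i:2d} grad norm: {layer.weight.grad.norm():.2e}")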

I get that vanishing gradients can be an issue because the lower layers’ parameters barely update, making learning difficult with such small gradients. But why are exploding gradients problematic too? Wouldn’t they provide substantial updates for the later layers?
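
The flip side is just as easy to provoke. In this hedged sketch (again mine, with an exaggerated initialization purely for illustration), oversized weights make each backward step multiply the gradient by a large-weight matrix, so the norms blow up toward the early layers and a single optimizer step can overshoot badly.

import torch
import torch.nn as nn

# A deep linear stack with deliberately oversized initial weights.
# Each backward step multiplies the gradient by a large-weight matrix,
# so gradient norms grow explosively toward the early layers.
net = nn.Sequential(*[nn.Linear(16, 16) for _ in range(20)])
for layer in net:
    nn.init.normal_(layer.weight, std=2.0)  # too large on purpose

x = torch.randn(8, 16)
net(x).sum().backward()

for i, layer in enumerate(net):
    print(f"layer {i:2d} grad norm: {layer.weight.grad.norm():.2e}")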

Why is the oscillation in weight updates caused by exploding gradients harmful? I understand that Stochastic Gradient Descent (SGD) with small batch sizes also leads to oscillation, so why doesn't that cause similar issues?

So how do we know whether a gradient is too large or too small? How do we detect these problems?
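
In practice, one common way to detect both problems is to log gradient norms during training and watch for values drifting toward zero, blowing up, or turning NaN. A minimal sketch, assuming the model and training loop from the PyTorch example above:

# Inside the training loop, right after loss.backward():
total_sq = 0.0
for name, p in model.named_parameters():
    if p.grad is not None:
        norm = p.grad.norm().item()
        total_sq += norm ** 2
        # Norms near zero suggest vanishing gradients;
        # huge or NaN norms suggest exploding gradients.
        print(f"{name}: grad norm = {norm:.2e}")
print(f"total grad norm = {total_sq ** 0.5:.2e}")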

Are you suggesting that, in DNNs, we aim for all layers to learn at the same pace?

How do we generally address the issue of unstable gradients when training DNNs?
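
One standard remedy, at least for the exploding side, is gradient clipping, which rescales the gradient whenever its global norm exceeds a threshold. A minimal sketch of how it would slot into the training loop above (the max_norm value of 1.0 is an illustrative choice, tuned per task in practice):

optimizer.zero_grad()
loss.backward()
# Rescale all gradients in place so their combined L2 norm
# is at most max_norm, preventing oversized parameter updates.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()

Other common tools include careful weight initialization, non-saturating activations such as ReLU, and batch normalization.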

Image created by the author using ChatGPT.
