Neeharika Patel
Python's Gurus
2 min read · Jul 3, 2024


โš› ๐—ง๐—ต๐—ฒ ๐—›๐—ฒ๐—ฎ๐—ฟ๐˜๐—ฏ๐—ฒ๐—ฎ๐˜ ๐—ผ๐—ณ ๐—ก๐—ฒ๐˜‚๐—ฟ๐—ฎ๐—น ๐—ก๐—ฒ๐˜๐˜„๐—ผ๐—ฟ๐—ธ: ๐—š๐—ฟ๐—ฎ๐—ฑ๐—ถ๐—ฒ๐—ป๐˜ ๐——๐—ฒ๐˜€๐—ฐ๐—ฒ๐—ป๐˜ ๐Ÿ“‰

Gradient descent is the core optimization algorithm behind training neural networks: it repeatedly adjusts the model parameters in the direction that reduces the loss. Choosing among its variants is a matter of balancing training performance and efficiency, especially when dealing with different dataset sizes and computational constraints.
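
All three variants discussed below apply the same update rule; they differ only in how many training examples are used to estimate the gradient at each step:

θ ← θ − η · ∇_θ L(θ)

Here θ denotes the model parameters, η the learning rate, and ∇_θ L(θ) the gradient of the loss L with respect to θ.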

๐Ÿญ. ๐—•๐—ฎ๐˜๐—ฐ๐—ต ๐—š๐—ฟ๐—ฎ๐—ฑ๐—ถ๐—ฒ๐—ป๐˜ ๐——๐—ฒ๐˜€๐—ฐ๐—ฒ๐—ป๐˜:

Batch Gradient Descent computes the gradient of the loss function with respect to the entire dataset. It updates the model parameters after processing all the training examples.

๐—”๐—ฑ๐˜ƒ๐—ฎ๐—ป๐˜๐—ฎ๐—ด๐—ฒ๐˜€:

- Converges to the minimum more smoothly because it uses the entire dataset.

- Efficient for small to medium-sized datasets as it leverages vectorized operations.

๐——๐—ถ๐˜€๐—ฎ๐—ฑ๐˜ƒ๐—ฎ๐—ป๐˜๐—ฎ๐—ด๐—ฒ๐˜€:

- Can be very slow and computationally expensive for large datasets.

- Requires the entire dataset to fit into memory, which can be impractical for very large datasets.
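
To make this concrete, here is a minimal NumPy sketch of Batch Gradient Descent for plain linear regression with a (halved) mean-squared-error loss. The model, loss, and hyperparameter values are illustrative assumptions, not part of the algorithm itself; the point is that the gradient is averaged over every example before a single update is made.

```python
import numpy as np

def batch_gradient_descent(X, y, lr=0.01, epochs=100):
    """One parameter update per full pass over the dataset."""
    n_samples, n_features = X.shape
    w, b = np.zeros(n_features), 0.0
    for _ in range(epochs):
        error = X @ w + b - y                 # residuals for ALL examples
        w -= lr * (X.T @ error) / n_samples   # gradient of the 1/2-MSE loss w.r.t. w
        b -= lr * error.mean()                # gradient w.r.t. the bias
    return w, b
```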

๐Ÿฎ. ๐—ฆ๐˜๐—ผ๐—ฐ๐—ต๐—ฎ๐˜€๐˜๐—ถ๐—ฐ ๐—š๐—ฟ๐—ฎ๐—ฑ๐—ถ๐—ฒ๐—ป๐˜ ๐——๐—ฒ๐˜€๐—ฐ๐—ฒ๐—ป๐˜

Stochastic Gradient Descent (SGD) computes the gradient and updates the model parameters after each individual training example.

๐—”๐—ฑ๐˜ƒ๐—ฎ๐—ป๐˜๐—ฎ๐—ด๐—ฒ๐˜€:

- Faster and more memory-efficient, especially for large datasets.

- Can escape shallow local minima due to its noisier updates, which can help it reach a better solution.

๐——๐—ถ๐˜€๐—ฎ๐—ฑ๐˜ƒ๐—ฎ๐—ป๐˜๐—ฎ๐—ด๐—ฒ๐˜€:

- The updates can be noisy, leading to a more erratic convergence path.

- May require more iterations to converge compared to Batch Gradient Descent.
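
As a rough sketch using the same illustrative linear-regression setup as above, the defining difference is that the parameter update happens inside the loop over individual examples:

```python
import numpy as np

def stochastic_gradient_descent(X, y, lr=0.01, epochs=10, seed=0):
    """One parameter update per individual training example."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w, b = np.zeros(n_features), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n_samples):  # visit examples in a new random order each epoch
            error = X[i] @ w + b - y[i]       # residual for a SINGLE example
            w -= lr * error * X[i]            # immediate, noisy update
            b -= lr * error
    return w, b
```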

๐Ÿฏ. ๐— ๐—ถ๐—ป๐—ถ-๐—•๐—ฎ๐˜๐—ฐ๐—ต ๐—š๐—ฟ๐—ฎ๐—ฑ๐—ถ๐—ฒ๐—ป๐˜ ๐——๐—ฒ๐˜€๐—ฐ๐—ฒ๐—ป๐˜

Mini-Batch Gradient Descent is a compromise between Batch Gradient Descent and Stochastic Gradient Descent. It splits the dataset into small batches and computes the gradient for each batch, updating the model parameters after each mini-batch.

๐—”๐—ฑ๐˜ƒ๐—ฎ๐—ป๐˜๐—ฎ๐—ด๐—ฒ๐˜€:

- Offers a balance between Batch Gradient Descent's smooth, vectorized updates and Stochastic Gradient Descent's speed and memory efficiency.

- Often leads to more stable convergence compared to Stochastic Gradient Descent.

๐——๐—ถ๐˜€๐—ฎ๐—ฑ๐˜ƒ๐—ฎ๐—ป๐˜๐—ฎ๐—ด๐—ฒ๐˜€:

- The choice of mini-batch size can affect performance and convergence.

- Still requires tuning of hyperparameters like learning rate and batch size.
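
Continuing the same illustrative setup, a minimal mini-batch sketch shuffles the data, slices it into fixed-size chunks, and averages the gradient within each chunk (batch_size here is an assumed, tunable value):

```python
import numpy as np

def mini_batch_gradient_descent(X, y, lr=0.01, epochs=10, batch_size=32, seed=0):
    """One parameter update per mini-batch of examples."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w, b = np.zeros(n_features), 0.0
    for _ in range(epochs):
        order = rng.permutation(n_samples)           # shuffle, then slice into batches
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            error = X[idx] @ w + b - y[idx]          # residuals for one mini-batch
            w -= lr * (X[idx].T @ error) / len(idx)  # batch-averaged gradient
            b -= lr * error.mean()
    return w, b
```

Note that with batch_size=1 this reduces to Stochastic Gradient Descent, and with batch_size equal to the dataset size it reduces to Batch Gradient Descent, which is why the mini-batch variant is the one most commonly used in practice.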

โœจ ๐—ฆ๐˜‚๐—บ๐—บ๐—ฎ๐—ฟ๐˜† โœจ

๐—•๐—ฎ๐˜๐—ฐ๐—ต ๐—š๐—ฟ๐—ฎ๐—ฑ๐—ถ๐—ฒ๐—ป๐˜ ๐——๐—ฒ๐˜€๐—ฐ๐—ฒ๐—ป๐˜:

processes the entire dataset in one go, suitable for smaller datasets.

๐—ฆ๐˜๐—ผ๐—ฐ๐—ต๐—ฎ๐˜€๐˜๐—ถ๐—ฐ ๐—š๐—ฟ๐—ฎ๐—ฑ๐—ถ๐—ฒ๐—ป๐˜ ๐——๐—ฒ๐˜€๐—ฐ๐—ฒ๐—ป๐˜:

Updates parameters for each example, useful for large datasets and faster iterations.

๐— ๐—ถ๐—ป๐—ถ-๐—•๐—ฎ๐˜๐—ฐ๐—ต ๐—š๐—ฟ๐—ฎ๐—ฑ๐—ถ๐—ฒ๐—ป๐˜ ๐——๐—ฒ๐˜€๐—ฐ๐—ฒ๐—ป๐˜:

splits the dataset into smaller batches, offering a middle ground with more stable convergence and computational efficiency.

#BackwardPropagation #DataScience #GradientDescent #BatchGD #StochasticGD #MiniBatchGD #MachineLearning #DeepLearning #LossFunction #NeuralNetwork #ArtificialIntelligence #GenerativeAI #LLM
