Neeharika Patel
Python's Gurus
2 min read · Jul 3, 2024


โš› ๐—ง๐—ต๐—ฒ ๐—›๐—ฒ๐—ฎ๐—ฟ๐˜๐—ฏ๐—ฒ๐—ฎ๐˜ ๐—ผ๐—ณ ๐—ก๐—ฒ๐˜‚๐—ฟ๐—ฎ๐—น ๐—ก๐—ฒ๐˜๐˜„๐—ผ๐—ฟ๐—ธ: ๐—š๐—ฟ๐—ฎ๐—ฑ๐—ถ๐—ฒ๐—ป๐˜ ๐——๐—ฒ๐˜€๐—ฐ๐—ฒ๐—ป๐˜ ๐Ÿ“‰

Gradient descent is the core optimization algorithm behind training neural networks: it repeatedly adjusts the model parameters in the direction that reduces the loss. Choosing among its variants is a matter of balancing training performance and efficiency, especially when dealing with different dataset sizes and computational constraints.
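
All three variants discussed below apply the same update rule; they differ only in how many training examples are used to estimate the gradient at each step:

θ ← θ − η · ∇_θ L(θ)

Here θ denotes the model parameters, η the learning rate, and ∇_θ L(θ) the gradient of the loss L with respect to θ.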

๐Ÿญ. ๐—•๐—ฎ๐˜๐—ฐ๐—ต ๐—š๐—ฟ๐—ฎ๐—ฑ๐—ถ๐—ฒ๐—ป๐˜ ๐——๐—ฒ๐˜€๐—ฐ๐—ฒ๐—ป๐˜:

Batch Gradient Descent computes the gradient of the loss function with respect to the entire dataset. It updates the model parameters after processing all the training examples.

๐—”๐—ฑ๐˜ƒ๐—ฎ๐—ป๐˜๐—ฎ๐—ด๐—ฒ๐˜€:

- Converges to the minimum more smoothly because it uses the entire dataset.

- Efficient for small to medium-sized datasets as it leverages vectorized operations.

๐——๐—ถ๐˜€๐—ฎ๐—ฑ๐˜ƒ๐—ฎ๐—ป๐˜๐—ฎ๐—ด๐—ฒ๐˜€:

- Can be very slow and computationally expensive for large datasets.

- Requires the entire dataset to fit into memory, which can be impractical for very large datasets.
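
To make this concrete, here is a minimal NumPy sketch of Batch Gradient Descent for plain linear regression with a (halved) mean-squared-error loss. The model, loss, and hyperparameter values are illustrative assumptions, not part of the algorithm itself; the point is that the gradient is averaged over every example before a single update is made.

```python
import numpy as np

def batch_gradient_descent(X, y, lr=0.01, epochs=100):
    """One parameter update per full pass over the dataset."""
    n_samples, n_features = X.shape
    w, b = np.zeros(n_features), 0.0
    for _ in range(epochs):
        error = X @ w + b - y                 # residuals for ALL examples
        w -= lr * (X.T @ error) / n_samples   # gradient of the 1/2-MSE loss w.r.t. w
        b -= lr * error.mean()                # gradient w.r.t. the bias
    return w, b
```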

๐Ÿฎ. ๐—ฆ๐˜๐—ผ๐—ฐ๐—ต๐—ฎ๐˜€๐˜๐—ถ๐—ฐ ๐—š๐—ฟ๐—ฎ๐—ฑ๐—ถ๐—ฒ๐—ป๐˜ ๐——๐—ฒ๐˜€๐—ฐ๐—ฒ๐—ป๐˜

Stochastic Gradient Descent (SGD) computes the gradient and updates the model parameters after each individual training example.

๐—”๐—ฑ๐˜ƒ๐—ฎ๐—ป๐˜๐—ฎ๐—ด๐—ฒ๐˜€:

- Faster and more memory-efficient, especially for large datasets.

- Can escape shallow local minima due to its noisier updates, which can help it reach a better solution.

๐——๐—ถ๐˜€๐—ฎ๐—ฑ๐˜ƒ๐—ฎ๐—ป๐˜๐—ฎ๐—ด๐—ฒ๐˜€:

- The updates can be noisy, leading to a more erratic convergence path.

- May require more iterations to converge compared to Batch Gradient Descent.
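
As a rough sketch using the same illustrative linear-regression setup as above, the defining difference is that the parameter update happens inside the loop over individual examples:

```python
import numpy as np

def stochastic_gradient_descent(X, y, lr=0.01, epochs=10, seed=0):
    """One parameter update per individual training example."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w, b = np.zeros(n_features), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n_samples):  # visit examples in a new random order each epoch
            error = X[i] @ w + b - y[i]       # residual for a SINGLE example
            w -= lr * error * X[i]            # immediate, noisy update
            b -= lr * error
    return w, b
```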

๐Ÿฏ. ๐— ๐—ถ๐—ป๐—ถ-๐—•๐—ฎ๐˜๐—ฐ๐—ต ๐—š๐—ฟ๐—ฎ๐—ฑ๐—ถ๐—ฒ๐—ป๐˜ ๐——๐—ฒ๐˜€๐—ฐ๐—ฒ๐—ป๐˜

Mini-Batch Gradient Descent is a compromise between Batch Gradient Descent and Stochastic Gradient Descent. It splits the dataset into small batches and computes the gradient for each batch, updating the model parameters after each mini-batch.

๐—”๐—ฑ๐˜ƒ๐—ฎ๐—ป๐˜๐—ฎ๐—ด๐—ฒ๐˜€:

- Offers a balance between Batch Gradient Descent's smooth, vectorized updates and Stochastic Gradient Descent's speed and memory efficiency.

- Often leads to more stable convergence compared to Stochastic Gradient Descent.

๐——๐—ถ๐˜€๐—ฎ๐—ฑ๐˜ƒ๐—ฎ๐—ป๐˜๐—ฎ๐—ด๐—ฒ๐˜€:

- The choice of mini-batch size can affect performance and convergence.

- Still requires tuning of hyperparameters like learning rate and batch size.
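
Continuing the same illustrative setup, a minimal mini-batch sketch shuffles the data, slices it into fixed-size chunks, and averages the gradient within each chunk (batch_size here is an assumed, tunable value):

```python
import numpy as np

def mini_batch_gradient_descent(X, y, lr=0.01, epochs=10, batch_size=32, seed=0):
    """One parameter update per mini-batch of examples."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w, b = np.zeros(n_features), 0.0
    for _ in range(epochs):
        order = rng.permutation(n_samples)           # shuffle, then slice into batches
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            error = X[idx] @ w + b - y[idx]          # residuals for one mini-batch
            w -= lr * (X[idx].T @ error) / len(idx)  # batch-averaged gradient
            b -= lr * error.mean()
    return w, b
```

Note that with batch_size=1 this reduces to Stochastic Gradient Descent, and with batch_size equal to the dataset size it reduces to Batch Gradient Descent, which is why the mini-batch variant is the one most commonly used in practice.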

โœจ ๐—ฆ๐˜‚๐—บ๐—บ๐—ฎ๐—ฟ๐˜† โœจ

๐—•๐—ฎ๐˜๐—ฐ๐—ต ๐—š๐—ฟ๐—ฎ๐—ฑ๐—ถ๐—ฒ๐—ป๐˜ ๐——๐—ฒ๐˜€๐—ฐ๐—ฒ๐—ป๐˜:

processes the entire dataset in one go, suitable for smaller datasets.

๐—ฆ๐˜๐—ผ๐—ฐ๐—ต๐—ฎ๐˜€๐˜๐—ถ๐—ฐ ๐—š๐—ฟ๐—ฎ๐—ฑ๐—ถ๐—ฒ๐—ป๐˜ ๐——๐—ฒ๐˜€๐—ฐ๐—ฒ๐—ป๐˜:

Updates parameters for each example, useful for large datasets and faster iterations.

๐— ๐—ถ๐—ป๐—ถ-๐—•๐—ฎ๐˜๐—ฐ๐—ต ๐—š๐—ฟ๐—ฎ๐—ฑ๐—ถ๐—ฒ๐—ป๐˜ ๐——๐—ฒ๐˜€๐—ฐ๐—ฒ๐—ป๐˜:

splits the dataset into smaller batches, offering a middle ground with more stable convergence and computational efficiency.

#BackwardPropagation #DataScience #GradientDescent #BatchGD #StochasticGD #MiniBatchGD #MachineLearning #DeepLearning #LossFunction #NeuralNetwork #ArtificialIntelligence #GenerativeAI #LLM
