SGD: The Fundamental Steps of Deep Learning

Pinkesh Patel, MBA
Published in unpack · Jun 6, 2021

In the AI training process, there is a system that can automatically modify itself to improve its performance, known as ‘Stochastic Gradient Descent (SGD)’. The word ‘stochastic’ refers to a process that involves random probability. Hence, in stochastic gradient descent, a few samples are selected randomly for each iteration instead of the whole dataset. We will review the basic principles and fundamental steps of SGD in this article.
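To make the “stochastic” part concrete, here is a minimal sketch (using PyTorch, with made-up random tensors standing in for real training images and labels) of selecting a random mini-batch for one iteration:

```python
import torch

n_samples, n_features, batch_size = 1000, 28*28, 64
X = torch.randn(n_samples, n_features)   # stand-in for flattened training images
y = torch.randint(0, 2, (n_samples,))    # stand-in for the labels

idx = torch.randperm(n_samples)[:batch_size]  # pick a random mini-batch
xb, yb = X[idx], y[idx]                       # only these samples are used this iteration
```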

Assume we set up an automated method of evaluating the effectiveness of any current weight assignment in terms of actual performance, as well as a mechanism for changing the weight assignment to maximize performance. We do not need to go into the specifics of such a technique to see how it could be completely automated, and how a machine programmed in this way could “learn” from its experience. This is the key to having a model that can improve over time, one that can learn. However, our pixel similarity method falls short: it has no weight assignment, and no way to improve based on testing the effectiveness of a weight assignment. In other words, we cannot truly improve our pixel similarity method by tweaking a few parameters.

The digits 3 and 7

Instead of attempting to measure the similarity between an image and an “ideal image,” we could examine each individual pixel and assign a weight to each one, with the highest weights given to the pixels most likely to be black for a certain category. For example, pixels in the bottom right corner are unlikely to be activated for a 7, so they should be given a low weight.
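As a rough sketch of this idea (using a hypothetical flattened 28×28 image and untrained random weights, just for illustration), the score for a category is simply a weighted sum over the pixels:

```python
import torch

image = torch.rand(28*28)      # hypothetical flattened 28x28 image, values in [0, 1]
weights = torch.randn(28*28)   # one weight per pixel (random, i.e. untrained, here)

# Pixels that are usually dark for a category should end up with large
# weights; the score for that category is the weighted sum over all pixels.
score = (image * weights).sum()
print(score)
```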

There are specific steps we need to take to turn this function into a machine learning classifier. Howard et al. describe the following steps for this process:

The fundamental steps of deep learning

1. Initialize the weights.

2. For each image, use these weights to predict whether it appears to be a 3 or a 7.

3. Based on these predictions, calculate how good the model is (its loss).

4. Calculate the gradient, which measures, for each weight, how changing that weight would change the loss.

5. Step (that is, change) all the weights based on that calculation.

6. Go back to step 2 and repeat the process.

7. Iterate until you decide to stop the training process (for instance, because the model is good enough or you do not want to wait any longer).

All deep learning models are trained using these seven steps, sketched in the code below. It is surprising that deep learning relies solely on these stages, and incredible that this approach can solve such difficult problems.
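Here is a minimal, self-contained sketch of the seven steps as a PyTorch training loop. The data is random stand-in data (in the book, these would be flattened images of 3s and 7s), and the model, a single linear layer with a sigmoid, is an assumption for illustration, not the book's exact code:

```python
import torch

# Stand-in data: in the book these would be flattened 28x28 images of 3s and 7s.
X = torch.randn(1000, 28*28)
y = torch.randint(0, 2, (1000,)).float()  # 1 = "it's a 3", 0 = "it's a 7" (say)

# Step 1: initialize the weights randomly.
weights = torch.randn(28*28, requires_grad=True)
bias = torch.zeros(1, requires_grad=True)

lr = 0.1  # learning rate: how big a step to take

for epoch in range(20):                       # step 7: iterate until we decide to stop
    # Step 2: use these weights to predict each image's class.
    preds = torch.sigmoid(X @ weights + bias)
    # Step 3: based on these predictions, calculate the loss.
    loss = torch.nn.functional.binary_cross_entropy(preds, y)
    # Step 4: calculate the gradient of the loss for each weight.
    loss.backward()
    # Step 5: step the weights in the direction that lowers the loss.
    with torch.no_grad():
        weights -= lr * weights.grad
        bias -= lr * bias.grad
        weights.grad.zero_()
        bias.grad.zero_()
    # Step 6: the loop goes back to step 2 and repeats.
```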

Each of these seven steps can be carried out in a variety of ways. For deep learning practitioners, these are the nuances that make all the difference, but it turns out that the overall approach to each one follows some basic principles. Here are some pointers:

Initialize: The parameters are initialized to random values. This may come as a surprise. There are certainly other choices, such as initializing a pixel’s weight to the percentage of times that pixel is activated for that category, but since we already know we have a procedure to improve these weights, it turns out that starting with random weights works perfectly well.
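In PyTorch, random initialization is a one-liner (a sketch, assuming one weight per pixel of a 28×28 image); requires_grad_() marks the tensors so gradients can be computed for them later:

```python
import torch

# One random weight per pixel of a 28x28 image, plus a bias;
# requires_grad_() tells PyTorch to track gradients for them.
weights = torch.randn(28*28).requires_grad_()
bias = torch.randn(1).requires_grad_()
```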

Loss: When Arthur Samuel mentioned measuring the effectiveness of any current weight assignment in terms of actual performance, he was referring to this. We need a function that returns a small number when the model’s performance is good (the standard approach is to treat a small loss as good and a large loss as bad, although this is just a convention).
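As a minimal sketch in this spirit (assuming predictions have already been squashed into [0, 1], for example by a sigmoid, and targets are 0 or 1), a loss can simply measure how far each prediction is from its target:

```python
import torch

def simple_loss(predictions, targets):
    # predictions: values in [0, 1]; targets: 0 or 1.
    # The result is small when predictions are close to the targets
    # (small loss = good, by convention).
    return torch.where(targets == 1, 1 - predictions, predictions).mean()

# A pair of fairly good predictions gives a small loss:
print(simple_loss(torch.tensor([0.9, 0.2]), torch.tensor([1, 0])))  # tensor(0.1500)
```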

Step: A simple way to determine whether a weight should be increased or decreased is to just try it: increase the weight by a small amount and see whether the loss goes up or down. Once you have found the correct direction, you could adjust the amount a little more or a little less until you find a value that works well. However, this is a slow process! Fortunately, calculus lets us compute the gradient directly, which tells us in one pass in which direction, and roughly by how much, to change each weight.
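A toy example of that calculus shortcut: for the loss (w − 5)², which is minimized at w = 5, PyTorch’s autograd reports the gradient at w = 2 as −6, so we know immediately that increasing w lowers the loss:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

loss = (w - 5) ** 2      # a toy loss, minimized at w = 5
loss.backward()          # autograd computes d(loss)/dw = 2*(w - 5)

print(w.grad)            # tensor(-6.): negative, so increasing w lowers the loss

lr = 0.1
with torch.no_grad():
    w -= lr * w.grad     # one step "downhill": w becomes 2.6, closer to 5
print(w)
```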

Stop: Once we have decided how many epochs to train the model for, we apply that decision. For our digit classifier, we would keep training until the model’s accuracy began to get worse or we ran out of time.
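One common stopping rule, sketched below with hypothetical stand-in helpers (not the book’s code), is to stop when validation accuracy has not improved for a few epochs:

```python
import random

def train_one_epoch():
    pass                      # stand-in for a real training epoch

def validation_accuracy():
    return random.random()    # stand-in: returns a simulated accuracy

best_acc, patience, bad_epochs = 0.0, 2, 0

for epoch in range(100):
    train_one_epoch()
    acc = validation_accuracy()
    if acc > best_acc:
        best_acc, bad_epochs = acc, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:   # no improvement for too long: stop
            break
```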

Overall, these are the basic principles and fundamental steps of SGD. The author believes this overview will be helpful to AI professionals starting to work on their own models.

References

1. Howard, J. and Gugger, S. (2020). Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD. 1st ed. Sebastopol, CA: O’Reilly Media, Inc.

About Author: Pinkesh Patel

Pinkesh has over 16 years of experience in R&D, portfolio management, and business development in the life science and retail industries. He is a mentor and investor at the gold and diamond jewelry firm ‘Proyasha Diamonds’. Pinkesh received a B.A. Honours in Pharmacology from London Metropolitan University and an MBA from Anglia Ruskin University.
