# How Should I Feed my Gradient Descent?

*Gradient Descent is always hungry for data. How should we feed that data to it? There are three common ways to feed data to Gradient Descent (GD): Batch, Stochastic, and Mini-Batch.*

*Is there a best one? Why do machine learning practitioners lean towards Mini-Batch GD? We will experiment with all three and excavate the advantages and disadvantages of each for ourselves.*

## INTRODUCTION

Batch Gradient Descent is when we feed the ENTIRE dataset to calculate the gradient. Once the gradient is calculated, GD takes a step towards the minimum; one pass through the data produces one parameter update.

Stochastic Gradient Descent is when we feed only ONE data point to calculate the gradient. GD uses that gradient to take a step, then selects another data point and repeats the process until it has gone through all the data.

Mini-Batch Gradient Descent is when we feed a SMALL BATCH of data (e.g., 64 points) to calculate the gradient. This technique splits the data into batches and feeds one batch at a time to GD until it has gone through all the data.
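The three flavors can be viewed as one algorithm with a different batch size. Below is a minimal NumPy sketch for a linear model with mean-squared-error loss; the function names and learning rate are illustrative, not taken from the post's actual code:

```python
import numpy as np

def gradient_step(X_batch, y_batch, W, b, lr=0.1):
    """One GD step on a batch: MSE gradients for the model y = XW + b."""
    n = len(X_batch)
    error = X_batch @ W + b - y_batch       # residuals, shape (n, 1)
    grad_W = 2 / n * X_batch.T @ error      # gradient w.r.t. W
    grad_b = 2 / n * error.sum()            # gradient w.r.t. b
    return W - lr * grad_W, b - lr * grad_b

def run_epoch(X, y, W, b, batch_size, lr=0.1):
    """One pass over the data; batch_size selects the GD flavor."""
    idx = np.random.permutation(len(X))     # shuffle each epoch
    for start in range(0, len(X), batch_size):
        sel = idx[start:start + batch_size]
        W, b = gradient_step(X[sel], y[sel], W, b, lr)
    return W, b
```

Calling `run_epoch` with `batch_size=len(X)` gives Batch GD (one update per epoch), `batch_size=1` gives Stochastic GD (one update per data point), and `batch_size=64` gives Mini-Batch GD.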

## DATA

We will work on a simple regression problem using synthesized data. The features are sampled uniformly at random, and the targets are a linear combination of those features plus some noise (sampled from a standard normal distribution). The training and validation sets each contain 128,000 data points.

```python
import numpy as np

# Setting the seed
np.random.seed(0)

# Dataset dimensions (128,000 observations of a single feature)
number_of_observations = 128000
number_of_features = 1

# Creating observations
X_train = np.random.rand(number_of_observations, number_of_features)
X_valid = np.random.rand(number_of_observations, number_of_features)

# Instantiating the parameters W and b
W = np.round(np.random.rand(number_of_features, 1) * 10, 0)
b = np.round(np.random.rand(1, 1), 0)

# Creating some noise (sampled separately for each set)
noise_train = np.random.randn(number_of_observations, 1)
noise_valid = np.random.randn(number_of_observations, 1)

# Creating y by doing XW + b plus some noise
y_train = np.dot(X_train, W) + b + noise_train
y_valid = np.dot(X_valid, W) + b + noise_valid

# Printing coefficients
print(f"True W: {W[0][0]} \nTrue b: {b[0][0]}")
```

Visually, there is a linear relationship between X and y with some noise. (Not surprising, since we are using a linear model to generate the data XD).
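In place of a plot, the strength of that linear relationship can also be checked numerically with a correlation coefficient. A small self-contained sketch (the true parameter values here are illustrative, not the randomly generated ones above):

```python
import numpy as np

np.random.seed(0)
X_train = np.random.rand(128000, 1)
W_true, b_true = 5.0, 1.0  # illustrative values; the post draws W and b at random
y_train = X_train * W_true + b_true + np.random.randn(128000, 1)

# Pearson correlation between the single feature and the target
r = np.corrcoef(X_train.ravel(), y_train.ravel())[0, 1]
print(f"correlation between X and y: {r:.3f}")
```

A correlation well above zero confirms the strong linear trend that the scatter plot shows.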

## EXPERIMENT

Now for the experiment: we train three simple linear regression models (implemented in TensorFlow). Each model has the same architecture; the only difference is how we feed the data to the GD algorithm. For Batch GD, we feed all the data at once. For Stochastic GD, we feed one data point at a time. For Mini-Batch GD, we feed 64 data points at a time. The models are trained for up to 100 epochs with early stopping. The results of the models after training are shown below:

Based on this graph, we can see that Batch GD (orange) hasn't learned enough within the 100 epochs. The Stochastic GD (green) and Mini-Batch GD (purple) fits are almost identical, and both seem to represent the data well. Let us look at the loss and the time it took to finish training.

Based on the Losses for Batch Type (left), Stochastic GD converged in the fewest epochs (7, to be exact). Mini-Batch GD followed at 32 epochs. Batch GD came last: it was not able to converge within 100 epochs.

However, if we look at the Elapsed Time for Training (right), Stochastic GD took the longest time to finish training. Even though it needed the fewest epochs to converge, each epoch was expensive, and that makes sense: we feed one data point at a time, calculate its gradient, update the parameters, pick another data point, and repeat. Converging in few epochs is little comfort when every epoch requires 128,000 separate, non-vectorized updates. Batch GD computes its gradient the fastest (thanks to vectorization) but needs many epochs, i.e., many passes through the data, to converge. Mini-Batch GD converges almost as quickly (in epochs) as Stochastic GD while computing almost as fast as Batch GD.
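The gap comes from vectorization: one gradient over the full dataset is a couple of large matrix operations, while one-point-at-a-time gradients pay per-iteration overhead 128,000 times per epoch. A rough, machine-dependent illustration (timing only the gradient computations, not the parameter updates):

```python
import time
import numpy as np

np.random.seed(0)
X = np.random.rand(128000, 1)
y = 3 * X + 1
W = np.zeros((1, 1))

# One vectorized gradient over the whole dataset (Batch GD style)
t0 = time.perf_counter()
grad_full = 2 / len(X) * X.T @ (X @ W - y)
t_batch = time.perf_counter() - t0

# 128,000 one-point gradients, one after another (Stochastic GD style)
t0 = time.perf_counter()
for i in range(len(X)):
    xi, yi = X[i:i + 1], y[i:i + 1]
    grad_i = 2 * xi.T @ (xi @ W - yi)
t_sgd = time.perf_counter() - t0

print(f"one epoch of gradients -- batch: {t_batch:.4f}s, one-at-a-time: {t_sgd:.4f}s")
```

On typical hardware, the vectorized version finishes orders of magnitude faster, even though both compute the same total amount of arithmetic.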

## CONCLUSION

In summary, Batch GD is computationally the fastest per epoch but needs the most epochs to converge. Stochastic GD converges in the fewest epochs but takes the longest wall-clock time because of its one-point-at-a-time updates. Mini-Batch GD falls in between: it converges in few epochs and in a short amount of time. Hopefully this blog visually explains why machine learning practitioners lean towards Mini-Batch Gradient Descent (and sometimes Batch).

Code used for this blog (GitHub)

A special thanks to Benedict Emoekabu for helping out with this blog!