# How Should I Feed my Gradient Descent?

## INTRODUCTION

Batch Gradient Descent is when we feed the ENTIRE dataset to calculate the gradients. Once the gradients are calculated, gradient descent takes a single step towards the minimum.
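The idea above can be sketched in a few lines of NumPy. This is a minimal, hedged illustration (not the post's implementation): one Batch GD update for a linear model under a mean-squared-error loss, where the gradients are computed over the whole dataset before a single step is taken.

```python
import numpy as np

def batch_gd_step(X, y, W, b, lr=0.1):
    """One Batch GD step: gradients use the ENTIRE dataset, then one update."""
    n = X.shape[0]
    y_hat = X @ W + b                  # predictions for all n points at once
    error = y_hat - y                  # (n, 1) residuals
    grad_W = (2 / n) * X.T @ error     # gradient of MSE w.r.t. W
    grad_b = (2 / n) * error.sum()     # gradient of MSE w.r.t. b
    return W - lr * grad_W, b - lr * grad_b
```

Because every step sees all the data, the gradient is exact, but each step is only taken after a full pass over the dataset.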

## DATA

We will work on a simple regression problem using synthesized data. The features are sampled randomly. The targets are a linear combination of those features plus some noise (randomly sampled from a normal distribution). The training and validation sets will each contain 128,000 data points.

```python
import numpy as np

number_of_observations = 128000
number_of_features = 3  # the post does not state the feature count; 3 is an example value

# Setting the seed
np.random.seed(0)

# Creating observations
X_train = np.random.rand(number_of_observations, number_of_features)
X_valid = np.random.rand(number_of_observations, number_of_features)

# Instantiating the parameters W and b
W = np.round(np.random.rand(number_of_features, 1) * 10, 0)
b = np.round(np.random.rand(1, 1), 0)

# Creating some noise
noise = np.random.randn(number_of_observations, 1)

# Creating y by doing XW + b and some noise
y_train = np.dot(X_train, W) + b + noise
y_valid = np.dot(X_valid, W) + b + noise

# Printing coefficients
print(f"True W: {W} \nTrue b: {b}")
```

## EXPERIMENT

Now for the experiment, we train three simple linear regression models (implemented in TensorFlow). Each model has the same architecture; the only difference is how we feed the data into the GD algorithm. For Batch GD, we feed all the data at once. For Stochastic GD, we feed one data point at a time. For Mini-Batch GD, we feed 64 data points at a time. The models are trained for 100 epochs with early stopping. The results of the models after training are shown below:
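To make the "only difference is the batch size" point concrete, here is a minimal NumPy sketch of the training loop (the post itself uses TensorFlow, where this corresponds to the `batch_size` argument of `model.fit`). The learning rate, epoch count, and shuffling scheme below are illustrative assumptions, not the post's exact settings.

```python
import numpy as np

def train(X, y, batch_size, lr=0.1, epochs=50, seed=0):
    """Train a linear model; the batch size selects the GD flavor:
    batch_size = len(X) -> Batch GD, 1 -> Stochastic GD, 64 -> Mini-Batch GD."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W, b = np.zeros((d, 1)), 0.0
    for _ in range(epochs):
        idx = rng.permutation(n)               # reshuffle once per epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            error = Xb @ W + b - yb            # residuals on this batch only
            W -= lr * (2 / len(batch)) * Xb.T @ error
            b -= lr * (2 / len(batch)) * error.sum()
    return W, b
```

Smaller batches mean more (noisier) parameter updates per epoch; larger batches mean fewer, more exact updates computed with efficient vectorized operations.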

## CONCLUSION

In summary, Batch GD is computationally the cheapest per epoch but needs more epochs to converge. Stochastic GD converges in the fewest epochs, but updating the gradients one point at a time makes it computationally slow overall. Mini-Batch GD falls in between: it converges quickly while keeping each epoch cheap. Hopefully this blog visually explains why machine learning practitioners lean towards Mini-Batch Gradient Descent (and sometimes Batch).
