Batch gradient descent algorithm using NumPy's einsum

ManiShankar Singh
Published in Nerd For Tech · 4 min read · Jun 27, 2021

This post makes use of the Einstein summation technique. No, Einstein didn't invent it; he applied the notation throughout his paper on the general theory of relativity. The underlying calculus was developed by Gregorio Ricci-Curbastro, well known for Ricci calculus, or "tensor analysis" as it is called now!

Recently I discovered NumPy's einsum and learned how powerful it is in terms of speed and memory efficiency. It is also a succinct way of expressing linear-algebra operations on matrices and vectors.
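For readers who haven't met `einsum` before, here is a minimal sketch (my own illustration, not taken from the notebook) of a few common operations written with it:

```python
import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 5)
x = np.random.rand(4)

# Matrix-matrix product: equivalent to A @ B
C = np.einsum('ij,jk->ik', A, B)

# Matrix-vector product: equivalent to A @ x
y = np.einsum('ij,j->i', A, x)

# Dot product of a vector with itself: equivalent to x @ x
s = np.einsum('i,i->', x, x)
```

The index string spells out exactly which axes are multiplied and which are summed, which is what makes the notation so compact.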

So I decided to implement batch gradient descent using it. My code takes the California housing data as input, which has 20,640 records in total. In my implementation, if the batch size is not given, it defaults to 20,640, i.e. the whole dataset is processed at once. When a batch size n is given, the records are split into batches of size n and gradient descent is run batch by batch.

ENOUGH TALK! SHOW ME THE CODE!!

Here is the link to my colab notebook.

Below is a picture of the batch gradient descent algorithm using NumPy's `einsum`. The only limitation in my implementation is that the total number of records in your data X must be a multiple of `batch_size`, i.e.

Data records count ‘m’ should be divisible by batch_size — Image owned by Author
Batch gradient descent implemented using NumPy’s einsum — Image owned by Author
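Since the actual code lives in the notebook screenshot above, here is a hedged sketch of what an einsum-based batch gradient descent for linear regression might look like. The function and parameter names (`batch_gradient_descent`, `learning_rate`, `epochs`, `batch_size`) are my assumptions, not the author's exact code:

```python
import numpy as np

def batch_gradient_descent(X, y, learning_rate=0.01, epochs=1000, batch_size=None):
    """Linear-regression batch gradient descent expressed with np.einsum.

    Hypothetical sketch, not the author's exact implementation. X is assumed
    to already contain a bias column if an intercept is wanted.
    """
    m, f = X.shape
    if batch_size is None:
        batch_size = m                         # default: the whole dataset at once
    assert m % batch_size == 0, "record count must be a multiple of batch_size"

    n_batches = m // batch_size
    Xb = X.reshape(n_batches, batch_size, f)   # (batches, batch_size, features)
    yb = y.reshape(n_batches, batch_size)

    theta = np.zeros(f)
    costs = []
    for _ in range(epochs):
        for i in range(n_batches):
            # predictions for this batch: equivalent to Xb[i] @ theta
            preds = np.einsum('nf,f->n', Xb[i], theta)
            error = preds - yb[i]
            # gradient: equivalent to Xb[i].T @ error / batch_size
            grad = np.einsum('nf,n->f', Xb[i], error) / batch_size
            theta -= learning_rate * grad
        # mean squared error over the full dataset, also via einsum
        residual = np.einsum('mf,f->m', X, theta) - y
        costs.append(np.einsum('m,m->', residual, residual) / (2 * m))
    return theta, costs
```

Note how the divisibility requirement shows up naturally: the data is reshaped into a (batches, batch_size, features) array, which only works when the record count is a multiple of `batch_size`.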
Data normalization before passing it to the gradient descent algorithm — Image owned by Author
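The normalization step is also shown as an image; a typical standardization (z-score) that would fit here, assuming the features and target live in NumPy arrays `X` and `y`, is:

```python
# Standardize each feature and the target: zero mean, unit variance.
# (A common choice; the exact normalization used in the notebook is in the image above.)
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
y_norm = (y - y.mean()) / y.std()
```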
Invocation with batch size 10 and the cost decay with each iteration — Image owned by Author
Invocation with batch size 1 and the cost decay with each iteration — Image owned by Author
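An invocation along these lines would reproduce runs like the two above (again a hedged sketch built on the `batch_gradient_descent` sketch earlier; `fetch_california_housing` is scikit-learn's loader for the dataset mentioned at the start, and the plotting is optional):

```python
from sklearn.datasets import fetch_california_housing
import matplotlib.pyplot as plt

# Load the California housing data (20,640 records) and standardize it.
data = fetch_california_housing()
X = (data.data - data.data.mean(axis=0)) / data.data.std(axis=0)
y = (data.target - data.target.mean()) / data.target.std()

# 20,640 is divisible by 10, so batch_size=10 satisfies the limitation above.
# A modest number of epochs is used here just for illustration.
theta, costs = batch_gradient_descent(X, y, learning_rate=0.1,
                                      epochs=100, batch_size=10)

plt.plot(costs)
plt.xlabel('epoch')
plt.ylabel('cost')
plt.show()
```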

Now you might ask why one should opt for `np.einsum` instead of the usual NumPy constructs when implementing batch gradient descent. Let's try to implement batch gradient descent without the einsum method.

Normal batch gradient descent without using einsum — Image owned by Author
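A sketch of what such a loop-based version tends to look like (the names are again my assumptions, and, like the author's version, it handles only a single 1-D feature vector):

```python
def batch_gradient_descent_loops(x, y, learning_rate=0.01, epochs=1000, batch_size=None):
    """Loop-based batch gradient descent for a single 1-D feature vector x.

    Hypothetical sketch of the 'no einsum' version shown in the image above.
    """
    m = len(x)
    if batch_size is None:
        batch_size = m
    n_batches = m // batch_size

    w, b = 0.0, 0.0
    costs = []
    for _ in range(epochs):
        for i in range(n_batches):
            start, end = i * batch_size, (i + 1) * batch_size
            grad_w, grad_b = 0.0, 0.0
            # accumulate gradients element by element
            for j in range(start, end):
                error = (w * x[j] + b) - y[j]
                grad_w += error * x[j]
                grad_b += error
            w -= learning_rate * grad_w / batch_size
            b -= learning_rate * grad_b / batch_size
        # cost over the full data, again with an explicit Python loop
        total = 0.0
        for j in range(m):
            total += ((w * x[j] + b) - y[j]) ** 2
        costs.append(total / (2 * m))
    return w, b, costs
```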

Just look at the size of the code and the number of for loops used in the plain implementation without einsum. Although it may look approachable for newcomers to machine learning, the time and memory needed to execute this code shoot through the roof. I am also ignoring the fact that the plain implementation only processes 1-D feature vectors, not complete dataframes.

Here is the time taken when using the einsum method:

Time taken for batch gradient descent using einsum with 1000 epochs and a batch size of 80 — Image owned by Author

And here is the time taken when not using the einsum method:

Time taken for batch gradient descent NOT using einsum with 1000 epochs and a batch size of 80 — Image owned by Author
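The numbers in the two screenshots come from the notebook; a snippet along these lines (reusing the two sketches and the `X`, `y` arrays from above) would reproduce the comparison, though the exact figures will of course differ per machine:

```python
import timeit

# Time the einsum version (1000 epochs, batch size 80; 20,640 % 80 == 0).
t_einsum = timeit.timeit(
    lambda: batch_gradient_descent(X, y, epochs=1000, batch_size=80),
    number=1)

# Time the loop version on a single feature column, since it only handles 1-D input.
t_loops = timeit.timeit(
    lambda: batch_gradient_descent_loops(X[:, 0], y, epochs=1000, batch_size=80),
    number=1)

print(f"einsum: {t_einsum:.2f}s, loops: {t_loops:.2f}s, ratio: {t_loops / t_einsum:.0f}x")
```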

The plain implementation is slower than the einsum implementation by a whopping factor of about 1,460 on my machine. That is 146,022% slower. Phew!

I agree that the batch gradient descent implementation without einsum could be improved further. But it is still cumbersome and error-prone, and that style of code doesn't scale when used in neural networks.

If this timing comparison doesn't convince you to start using the einsum method, try implementing any simple neural network both with and without einsum. You will quickly start appreciating its simplicity, speed, memory efficiency and expressiveness.

Please clap generously (it doesn't cost you a thing) and share your comments below to let me know how you liked this post.

I am a machine learning engineer at TCS, and my Digital Software & Solutions team is building amazing products. Check out more about our products in the link below:
