Batch gradient descent algorithm using Numpy’s einsum
This post uses the Einstein summation technique. No, Einstein didn't invent it — he applied it to express his complete paper on the general theory of relativity. The underlying calculus was developed by Gregorio Ricci-Curbastro, well known for Ricci calculus, or "tensor analysis" as it is called now!
I recently discovered NumPy's `einsum` and learned how powerful it is in terms of speed and memory efficiency. It is also succinct in expressing linear-algebraic operations on matrices and vectors.
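To give a feel for that succinctness, here is a small sketch (my own illustrative example, not from the notebook) showing the two contractions that show up in linear-regression gradients:

```python
import numpy as np

X = np.arange(12, dtype=float).reshape(4, 3)  # 4 samples, 3 features
w = np.array([1.0, 2.0, 3.0])

# Matrix–vector product: contract over the shared feature index j
pred = np.einsum('ij,j->i', X, w)             # equivalent to X @ w

# Gradient-style contraction X^T r: contract over the sample index i
r = pred - np.ones(4)
grad = np.einsum('ij,i->j', X, r)             # equivalent to X.T @ r
```

The subscript string reads like the math on paper: repeated indices are summed over, and the indices after `->` are what survives.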
So I decided to implement batch gradient descent using it. My code takes the California housing data as input, which has 20,640 records in total. In my implementation, if the batch size is not given, it defaults to 20,640, i.e. all records are processed at once. When a batch size n is given, the records are split into batches of n records each and gradient descent is run on those batches.
ENOUGH TALK! SHOW ME CODE!!
Here is the link to my colab notebook.
Below is the batch gradient descent algorithm using NumPy's `einsum`. The only limitation in my implementation is that the total number of records in your data X should be a multiple of `batch_size`, i.e. `len(X) % batch_size == 0`.
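For readers who can't open the notebook, the core idea can be sketched as follows. This is a minimal sketch of einsum-based batch gradient descent for linear regression; the function and variable names are mine, not necessarily the notebook's:

```python
import numpy as np

def batch_gd(X, y, lr=0.01, epochs=100, batch_size=None):
    """Batch gradient descent for linear regression using np.einsum.

    Assumes len(X) is a multiple of batch_size (the limitation noted above).
    """
    m, n_features = X.shape
    if batch_size is None:
        batch_size = m                          # full batch: everything at once
    w = np.zeros(n_features)
    b = 0.0
    # Reshape into (num_batches, batch_size, n_features); this is why
    # m % batch_size == 0 is required.
    Xb = X.reshape(-1, batch_size, n_features)
    yb = y.reshape(-1, batch_size)
    for _ in range(epochs):
        for Xi, yi in zip(Xb, yb):
            # Residuals: contract over the feature index j
            err = np.einsum('ij,j->i', Xi, w) + b - yi
            # MSE gradient (2/batch_size) * Xi^T err via einsum
            w -= lr * (2 / batch_size) * np.einsum('ij,i->j', Xi, err)
            b -= lr * (2 / batch_size) * err.sum()
    return w, b
```

On a toy problem it recovers the true weights: fit `y = X @ [3, -1] + 0.5` with `batch_gd(X, y, lr=0.1, epochs=500, batch_size=20)` and `w` converges to `[3, -1]`, `b` to `0.5`.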
Now you might ask why one should opt for `np.einsum` over ordinary NumPy functions when implementing batch gradient descent. Let's try implementing batch gradient descent without the einsum method.
Just look at the monstrous code size and the numerous for loops in the normal implementation of batch gradient descent without einsum. Though it may look easier to newcomers to machine learning, the time and memory needed to execute it shoot through the roof. I am also ignoring the fact that the normal implementation without einsum only processes 1-D feature arrays, not complete dataframes.
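For reference, here is a loop-heavy sketch in the spirit of that non-einsum implementation (my own reconstruction, not the notebook's exact code), restricted to a single 1-D feature as mentioned above:

```python
def gd_loops(x, y, lr=0.01, epochs=100):
    """Gradient descent for y ≈ w*x + b on a 1-D feature, with pure Python loops."""
    m = len(x)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        dw, db = 0.0, 0.0
        for i in range(m):                # explicit loop over every record
            err = w * x[i] + b - y[i]
            dw += 2 * err * x[i] / m      # accumulate MSE gradient term by term
            db += 2 * err / m
        w -= lr * dw
        b -= lr * db
    return w, b
```

Every record is visited in an interpreted Python loop on every epoch, which is exactly where the slowdown comes from.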
Here is the time taken when using the einsum method:
And here is the time taken when not using the einsum method:
The normal implementation is slower than the einsum implementation by a whopping factor of 1,460 on my machine. That is 146,022% slower. Phew!
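The exact factor will vary by machine, but you can reproduce the flavour of the comparison yourself. The sketch below (my own benchmark, not the notebook's) times one gradient contraction on a dataset of the California housing shape, once with einsum and once with pure Python loops:

```python
import timeit
import numpy as np

X = np.random.rand(20640, 8)      # California housing: 20,640 records, 8 features
err = np.random.rand(20640)       # stand-in for a residual vector

def grad_einsum():
    return np.einsum('ij,i->j', X, err)

def grad_loops():
    g = [0.0] * X.shape[1]
    for i in range(X.shape[0]):   # pure-Python double loop over every element
        for j in range(X.shape[1]):
            g[j] += X[i, j] * err[i]
    return g

t_fast = timeit.timeit(grad_einsum, number=10)
t_slow = timeit.timeit(grad_loops, number=10)
print(f'einsum: {t_fast:.4f}s  loops: {t_slow:.4f}s  ratio: {t_slow / t_fast:.0f}x')
```

Both functions compute the same numbers; only the time differs, typically by several orders of magnitude.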
I agree that the batch gradient descent implementation without einsum can be improved further. But it is still cumbersome and error-prone, and that style of coding doesn't scale when used in neural networks.
If this time comparison doesn't propel you to start using einsum, try implementing any simple neural network both with and without it. Then you will start appreciating its simplicity, speed, memory efficiency and expressiveness.
Please clap generously (it doesn't cost you anything) and share your comments below to let me know how you liked this post.
I am a machine learning engineer at TCS and my (Digital Software & Solutions) team is building amazing products. Check out more about our products in the link below: