Batch gradient descent algorithm using Numpy’s einsum
This post uses the Einstein summation technique. No, Einstein didn't invent it — he applied it to express his complete paper on the general theory of relativity. The underlying calculus was developed by Gregorio Ricci-Curbastro, well known for Ricci calculus, or "tensor analysis" as it is called now!
I recently discovered NumPy's `einsum` and learned how powerful it is in terms of speed and memory efficiency. It is also succinct in expressing linear-algebraic operations on matrices and vectors.
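To give a feel for that succinctness, here is a small sketch (my own illustrative example, not from the notebook) showing the two contractions that show up in linear-regression gradients:

```python
import numpy as np

X = np.arange(12, dtype=float).reshape(4, 3)  # 4 samples, 3 features
w = np.array([1.0, 2.0, 3.0])

# Matrix–vector product: contract over the shared feature index j
pred = np.einsum('ij,j->i', X, w)             # equivalent to X @ w

# Gradient-style contraction X^T r: contract over the sample index i
r = pred - np.ones(4)
grad = np.einsum('ij,i->j', X, r)             # equivalent to X.T @ r
```

The subscript string reads like the math on paper: repeated indices are summed over, and the indices after `->` are what survives.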
So I decided to implement batch gradient descent using it. My code takes the California housing data as input, which has 20,640 records in total. In my implementation, if the batch size is not given, it defaults to 20,640, i.e. all records are processed at once. When a batch size n is given, the records are split into batches of n records each and gradient descent is run on those batches.
ENOUGH TALK! SHOW ME CODE!!
Here is the link to my colab notebook.
Below is the batch gradient descent algorithm using NumPy's `einsum`. The only limitation in my implementation is that the total number of records in your data X should be a multiple of `batch_size`, i.e. `len(X) % batch_size == 0`.
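For readers who can't open the notebook, the core idea can be sketched as follows. This is a minimal sketch of einsum-based batch gradient descent for linear regression; the function and variable names are mine, not necessarily the notebook's:

```python
import numpy as np

def batch_gd(X, y, lr=0.01, epochs=100, batch_size=None):
    """Batch gradient descent for linear regression using np.einsum.

    Assumes len(X) is a multiple of batch_size (the limitation noted above).
    """
    m, n_features = X.shape
    if batch_size is None:
        batch_size = m                          # full batch: everything at once
    w = np.zeros(n_features)
    b = 0.0
    # Reshape into (num_batches, batch_size, n_features); this is why
    # m % batch_size == 0 is required.
    Xb = X.reshape(-1, batch_size, n_features)
    yb = y.reshape(-1, batch_size)
    for _ in range(epochs):
        for Xi, yi in zip(Xb, yb):
            # Residuals: contract over the feature index j
            err = np.einsum('ij,j->i', Xi, w) + b - yi
            # MSE gradient (2/batch_size) * Xi^T err via einsum
            w -= lr * (2 / batch_size) * np.einsum('ij,i->j', Xi, err)
            b -= lr * (2 / batch_size) * err.sum()
    return w, b
```

On a toy problem it recovers the true weights: fit `y = X @ [3, -1] + 0.5` with `batch_gd(X, y, lr=0.1, epochs=500, batch_size=20)` and `w` converges to `[3, -1]`, `b` to `0.5`.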
Now you might ask why one should opt for `np.einsum` over ordinary NumPy functions when implementing batch gradient descent. Let's try implementing batch gradient descent without the einsum method.
Just look at the monstrous code size and the numerous for loops in the normal implementation of batch gradient descent without einsum. Though it may look easier to newcomers to machine learning, the time and memory needed to execute it shoot through the roof. I am also ignoring the fact that the normal implementation without einsum only processes 1-D feature arrays, not complete dataframes.
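For reference, here is a loop-heavy sketch in the spirit of that non-einsum implementation (my own reconstruction, not the notebook's exact code), restricted to a single 1-D feature as mentioned above:

```python
def gd_loops(x, y, lr=0.01, epochs=100):
    """Gradient descent for y ≈ w*x + b on a 1-D feature, with pure Python loops."""
    m = len(x)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        dw, db = 0.0, 0.0
        for i in range(m):                # explicit loop over every record
            err = w * x[i] + b - y[i]
            dw += 2 * err * x[i] / m      # accumulate MSE gradient term by term
            db += 2 * err / m
        w -= lr * dw
        b -= lr * db
    return w, b
```

Every record is visited in an interpreted Python loop on every epoch, which is exactly where the slowdown comes from.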
Here is the time taken when using the einsum method:
And here is the time taken when not using the einsum method:
The normal implementation is slower than the einsum implementation by a whopping factor of 1,460 on my machine. That is 146,022% slower. Phew!
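The exact factor will vary by machine, but you can reproduce the flavour of the comparison yourself. The sketch below (my own benchmark, not the notebook's) times one gradient contraction on a dataset of the California housing shape, once with einsum and once with pure Python loops:

```python
import timeit
import numpy as np

X = np.random.rand(20640, 8)      # California housing: 20,640 records, 8 features
err = np.random.rand(20640)       # stand-in for a residual vector

def grad_einsum():
    return np.einsum('ij,i->j', X, err)

def grad_loops():
    g = [0.0] * X.shape[1]
    for i in range(X.shape[0]):   # pure-Python double loop over every element
        for j in range(X.shape[1]):
            g[j] += X[i, j] * err[i]
    return g

t_fast = timeit.timeit(grad_einsum, number=10)
t_slow = timeit.timeit(grad_loops, number=10)
print(f'einsum: {t_fast:.4f}s  loops: {t_slow:.4f}s  ratio: {t_slow / t_fast:.0f}x')
```

Both functions compute the same numbers; only the time differs, typically by several orders of magnitude.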
I agree that the batch gradient descent implementation without einsum can be improved further. But it is still cumbersome and error-prone, and that style of coding doesn't scale when used in neural networks.
If this time comparison doesn't propel you to start using einsum, try implementing any simple neural network both with and without it. Then you will start appreciating its simplicity, speed, memory efficiency and expressiveness.
Please clap generously (it doesn't cost you anything) and share your comments below to let me know how you liked this post.
I am a machine learning engineer at TCS and my (Digital Software & Solutions) team is building amazing products. Check out more about our products in the link below: