General Matrix Multiplication in Assembly, Part 1

So, it has been a while since Pete Warden's post calling on assembly hackers to work on deep learning. Here is the simplest implementation of GEMM in C:
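The sketch below shows what a naive triple-loop GEMM in C typically looks like. It is not the original listing. It assumes row-major storage and the BLAS-style update C = alpha * A * B + beta * C, and the function name `gemm_naive` and its argument order are hypothetical choices made for illustration.

```c
#include <stddef.h>

/* Hypothetical naive GEMM (not the original listing):
 *   C = alpha * A * B + beta * C
 * All matrices are assumed to be row-major:
 *   A is M x K, B is K x N, C is M x N.
 * Three nested loops, no blocking, no explicit vectorization. */
void gemm_naive(size_t M, size_t N, size_t K,
                float alpha, const float *A, const float *B,
                float beta, float *C)
{
    for (size_t i = 0; i < M; ++i) {          /* row i of A and C */
        for (size_t j = 0; j < N; ++j) {      /* column j of B and C */
            /* dot product of row i of A with column j of B */
            float acc = 0.0f;
            for (size_t p = 0; p < K; ++p) {
                acc += A[i * K + p] * B[p * N + j];
            }
            /* As in BLAS, beta == 0 means C is not read, so an
             * uninitialized C does not leak garbage into the result. */
            if (beta == 0.0f) {
                C[i * N + j] = alpha * acc;
            } else {
                C[i * N + j] = alpha * acc + beta * C[i * N + j];
            }
        }
    }
}
```

For example, with A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]], alpha = 1 and beta = 0, the call `gemm_naive(2, 2, 2, 1.0f, A, B, 0.0f, C)` fills C with [[19, 22], [43, 50]]. Pasting a function like this into Compiler Explorer and switching between `-O0` and `-O3` is a quick way to see how differently the compiler treats the innermost loop.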

Then I tried to dissect it with the well-known Compiler Explorer (by Matt Godbolt).

And so my late nights began :)

Originally published at The Secret Guild of Silicon Valley.