More efficient matrix multiplication (fastai PartII-Lesson08)
Four steps to improve matrix multiplication
In Lesson 8, we implement some of the core functionality of fastai and PyTorch from scratch.
One such exercise is building a more efficient matrix multiplication in Python.
It goes through four steps to reach the final, fast version of the matrix multiplication method:
- matrix multiplication with raw python loops
- use elementwise operation to reduce one loop
- use broadcasting to reduce one more loop
- use einstein summation to combine products and sums
Download the code from https://github.com/fastai/course-v3
The following explanations are based on the code in the file ‘course-v3/nbs/dl2/01_matmul.ipynb’
Step 1: matrix multiplication with raw python loops
There are three nested ‘for’ loops to realize the multiplication: straightforward, but not efficient.
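A sketch of this triple-loop version, following the notebook's approach (variable names are illustrative):

```python
import torch

def matmul(a, b):
    # Multiply two matrices with three nested Python loops: one over
    # rows of a, one over columns of b, and one over the shared inner
    # dimension. Every scalar operation goes through the Python
    # interpreter, which is why this version is so slow.
    ar, ac = a.shape  # rows and columns of a
    br, bc = b.shape  # rows and columns of b
    assert ac == br, "inner dimensions must match"
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            for k in range(ac):
                c[i, j] += a[i, k] * b[k, j]
    return c
```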
Step 2: use elementwise operation to reduce one loop
- an example of elementwise addition
See the detailed process of (a + b):
a[0] + b[0] = 10. + 2. = 12.
a[1] + b[1] = 6. + 8. = 14.
a[2] + b[2] = -4. + 7. = 3.
a + b = tensor([12., 14., 3.])
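The breakdown above can be reproduced with a short PyTorch snippet (values taken from the example):

```python
import torch

a = torch.tensor([10., 6., -4.])
b = torch.tensor([2., 8., 7.])

# '+' is applied elementwise: a[0]+b[0], a[1]+b[1], a[2]+b[2]
# are all computed in one vectorized call, with no Python loop.
c = a + b
print(c)  # tensor([12., 14., 3.])
```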
In our case, the innermost ‘for loop’ in the matmul function, which accumulates products over the shared dimension, can be replaced with an elementwise multiplication followed by a sum.
- New matmul function with elementwise operations:
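A sketch of the elementwise version, following the notebook's approach:

```python
import torch

def matmul(a, b):
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br, "inner dimensions must match"
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            # The innermost loop over k becomes one elementwise
            # multiply of row i of a with column j of b, plus a sum.
            c[i, j] = (a[i, :] * b[:, j]).sum()
    return c
```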
Step 3: use broadcasting to reduce one more loop
According to the TensorFlow XLA guide, broadcasting is the process of making arrays with different shapes have compatible shapes for arithmetic operations. The terminology is borrowed from NumPy broadcasting.
We look at some examples of broadcasting operations.
- Broadcasting with a scalar
- Broadcasting a vector to a matrix
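Both cases can be illustrated with a small sketch (the values here are my own, not from the notebook):

```python
import torch

a = torch.tensor([10., 6., -4.])
m = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]])

# Broadcasting with a scalar: the scalar is (conceptually) expanded
# to the tensor's shape, so the operation applies to every element.
print(a > 0)   # tensor([ True,  True, False])
print(a * 2)   # tensor([20., 12., -8.])

# Broadcasting a vector to a matrix: the length-3 vector is
# (conceptually) replicated across the 3 rows of m before adding.
c = torch.tensor([10., 20., 30.])
print(m + c)
# tensor([[11., 22., 33.],
#         [14., 25., 36.],
#         [17., 28., 39.]])
```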
So we modify the matmul function further with broadcasting, which eliminates another ‘for loop’.
The remaining inner loop is replaced as follows:
- New matmul function with broadcasting:
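A sketch of the broadcasting version, following the notebook's approach:

```python
import torch

def matmul(a, b):
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br, "inner dimensions must match"
    c = torch.zeros(ar, bc)
    for i in range(ar):
        # a[i] has shape (ac,); unsqueeze(-1) turns it into (ac, 1),
        # which broadcasts against b of shape (br, bc) == (ac, bc).
        # Summing over dim 0 collapses the shared dimension and
        # produces row i of the result in a single vectorized step.
        c[i] = (a[i].unsqueeze(-1) * b).sum(dim=0)
    return c
```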
Step 4: use Einstein summation to combine products and sums
Einstein summation (`einsum`) is a compact representation for combining products and sums in a general way.
An article written by Tim Rocktäschel explains `einsum` well:
“Einsum is implemented in numpy via `np.einsum`, in PyTorch via `torch.einsum`, and in TensorFlow via `tf.einsum`. All three einsum functions share the same signature `einsum(equation, operands)` where `equation` is a string representing the Einstein summation and `operands` is a sequence of tensors.”
“A typical call to einsum has the following form: `result = einsum("◻◻,◻◻◻,◻◻->◻◻", arg1, arg2, arg3)` where ◻ is a placeholder for a character identifying a tensor dimension. From this equation string we can infer that arg1 and arg3 are matrices, arg2 is an order-3 tensor, and that the result of this einsum operation is a matrix.”
- Matrix transpose
In this case, ‘a’ is a 2×3 tensor; let’s say i = 2 and j = 3.
We use `einsum` to transpose this 2×3 tensor into a 3×2 tensor.
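A sketch of the transpose with einsum (the tensor values are my own):

```python
import torch

a = torch.tensor([[0., 1., 2.],
                  [3., 4., 5.]])  # shape (2, 3): i = 2, j = 3

# 'ij->ji' swaps the two axes: output position (j, i) takes
# the value from input position (i, j).
t = torch.einsum('ij->ji', a)
print(t.shape)  # torch.Size([3, 2])
```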
- Matrix-Matrix Multiplication
In this case, a is a 2×3 tensor and b is a 3×5 tensor; let’s say i = 2, k = 3 and j = 5.
We use `einsum` to perform the matrix multiplication, producing a 2×5 tensor.
- New matmul function with Einstein summation:
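A sketch of the final einsum version, following the notebook's approach:

```python
import torch

def matmul(a, b):
    # 'ik,kj->ij': for each output cell (i, j), multiply along the
    # repeated index k and sum over it -- exactly the definition of
    # matrix multiplication, with no explicit Python loop left.
    return torch.einsum('ik,kj->ij', a, b)

a = torch.randn(2, 3)  # i = 2, k = 3
b = torch.randn(3, 5)  # k = 3, j = 5
c = matmul(a, b)
print(c.shape)  # torch.Size([2, 5])
```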
Summary
That’s the matrix multiplication part of fastai part II, Lesson 8.
We used four steps to improve the efficiency of matrix multiplication:
for loop -> elementwise operation -> broadcasting -> Einstein Summation
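To see the progression for yourself, the first and last versions can be timed side by side; this is a minimal sketch, and the exact speedup will vary with your machine and matrix sizes:

```python
import timeit
import torch

def matmul_loops(a, b):
    # Step 1: three raw Python loops.
    ar, ac = a.shape
    br, bc = b.shape
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            for k in range(ac):
                c[i, j] += a[i, k] * b[k, j]
    return c

def matmul_einsum(a, b):
    # Step 4: one einsum call.
    return torch.einsum('ik,kj->ij', a, b)

a = torch.randn(16, 32)
b = torch.randn(32, 16)

t_loops = timeit.timeit(lambda: matmul_loops(a, b), number=5)
t_einsum = timeit.timeit(lambda: matmul_einsum(a, b), number=5)
print(f'loops: {t_loops:.4f}s, einsum: {t_einsum:.4f}s')
```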
As Tim Rocktäschel explains, Einstein summation cannot express every operation, but it works well for most operations in deep learning.
References
- Lesson 8, fastai part II: https://course.fast.ai/videos/?lesson=8
- 01_matmul.ipynb: https://github.com/fastai/course-v3/blob/master/nbs/dl2/01_matmul.ipynb
- Broadcasting semantics: https://www.tensorflow.org/xla/broadcasting
- “Einsum is All You Need — Einstein Summation in Deep Learning” by Tim Rocktäschel: https://rockt.github.io/2018/04/30/einsum