More efficient matrix multiplication (fastai PartII-Lesson08)
Four steps to improve matrix multiplication
In Lesson 8, we implement some of the core functionality of fastai and PyTorch from scratch.
One such exercise is building a more efficient matrix multiplication in Python.
It goes through four steps to reach the final, fast version of the matrix multiplication method:
- matrix multiplication with raw python loops
- use elementwise operation to reduce one loop
- use broadcasting to reduce one more loop
- use einstein summation to combine products and sums
Download the code from https://github.com/fastai/course-v3
The following explanations are based on the code in the file ‘course-v3/nbs/dl2/01_matmul.ipynb’
Step 1: matrix multiplication with raw python loops
There are three nested ‘for’ loops to realize the multiplication: straightforward, but not efficient.
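A sketch of this triple-loop version, following the notebook's approach (variable names are illustrative):

```python
import torch

def matmul(a, b):
    # Multiply two matrices with three nested Python loops: one over
    # rows of a, one over columns of b, and one over the shared inner
    # dimension. Every scalar operation goes through the Python
    # interpreter, which is why this version is so slow.
    ar, ac = a.shape  # rows and columns of a
    br, bc = b.shape  # rows and columns of b
    assert ac == br, "inner dimensions must match"
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            for k in range(ac):
                c[i, j] += a[i, k] * b[k, j]
    return c
```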
Step 2: use elementwise operation to reduce one loop
- an example of elementwise addition
See the detailed process of (a + b):
a[0] + b[0] = 10. + 2. = 12.
a[1] + b[1] = 6. + 8. = 14.
a[2] + b[2] = -4. + 7. = 3.
a + b = tensor([12., 14., 3.])
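The breakdown above can be reproduced with a short PyTorch snippet (values taken from the example):

```python
import torch

a = torch.tensor([10., 6., -4.])
b = torch.tensor([2., 8., 7.])

# '+' is applied elementwise: a[0]+b[0], a[1]+b[1], a[2]+b[2]
# are all computed in one vectorized call, with no Python loop.
c = a + b
print(c)  # tensor([12., 14., 3.])
```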
In our case, the innermost ‘for loop’ in the matmul function, which accumulates products over the shared dimension, can be replaced with an elementwise multiplication followed by a sum.
- New matmul function with elementwise operations:
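A sketch of the elementwise version, following the notebook's approach:

```python
import torch

def matmul(a, b):
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br, "inner dimensions must match"
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            # The innermost loop over k becomes one elementwise
            # multiply of row i of a with column j of b, plus a sum.
            c[i, j] = (a[i, :] * b[:, j]).sum()
    return c
```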
Step 3: use broadcasting to reduce one more loop
According to the TensorFlow XLA guide, broadcasting is the process of making arrays with different shapes have compatible shapes for arithmetic operations. The terminology is borrowed from NumPy broadcasting.
We look at some examples of broadcasting operations.
- Broadcasting with a scalar
- Broadcasting a vector to a matrix
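Both cases can be illustrated with a small sketch (the values here are my own, not from the notebook):

```python
import torch

a = torch.tensor([10., 6., -4.])
m = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]])

# Broadcasting with a scalar: the scalar is (conceptually) expanded
# to the tensor's shape, so the operation applies to every element.
print(a > 0)   # tensor([ True,  True, False])
print(a * 2)   # tensor([20., 12., -8.])

# Broadcasting a vector to a matrix: the length-3 vector is
# (conceptually) replicated across the 3 rows of m before adding.
c = torch.tensor([10., 20., 30.])
print(m + c)
# tensor([[11., 22., 33.],
#         [14., 25., 36.],
#         [17., 28., 39.]])
```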
So we modify the matmul function further with broadcasting, which eliminates another ‘for loop’.
The remaining inner loop is replaced as follows:
- New matmul function with broadcasting:
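A sketch of the broadcasting version, following the notebook's approach:

```python
import torch

def matmul(a, b):
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br, "inner dimensions must match"
    c = torch.zeros(ar, bc)
    for i in range(ar):
        # a[i] has shape (ac,); unsqueeze(-1) turns it into (ac, 1),
        # which broadcasts against b of shape (br, bc) == (ac, bc).
        # Summing over dim 0 collapses the shared dimension and
        # produces row i of the result in a single vectorized step.
        c[i] = (a[i].unsqueeze(-1) * b).sum(dim=0)
    return c
```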
Step 4: use Einstein summation to combine products and sums
Einstein summation (`einsum`) is a compact representation for combining products and sums in a general way.
An article written by Tim Rocktäschel explains `einsum` well:
“Einsum is implemented in numpy via `np.einsum`, in PyTorch via `torch.einsum`, and in TensorFlow via `tf.einsum`. All three einsum functions share the same signature `einsum(equation, operands)` where `equation` is a string representing the Einstein summation and `operands` is a sequence of tensors.”
“A typical call to einsum has the following form: `result = einsum("◻◻,◻◻◻,◻◻->◻◻", arg1, arg2, arg3)` where ◻ is a placeholder for a character identifying a tensor dimension. From this equation string we can infer that arg1 and arg3 are matrices, arg2 is an order-3 tensor, and that the result of this einsum operation is a matrix.”
- Matrix transpose
In this case, ‘a’ is a 2×3 tensor; let’s say i = 2 and j = 3.
We use `einsum` to transpose this 2×3 tensor into a 3×2 tensor.
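A sketch of the transpose with einsum (the tensor values are my own):

```python
import torch

a = torch.tensor([[0., 1., 2.],
                  [3., 4., 5.]])  # shape (2, 3): i = 2, j = 3

# 'ij->ji' swaps the two axes: output position (j, i) takes
# the value from input position (i, j).
t = torch.einsum('ij->ji', a)
print(t.shape)  # torch.Size([3, 2])
```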
- Matrix-Matrix Multiplication
In this case, a is a 2×3 tensor and b is a 3×5 tensor; let’s say i = 2, k = 3 and j = 5.
We use `einsum` to perform the matrix multiplication, producing a 2×5 tensor.
- New matmul function with Einstein summation:
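A sketch of the final einsum version, following the notebook's approach:

```python
import torch

def matmul(a, b):
    # 'ik,kj->ij': for each output cell (i, j), multiply along the
    # repeated index k and sum over it -- exactly the definition of
    # matrix multiplication, with no explicit Python loop left.
    return torch.einsum('ik,kj->ij', a, b)

a = torch.randn(2, 3)  # i = 2, k = 3
b = torch.randn(3, 5)  # k = 3, j = 5
c = matmul(a, b)
print(c.shape)  # torch.Size([2, 5])
```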
Summary
That’s the matrix multiplication part of fastai part II, Lesson 8.
We used four steps to improve the efficiency of matrix multiplication:
for loop -> elementwise operation -> broadcasting -> Einstein Summation
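To see the progression for yourself, the first and last versions can be timed side by side; this is a minimal sketch, and the exact speedup will vary with your machine and matrix sizes:

```python
import timeit
import torch

def matmul_loops(a, b):
    # Step 1: three raw Python loops.
    ar, ac = a.shape
    br, bc = b.shape
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            for k in range(ac):
                c[i, j] += a[i, k] * b[k, j]
    return c

def matmul_einsum(a, b):
    # Step 4: one einsum call.
    return torch.einsum('ik,kj->ij', a, b)

a = torch.randn(16, 32)
b = torch.randn(32, 16)

t_loops = timeit.timeit(lambda: matmul_loops(a, b), number=5)
t_einsum = timeit.timeit(lambda: matmul_einsum(a, b), number=5)
print(f'loops: {t_loops:.4f}s, einsum: {t_einsum:.4f}s')
```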
As Tim Rocktäschel explains, Einstein summation cannot express every operation, but it works well for most operations in deep learning.
References
- Lesson 8, fastai part II: https://course.fast.ai/videos/?lesson=8
- 01_matmul.ipynb: https://github.com/fastai/course-v3/blob/master/nbs/dl2/01_matmul.ipynb
- Broadcasting semantics: https://www.tensorflow.org/xla/broadcasting
- “Einsum is All You Need — Einstein Summation in Deep Learning” by Tim Rocktäschel: https://rockt.github.io/2018/04/30/einsum