url: http://www.sohu.com/a/228731477_281328

Broadcasting Application

Yuanrui Dong
AI³ | Theory, Practice, Business

--

The fastai Part 2 course focuses on rewriting parts of the PyTorch library from scratch. By doing so, we learn how the library works under the hood and can apply it more smoothly.

Broadcasting describes how two arrays of different sizes are handled in an arithmetic operation. Normally, the smaller array is "broadcast" across the larger one so that their shapes become compatible. The loops involved in broadcasting are carried out in low-level C code, so they are relatively fast; that said, in some cases broadcasting can lead to inefficient memory use and degrade performance.

Broadcasting with a scalar

>>> from torch import tensor
>>> a = tensor([10., 6, -4])
>>> a + 1
tensor([11., 7., -3.])

In this example, the value 1 is broadcast three times, and an element-wise sum is performed: the scalar is conceptually expanded to match the tensor. This is the simplest kind of broadcasting, and in the process the loop is not running at Python speed; it runs at C or CUDA speed.
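As a quick sanity check (this snippet is not in the original post; it only illustrates the equivalence), adding a scalar gives the same result as adding an explicitly expanded tensor:

import torch
from torch import tensor

a = tensor([10., 6, -4])
# Broadcasting the scalar 1...
broadcasted = a + 1
# ...is equivalent to expanding it into a matching tensor of ones first.
explicit = a + torch.ones_like(a)
assert torch.equal(broadcasted, explicit)  # tensor([11., 7., -3.])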

Broadcasting a vector to a matrix

>>> c = tensor([10., 20, 30])
>>> m = tensor([[1., 2., 3.],
...             [4., 5., 6.],
...             [7., 8., 9.]])
>>> m.shape, c.shape
(torch.Size([3, 3]), torch.Size([3]))
>>> m + c
tensor([[11., 22., 33.],
        [14., 25., 36.],
        [17., 28., 39.]])

In this example, we are broadcasting a vector to a matrix: the row vector c is broadcast across each row of the matrix m.
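It is worth peeking at what PyTorch actually does here. The following is a minimal sketch (not in the original post) using expand_as, storage, and stride, which reveal that the broadcast tensor reuses c's memory rather than copying it:

>>> t = c.expand_as(m)   # a view of c with the shape of m
>>> t
tensor([[10., 20., 30.],
        [10., 20., 30.],
        [10., 20., 30.]])
>>> len(t.storage())     # still only the 3 original elements in memory
3
>>> t.stride()           # row stride 0: every row reads the same data
(0, 1)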

Matrix multiplication

Here are two matrices; we multiply them as shown in Fig 1.

Fig 1. Multiplying two matrices

The first element of the result comes from multiplying the first row of the first matrix with the first column of the second matrix element by element and summing the products, which gives 22. The result is shown in Fig 2.

Fig 2. The first output element

The rest of the elements are computed in the same manner, giving the final result shown in Fig 3.

Fig 3. The full matrix multiplication result
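In general (this is the standard definition, implicit in the figures), each output element is a sum of products over a row of the first matrix and a column of the second:

$$c_{i,j} = \sum_{k} a_{i,k}\, b_{k,j}$$

This formula maps directly onto the triple loop in the code below.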

We can translate this directly into code, shown below.

def matmul(a, b):
    ar, ac = a.shape  # n_rows * n_cols
    br, bc = b.shape
    assert ac == br
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            for k in range(ac):  # or br
                c[i, j] += a[i, k] * b[k, j]
    return c

Here we have three nested loops, and c[i,j] += a[i,k] * b[k,j] is the basic step. This multiply-and-accumulate is the vast majority of what deep learning does, and written as Python loops it is a huge amount of work and very slow. So we apply broadcasting to the matrix multiplication.
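Before optimizing, it helps to have a correctness and timing baseline. This is a minimal sketch (the shapes are arbitrary assumptions, not from the original post):

import torch

a = torch.randn(5, 28 * 28)   # e.g. a small batch of flattened images
b = torch.randn(28 * 28, 10)  # e.g. weights for 10 output classes

c = matmul(a, b)                            # the triple-loop version above
assert torch.allclose(c, a @ b, atol=1e-4)  # matches the built-in matmul

# In a notebook, %timeit shows the loop version is orders of magnitude
# slower than the built-in operator:
# %timeit matmul(a, b)
# %timeit a @ b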

Matmul with broadcasting

def matmul(a, b):
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br
    c = torch.zeros(ar, bc)
    for i in range(ar):
        c[i] = (a[i].unsqueeze(-1) * b).sum(dim=0)
    return c

Here, c[i] = (a[i].unsqueeze(-1) * b).sum(dim=0) is where broadcasting happens: a[i].unsqueeze(-1) turns row i of a into a column of shape (ac, 1), which is broadcast against b of shape (br, bc); summing the element-wise product over dim 0 yields row i of the result. This gets rid of the two inner loops, which not only removes most of the Python-level iteration but also avoids a lot of indexing errors.
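To make the shapes concrete, here is a short walkthrough (not in the original post) using the 3x3 matrix m from earlier, multiplied by itself:

>>> a = m                       # reuse the 3x3 matrix from above
>>> a[0].shape
torch.Size([3])
>>> a[0].unsqueeze(-1).shape    # row 0 turned into a column vector
torch.Size([3, 1])
>>> (a[0].unsqueeze(-1) * m).shape     # (3,1) broadcast against (3,3)
torch.Size([3, 3])
>>> (a[0].unsqueeze(-1) * m).sum(dim=0)  # row 0 of m @ m
tensor([30., 36., 42.])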

When operating on two arrays/tensors, NumPy/PyTorch compares their shapes element-wise, starting with the trailing dimensions and working its way forward. Two dimensions are compatible for broadcasting only when (see the sketch after this list):

  • they are equal, or
  • one of them is 1, in which case that dimension is broadcast to match the other's size.
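These rules are easy to check by hand. A minimal sketch (the shapes are chosen for illustration, not from the original post):

import torch

m = torch.zeros(3, 3)

# Trailing dims: (3,3) vs (3,) -> compare 3 with 3: equal, OK
print((m + torch.zeros(3)).shape)                     # torch.Size([3, 3])

# (3,1) vs (1,3): each pair contains a 1, so both sides are broadcast
print((torch.zeros(3, 1) + torch.zeros(1, 3)).shape)  # torch.Size([3, 3])

# (3,3) vs (2,): trailing dims 3 vs 2 are incompatible -> RuntimeError
try:
    m + torch.zeros(2)
except RuntimeError as e:
    print("incompatible:", e)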
