Broadcasting in PyTorch/NumPy

moodayday™
AI³ | Theory, Practice, Business
8 min read · Aug 11, 2019

Hello happy PyTorchers & NumPyers, and welcome to my very first post on Medium. We are going to talk about some exciting new stuff that I can’t wait to put on the air!

Well, sorry, if this picture fooled you, but the thing we are going to talk about here is a slightly different kind of broadcasting. If you ended up on this page, then I’m convinced you will find this broadcasting thing extremely interesting.

Note: In the code snippets in this post, I’m going to use PyTorch only, but please keep in mind that broadcasting works the same way in NumPy as it does in PyTorch. The only change is that you would use NumPy arrays instead of PyTorch tensors.
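
Just to show what I mean, here is a tiny sketch of mine (not from the original snippets), assuming you have NumPy installed: the same kind of operation on a NumPy array looks like this.

>>> import numpy as np
>>> a = np.array([[1, 2, 3],
                  [4, 5, 6]])
>>> a + 10   # NumPy broadcasts the scalar, exactly like PyTorch would
array([[11, 12, 13],
       [14, 15, 16]])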

Let’s start with this…

Element-wise operations

I’m sure you work on far more high-level things, but let me give this rather lame example.

Let’s imagine that we have the salaries of the employees of two departments of your company as a PyTorch tensor (or a NumPy array), and we want to add a bonus to each salary, with a specific bonus amount for each employee. Here I have the salaries and bonuses in 2 * 4 nested Python lists:

# The salaries of employees in two departments
>>> salaries = [[5900, 3500, 2800, 4900],
                [3330, 4500, 1000, 1290]]
# One specific bonus for each employee
>>> bonuses = [[300, 450, 0, 400],
               [ 90, 890, 1000, 90]]

Now we can compute the new salaries by adding the bonus to their normal salary.

# Element-wise summation on Python lists
>>> new_salaries = [[salaries[i][j] + bonuses[i][j] for j in range(len(salaries[i]))] for i in range(len(salaries))]
>>> new_salaries
[[6200, 3950, 2800, 5300],
 [3420, 5390, 2000, 1380]]

Ugly, right? Believe me, it is! If you are still not convinced, look at how much more elegant it can be when we use a PyTorch tensor instead.

from torch import tensor

Now let’s create the list of salaries as a PyTorch tensor:

# The salaries of employees in two departments
# just like previously.
>>> salaries = tensor([[5900, 3500, 2800, 4900],
                       [3330, 4500, 1000, 1290]])

And the same for the bonuses:

# ...and one bonus for each employee.
>>> bonuses = tensor([[300, 450, 0, 400],
                      [ 90, 890, 1000, 90]])

PyTorch allows us to manipulate the two batches of data together, as if each were a single value.

# Tensors make element-wise operations so much more natural!
>>> new_salaries = salaries + bonuses
>>> new_salaries
tensor([[6200, 3950, 2800, 5300],
        [3420, 5390, 2000, 1380]])

It provides a nice way to lift operations that are performed on an element-by-element basis to a higher level of abstraction: you manipulate the containers, and the library takes care of the elements.

Broadcasting

This element-wise operations layer that PyTorch tensors add on top of traditional Python lists is great, but broadcasting takes the fun to the next level.

Look, in our previous example, the two containers (the lists) had the same shape (2 * 4). What if I just wanted to add one specific value to all the salaries of each department? I mean, what if bonuses looked like this:

# First bonus to be broadcast over all salaries
# of the first department, and the same for the second!
>>> bonuses = tensor([[300],
                      [450]])

Is this little change just enough to keep us out of the fun?

Well, no! The happy thing is that it still works!… thanks to broadcasting! Look:

>>> new_salaries = salaries + bonuses
>>> new_salaries
tensor([[6200, 3800, 3100, 5200],
        [3780, 4950, 1450, 1740]])

We can even push the craziness a little bit further and add half of each salary as a bonus to each employee.

>>> new_salaries = 1.5 * salaries; new_salaries

Do you… realize what we did just now?

We’ve just multiplied each element of this group of data (the salary tensor) by 1.5, simply by multiplying the tensor directly by the scalar! …and it worked! Amazing, right?

To show that there is no limit to the fun, I can do this:

# Show me the salaries that are
# less than 2000
>>> salaries[salaries < 2000]
tensor([1000, 1290])

Now

# I'm the boss, and I'm happy today: give 'em 2000.
>>> salaries[salaries < 2000] = 2000
>>> salaries
tensor([[5900, 3500, 2800, 4900],
        [3330, 4500, 2000, 2000]])

This is just how I broadcast my happiness over them all!

Well, this is what we call broadcasting! (well…)

Now, the definition

According to the documentation, the term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints (we will talk about those later), the smaller array is “broadcast” across the larger array so that they have compatible shapes.

Element-wise operations are cool, but they would be really boring if they required the operands to be the same size. The broadcasting rule relaxes this constraint when the tensors’ shapes meet certain conditions.

Cool, brother! But, is there more to it than simple stretching?

The answer is, Yes!

Clearly the cleanness of the code is one advantage. Using broadcasting in NumPy/PyTorch makes your code more elegant, because you focus on the big picture of what you are doing instead of getting your precious hands dirty dealing with oily indexes running through your matrices.

And there is more to it:

  1. High performance! The heavy lifting in PyTorch is done in highly optimized C++ (and CUDA) code, which is much more low-level than Python, and implementations at that level are usually far faster than anything you can write in pure Python. So letting PyTorch handle the looping in its compiled backend is the smartest way to go about it; the rough timing sketch below gives an idea of the difference.
  2. Less error prone. The less code you write, the less you are prone to make mistakes.
  3. and… it’s sexy!

Note: As per the official documentation, there are, however, cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows computation.
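
If you want to see the speed difference for yourself, here is a rough timing sketch of mine (the exact numbers depend entirely on your machine and library versions, so take it as an illustration, not a benchmark):

import timeit

import torch

# One million "salaries" as a plain Python list and as a tensor.
salaries_list = list(range(1_000_000))
salaries_tensor = torch.arange(1_000_000)

# Pure-Python loop: add a bonus of 100 to every salary.
python_time = timeit.timeit(lambda: [s + 100 for s in salaries_list], number=10)

# Broadcasting: the looping happens inside PyTorch's compiled backend instead.
tensor_time = timeit.timeit(lambda: salaries_tensor + 100, number=10)

print(f"Python loop : {python_time:.4f} s")
print(f"Broadcasting: {tensor_time:.4f} s")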

General Broadcasting Rules

When operating on two arrays, PyTorch/NumPy compares their shapes element-wise. Two dimensions are compatible when:

  1. they are equal, or
  2. one of them is 1

The size of the resulting array is the maximum size along each dimension of the input arrays.

When either of the dimensions compared is one, the other is used. In other words, dimensions with size 1 are stretched or “copied” to match the other.

The comparison starts with the trailing (rightmost) dimensions and works its way left. You can right-align the shapes to make the reading easier. A few examples:

>>> import torch

# For each corresponding dimension, numbers should either be equal
# OR one of them should be 1.
>>> a = torch.ones(8,1,2) # 8*1*2
>>> b = torch.ones(8,9,1) # 8*9*1
>>> (a + b).shape # works fine :)
torch.Size([8, 9, 2])

As you can see, for each of the three dimensions, the sizes of a and b were either equal (8 and 8 in the first dimension) or one of them was 1.

# In case the numbers of dimensions don't match,
# align on the right to see clearly.
>>> a = torch.ones(8,1,2)     # read as 1*1*8*1*2
>>> b = torch.ones(2,4,8,9,1) # 2*4*8*9*1
>>> (a + b).shape # works fine :)
torch.Size([2, 4, 8, 9, 2])

Here, the missing leading dimensions of a were automatically filled in by PyTorch (as if they were 1), and the sizes of each dimension matched according to the broadcasting rules. So it worked fine!

# For each corresponding dimension, numbers should either be equal 
# OR one of them should be 1.
>>> a = torch.ones( 8,1,2) # trailing size 2...
>>> b = torch.ones(2,8,9,9) # ...and trailing size 9!!
>>> a + b # It doesn't work! :(
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-36-c47caaccbb14> in <module>
3 a = torch.ones( 8,1,2) # 2
4 b = torch.ones(2,8,9,9) # ... and 9!!
----> 5 a + b
RuntimeError: The size of tensor a (2) must match the size of tensor b (9) at non-singleton dimension 3

You could read the rules again to convince yourself.
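
Or, if you prefer code to prose, here is a small pure-Python sketch of mine that encodes the rules above. It is only an illustration of the rule, not PyTorch's actual implementation:

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise if the shapes are incompatible."""
    # Right-align the two shapes by padding the shorter one with leading 1s.
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for size_a, size_b in zip(a, b):
        if size_a == size_b or size_a == 1 or size_b == 1:
            result.append(max(size_a, size_b))  # the size-1 side gets "stretched"
        else:
            raise ValueError(f"incompatible sizes {size_a} and {size_b}")
    return tuple(result)

>>> broadcast_shape((8, 1, 2), (8, 9, 1))
(8, 9, 2)
>>> broadcast_shape((8, 1, 2), (2, 4, 8, 9, 1))
(2, 4, 8, 9, 2)
>>> broadcast_shape((8, 1, 2), (2, 8, 9, 9))  # raises ValueError: incompatible sizes 2 and 9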

A few visual examples

The simplest broadcasting example occurs when an array and a scalar value are combined in an operation:

>>> a = tensor([1,2,3])
>>> b = 2
>>> a * b
tensor([2, 4, 6])

Another example:

>>> a = tensor([[ 0,  0,  0],
                [10, 10, 10],
                [20, 20, 20],
                [30, 30, 30]])
>>> b = tensor([1, 2, 3])
>>> a + b
tensor([[ 1,  2,  3],
        [11, 12, 13],
        [21, 22, 23],
        [31, 32, 33]])

The smaller tensor (b) is “broadcast” to match the bigger one (a).

The following one wouldn’t have worked (see the sketch below).

Honestly, who can blame PyTorch for that? How could we possibly expect it to handle that trailing column in the second tensor?

I mean, sometimes you just have to follow the rules, that’s it! ;)
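
The original post showed that failing case as a picture; since the image isn’t reproduced here, here is a small sketch of mine with the same kind of mismatch. The trailing dimensions (3 and 4) are neither equal nor 1, so PyTorch refuses with an error along these lines:

>>> a = tensor([[ 0,  0,  0],
                [10, 10, 10],
                [20, 20, 20],
                [30, 30, 30]])
>>> b = tensor([1, 2, 3, 4])
>>> a + b
RuntimeError: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 1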

Here is yet another example:

>>> a = tensor([[0],
                [10],
                [20],
                [30]])
>>> b = tensor([1.0, 2.0, 3.0])
>>> a + b
tensor([[ 1.,  2.,  3.],
        [11., 12., 13.],
        [21., 22., 23.],
        [31., 32., 33.]])

Here, the stretching (the broadcasting!) is done on both tensors.

Cool, right?

One exception to the rule: in-place operations

One complication is that in-place operations do not allow the in-place tensor to change shape as a result of the broadcast. See for yourself:

>>> import torch
>>> a = torch.ones(8,1,1) # a and...
>>> b = torch.ones(8,9,9) # ...b are compatible for broadcasting.
>>> a.add_(b) # But not for in-place addition!
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-38-58efc0204f63> in <module>
1 a = torch.ones(8,1,1)
2 b = torch.ones(8,9,9)
----> 3 a.add_(b)
RuntimeError: output with shape [8, 1, 1] doesn't match the broadcast shape [8, 9, 9]
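
The restriction only applies to the tensor being modified in place: if that tensor already has the full broadcast shape, there is nothing to complain about. A quick sketch of mine, reusing the same a and b:

# b already has the broadcast shape [8, 9, 9], so it can absorb a in place.
>>> b.add_(a).shape
torch.Size([8, 9, 9])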

Conclusion

This is what I wanted to share about broadcasting in PyTorch/NumPy. All the rules presented here work the same in NumPy; the only difference is that NumPy uses arrays instead of tensors. You can play around with these objects to discover more about them.

I hope you enjoyed reading this article. Please don’t forget to clap & write a response if you enjoyed it.

There is a lot more to be broadcasted soon! Let’s keep in touch!

Links

Check out Jeremy Howard on broadcasting. That’s where the idea for this post originated.
