[Pytorch] Contiguous vs Non-Contiguous Tensor / View — Understanding view(), reshape(), transpose()

Kathryn · Published in Analytics Vidhya · Jan 28, 2021

Tensor and View

A view uses the same chunk of data as the original tensor; it is just a different way to ‘view’ its dimensions.

Before we dive into what contiguous vs. non-contiguous means, we first need to understand the relationship between a tensor and a view in Pytorch.

A view is nothing but an alternative way to interpret the original tensor’s dimensions without making a physical copy in memory. For example, we can have a 1D tensor with 12 elements, i.e. [1,2,3,4,5,6,7,8,9,10,11,12], and then use .view(4,3) to change the shape of the tensor into a 4x3 structure.

x = torch.arange(1,13)
print(x)
>> tensor([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
x = torch.arange(1,13)
y = x.view(4,3)
print(y)
>>
tensor([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])

If you change the data in the original tensor x, the change also shows up in the view tensor y, because instead of holding another copy of the data, y reads from the same memory addresses as x. Vice versa, changing a value through the view tensor simultaneously changes the value in the original tensor, because the view and its original tensor share the same chunk of memory.

x = torch.arange(1,13)
y = x.view(4,3)
x[0] = 100
print(y)
>>
tensor([[100, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[ 10, 11, 12]])
x = torch.arange(1,13)
y = x.view(4,3)
y[-1,-1] = 1000
print(x)
>> tensor([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1000])

A Sequence of Data Can Be Viewed with Different Dimensions in a Contiguous Manner

The tensor data is stored as a 1D data sequence. Technically, .view() is an instruction that tells the machine how to stride over the 1D data sequence and present a tensor view with the given dimensions.

Intuitively, you can imagine the .view() function defining the new dimensions, say (2,2,3), as a set of empty template boxes. The data is then consumed from the beginning of the 1D data sequence and filled into these boxes starting from the innermost spots, one by one; when a box is full, it moves on to the next dimension (box). This follows a contiguous order.

Thus, you can use any combination of dimensions in view(), as long as the total number of boxes matches the number of elements in the 1D array, e.g., 2x2x3 = 6x2 = 12. You could just as well use (3,2,2) or (4,3), as long as the total number of elements adds up, as the sketch below shows.
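As a quick sanity check, here is a minimal sketch (using the same 12-element tensor and the shapes mentioned above) showing that every shape whose element count is 12 works, and that all of them read the same underlying sequence:

import torch

x = torch.arange(1, 13)   # 12 elements in a 1D sequence
a = x.view(2, 2, 3)       # 2*2*3 = 12, works
b = x.view(3, 2, 2)       # 3*2*2 = 12, works
c = x.view(4, 3)          # 4*3   = 12, works
# All three views consume the same underlying sequence in the same order
print(torch.equal(a.flatten(), x))   # True
print(torch.equal(b.flatten(), x))   # True
print(torch.equal(c.flatten(), x))   # True
# x.view(5, 3) would raise a RuntimeError, because 5*3 != 12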

Strides

If you are unfamiliar with how the computer strides through memory (a 1D sequence) to form an N-dimensional tensor, I encourage you to spend 5 minutes reading just one section in the following article:

[Read] the ‘Data structure of a Numpy 2D array’ section in this post.

Numpy’s ndarray.strides returns (N bytes to the next row, M bytes to the next column).
Pytorch’s tensor.stride() returns (N elements to the next row, M elements to the next column).
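To make that difference concrete, here is a small sketch comparing the two on the same data (the exact byte strides assume the default 64-bit integer dtype):

import torch

t = torch.arange(1, 13).view(6, 2)   # Pytorch tensor, dtype int64
n = t.numpy()                        # NumPy view over the same data
print(t.stride())    # (2, 1)  -> counted in elements
print(n.strides)     # (16, 8) -> counted in bytes (2*8 and 1*8 for int64)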

Let’s look at the strides in a 2D array

# x is contiguous data. Recall that view() doesn't change the data
# arrangement in the original 1D tensor, i.e. the sequence from 1 to 12.
x = torch.arange(1,13).view(6,2)
x
>>
tensor([[ 1, 2],
[ 3, 4],
[ 5, 6],
[ 7, 8],
[ 9, 10],
[11, 12]])
# Check stride
x.stride()
>> (2, 1)

The strides (2, 1) tell us: we need to stride over 2 numbers in the 1D sequence to reach the next number along axis 0 (the next row), and stride over 1 number to reach the next number along axis 1 (the next column).
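To make that concrete, here is a small sketch (the index (3, 1) is just an arbitrary example) verifying that the flat offset 2*i + 1*j lands on the same element as x[i, j]:

x = torch.arange(1, 13).view(6, 2)
flat = x.flatten()            # the underlying 1D sequence (x is contiguous)
i, j = 3, 1                   # an arbitrary (row, column) index
offset = 2 * i + 1 * j        # strides (2, 1): 2 per row, 1 per column
print(x[i, j].item(), flat[offset].item())   # 8 8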

What about the strides in a 3D array

y = torch.arange(0,12).view(2,2,3)
y
>>
tensor([[[ 0, 1, 2],
[ 3, 4, 5]],

[[ 6, 7, 8],
[ 9, 10, 11]]])
# Check stride
y.stride()
>> (6, 3, 1)

As indicated by the strides (6, 3, 1), starting from any position i in the 1D sequence, position (i+6) is the next element along axis 0, (i+3) is the next element along axis 1, and (i+1) is the next element along axis 2. The position in the 1D tensor of the element at index (A, B, C) is therefore: A * 6 + B * 3 + C * 1

For example, in the above tensor we deliberately chose a sequence of numbers ranging from 0 to 11, because each value then equals its own position in the 1D array (see the sketch after this list).

  • Index (0, 0, 0)
    Position in 1D: 0 * 6 + 0 * 3 + 0 * 1 = 0
  • Index (1, 0, 0)
    Position in 1D: 1 * 6 + 0 * 3 + 0 * 1 = 6
  • Index (0, 1, 0)
    Position in 1D: 0 * 6 + 1 * 3 + 0 * 1 = 3
  • Index (0, 0, 1)
    Position in 1D: 0 * 6 + 0 * 3 + 1 * 1 = 1
  • Index (1, 1, 1)
    Position in 1D: 1 * 6 + 1 * 3 + 1 * 1 = 10
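The same arithmetic can be wrapped in a tiny helper (a hypothetical function written here for illustration, not part of Pytorch) that maps an N-D index to its position in the 1D sequence using the tensor’s strides:

def flat_position(strides, index):
    # Position in the underlying 1D sequence for a given N-D index
    return sum(s * i for s, i in zip(strides, index))

y = torch.arange(0, 12).view(2, 2, 3)
print(y.stride())                              # (6, 3, 1)
print(flat_position(y.stride(), (1, 1, 1)))    # 10
print(y[1, 1, 1].item())                       # 10, same element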

Okay, that finishes the introduction to contiguous views, and we have also learned how strides work in an N-dimensional tensor in Pytorch. Now let’s take a look at what non-contiguous data is like.

Non-Contiguous Data Structure: Transpose( )

First of all, transpose(axis1, axis2) simply swaps the way axis1 and axis2 stride over the data.

# Initiate a contiguous tensor
x = torch.arange(0,12).view(2,2,3)
x
>>
tensor([[[ 0, 1, 2],
[ 3, 4, 5]],

[[ 6, 7, 8],
[ 9, 10, 11]]])
x.stride()
>> (6,3,1)
# Now let's transpose axis 0 and 2, and see how the strides swap
y = x.transpose(0,2)
y
>>
tensor([[[ 0, 6],
[ 3, 9]],

[[ 1, 7],
[ 4, 10]],

[[ 2, 8],
[ 5, 11]]])
y.stride()
>> (1,3,6)

Alright, y is x.transpose(0,2), which swaps how the tensor strides along axis 0 and axis 2, so the resulting strides of y are (1,3,6). That means we jump 1 number to reach the next element along axis 0, jump 3 numbers to reach the next element along axis 1, and jump 6 numbers to reach the next element along axis 2. (Striding formula: A * 1 + B * 3 + C * 6)

The different aspect of transpose is that the data sequence no longer follows a contiguous order. It does not fill the boxes one by one from the innermost dimension and jump to the next dimension only when one is filled up; instead, it now jumps 6 numbers to move within the innermost dimension, so it is not contiguous.
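We can sanity-check that striding formula against the original 1D sequence (a small sketch; x and y are the same tensors as in the snippet above, and the index (2, 1, 1) is arbitrary):

x = torch.arange(0, 12).view(2, 2, 3)
y = x.transpose(0, 2)            # strides (1, 3, 6)
flat = x.flatten()               # the shared 1D sequence 0..11
a, b, c = 2, 1, 1                # an arbitrary index into y
offset = 1 * a + 3 * b + 6 * c   # y's striding formula
print(y[a, b, c].item(), flat[offset].item())   # 11 11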

Transpose( ) Has a Non-Contiguous Data Structure but Is Still a View, Not a Copy

transpose() still returns a view, not a copy of the original tensor, so it is a non-contiguous ‘view’. It only changes the way strides are applied to the original data, and any modification of the original tensor affects the view, and vice versa.

# Change the value in a transpose tensor y
x = torch.arange(0,12).view(2,6)
y = x.transpose(0,1)
y[0,0] = 100
y
>>
tensor([[100, 2, 4, 6, 8, 10],
[ 1, 3, 5, 7, 9, 11]])
# Check the original tensor x
x
>>
tensor([[100, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[ 10, 11]])
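Besides modifying values, another way to confirm that x and y share the same memory is to compare their data pointers (a small sketch; data_ptr() returns the memory address of the first element a tensor refers to):

x = torch.arange(0, 12).view(2, 6)
y = x.transpose(0, 1)
# Both tensors start at the same memory address, so no copy was made
print(x.data_ptr() == y.data_ptr())   # True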

Another way to peek at the 1D data sequence stored in a tensor is through the method .storage():

y.storage()
>>
100
1
2
3
4
5
6
7
8
9
10
11
[torch.LongStorage of size 12]

Check Contiguous and Non-Contiguous in Pytorch

Pytorch has a method .is_contiguous() that tells you whether the tensor is contiguous.

x = torch.arange(0,12).view(2,6)
x.is_contiguous()
>> True
y = x.transpose(0,1)
y.is_contiguous()
>> False

Convert a Non-Contiguous Tensor (or View) to Contiguous

Pytorch has a method .contiguous() that converts a non-contiguous tensor or view into a contiguous one.

z = y.contiguous()
z.is_contiguous()
>> True

It makes a copy of the original non-contiguous tensor and then saves it to a new memory chunk following the contiguous order. We can observe this from its strides.

# This is contiguous
x = torch.arange(1,13).view(2,3,2)
x.stride()
>> (6, 2, 1)
# This is non-contiguous
y = x.transpose(0,1)
y.stride()
>> (2, 6, 1)
# This is a converted contiguous tensor with new stride
z = y.contiguous()
z.stride()
>> (4, 2, 1)

One way I differentiate whether a tensor/view is contiguous is to check whether its strides (A, B, C) satisfy A > B > C. If they don’t, at least one dimension skips a longer distance than the dimension above it, which makes it non-contiguous. The reverse is not guaranteed, though, so .is_contiguous() should have the final say.
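For example, here is a small sketch (using basic slicing, which also produces a view) where the strides do decrease but the tensor is still non-contiguous, so the rule of thumb is a quick check rather than a guarantee:

x = torch.arange(0, 12).view(2, 6)
y = x[:, ::2]                # take every other column, still a view
print(y.stride())            # (6, 2): decreasing strides...
print(y.is_contiguous())     # False: the innermost stride is 2, not 1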

We can also observe how the converted contiguous tensor z stores the data in a new order.

# y is a non-contiguous 'view' (remember, a view uses the original chunk of data
# in memory, but its strides (2,6,1) imply 'non-contiguous')
y.storage()

>>
1
2
3
4
5
6
7
8
9
10
11
12
# z is a 'contiguous' tensor (not a view, but a new copy of the original data;
# notice the order of the data is different). Its strides (4,2,1) imply 'contiguous'.
z.storage()
>>
1
2
7
8
3
4
9
10
5
6
11
12
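Since z is a genuine copy, changes to z no longer propagate back to y or x (a small sketch recreating the same tensors as above):

x = torch.arange(1, 13).view(2, 3, 2)
y = x.transpose(0, 1)        # non-contiguous view of x
z = y.contiguous()           # contiguous copy
z[0, 0, 0] = 100
print(z[0, 0, 0].item())     # 100
print(y[0, 0, 0].item())     # 1: the view is unaffected
print(x[0, 0, 0].item())     # 1: and so is the original tensor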

Difference Between view( ) and reshape( )

While both functions can change the dimensions of a tensor (basically just different ways of striding over the 1D data), the main differences between the two are:

1/ view(): Does NOT make a copy of the original tensor. It changes the dimensional interpretation (striding) of the original data. In other words, it uses the same chunk of data as the original tensor, so it ONLY works on contiguous data.

2/ reshape(): Returns a view when possible (i.e., when the data is contiguous). If not (i.e., the data is not contiguous), it copies the data into a contiguous chunk; as a copy it takes up extra memory, and changes in the new tensor do not affect the values in the original tensor.

With contiguous data, reshape() returns a view.

# When data is contiguous
x = torch.arange(1,13)
x
>> tensor([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
# Reshape returns a view with the new dimension
y = x.reshape(4,3)
y
>>
tensor([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
# How do we know it's a view? Because changing an element in the new tensor y affects the value in x, and vice versa
y[0,0] = 100
y
>>
tensor([[100, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[ 10, 11, 12]])
print(x)
>>
tensor([100, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

Next, let’s see how reshape() works on non-contiguous data.

# After transpose(), the data is non-contiguous
x = torch.arange(1,13).view(6,2).transpose(0,1)
x
>>
tensor([[ 1, 3, 5, 7, 9, 11],
[ 2, 4, 6, 8, 10, 12]])
# reshape() works fine on non-contiguous data
y = x.reshape(4,3)
y
>>
tensor([[ 1, 3, 5],
[ 7, 9, 11],
[ 2, 4, 6],
[ 8, 10, 12]])
# Change an element in y
y[0,0] = 100
y
>>
tensor([[100, 3, 5],
[ 7, 9, 11],
[ 2, 4, 6],
[ 8, 10, 12]])
# Check the original tensor, and nothing was changed
x
>>
tensor([[ 1, 3, 5, 7, 9, 11],
[ 2, 4, 6, 8, 10, 12]])

Finally, let’s see if view() can work on non-contiguous data.
No, it can’t!

# After transpose(), the data is non-contiguous
x = torch.arange(1,13).view(6,2).transpose(0,1)
x
>>
tensor([[ 1, 3, 5, 7, 9, 11],
[ 2, 4, 6, 8, 10, 12]])
# Try to use view on the non-contiguous data
y = x.view(4,3)
y
>>
-------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
----> 1 y = x.view(4,3)
2 y

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
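As the error message suggests, the usual fixes are to make the data contiguous first and then call view(), or to simply use reshape() (a short sketch continuing from the x above; both give the same result):

# Either make the data contiguous first, then view...
y = x.contiguous().view(4, 3)
# ...or let reshape() handle the copy for you
z = x.reshape(4, 3)
print(torch.equal(y, z))   # True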

Summary

  • A ‘view’ uses the same chunk of memory as the original tensor, so any change to that memory chunk affects all the views and the original tensor associated with it.
  • A ‘View’ can be contiguous or non-contiguous.
  • A non-contiguous tensor/view can be converted to a contiguous one with .contiguous(), which makes a copy, so the new data is no longer associated with the original data chunk.
  • Stride position formula: given strides (A, B, C), the position of index (j, k, v) in the 1D data array is A*j + B*k + C*v.
  • Difference between view() and reshape():
    view() cannot be applied to a non-contiguous tensor/view; it always returns a view.
    reshape() can be applied to both contiguous and non-contiguous tensors/views. When possible it returns a view; when the data is non-contiguous, it makes a new copy.

I hope this article fortifies your tensor intuition and makes you swift at identifying tensor dimensions in your work! If you found this post useful, please leave a “clap”. Your encouragement is what keeps me going! Thanks for reading!
