Published in Analytics Vidhya

A Thorough Understanding of Numpy Strides and Their Application in Data Processing

Striding is like taking steps through the data with a window of a given size. It is a very common technique that you will see in all kinds of data processing tasks. This tutorial introduces the basic concept of a stride, shows how to use it in Numpy, and wraps up with a data processing use case: matching a pattern in a sequence.

In a nutshell, this tutorial covers:

  1. What is stride?
  2. Data structure of a Numpy 2D array, in terms of its memory allocation
  3. How to stride on a Numpy array:
    “Usage of numpy.lib.stride_tricks.as_strided()”
  4. A useful use case in data processing:
    How to match a pattern in a sequence and return the start and end index using Striding?

What is Stride?

In a simple analogy, striding is like taking steps through your data with a window of a fixed size. Consider a 1D array of 8 numbers (i.e., [10, 8, 6, 2, 5, 1, 4, 9]). The red box looks at 2 numbers at a time (window = 2), and then slides from the current start position i to position i+3 (step = 3).

Fig 1. The basic concept of stride step and window

For an array of N elements, a window size W, and a stride step S, the window lands on this many positions: ((N-W) // S) + 1
In our example, it’s ((8-2) // 3) + 1 = 3 positions.

(Note: here the symbol ‘//’ is Python’s floor division operator, which rounds the quotient down to an integer. For example, 4 // 2 is 2, and 5 // 2 is also 2.)
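
The window-count formula can be checked with a few lines of plain Python. This is a minimal sketch; the helper name count_windows is hypothetical and introduced only for illustration.

```python
def count_windows(n, window, step):
    """Number of window positions for n elements (hypothetical helper)."""
    return (n - window) // step + 1

data = [10, 8, 6, 2, 5, 1, 4, 9]
window, step = 2, 3

n_windows = count_windows(len(data), window, step)

# Collect each window with ordinary slicing to confirm the count.
windows = [data[i * step : i * step + window] for i in range(n_windows)]

print(n_windows)  # 3
print(windows)    # [[10, 8], [2, 5], [4, 9]]
```

The three slices are exactly the three positions of the red box in Fig 1.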

Data structure of a Numpy 2D array

As humans, we like to picture a 2D array as a table-like structure with i rows and j columns. However, a 2D array in memory is one sequential block, just like a 1D array. By default, Numpy stores it row after row, and within each row, column after column, as the illustration shows:

Fig2. How to compute the bytes of stride

Every ‘square’ in the illustration is the memory space required for one element, and its size depends on the data type of the array. For an ‘int32’ array, a square is 4 bytes; for a ‘float’ array (64-bit by default), a square is 8 bytes. This ‘unit block’ tells the machine how far to “stride” in memory to reach ‘the next number’ (the next column).

Let’s take a look at a Numpy 1D array:

import numpy as np
x = np.array([10, 8, 2, 5, 4, 9], dtype='int32')
x.shape
>> (6,)
x.strides
>> (4,)
x = np.array([10, 8, 2, 5, 4, 9], dtype='float')
x.shape
>> (6,)
x.strides
>> (8,)

Since x is a 1D array, its shape is (i, ), where i is the number of elements in the array. Our example has 6 elements, so the shape is (6, ). Notice, however, that when the data type changes from ‘int32’ to ‘float’, the strides change from (4, ) to (8, ), because an ‘int32’ takes 4 bytes and a ‘float’ takes 8 bytes in memory. The stride tells you how many bytes to move forward, starting from the first position in memory, to reach the next number. This is the stride and window concept we introduced: the window is the N bytes of one number, and each step jumps N bytes in memory to reach the start of the next number.
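
As a small illustration of the same idea, the sketch below shows that for a contiguous 1D array the stride equals the size of one element, and that a stepped slice is a view on the same memory with a larger stride. The variable name every_other is only illustrative.

```python
import numpy as np

x = np.array([10, 8, 2, 5, 4, 9], dtype='int32')

# For a contiguous 1D array, the stride equals the size of one element.
print(x.itemsize)  # 4
print(x.strides)   # (4,)

# Slicing with a step returns a view on the same memory; only the stride changes.
every_other = x[::2]
print(every_other.tolist())              # [10, 2, 4]
print(every_other.strides)               # (8,)
print(np.shares_memory(x, every_other))  # True
```

No data is copied for the slice: Numpy just records that it must now jump 8 bytes instead of 4 to reach the next element.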

Next, let’s take a look at a Numpy 2D array:

x = np.array([[10, 8],[2, 5],[4, 9]], dtype='int32')
x
>> array([[10, 8],
[ 2, 5],
[ 4, 9]])
x.shape # shape: (Num of row, Num of column)
>> (3, 2)
x.strides # stride: (Bytes per row, Bytes per column)
>> (8, 4)

Here we initiated a 2D array of int32 data type, and it has 3 rows and two columns, i.e., shape is (3, 2).

Note: the first number in the shape tuple, (i, j), is the number of elements in the first layer of the array. This 2D array has 3 elements (i.e., sub-brackets) in the first layer, and each sub-bracket, the second layer, has 2 elements. Thus, the shape of this 2D array can be read as (#rows, #columns). However, picturing a 2D array as a table is really just for ease of human comprehension. In the computer, it is stored as a sequential block in memory, so the nesting can go on to as many levels as you like, giving an N-D array for any number of dimensions N.

Okay, now let’s print x.strides. You will find it to be (8, 4), which is as expected, because:

  • shape is read as (# of rows, # of columns)
  • strides is read as (bytes to step to the next row, bytes to step to the next column)

The stride step is measured in bytes. To jump to the next row, it needs to skip 2 integers of 4 bytes each, i.e., 2 x 4 = 8 bytes; to jump to the next column, it needs to skip just 1 integer = 4 bytes. Thus we get (8, 4).
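
To make the byte arithmetic concrete, here is a minimal sketch that computes where element (i, j) lives in memory from the strides, and then reads it back from the raw bytes. The helper byte_offset is hypothetical, and the example assumes the default row-major layout.

```python
import numpy as np

x = np.array([[10, 8], [2, 5], [4, 9]], dtype='int32')
row_stride, col_stride = x.strides  # (8, 4)

def byte_offset(i, j):
    """Byte offset of x[i, j] from the start of the buffer (hypothetical helper)."""
    return i * row_stride + j * col_stride

raw = x.tobytes()  # the array's contents as a flat, row-major byte string

i, j = 2, 1
offset = byte_offset(i, j)

# Decode the 4 bytes at that offset back into one int32.
value = int(np.frombuffer(raw, dtype='int32', count=1, offset=offset)[0])

print(offset)  # 20
print(value)   # 9
```

Two rows down (2 x 8 bytes) and one column across (1 x 4 bytes) lands on byte 20, which holds the same value as x[2, 1].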

How to stride in a Numpy array

Alright, we have now introduced “what is stride” and how the “array is actually striding bytes in the memory space”. Next, we want to know how to do this ourselves on a Numpy array. Suppose we are given a 1-D array and wish to stride over it with a certain window and step, constructing a new 2-D array from the windows. For the example in Fig 1, the 8-number array would become [[10, 8], [2, 5], [4, 9]].

Here, we will use a Numpy function, which would allow you to access the array in the memory blocks (like a pointer), and then allow you to stride in the memory space:

numpy.lib.stride_tricks.as_strided(x, shape, strides)

This function takes 3 main arguments:

  • Array (x): This is the original array you want to stride on. In our example, it is the 1-D array that has 8 integers in it.
  • New array shape (shape): This is the expected shape of the output array (after striding). In our example, it is the shape of the new 2-D array, which should have 3 rows and 2 columns, and thus the shape should be (3, 2).
    (Go back to Fig 1 in the section “What is Stride?” for the illustration.)
  • Stride steps (strides): This is the stride size measured in bytes. In our example, we want to jump over 3 indices in the array and each of them is an integer (4 bytes), and thus the stride step for the row is 3*4 = 12 bytes. For the column, the next integer is just 4 bytes away, so the stride step for the column is 4 bytes. Thus, the stride steps are (12, 4) in our example.
    (Go back to Fig 2 in the section “Data structure of a Numpy 2D array” for the illustration.)
Putting the three arguments together:

x = np.array([10,8,6,2,5,1,4,9], dtype='int32')
x.shape        # The shape is a 1D array of 8 'int32' numbers
>> (8,)
x[0].nbytes # Each 'int32' occupies 4 bytes (32 bits/8=4 bytes)
>> 4
y = np.lib.stride_tricks.as_strided(x,((8-2)//3+1,2),(3*4,4))
y
>> array([[10, 8],
[ 2, 5],
[ 4, 9]])

!Important! When using this function, you need to know exactly how many bytes you are striding at each step, because the function does not check the boundary of the array. If you get the window or the stride steps wrong, you will end up reading arbitrary data from memory. For example, let me stride over 7 integers at a time with window=2, while my array only has 8 integers.

x = np.array([10,8,6,2,5,1,4,9], dtype='int32')
y = np.lib.stride_tricks.as_strided(x,((8-2)//3+1,2),(7*4,4))
y
>> array([[10, 8],
[9, 0], # The last 3 numbers are the
[1599227219, 1380275029]]) # random data in the memory
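
If you are on Numpy 1.20 or later, numpy.lib.stride_tricks.sliding_window_view offers a bounds-checked way to build the same windows. The sketch below is an alternative to the as_strided call, not the approach this tutorial is built around: it creates every window of size 2 and then keeps every 3rd one.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.array([10, 8, 6, 2, 5, 1, 4, 9], dtype='int32')
window, step = 2, 3

# All windows of length 2 at step 1; the result is a read-only view of shape (7, 2).
all_windows = sliding_window_view(x, window)

# Keep every 3rd window to emulate a stride step of 3.
y = all_windows[::step]

print(y.tolist())  # [[10, 8], [2, 5], [4, 9]]
print(y.strides)   # (12, 4)
```

The resulting strides match the (12, 4) computed by hand, but the shape is derived from the array itself, so the view cannot run past the end of the buffer.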

Also, of course, you don’t want to hardcode those byte counts and stride steps as literals. That was just for demo purposes. You can make the code more general.

x = np.array([10,8,6,2,5,1,4,9], dtype='int32')
total_len_x = x.shape[-1]
window = 2 # The red box only looks for 2 numbers
steps = 3 # The red box jump over 3 numbers each time
new_shape_row = (total_len_x - window)//steps + 1
new_shape_col = window
new_shape = (new_shape_row, new_shape_col)
n_bytes = x.strides[-1] # Since x.strides tells you how many
# bytes Numpy array travels to get the
# next number in the array, so it equals
# to x[0].nbytes.
stride_steps_row = n_bytes * steps
stride_steps_col = n_bytes
stride_steps = (stride_steps_row, stride_steps_col)
y = np.lib.stride_tricks.as_strided(x, new_shape, stride_steps)
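
Since as_strided does no boundary checking, one option is to verify the computed shape and strides before using them. The helper fits_in_buffer below is a hypothetical sketch, and it assumes a contiguous source array and non-negative strides.

```python
import numpy as np

def fits_in_buffer(arr, shape, strides):
    """Return True if a view with this shape/strides stays inside arr's memory.

    Hypothetical helper; assumes arr is contiguous and all strides are >= 0.
    """
    # Byte offset of the last reachable element, plus the size of that element.
    last_byte = sum((n - 1) * s for n, s in zip(shape, strides)) + arr.itemsize
    return last_byte <= arr.nbytes

x = np.array([10, 8, 6, 2, 5, 1, 4, 9], dtype='int32')

print(fits_in_buffer(x, (3, 2), (3 * 4, 4)))  # True
print(fits_in_buffer(x, (3, 2), (7 * 4, 4)))  # False
```

The second call uses the faulty 7-integer stride from the warning above and is rejected, because its last element would sit 64 bytes into a 32-byte buffer.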

Application of Pattern Match in an Array Using Stride

Say we have a problem like this to solve: given a sequence such as [1,2,1,2,3,1,1,2,3] and a pattern such as [1,2,3], find every place where the pattern occurs and return its start and end index.

One way to solve this problem is to use np.lib.stride_tricks.as_strided() to stride at step 1 with a window of 3 (i.e., the length of the pattern), and then compare each chunk of the data with the pattern. For example, the stride will return [[1,2,1], [2,1,2], [1,2,3], [2,3,1], …], and we can compare each of these length-3 chunks with our target pattern. So first, let’s wrap the code above into a function that does just that:

def return_striding_content(arr, window, stride):
    new_shape_row = (arr.shape[-1] - window)//stride + 1
    new_shape_col = window
    new_shape = (new_shape_row, new_shape_col)
    n_bytes = arr.strides[-1] # Bytes per element of arr
    stride_steps_row = n_bytes * stride
    stride_step_col = n_bytes
    stride_steps = (stride_steps_row, stride_step_col)
    return np.lib.stride_tricks.as_strided(arr, new_shape, stride_steps)

# Construct the array
x = np.array([1,2,1,2,3,1,1,2,3], dtype='int32')
# Construct the sliding content
return_striding_content(x, 3, 1)
>> array([[1, 2, 1],
[2, 1, 2],
[1, 2, 3],
[2, 3, 1],
[3, 1, 1],
[1, 1, 2],
[1, 2, 3]])

Then we compare each row of the striding content returned by return_striding_content(x,3,1) with the pattern p.

p = [1,2,3]
compare = (return_striding_content(x,3,1) == p).all(axis=1)
compare
>> array([False, False, True, False, False, False, True])
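
To see what the == comparison produces before .all() reduces it, the short sketch below uses a small hand-written array (the name windows is only illustrative) and contrasts axis=1 with axis=0.

```python
import numpy as np

windows = np.array([[1, 2, 1],
                    [1, 2, 3],
                    [2, 3, 1]])
p = [1, 2, 3]

# p is compared against every row, element by element.
elementwise = (windows == p)
print(elementwise.tolist())
# [[True, True, False], [True, True, True], [False, False, False]]

# axis=1 collapses each row: True only where all 3 elements match.
print(elementwise.all(axis=1).tolist())  # [False, True, False]

# axis=0 collapses each column instead, which is not what we want here.
print(elementwise.all(axis=0).tolist())  # [False, False, False]
```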

Note that the == comparison alone is not enough. The array from return_striding_content() and the p array do not even have the same dimensions, so Numpy compares p against every row element by element and returns a 2-D array of True/False values. We need the .all() function to collapse each row into a single answer, and that is why we need to specify the axis: with axis=0 it would combine the elements vertically (down each column), and with axis=1 it combines them horizontally (across each row). You can see the result shows that the 3rd and the 7th elements are True, which corresponds to the start indices (2 and 6) of the pattern in the original array x. Given the fixed window of the pattern, the index where the pattern ends is simply “start index + 2”, i.e., start index + window - 1.

[([i, i+2]) for i in np.where(compare==True)[-1]]
>> [[2,4], [6,8]]

Note: the [-1] applied after np.where() simply extracts the index array from the tuple that np.where() returns. Try printing np.where(compare==True) to see the raw result.
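
The steps in this section can be gathered into one function. This is a minimal sketch rather than a definitive implementation: the name find_pattern is hypothetical, and it passes writeable=False to as_strided so the overlapping view cannot be modified by accident.

```python
import numpy as np

def find_pattern(seq, pattern):
    """Return [start, end] index pairs where pattern occurs in seq (hypothetical helper)."""
    seq = np.ascontiguousarray(seq)
    pattern = np.asarray(pattern)
    window = len(pattern)
    n_rows = seq.shape[0] - window + 1
    if window == 0 or n_rows <= 0:
        return []  # nothing to match
    item = seq.strides[0]  # bytes per element
    # Stride at step 1 with a window as long as the pattern.
    windows = np.lib.stride_tricks.as_strided(
        seq, shape=(n_rows, window), strides=(item, item), writeable=False
    )
    starts = np.where((windows == pattern).all(axis=1))[0]
    return [[int(i), int(i) + window - 1] for i in starts]

x = np.array([1, 2, 1, 2, 3, 1, 1, 2, 3], dtype='int32')
print(find_pattern(x, [1, 2, 3]))  # [[2, 4], [6, 8]]
print(find_pattern(x, [3, 1]))     # [[4, 5]]
```

Because the end index is computed as start + window - 1, the same function works for patterns of any length, not only length 3.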

Reference

[1] IPython Interactive Computing and Visualization Cookbook, Second Edition (2018), by Cyrille Rossant:
https://ipython-books.github.io/46-using-stride-tricks-with-numpy/

That’s it! I hope you enjoyed this introduction to striding and how to use it with Numpy. If you found this post useful, please leave a “clap”! You are also welcome to leave your thoughts, comments or feedback. Thanks for reading!
