The Gospel of Numpy

Feb 25, 2018

If numpy is a religion, then JVP must be one of the prophets.

The talk itself is enlightening, but the sheer amount of information can be confusing. Below, I’ve summarised the few skills that matter far more than the others. As an aside, this list is neither ordered nor complete.

Tensor / Multidimensional Arrays

Tensor is just a fancy term for a multidimensional array.

In numpy, two-dimensional and three-dimensional arrays are indexed as

A[i,j] = A[row,column]
A[k,i,j] = A[depth,row,column]

Notice the difference between x and y.

>>> import numpy as np
>>> x = np.arange(16).reshape((2,2,4))
>>> x
 array([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7]],

        [[ 8,  9, 10, 11],
         [12, 13, 14, 15]]])
>>> y = np.arange(16).reshape((4,4))
>>> y
 array([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15]])

Look closely at the representation of array x. It is not a (4,4) array but a (2,2,4) array. The sub-array holding 0 to 7 sits at the front, while the sub-array holding 8 to 15 sits behind it. It is like stacking the first array on top of the second.

In array y, there is no stacking whatsoever.
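One quick way to check the [depth, row, column] convention is to look at the shape and pull out a single element. The snippet below is only a small sketch reusing the same x and y; the particular indices are picked purely for illustration.

```python
import numpy as np

x = np.arange(16).reshape((2, 2, 4))  # 2 layers, each with 2 rows and 4 columns
y = np.arange(16).reshape((4, 4))     # 4 rows and 4 columns, no layers

print(x.ndim, x.shape)  # 3 (2, 2, 4)
print(y.ndim, y.shape)  # 2 (4, 4)

# A[depth, row, column]: layer 1, row 0, column 2
print(x[1, 0, 2])       # 10
# The same value sits at row 2, column 2 of the flat 4x4 layout
print(y[2, 2])          # 10
```

Both arrays hold the same 16 numbers; only the way they are grouped differs.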

Slicing

This is that important. Period.

Slicing in numpy is done with the “:” symbol. Written as “:n”, it means everything up to, but not including, index n; a bare “:” takes everything along that axis.

#first n rows (rows 0 to n-1)
 A[:n,:]
#first n columns (columns 0 to n-1)
 A[:,:n]

Here n is the index at which the slice stops; the row or column at index n itself is not included.

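There is also a shorthand notation for taking rows: any trailing axes you leave out are taken in full.

#first n rows, long form = shorthand
 A[:n,:] = A[:n]
#nth column (no shorthand, the leading ":" is required)
 A[:,n]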

Here is an example of slicing rows

#assuming A = np.arange(12).reshape((3,4))
#0th row
 >>> A[0]
 array([0, 1, 2, 3])
#up to and including the 1st row (the stop index 2 is excluded)
 >>> A[:2]
 array([[0, 1, 2, 3],
        [4, 5, 6, 7]])

Here is an example of slicing columns

#assuming A = np.arange(12).reshape((3,4))
#up to and including the 1st column (the stop index 2 is excluded)
 >>> A[:,:2]
 array([[0, 1],
        [4, 5],
        [8, 9]])
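To make the slicing rules easy to try, here is a small runnable sketch. The 3x4 array A is a hypothetical example chosen only for illustration, as are the variable names. It also shows one detail worth knowing: a plain integer index drops that axis, while a slice keeps it.

```python
import numpy as np

# A is a hypothetical 3x4 example array, chosen only for illustration
A = np.arange(12).reshape((3, 4))

first_two_rows = A[:2]     # shorthand for A[:2, :]
first_two_cols = A[:, :2]
single_row = A[1]          # integer index: the row axis is dropped
single_col = A[:, 1]       # integer index: the column axis is dropped
row_as_2d = A[1:2]         # slice: the row axis is kept with length 1

print(first_two_rows.shape)  # (2, 4)
print(first_two_cols.shape)  # (3, 2)
print(single_row)            # [4 5 6 7]
print(single_col)            # [1 5 9]
print(row_as_2d.shape)       # (1, 4)
```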

Tensor Operations

Why would we ever need this? Can’t most data be represented as 2D arrays? That seems true at first, but some tasks require converting a 2D array into a 3D array so the calculation runs fast, i.e. without loops.

This is particularly important when computing the Euclidean distance between points. The first step is to subtract every point in one set from every point in the other.

In this problem we have arrays x and y, both two-dimensional.

>>> x = np.arange(6).reshape((2,3))
>>> x
 array([[0, 1, 2],
        [3, 4, 5]])
>>> y = np.arange(12).reshape((4,3))
>>> y
 array([[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11]])

Columns 0, 1 and 2 hold the x, y and z coordinates of a point, and each row is one point.

However, the broadcasting rules forbid subtracting them directly: the shapes (2,3) and (4,3) disagree in the first dimension, and neither size is 1. To get around this, we need to convert one of the arrays into a 3D array.

>>> x = x.reshape((2,1,3))
>>> x
 array([[[0, 1, 2]],
        [[3, 4, 5]]])

If we subtract y from x, every point in x is paired with every point in y and the result is a (2,4,3) array

>>> x - y
 array([[[ 0,  0,  0],
         [-3, -3, -3],
         [-6, -6, -6],
         [-9, -9, -9]],

        [[ 3,  3,  3],
         [ 0,  0,  0],
         [-3, -3, -3],
         [-6, -6, -6]]])
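The subtraction is only the first part of the distance calculation. As a rough sketch of how the remaining steps could look (square the differences, sum over the coordinate axis, take the square root), the snippet below uses np.newaxis instead of reshape to add the extra axis; the two are equivalent here. The names diff and dist are my own.

```python
import numpy as np

x = np.arange(6).reshape((2, 3))    # 2 points in 3D
y = np.arange(12).reshape((4, 3))   # 4 points in 3D

# Insert a length-1 axis so shapes (2, 1, 3) and (4, 3) broadcast to (2, 4, 3)
diff = x[:, np.newaxis, :] - y

# Square, sum over the coordinate axis (the last one), then take the square root
dist = np.sqrt((diff ** 2).sum(axis=-1))

print(dist.shape)  # (2, 4)
```

Here dist[i, j] is the distance between point i of x and point j of y, computed without a single Python loop.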

Axes

Many numpy operations are applied along an axis. How many axes there are depends on the number of dimensions of the array.

A 2D array has 2 axes: the first (axis 0) runs vertically downwards across the rows; the second (axis 1) runs horizontally across the columns.

Understanding this is crucial in several problems. The best-known case is summing over an array.

>>> x = np.arange(16).reshape((4,4))
>>> x
 array([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15]])
>>> a = x.sum(axis=0)
>>> a
 array([24, 28, 32, 36])
>>> b = x.sum(axis=1)
>>> b
 array([ 6, 22, 38, 54])

Since the sum over axis 0 is applied downward (y direction), the 0th element is the sum of 0, 4, 8, 12. The 2nd element is the sum of 2, 6, 10, 14, and so on.

The sum over axis 1, however, is applied horizontally (x direction), so its 0th element is the sum of 0, 1, 2, 3.

By the way, x.sum(axis=1) is equivalent to x.sum(1)
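If the direction of an axis still feels slippery, it can help to spell the sums out as explicit loops and compare. This is only an illustrative check, not how you would write it in practice; col_totals and row_totals are names I made up.

```python
import numpy as np

x = np.arange(16).reshape((4, 4))

# axis=0: one total per column, adding down the rows
col_totals = [sum(int(x[i, j]) for i in range(4)) for j in range(4)]
# axis=1: one total per row, adding across the columns
row_totals = [sum(int(x[i, j]) for j in range(4)) for i in range(4)]

print(col_totals)  # [24, 28, 32, 36]
print(row_totals)  # [6, 22, 38, 54]

print(col_totals == x.sum(axis=0).tolist())  # True
print(row_totals == x.sum(axis=1).tolist())  # True
```

The axis you pass to sum is the index that the loop runs over and that disappears from the result.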

3D arrays have 3 axes. Here axis 0 runs across the depth (z direction), axis 1 runs downward (y direction) and axis 2 runs horizontally (x direction).

>>> x = np.arange(16).reshape((2,2,4))
>>> x
 array([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7]],

        [[ 8,  9, 10, 11],
         [12, 13, 14, 15]]])
>>> a = x.sum(0)
>>> a
 array([[ 8, 10, 12, 14],
        [16, 18, 20, 22]])
>>> b = x.sum(1)
>>> b
 array([[ 4,  6,  8, 10],
        [20, 22, 24, 26]])
>>> c = x.sum(2)
>>> c
 array([[ 6, 22],
        [38, 54]])

The sum in array a is applied along the depth (axis 0)

>>> a
 array([[ 0+8,  1+9, 2+10, 3+11],
        [4+12, 5+13, 6+14, 7+15]])

The sum in array b is applied along the vertical direction (axis 1)

>>> b
 array([[ 0+4,  1+5,   2+6,   3+7],
        [8+12, 9+13, 10+14, 11+15]])

The sum in array c is applied along the horizontal direction (axis 2)

>>> c
 array([[   0+1+2+3,     4+5+6+7],
        [ 8+9+10+11, 12+13+14+15]])
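A handy way to remember all three cases is to watch the shape: summing over an axis removes that axis from the shape. The sketch below checks this for the (2,2,4) array; the keepdims argument at the end is an extra not covered above, shown only as an option.

```python
import numpy as np

x = np.arange(16).reshape((2, 2, 4))

print(x.sum(0).shape)  # (2, 4): the depth axis is removed
print(x.sum(1).shape)  # (2, 4): the row axis is removed
print(x.sum(2).shape)  # (2, 2): the column axis is removed

# keepdims=True keeps the summed axis with length 1 instead of removing it
print(x.sum(2, keepdims=True).shape)  # (2, 2, 1)
```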


Written by jedi