A Beginner’s Guide to NumPy

Kanishk Srivastava
May 22, 2019

Colour contour of a matrix; Source: http://math.oregonstate.edu/~higdon/figures/fig_close_color_60.jpg

What is NumPy?

NumPy is a Python library that adds support for large multi-dimensional arrays and matrices, along with a large number of high-level mathematical functions to operate on them.
NumPy builds on two earlier Python modules for working with arrays, Numeric and Numarray. Numeric is a deprecated module for high-performance numerical computing. Numarray is a complete rewrite of Numeric and, much like its predecessor, is also deprecated.
NumPy brings together the best of both worlds by combining the build of Numeric with the features of Numarray.

NumPy is an abbreviation for “Numerical Python” or “Numeric Python”.

Why NumPy?

Python reads almost like pseudocode, so it lets the user express compelling ideas in a few lines of code while staying readable. To use something as powerful as Python in a scientific environment, we rely on a few libraries; NumPy is one of them.

Python with NumPy offers functionality comparable to MATLAB: both are interpreted, and both let the user write fast programs as long as most operations work on arrays or matrices rather than on scalar quantities. Complementary Python packages such as SciPy and Matplotlib supply much of the remaining MATLAB-like functionality.

NumPy is especially useful for computer vision problems where multidimensional arrays represent images with multiple colour channels. Operations such as slicing, masking and indexing are very efficient ways to access specific pixels in an image.

Setting Up

For most users (especially Windows users), the easiest way to get started is to download a Python distribution such as Anaconda or Pyzo, which comes with the widely used packages already included.

For people who prefer not to use such Python distributions, NumPy can be installed with Python’s standard pip package manager using the following command:

python -m pip install --user numpy scipy matplotlib ipython jupyter pandas sympy nose

Note: It is recommended to use the --user flag, which installs NumPy for the current user and does not write to the system directory. Also, do not use sudo pip to install NumPy; it can cause issues with the install.

For other ways to install, and system-wide install instructions, visit https://scipy.org/install.html.

The Basics

In this section, we give a basic introduction to the most widely used NumPy features. Without further delay, let’s get started!

Importing NumPy

Now that NumPy is set up, we can start using it. Before we do, we have to import it. NumPy can be imported just like any other Python module.

import numpy

However, NumPy is generally not imported in this form. The standard is to rename the imported module to np.

import numpy as np
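As a quick, optional check that the import works, we can print the installed version. This is only a small sketch; the version string will differ from one machine to another.

```python
import numpy as np

# Print the installed NumPy version to confirm that the import succeeded.
print(np.__version__)
```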

Arrays

The ndarray object is at the core of the NumPy package. A NumPy array is an n-dimensional collection of elements that all share one data type (a homogeneous array), with many operations carried out in compiled code for performance.

There are a few fundamental differences between a Python list and an ndarray:

  • NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray will create a new array and delete the original.
  • The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory.

Each dimension of an ndarray is called an axis. The shape of an array gives its size along each axis.
We demonstrate the ndarray object’s defining attributes as follows:

For a two-dimensional ndarray
For a three-dimensional ndarray
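A minimal sketch of these attributes for a two-dimensional and a three-dimensional array follows. The array contents and the names a2 and a3 are made up for illustration; ndim, shape, size and dtype are standard ndarray attributes.

```python
import numpy as np

# A two-dimensional array: 2 rows, 3 columns.
a2 = np.array([[1, 2, 3],
               [4, 5, 6]])
print(a2.ndim)   # number of axes -> 2
print(a2.shape)  # size along each axis -> (2, 3)
print(a2.size)   # total number of elements -> 6

# A three-dimensional array: 2 blocks, each with 3 rows and 4 columns.
a3 = np.zeros((2, 3, 4))
print(a3.ndim)   # 3
print(a3.shape)  # (2, 3, 4)
print(a3.size)   # 24
print(a3.dtype)  # float64
```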

Initialising a NumPy array
There are multiple ways to initialise an ndarray.

We can initialise an ndarray using a list by calling the np.array() function. The level of nesting in the list determines the dimensionality of the ndarray.

The data type (dtype) is inferred from the input list; however, it can be set explicitly through the dtype parameter while initialising the array (np.array(input_list, dtype = *data type*)).
For functions that create an array without any input data, such as np.zeros, np.ones and np.empty, the default data type is float64.
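The sketch below shows how nesting determines dimensionality and how the dtype is inferred or set explicitly. The variable names are illustrative only, and the inferred integer type depends on the platform.

```python
import numpy as np

v = np.array([1, 2, 3])                # one level of nesting -> 1-D
m = np.array([[1, 2, 3], [4, 5, 6]])   # two levels of nesting -> 2-D
print(v.ndim, m.ndim)                  # 1 2

# dtype inferred from the input values
print(v.dtype)                         # a platform-dependent integer type, e.g. int64
print(np.array([1.0, 2, 3]).dtype)     # float64, because one element is a float

# dtype set explicitly
f = np.array([1, 2, 3], dtype=np.float32)
print(f.dtype)                         # float32
```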

Sometimes, while initialising the array, we do not know the value of every element in the array; however, we do know the number of elements the array will hold.
In such situations, we can initialise an n-dimensional array filled with zeros (using np.zeros), filled with ones (using np.ones), or left uninitialised (using np.empty).

np.empty does not set the elements to any particular value; whatever appears in the array depends on the state of memory at the time of the function call.
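A small sketch of the three constructors, using an arbitrary shape of (2, 3). The contents of the np.empty result are not predictable, so only its shape and dtype are printed.

```python
import numpy as np

z = np.zeros((2, 3))      # 2x3 array filled with 0.0
o = np.ones((2, 3))       # 2x3 array filled with 1.0
e = np.empty((2, 3))      # 2x3 array whose contents are arbitrary

print(z)
print(o)
print(e.shape, e.dtype)   # (2, 3) float64
```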

For a better understanding, look at the animation below:

To generate sequences within a range, NumPy offers arange, which is analogous to Python’s range. To generate a specified number of evenly spaced values within a range, we use the linspace function. The main difference between the two is that arange takes a ‘step’ value to generate the elements, whereas linspace takes the number of elements we need as an argument.
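The difference can be sketched as follows; the ranges used here are arbitrary examples.

```python
import numpy as np

# arange(start, stop, step): the stop value is excluded, as with range().
r = np.arange(0, 10, 2)
print(r)        # [0 2 4 6 8]

# linspace(start, stop, num): num evenly spaced values, stop included by default.
s = np.linspace(0, 1, 5)
print(s)        # [0.   0.25 0.5  0.75 1.  ]
```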

Printing Arrays
In plain Python, printing a nested list as a readable grid usually takes for-loops, and nested for-loops for more dimensions.
NumPy simplifies this tedious task: a single print statement lays out any n-dimensional array.

The variable ‘a’ can be any n-dimensional array
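As a minimal sketch, here a is a hypothetical 3 x 4 array; any other ndarray could be printed the same way.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # hypothetical example array
print(a)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
```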

When an array is too large to be printed, NumPy automatically skips the middle and prints only a few corner elements.

Note: To disable this behaviour and force NumPy to print the entire array, use np.set_printoptions(threshold=sys.maxsize) after importing the sys module. The older idiom threshold=np.nan is rejected by current NumPy releases.
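The sketch below shows the summarised output and one way to switch it off, using sys.maxsize as the threshold, as the current NumPy documentation suggests. The array big is an arbitrary example, and the last line restores NumPy’s default threshold of 1000 elements.

```python
import sys
import numpy as np

big = np.arange(10000)
print(big)                       # [   0    1    2 ... 9997 9998 9999]

# Raise the summarisation threshold so that every element is printed.
np.set_printoptions(threshold=sys.maxsize)
full_text = str(big)             # now contains all 10000 elements
print("..." in full_text)        # False

# Restore NumPy's default threshold.
np.set_printoptions(threshold=1000)
```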

Array Indexing
One-dimensional arrays can be indexed, sliced and iterated much like lists in Python.

Since NumPy arrays may be multi-dimensional, we can give one index or slice per axis, separated by commas; any trailing axes that are left out are taken in full.
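A short sketch of indexing, slicing and iterating on one-dimensional and two-dimensional arrays; the arrays v and m are arbitrary examples.

```python
import numpy as np

v = np.arange(10)
print(v[2])        # 2
print(v[2:5])      # [2 3 4]
print(v[::-1])     # reversed: [9 8 7 6 5 4 3 2 1 0]

m = np.arange(12).reshape(3, 4)
print(m[1, 2])     # row 1, column 2 -> 6
print(m[:2, 1:3])  # first two rows, columns 1 and 2
# [[1 2]
#  [5 6]]
print(m[:, 0])     # every row, column 0 -> [0 4 8]

for row in m:      # iteration runs over the first axis
    print(row)
```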

Note: Even if the slice is assigned to a new variable, it is only a view into the actual array; therefore, any changes made to the slice will be reflected in the original array.
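The view behaviour can be seen in a few lines. This sketch also shows copy(), the ndarray method that returns an independent array when that is what we want.

```python
import numpy as np

a = np.arange(6)
s = a[1:4]         # a view into a, not a copy
s[0] = 100
print(a)           # [  0 100   2   3   4   5]

c = a[1:4].copy()  # an explicit, independent copy
c[0] = -1
print(a[1])        # still 100
```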

While using slicing to index arrays in NumPy, the result is always a subarray; however, integer indexing allows us to create arbitrary arrays using the data from other arrays. Consider this as accessing elements from one array using a different array’s elements as indices.

We can also use boolean indexing to set a simple logical function as a condition to pick out arbitrary elements from an array.

Here is an example to understand the difference:
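The following is a minimal sketch rather than a definitive reference; the array a and the index values are chosen only for illustration. It compares slicing, integer indexing and boolean indexing, and ends by mixing an integer index with a slice.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4],
              [5, 6]])

# Slicing: the result is always a subarray of the original.
print(a[:2, :])
# [[1 2]
#  [3 4]]

# Integer indexing: picks the elements at (0, 0), (1, 1) and (2, 0).
print(a[[0, 1, 2], [0, 1, 0]])    # [1 4 5]

# Boolean indexing: keeps the elements that satisfy a condition.
mask = a > 2
print(a[mask])                    # [3 4 5 6]

# Mixing an integer index with a slice lowers the rank ...
row_r1 = a[1, :]
# ... whereas using only slices keeps it.
row_r2 = a[1:2, :]
print(row_r1, row_r1.shape)       # [3 4] (2,)
print(row_r2, row_r2.shape)       # [[3 4]] (1, 2)
```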

Note: Combining slicing with integer indexing is possible; however, doing so will yield an array of a lower rank than the original array as shown in the last few lines of the example code above.

Shape Manipulations

Depending on the number of elements along each axis, each array has a shape. However, we frequently need to change the shape of arrays in order to perform various operations on them.

NumPy has various built-in functions to reshape an array. We have listed a few widely used functions below. For a more comprehensive list, visit the NumPy documentation.
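A few of the commonly used ones can be sketched as follows. The selection here (reshape, the T attribute and ravel) is our own pick for illustration and is far from exhaustive.

```python
import numpy as np

a = np.arange(6)               # [0 1 2 3 4 5]
b = a.reshape(2, 3)            # 2 rows, 3 columns
print(b)
# [[0 1 2]
#  [3 4 5]]

print(b.T)                     # transpose: axes swapped
# [[0 3]
#  [1 4]
#  [2 5]]

print(b.ravel())               # flattened back to 1-D -> [0 1 2 3 4 5]
print(a.reshape(3, -1).shape)  # -1 lets NumPy infer that axis -> (3, 2)
```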

Note: The reshape function returns its argument with a modified shape, whereas the ndarray.resize method modifies the array itself.
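A small sketch of the difference, with arbitrary example arrays: reshape leaves the original untouched, while the resize method changes the array in place and returns None.

```python
import numpy as np

a = np.arange(6)
b = a.reshape(2, 3)   # returns an array with the new shape
print(a.shape)        # (6,)   a itself is unchanged
print(b.shape)        # (2, 3)

c = np.arange(6)
c.resize((3, 2))      # modifies c in place
print(c.shape)        # (3, 2)
```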

Note: A 1-D array does not reflect any changes if it undergoes a transpose operation.
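This can be checked directly. The sketch also shows one common way to get a column instead, by adding an axis explicitly before transposing.

```python
import numpy as np

v = np.array([1, 2, 3])
print(v.T.shape)         # (3,)   unchanged by the transpose

# To get a column, add an axis explicitly.
col = v.reshape(-1, 1)   # v[:, np.newaxis] would do the same
print(col.shape)         # (3, 1)
print(col.T.shape)       # (1, 3)
```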

The arrays generated by ravel() are ordered in “C-style” by default, i.e., row by row: the first element of each row follows the last element of the previous row. When an array is flattened, it is generally not copied to a new location in memory, since arrays are stored in “C-style” order by default. However, if the array was made by taking slices of another array, the data may have to be copied to a new location in memory.
The functions ravel() and reshape() can also be instructed to use Fortran-style (column by column) ordering through the optional order argument.
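The two orderings, and the view-versus-copy behaviour of ravel(), can be sketched as below. np.shares_memory is used here only as a convenient way to check whether a copy was made.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.ravel())             # C order, row by row:          [1 2 3 4 5 6]
print(a.ravel(order="F"))    # Fortran order, col by column: [1 4 2 5 3 6]
print(a.reshape((3, 2), order="F"))
# [[1 5]
#  [4 3]
#  [2 6]]

# On a contiguous array ravel() returns a view; on a strided slice it must copy.
print(np.shares_memory(a, a.ravel()))          # True
print(np.shares_memory(a, a[:, ::2].ravel()))  # False
```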

Arithmetic Operations

Basic mathematical functions in NumPy operate elementwise on NumPy arrays. Mathematical operations are available both as functions and as overloaded operators.
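For instance, with two small example arrays x and y (values chosen arbitrarily), each operator gives the same result as the corresponding function:

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[5.0, 6.0], [7.0, 8.0]])

print(x + y)   # same as np.add(x, y)
print(x - y)   # same as np.subtract(x, y)
print(x * y)   # same as np.multiply(x, y): elementwise, not a matrix product
print(x / y)   # same as np.divide(x, y)

print(np.array_equal(x * y, np.multiply(x, y)))   # True
```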

We can also use functions to calculate the sum of all elements of a matrix, the axis-wise sum of a matrix and element-wise square root of a matrix.
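A brief sketch of these three operations on a made-up 2 x 2 matrix:

```python
import numpy as np

x = np.array([[1.0, 4.0],
              [9.0, 16.0]])

print(np.sum(x))           # sum of all elements -> 30.0
print(np.sum(x, axis=0))   # column sums -> [10. 20.]
print(np.sum(x, axis=1))   # row sums    -> [ 5. 25.]
print(np.sqrt(x))          # elementwise square root
# [[1. 2.]
#  [3. 4.]]
```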

Visit the NumPy documentation for a detailed list of operations.

Broadcasting

We can perform arithmetic operations directly on NumPy arrays; however, elementwise operations would normally require the two participating arrays to have exactly the same shape. This rule turns out to be quite limiting; thankfully, NumPy offers a built-in workaround.

The term broadcasting describes how NumPy treats arrays with different shapes while performing operations. Conceptually, the smaller array is replicated along the mismatched dimensions until the two shapes match. The smaller array is not actually replicated in memory; instead, its existing data is reused to produce the desired output.
This technique is also widely used outside of NumPy, in tools such as TensorFlow, Theano and Octave.
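A minimal sketch of broadcasting in action, with a hypothetical 2 x 3 matrix m and a three-element vector v:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)
v = np.array([10, 20, 30])     # shape (3,)

# v is treated as if it were repeated for every row of m.
print(m + v)
# [[11 22 33]
#  [14 25 36]]

# A scalar is broadcast to every element.
print(m * 2)
# [[ 2  4  6]
#  [ 8 10 12]]
```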

There is a very easy way to figure out if two arrays are broadcasting compatible. NumPy compares their shapes axis by axis, starting from the last (trailing) dimension. Two dimensions are compatible when:

  • They are equal, or
  • One of them is ‘1’

If neither condition is met, we get the exception ValueError: operands could not be broadcast together.

Tip: The size of the resultant array is the maximum size along each axis of the input arrays.
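The rules and the tip can be tried out on arrays of ones. The shapes below are arbitrary examples, and the exact wording after the first part of the error message may vary between NumPy versions.

```python
import numpy as np

a = np.ones((4, 1, 3))
b = np.ones((5, 1))
# Compared from the right: 3 vs 1, then 1 vs 5, then 4 vs nothing.
# Every pair is compatible, so the result takes the larger size on each axis.
print((a + b).shape)    # (4, 5, 3)

c = np.ones((2, 3))
d = np.ones((2,))
try:
    c + d               # trailing sizes 3 and 2 differ, and neither is 1
    message = "no error"
except ValueError as err:
    message = str(err)
print(message)
```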

The input arrays do not need to have the same number of dimensions. For instance, if we have a 256 x 256 x 3 array of RGB values, and we want to scale each colour channel by a different amount, we can multiply the image array by a one-dimensional array with three elements.
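The RGB case can be sketched as follows. The image here is just an array of ones standing in for real pixel data, and the scale factors are hypothetical.

```python
import numpy as np

image = np.ones((256, 256, 3))       # stand-in for an RGB image
scale = np.array([0.5, 1.0, 2.0])    # hypothetical per-channel factors

# Shapes (256, 256, 3) and (3,) are compatible: the trailing sizes match.
scaled = image * scale
print(scaled.shape)                  # (256, 256, 3)
print(scaled[0, 0])                  # [0.5 1.  2. ]
```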

Here are a few examples to understand how the rules work:

How stretching works; Source: http://www.numpy.org/devdocs/user/theory.broadcasting.html
Why mismatched dimensions can’t work together; Source: http://www.numpy.org/devdocs/user/theory.broadcasting.html
Two-dimensional stretching; Source: http://www.numpy.org/devdocs/user/theory.broadcasting.html

Visit the NumPy documentation for more on broadcasting.

For a more in-depth guide to NumPy, you can refer to the documentation at https://docs.scipy.org/doc/.

I hope you got something out of this blog. If you found this post illuminating at all, consider 👏 clapping👏 for this post and following for more upcoming content.
