NumPy in 50 Cells of Notebook

Originally posted here. All the code is in a Jupyter notebook that can be downloaded here.

In this post I will introduce the NumPy package and show how to use some of its most common features, functions and attributes. I will describe each feature with an example.

This tutorial consists of the following parts:

  • What is NumPy?
  • How to create NumPy arrays
  • Indexing and Fancy Indexing
  • Slicing
  • Universal Functions (Ufuncs)
  • Broadcasting
  • Masking, Comparing and Sorting
  • Further learning

1- What is NumPy?

NumPy is the basis of Pandas and many other packages. What makes NumPy such an incredible package is its core data type, ndarray. ndarray stands for n-dimensional array, and at first glance it looks like a Python list. However, operations on it are a lot faster than on a regular Python list. A Python list can contain different kinds of data, such as integers, strings, Booleans (True, False) and even other lists. A NumPy array, on the other hand, holds only one type of data, so NumPy doesn’t have to check the type of every single element when it is doing computations. This feature makes NumPy a great tool for data science research and projects.
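
A quick way to see the one-type rule in action is to look at the dtype attribute of an array. The short sketch below uses made-up values of my own. The exact dtype names depend on the platform, so the code only looks at the kind of type.

```python
import numpy as np

# A list can mix types; an array converts everything to one common type
mixed_list = [1, 2.5, 3]
arr = np.array(mixed_list)

print(arr)         # the integers were converted to floats
print(arr.dtype)   # a floating-point dtype

# Integers and strings together become an array of strings
as_text = np.array([1, 'a'])
print(as_text.dtype.kind)   # 'U' stands for Unicode string
```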

Before we get started, let’s check the version of NumPy and Python.

#import NumPy
import numpy as np
# sys was imported to check the python version
import sys
# check the version of python and NumPy
print('NumPy version:', np.__version__)
print('Python version',sys.version)
>>> NumPy version: 1.12.1
>>> Python version 3.6.1 |Anaconda custom (64-bit)| (default, Mar 22 2017, 20:11:04) [MSC v.1900 64 bit (AMD64)]

2- How to create NumPy arrays

There are many ways to create arrays in NumPy. Let’s take a look at a few of them here.

# Create one dimensional NumPy array
np.array([1, 2, 3])
>>> array([1, 2, 3])
# Array of zeros
np.zeros(3)
>>> array([ 0.,  0.,  0.])
# Array of 1s
np.ones(3)
>>> array([ 1.,  1.,  1.])
# Array of 3 random integers from 1 to 9 (the upper bound 10 is excluded)
np.random.randint(1,10, 3)
>>> array([4, 8, 4])
# Create linearly spaced array
np.linspace(0, 10, 5 )
>>> array([  0. ,   2.5,   5. ,   7.5,  10. ])
# Create 2-dimensional array
np.array([[1,2,3],
[4,5,6],
[7,8,9]])
>>> array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# Create a 3x4 array of random values between 0 and 1
np.random.random((3,4))
>>> array([[ 0.85957774,  0.90323213,  0.08000421,  0.45366519],
[ 0.15077925, 0.57901453, 0.72878536, 0.88573099],
[ 0.51431053, 0.46266243, 0.54166614, 0.72836133]])
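
Two details of the random functions are easy to miss: np.random.randint includes the lower bound but excludes the upper bound, and the values change on every run unless a seed is set. Here is a small sketch of both. The seed value 0 and the sample size are arbitrary choices of mine.

```python
import numpy as np

# Seeding is optional; it only makes the "random" values repeatable
np.random.seed(0)
first_run = np.random.randint(1, 10, 1000)

np.random.seed(0)
second_run = np.random.randint(1, 10, 1000)

# Same seed, same values
print(np.array_equal(first_run, second_run))

# The lower bound 1 can appear, the upper bound 10 never does
print(first_run.min(), first_run.max())
```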

Let’s create 1-Dimensional and 2-Dimensional arrays.

a = np.array([1,2,3])
b = np.random.randint(0,10, (3,3))
print(a)
print(b)
>>> [1 2 3]
>>> [[5 9 0]
[5 7 8]
[0 1 9]]
# Add a new value to the array (np.append returns a new array)
a = np.append(a, 4)
a
>>> array([1, 2, 3, 4])
# Print the shape and dimension of arrays
print("Shape of a:", np.shape(a))
print("Shape of b:", np.shape(b))
print('Dimension of a:', np.ndim(a))
print('Dimension of b:', np.ndim(b))
>>> Shape of a: (4,)
>>> Shape of b: (3, 3)
>>> Dimension of a: 1
>>> Dimension of b: 2
# Number of elements in the arrays
print('Number of elements in a:', np.size(a))
print('Number of elements in b:', np.size(b))
>>> Number of elements in a: 4
>>> Number of elements in b: 9
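
The same information is also available as attributes of the array itself, which is the form you will see most often in other people's code. The sketch below also shows that np.append does not change the original array. The array values are my own, chosen so the output is predictable.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.arange(9).reshape(3, 3)

# shape, ndim and size as attributes instead of function calls
print(a.shape, a.ndim, a.size)   # (3,) 1 3
print(b.shape, b.ndim, b.size)   # (3, 3) 2 9

# np.append builds a new array; the original is left unchanged
a2 = np.append(a, 4)
print(a)    # [1 2 3]
print(a2)   # [1 2 3 4]
```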

3- Indexing and Fancy Indexing

Indexing allows us to access the elements in the array. Plain indexing is the simplest way to do that, but there are other ways too, such as fancy indexing, slicing and masking.

# a is the 1D array we created before
a
>>> array([1, 2, 3, 4])
# b is the 2D array we created before
b
>>> array([[5, 9, 0],
[5, 7, 8],
[0, 1, 9]])
# Get the first element of a
# These two print statements generate the same result

print(a[0])
print(a[-4])
>>> 1
>>> 1
# Get the last element of a
# These two print statements generate the same result

print(a[-1])
print(a[3])
>>> 4
>>> 4
# Get the first row of b
# These two print statements generate the same result

print(b[0])
print(b[0,:])
>>> [5 9 0]
>>> [5 9 0]
# Get the second column of b
b[:,1]
>>> array([9, 7, 1])

Fancy indexing allows us to pick several values in the array at once by passing a list of indices.

# To understand the fancy indexing better we will create two new arrays. 
x = np.array(['a', 'b', 'c'])
y = np.array([['d', 'e', 'f'],
['g', 'h', 'k']])
print(x)
print(y)
>>> ['a' 'b' 'c']
>>> [['d' 'e' 'f']
['g' 'h' 'k']]
# Fancy indexing on 1-D array
# Get the value of c in array x

ind = [2]
x[ind]
>>> array(['c'], 
dtype='<U1')
# Fancy indexing on 2D array
# Get the values e, h in array y
# Row indices and column indices go in a tuple; the single column index 1 is used for both rows

ind2 = ([0, 1], [1])
y[ind2]
>>> array(['e', 'h'], 
dtype='<U1')
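
Fancy indexing becomes more useful with more than one index. The sketch below reuses the same x and y arrays. The index lists are my own picks for illustration.

```python
import numpy as np

x = np.array(['a', 'b', 'c'])
y = np.array([['d', 'e', 'f'],
              ['g', 'h', 'k']])

# Several positions at once, in any order, with repeats allowed
picked = x[[2, 0, 2]]
print(picked)    # ['c' 'a' 'c']

# For a 2-D array, pass one list of row indices and one of column indices
rows = [0, 1, 1]
cols = [1, 1, 2]
pairs = y[rows, cols]
print(pairs)     # ['e' 'h' 'k']
```

The row and column lists are read in pairs, so the second example picks y[0, 1], y[1, 1] and y[1, 2].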

4- Slicing

Slicing is the way to choose a range of values in the array. We use a colon (:) in square brackets.

The general form of a slice in NumPy is [Start : Stop : Step]. Start is included, Stop is excluded, and any of the three can be left out.

# Create an array of integers from 1 to 10
X = np.arange(1, 11, dtype=int)
X
>>> array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
# Get the first two elements of X 
X[:2]
>>> array([1, 2])
# Get the numbers 3, 4 and 5
X[2:5]
>>> array([3, 4, 5])
# Get odd numbers 
X[::2]
>>> array([1, 3, 5, 7, 9])
# Get even numbers
X[1::2]
>>> array([ 2,  4,  6,  8, 10])
# Create 2-D array 
Y= np.arange(1,10).reshape(3,3)
Y
>>> array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# Get the first and second row
Y[:2,:]
>>> array([[1, 2, 3],
[4, 5, 6]])
# Get the second and third column
Y[:, 1:]
>>> array([[2, 3],
[5, 6],
[8, 9]])
# Get the elements 5 and 6
Y[1,1:]
>>> array([5, 6])
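
Two more things about the [Start : Stop : Step] form are worth knowing: the step can be negative, and a slice is a view of the original array rather than a copy. A minimal sketch, using the same X as above:

```python
import numpy as np

X = np.arange(1, 11)

# A negative step walks backwards
print(X[::-1])       # 10 down to 1
print(X[8:2:-2])     # [9 7 5]

# A slice is a view: writing to it changes the original array
first_three = X[:3]
first_three[0] = 100
print(X[0])          # 100

# Use .copy() when an independent array is needed
safe = X[:3].copy()
safe[0] = -1
print(X[0])          # still 100
```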

5- Universal Functions (Ufuncs)

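Universal functions (ufuncs) apply an operation to every element of an array, which makes them useful for mathematical and statistical work in NumPy. They are significantly faster than plain Python because the element-by-element loop that Python would need runs inside NumPy’s compiled code instead. Strictly speaking, a few of the functions below (np.max, np.mean, np.split) are ordinary NumPy functions rather than ufuncs, but they are used in the same way.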
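
To make the comparison with a Python loop concrete, here is a small sketch that computes the same squares both ways. I left out timings because they vary from machine to machine.

```python
import numpy as np

X = np.arange(1, 11)

# Plain Python: visit every element in an explicit loop
squares_loop = [value ** 2 for value in X.tolist()]

# Ufunc: one call, the loop runs inside NumPy
squares_ufunc = np.square(X)

print(squares_loop)
print(squares_ufunc)

# np.square is a genuine ufunc object
print(isinstance(np.square, np.ufunc))   # True
```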

To see the list of available functions, including the ufuncs, press tab after np. For example: np.{TAB}

# Find the maximum element of X
np.max(X)
>>> 10
# Mean of values in the X
np.mean(X)
>>> 5.5
# Get the 4th power of each element in X
np.power(X, 4)
>>> array([    1,    16,    81,   256,   625,  1296,  2401,  4096,  6561, 10000])
# Trigonometric functions 
print(np.sin(X))
print(np.tan(X))
>>> [ 0.84147098  0.90929743  0.14112001 -0.7568025  -0.95892427 -0.2794155
0.6569866 0.98935825 0.41211849 -0.54402111]
>>> [ 1.55740772 -2.18503986 -0.14254654 1.15782128 -3.38051501 -0.29100619
0.87144798 -6.79971146 -0.45231566 0.64836083]
# sin(x)^2 + cos(x)^2 = 1, the famous trigonometric identity
np.square(np.sin(X)) + np.square(np.cos(X))
>>> array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])
# array Y created before
Y
>>> array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# The same rules apply for 2-D arrays
np.multiply(Y, 2)
>>> array([[ 2,  4,  6],
[ 8, 10, 12],
[14, 16, 18]])
# split Y into 3 subarrays
np.split(Y, 3)
>>> [array([[1, 2, 3]]), array([[4, 5, 6]]), array([[7, 8, 9]])]

6- Broadcasting

Broadcasting makes it possible to use ufuncs and many other operations on arrays of different sizes. There are some rules that the shapes must follow for broadcasting to work. I won’t go into details here. However, I will refer you to a tutorial below.

# Add 5 to each element of X
X + 5
>>> array([ 6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
# Or 
np.add(X, 5)
>>> array([ 6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
# Create new array Z 
Z = np.arange(3)[:, np.newaxis]
Z
>>> array([[0],
[1],
[2]])
# Multiply Y and Z
np.multiply(Y, Z)
>>> array([[ 0,  0,  0],
[ 4, 5, 6],
[14, 16, 18]])
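
The short version of the rules is that shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or when one of them is 1. The sketch below shows why Y and Z work together: Z has shape (3, 1), so it is stretched along its length-1 axis. The use of np.broadcast_to is my own choice for making that stretching visible; NumPy does not need it to do the multiplication.

```python
import numpy as np

Y = np.arange(1, 10).reshape(3, 3)     # shape (3, 3)
Z = np.arange(3)[:, np.newaxis]        # shape (3, 1)

# Z stretched along its length-1 axis to match Y
Z_stretched = np.broadcast_to(Z, Y.shape)
print(Z_stretched)

# Multiplying by the stretched version gives the same result as Y * Z
print(np.array_equal(Y * Z, Y * Z_stretched))   # True

# Shapes (3, 3) and (4,) disagree on the last axis and neither is 1
try:
    Y + np.arange(4)
    failed = False
except ValueError as err:
    failed = True
    print('Cannot broadcast:', err)
```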

7- Masking, Comparing and Sorting

Masking is another very useful feature of NumPy arrays: a Boolean array selects which elements to keep.

# Create an array of 10 random integers from 1 to 4 (the upper bound 5 is excluded)
x = np.random.randint(1,5, 10)
x
>>> array([3, 3, 4, 4, 4, 2, 1, 2, 3, 4])
# Create a (3,3) array of random integers from 1 to 4
y = np.random.randint(1,5, (3,3))
y
>>> array([[2, 1, 4],
[4, 3, 2],
[3, 2, 4]])
# Sort elements in array x
np.sort(x)
>>> array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])
# Sort each column (axis=0 sorts down the rows)
np.sort(y, axis=0)
>>> array([[2, 1, 2],
[3, 2, 4],
[4, 3, 4]])
# Sort each row (axis=1 sorts across the columns)
np.sort(y, axis=1)
>>> array([[1, 2, 4],
[2, 3, 4],
[2, 3, 4]])
# == , !=, < , >, >=, <= operations on arrays
# This returns a Boolean array

x > 3
>>> array([False, False,  True,  True,  True, False, False, False, False,  True], dtype=bool)
# Use masking feature to get the values of comparisons
x[x>3]
>>> array([4, 4, 4, 4])
# Another example 
x[(x <= 3) & ( x > 1 )]
>>> array([3, 3, 2, 2, 3])
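
Masks can be combined with | (or) and ~ (not) as well as &, and a mask can also be used on the left side of an assignment. A short sketch with the same values as x above, typed in by hand so the output is predictable:

```python
import numpy as np

x = np.array([3, 3, 4, 4, 4, 2, 1, 2, 3, 4])

mask = x > 3
print(mask.sum())                 # 4 values are greater than 3

# | means "or", ~ means "not"; each comparison needs its own parentheses
ones_or_fours = x[(x == 1) | (x == 4)]
not_above_three = x[~mask]
print(ones_or_fours)              # [4 4 4 1 4]
print(not_above_three)            # [3 3 2 1 2 3]

# A mask can also select the elements to overwrite
capped = x.copy()
capped[capped > 3] = 3
print(capped)                     # [3 3 3 3 3 2 1 2 3 3]
```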

8- Further Learning

All the code in this tutorial is on my GitHub. There is also a Jupyter notebook that can be downloaded. I highly recommend retyping the code and trying it on your own.

I have a list of resources for NumPy on my blog, including courses, tutorials and articles.