A Complete Guide on NumPy — Part 1

Melih GULUM
May 29, 2023


In this article, I want to talk about NumPy, one of the most important libraries for data processing. You may think you do not use NumPy very often in your work, but you have probably used it without realizing it. Let’s make that clear: many Python libraries, such as Pandas and almost all visualization libraries, rely on NumPy because of its power in numerical processing.

Travis Oliphant created NumPy in 2005. NumPy is an open-source Python library used to work with arrays and perform mathematical operations on them. The name stands for Numerical Python. NumPy is widely used in Data Science and Machine Learning.

NumPy is very useful for developers because it provides functions for matrices, random number generation, linear algebra, Fourier transforms, and mathematical and logical operations on arrays.

📖 I will cover these routines and other useful information to help you better understand NumPy. To keep the article from getting too long, I have divided it into two parts. You can find the topics for the second half in the table of contents or in Part 2 of A Complete Guide on NumPy on Medium.

⚠️ All the images were drawn and all the code was written by me. If you want to use them, please cite me as the source.

As you can see in the code block below, the most common way to use NumPy is to import it with the alias ‘np’.

import numpy as np

Benefits of NumPy

  • Fast
    — NumPy is written in C (and also some C++ and Fortran), so array operations run much faster than the equivalent operations on Python lists.
  • Fast Array Loops
    — Looping over a Python list uses an iterator that keeps track of its position in the list, one element at a time. The same logic can usually be written with NumPy arrays using fewer explicit loops and less code.
  • Memory Usage
    — NumPy arrays have a fixed length, whereas Python lists can grow in size.
    — For the same numerical data, NumPy arrays typically use less memory than Python lists.
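
The points above can be sketched with a small comparison. This is only an illustration: the names n, py_list and np_array are made up for the example, and the exact byte counts depend on your Python version and platform.

```python
import sys
import numpy as np

n = 1_000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# Doubling every element: an explicit Python loop vs. one vectorized expression
doubled_list = [value * 2 for value in py_list]
doubled_array = np_array * 2

# Rough memory estimate: the list stores pointers plus separate int objects,
# while the array stores raw 8-byte integers in one contiguous block
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)
array_bytes = np_array.nbytes

print(doubled_array.tolist() == doubled_list)
print(list_bytes, array_bytes)
```

Both versions produce the same values, but the NumPy version needs no explicit loop and stores its data more compactly.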

NumPy’s core data structure and key concept is the n-dimensional array (ndarray). An array can have one or more dimensions, or it can be 0-dimensional.

1. Scalar (0D)

2. Vector (1D)

3. Matrix (2D)

4. Tensor (>=3D)

NumPy Arrays (Image by Author)

What is a Tensor?

Let’s dive deep into the tensors we just talked about.

Data (mostly numerical) is one of the most important components of machine learning, so we have to store, represent, and use it in some way. For this reason, machine learning uses a data structure called a tensor. To put it briefly, tensors are containers for data.

Fun Fact: TensorFlow performs mathematical operations on multidimensional arrays (tensors), and that is where the name TensorFlow comes from.

Key Attributes of Tensors

  1. Rank (Dimension — Number of Axes)
Tensor Ranks (Image by Author)

The rank of a tensor is its number of axes. NumPy exposes it as the ndim attribute.

x = np.array([45, 154, 0, 1, 87])
x.ndim

OUT:
1

2. Shape

Shape is a tuple of integers that gives the number of elements along each axis of the tensor.

scalar = np.array(12)
vector = np.array([1, -34, 87, 547, -97])
matrix = np.array([[1, -4, 7],
[-2, 5, -8],
[3, -6, 9]])
tensor = np.array([[[10, 11, 12], [13, 14, 15]],
[[20, 21, 22], [23, 24, 25]],
[[30, 31, 32], [33, 34, 35]]])

print("Scalar Shape : ", scalar.shape,
"\nVector Shape : ", vector.shape,
"\nMatrix Shape : ", matrix.shape,
"\nTensor Shape : ", tensor.shape)

OUT:
Scalar Shape : ()
Vector Shape : (5,)
Matrix Shape : (3, 3)
Tensor Shape : (3, 2, 3)
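
As a small sketch of how these attributes relate to each other: the rank equals the length of the shape tuple, and the total number of elements equals the product of the shape entries. The variable names below are only illustrative.

```python
import numpy as np

tensor = np.arange(18).reshape(3, 2, 3)

rank_from_shape = len(tensor.shape)            # one entry per axis
size_from_shape = int(np.prod(tensor.shape))   # 3 * 2 * 3

print(rank_from_shape, tensor.ndim)
print(size_from_shape, tensor.size)
```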

3. Data Type (dtype)

The data type of tensor refers to the type of data contained in it; such as float32, uint8, float64 and so on.
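
A minimal sketch of working with data types: the dtype can be requested when an array is created, and astype() returns a converted copy. The variable names are chosen just for this example.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.float32)
b = np.array([1, 2, 3], dtype=np.uint8)
c = a.astype(np.float64)    # astype returns a new array; a itself is unchanged

# itemsize is the number of bytes used by one element
print(a.dtype, a.itemsize)
print(b.dtype, b.itemsize)
print(c.dtype, c.itemsize)
```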

Data Types (Image by Author)

Tensor Types

Tensor Types (Image by Author)

i. Scalar

This tensor type contains only one number. Scalar tensors are 0-dimensional, so scalars (tensors) are also called 0D Tensors.

x = np.array(12)

print("Data : ", x,
"\nShape : ", x.shape,
"\nSize : ", x.size,
"\nRank : ", x.ndim,
"\nData Type : ", x.dtype,
"\nType : ", type(x))

OUT:
Data : 12
Shape : ()
Size : 1
Rank : 0
Data Type : int64
Type : <class 'numpy.ndarray'>

ii. Vector

A vector consists of a group (array) of numbers. A vector is a one-dimensional tensor and is also called a 1D Tensor.

x = np.array([12, 897, 0, 46, -7, 574])

print("Data : ", x,
"\nShape : ", x.shape,
"\nSize : ", x.size,
"\nRank : ", x.ndim,
"\nData Type : ", x.dtype,
"\nType : ", type(x))

OUT:
Data : [ 12 897 0 46 -7 574]
Shape : (6,)
Size : 6
Rank : 1
Data Type : int64
Type : <class 'numpy.ndarray'>

iii. Matrix

Simply put, a matrix (2D) is an array of vectors. A matrix has two axes, which are often called rows and columns.

x = np.array([[1, -4, 7],
[-2, 5, -8],
[3, -6, 9]])

print("Data : ", x,
"\nShape : ", x.shape,
"\nSize : ", x.size,
"\nRank : ", x.ndim,
"\nData Type : ", x.dtype,
"\nType : ", type(x))

OUT:
Data : [[ 1 -4 7]
[-2 5 -8]
[ 3 -6 9]]
Shape : (3, 3)
Size : 9
Rank : 2
Data Type : int64
Type : <class 'numpy.ndarray'>

iv. 3D Tensor & Higher Dimensionality

If you stack matrices, you get a 3D tensor. A 4D tensor can be created by packing 3D tensors into an array. Higher-dimensional tensors can be created in the same way.

x = np.array([[[10, 11, 12], [13, 14, 15], [16, 17, 18]],
[[20, 21, 22], [23, 24, 25], [26, 27, 28]],
[[30, 31, 32], [33, 34, 35], [36, 37, 38]]])

print("Data : ", x,
"\nShape : ", x.shape,
"\nSize : ", x.size,
"\nRank : ", x.ndim,
"\nData Type : ", x.dtype,
"\nType : ", type(x))

OUT:
Data : [[[10 11 12]
[13 14 15]
[16 17 18]]

[[20 21 22]
[23 24 25]
[26 27 28]]

[[30 31 32]
[33 34 35]
[36 37 38]]]
Shape : (3, 3, 3)
Size : 27
Rank : 3
Data Type : int64
Type : <class 'numpy.ndarray'>
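
The idea of building higher-dimensional tensors by stacking can also be sketched with np.stack(), which is covered later in this article. The names m1, m2, t3 and t4 are only illustrative.

```python
import numpy as np

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])

t3 = np.stack([m1, m2])       # two matrices -> 3D tensor, shape (2, 2, 2)
t4 = np.stack([t3, t3, t3])   # three 3D tensors -> 4D tensor, shape (3, 2, 2, 2)

print(t3.shape, t4.shape)
```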

Creating NumPy Arrays

NumPy has several ways to create arrays of various types. The array() function is the most commonly used one, but in this section we’ll also talk about other common ways to create NumPy arrays.

a. np.array()

This is the simplest and most used function when creating arrays.

np.array([[5, 8], [6, 4], [1, 3]])

OUT:
array([[5, 8],
[6, 4],
[1, 3]])

b. np.empty()

This function returns a new array of the given shape and data type without initializing its entries, so the array contains whatever values happen to be in memory.

np.empty([3,3])

OUT:
array([[1.43432079e-316, 0.00000000e+000, 6.93765304e-310],
[1.61768476e-316, 0.00000000e+000, 0.00000000e+000],
[9.88131292e-324, 0.00000000e+000, 1.18575755e-321]])

np.empty([3,3], dtype=int)

OUT:
array([[ 35036752, 0, 2318354852094746726],
[8386676005153153121, 8079584626855538537, 7234308714703254117],
[8389209317440384800, 44651278441, 112]])
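
Because the initial contents are arbitrary, np.empty() only makes sense when every element is overwritten before it is read. A minimal sketch of that pattern (the array name is made up for the example):

```python
import numpy as np

squares = np.empty(5, dtype=np.int64)   # contents are arbitrary at this point

for i in range(squares.size):
    squares[i] = i * i                  # overwrite every slot before reading it

print(squares)
```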

c. np.arange()

Given a start value and an end value (which is not included), we can easily create vectors with the np.arange() function. We can also change the step between points (the default is 1).

np.arange(1,10)

OUT:
array([1, 2, 3, 4, 5, 6, 7, 8, 9])

x = np.arange(1, 10, 2)
print("x: ", x)

y = np.arange(1.8, 10.6, 0.8)
print("y: ", y)

OUT:
x: [1 3 5 7 9]
y: [1.8 2.6 3.4 4.2 5. 5.8 6.6 7.4 8.2 9. 9.8]
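
With a non-integer step, floating-point rounding can make it hard to predict exactly how many points np.arange() returns. When the number of points matters, np.linspace() is often a safer choice because you state the count directly. A small sketch that builds the same 11 points as the y example above:

```python
import numpy as np

# 11 evenly spaced points from 1.8 to 9.8, with both ends included
y = np.linspace(1.8, 9.8, num=11)

print(y)
```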

d. np.zeros()

Sometimes it is useful to start from known initial values. A common way to do this is to create an array filled with zeros. np.zeros() returns an array containing zeros.

np.zeros(4)

OUT:
array([0., 0., 0., 0.])

np.zeros((4,2), dtype=int)

OUT:
array([[0, 0],
[0, 0],
[0, 0],
[0, 0]])

e. np.ones()

We can also initialize an array with ones.

np.ones(4)

OUT:
array([1., 1., 1., 1.])

np.ones((4, 2), dtype=int)

OUT:
array([[1, 1],
[1, 1],
[1, 1],
[1, 1]])

f. np.full()

Unlike np.zeros() and np.ones(), np.full() lets us specify the value that the array is filled with.

np.full((4), 1)

OUT:
array([1, 1, 1, 1])

np.full((4,2), 5)

OUT:
array([[5, 5],
[5, 5],
[5, 5],
[5, 5]])

np.full((4,2), (5, 8))

OUT:
array([[5, 8],
[5, 8],
[5, 8],
[5, 8]])
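
Like np.zeros() and np.ones(), np.full() accepts a dtype argument. If it is omitted, the data type is taken from the fill value. A short sketch with illustrative names:

```python
import numpy as np

ints = np.full((2, 3), 7)                  # dtype inferred from the fill value
floats = np.full((2, 3), 7, dtype=float)   # dtype forced to float64

print(ints.dtype, floats.dtype)
```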

Array Manipulation

NumPy provides many different functions for manipulating arrays.

Array Manipulation Functions (Image by Author)

I. np.shape()

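As we learned earlier, the shape of an array is a tuple of integers. It is available both through the np.shape() function and through the shape attribute of an array, which is used in the example.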

scalar = np.array(32)
vector = np.array([45, -5, 0.5, 654])
matrix = np.array([[45, 8],
[8, 90]])
tensor = np.array([[[10, 11, 12], [13, 14, 15]],
[[20, 21, 22], [23, 24, 25]],
[[30, 31, 32], [33, 34, 35]]])

print("Scalar: ", scalar.shape,
"\nVector: ", vector.shape,
"\nMatrix: ", matrix.shape,
"\nTensor: ", tensor.shape)

OUT:
Scalar: ()
Vector: (4,)
Matrix: (2, 2)
Tensor: (3, 2, 3)
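
One practical difference, sketched here with made-up variable names: the np.shape() function also accepts array-like input such as nested lists, while the shape attribute exists only on arrays.

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6]]

shape_from_function = np.shape(nested)          # works directly on the list
shape_from_attribute = np.array(nested).shape   # requires converting to an array

print(shape_from_function, shape_from_attribute)
```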

II. np.reshape()

Sometimes we need to change the shape of an array to another compatible shape without changing its data. For that we use np.reshape().

x = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
x.reshape(8)

OUT:
array([1, 2, 3, 4, 5, 6, 7, 8])

np.arange(32).reshape(4, 1, 8)

OUT:
array([[[ 0, 1, 2, 3, 4, 5, 6, 7]],
[[ 8, 9, 10, 11, 12, 13, 14, 15]],
[[16, 17, 18, 19, 20, 21, 22, 23]],
[[24, 25, 26, 27, 28, 29, 30, 31]]])
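
Here "compatible" means that the new shape must hold exactly the same number of elements. The sketch below shows two related behaviors: passing -1 lets NumPy infer one dimension, and an incompatible shape raises a ValueError. The variable names are only illustrative.

```python
import numpy as np

x = np.arange(12)

auto = x.reshape(3, -1)   # -1 asks NumPy to infer the missing dimension

try:
    x.reshape(5, 3)       # 15 slots cannot hold 12 elements
    failed = False
except ValueError:
    failed = True

print(auto.shape, failed)
```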

III. np.ravel(), np.flatten()

Both of them flatten an array to 1D. The flattened array contains the data of the given array.

The key differences are:

  • flatten() returns a copy of the array. If we change the data in the flattened array, the original array will not change.
  • ravel(), however, returns a view of the original array whenever possible. If we change the data in the flattened array, the data of the original array will also change.
  • reshape(-1) also tries to return a view whenever possible.

x = np.array([[[1, 2], [3, 4]],
[[5, 6], [7, 8]],
[[9, 10], [11, 12]]])
x.ravel()

OUT:
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

x = np.array([[[1, 2], [3, 4]],
[[5, 6], [7, 8]],
[[9, 10], [11, 12]]])
x.flatten()

OUT:
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

np.array([[1, 2, 3, 4], [5, 6, 7, 8]]).reshape(-1)

OUT:
array([1, 2, 3, 4, 5, 6, 7, 8])
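
The copy-versus-view difference is easier to see when we modify the results. The following is a minimal sketch with illustrative names. It assumes a freshly created (contiguous) array, which is the case where ravel() can return a view.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

flat_copy = x.flatten()
flat_copy[0] = 99          # only the copy changes
after_flatten = x[0, 0]

flat_view = x.ravel()      # x is contiguous, so this is a view
flat_view[0] = 99          # the original array changes too
after_ravel = x[0, 0]

print(after_flatten, after_ravel)
print(np.shares_memory(x, flat_copy), np.shares_memory(x, flat_view))
```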

IV. np.transpose()

As in linear algebra, np.transpose() returns an array with its axes transposed. Arrays also have a special T attribute that does the same thing.

x = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
print(x.transpose(), x.T, sep="\n\n")

OUT:
[[1 4 7]
[2 5 8]
[3 6 9]]

[[1 4 7]
[2 5 8]
[3 6 9]]
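
For arrays with more than two axes, the default transpose reverses the order of the axes, and np.transpose() also accepts an explicit axis order. A short sketch (the names are made up for the example):

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

reversed_axes = x.T                    # axes reversed: shape (4, 3, 2)
swapped = np.transpose(x, (1, 0, 2))   # only the first two axes swapped: shape (3, 2, 4)

print(reversed_axes.shape, swapped.shape)
```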

V. np.expand_dims()

We can expand the dimensions of an array by inserting a new axis at the specified position.

x = np.array([1,2])
y = np.expand_dims(x, axis=0)

print("x shape: ", x.shape,
"\ny shape: ", y.shape)

OUT:
x shape: (2,)
y shape: (1, 2)

x = np.array([1,2])
y = np.expand_dims(x, axis=1)

print("x shape: ", x.shape,
"\ny shape: ", y.shape)

OUT:
x shape: (2,)
y shape: (2, 1)
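
The same result can be obtained with indexing and np.newaxis, which you will often see in other people's code. This is just an equivalent spelling, sketched with illustrative names:

```python
import numpy as np

x = np.array([1, 2])

row = x[np.newaxis, :]   # same result as np.expand_dims(x, axis=0)
col = x[:, np.newaxis]   # same result as np.expand_dims(x, axis=1)

print(row.shape, col.shape)
```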

VI. np.concatenate(), np.stack()

np.concatenate() concatenates along an existing axis, whereas np.stack() concatenates along a new axis.

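⚠️ Be careful! For np.stack(), all arrays must have the same shape. For np.concatenate(), the arrays must have the same shape except along the axis being joined.

There are more join functions, such as vstack() (vertical), hstack() (horizontal), and dstack() (depth-wise).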

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
c = np.array([[7, 8]])

d = np.concatenate((a, b, c), axis=0)
print(d, "\nShape: ", d.shape)

OUT:
[[1 2]
[3 4]
[5 6]
[7 8]]
Shape: (4, 2)

a = np.array([[1, 2]])
b = np.array([[3, 4]])
c = np.array([[5, 6]])

d = np.stack((a, b, c), axis=0)
print(d, "\nShape: ", d.shape)

OUT:
[[[1 2]]
[[3 4]]
[[5 6]]]
Shape: (3, 1, 2)
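
The vstack(), hstack() and dstack() helpers mentioned above can be compared on two small vectors. The sketch below only illustrates the resulting shapes, with made-up variable names.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

v = np.vstack((a, b))   # stacked as rows: shape (2, 3)
h = np.hstack((a, b))   # joined end to end: shape (6,)
d = np.dstack((a, b))   # paired along a third axis: shape (1, 3, 2)

print(v.shape, h.shape, d.shape)
```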

VII. np.split(ary, indices_or_sections, axis=0)

If we can concatenate the arrays, we can split them too.

There are more split functions, such as vsplit() (vertical), hsplit() (horizontal), and dsplit() (depth-wise).

In the first code block below, the indices_or_sections parameter is specified as a list of indices. The array is split at those positions, which produces the following pieces:

  • ary[:3]
  • ary[3:6]
  • ary[6:9]
  • ary[9:12]
  • ary[12:]

The indices_or_sections parameter can also be specified as an integer, in which case the array is divided into that many equal parts. The second code block is an example of this. If the array cannot be split into equal parts, NumPy raises an error.

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14])

np.split(x, [3, 6, 9, 12])

OUT:
[array([1, 2, 3]),
array([4, 5, 6]),
array([7, 8, 9]),
array([10, 11, 12]),
array([13, 14])]

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15])
np.split(x, 3)

OUT:
[array([1, 2, 3, 4, 5]),
array([ 6, 7, 8, 9, 10]),
array([11, 12, 13, 14, 15])]
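
To illustrate the error case: splitting 14 elements into 3 equal parts is impossible, so np.split() raises a ValueError. If unequal parts are acceptable, np.array_split() can be used instead. The variable names here are only for the example.

```python
import numpy as np

x = np.arange(1, 15)    # 14 elements, cannot be divided equally by 3

try:
    np.split(x, 3)
    error_raised = False
except ValueError:
    error_raised = True

# array_split allows parts of unequal length
parts = np.array_split(x, 3)
part_sizes = [len(p) for p in parts]

print(error_raised, part_sizes)
```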

VIII. np.insert(), np.append()

The np.append() function adds values to the end of an array, while np.insert() inserts values along the specified axis before the given indices.

⚠️ If you specify the axis, the values must have a shape compatible with the array. If the axis is not specified, the array is flattened first and a 1D array is returned.

x = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])

np.append(x, [[10, 11, 12]], axis=0)

OUT:
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])

x = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])

np.insert(x, 0, 9, axis=1)

OUT:
array([[9, 1, 2, 3],
[9, 4, 5, 6],
[9, 7, 8, 9]])

x = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])

np.insert(x, 0, [9, 5, 6], axis=1)

OUT:
array([[9, 1, 2, 3],
[5, 4, 5, 6],
[6, 7, 8, 9]])

x = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])

np.insert(x, [0, 0, 0], [9, 5, 6], axis=1)

OUT:
array([[9, 5, 6, 1, 2, 3],
[9, 5, 6, 4, 5, 6],
[9, 5, 6, 7, 8, 9]])
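
The warning about the missing axis can be sketched as follows. Without an axis, both functions work on the flattened array and return a 1D result, and in every case the original array is left unchanged. The names are made up for the example.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])

appended = np.append(x, [[7, 8, 9]])   # no axis: everything is flattened
inserted = np.insert(x, 1, 0)          # no axis: 0 goes before flat index 1

print(appended)
print(inserted)
print(x.shape)                         # the original array is unchanged
```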

IX. np.delete()

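This function returns a new array with the sub-arrays at the given indices removed along the specified axis. If no axis is given, the indices refer to the flattened array.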

x = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])

np.delete(x, 2, axis=0)

OUT:
array([[1, 2, 3],
[4, 5, 6]])

x = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])

np.delete(x, [1, 2], axis=0)

OUT:
array([[1, 2, 3]])

x = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])

np.delete(x, [0, 5, 7])

OUT:
array([2, 3, 4, 5, 7, 9])
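
The examples above remove rows or flat elements. As a complementary sketch, axis=1 removes a column. Note that np.delete() returns a new array and does not modify the original. The variable name is only illustrative.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

without_middle_column = np.delete(x, 1, axis=1)   # drop column index 1

print(without_middle_column)
print(x.shape)                                    # x itself still has 3 columns
```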

X. np.resize()

np.resize() changes the shape of an array, just like the reshape() function we saw earlier. The main difference is that the new shape does not have to hold the same number of elements: if it is larger than the original, np.resize() fills it with repeated copies of the original array.

x = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])

np.resize(x, (3, 5))

OUT:
array([[1, 2, 3, 4, 5],
[6, 7, 8, 9, 1],
[2, 3, 4, 5, 6]])
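
A small sketch of the contrast with reshape(), using illustrative names: np.resize() also accepts a smaller shape, in which case only the first values are kept, while reshape() raises a ValueError when the element counts do not match.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

smaller = np.resize(x, (2, 2))   # keeps only the first four values

try:
    x.reshape(3, 5)              # 9 elements cannot fill 15 slots
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(smaller)
print(reshape_failed)
```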

XI. np.unique()

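This function finds the unique values in the array and returns them sorted. With the axis argument, it returns the unique sub-arrays (for example, the unique rows) instead.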

x = np.array([[6, 1, 6],
[6, 1, 6],
[7, 4, 1]])
np.unique(x)

OUT:
array([1, 4, 6, 7])

x = np.array([[6, 1, 6],
[6, 1, 6],
[7, 4, 1]])
np.unique(x, axis=0)

OUT:
array([[6, 1, 6],
[7, 4, 1]])
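
np.unique() can also report how often each value occurs through its return_counts argument. A short sketch on the same array (the names values and counts are made up for the example):

```python
import numpy as np

x = np.array([[6, 1, 6],
              [6, 1, 6],
              [7, 4, 1]])

# values are the sorted unique entries, counts are their frequencies
values, counts = np.unique(x, return_counts=True)

print(values)
print(counts)
```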

Conclusion

I have tried to cover NumPy in this blog post with both theory and practical examples. You may also want to read Part 2 of this guide to better understand NumPy.

Thank you for reading!


Melih GULUM

Data Enthusiast | Deep Learning | Machine Learning | CV | NLP