Getting Started with NumPy Arrays
Welcome to the first part of “The Numpy Nerd: A Guide to Mastering Arrays and Awkward Social Interactions” tutorial series! In this part, I’ll introduce you to NumPy and cover the basics of using the library for scientific computing in Python.
NumPy, short for “Numerical Python,” is a popular library for scientific computing in Python. It provides powerful data structures, such as arrays and matrices, and a wide range of operations for manipulating them. NumPy is particularly useful for numerical computations, data analysis, and machine learning applications. Topics covered in this story are the following:
Installing and importing NumPy
What are arrays
Difference between a Python list and a NumPy array
Creating NumPy arrays
Array Dimensions
Attributes of an array
Indexing and Slicing arrays
Adding and removing elements
So let’s begin.
Installing and importing NumPy
To get started with NumPy, you’ll first need to install the library. NumPy can be installed with pip (the Python package manager) or with Anaconda. Open your terminal or command prompt and type the following command:
pip install numpy
If you are using Jupyter Notebook or Google Colab, add a “!” at the beginning:
!pip install numpy
If you are using Anaconda, use the following command:
conda install numpy
Once NumPy is installed, you can import it into your Python script or interactive session using the import statement:
import numpy as np
In the above statement, we’ve imported NumPy and given it the alias “np”. This is a common convention used in the Python community when working with NumPy.
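A quick way to check that the installation and import worked is to print the installed version. This is just a minimal sketch; the version string you see depends on your environment, and the variable name version is only for illustration.

```python
import numpy as np

# Print the installed NumPy version to confirm the import succeeded
version = np.__version__
print(version)
```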
Now that we have NumPy installed and imported, we can start creating and manipulating arrays.
What are arrays?
An array is a collection of elements, all of the same type, arranged in a specific order. Arrays are like a set of boxes that you can put things in. And NumPy arrays are like a set of boxes that you can put things in, but you can also do really cool math stuff with them, like adding or multiplying all the things in the boxes together. It’s like having a math-powered toy chest!
Need a more interesting explanation?
Arrays are like a bunch of friends who you can count on to help you with something. And NumPy arrays are like a bunch of super-smart friends who not only help you, but also do all the math for you, so you can focus on the fun stuff, like building a giant tower of friends (or data)!
Difference between a Python list and a NumPy array
- Data type: In Python, a list can contain elements of different data types, such as integers, strings, and floats. In contrast, a NumPy array can only contain elements of the same data type, which is typically a numerical data type such as int or float.
- Memory usage: NumPy arrays are more memory-efficient than Python lists because they store elements of a single type in one contiguous block of memory, whereas a list stores references to separate Python objects. NumPy’s core routines are implemented in C, so operations on whole arrays run much faster than equivalent Python loops, which makes them ideal for scientific computing and data analysis.
- Mathematical operations: NumPy arrays are designed for numerical operations and come with a built-in library of mathematical functions that can operate on entire arrays at once, which is known as vectorization. In contrast, Python lists require looping over each element to perform mathematical operations, which can be slow for large datasets.
- Size and dimensions: Python lists can be of any size and can contain nested lists, which can have varying sizes. NumPy arrays, on the other hand, are typically used for large datasets with a fixed size and a fixed number of dimensions.
Overall, NumPy arrays are more efficient and easier to work with for numerical operations, while Python lists are more flexible and can contain elements of different data types and sizes.
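The difference in how mathematical operations work can be sketched with a small comparison. The sample values and variable names below are made up for illustration.

```python
import numpy as np

prices = [10.0, 20.0, 30.0]      # plain Python list (hypothetical sample data)
prices_arr = np.array(prices)    # the same values as a NumPy array

# List: an explicit loop (here a list comprehension) visits each element
doubled_list = [p * 2 for p in prices]

# Array: one vectorized expression operates on every element at once
doubled_arr = prices_arr * 2

print(doubled_list)   # [20.0, 40.0, 60.0]
print(doubled_arr)    # [20. 40. 60.]

# Multiplying a list by 2 repeats it instead of doing arithmetic
print(prices * 2)     # [10.0, 20.0, 30.0, 10.0, 20.0, 30.0]
```

Note that the same `* 2` means repetition for a list and element-wise multiplication for an array, which is a common source of confusion when switching between the two.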
Creating NumPy arrays
Welcome to the magical world of NumPy arrays! It’s like playing with a set of Legos, but instead of building a spaceship, you’re constructing mathematical models that would make Einstein proud. So let’s put on our thinking caps and get ready to create arrays that are so powerful, they could solve a Rubik’s Cube in under 30 seconds!
Creating NumPy arrays is easy and can be done using a variety of methods, including manually specifying the values of the array, using NumPy functions to create arrays with specific properties, and reading in data from external sources. Once you have created a NumPy array, you can use NumPy’s extensive suite of array operations to manipulate, analyze, and visualize your data in a variety of ways.
np.array()
In NumPy, arrays are created using the np.array() function. Here's an example:
import numpy as np
# Create a one-dimensional array
>>> arr1d = np.array([1, 2, 3, 4, 5])
>>> print(arr1d)
[1 2 3 4 5]
# Create a two-dimensional array
>>> arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> print(arr2d)
[[1 2 3]
 [4 5 6]
 [7 8 9]]
In the above code, we’ve created a one-dimensional array arr1d containing the numbers 1 to 5, and a two-dimensional array arr2d containing the numbers 1 to 9 arranged in a 3x3 grid. We can access individual elements of an array using indexing and slicing, just like we would with a list in Python.
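Because an array holds a single data type, np.array() converts mixed input to a common type. Here is a small sketch of that behavior; the variable names are only for illustration.

```python
import numpy as np

# Mixing integers and a float: every element is upcast to float64
mixed = np.array([1, 2.5, 3])
print(mixed)         # [1.  2.5 3. ]
print(mixed.dtype)   # float64

# Requesting a type explicitly with the dtype keyword
as_float = np.array([1, 2, 3], dtype=np.float64)
print(as_float)      # [1. 2. 3.]
```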
np.zeros()
Creating arrays with zeros
# Creating a 1D array of zeros
>>> a = np.zeros(3)
>>> print(a)
[0. 0. 0.]
# Creating a 2D array of zeros
>>> b = np.zeros((3, 3))
>>> print(b)
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
np.ones()
Creating arrays with ones
# Creating a 1D array of ones
>>> a = np.ones(2)
>>> print(a)
[1. 1.]
# Creating a 2D array of ones
>>> b = np.ones((2, 4))
>>> print(b)
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]
np.eye()
np.eye(n, m) defines a 2D identity matrix. The elements where i=j (row index and column index are equal) are 1 and the rest are 0, as such:
>>> np.eye(3)
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
>>> np.eye(3, 5)
array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.]])
np.arange()
numpy.arange creates arrays with regularly incrementing values.
>>> np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.arange(2, 10, dtype=float)
array([2., 3., 4., 5., 6., 7., 8., 9.])
>>> np.arange(2, 3, 0.1)
array([2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9])
np.linspace()
numpy.linspace will create arrays with a specified number of elements, spaced equally between the specified beginning and end values.
>>> np.linspace(1., 4., 6)
array([1. , 1.6, 2.2, 2.8, 3.4, 4. ])
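To see how the two functions differ, here is a short side-by-side sketch: with np.arange() you pick the step, with np.linspace() you pick the number of points. The variable names are only for illustration.

```python
import numpy as np

# arange: you choose the step; the stop value (1) is excluded
by_step = np.arange(0, 1, 0.25)
print(by_step)    # [0.   0.25 0.5  0.75]

# linspace: you choose the number of points; the stop value is included by default
by_count = np.linspace(0, 1, 5)
print(by_count)   # [0.   0.25 0.5  0.75 1.  ]
```

When the end point must be part of the result, or the step is a non-integer, np.linspace() is usually the more predictable choice.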
Array Dimensions
Why did the NumPy array feel depressed? Because it had too many dimensions to deal with.
You might occasionally hear an array referred to as an “ndarray,” which is shorthand for “N-dimensional array.” An N-dimensional array is simply an array with any number of dimensions. You might also hear 1-D, or one-dimensional array, 2-D, or two-dimensional array, and so on. In NumPy, arrays can have any number of dimensions, from 0 (a scalar) to N (a multi-dimensional array).
Vectors : One-dimensional arrays
Matrices : Two-dimensional arrays
Tensors : 3-D or higher dimensional arrays
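The names above can be checked with a small sketch that builds one array of each kind and prints its number of dimensions and shape. The variable names are only for illustration.

```python
import numpy as np

scalar = np.array(42)                     # 0-D: a single value
vector = np.array([1, 2, 3])              # 1-D
matrix = np.array([[1, 2], [3, 4]])       # 2-D
tensor = np.array([[[1, 2], [3, 4]],
                   [[5, 6], [7, 8]]])     # 3-D

for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor", tensor)]:
    print(name, arr.ndim, arr.shape)
# scalar 0 ()
# vector 1 (3,)
# matrix 2 (2, 2)
# tensor 3 (2, 2, 2)
```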
Axis
In NumPy, the term “axis” refers to one dimension of an ndarray. The number of axes of an ndarray is given by its “ndim” attribute. For example, a 1D array has one axis, a 2D array has two axes, and so on.
The axes of an ndarray can be visualized as follows:
- For a 1D array, the single axis is typically represented by a horizontal line.
- For a 2D array, the first axis is the vertical axis (rows) and the second axis is the horizontal axis (columns).
- For a 3D array, the first axis is the depth axis, the second axis is the vertical axis (rows), and the third axis is the horizontal axis (columns).
When performing operations on ndarrays, it’s important to understand which axis or axes are being operated on. Many NumPy functions and methods have an “axis” parameter that allows you to specify the axis or axes to operate on.
For example, consider the following 2D array. If we want to calculate the sum of each row, we would set the axis parameter to 1:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
row_sums = arr.sum(axis=1)
print(row_sums) # Output: [ 6 15 24]
If we want to calculate the sum of each column, we would set the axis parameter to 0:
col_sums = arr.sum(axis=0)
print(col_sums) # Output: [12 15 18]
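One way to remember what the axis parameter does: the axis you sum over is the one that disappears from the shape. Here is a small sketch with a 3D array; the name cube is only for illustration.

```python
import numpy as np

# A 3D array of ones with shape (2, 3, 4): depth 2, 3 rows, 4 columns
cube = np.ones((2, 3, 4))

print(cube.sum(axis=0).shape)   # (3, 4)  depth axis collapsed
print(cube.sum(axis=1).shape)   # (2, 4)  row axis collapsed
print(cube.sum(axis=2).shape)   # (2, 3)  column axis collapsed
print(cube.sum())               # 24.0    no axis given: sum of all elements
```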
Attributes of an array
In NumPy, an array has several attributes that provide information about its properties. Here are some of the most commonly used attributes:
- ndarray.ndim will tell you the number of axes, or dimensions, of the array.
arr = np.array([1, 2, 3])
print(arr.ndim) # Output: 1
arr = np.array([[1, 2, 3],[4,5,6]])
print(arr.ndim) # Output: 2
- ndarray.size will tell you the total number of elements of the array. This is the product of the elements of the array’s shape.
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr.size) # Output: 9
- ndarray.shape will display a tuple of integers that indicate the number of elements stored along each dimension of the array. If, for example, you have a 2-D array with 2 rows and 3 columns, the shape of your array is (2, 3).
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr.shape) # Output: (3, 3)
- ndarray.dtype This attribute gives the data type of the elements in the array.
arr = np.array([1, 2, 3])
print(arr.dtype) # Output: int64 (platform-dependent; may be int32 on some systems)
Specifying your data type
While functions such as np.ones() default to floating point (np.float64), you can explicitly specify which data type you want using the “dtype” keyword.
>>> x = np.ones(5)
>>> print(type(x[1]))
<class 'numpy.float64'>
>>> x = np.ones(5, dtype=np.int32)
>>> print(type(x[1]))
<class 'numpy.int32'>
- ndarray.itemsize : This attribute gives the size of each element in the array in bytes.
arr = np.array([1, 2, 3])
print(arr.itemsize) # Output: 8 (bytes per element when the dtype is int64)
- ndarray.data : This attribute gives the buffer containing the actual elements of the array.
arr = np.array([1, 2, 3])
print(arr.data) # Output: <memory at 0x7fcb7380bc70>
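The attributes above can be inspected together on a single array. In this sketch the dtype is set explicitly so the byte counts do not depend on your platform; the name grid is only for illustration, and nbytes is an extra attribute that reports the total bytes used by the elements.

```python
import numpy as np

# dtype is set explicitly so the byte counts do not depend on the platform
grid = np.zeros((2, 3), dtype=np.int32)

print(grid.ndim)      # 2
print(grid.shape)     # (2, 3)
print(grid.size)      # 6
print(grid.dtype)     # int32
print(grid.itemsize)  # 4
print(grid.nbytes)    # 24, which equals size * itemsize
```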
Indexing and Slicing arrays
One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.
Indexing allows you to select individual elements from an array by specifying their position within the array. In NumPy, indexing works similarly to indexing in Python lists. You can use square brackets to access elements of an array, with the index starting from 0.
arr = np.array([1, 2, 3, 4, 5])
# Accessing the first element of the array
print(arr[0]) # Output: 1
# Accessing the third element of the array
print(arr[2]) # Output: 3
Indexing with 2D array
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Accessing the element at row 1, column 2
print(arr[1, 2]) # Output: 6
Indexing with 3D arrays
arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
# Accessing the element at depth 1, row 0, column 1
print(arr[1, 0, 1]) # Output: 6
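A few more indexing patterns build on the examples above: negative indices, selecting a whole row, and assigning to an element. This is a small sketch on the same 3x3 array.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Negative indices count from the end of an axis
print(arr[-1, -1])   # 9

# A single index on a 2D array selects a whole row
print(arr[0])        # [1 2 3]

# Indexing can also be used to overwrite an element in place
arr[1, 2] = 60
print(arr[1])        # [ 4  5 60]
```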
Slicing
Slicing allows you to access a subset of an array by specifying a range of indices. In NumPy, you can use the colon (:) operator to specify a range of indices.
arr = np.array([1, 2, 3, 4, 5])
# Slicing 1D array to get the first three elements
print(arr[:3]) # Output: [1 2 3]
# Slicing the array to get the last two elements
print(arr[-2:]) # Output: [4 5]
# Slicing a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr_2d[:2, 1:]) # Output: [[2 3] [5 6]]
# Slicing a 3D array
arr_3d = np.array([[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]])
print(arr_3d[:, :2, 1:]) # Output: [[[2] [4]] [[8] [10]]]
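data = np.array([1, 2, 3])
print(data[1]) # Output: 2
print(data[0]) # Output: 1
print(data[0:2]) # Output: [1 2]
print(data[1:]) # Output: [2 3]
print(data[-2:]) # Output: [2 3]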
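One thing that surprises people coming from Python lists: a basic slice of a NumPy array is a view that shares memory with the original, not a copy. Here is a minimal sketch of that behavior; the names view and independent are only for illustration.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

# A basic slice shares memory with the original array
view = data[1:4]
view[0] = 99
print(data)          # [ 1 99  3  4  5]

# .copy() gives the slice its own storage
independent = data[1:4].copy()
independent[0] = -1
print(data)          # [ 1 99  3  4  5]  (unchanged by the second assignment)
```

If you plan to modify a slice without touching the source array, calling .copy() first is the safe option.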
Adding and removing elements
In NumPy, you can add and remove elements from arrays in several ways. Here are some of the most common methods:
Adding Elements:
numpy.append(arr, values, axis=None)
- adds one or more elements to the end of the array. The arr parameter specifies the array to which the elements are added, values is the array or scalar to be appended, and axis specifies the axis along which the values should be appended. If axis is not specified, the array is flattened before the append operation.
numpy.insert(arr, obj, values, axis=None)
- inserts one or more elements into the array at a specified position. The arr parameter specifies the array into which the elements are inserted, obj is the index or indices at which the values should be inserted, values is the array or scalar to be inserted, and axis specifies the axis along which the values should be inserted. If axis is not specified, the array is flattened before the insert operation.
import numpy as np
a = np.array([1, 2, 3])
b = np.append(a, [4, 5, 6])
# Output: array([1, 2, 3, 4, 5, 6])
c = np.insert(a, 1, [4, 5, 6])
# Output: array([1, 4, 5, 6, 2, 3])
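The axis parameter matters most with 2D arrays. The sketch below shows appending a row, inserting a column, and the flattening that happens when axis is left out. It also shows that both functions return a new array and leave the original untouched. The variable names are only for illustration.

```python
import numpy as np

grid = np.array([[1, 2], [3, 4]])

# axis=0 appends a new row; the values need the same number of dimensions as grid
with_row = np.append(grid, [[5, 6]], axis=0)
print(with_row)
# [[1 2]
#  [3 4]
#  [5 6]]

# axis=1 inserts a new column of 9s before column index 1
with_col = np.insert(grid, 1, 9, axis=1)
print(with_col)
# [[1 9 2]
#  [3 9 4]]

# Without axis, the result is flattened to 1D
flat = np.append(grid, [5, 6])
print(flat)          # [1 2 3 4 5 6]

# The original array is unchanged: both functions return new arrays
print(grid.shape)    # (2, 2)
```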
Removing Elements:
numpy.delete(arr, obj, axis=None)
- removes one or more elements from the array. The arr parameter specifies the array from which the elements are removed, obj is the index or indices of the values to be removed, and axis specifies the axis along which the values should be removed. If axis is not specified, the array is flattened before the delete operation.
a = np.array([1, 2, 3, 4, 5])
b = np.delete(a, [2, 3])
# Output: array([1, 2, 5])
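With a 2D array, the axis parameter decides whether np.delete() removes rows or columns. Here is a short sketch; the variable names are only for illustration.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# axis=0 removes the row at index 1
no_middle_row = np.delete(grid, 1, axis=0)
print(no_middle_row)
# [[1 2 3]
#  [7 8 9]]

# axis=1 removes the column at index 0
no_first_col = np.delete(grid, 0, axis=1)
print(no_first_col)
# [[2 3]
#  [5 6]
#  [8 9]]

# Without axis, indices refer to the flattened array
flat_removed = np.delete(grid, [0, 8])
print(flat_removed)  # [2 3 4 5 6 7 8]
```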
In the next part, [Maximizing the Power of NumPy: Advanced Array Operations for Data Science], we’ll dive deeper into NumPy’s shape manipulation, iterating through an array, stacking, splitting, changing dimensions, and sorting. Stay tuned!
Social Interaction Tip: If you’re struggling to start a conversation with someone, try asking them if they’re a Python or a Java person. Then you can talk about programming languages and maybe even collaborate on a project.
These notes were prepared from the official NumPy website.