Photo by Justin Lim on Unsplash

Maximizing the Power of NumPy: Advanced Array Operations for Data Science

Nishitha Kalathil


Do you ever feel like you’re stuck in a data science rut? Like you’ve been using the same old array operations for ages and need to spice things up? Well, fear not, dear data scientist, for NumPy is here to save the day! With its advanced array functions and mystical powers, NumPy can take your data to places you never thought possible. So grab a cup of coffee (or potion), put on your thinking cap (or wizard hat), and get ready to dive into the wild world of NumPy!

Welcome back to “The Numpy Nerd: A Guide to Mastering Arrays and Awkward Social Interactions” tutorial series! If you’re anything like me, you’ve spent countless hours obsessing over the perfect array operations in NumPy, while your friends and family slowly back away from you at social gatherings. But fear not, my fellow Numpy Nerd! In this second part of our tutorial series, we’re going to delve even deeper into the wonderful world of NumPy and explore some truly advanced array operations for data science.

In this part, we will focus on:

Shape Manipulation

Iterating through an array

Stacking, Splitting

Changing Dimensions

Sorting

Shape Manipulation

Are you tired of your arrays having the same old boring shape? Do you want to give them a new and exciting form? Well, fear not, my fellow data nerds, because NumPy has got you covered! Here, we will explore the wonderful world of shape manipulation in NumPy. From stretching and squeezing to flattening and reshaping, we will learn how to bend and twist our arrays to our will. So grab your favorite shaping tool and let's get started!

ndarray.reshape(shape) / numpy.reshape(a, shape)

This gives a new shape to an array without changing its data. The new shape must have the same number of elements as the original array; if it is not compatible, a ValueError is raised. The result is a view of the original data when possible, and a copy otherwise.

import numpy as np

a = np.arange(6)     # [0 1 2 3 4 5]
b = a.reshape(2, 3)  # [[0 1 2]
                     #  [3 4 5]]

In the above example, we created a 1D array a and then reshaped it into a 2D array b with 2 rows and 3 columns using the reshape method.
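Two details of reshape are worth seeing in code: one dimension can be given as -1 so that NumPy infers it from the element count, and an incompatible shape raises a ValueError. This is a small sketch; the array and its size are chosen only for illustration.

```python
import numpy as np

a = np.arange(12)

# -1 asks NumPy to infer that dimension: 12 elements / 3 rows = 4 columns
b = a.reshape(3, -1)
print(b.shape)  # (3, 4)

# Here reshape returns a view, so writing through b also changes a
b[0, 0] = 100
print(a[0])  # 100

# 12 elements cannot fill a 5 x 3 array
try:
    a.reshape(5, 3)
except ValueError as err:
    print("ValueError:", err)
```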

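ndarray.resize(new_shape)

The resize method changes the shape of an ndarray in place. If the new shape has more elements than the original array, the new elements are filled with zeros. If the new shape has fewer elements than the original array, the extra elements are discarded. Because the data may be reallocated, NumPy raises a ValueError if another array or name still references it, which happens often in interactive sessions such as IPython or Jupyter.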

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])
arr.resize((2, 3))
print(arr)
# output
[[1 2 3]
 [4 5 6]]

arr = np.array([1, 2, 3, 4, 5, 6])
arr.resize((3, 4))
print(arr)
# output
[[1 2 3 4]
 [5 6 0 0]
 [0 0 0 0]]
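Don't confuse the method with the function numpy.resize. The function returns a new array and leaves its input alone, and when the new shape is larger it fills the extra slots by repeating the data instead of padding with zeros. A short sketch of the function form:

```python
import numpy as np

arr = np.array([1, 2, 3])

# np.resize returns a new array; arr itself is left unchanged
repeated = np.resize(arr, (2, 4))
print(repeated)
# [[1 2 3 1]
#  [2 3 1 2]]
print(arr)  # [1 2 3]

# Shrinking keeps the first elements in row-major order
print(np.resize(arr, 2))  # [1 2]
```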

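ndarray.transpose(*axes):

This method returns a view of the array with its axes transposed. Called with no arguments, it reverses the order of the axes. You can also pass the desired axis order, either as a tuple or as separate integers, as the last two calls below show.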

>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
       [3, 4]])
>>> a.transpose()
array([[1, 3],
       [2, 4]])
>>> a.transpose((1, 0))
array([[1, 3],
       [2, 4]])
>>> a.transpose(1, 0)
array([[1, 3],
       [2, 4]])
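With a 2D array every way of calling transpose gives the same result. The axes argument only matters once there are three or more dimensions. The sketch below uses a 3D array built only for illustration. It also confirms that the result is a view.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# With no arguments the axis order is reversed: (2, 3, 4) -> (4, 3, 2)
print(a.transpose().shape)  # (4, 3, 2)

# An explicit permutation: new axis 0 is old axis 1, new axis 1 is old axis 0
t = a.transpose(1, 0, 2)
print(t.shape)  # (3, 2, 4)

# transpose returns a view, so writes show up in the original array
t[0, 0, 0] = -1
print(a[0, 0, 0])  # -1
```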

ndarray.T:

View of the transposed array; the same as self.transpose().

>>> a = np.array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
       [3, 4]])
>>> a.T
array([[1, 3],
       [2, 4]])

ndarray.flatten():

This method returns a 1D copy of the input array. Because it is a copy, changes made to the returned array do not affect the original array.

>>> a = np.array([[1, 2], [3, 4]])
>>> a.flatten()
array([1, 2, 3, 4])

ndarray.ravel() / numpy.ravel():

This also returns a 1D array containing the elements of the input array. Unlike flatten(), ravel() returns a view of the input array whenever possible, so changes made to the returned array affect the original. When a view is not possible (for example, for some non-contiguous arrays), it returns a copy instead.

>>> x = np.array([[1, 2, 3], [4, 5, 6]])
>>> np.ravel(x)
array([1, 2, 3, 4, 5, 6])
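You can check the copy-versus-view difference directly with np.shares_memory. The sketch below also shows that ravel() falls back to a copy for a transposed array, which is not laid out in row-major order:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

flat = x.flatten()  # always a copy
rav = x.ravel()     # a view here, because x is C-contiguous

print(np.shares_memory(x, flat))  # False
print(np.shares_memory(x, rav))   # True

rav[0] = 99
print(x[0, 0])  # 99: the change went through to x

# x.T is not C-contiguous, so ravel() has to make a copy
print(np.shares_memory(x, x.T.ravel()))  # False
```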

Iterating through an array

Are you tired of staring at arrays all day long, wondering what secrets they hold? Do you dream of a world where you can effortlessly loop through an array and extract its precious data? Well, my friend, it’s time to wake up and smell the numpy! Here, we’ll show you how to iterate through an array like a boss, so you can unlock the full potential of your data science skills. Get ready to dive deep into the world of numpy and unleash the power of iteration!

Image from ActiveColab.com

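Iterating through an array gives you access to each element in turn. In NumPy you can use an ordinary for loop. On a multi-dimensional array, though, the loop runs over the first axis, so each step yields a sub-array (a row of a 2D array) rather than a single number.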

import numpy as np

arr = np.array([1, 2, 3, 4, 5])
# Iterating through the array to print each element
for element in arr:
    print(element)

# Output:
# 1
# 2
# 3
# 4
# 5

# Iterating through a 2D array
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Visit every element by row index i and column index j
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        print(arr[i, j])

# Iterating through a 3D array
arr = np.array([[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]])
# One loop per dimension
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        for k in range(arr.shape[2]):
            print(arr[i, j, k])
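Here is a small sketch of what a plain loop over a 2D array actually produces. It is paired with a vectorized row sum, because in practice a NumPy method is usually preferable to a Python loop when you only need a computed result:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Looping directly over a 2D array yields its rows (1D arrays)
rows = [row for row in arr]
print(rows[0])  # [1 2 3]

# Nesting two loops reaches the scalars, in row-major order
values = [int(v) for row in arr for v in row]
print(values)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# For arithmetic, a vectorized call avoids the Python loop entirely
print(arr.sum(axis=1))  # [ 6 15 24]
```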

numpy.ndenumerate

Returns an iterator that yields pairs of array coordinates (index tuples) and values.

>>> a = np.array([[1, 2], [3, 4]])
>>> for index, x in np.ndenumerate(a):
...     print(index, x)
(0, 0) 1
(0, 1) 2
(1, 0) 3
(1, 1) 4
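A typical use of ndenumerate is collecting the coordinates of elements that meet a condition. The sketch below uses a made-up array and threshold. It compares the loop with np.argwhere, which returns the same positions without a Python loop.

```python
import numpy as np

a = np.array([[5, 12], [7, 20]])

# Collect the coordinates of every element above a threshold
hits = [index for index, value in np.ndenumerate(a) if value > 10]
print(hits)  # [(0, 1), (1, 1)]

# np.argwhere finds the same positions in one vectorized call
print(np.argwhere(a > 10))
# [[0 1]
#  [1 1]]
```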

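numpy.ndindex(*shape)

An N-dimensional index iterator: given the shape of an array, it yields every index tuple of that shape in row-major (C) order.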

>>> for index in np.ndindex(3, 2, 1):
...     print(index)
(0, 0, 0)
(0, 1, 0)
(1, 0, 0)
(1, 1, 0)
(2, 0, 0)
(2, 1, 0)

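numpy.flatiter

A flat iterator over an array, obtained through the array's .flat attribute; it visits the elements in row-major order. A flatiter cannot be constructed directly from Python code by calling the flatiter constructor.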

>>> x = np.arange(6).reshape(2, 3)
>>> fl = x.flat
>>> type(fl)
<class 'numpy.flatiter'>
>>> for item in fl:
...     print(item)
...
0
1
2
3
4
5
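x.flat is more than a loop helper. It can also be indexed and assigned to as if the array were 1D, and assignments write straight into the original array. A short sketch:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

# x.flat can be indexed as if x were 1D (row-major order)
print(x.flat[4])  # 4, the same element as x[1, 1]

# Assigning through x.flat writes into x itself
x.flat[[0, 5]] = 100
print(x)
# [[100   1   2]
#  [  3   4 100]]
```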

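numpy.iterable

Checks whether or not an object can be iterated over. For 0-dimensional arrays it is more reliable than isinstance(obj, Iterable): the ndarray class defines __iter__, so the abstract-base-class check returns True, yet iterating over a 0-d array raises a TypeError.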

>>> from collections.abc import Iterable
>>> a = np.array(1.0)  # 0-dimensional numpy array
>>> isinstance(a, Iterable)
True
>>> np.iterable(a)
False
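To see why np.iterable answers False for a 0-d array, try iterating over one. The sketch below catches the resulting TypeError and then checks a 1D array and a plain Python integer for comparison:

```python
import numpy as np

a = np.array(1.0)  # 0-dimensional array

# Iterating over a 0-d array fails even though ndarray defines __iter__
try:
    iter(a)
    raised = False
except TypeError:
    raised = True
print(raised)  # True

print(np.iterable(np.array([1.0])))  # True
print(np.iterable(5))                # False
```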

Stacking arrays

Are you tired of juggling multiple arrays like a circus performer? Do you wish you could magically combine them into one cohesive unit? Fear not, dear data scientist! The power of stacking arrays in NumPy is here to save the day. No more juggling, no more dropping the ball. Let’s dive into the magical world of stacking arrays and unlock the full potential of NumPy!

Stacking arrays is a common operation in NumPy that allows you to combine two or more arrays into a single array.

The ability to stack arrays in NumPy is a powerful tool for manipulating and combining data in scientific computing and data analysis.

With hstack(), vstack(), and dstack(), you can combine your arrays in any way your heart desires.

The hstack() function is used to horizontally stack arrays. It takes a sequence of arrays as input and returns a single array in which the inputs are stacked column-wise (along the second axis; 1D arrays are simply joined end to end). For example, given two arrays a and b, the expression np.hstack((a, b)) returns a single array in which the columns of b follow the columns of a.

import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

c = np.hstack((a, b))

print(c)

#Output
[[1 2 5 6]
 [3 4 7 8]]

The vstack() function is used to vertically stack arrays. It takes a sequence of arrays as input and returns a single array where the input arrays are stacked vertically (i.e., along the rows). For example, given two arrays a and b, the expression np.vstack((a, b)) would return a single array where the rows of a and b are stacked vertically.

c = np.vstack((a, b))
print(c)

#output
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
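The examples above use 2D inputs. With 1D arrays the two functions behave a little differently, as this short sketch with illustrative arrays shows. hstack joins the arrays end to end, while vstack first treats each one as a single row.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# For 1D inputs hstack simply joins them end to end
print(np.hstack((x, y)))  # [1 2 3 4 5 6]

# vstack treats each 1D array as a row of shape (1, 3)
print(np.vstack((x, y)))
# [[1 2 3]
#  [4 5 6]]
```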

The dstack() function is used to stack arrays depth-wise, along the third axis. 2D inputs of shape (M, N) are treated as (M, N, 1) and 1D inputs of shape (N,) as (1, N, 1), so for these inputs a new third dimension is created. For example, given two 2D arrays a and b, the expression np.dstack((a, b)) returns a single 3D array in which each position (i, j) holds the pair [a[i, j], b[i, j]].

c = np.dstack((a, b))
print(c)

#output
[[[1 5]
  [2 6]]

 [[3 7]
  [4 8]]]

In addition to these functions, NumPy also provides other functions for stacking arrays, such as stack(), column_stack(), and concatenate(). These functions offer more advanced stacking capabilities and can be useful in more complex scenarios.

The stack() function is used to stack arrays along a new axis. Here's an example:

import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

c = np.stack((a, b), axis=0)

print(c)

#output
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
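The axis argument of stack() decides where the new axis goes. With two 2x2 inputs every choice gives the same shape, so the sketch below uses three arrays of shape (2, 4), built only for illustration, to make the difference visible. It also confirms that for 2D inputs, stacking on the last axis matches dstack().

```python
import numpy as np

a = np.arange(8).reshape(2, 4)
arrays = (a, a + 10, a + 20)

# The stacked dimension (length 3) appears at position `axis`
shapes = {axis: np.stack(arrays, axis=axis).shape for axis in (0, 1, 2)}
print(shapes)  # {0: (3, 2, 4), 1: (2, 3, 4), 2: (2, 4, 3)}

# For 2D inputs, stacking on axis=2 gives the same result as dstack
print(np.array_equal(np.stack(arrays, axis=2), np.dstack(arrays)))  # True
```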

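The column_stack() function takes 1D arrays and stacks them as the columns of a 2D array (2D inputs are stacked as they are, just like with hstack()). Here's an example:

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

c = np.column_stack((x, y))
print(c)

# output
[[1 4]
 [2 5]
 [3 6]]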

The concatenate() function is used to concatenate arrays along a specified axis. Here's an example:

c = np.concatenate((a, b), axis=1)
print(c)

#output
[[1 2 5 6]
 [3 4 7 8]]
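concatenate() joins arrays along an axis they already have, unlike stack(), which always adds a new one. Here is a sketch of two other useful settings: axis=0, which matches vstack() for 2D inputs, and axis=None, which flattens the inputs before joining them.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# axis=0 joins along rows, the same result as vstack for 2D inputs
print(np.array_equal(np.concatenate((a, b), axis=0), np.vstack((a, b))))  # True

# axis=None flattens the inputs before joining them
print(np.concatenate((a, b), axis=None))  # [1 2 3 4 5 6 7 8]

# Unlike stack, concatenate never adds a new axis
print(np.concatenate((a, b)).ndim)  # 2
```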

Splitting arrays

Have you ever felt like your array is just too big to handle? Don’t worry, we’ve all been there. That’s where NumPy’s array splitting magic comes in. With just a few lines of code, you can chop that mammoth-sized array into smaller, more manageable pieces. And who doesn’t love a good chop, am I right?

In NumPy, there are several ways to split a large array into smaller ones. This can be useful for tasks like data preprocessing or model training, where you may want to work with smaller subsets of a larger dataset.

Here are some of the functions and methods available in NumPy for splitting arrays:

numpy.split():

This function takes an array and splits it into multiple sub-arrays along a specified axis. The syntax is as follows: numpy.split(array, indices_or_sections, axis=0). The indices_or_sections argument can be either an integer N, which divides the array into N equal sub-arrays (a ValueError is raised if that is not possible), or a sorted list of indices marking where to cut the array. The axis argument specifies which axis to split the array along.

arr = np.array([1, 2, 3, 4, 5, 6])
subarrays = np.split(arr, 3)
print(subarrays)

#output
[array([1, 2]), array([3, 4]), array([5, 6])]
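Here is a sketch of the other form of indices_or_sections: a list of cut points. It also shows the error you get when an integer does not divide the length evenly.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# A list of indices marks cut points: arr[:2], arr[2:5], arr[5:]
parts = np.split(arr, [2, 5])
print(parts)  # [array([1, 2]), array([3, 4, 5]), array([6])]

# An integer that does not divide the length evenly is an error
try:
    np.split(arr, 4)
except ValueError as err:
    print("ValueError:", err)
```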

numpy.array_split():

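This function is similar to numpy.split(), but it does not raise an error when the array cannot be divided equally. For an array of length l split into n sections, it returns l % n sub-arrays of size l//n + 1 and the rest of size l//n. The syntax is the same as numpy.split().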

arr = np.array([1, 2, 3, 4, 5, 6])
subarrays = np.array_split(arr, 4)
print(subarrays)

# output
[array([1, 2]), array([3, 4]), array([5]), array([6])]

numpy.hsplit():

This function splits an array horizontally (column-wise). For arrays with two or more dimensions it is equivalent to numpy.split() with axis=1. The syntax is as follows: numpy.hsplit(array, indices_or_sections).

arr = np.array([[1, 2, 3], [4, 5, 6]])
subarrays = np.hsplit(arr, 3)
print(subarrays)

#output
[array([[1],
       [4]]), array([[2],
       [5]]), array([[3],
       [6]])]

numpy.vsplit():

This function splits an array vertically (row-wise) and is equivalent to numpy.split() with axis=0. The syntax is as follows: numpy.vsplit(array, indices_or_sections).

arr = np.array([[1, 2, 3], [4, 5, 6]])
subarrays = np.vsplit(arr, 2)
print(subarrays)

#Output
[array([[1, 2, 3]]), array([[4, 5, 6]])]
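This sketch checks that hsplit() and vsplit() really are split() with a fixed axis. It also shows that they accept a list of cut indices, just as split() does.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# hsplit is split along axis=1 (columns); vsplit is split along axis=0 (rows)
h = np.hsplit(arr, 3)
v = np.vsplit(arr, 2)
print(all(np.array_equal(p, q) for p, q in zip(h, np.split(arr, 3, axis=1))))  # True
print(all(np.array_equal(p, q) for p, q in zip(v, np.split(arr, 2, axis=0))))  # True

# Index lists work here too: columns [:1] and [1:]
left, right = np.hsplit(arr, [1])
print(right)
# [[2 3]
#  [5 6]]
```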

Changing Dimensions

With NumPy, you can manipulate arrays with the dexterity of a master juggler, seamlessly shuffling dimensions and splitting arrays with the precision of a Rubik’s cube champion. Let’s dive deeper into the art of changing dimensions in NumPy and unleash the full potential of your data science skills.

np.newaxis:

np.newaxis is not a function but an alias for None that can be used inside an index. Placing it in a slice inserts a new axis of length 1 at that position, which is often used to turn a 1D array into a 2D row or column vector.

import numpy as np

arr = np.array([1, 2, 3, 4, 5])
print(arr.shape) # (5,)

arr_new = arr[:, np.newaxis]
print(arr_new.shape) # (5, 1)

In the above example, we add a new axis to a 1D array arr at position 1 by using np.newaxis. This converts the 1D array into a 2D array.
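The position of np.newaxis inside the index decides where the new axis appears. This sketch shows both placements on the same 1D array and confirms that np.newaxis is simply None:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(np.newaxis is None)  # True: np.newaxis is an alias for None

# The position of np.newaxis in the index decides where the new axis goes
print(arr[np.newaxis, :].shape)  # (1, 5), a row vector
print(arr[:, np.newaxis].shape)  # (5, 1), a column vector
print(arr[:, None].shape)        # (5, 1), the same thing spelled with None
```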

np.expand_dims:

np.expand_dims is the function form of the same idea: instead of writing the new axis into an index, you pass its position as an argument. It takes two arguments: the array to modify, and the position where the new axis should be inserted.

import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

arr_new = np.expand_dims(arr, axis=1)
print(arr_new.shape) # (2, 1, 3)

In the above example, we add a new axis at position 1 by using np.expand_dims. This converts the 2D array into a 3D array.
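expand_dims also accepts negative axes, and since NumPy 1.18 it accepts a tuple of axes. The reverse operation is np.squeeze, which removes length-1 axes (the "squeezing" mentioned at the start). A short sketch:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# Negative axes count from the end: the new axis goes last here
print(np.expand_dims(arr, axis=-1).shape)  # (2, 3, 1)

# A tuple inserts several axes at once (NumPy 1.18 and later)
expanded = np.expand_dims(arr, axis=(0, 2))
print(expanded.shape)  # (1, 2, 1, 3)

# np.squeeze removes the length-1 axes, undoing the expansion
print(np.squeeze(expanded).shape)  # (2, 3)
```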

Both np.newaxis and np.expand_dims provide a convenient way to reshape arrays by adding new dimensions, which is useful in a variety of applications, including deep learning and computer vision.

Sorting arrays

Ah, the beauty of order! As nerdy data scientists, we take great pleasure in sorting and organizing our arrays. Luckily, NumPy provides an array of functions (pun intended) to help us do just that. From full sorts to indirect and partial sorts, we'll explore the various sorting tools and their applications in data science. Let's dive in and sort things out!

np.sort(): This function returns a sorted copy of the input array. It does not modify the original array.

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3]) 
sorted_arr = np.sort(arr)
print(sorted_arr)

#output
[1 1 2 3 3 4 5 5 6 9]
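np.sort() has a few options worth a quick sketch. axis picks the direction on 2D arrays, and kind picks the algorithm, for example 'quicksort' (the default), 'mergesort' or 'stable'. There is no descending flag, so reversing the result is a common idiom. The matrix below is made up for illustration.

```python
import numpy as np

m = np.array([[3, 7, 2],
              [9, 1, 8]])

print(np.sort(m))             # default axis=-1 sorts each row
# [[2 3 7]
#  [1 8 9]]
print(np.sort(m, axis=0))     # sorts each column
# [[3 1 2]
#  [9 7 8]]
print(np.sort(m, axis=None))  # flattens, then sorts
# [1 2 3 7 8 9]

# kind='stable' keeps equal elements in input order; [::-1] gives descending
print(np.sort(m, axis=None, kind='stable')[::-1])  # [9 8 7 3 2 1]
```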

ndarray.sort(): This method sorts the array in place, modifying the original array.

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3]) 
arr.sort()
print(arr)

#output
[1 1 2 3 3 4 5 5 6 9]

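np.argsort(): This function returns the indices that would sort an array. The default algorithm is not stable, so the relative order of equal values (such as the two 1s here) is not guaranteed; pass kind='stable' if it matters.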

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3]) 
sorted_indices = np.argsort(arr)
print(sorted_indices)

#output
[1 3 6 0 9 2 4 8 7 5]
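The main use of argsort() is reordering one array by the values of another while keeping pairs aligned. This sketch uses the same names and ages that appear in the lexsort example below. It sorts the people by age, first ascending and then descending.

```python
import numpy as np

names = np.array(['Alice', 'Bob', 'Charlie', 'David', 'Emily'])
ages = np.array([25, 19, 37, 31, 24])

# argsort gives the order of ages; applying it to names keeps pairs aligned
order = np.argsort(ages)
print(order)         # [1 4 0 3 2]
print(names[order])  # ['Bob' 'Emily' 'Alice' 'David' 'Charlie']

# Reversing the index array gives descending order
print(names[order[::-1]])  # ['Charlie' 'David' 'Alice' 'Emily' 'Bob']
```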

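np.lexsort(): This function performs an indirect stable sort using a sequence of keys. The last key in the sequence is the primary sort key, the second-to-last key breaks ties in it, and so on. In the example below the primary key is names, which is already in alphabetical order, so the result is simply [0 1 2 3 4].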

names = np.array(['Alice', 'Bob', 'Charlie', 'David', 'Emily']) 
ages = np.array([25, 19, 37, 31, 24])
sorted_indices = np.lexsort((ages, names))
print(sorted_indices)

#output
[0 1 2 3 4]
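lexsort becomes useful once the primary key contains ties. In this sketch the names and ages are made up so that two pairs of people share an age. ages is the primary key because it is passed last, and names decides the order within each age group.

```python
import numpy as np

names = np.array(['Eve', 'Bob', 'Alice', 'Dan', 'Carl'])
ages = np.array([25, 19, 25, 31, 19])

# The last key (ages) is the primary key; names only break ties
order = np.lexsort((names, ages))
print(order)         # [1 4 2 0 3]
print(names[order])  # ['Bob' 'Carl' 'Alice' 'Eve' 'Dan']
```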

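np.partition(): This function returns a partitioned copy of the input array. The element at index k ends up where it would be in a sorted array, every element before it is no larger, and every element after it is no smaller. The order within the two partitions is undefined, so apart from the value at index 5 your output may differ from the one shown.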

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3]) 
partitioned_arr = np.partition(arr, 5)
print(partitioned_arr)

#output
[1 3 2 1 3 4 5 5 6 9]

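np.argpartition(): This function returns the indices that would partition an array in the same way, so arr[indices] is a partitioned array. As with np.partition(), the exact order within each partition may differ between NumPy versions.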

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3]) 
partitioned_indices = np.argpartition(arr, 5)
print(partitioned_indices)

#output
[1 9 6 3 0 2 8 4 7 5]
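A common use of argpartition() is finding the k largest values without fully sorting the array. This sketch assumes k = 3. Because the last k slots are not ordered among themselves, only those k indices are sorted afterwards.

```python
import numpy as np

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
k = 3

# Put the indices of the k largest values in the last k slots
top_idx = np.argpartition(arr, -k)[-k:]

# Those k slots are unordered, so sort just them (largest first)
top_idx = top_idx[np.argsort(arr[top_idx])[::-1]]
print(arr[top_idx])  # [9 6 5]
```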

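ndarray.argsort(): This is the method form of np.argsort(). It returns the indices that would sort the array and does not modify the array itself; indexing the array with those indices gives a sorted copy.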

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3]) 
indices = arr.argsort()
sorted_arr = arr[indices]
print(sorted_arr) # [1 1 2 3 3 4 5 5 6 9]

#output
[1 1 2 3 3 4 5 5 6 9]

ndarray.searchsorted(): This method performs a binary search on the array, which must already be sorted in ascending order, to find the indices where the specified values should be inserted to maintain order.

arr = np.array([1, 3, 4, 4, 6]) 
indices = arr.searchsorted([0, 2, 5])
print(indices)

#output
[0 1 4]
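When the value being searched for already appears in the array, the side parameter decides whether the returned index falls before or after the run of equal values. This sketch uses the array from the example above and then inserts a new value at the position searchsorted() returns:

```python
import numpy as np

arr = np.array([1, 3, 4, 4, 6])

# With duplicates, side decides whether to land before or after the equal run
print(arr.searchsorted(4))                # 2 (default side='left')
print(arr.searchsorted(4, side='right'))  # 4

# Inserting at the returned position keeps the array sorted
pos = arr.searchsorted(5)
print(np.insert(arr, pos, 5))  # [1 3 4 4 5 6]
```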

Overall, NumPy provides a rich set of functions for sorting arrays that can handle a variety of use cases. By understanding these functions and how to use them, data scientists can efficiently manipulate and sort arrays in their scientific computing work.

In the next section, we will discuss basic array operations, broadcasting, arithmetic operations and so on. Stay tuned!!!!

These notes are based on the official NumPy documentation.
