Mastering NumPy: Part 5 (Advanced Features)

9 min read · May 24, 2023

This is the 6th blog post in our comprehensive series on NumPy!

Note: Don’t forget to bookmark the full 50 examples series: Mastering NumPy: A 5 Parts Series to Master Numpy

Introduction

NumPy is not just limited to basic array creation and manipulation. It offers a wide range of advanced features and functionalities that cater to the diverse needs of data scientists and researchers. In this blog post, we will explore ten examples of advanced features in NumPy, including masked arrays, sorting, interpolation, and more, showcasing the versatility and sophistication of the library.

Examples:

1. Creating a masked array to handle missing or invalid values

You can use the numpy.ma.masked_array() function in NumPy to create a masked array that can handle missing or invalid values. A masked array is an array that contains a mask indicating which values are valid and which values are invalid or missing. The mask is a Boolean array where True values indicate that the corresponding element is invalid or missing.

Here’s an example of creating a masked array:

import numpy as np

# Create an array with missing values
arr = np.array([1, 2, -999, 4, 5])

# Create a mask for invalid values
mask = arr == -999

# Create a masked array
masked_arr = np.ma.masked_array(arr, mask=mask)
print("Masked array:", masked_arr)
# compressed() returns only the unmasked values; masked_arr.data would still contain -999
print("Valid values:", masked_arr.compressed())

# Masked array: [1 2 -- 4 5]
# Valid values: [1 2 4 5]

In the code above, we create a mask using arr == -999, where True indicates that the corresponding element is invalid or missing. We then create a masked array using np.ma.masked_array(arr, mask=mask), passing the original array arr and the mask.

You can use the masked array to perform various operations, such as computations, aggregations, or calculations, while taking into account the missing or invalid values. The mask ensures that the operations are applied only to the valid elements of the array.
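To make this concrete, here is a small sketch comparing a plain mean with a masked one. It also uses a few other helpers from the same numpy.ma module that the example above does not: np.ma.masked_equal() to build the mask in one step, np.ma.masked_invalid() to mask NaN and infinite values, and filled() to substitute a placeholder for masked entries. The readings array is made-up data for illustration.

```python
import numpy as np

arr = np.array([1, 2, -999, 4, 5])

# Shortcut: mask every element equal to the sentinel value -999
masked_arr = np.ma.masked_equal(arr, -999)

print(arr.mean())         # -197.4 (the sentinel distorts the plain mean)
print(masked_arr.mean())  # 3.0, i.e. (1 + 2 + 4 + 5) / 4
print(masked_arr.sum())   # 12

# NaN/inf-based missing data can be masked with masked_invalid()
readings = np.array([0.5, np.nan, 1.5, np.inf])
masked_readings = np.ma.masked_invalid(readings)
print(masked_readings.mean())  # 1.0, i.e. (0.5 + 1.5) / 2

# filled() returns a regular array with masked entries replaced
print(masked_arr.filled(0))  # [1 2 0 4 5]
```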

2. Sorting an array

To sort an array, pass it to numpy.sort(). This function returns a sorted copy of the input array.

Here’s an example of how to use numpy.sort():

import numpy as np

# Create an array
arr = np.array([5, 3, 1, 4, 2])

# Sort the array
sorted_arr = np.sort(arr)

# Print the sorted array
print(sorted_arr)

# [1 2 3 4 5]

Note that numpy.sort() returns a sorted copy of the array, leaving the original array unchanged. If you want to sort the array in place (i.e., modify the original array), you can use the sort() method of the array itself:

import numpy as np

# Create an array
arr = np.array([5, 3, 1, 4, 2])

# Sort the array in place
arr.sort()

# Print the sorted array
print(arr)

# [1 2 3 4 5]

In this case, we use the sort() method of the array arr directly, which sorts the array in place without creating a new copy.
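Two related needs come up often when sorting: descending order, and knowing where each element came from. The sketch below shows one common approach (reversing an ascending sort with [::-1]), numpy.argsort() for the sorting permutation, and the axis parameter for 2D arrays. The grid array is illustrative data.

```python
import numpy as np

arr = np.array([5, 3, 1, 4, 2])

# Descending order: sort ascending, then reverse the result with a slice
desc = np.sort(arr)[::-1]
print(desc)  # [5 4 3 2 1]

# argsort() returns the indices that would sort the array
order = np.argsort(arr)
print(order)       # [2 4 1 3 0]
print(arr[order])  # [1 2 3 4 5]

# For 2D arrays, axis selects the sort direction (the default is the last axis)
grid = np.array([[3, 7, 2],
                 [9, 1, 8]])
print(np.sort(grid, axis=1))  # each row sorted: [[2 3 7] [1 8 9]]
print(np.sort(grid, axis=0))  # each column sorted: [[3 1 2] [9 7 8]]
```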

3. Finding unique elements and their counts in an array

To find the unique elements of an array and how many times each one occurs, use numpy.unique() with return_counts=True:

import numpy as np

arr = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])
unique_elements, counts = np.unique(arr, return_counts=True)

print(unique_elements)  # [1 2 3 4]
print(counts)           # [1 2 3 4]

In this code, unique_elements will contain the unique elements from the array arr, and counts will contain the corresponding counts for each unique element. The return_counts=True argument in np.unique() is used to also obtain the counts.
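Because the two returned arrays line up index by index, a few follow-up tasks become short. The sketch below is one suggested pattern, not part of the original example: it turns the result into a dictionary, picks the most frequent value with numpy.argmax(), and uses return_inverse=True to map each original element back to its unique value. The names frequency, most_common, and inverse are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])
unique_elements, counts = np.unique(arr, return_counts=True)

# Pair each unique value with its count (tolist() gives plain Python ints)
frequency = dict(zip(unique_elements.tolist(), counts.tolist()))
print(frequency)  # {1: 1, 2: 2, 3: 3, 4: 4}

# The most frequent value is the one with the largest count
most_common = unique_elements[np.argmax(counts)]
print(most_common)  # 4

# return_inverse gives, for every original element, its index in the unique array,
# so indexing the unique values with it rebuilds the original array
values, inverse = np.unique(arr, return_inverse=True)
print(np.array_equal(values[inverse], arr))  # True
```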

4. Applying element-wise functions to an array

The numpy.vectorize() function allows you to apply element-wise functions to arrays. It's a convenient way to "vectorize" functions that are not originally designed to operate on arrays. However, it's important to note that numpy.vectorize() does not provide any performance improvements over plain Python functions. It simply applies the function to each element in the array sequentially.

Here’s an example to demonstrate how to use numpy.vectorize():

import numpy as np

# Define a function that operates on a single element
def square(x):
    return x ** 2

# Create an input array
arr = np.array([1, 2, 3, 4, 5])

# Use numpy.vectorize() to vectorize the square function
vec_square = np.vectorize(square)

# Apply the vectorized function to the array
result = vec_square(arr)

print(result)
# [ 1  4  9 16 25]

In this example, we defined a simple square() function that squares a single value. We then created an input array arr containing [1, 2, 3, 4, 5]. By using np.vectorize(square), we created a vectorized version of the square() function called vec_square. Finally, we applied vec_square to the input array arr, which squared each element individually and returned the result.

Keep in mind that numpy.vectorize() is a convenience wrapper around a Python loop. For simple arithmetic like squaring, the native expression arr ** 2 is both simpler and much faster.
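Where numpy.vectorize() earns its keep is with scalar functions that contain Python control flow, such as an if statement, which cannot be applied to a whole array directly. The sketch below uses a hypothetical clip_negative() function, passes otypes so the output dtype is fixed rather than inferred from the first result, and then shows equivalent native NumPy expressions that avoid the per-element Python loop.

```python
import numpy as np

# Hypothetical scalar function with branching logic.
# Called on a multi-element array it fails, because "x < 0" is then an array
# and "if" cannot decide whether the whole array is true.
def clip_negative(x):
    if x < 0:
        return 0.0
    return float(x)

arr = np.array([-2, -1, 0, 1, 2])

# otypes=[float] fixes the output dtype instead of inferring it from the first call
vec_clip = np.vectorize(clip_negative, otypes=[float])
print(vec_clip(arr))  # [0. 0. 0. 1. 2.]

# Native NumPy equivalents run in compiled code instead of a Python loop
print(np.where(arr < 0, 0.0, arr))  # [0. 0. 0. 1. 2.]
print(np.maximum(arr, 0))           # [0 0 0 1 2]
```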

5. Using the numpy.where() function to perform conditional operations on arrays

The numpy.where() function is a powerful tool in NumPy that allows you to perform conditional operations on arrays. It provides a compact, vectorized way to build a new array by choosing, element by element, from x where a condition is True and from y where it is False.

The general syntax of numpy.where() is:

numpy.where(condition, x, y)

Here’s an example to demonstrate how to use numpy.where():

import numpy as np

# Create an input array
arr = np.array([1, 2, 3, 4, 5])

# Create a condition
condition = arr % 2 == 0

# Create two arrays to choose values from based on the condition
x = np.array([10, 20, 30, 40, 50])
y = np.array([100, 200, 300, 400, 500])

# Use numpy.where() to select elements based on the condition
result = np.where(condition, x, y)

print(result)
# [100  20 300  40 500]

In this example, we have an input array arr containing [1, 2, 3, 4, 5]. We create a condition condition that checks whether each element of arr is even (arr % 2 == 0). We then define two arrays, x and y, with the same shape as arr, representing the values we want to choose from based on the condition. In this case, we select elements from x when the condition is True (even numbers) and elements from y when the condition is False (odd numbers).

By using numpy.where(condition, x, y), we apply the condition to the arrays x and y element-wise and return a new array result with the selected values.

numpy.where() can also be used with scalar values instead of arrays. For example:

import numpy as np

# Create an input array
arr = np.array([1, 2, 3, 4, 5])

# Create a condition
condition = arr % 2 == 0

# Use numpy.where() with scalar values
result = np.where(condition, 10, -10)

print(result)
# [-10  10 -10  10 -10]

In this case, we use scalar values 10 and -10 instead of arrays x and y. When the condition is True (even numbers), the corresponding elements in the result are set to 10, and when the condition is False (odd numbers), the elements are set to -10.
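numpy.where() has two more usage patterns worth knowing. Called with only a condition, it returns the indices where the condition is True (one index array per dimension). And x and y can mix an array with a scalar, in which case the scalar is broadcast. The grid array below is illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Condition only: returns a tuple of index arrays, one per dimension
even_indices = np.where(arr % 2 == 0)
print(even_indices[0])  # [1 3]

# Array and scalar mixed: keep even numbers, replace odd ones with 0
print(np.where(arr % 2 == 0, arr, 0))  # [0 2 0 4 0]

# In 2D, the condition-only form gives row and column indices
grid = np.array([[1, 8],
                 [9, 2]])
rows, cols = np.where(grid > 5)
print(rows, cols)  # [0 1] [1 0]
```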

6. Computing the cross product of two arrays

The numpy.cross() function allows you to compute the cross product of two arrays. The cross product is a vector that is perpendicular to the two input vectors and has a magnitude equal to the product of the magnitudes of the input vectors multiplied by the sine of the angle between them.

The general syntax of numpy.cross() is:

numpy.cross(a, b, axisa=-1, axisb=-1, axisc=-1, axis=None)

Here’s an example to demonstrate how to use numpy.cross():

import numpy as np

# Create two input arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Compute the cross product using numpy.cross()
cross_product = np.cross(a, b)

print(cross_product)
# [-3  6 -3]

In this example, we have two input arrays a and b, each containing three elements. By using numpy.cross(a, b), we calculate the cross product of a and b and obtain the resulting vector [-3, 6, -3].

The numpy.cross() function also works on multidimensional arrays, computing one cross product per vector. The axisa and axisb parameters determine the axis of a and b along which the vectors are stored (axis, if given, overrides them together with axisc). By default, they are set to -1, which means the last axis.

import numpy as np

# Create two multidimensional input arrays
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[7, 8, 9], [10, 11, 12]])

# Compute the cross product along the last axis using numpy.cross()
cross_product = np.cross(a, b)

print(cross_product)
# [[-6 12 -6]
#  [-6 12 -6]]

In this case, we have two 2D arrays a and b, each with two rows and three columns. By using numpy.cross(a, b), we calculate the cross product along the last axis and obtain a resulting array with the same shape as the input arrays.
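The geometric definition given earlier can be checked numerically. This sketch verifies that the result is perpendicular to both inputs (zero dot products) and that its length equals |a| |b| sin(theta), using numpy.dot() and numpy.linalg.norm(). It also uses the axis parameter from the syntax line for vectors stored as columns rather than rows; a_cols and b_cols are illustrative arrays holding the same vectors as the 2D example, transposed.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.cross(a, b)

# Perpendicular to both inputs: both dot products are zero
print(np.dot(c, a), np.dot(c, b))  # 0 0

# Magnitude equals |a| |b| sin(theta), where theta is the angle between a and b
norm_a, norm_b = np.linalg.norm(a), np.linalg.norm(b)
cos_theta = np.dot(a, b) / (norm_a * norm_b)
sin_theta = np.sqrt(1 - cos_theta ** 2)
print(np.isclose(np.linalg.norm(c), norm_a * norm_b * sin_theta))  # True

# Vectors stored as columns (shape (3, 2)): axis=0 applies to a, b, and the result
a_cols = np.array([[1, 4], [2, 5], [3, 6]])
b_cols = np.array([[7, 10], [8, 11], [9, 12]])
print(np.cross(a_cols, b_cols, axis=0))  # each column is [-6, 12, -6]
```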

7. Interpolating values within an array

The numpy.interp() function performs one-dimensional piecewise linear interpolation: it estimates the value at each requested x-coordinate from the two data points that surround it.

The basic syntax of numpy.interp() is as follows:

numpy.interp(x, xp, fp, left=None, right=None, period=None)

Let’s break down the parameters:

x: This is the array of x-coordinates at which you want to interpolate values.
xp: This is the array of x-coordinates of the data points. It must be increasing unless period is specified.
fp: This is the array of y-coordinates of the data points, with the same length as xp.
left (optional): This is the value to return for x values below xp[0]. If not provided, it defaults to fp[0].
right (optional): This is the value to return for x values above xp[-1]. If not provided, it defaults to fp[-1].
period (optional): This is the period of the x-coordinates. If specified, the data are treated as periodic, which means the interpolation wraps around at the endpoints, and left and right are ignored. This is useful when working with cyclic data such as angles.

Here’s an example to demonstrate how to use numpy.interp():

import numpy as np

# Data points
xp = [1, 3, 4, 7]
fp = [2, 0, 5, 1]

# X-coordinates for interpolation
x = [1.5, 2.5, 4.5, 6]

# Interpolate values
interpolated_values = np.interp(x, xp, fp)

print(interpolated_values)
# [1.5        0.5        4.33333333 2.33333333]

In this example, we have data points defined by xp and fp. We want to interpolate values at the x-coordinates [1.5, 2.5, 4.5, 6]. The numpy.interp() function estimates the y-coordinates at these x-coordinates using linear interpolation and returns the interpolated values. For instance, x = 1.5 lies between the data points (1, 2) and (3, 0), so its value is 2 + (1.5 - 1) * (0 - 2) / (3 - 1) = 1.5.

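Note that numpy.interp() assumes that xp is sorted in increasing order (unless period is given) and does not check this; with unsorted xp the results are meaningless. If your data points are not sorted, sort xp and reorder fp with the same permutation (for example, using numpy.argsort()) before calling the function.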
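The sketch below exercises the optional parameters with the same xp and fp data: default clamping outside the data range versus custom left and right values, sorting unsorted data points with numpy.argsort() before interpolating, and period for cyclic data such as angles in degrees. The unsorted arrays and the angle data are illustrative values.

```python
import numpy as np

xp = np.array([1, 3, 4, 7])
fp = np.array([2, 0, 5, 1])

# Outside [1, 7], values are clamped to fp[0] and fp[-1] by default
print(np.interp([0, 10], xp, fp))                     # [2. 1.]
# left and right replace them with custom fill values
print(np.interp([0, 10], xp, fp, left=-1, right=-1))  # [-1. -1.]

# Unsorted data points: sort xp and reorder fp with the same permutation
xp_unsorted = np.array([4, 1, 7, 3])
fp_unsorted = np.array([5, 2, 1, 0])
order = np.argsort(xp_unsorted)
print(np.interp(2.5, xp_unsorted[order], fp_unsorted[order]))  # 0.5

# Periodic data: with period=360, 350 and 10 degrees are neighbours across 0
angles = [350, 10]
values = [0.0, 1.0]
print(np.interp(0, angles, values, period=360))  # 0.5
```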

8. Reshaping an array while preserving its data using numpy.ravel()

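In NumPy, the ravel() function is used to flatten an array, meaning it reshapes a multi-dimensional array into a 1-dimensional array while preserving the original data. Whenever possible it returns a view of the original array rather than a copy, which makes it cheap; the flatten() method, by contrast, always returns a copy.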

import numpy as np

# Create a 2D array
arr = np.array([[1, 2, 3], [4, 5, 6]])

# Flatten the array using ravel()
flattened = np.ravel(arr)

print(flattened)
# [1 2 3 4 5 6]

In this example, we have a 2D array arr with dimensions 2x3. After applying ravel(), the array is flattened into a 1D array flattened, preserving the original data but reshaping it into a linear sequence.
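The view-versus-copy distinction matters in practice: for a contiguous array like arr, ravel() returns a view that shares memory with the original, so writing to the flattened result changes arr. The sketch below contrasts this with flatten(), checks both with numpy.shares_memory(), and shows order='F' for column-major flattening.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# ravel() returns a view here, so writing to it also changes arr
flat_view = np.ravel(arr)
flat_view[0] = 100
print(arr[0, 0])  # 100

# flatten() always returns an independent copy
flat_copy = arr.flatten()
flat_copy[1] = -1
print(arr[0, 1])  # 2 (unchanged)

# shares_memory() reports whether two arrays overlap in memory
print(np.shares_memory(arr, flat_view), np.shares_memory(arr, flat_copy))  # True False

# order='F' reads the elements column by column instead of row by row
print(np.ravel(arr, order='F'))  # column-major order: 100, 4, 2, 5, 3, 6
```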

9. Calculating the correlation coefficient between two arrays using numpy.corrcoef()

The numpy.corrcoef() function is used to calculate the Pearson correlation coefficient between two arrays. It returns a correlation matrix representing the correlation between each pair of variables in the input arrays.

import numpy as np

# Example arrays
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 4, 3, 2, 1])

# Calculate correlation coefficient
correlation_matrix = np.corrcoef(x, y)
correlation_coefficient = correlation_matrix[0, 1]

print(correlation_coefficient)
# approximately -1.0

In this example, y is x in reverse order, so the two arrays have a perfect negative linear relationship and the coefficient is -1 (up to floating-point rounding). The numpy.corrcoef() function returns a 2x2 correlation matrix: the diagonal entries (each array correlated with itself) are 1, and the correlation coefficient between x and y is located at index (0, 1) or, equivalently, (1, 0), since the matrix is symmetric.
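To see what numpy.corrcoef() computes, the sketch below calculates the Pearson coefficient by hand (the sum of products of deviations from the mean, divided by the square root of the product of the sums of squared deviations) and compares it with the library result. It also passes a 2D array whose rows are separate variables, which yields one coefficient per pair. The y values here are made-up data chosen so the correlation is not exactly -1.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# Pearson r by hand: sum of (x - mean_x) * (y - mean_y), divided by
# sqrt(sum of (x - mean_x)**2 * sum of (y - mean_y)**2)
x_dev = x - x.mean()
y_dev = y - y.mean()
r_manual = np.sum(x_dev * y_dev) / np.sqrt(np.sum(x_dev ** 2) * np.sum(y_dev ** 2))

r_numpy = np.corrcoef(x, y)[0, 1]
print(np.isclose(r_manual, r_numpy))  # True

# Several variables at once: each row of the 2D input is one variable
data = np.vstack([x, y, x ** 2])
print(np.corrcoef(data).shape)  # (3, 3)
```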

10. Saving and loading arrays to/from disk using numpy.save() and numpy.load()

Here’s an example that demonstrates the saving and loading of arrays:

import numpy as np

# Save an array
arr = np.array([1, 2, 3, 4, 5])
np.save("my_array.npy", arr)

# Load the array
loaded_arr = np.load("my_array.npy")

print(loaded_arr)
# [1 2 3 4 5]

In this example, the array arr is created, saved to disk using numpy.save(), and then loaded back into the loaded_arr variable using numpy.load().

When using numpy.save(), note that it appends the .npy extension to the file name if it is not already there. This binary format preserves the array data along with its shape and dtype information. To store several arrays in one file, use numpy.savez(), or numpy.savez_compressed() if you also want compression; to write a human-readable text file, use numpy.savetxt().
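Those alternatives can be sketched as follows. This example writes two named arrays into one compressed .npz archive with numpy.savez_compressed(), reads them back with numpy.load(), and round-trips a 1D array through a text file with numpy.savetxt() and numpy.loadtxt(). The file names and the weights and labels arrays are hypothetical, and a temporary directory is used so the sketch leaves no files behind.

```python
import os
import tempfile

import numpy as np

weights = np.array([[0.1, 0.2], [0.3, 0.4]])
labels = np.array([0, 1, 1, 0])

with tempfile.TemporaryDirectory() as tmp_dir:
    # Save several named arrays into one compressed .npz archive
    archive_path = os.path.join(tmp_dir, "model_data.npz")
    np.savez_compressed(archive_path, weights=weights, labels=labels)

    # np.load() returns a lazy, dict-like NpzFile; the with block closes it
    with np.load(archive_path) as archive:
        print(sorted(archive.files))  # ['labels', 'weights']
        restored_weights = archive["weights"]
        restored_labels = archive["labels"]

    # Plain-text output: readable by other tools, but the dtype is not stored
    # and only 1D or 2D arrays are supported
    text_path = os.path.join(tmp_dir, "labels.txt")
    np.savetxt(text_path, labels, fmt="%d")
    restored_from_text = np.loadtxt(text_path, dtype=int)

print(np.array_equal(weights, restored_weights))     # True
print(np.array_equal(labels, restored_from_text))    # True
```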

Conclusion

In this blog post, we explored some of the advanced features offered by NumPy, going beyond the basics of array creation and manipulation. We covered masked arrays for handling missing or invalid values, sorting and interpolation techniques, vectorization, and more. These advanced features expand the capabilities of NumPy, enabling you to tackle complex data challenges and extract valuable insights.



Written by Nandeda Narayan