[Day-3]NumPy Array-Essential Functions(1)

Advait Joshi
10 min read · Apr 13, 2024


Hello everyone! Today I delved into nine NumPy functions and features. They help you create arrays, sort them, and concatenate them. While reading about the sorting functions, I came across some more advanced sorting algorithms used in practice (for example, quicksort, heapsort, and merge sort), apart from the standard textbook algorithms like bubble sort and insertion sort. I will write about these advanced sorting algorithms in another blog post.

The functions that we discuss today are:

  1. np.array()
  2. np.zeros()
  3. np.ones()
  4. np.empty()
  5. np.arange()
  6. np.linspace()
  7. dtype
  8. np.sort()
  9. np.concatenate()

Let us begin…

np.array() :

To create a NumPy array, you can use the function np.array().

import numpy as np
a = np.array([1, 2, 3])

[Figure: a simple visualization of the array created by the code above]

Be aware that these visualizations are meant to simplify ideas and give you a basic understanding of NumPy concepts and mechanics. Arrays and array operations are much more complicated than are captured here!

np.zeros() :

Besides creating an array from a sequence of elements, you can easily create an array filled with 0’s:

np.zeros(2)
#Output:
array([0., 0.])

np.ones() :

Similarly, you can create an array filled with 1’s:

np.ones(2)
#Output:
array([1., 1.])
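Both functions also accept a tuple as the shape, which gives a multi-dimensional array. Here is a small sketch of that; the variable names are just my own choices for illustration:

```python
import numpy as np

# Passing a tuple as the shape gives a multi-dimensional array.
zeros_2d = np.zeros((2, 3))   # 2 rows, 3 columns of 0.0
ones_2d = np.ones((3, 2))     # 3 rows, 2 columns of 1.0

print(zeros_2d.shape)  # (2, 3)
print(ones_2d.shape)   # (3, 2)
print(zeros_2d.dtype)  # float64
```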

np.empty() :

The function np.empty() creates an array whose initial content is arbitrary: it is whatever happened to be in that memory, not deliberately random values. The reason to use empty over zeros (or something similar) is speed, but make sure to fill every element afterward!

# Create an empty array with 2 elements
np.empty(2)
#Output
array([3.14, 42. ]) # may vary
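A minimal sketch of the "allocate first, fill afterward" pattern described above. The squares are just placeholder values I picked, and the loop is only there to make the filling step explicit:

```python
import numpy as np

# Allocate without initializing; the contents are whatever was in memory.
buf = np.empty(4)

# Fill every element before reading anything from the array.
for i in range(buf.size):
    buf[i] = i * i

print(buf)  # [0. 1. 4. 9.]
```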

np.arange() :

Creates an array with a range of elements:

np.arange(4)
#Output:
array([0, 1, 2, 3])

We can also create an array of evenly spaced values. To do this, you specify the first number, the stop value, and the step size. Note that the stop value itself is never included in the result.

np.arange(2, 9, 2)
#Output:
array([2, 4, 6, 8])
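To see that the stop value is excluded, here is a short sketch comparing two stop values, plus a negative step (the variable names are my own):

```python
import numpy as np

# The stop value is exclusive: 9 is never produced here, and neither is 10 below.
up_to_nine = np.arange(2, 9, 2)
up_to_ten = np.arange(2, 10, 2)
print(up_to_nine)  # [2 4 6 8]
print(up_to_ten)   # [2 4 6 8]

# A negative step counts downwards.
countdown = np.arange(5, 0, -1)
print(countdown)   # [5 4 3 2 1]
```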

np.linspace() :

np.linspace() is also used to create an array with values that are spaced linearly in a specified interval:

np.linspace(0, 10, num=5)
#Output:
array([ 0. ,  2.5,  5. ,  7.5, 10. ])

A bit simpler breakdown of this code is:

The function np.linspace() from the NumPy library creates an array of evenly spaced numbers over a specified interval. Here's how the arguments are interpreted:

  • The first argument, 0, represents the starting point of the interval.
  • The second argument, 10, represents the ending point of the interval.
  • The num parameter specifies the number of samples or elements you want in the array.

So, np.linspace(0, 10, num=5) creates an array with 5 elements evenly spaced between 0 and 10, with both endpoints included. Five points leave 4 gaps between them, so the spacing is (10 - 0) / (5 - 1) = 2.5. Therefore, the output is [0. , 2.5, 5. , 7.5, 10.]. Each number in the output array represents a point evenly distributed along the interval from 0 to 10.
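The spacing can be checked directly. This sketch uses two optional keyword arguments of np.linspace(), retstep and endpoint, which are not covered above:

```python
import numpy as np

# retstep=True also returns the spacing between consecutive samples.
samples, step = np.linspace(0, 10, num=5, retstep=True)
print(samples)  # [ 0.   2.5  5.   7.5 10. ]
print(step)     # 2.5, i.e. (10 - 0) / (5 - 1)

# endpoint=False leaves out the stop value, so the spacing becomes 10 / 5.
open_ended = np.linspace(0, 10, num=5, endpoint=False)
print(open_ended)  # [0. 2. 4. 6. 8.]
```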

dtype :

dtype specifies the data type of an array's elements. For functions such as np.zeros() and np.ones(), the default data type is floating point (np.float64), but you can explicitly choose another one using the dtype keyword.

x = np.ones(2, dtype=np.int64)
x
#Output:
array([1, 1])
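You can inspect the data type of any array through its dtype attribute. A small sketch; astype() is an extra method not discussed above, and it returns a converted copy:

```python
import numpy as np

default_ones = np.ones(2)              # no dtype given
int_ones = np.ones(2, dtype=np.int64)  # dtype chosen explicitly

print(default_ones.dtype)  # float64
print(int_ones.dtype)      # int64

# astype() returns a converted copy; the original array is unchanged.
as_float32 = int_ones.astype(np.float32)
print(as_float32.dtype)    # float32
print(int_ones.dtype)      # int64
```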

np.sort() :

Sorting an array is simple with np.sort(). You can specify the axis, kind, and order when you call the function.

arr = np.array([2, 1, 5, 3, 7, 4, 6, 8])

You can quickly sort the numbers in ascending order with:

np.sort(arr)
#Output:
array([1, 2, 3, 4, 5, 6, 7, 8])
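A short sketch of a few things worth checking here: the original array is left untouched, a descending order can be obtained by reversing the result, and axis controls the direction for 2D input. The small grid array is made up for illustration:

```python
import numpy as np

arr = np.array([2, 1, 5, 3, 7, 4, 6, 8])

sorted_copy = np.sort(arr)   # new array; arr itself is untouched
print(arr)          # [2 1 5 3 7 4 6 8]
print(sorted_copy)  # [1 2 3 4 5 6 7 8]

# Reversing the sorted copy gives descending order.
descending = sorted_copy[::-1]
print(descending)   # [8 7 6 5 4 3 2 1]

# For 2D input, axis picks the direction: axis=0 sorts each column.
grid = np.array([[3, 1], [2, 4]])
by_column = np.sort(grid, axis=0)
print(by_column.tolist())  # [[2, 1], [3, 4]]
```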

In addition to sort, which returns a sorted copy of an array, you can use:

  • argsort, which is an indirect sort along a specified axis,
  • lexsort, which is an indirect stable sort on multiple keys,
  • searchsorted, which will find elements in a sorted array, and
  • partition, which is a partial sort.

Here, I delved a bit deeper and understood the syntax of these functions.

np.argsort():

numpy.argsort() is a function in the NumPy library that returns the indices that would sort an array.

Here’s a breakdown of how it works:

Parameters:

  • a: The input array to be sorted.
  • axis (optional): Specifies the axis along which to sort. The default is -1, which means the last axis. If axis is None, the flattened array is used.
  • kind (optional): Specifies the sorting algorithm. Possible values are 'quicksort', 'mergesort', 'heapsort', 'stable'. The default is 'quicksort'.

Return Value:

  • An array of indices that would sort the input array along the specified axis.

Explanation:

  • numpy.argsort() sorts the elements of an array and returns the indices of the sorted elements rather than the sorted elements themselves. This is useful when you want to sort an array but also need to track the original positions of the elements.

Example:

import numpy as np

arr = np.array([3, 1, 2])
indices = np.argsort(arr)
print(indices)
# Output:
[1 2 0]

In this example, arr is [3, 1, 2]. After applying numpy.argsort(), it returns [1, 2, 0], indicating that if you sort the elements of arr, the element at index 1 (1) comes first, followed by the element at index 2 (2), and finally, the element at index 0 (3).
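The returned indices are most useful when you apply them back, either to the array itself or to a second array that is aligned with it. A sketch with made-up names as the second array:

```python
import numpy as np

scores = np.array([3, 1, 2])
names = np.array(["carol", "alice", "bob"])  # hypothetical labels, aligned with scores

order = np.argsort(scores)
print(order)          # [1 2 0]
print(scores[order])  # [1 2 3]
print(names[order])   # ['alice' 'bob' 'carol']
```

Indexing with order reorders both arrays the same way, so each name stays attached to its score.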

A 2-dimensional array example:

import numpy as np

arr = np.array([[3, 1, 2],
                [4, 5, 6]])

indices = np.argsort(arr, axis=1)
print(indices)
#Output:
[[1 2 0]
 [0 1 2]]

In this example, we have a 2D array arr with shape (2, 3). When we apply np.argsort() along axis=1, it sorts each row independently.

  • For the first row [3, 1, 2], after sorting, the indices become [1, 2, 0]. This means that the smallest element in the first row is at index 1, followed by index 2, and then index 0.
  • For the second row [4, 5, 6], after sorting, the indices become [0, 1, 2]. This means that the smallest element in the second row is at index 0, followed by index 1, and then index 2.

So, the output indices is a 2D array where each row contains the sorted indices of the corresponding row in the original array.
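For a 2D result like this, plain indexing with the index array does not sort each row. One way to apply the indices row by row is np.take_along_axis(), which is not covered above; a minimal sketch:

```python
import numpy as np

arr = np.array([[3, 1, 2],
                [4, 5, 6]])
indices = np.argsort(arr, axis=1)

# take_along_axis picks, row by row, the elements at the given indices.
sorted_rows = np.take_along_axis(arr, indices, axis=1)
print(sorted_rows.tolist())  # [[1, 2, 3], [4, 5, 6]]
```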

numpy.lexsort() :

numpy.lexsort() is a function in the NumPy library that performs an indirect stable sort using a sequence of keys.

Here's a detailed explanation:

Parameters:

  • keys: A sequence of arrays (or tuples) that specify the keys to sort by. The last key array (or tuple) is the primary key.

Return Value:

  • An array of indices that sorts the data according to the sequence of keys.

Explanation:

  • numpy.lexsort() is useful for sorting multiple arrays based on a sequence of keys. It sorts in lexicographical (dictionary) order. The last key has the highest priority, the second-last key breaks ties in the last key, and so on down to the first key.

Example:

import numpy as np

a = [1, 5, 1, 4, 3]
b = [9, 4, 0, 4, 0]
indices = np.lexsort((b, a))
print(indices)
# Output:
[2 0 4 3 1]

In this example, we have two arrays a and b. We want to sort a first, and if elements in a are equal, we use b to break the tie. The output indices indicates the sorted order of elements based on the lexicographical order defined by (b, a). So, the resulting sorted order is [2, 0, 4, 3, 1], which corresponds to the indices of the sorted elements.

In the context of numpy.lexsort(), when we say "we want to sort a first, and if elements in a are equal, we use b to break the tie," it means that:

  1. The primary sorting key is the array a.
  2. If there are elements in a that are equal, then the secondary sorting key, which is b, is used to determine the final order.

Let’s take an example:

a = [1, 5, 1, 4, 3]
b = [9, 4, 0, 4, 0]

When we sort these arrays using numpy.lexsort((b, a)), here's how it works:

  1. First, the function considers the last key, which is a. It sorts the elements of a in ascending order.
  • After sorting, a becomes [1, 1, 3, 4, 5].

  2. Then, it considers the second-last key, which is b. It rearranges the elements within each group of equal values in a.

  • The only such group is the two 1s in a (at indices 0 and 2), whose b values are 9 and 0. Since 0 < 9, index 2 comes before index 0.
  • In the final order, b reads [0, 9, 0, 4, 4].

So, the final order is determined by a first, and by b wherever a has ties. The resulting indices are [2, 0, 4, 3, 1]. Applying them to a gives [1, 1, 3, 4, 5], and applying them to b gives [0, 9, 0, 4, 4].
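To check this, you can apply the returned indices to both arrays and look at the resulting (a, b) pairs. A small sketch; pairs is just a helper name of mine:

```python
import numpy as np

a = np.array([1, 5, 1, 4, 3])
b = np.array([9, 4, 0, 4, 0])

indices = np.lexsort((b, a))  # a is the primary key because it comes last
print(indices)     # [2 0 4 3 1]
print(a[indices])  # [1 1 3 4 5]
print(b[indices])  # [0 9 0 4 4]

# Viewed as (a, b) pairs, the result is in dictionary order.
pairs = list(zip(a[indices].tolist(), b[indices].tolist()))
print(pairs)  # [(1, 0), (1, 9), (3, 0), (4, 4), (5, 4)]
```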

A simpler analogy to understand this function:

Imagine you have a set of cards, each with two attributes: “Color” (denoted by array a) and "Number" (denoted by array b).

  • You want to first sort the cards by their “Color”.
  • If two cards have the same “Color”, you then look at their “Number” to determine the final order.

Let’s take a concrete example:

Suppose you have the following cards:

Card 1: Color = Red, Number = 3
Card 2: Color = Blue, Number = 5
Card 3: Color = Red, Number = 1
Card 4: Color = Green, Number = 4
Card 5: Color = Red, Number = 2

You want to sort them using numpy.lexsort((b, a)).

  1. First, you sort by “Color” (a). NumPy compares strings alphabetically, so the color groups come out in this order:
  • Blue: Card 2
  • Green: Card 4
  • Red: Card 1, Card 3, Card 5

  2. Within each group of cards with the same “Color”, you then sort by “Number” (b).

  • For the Blue card, there’s only one, so no sorting is needed.
  • For the Green card, again, no sorting is needed since there’s only one card.
  • For the Red cards, you sort them by their “Number”: Card 3 (Number 1), Card 5 (Number 2), Card 1 (Number 3).

  3. The final sorted order is:

  • Card 2 (Blue, Number 5)
  • Card 4 (Green, Number 4)
  • Card 3 (Red, Number 1)
  • Card 5 (Red, Number 2)
  • Card 1 (Red, Number 3)

So, numpy.lexsort() helps you sort based on multiple criteria, like sorting a list of cards first by color and then by number within each color group.
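Here is a sketch of the card analogy in code. The array names are my own, and the colors are stored as plain strings, which NumPy orders alphabetically (Blue, then Green, then Red):

```python
import numpy as np

cards = np.array(["Card 1", "Card 2", "Card 3", "Card 4", "Card 5"])
color = np.array(["Red", "Blue", "Red", "Green", "Red"])  # key a (primary)
number = np.array([3, 5, 1, 4, 2])                        # key b (tie-breaker)

order = np.lexsort((number, color))  # color comes last, so it is the primary key
print(cards[order].tolist())
# ['Card 2', 'Card 4', 'Card 3', 'Card 5', 'Card 1']
```

If you want a different color order, one option is to replace the strings with integer ranks before calling lexsort.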

np.searchsorted() :

numpy.searchsorted() is a function in the NumPy library that finds the indices where elements should be inserted to maintain order in an array. This function assumes that the input array is sorted.

Here’s a detailed explanation:

Parameters:

  • a: The sorted input array.
  • v: The value or array of values to be searched for.
  • side (optional): Specifies whether to return the index of the first occurrence of the value ('left') or the index just after the last occurrence ('right'). The default is 'left'.

Return Value:

  • An array of indices where the values would be inserted into the sorted array to maintain order.

Explanation:

  • numpy.searchsorted() performs a binary search to find the indices where the values in v can be inserted into the sorted array a while maintaining the sorted order. It returns the indices such that inserting v[i] into a at index indices[i] will keep a sorted.

Example:

import numpy as np

a = np.array([1, 2, 4, 5, 6])
v = np.array([0, 3, 5, 7])
indices = np.searchsorted(a, v)
print(indices)
# Output:
[0 2 3 5]

In this example, a is [1, 2, 4, 5, 6], and v is [0, 3, 5, 7]. The function np.searchsorted() finds the positions where the values in v can be inserted into a while maintaining order. So, the output indices indicates the positions [0, 2, 3, 5] where the corresponding values in v would be inserted into a.
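The side parameter only matters when the value is already present in the array. A small sketch comparing both settings, followed by an actual insertion with np.insert(), which is not covered above:

```python
import numpy as np

a = np.array([1, 2, 4, 5, 6])

# 5 is already in a: 'left' points at it, 'right' points just past it.
left = np.searchsorted(a, 5, side="left")
right = np.searchsorted(a, 5, side="right")
print(left, right)  # 3 4

# Inserting at the returned index keeps the array sorted.
pos = np.searchsorted(a, 3)
with_three = np.insert(a, pos, 3)
print(with_three)  # [1 2 3 4 5 6]
```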

numpy.partition() :

numpy.partition() is a function in the NumPy library that returns a partitioned copy of an array along the given axis. It rearranges the elements so that the element at position k ends up where it would be in a fully sorted array, with all smaller values before it and the remaining values after it. The elements within each partition are not sorted.

Here’s a detailed explanation:

Parameters:

  • a: The array to be partitioned.
  • kth: The index of the element around which the partitioning is done. All smaller elements move before this element, and all larger elements move after it. If the value is an integer k, the partition will be at index k. If it is a sequence of integers, the partition will be at the indices specified by the sequence.
  • axis (optional): Specifies the axis along which to perform the partition. The default is -1, which means the last axis.
  • kind (optional): Specifies the selection algorithm used to find the k-th element. The default, 'introselect', is the only option listed in the NumPy documentation.

Return Value:

  • An array with the same shape as a, where the elements along the specified axis are partitioned around the k-th element.

Explanation:

  • numpy.partition() rearranges the elements in the input array a in such a way that the elements smaller than the k-th element are moved before it, and the elements larger than or equal to the k-th element are moved after it. The order of elements within each partition is arbitrary. This function is useful when you want to find the smallest or largest k elements in an array without fully sorting it.

Example:

import numpy as np

arr = np.array([3, 4, 2, 1, 5])
partitioned = np.partition(arr, 2)
print(partitioned) # Output: [2 1 3 4 5]

In this example, arr is [3, 4, 2, 1, 5]. When we perform np.partition(arr, 2), the element that would sit at index 2 in the sorted array, which is 3, is placed at index 2. The smaller elements (2 and 1) end up before it, and the larger elements (4 and 5) end up after it. The resulting array is [2, 1, 3, 4, 5], although the order within each side is not guaranteed.
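A typical use is picking the k smallest values without sorting the whole array. The sketch below is one way to do it; k is my own variable, and np.argpartition() is the index-returning counterpart of np.partition():

```python
import numpy as np

arr = np.array([3, 4, 2, 1, 5])
k = 2

# After partitioning at index k-1, the first k slots hold the k smallest values
# (in no guaranteed order), so only that small slice is sorted for display.
smallest_k = np.sort(np.partition(arr, k - 1)[:k])
print(smallest_k)  # [1 2]

# argpartition is the indirect version: it returns indices instead of values.
idx = np.argpartition(arr, k - 1)[:k]
print(np.sort(arr[idx]))  # [1 2]
```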

The last function for the day:

np.concatenate() :

If you start with these arrays:

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

You can concatenate them with np.concatenate() :

np.concatenate((a, b))
#Output:
array([1, 2, 3, 4, 5, 6, 7, 8])

Or, if you start with these arrays:

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6]])

You can concatenate them with:

np.concatenate((x, y), axis=0)
#Output:
array([[1, 2],
       [3, 4],
       [5, 6]])
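The axis argument decides which dimension grows, and the other dimensions have to match. A small sketch; transposing y with .T is my own addition, needed to make the shapes fit for axis=1:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6]])

# axis=0 stacks rows: the column counts (2 and 2) must match.
rows = np.concatenate((x, y), axis=0)
print(rows.shape)  # (3, 2)

# axis=1 appends columns: the row counts must match, so y (1 row) is
# transposed into a 2x1 column first.
cols = np.concatenate((x, y.T), axis=1)
print(cols.tolist())  # [[1, 2, 5], [3, 4, 6]]
```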

This is all that I learned today about NumPy array functions. Tomorrow I will learn more about NumPy arrays and the various NumPy functions used to work with them. I will document everything in my Medium posts for better understanding. Stay tuned for the next blog.

All The Best People☺️.

Follow me on LinkedIn:

www.linkedin.com/in/advaitszone

To read my previous blog “[Day-2]NumPy~Arrays.”:

https://medium.com/@advaitszone/day-2-numpy-arrays-9300848f5f7a
