12 NumPy Methods For Beginners: A Complete Roadmap To Be A Data Scientist

Datasthan
9 min read · Aug 21, 2022

Last week I wrote a blog on Pandas, “12 Pandas Methods To Master: A Complete Roadmap To Be A Data Scientist”, and the response I got was just mind-blowing. I never thought I could help this many people.

And so, here’s my third blog…

So, we’re done with Python and Pandas, what’s next?

Obviously it’s gonna be NumPy. But..

Why NumPy?

So in data science what is the thing we deal with?

Data, right?

Yes, and how do we store the data?

Array? List? Dictionary?

Right. Now imagine we’re dealing with tons and tons of data, and the structure we’re storing it in is super slow. What is the experience gonna be?

Terrible.

Absolutely, and that is why we use NumPy. In data science NumPy is the fundamental package to perform high-level mathematical computations on multi-dimensional arrays.

So we store the data in NumPy arrays, and use NumPy to perform those computations.

But why is NumPy faster?

It is because NumPy’s core is written in C, and compiled C code runs far faster than looping over a normal Python list in the interpreter. A NumPy array also stores elements of a single type in one contiguous block, so it uses much less memory than a list of separate Python objects. Thus the huge time difference, and thus more efficient.
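
To make the memory point a bit more concrete, here is a small sketch of my own (an illustration, not a benchmark). The variable names are made up for this example, and the exact size of the list depends on your Python build, so I only print it.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores pointers to separate Python int objects,
# so we count the list itself plus every object it points to.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An array stores the raw 8-byte integers side by side.
arr_bytes = np_arr.nbytes

print(list_bytes, arr_bytes)

# Same result either way, but the array version runs its loop in compiled code.
list_total = sum(x * 2 for x in py_list)
arr_total = int((np_arr * 2).sum())
print(list_total, arr_total)
```

On a typical setup the list needs several times more memory than the array for the same numbers.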

Now that it’s clear why NumPy, let’s import numpy as np in your life.

Today we’ll be talking about 12 NumPy methods for beginners.

“Beginners” because NumPy is used so widely that I’ll have to write another blog just to cover the advanced part. We’ll discuss that in a later part, when we get into actual machine learning stuff. For now, I think this will suffice.

Creating your own array:

In NumPy there are a lot of ways to create your own array to play with.

# creating a 1D array
import numpy as np

a = np.array([1, 2, 3])
print(a)
-> [1 2 3]

This .array() method is used to create an array of any dimension you want. Now, in NumPy, dimensions are called axes. So a 1D array has 1 axis, a 2D array has 2 axes, and so on.

# creating a 2D array
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a)
-> [[1 2 3]
    [4 5 6]]

In this way, we can create n-dimensional arrays in NumPy.

# creating an array of zeros
a = np.zeros(2)
print(a)
-> [0. 0.]

Rather than typing in the values ourselves, we can create an array filled with zeros using the .zeros() method. The values are floats by default.

# creating an array of ones
a = np.ones(2)
print(a)
-> [1. 1.]

We can also create an array filled with ones, using .ones().

# creating an arranged array
a = np.arange(4)
print(a)
-> [0 1 2 3]

This .arange() method is used to create an array from a range of values. It accepts 4 inputs: start, stop, step and dtype.

# creating an arranged array
a = np.arange(2, 9, 2, dtype=int)
b = np.arange(2, 9, 2, dtype=float)
print(a)
print(b)
-> [2 4 6 8]
-> [2. 4. 6. 8.]

Thus we can use .arange() to create an array. However, NumPy has one more method that works almost like .arange(), and that is .linspace().

# creating an array using linspace
a = np.linspace(0, 10, num=5)
print(a)
-> [ 0.   2.5  5.   7.5 10. ]

Like .arange(), .linspace() also accepts multiple inputs.

But as a beginner, you should be clear about the difference between .arange() and .linspace(). If they work almost the same, why keep both?

np.arange() returns values from start up to (but not including) stop, separated by a fixed step.

a = np.arange(0, 10, 5)
print(a)
-> [0 5]

np.linspace(), on the other hand, returns a given number of evenly spaced samples over an interval, and includes the end point by default.

a = np.linspace(0, 10, 5)
print(a)
-> [ 0.   2.5  5.   7.5 10. ]

We can see that linspace returns 5 equally spaced values within the interval, while arange returns 2 values with a step of 5 between 0 and 10 (10 itself is excluded).
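
A quick side-by-side may help here. This is just a sketch with variable names I picked for the example: with arange we choose the step, with linspace we choose how many values we want, and linspace’s endpoint argument decides whether the stop value is included.

```python
import numpy as np

# arange: we choose the step, the stop value (10) is left out
by_step = np.arange(0, 10, 2)

# linspace: we choose how many values, the stop value is included by default
by_count = np.linspace(0, 10, 6)

# with endpoint=False, linspace also leaves the stop value out
no_endpoint = np.linspace(0, 10, 5, endpoint=False)

print(by_step)
print(by_count)
print(no_endpoint)
```

Note that linspace gives floats here while arange with integer inputs gives integers.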

.ndim, .size, .shape:

These are attributes (not methods, so no brackets) that we use when we have an array and want to know its number of dimensions, its total number of elements, and its shape.

# using .ndim, .size, .shape
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(a)
print(a.ndim)
print(a.size)
print(a.shape)
-> [[1 2 3]
    [4 5 6]
    [7 8 9]] # 2D array
-> 2
-> 9
-> (3, 3)

.ndim gives the number of dimensions of the array, .size gives the total number of elements in the array, and .shape gives the shape of the array (here it is a 3x3 array, which means 3 rows and 3 columns).
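
For comparison, here is how the same three attributes look on an actual 3D array. The array below is my own example, built with np.zeros just to get a shape to inspect.

```python
import numpy as np

# 2 blocks, each with 3 rows and 4 columns
cube = np.zeros((2, 3, 4))

print(cube.ndim)   # number of axes
print(cube.size)   # total number of elements: 2 * 3 * 4
print(cube.shape)  # length of each axis
```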

reshape():

Sometimes we want to reshape an array, with .reshape() we can do that.

# reshaping an array
a = np.arange(6)
print(a)
-> [0 1 2 3 4 5]
b = a.reshape(3, 2)
print(b)
-> [[0 1]
    [2 3]
    [4 5]]

Thus we can use the .reshape() method to change the shape of an array.

One thing to remember here is that when you use the reshape method, the array you want to produce needs to have the same number of elements as the original array. If you start with an array with 12 elements, you’ll need to make sure that your new array also has a total of 12 elements.
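
Here is a small sketch of that rule with a 12-element array (the names are just for illustration). It also shows that you can pass -1 for one axis and let NumPy work out its length, and what happens when the numbers don’t match.

```python
import numpy as np

a = np.arange(12)

# -1 asks NumPy to work out that axis from the total: 12 / 3 = 4
b = a.reshape(3, -1)
print(b.shape)

# 5 * 2 = 10 does not match 12 elements, so this fails
try:
    a.reshape(5, 2)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("cannot reshape:", err)
```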

sort():

Sorting as we know is one of the most used techniques in our coding life.

# sorting an array in ascending order
a = np.array([2, 1, 5, 3, 7, 4, 6, 8])

print(np.sort(a))
-> [1 2 3 4 5 6 7 8]

That’s how we sort an array in ascending order. We can also sort the array in descending order.

# sorting an array in descending order
a = np.array([2, 1, 5, 3, 7, 4, 6, 8])

asc = np.sort(a)
print(asc)
desc = np.flip(asc)
print(desc)
-> [1 2 3 4 5 6 7 8]
-> [8 7 6 5 4 3 2 1]

np.sort() has no descending option, so we sort in ascending order first and then use .flip() to reverse the order of the array.
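
As a side note, here is a sketch of two related things you may run into: reversing with a [::-1] slice instead of flip, and the difference between np.sort (which returns a sorted copy) and the array’s own sort method (which sorts in place).

```python
import numpy as np

a = np.array([2, 1, 5, 3, 7, 4, 6, 8])

# np.sort returns a sorted copy; [::-1] walks it backwards
desc = np.sort(a)[::-1]
print(desc)

# the original is untouched until we call the in-place method
print(a)
a.sort()
print(a)
```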

More on sorting, here.

concatenate():

Sometimes we need to join multiple arrays into one before performing some calculations, and that’s why we have concatenate() in NumPy.

# concatenating two arrays
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
print(np.concatenate((a, b)))
-> [1 2 3 4 5 6 7 8]
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6]])
print(np.concatenate((x, y), axis=0))
-> [[1 2]
    [3 4]
    [5 6]]
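
With axis=0 the new rows go underneath. Here is a sketch of the other direction, axis=1, where the new values go to the right. For that the arrays need the same number of rows, so in this example I transpose y first.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6]])

# y has shape (1, 2); its transpose has shape (2, 1),
# which matches the 2 rows of x
side_by_side = np.concatenate((x, y.T), axis=1)
print(side_by_side)
```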

Indexing and Slicing:

This is another important reason why we use NumPy arrays instead of Python lists. With Python lists we cannot perform an operation on all the elements of two lists directly. For example, we cannot multiply two lists with *; we have to loop over the elements ourselves. This is where NumPy comes into play.
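
A tiny sketch of that difference, with two lists I made up for the example:

```python
import numpy as np

list1 = [1, 2, 3]
list2 = [4, 5, 6]

# lists: we have to pair the elements up ourselves
list_product = [x * y for x, y in zip(list1, list2)]

# arrays: * already works element by element
array_product = np.array(list1) * np.array(list2)

print(list_product)
print(array_product)
```

Writing list1 * list2 directly would raise a TypeError.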

# using index values to get some elements from the array
a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
arr = a[np.array([1, 3, -3])]
print(arr)
-> [2 4 6]

The thing is, in any array every element has its own index value, and it starts with 0. So the first element in an array has index 0, the second element has index 1, and so on.

We can also count from the end, but there it starts with -1. So for the last element (8) the index is -1, the second last (7) has -2, and so on.

Thus here we passed 3 index values, giving us the elements at index 1, index 3 and index -3, and based on that we got our output.

# using basic slicing
a = np.arange(20)
print(a)
-> [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
print(a[10:]) # first
-> [10 11 12 13 14 15 16 17 18 19]
print(a[-8:18:2]) # second
-> [12 14 16]

In slicing we also pass index positions, not the elements themselves (in this array every element happens to be equal to its index, so the two look the same). We can pass 3 inputs: start, end, step. The end index is exclusive, so the slice always stops 1 position before the end.

In the above array (a), we took a[10:] in the first example. As we left the second position empty, we get everything from index 10 to the end of the array.
In the second example, we took a[-8:18:2]. Here we gave -8 as the start, which means the 8th position from the end, and that is element 12. Next, we gave 18 as the end, so the slice can go up to index 17 at most. At last we gave 2 as the step (by default step = 1), so we get every second element starting from 12 and stopping before index 18: that is 12, 14 and 16.
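
Because np.arange(20) holds values that are equal to their own positions, it can be hard to see what a slice is really counting. Here is a small sketch with values I made up so that they differ from their positions; it shows that start and end are index positions.

```python
import numpy as np

# values are deliberately different from their index positions
a = np.array([10, 20, 30, 40, 50, 60, 70, 80])

first = a[2:]        # from index 2 to the end
second = a[-6:7:2]   # from 6th-from-last (index 2), stop before index 7, every 2nd
print(first)
print(second)
```

The second slice visits index 2, 4 and 6, so it returns 30, 50 and 70.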

More on indexing and slicing, here.

Creating an array from an existing array:

You can easily create a new array from an existing array.

# creating a new array from an existing one
a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
arr = a[2:6]
print(arr)
-> [3 4 5 6]

Here, we grabbed a section of the array from index position 2 up to, but not including, index position 6.
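
One thing worth knowing as a beginner: a basic slice like this is a view, which means it shares its data with the original array. The sketch below (variable names are my own) shows the effect, and how .copy() gives you an independent array.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8])

view = a[2:6]          # shares memory with a
copy = a[2:6].copy()   # has its own memory

view[0] = 99           # this also changes a[2]
print(a)
print(copy)
```

After the assignment, a[2] is 99 while the copy still holds 3, 4, 5, 6.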

You can also stack two existing arrays vertically and horizontally.

# creating a new array from existing ones
a1 = np.array([[1, 1], [2, 2]])
a2 = np.array([[3, 3], [4, 4]])
print(np.vstack((a1, a2))) # vertically
print(np.hstack((a1, a2))) # horizontally
-> [[1 1]
    [2 2]
    [3 3]
    [4 4]]
-> [[1 1 3 3]
    [2 2 4 4]]

So we can stack two or more arrays like this using .vstack() for vertical stacking and .hstack() for horizontal stacking.

where():

In NumPy we can search within an array for a certain value, and in return we will get the index of that particular element we searched for.

# searching through an array
arr = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(arr == 4)
print(x)
-> (array([3, 5, 6]),)

Thus we can use the .where() method to search for any element/elements in an array.

# searching even numbers through an array
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
x = np.where(arr%2 == 0)
print(x)
-> (array([1, 3, 5, 7]),)
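
where() gives back positions. If you want the matching values themselves, or want to pick between two options per element, the sketch below shows two closely related forms (the "even"/"odd" labels are just my example).

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# a boolean mask returns the matching values instead of their positions
evens = arr[arr % 2 == 0]

# with three arguments, where() takes from the 2nd where the condition
# is True and from the 3rd where it is False
labels = np.where(arr % 2 == 0, "even", "odd")

print(evens)
print(labels)
```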

unique():

If we want to find all the unique values from an array, we can go for .unique().

# searching unique elements from an array
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [1, 2, 3, 4]])
unique_values = np.unique(arr)
print(unique_values)
-> [ 1  2  3  4  5  6  7  8  9 10 11 12]

We can also get the unique rows or columns, by using the axis argument.

axis=0 for unique rows and axis=1 for unique columns.

# searching unique rows from an array
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [1, 2, 3, 4]])
unique_rows = np.unique(arr, axis=0)
print(unique_rows)
-> [[ 1  2  3  4]
    [ 5  6  7  8]
    [ 9 10 11 12]]
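
Often we also want to know how many times each unique value appears. A minimal sketch, using the return_counts argument of np.unique on a small array of my own:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 4, 4])

# values come back sorted, counts line up with them position by position
values, counts = np.unique(arr, return_counts=True)
print(values)
print(counts)
```

Here 4 appears three times and every other value appears once.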

transpose():

A 2D array can be treated as a matrix, and many times we need to transpose a matrix (swap its rows and columns). For that we can opt for the .transpose() method.

# transposing an array
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [1, 2, 3, 4]])
print(arr.transpose())
-> [[ 1  5  9  1]
    [ 2  6 10  2]
    [ 3  7 11  3]
    [ 4  8 12  4]]
print(arr.T)
-> [[ 1  5  9  1]
    [ 2  6 10  2]
    [ 3  7 11  3]
    [ 4  8 12  4]]

So to transpose a matrix, we can use either .transpose() or .T. Both work the same.
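
The example above is a square 4x4 array, so the shape does not change. Here is a sketch with a non-square array (my own example) where the swap of axes is easier to see, plus a 1D array, where there is nothing to swap.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])
print(m.shape, m.T.shape)

# a 1D array has only one axis, so transposing leaves it as it is
v = np.array([1, 2, 3])
print(v.T.shape)
```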

Basic mathematical functionalities:

With NumPy arrays we can perform basic mathematical operations like addition, subtraction, multiplication and division, and they are applied element by element.

# adding two arrays
arr1 = np.array([1, 2])
arr2 = np.ones(2, dtype=int)
arr2 = np.ones(2, dtype=int)
print(arr1 + arr2)
-> [2 3]
# subtracting two arrays
print(arr1 - arr2)
-> [0 1]
# multiplication of two arrays
print(arr1 * arr2)
-> [1 2]
# division of two arrays
print(arr1 / arr2)
-> [1. 2.]
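
The same operators also work between an array and a single number, in which case the number is applied to every element. A small sketch with names I chose for the example:

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

doubled = arr * 2      # every element times 2
shifted = arr + 10     # every element plus 10
squared = arr ** 2     # every element squared

print(doubled)
print(shifted)
print(squared)
```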

Most useful array operations:

These are some of the most used methods in NumPy.

# finding minimum
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
print(arr.min())
-> 1
# finding maximum
print(arr.max())
-> 8
# finding mean
print(arr.mean())
-> 4.5
# finding variance
print(arr.var())
-> 5.25
# finding standard deviation
print(arr.std())
-> 2.29128784747792
# sum of all the elements within an array
print(arr.sum())
-> 36
# multiplication of all the elements within an array
print(arr.prod())
-> 40320
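
On a 2D array, these methods work on all the elements by default, but they also accept an axis argument to work per column or per row. A minimal sketch on a 2x3 array of my own:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

total = arr.sum()            # all elements together
col_sums = arr.sum(axis=0)   # collapse the rows: one sum per column
row_sums = arr.sum(axis=1)   # collapse the columns: one sum per row
row_max = arr.max(axis=1)    # largest value in each row

print(total, col_sums, row_sums, row_max)
```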

Once again, NumPy is used almost everywhere in data science. So please learn and play with this library as much as you can. For reference, you can check out here.

Hope this helped. I’m going to share more of my journey, which will be totally beginner-friendly. Check out my other blogs on Medium or we can connect on LinkedIn.
