Humpty dumpty Numpy. The Basics
Hello All!
So today I’m going to talk about Numpy. What is Numpy? It sounds kinda stumpy. Just kidding. Numpy is a pretty cool Python library that I’ve used in my work, but I never really spent the time learning the basics and understanding the pros & cons. It’s great to learn before Pandas, since the two share many of the same concepts. I’ve also posted some super basic follow-along examples on my GitHub: Numpy Exercises. I’ve mainly been following everyone’s go-to book, “Python for Data Analysis” from O’Reilly. This wonderful Numpy cheatsheet is also pretty much a must-have.
What is Numpy?
Numpy (Numerical Python) is a fantastic Python library that lets you perform fast operations on arrays. It is a core library that many other libraries, such as Pandas, are built on. It is a bit more low level, but I have used it for data transformations. Fundamental to Numpy is an object called the “ndarray”, which stands for N-dimensional array, and it is pretty much what the name says: an array with any number of dimensions (an array of arrays!). This object comes with a lot of cool attributes and functions. I think of ndarrays as similar to a Python list. But if Python lists were a Toyota Corolla, ndarrays would be a Lamborghini (no shame on Corollas, I had one and loved it). One of the great things about ndarrays versus a regular Python list is that operations on them are “vectorized”. How I think of this is that instead of writing something like a for loop, the same operation gets applied to all elements in one go. I will be showing some examples of this later on in this article.
The Basics
Every array object has an attribute called .shape that describes its dimensions. For example, with a shape of (2,4), I think of this almost as rows and columns: 2 rows (two arrays) with 4 columns (values). It would look something like this: [[1,2,3,4],[5,6,7,8]]. Every array object also tells you what data type it holds through .dtype.
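Here is a tiny sketch of those attributes on the example array. The variable name arr is just for illustration, and .ndim and .size are two extra attributes I am throwing in beyond the ones mentioned above.

```python
import numpy as np

# The 2 x 4 example array from the text.
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(arr.shape)  # (2, 4): 2 rows, 4 columns
print(arr.ndim)   # 2: the number of dimensions
print(arr.size)   # 8: the total number of elements
print(arr.dtype)  # an integer type; the exact width depends on your platform
```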
Creating an Ndarray
There are two primary ways to create an ndarray object.
1. You can pass a regular Python list (or a list of lists) to the np.array() function and it will build the ndarray for you.
import numpy as np
data = [[1,2,3,4],[5,6,7,8]]
arr1 = np.array(data)
arr1
output: array([[1, 2, 3, 4],
[5, 6, 7, 8]])
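2. Other functions can create filler values for you if you specify the dimensions: zeros() fills the array with 0’s, ones() fills it with 1’s, and empty() allocates the array without initializing it, so it contains whatever arbitrary values happened to be in memory (not truly random ones).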
arr2 = np.empty((2,2))
arr2
output: array([[ 1.49166815e-154, -3.11108916e+231],
[ 1.49166815e-154, 2.82471801e-309]])
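To compare the filler functions side by side, here is a short sketch. np.full() and np.arange() are not covered in this post, I am adding them only as extra illustrations, and the variable names are made up.

```python
import numpy as np

zeros = np.zeros((2, 3))     # every element is 0.0
ones = np.ones((2, 3))       # every element is 1.0
filled = np.full((2, 3), 7)  # every element is 7
steps = np.arange(0, 10, 2)  # evenly spaced values: 0, 2, 4, 6, 8
empty = np.empty((2, 2))     # uninitialized: the contents are arbitrary

print(zeros)
print(ones)
print(filled)
print(steps)
```

Only reach for empty() when you plan to overwrite every element right away, since you cannot rely on what it contains.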
You can set your own dtype when initializing the ndarray by passing in the argument dtype=np.float64, or convert it afterwards using .astype(float), which returns a new array.
Pro tip: If you don’t declare a data type, Numpy tries to infer one, which may or may not be what you want. All elements in an array share a single type, so if you mix types Numpy will quietly convert everything to a common one (for example, ints mixed with floats all become floats).
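A quick sketch of how the type inference and conversion play out in practice. The variable names are made up for illustration.

```python
import numpy as np

ints = np.array([1, 2, 3])       # inferred as an integer dtype
mixed = np.array([1, 2.5, 3])    # one float turns everything into float64
texty = np.array([1, "two", 3])  # one string turns everything into strings

explicit = np.array([1, 2, 3], dtype=np.float64)  # dtype set up front
converted = ints.astype(float)   # returns a new array; ints is unchanged
truncated = np.array([1.9, -2.7]).astype(int)     # drops the decimals: 1 and -2

print(ints.dtype, mixed.dtype, texty.dtype)
print(explicit.dtype, converted.dtype, truncated)
```

Notice that converting floats to ints truncates instead of rounding, which is an easy one to miss.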
Vectorization & Broadcasting
As mentioned before, vectorization lets you express batch operations on data without writing for loops. The same function can be applied to all elements at once. When you combine two arrays of the same shape, the operation is applied element-wise, in order. Broadcasting extends this to arrays of different but compatible shapes, or to an array and a single number: the smaller one is stretched to match the larger one. Having one data type for all the elements in a numpy array is part of what makes these batch operations fast!
data = [[1,2,3,4],[5,6,7,8]]
arr1 = np.array(data)
arr1 * arr1
output: array([[ 1, 4, 9, 16],
[25, 36, 49, 64]])
arr1 * 2
output: array([[ 2, 4, 6, 8],
[10, 12, 14, 16]])
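To make the contrast with a for loop concrete, and to show broadcasting between arrays of different shapes, here is a small sketch. The offset arrays and variable names are made up for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Loop version: double every element by hand on a plain Python list.
doubled_loop = [[value * 2 for value in row] for row in arr.tolist()]

# Vectorized version: one expression, no explicit loop.
doubled_vec = arr * 2

# Broadcasting: a shape (4,) array is stretched across both rows.
row_offsets = np.array([10, 20, 30, 40])
shifted_rows = arr + row_offsets   # [[11, 22, 33, 44], [15, 26, 37, 48]]

# A shape (2, 1) array is stretched across the columns instead.
col_offsets = np.array([[100], [200]])
shifted_cols = arr + col_offsets   # [[101, 102, 103, 104], [205, 206, 207, 208]]

print(doubled_vec)
print(shifted_rows)
print(shifted_cols)
```

If the shapes are not compatible, for example adding a shape (3,) array to this (2, 4) array, Numpy raises a ValueError instead of guessing.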
Indexing & Slicing
You may ask now, how do I access each value in my fancy array if I wanted to?
Single Dimension Array
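For a one-dimensional array, it works like a Python list. Indexing starts from 0. The key difference is with slices: a slice of an array is a “view” onto the original data, not a copy, so changing the slice changes the original array too. Storing the slice in another variable does not copy it; call .copy() if you want an independent array.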
## to access the first value
data = [1,2,3,4]
arr1 = np.array(data)
arr1[0]
output: 1
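The view behavior is easiest to see by writing through a slice. This is a minimal sketch with made-up variable names.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

view = arr[1:3]   # a slice is a view onto the same data
view[0] = 99      # writing through the view also changes arr
print(arr)        # the second element is now 99

independent = arr[1:3].copy()  # an explicit copy owns its own data
independent[0] = -1            # arr is not affected this time
print(arr)
```

So assigning a slice to a new variable is not enough to protect the original; .copy() is what actually separates the two.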
Multiple Dimension Array
For a multi-dimensional array, you can use a comma-separated list of indices to select individual elements.
## to access the first array, third value, both methods work
data = [[1,2,3,4],[5,6,7,8]]
arr2 = np.array(data)
arr2[0][2]
output: 3
arr2[0,2]
output: 3
Slicing
Similar to Python list slicing, you can retrieve a view of a range of the array. Multi-dimensional arrays follow the same pattern as indexing: separate the dimensions with commas and use colons for the ranges.
data = [[1,2,3,4],[5,6,7,8]]
arr1 = np.array(data)
# looks at the first dimension, grabs up to (not including) index 1, so just the first array
arr1[:1]
output: array([[1, 2, 3, 4]])
# takes the second array, then grabs its first two values
arr1[1,:2]
output: array([5, 6])
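A few more index and slice patterns on the same array, as a sketch. The variable names are made up, and negative indices and step values work the same way they do for Python lists.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

first_row = arr[0]         # the whole first array: 1, 2, 3, 4
last_value = arr[-1, -1]   # negative indices count from the end: 8
second_col = arr[:, 1]     # every row, column 1: 2 and 6
middle = arr[:, 1:3]       # every row, columns 1 and 2
every_other = arr[0, ::2]  # first row, every second value: 1 and 3

print(first_row, last_value, second_col, every_other)
print(middle)
```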
And Lastly, Reshaping & Transposing
Reshaping
Phew, one last basic. Almost done! You can change an existing ndarray into a differently shaped ndarray by using the method called .reshape.
# reshaping this (2,4) ndarray, two arrays with 4 values each, into a (4,2) ndarray, 4 arrays with 2 values each.
data = [[1,2,3,4],[5,6,7,8]]
arr1 = np.array(data)
arr1.reshape((4,2))
output: array([[1, 2],
[3, 4],
[5, 6],
[7, 8]])
Pro Tip: Reshaping will throw an error if the array doesn’t fit into the new shape, that is, if the total number of elements doesn’t match.
Pro Tip 2: You can actually chain these methods into just one line:
data = [[1,2,3,4],[5,6,7,8]]
arr1 = np.array(data).reshape((4,2))
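Here is a sketch of what the reshape error looks like, plus one extra trick not mentioned above: passing -1 for one dimension lets Numpy work out that dimension for you. The variable names are made up for illustration.

```python
import numpy as np

arr = np.arange(1, 9)  # 8 elements: 1 through 8

as_4x2 = arr.reshape((4, 2))    # 4 * 2 == 8, so this fits
as_2xn = arr.reshape((2, -1))   # -1 means "figure it out", giving shape (2, 4)

try:
    arr.reshape((3, 3))         # 3 * 3 == 9 does not match 8 elements
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)

print(as_4x2.shape, as_2xn.shape)
```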
Transposing
Arrays have the .transpose() method and the special .T attribute. In a two-dimensional array, for example, I think of this as switching the “rows” into “columns”.
data = [[1,2,3,4],[5,6,7,8]]
arr1 = np.array(data).T
arr1
output: array([[1, 5],
[2, 6],
[3, 7],
[4, 8]])
This is useful for matrix computations such as the dot product, using Numpy’s built-in dot function. If you need a refresher on how the dot product works, here is a good resource: Math is fun
data = [[1,2,3,4],[5,6,7,8]]
arr1 = np.array(data)
np.dot(arr1.T, arr1)
output: array([[26, 32, 38, 44],
[32, 40, 48, 56],
[38, 48, 58, 68],
[44, 56, 68, 80]])
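The order of the two arrays matters for the shape you get back. Here is a small sketch, which also uses Python’s @ operator as an alternative spelling of matrix multiplication for 2-D arrays. The variable names are made up.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])  # shape (2, 4)

small = np.dot(arr, arr.T)  # (2, 4) times (4, 2) gives shape (2, 2)
large = arr.T @ arr         # (4, 2) times (2, 4) gives shape (4, 4)

print(small)        # [[30, 70], [70, 174]]
print(large.shape)  # (4, 4)
```

The inner dimensions have to match (4 and 4 in the first case, 2 and 2 in the second), which is exactly why the transpose is needed here.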
Conclusion
Hurrah! You have made it to the end. What next? A TON of practice and context. Sometimes I just find myself plugging and chugging because I just want to get it done. But to really master it, take the time to understand what you are doing and why you are doing it. A good example of numpy in the wild is when you import a csv file: you can basically create a giant 2-dimensional ndarray and apply fast transformations on your data. Another example is when you want to generate a labeled dataset to feed a model using packages from scikit-learn. These are just the super basics, written in as few words as possible. There are so many great resources and examples online. The cool things you can do with numpy are pretty endless. I didn’t go into as much depth on all the operations you can do, but you can always reference the numpy manual on scipy.org. Until next time!