Subtracting NumPy arrays of different shapes efficiently
There is a way to subtract a shape (n,3) array w from X so that each row of w is subtracted from the whole array without explicitly using a loop. The idea is to simply extend the dimensionality.
The last axis would be the rows, and the result has a shape such as (5, 2, 3).
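A minimal sketch of the dimension-extension idea, assuming X has shape (5, 3) and w has shape (2, 3), so n = 2. These shapes and values are illustrative assumptions chosen to match the (5, 2, 3) result mentioned above.

```python
import numpy as np

# Assumed example data: X has shape (5, 3), w has shape (2, 3), i.e. n = 2
X = np.arange(15, dtype=float).reshape(5, 3)
w = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 30.0]])

# Insert a new middle axis so X becomes (5, 1, 3); broadcasting against
# w's shape (2, 3) then yields a (5, 2, 3) result with no Python loop.
diff = X[:, np.newaxis, :] - w
print(diff.shape)  # (5, 2, 3)

# Loop-based reference, only to check the broadcast result
loop = np.array([[x - row for row in w] for x in X])
print(np.array_equal(diff, loop))  # True
```

Here diff[i, j] holds X[i] - w[j], so the last axis still runs over the three columns of each row.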
Let's say you have a large dataset with numerical values and you want to remove all 0 values. Look at another example: array([[0.,0.,0.]])
The code below actually achieves that.
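One possible way to do this is with a boolean mask. The arrays below are made-up sample data, and dropping rows that are entirely zero is one assumed reading of "remove all 0 values" for a 2-D array.

```python
import numpy as np

# 1-D case: keep only the entries that are not zero (sample data)
values = np.array([3.0, 0.0, 1.5, 0.0, 7.0])
nonzero_values = values[values != 0]
print(nonzero_values)  # [3.  1.5 7. ]

# 2-D case: drop rows in which every entry is zero, such as [0., 0., 0.]
table = np.array([[1.0, 2.0, 3.0],
                  [0.0, 0.0, 0.0],
                  [4.0, 0.0, 6.0]])
all_zero_row = np.all(table == 0, axis=1)
nonzero_rows = table[~all_zero_row]
print(nonzero_rows.shape)  # (2, 3)
```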
The cool thing about NumPy is reshaping
You can find the minimum value within each column by specifying axis=0. With a three-column array, you will get three values as your result, one per column.
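A short illustration of the axis argument, using a small made-up array with three columns. The result of a column-wise minimum has one entry per column.

```python
import numpy as np

# Sample 3x3 array (illustrative values)
a = np.array([[4, 9, 2],
              [7, 1, 8],
              [3, 6, 5]])

# axis=0 collapses the rows, leaving one minimum per column
col_min = a.min(axis=0)
print(col_min)  # [3 1 2]

# axis=1 collapses the columns, leaving one minimum per row
row_min = a.min(axis=1)
print(row_min)  # [2 1 3]
```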
The syntax of comparison operators in NumPy is similar to that of R, VBA, DAX, etc.
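As a small sketch of how these operators behave on arrays: each comparison is applied element by element and returns a boolean array, which can then be used as a mask. The values are made up for illustration.

```python
import numpy as np

a = np.array([1, 5, 10, 15])

# Element-wise comparison returns a boolean array of the same shape
mask = a > 5
print(mask)  # [False False  True  True]

# Combine conditions with & and |, wrapping each comparison in parentheses
in_range = (a >= 5) & (a < 15)
print(in_range)  # [False  True  True False]

# A boolean array can select the matching elements
selected = a[mask]
print(selected)  # [10 15]
```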
Recall from reshaping that indexing can be used to transpose arrays.
Business Case
Imagine you work for a financial institution and have been asked to find unique customers using Python. I am excited you like your new role.
Your first attempt looks like the code below.
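The original code is not reproduced here, so the following is only a sketch of what such a first attempt might look like. The customer_ids array is hypothetical sample data, and np.unique is one reasonable choice for this task, not necessarily the one the author used.

```python
import numpy as np

# Hypothetical customer IDs, with repeats for returning customers
customer_ids = np.array([1003, 1001, 1003, 1002, 1001, 1003])

# np.unique returns the sorted distinct values; return_counts=True
# also reports how many times each one appears
unique_ids, counts = np.unique(customer_ids, return_counts=True)
print(unique_ids)  # [1001 1002 1003]
print(counts)      # [2 1 3]
```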
In practice, you have done two things. The first is that your work is reproducible. This is good. It means your work can scale across multiple systems.
NumPy arrays are faster and more compact than Python lists. An array consumes less memory to store the same data and is convenient to use. NumPy also provides a mechanism for specifying the data types, which allows the code to be optimized even further.
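A rough way to see the memory difference, assuming CPython on a 64-bit platform. The list size estimate adds the list object itself to the size of each integer it holds, so its exact figure varies by Python version and is not printed as a fixed number here.

```python
import sys
import numpy as np

n = 1000

# The same 1000 small values stored with two different dtypes
as_int64 = np.ones(n, dtype=np.int64)
as_int8 = np.ones(n, dtype=np.int8)
print(as_int64.nbytes)  # 8000 (8 bytes per element)
print(as_int8.nbytes)   # 1000 (1 byte per element)

# Approximate footprint of an equivalent Python list of ints
py_list = list(range(n))
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)
print(list_bytes > as_int64.nbytes)  # True
```

Choosing the smallest dtype that safely holds the data is one way the data-type mechanism reduces memory use.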
NumPy in the Python ecosystem
Love Coding, Keep coding.
Live Coding
import pandas as pd
import numpy as np
import io
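# Build 20 daily dates (2016-1-1 to 2016-1-20) and matching integer values
dates = ['2016-1-{}'.format(i) for i in range(1, 21)]
values = list(range(20))
data = {'Date': dates, 'Value': values}
df = pd.DataFrame(data)
# Parse the date strings into datetime64 values
df['Date'] = pd.to_datetime(df['Date'])
print(df['Value'].values)
# [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
# Time series of the values, indexed by the parsed dates
ts = pd.Series(df['Value'].values, index=df['Date'])
# Alternative: build a series directly from a date range starting 2016-01-02
pd.Series(list(range(20)), pd.date_range('2016-01-02', periods=20, freq='D'))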
References