Python Loop Replacement: Handling Conditional Logic (PyTorch & NumPy)

9 min read · Dec 21, 2023

Not your SQL Select clause — Using Where and Select to vectorize conditional logic

In a previous article, “Accelerate Numerical Calculations in NumPy With Intel oneAPI Math Kernel Library”, I addressed why NumPy constructs, powered by Intel oneAPI, can achieve outsized performance, readability, and maintainability advantages over a typical “roll your own” Python code segment.

In this article I want to demonstrate how you can vectorize a loop even though it contains tricky conditional logic.

Summarized: Here are the steps you should take.

1. Find a loop with a large iteration count

2. If it contains conditional logic, consider torch.where, np.where or np.select

3. Else try to find a NumPy or PyTorch replacement using ufuncs, aggregations, etc.

Numpy.where & PyTorch.where

Image: Where statement is for simpler logic conditions (PowerPoint by author)

One thing that can prevent us from getting vector performance when converting a loop to a vectorized approach is the presence of if-then-else statements in the original loop, called conditional logic.

The NumPy and PyTorch where functions allow us to tackle conditional loops in a fast, vectorized way.

Apply conditional logic to an array to create a new column or update the contents of an existing column.

Syntax:

numpy.where(condition, [x, y, ]/)
Return elements chosen from x or y depending on condition.

To understand what NumPy where does, look at the simple example below, which adds 50 to all elements currently greater than 5:

import numpy as np

a = np.arange(10)
np.where(a > 5, a + 50, a )
# if a > 5 then return a + 50
# else return a

output:
array([ 0,  1,  2,  3,  4,  5, 56, 57, 58, 59])
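For comparison, the same result can be written as an explicit Python loop. The sketch below is illustrative only (the names `looped` and `vectorized` are not from the original article) and simply checks that both forms agree:

```python
import numpy as np

a = np.arange(10)

# Explicit loop: one Python-level branch per element
looped = np.empty_like(a)
for i in range(len(a)):
    if a[i] > 5:
        looped[i] = a[i] + 50
    else:
        looped[i] = a[i]

# Vectorized: condition, both branches, and the selection are array operations
vectorized = np.where(a > 5, a + 50, a)

print(np.array_equal(looped, vectorized))  # True
```

Note that np.where computes both `a + 50` and `a` for every element and then picks between them, so both branches need to be safe to evaluate on the whole array.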

This could come in handy for many AI applications, but let’s choose labeling data.

There may be better ways to binarize data, but here is a simple example of converting continuous data into categorical values:

arr = np.array([11, 1.2, 12, 13, 14, 7.3, 5.4, 12.5])

Let’s say all values 10 and above represent a medical parameter threshold that indicates further testing, while values below 10 indicate the normal range.

We might like to print the values as words such as [‘More Testing’, ‘Normal’, ‘More Testing’, ‘More Testing’, …]

arr = np.array([11, 1.2, 12, 13, 14, 7.3, 5.4, 12.5])
np.where(arr < 10, 'Normal', 'More Testing')

output:
array(['More Testing', 'Normal', 'More Testing', 'More Testing',
       'More Testing', 'Normal', 'Normal', 'More Testing'], dtype='<U12')

Or we could binarize data for use in a classifier:


# Simple NumPy binarizer / discretizer
# convert continuous data to discrete integer bins
arr = np.array([11, 1.2, 12, 13, 14, 7.3, 5.4, 12.5])
print(np.where(arr < 6, 0, np.where(arr < 12, 1, 2)))

output:
[1 0 2 2 2 1 0 2]
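As one possible alternative to nesting where calls, np.digitize assigns each value to a bin given a list of bin edges. This is a sketch of that swapped-in technique, not the article's method; the names `nested` and `binned` are illustrative:

```python
import numpy as np

arr = np.array([11, 1.2, 12, 13, 14, 7.3, 5.4, 12.5])

# Nested where, as in the example above
nested = np.where(arr < 6, 0, np.where(arr < 12, 1, 2))

# np.digitize returns the index of the bin each value falls in:
# bin 0 is x < 6, bin 1 is 6 <= x < 12, bin 2 is x >= 12
binned = np.digitize(arr, bins=[6, 12])

print(binned)                          # [1 0 2 2 2 1 0 2]
print(np.array_equal(nested, binned))  # True
```

With many thresholds, a list of bin edges tends to be easier to read than deeply nested where calls.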

NumPy where can be used to create index masks so that you can select or update masked items from an array.

In this example, I want to find all (rows, cols) whose value is a multiple of 12 or a multiple of 9 in a 10x10 multiplication table and set all other values to 0. Preserve the first row and first column as readable indexes for the table as follows:

## one solution - preserves the indexing edges for easy checking
# Assumed setup (not shown in the original): a 10x10 multiplication table
MultiplicationTable = np.outer(np.arange(1, 11), np.arange(1, 11))
res = np.where((MultiplicationTable%12 == 0) | (MultiplicationTable%9 == 0), MultiplicationTable, 0)
res[0,:] = MultiplicationTable[0,:]
res[:,0] = MultiplicationTable[:,0]
res

output:
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [ 2,  0,  0,  0,  0, 12,  0,  0, 18,  0],
       [ 3,  0,  9, 12,  0, 18,  0, 24, 27,  0],
       [ 4,  0, 12,  0,  0, 24,  0,  0, 36,  0],
       [ 5,  0,  0,  0,  0,  0,  0,  0, 45,  0],
       [ 6, 12, 18, 24,  0, 36,  0, 48, 54, 60],
       [ 7,  0,  0,  0,  0,  0,  0,  0, 63,  0],
       [ 8,  0, 24,  0,  0, 48,  0,  0, 72,  0],
       [ 9, 18, 27, 36, 45, 54, 63, 72, 81, 90],
       [10,  0,  0,  0,  0, 60,  0,  0, 90,  0]])
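When np.where is called with only a condition, it returns the indices where the condition holds, one index array per dimension. A minimal sketch, assuming the table is built with np.outer (the names `table`, `rows`, and `cols` are illustrative):

```python
import numpy as np

# Assumed setup: a 10x10 multiplication table
n = np.arange(1, 11)
table = np.outer(n, n)

mask = (table % 12 == 0) | (table % 9 == 0)

# With a single argument, np.where returns (row_indices, col_indices)
rows, cols = np.where(mask)
print(list(zip(rows[:3].tolist(), cols[:3].tolist())))  # [(0, 8), (1, 5), (1, 8)]

# Indexing with the index arrays or with the boolean mask selects the same items
print(np.array_equal(table[rows, cols], table[mask]))   # True
```

The indices are zero-based, so (0, 8) is the entry 1 x 9 = 9 in the first row.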

Numpy Where applied to California Housing data

In an AI context, this could mean applying a categorical classifier to otherwise continuous values. For example, in the California housing dataset the target price variable is continuous. In the following fictitious scenario, a new stimulus package is being considered whereby new house buyers will be given a coupon worth 50,000 off the purchase of houses in California whose price (prior to the coupon) is between 250,000 and 350,000. Other prices will be unaffected. Generate an array with the adjusted targets.

Let’s compare the naive Python loop with the NumPy where approach, examining readability, maintainability, speed, etc.

# Fictitious scenario:
import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

california_housing = fetch_california_housing(as_frame=True)
X = california_housing.data.to_numpy()
# the dataset's target is expressed in units of $100,000
buyerPriceRangeLo = 250_000/100_000
buyerPriceRangeHi = 350_000/100_000
T = california_housing.target.to_numpy() 
t1 = time.time()
timing = {}
New = np.empty_like(T)
for i in range(len(T)):
    if ( (T[i] < buyerPriceRangeHi) & (T[i] >= buyerPriceRangeLo) ):
        New[i] = T[i] - 50_000/100_000
    else:
        New[i] = T[i]
t2 = time.time()
plt.title( "California Housing Dataset - conditional Logic Applied")
plt.scatter(T, New, color = 'b')
plt.grid()
print("time elapsed: ", t2-t1)
timing['Loop'] = t2-t1

Chart 1: Naive loop implementation of house price coupon

Next, examine the NumPy Where approach:

t1 = time.time()
#############################################################################
### Exercise: modify the code below to compute the same results as the loop above
New = np.where((T < buyerPriceRangeHi) & (T >= buyerPriceRangeLo), T - 50_000/100_000, T )
##############################################################################
t2 = time.time()

plt.scatter(T, New, color = 'r')
plt.grid()
print("time elapsed: ", t2-t1)
timing['np.where'] = t2-t1
print("Speedup: {:4.1f}X".format( timing['Loop']/timing['np.where']))

Chart 2: NumPy where implementation of house price coupon

The code is more readable, more maintainable, and 13X faster.

As you can see, we generated the same data with NumPy where as we did with the original loop, but we did so 13X faster (the exact speedup may vary).
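To check that the loop and the where version really agree without downloading the dataset, here is a small sketch on synthetic prices. The data, the seed, and the names `T_demo`, `lo`, `hi`, and `coupon` are assumptions made for illustration:

```python
import numpy as np

# Synthetic stand-in for the housing targets, in units of $100,000 (an assumption)
rng = np.random.default_rng(0)
T_demo = rng.uniform(0.5, 5.0, size=1_000)

lo, hi, coupon = 2.5, 3.5, 0.5

# Loop version of the coupon rule
looped = np.empty_like(T_demo)
for i in range(len(T_demo)):
    if lo <= T_demo[i] < hi:
        looped[i] = T_demo[i] - coupon
    else:
        looped[i] = T_demo[i]

# Vectorized version of the same rule
vectorized = np.where((T_demo >= lo) & (T_demo < hi), T_demo - coupon, T_demo)

print(np.array_equal(looped, vectorized))  # True
```

For timing very short operations, time.perf_counter or the timeit module usually gives steadier measurements than time.time.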

NumPy Select statement:

Image: Select statement is for more complex logic conditions (PowerPoint by author)

The select statement is available in NumPy, but not yet in PyTorch, although I will provide a code snippet which can emulate the select clause using PyTorch. The select statement handles much more complex logic conditions than the where statement.

Apply conditional logic to an array to create a new column or update the contents of an existing column. This method handles more complex conditional scenarios than NumPy where.

Syntax:

numpy.select(condlist, choicelist, default=0)
Return an array drawn from elements in choicelist, depending on conditions.

This is a very useful function for handling conditionals that would otherwise slow down a map or apply, or add complexity when reading the code.
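A tiny sketch of how select behaves, using made-up data: conditions are checked in order, the first one that is True picks the matching choice, and the default fills every position where no condition holds.

```python
import numpy as np

x = np.arange(8)

conditions = [x < 3, x < 6]    # checked in order; the first True wins
choices = [x * 10, x * 100]    # one choice per condition
result = np.select(conditions, choices, default=-1)

print(result)  # [  0  10  20 300 400 500  -1  -1]
```

Values 0 to 2 satisfy both conditions, but only the first choice is applied to them.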

First we will create some new data

import numpy as np
import time

BIG = 10_000_000

np.random.seed(2022)
A = np.random.randint(0, 11, size=(BIG, 6))

Now I will apply some crazy logic to update various columns of the array

timing = {}
t1 = time.time()
for i in range(BIG):
    if A[i,4] == 10:
        A[i,5] =  A[i,2] * A[i,3]
    elif (A[i,4] < 10) and (A[i,4] >=5):
        A[i,5] =   A[i,2] + A[i,3]
    elif A[i,4] < 5:
        A[i,5] =   A[i,0] + A[i,1]
t2 = time.time()
baseTime = t2- t1
print(A[:5,:])
print("time: ", baseTime)
timing['Naive Loop'] = t2 - t1

output:
[[ 0  1  1  0  7  1]
 [ 2  8  0  5  9  5]
 [ 3  8  0  3  6  3]
 [ 0 10 10  1  2 10]
 [ 5  7  5  1  7  6]]
time:  5.937685012817383

Try Vectorizing with masks

Just remove the references to i and remove the loop, then create a boolean mask for each condition. Each mask must hold one True/False value per row, so combine conditions with the elementwise & operator rather than with and or .any():

# Try Vectorizing simply
t1 = time.time()
mask1 = A[:,4] == 10
A[mask1,5] =  A[mask1,2] * A[mask1,3]
mask2 = (A[:,4] < 10) & (A[:,4] >= 5)
A[mask2,5] =   A[mask2,2] + A[mask2,3]
mask3 = A[:,4] < 5
A[mask3,5] =   A[mask3,0] + A[mask3,1]
t2 = time.time()
print(A[:5,:])
print("time :", t2-t1)

fastest_time = t2-t1
Speedup = baseTime / fastest_time
print("Speed up: {:4.0f} X".format(Speedup))
timing['Vector Masks'] = t2 - t1

output:
[[ 0  1  1  0  7  1]
 [ 2  8  0  5  9  5]
 [ 3  8  0  3  6  3]
 [ 0 10 10  1  2 10]
 [ 5  7  5  1  7  6]]
time : 0.23482632637023926
Speed up:   25 X
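A quick way to gain confidence in a masked rewrite is to compare it against the loop on a small sample. The sketch below does that; the sample size, seed, and the names `A_demo`, `expected`, and `masked` are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2022)
A_demo = rng.integers(0, 11, size=(1_000, 6))  # values 0..10, like the article's A

# Reference result: the loop logic applied to a copy
expected = A_demo.copy()
for i in range(len(expected)):
    if expected[i, 4] == 10:
        expected[i, 5] = expected[i, 2] * expected[i, 3]
    elif 5 <= expected[i, 4] < 10:
        expected[i, 5] = expected[i, 2] + expected[i, 3]
    elif expected[i, 4] < 5:
        expected[i, 5] = expected[i, 0] + expected[i, 1]

# Masked version: each mask is a boolean array with one entry per row
masked = A_demo.copy()
col4 = masked[:, 4]
mask1 = col4 == 10
mask2 = (col4 >= 5) & (col4 < 10)   # elementwise &, not `and`
mask3 = col4 < 5
masked[mask1, 5] = masked[mask1, 2] * masked[mask1, 3]
masked[mask2, 5] = masked[mask2, 2] + masked[mask2, 3]
masked[mask3, 5] = masked[mask3, 0] + masked[mask3, 1]

print(np.array_equal(expected, masked))  # True
```

Using `and` between two arrays with more than one element raises a ValueError, and `.any()` collapses a whole column to a single True/False, so neither produces a per-row mask.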

Next, try NumPy.select — Much cleaner logic

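Put the conditions in a list, put the matching choices in a second list of the same length, and choose a default for rows where no condition holds

t1 = time.time()
condition = [(A[:,4] < 10) & (A[:,4] >= 5),
             (A[:,4] < 5)]
choice = [(A[:,2] + A[:,3]),
          (A[:,0] + A[:,1])]
default = A[:,2] * A[:,3]   # remaining case: A[:,4] == 10
A[:,5] = np.select(condition, choice, default=default)
t2 = time.time()
print(A[:5,:])
print("time :", t2-t1)
print("Speed up: {:4.0f} X".format(baseTime / (t2-t1)))
timing['np.select'] = t2 - t1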

output:
[[ 0  1  1  0  7  1]
 [ 2  8  0  5  9  5]
 [ 3  8  0  3  6  3]
 [ 0 10 10  1  2 10]
 [ 5  7  5  1  7  6]]
time : 0.4723508358001709
Speed up:   13 X

import matplotlib.pyplot as plt

plt.figure(figsize=(10,6))
plt.title("Time taken to process {:,} records in seconds".format(BIG),fontsize=12)
plt.ylabel("Time in seconds",fontsize=12)
plt.xlabel("Various types of operations",fontsize=14)
plt.grid(True)
plt.xticks(rotation=-60)
plt.bar(x = list(timing.keys()), height= list(timing.values()), align='center',tick_label=list(timing.keys()))

Plot: generated by author on Xeon system specified and code provided 08_05_NumPy_Where_Select.ipynb

Note: 13X speedup over the naive Python loop when using NumPy select in this simple example.
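Where select is not available, its behavior can be approximated with a chain of where calls applied from the last condition to the first, so that earlier conditions take priority. The sketch below shows the pattern in NumPy; `select_via_where` is a hypothetical helper, not part of any library. torch.where takes the same (condition, x, y) arguments, so the pattern may carry over to PyTorch tensors, but that is an assumption and is not tested here.

```python
import numpy as np

def select_via_where(condlist, choicelist, default):
    """Hypothetical helper: emulate np.select with chained np.where calls."""
    result = default
    # Apply conditions last-to-first so earlier conditions take priority
    for cond, choice in zip(reversed(condlist), reversed(choicelist)):
        result = np.where(cond, choice, result)
    return result

rng = np.random.default_rng(0)
B = rng.integers(0, 11, size=(1_000, 6))   # small made-up sample

conds = [(B[:, 4] < 10) & (B[:, 4] >= 5), B[:, 4] < 5]
choices = [B[:, 2] + B[:, 3], B[:, 0] + B[:, 1]]
default = B[:, 2] * B[:, 3]

emulated = select_via_where(conds, choices, default)
reference = np.select(conds, choices, default=default)
print(np.array_equal(emulated, reference))  # True
```

Each where call allocates a new array, so a long chain costs more memory traffic than a single select call.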

In the next article, I will use NumPy select to drastically speed up a Pandas Apply statement that is plagued with lots of conditional logic.

Play with these concepts on the Intel Developer Cloud:

Take the opportunity to play with replacing loop bound aggregations in your own code with NumPy aggregation functions instead.

For a sandbox to play in — register for a free account on the Intel Developer Cloud (cloud.intel.com), sign in and play by clicking on the icon in the lower left:

Then Launch JupyterLab on the shared access node in the icon on the right — see below:

Image: Screenshot from cloud.intel.com

Code

The code for this article and the rest of the series is located on GitHub. For this article, experiment with the file: 08_05_NumPy_Where_Select.ipynb

Article 1:

Accelerate Numerical Calculations in NumPy With Intel oneAPI Math Kernel Library. Explore the reasons why replacing inefficient Python loops with NumPy or PyTorch constructs is a great idea.

Article 2:

Python Loop Replacement: NumPy Optimizations Simple Stuff — ND array creation using NumPy, PyTorch, DPCTL. Explore simple ways of creating, converting and transforming Lists into NumPy NDarrays, a very basic getting started.

Article 3:

Introduction to NumPy* Universal functions (ufuncs). How I learned to stop worrying and let smart developers help me.

Article 4:

Replacing Python loops: Aggregations and Reductions. How to replace slow Python loops by strategic function equivalents for aggregating data.

Article 5:

Replacing Python loops: Fancy Slicing and Broadcasting. Here I address fancy slicing and broadcasting to take advantage of key optimizations for loop replacement.

Article 6: (current article)

Python Loop Replacement: PyTorch & NumPy Optimizations. Not your SQL Select clause — Using Where and Select to vectorize conditional logic.

Article 7:

Loop Replacement Strategies: Applications to Pandas Apply. Examine how to accelerate Pandas Apply statement containing conditional logic.

Article 8:

NumPy Functions Composed. Compare Fast Inverse Square Root Method to NumPy ufuncs, Numba JIT, and Cython — Which One Wins?

Intel Developer Cloud System Configuration as tested:

x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 224
On-line CPU(s) list: 0–223
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8480+
CPU family: 6
Model: 143
Thread(s) per core: 2
Core(s) per socket: 56
Socket(s): 2
Stepping: 8
CPU max MHz: 3800.0000
CPU min MHz: 800.0000