Loss Optimization in Scientific Python
Training on data to find optimal parameters is a regular occurrence in machine learning. When you work with standard loss functions like RMSE, it is easy to find off-the-shelf packages to train your model. However, when implementing new algorithms, you may have to optimize your own loss functions yourself.
Scientific Python provides a number of optimization routines in the scipy.optimize package. In this article, we will use two of those routines. We will make use of the Boston Housing Prediction data, available from Kaggle. Let’s start by importing the needed packages.
import pandas as pd
import numpy as np
from scipy.optimize import fmin, minimize
Let’s fetch, load, and extract our numpy arrays.
train_df = pd.read_csv('./train.csv', index_col='ID')
y = train_df['medv'].values
y = y.reshape(-1, 1)
train_df['constant'] = 1
columns = ['constant', 'crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax', 'ptratio', 'black', 'lstat']
x = train_df[columns].values
We will be implementing simple linear regression, y = w·x, so let’s instantiate an array w with an initial value of 0. We derive its shape from the data rather than hard-coding the dimensions.
w = np.zeros([x.shape[1], 1])
Next, let’s create our linear regression function.
def pred(x, w):
    return np.dot(x, w)
We can then make predictions by calling this function.
y_pred = pred(x, w)
To train our model and optimize w, we need a loss function. Let’s define that next.
def loss(_w):
    # The optimizers pass in a flat 1-D array, so restore the column shape
    # before multiplying; otherwise y - p broadcasts to an (n, n) matrix.
    p = pred(x, _w.reshape(-1, 1))
    e = y - p
    se = np.power(e, 2)   # squared errors
    mse = np.mean(se)     # mean squared error
    return np.sqrt(mse)   # root mean squared error
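As a quick sanity check, the loss can be verified on tiny synthetic data with a known answer. Everything below (the toy arrays and the standalone loss function) is illustrative and stands in for the Kaggle data used above:

```python
import numpy as np

# Hypothetical toy data: a constant column plus one feature,
# generated so that exactly y = 1 + 2*x.
x_toy = np.array([[1.0, 2.0],
                  [1.0, 4.0],
                  [1.0, 6.0]])
y_toy = np.array([[5.0], [9.0], [13.0]])

def loss_toy(_w):
    # Same RMSE formula as in the article, on the toy arrays.
    e = y_toy - np.dot(x_toy, _w.reshape(-1, 1))
    return np.sqrt(np.mean(np.power(e, 2)))

print(loss_toy(np.zeros(2)))           # RMSE of the all-zeros model (~9.57)
print(loss_toy(np.array([1.0, 2.0])))  # 0.0 at the true weights
```

At the true weights the residuals vanish, so the loss is exactly zero; any optimizer we hand this function to should drive it toward that point.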
fmin
The first optimization routine we will use is fmin. It uses the Nelder-Mead simplex algorithm to find the minimum of our function. It does not require the function to be differentiable; it does not, however, scale well to high-dimensional problems. We also specify the maximum number of iterations: without it, the default may not be enough for the algorithm to converge on this data.
w_min = fmin(loss, w, maxiter=1000)
The call returns the optimized values of w. We can then compare the new predictions against our initial predictions (when w was all zeros) and against the ground truth.
y_min = pred(x, w_min.reshape(-1, 1))
out = pd.DataFrame({'y': y[:, 0], 'y_pred': y_pred[:, 0], 'y_min': y_min[:, 0]})
out.head(n=15)
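If the Kaggle CSV is not at hand, the same fmin workflow can be tried end to end on synthetic data with a known answer. This is a self-contained sketch; the generated data and all names in it are illustrative:

```python
import numpy as np
from scipy.optimize import fmin

# Synthetic stand-in for the housing data: y = 3 + 2*x1, no noise.
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, size=(200, 1))
x_syn = np.hstack([np.ones((200, 1)), x1])  # constant column, as above
y_syn = 3.0 + 2.0 * x1

def loss_syn(_w):
    # RMSE, reshaping the flat array fmin passes in.
    e = y_syn - np.dot(x_syn, _w.reshape(-1, 1))
    return np.sqrt(np.mean(np.power(e, 2)))

w_opt = fmin(loss_syn, np.zeros(2), maxiter=1000, disp=False)
print(w_opt)  # close to [3., 2.]
```

Because the synthetic data is noiseless, the optimizer should recover the generating weights almost exactly, which makes it easy to confirm the pipeline is wired up correctly before pointing it at real data.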
minimize
When dealing with high-dimensional data, or with loss functions whose first- or second-order derivatives are available, we can make use of minimize(). It is a very flexible function, so it is best to approach it carefully and gradually.
Let’s start by implementing the same solution as fmin().
nms = minimize(loss, w, method='nelder-mead')
The first thing you might notice is the new method parameter: to replicate fmin(), we specify ‘nelder-mead’. Secondly, minimize() returns a richer result object; here we are interested in its x attribute, which holds the optimized parameters. Let’s proceed to make a prediction.
out_2 = pd.DataFrame({'y': y[:,0], 'y_pred': y_pred[:,0], 'y_min': pred(x, nms.x)})
out_2.head()
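To see what else the result object carries, here is a minimal, self-contained illustration on a tiny quadratic (the target point [1, -2] is arbitrary, chosen only so the answer is known):

```python
import numpy as np
from scipy.optimize import minimize

# Minimize the squared distance to a known point with Nelder-Mead.
res = minimize(lambda w: np.sum((w - np.array([1.0, -2.0])) ** 2),
               np.zeros(2), method='nelder-mead')

print(res.x)        # optimized parameters, close to [1., -2.]
print(res.fun)      # loss value at res.x
print(res.success)  # whether the optimizer reports convergence
```

Inspecting res.fun and res.success after a run is a cheap way to catch an optimizer that stopped early before trusting res.x.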
It is possible to define another function, which is the derivative of our loss, and then use that in the call to minimize(). We will leave that for another post.
Until next time.