How To Use “Model Stacking” To Improve Machine Learning Predictions

Trevor Pedersen
Published in Geek Culture · 5 min read · Jun 14, 2021


What is Model Stacking?

Model stacking is a way to improve model predictions by combining the outputs of multiple models and running them through another machine learning model called a meta-learner. It is a popular strategy used to win Kaggle competitions, but despite its usefulness it's rarely talked about in data science articles, which I hope to change.

Essentially, a stacked model works by running the outputs of multiple base models through a "meta-learner" (usually a linear regressor/classifier, but it can be another model such as a decision tree). The meta-learner attempts to minimize the weaknesses and maximize the strengths of each individual model. The result is usually a very robust model that generalizes well on unseen data.
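
To make the mechanics concrete, here's a minimal hand-rolled sketch of the idea, using out-of-fold predictions the same way sklearn's StackingRegressor does internally (the toy dataset and base models here are just for illustration):

# Conceptual sketch: a meta-learner trained on out-of-fold base-model predictions
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(random_state=0)
base_models = [DecisionTreeRegressor(random_state=0), Ridge()]
# Out-of-fold predictions prevent the meta-learner from overfitting
# to base models that have already seen the training targets
meta_features = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in base_models])
meta_learner = LinearRegression().fit(meta_features, y)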

The architecture of a stacked model is illustrated below: each level-0 base model makes its own predictions, and those predictions become the input features for the level-1 meta-learner.
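
            input features
           /       |       \
     model_1    model_2    model_3      (level-0 base models)
           \       |       /
            base predictions
                   |
             meta-learner               (level-1 model)
                   |
            final prediction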

How do you build a stacked model?

Building a stacked model is most easily accomplished with sklearn's StackingRegressor/StackingClassifier. Below I'll import all the necessary libraries, create a neural network architecture, and then show you how to create the stacked model.

# First import the necessary libraries
import pandas as pd
from sklearn.ensemble import StackingRegressor
# Gradient-boosted tree models
from catboost import CatBoostRegressor
from xgboost import XGBRegressor
# Neural networks
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.layers import BatchNormalization, concatenate
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l1, l2
# Wrapper to make the neural network compatible with StackingRegressor
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor
# Linear model as the meta-learner
from sklearn.linear_model import LinearRegression
# Generic dataset for regression
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Create a regression dataset
X, y = make_regression(n_targets=1, random_state=42)
# Convert to pandas
X = pd.DataFrame(X)
y = pd.DataFrame(y)
# Rename the target column
y = y.rename(columns={0: 'target'})
# Split off a validation set
X_train, X_val, y_train, y_val = train_test_split(X, y,
                                                  test_size=0.2,
                                                  random_state=42)

Let's take a peek at our training data and target variable:
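
X_train.head()   # 80 rows x 100 columns of generated features
y_train.head()   # a single continuous 'target' column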

Okay, now that we've got our training dataset defined, we can work on building the actual model. For the ensemble we are just going to create a CatBoostRegressor, an XGBRegressor, a LinearRegression, and a few neural networks.

First, some code to build a neural network; we'll use it in the stacking model in a moment. (Understanding the neural network isn't necessary for this tutorial, but it involves creating skip connections and batch normalization, which can help improve performance on the right data.)

def create_neural_network(input_shape=510, depth=10, batch_mod=2,
                          num_neurons=250, drop_rate=0.1, learn_rate=.001,
                          r1_weight=0.02,
                          r2_weight=0.02):
    '''A neural network architecture built using the Keras functional API'''
    act_reg = l2(r2_weight)
    kern_reg = l1(r1_weight)

    inputs = Input(shape=(input_shape,))
    batch1 = BatchNormalization()(inputs)
    hidden1 = Dense(num_neurons, activation='relu', kernel_regularizer=kern_reg, activity_regularizer=act_reg)(batch1)
    dropout1 = Dropout(drop_rate)(hidden1)
    hidden2 = Dense(int(num_neurons/2), activation='relu', kernel_regularizer=kern_reg, activity_regularizer=act_reg)(dropout1)

    skip_list = [batch1]
    last_layer_in_loop = hidden2

    for i in range(depth):
        # Skip connection: concatenate all earlier layers with the latest one
        added_layer = concatenate(skip_list + [last_layer_in_loop])
        skip_list.append(added_layer)
        # Apply batch normalization only on every batch_mod-th layer
        if i % batch_mod == 0:
            b1 = BatchNormalization()(added_layer)
        else:
            b1 = added_layer

        h1 = Dense(num_neurons, activation='relu', kernel_regularizer=kern_reg, activity_regularizer=act_reg)(b1)
        d1 = Dropout(drop_rate)(h1)
        h2 = Dense(int(num_neurons/2), activation='relu', kernel_regularizer=kern_reg, activity_regularizer=act_reg)(d1)
        d2 = Dropout(drop_rate)(h2)
        h3 = Dense(int(num_neurons/2), activation='relu', kernel_regularizer=kern_reg, activity_regularizer=act_reg)(d2)
        d3 = Dropout(drop_rate)(h3)
        h4 = Dense(int(num_neurons/2), activation='relu', kernel_regularizer=kern_reg, activity_regularizer=act_reg)(d3)
        last_layer_in_loop = h4

    c1 = concatenate(skip_list + [last_layer_in_loop])
    # Linear output for regression (a sigmoid would squash predictions into [0, 1])
    output = Dense(1, activation='linear')(c1)

    model = Model(inputs=inputs, outputs=output)
    optimizer = Adam(learning_rate=learn_rate)

    model.compile(optimizer=optimizer,
                  loss='mse',
                  metrics=['mae'])
    return model
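
If you want to sanity-check the architecture on its own before stacking it (100 matches the feature count of the dataset we generated above):

nn = create_neural_network(input_shape=100)
nn.summary()  # prints the layer graph, including the concatenated skip connections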

Now some code to build the stacking model:

def get_stacking(input_shape=None):
    '''A stacking model that consists of CatBoostRegressor,
    XGBRegressor, a linear model, and some neural networks'''
    # First we create a list called "level0", which consists of our base models
    # These models will get passed down to the meta-learner later
    level0 = list()
    level0.append(('cat', CatBoostRegressor(verbose=False)))
    level0.append(('cat2', CatBoostRegressor(verbose=False, learning_rate=.0001)))
    level0.append(('xgb', XGBRegressor()))
    level0.append(('xgb2', XGBRegressor(max_depth=5, learning_rate=.0001)))
    level0.append(('linear', LinearRegression()))
    # Create 5 neural networks using our function above
    for i in range(5):
        # Wrap each neural network in a KerasRegressor to make it
        # compatible with StackingRegressor
        keras_reg = KerasRegressor(
            create_neural_network,    # Pass in the model-building function
            input_shape=input_shape,  # Forwarded to the function above
            epochs=6,
            batch_size=32,
            verbose=False)
        keras_reg._estimator_type = "regressor"
        # Append to our list
        level0.append(('nn_{num}'.format(num=i), keras_reg))
    # The "meta-learner" designated as the level1 model
    # In my experience linear regression performs best,
    # but feel free to experiment with other models
    level1 = LinearRegression()
    # Create the stacking ensemble
    model = StackingRegressor(estimators=level0, final_estimator=level1, cv=2, verbose=1)
    return model

And now we can put it all together:

# Get the input dimensions for the neural networks
input_dimensions = len(X_train.columns)
# Create the stacking model
model = get_stacking(input_dimensions)
model.fit(X_train, y_train.values.ravel())

# Create a temporary dataframe so we can see how each of our models performed
temp = pd.DataFrame(y_val)
# The stacked model's predictions, which should perform the best
temp['stacking_prediction'] = model.predict(X_val)
# Get each model in the stacked model to see how they individually perform
for m in model.named_estimators_:
    temp[m] = model.named_estimators_[m].predict(X_val)
# See how each of our models correlates with our target
print(temp.corr()['target'])
# See what our meta-learner is thinking (the linear regression)
for coef in zip(model.named_estimators_, model.final_estimator_.coef_):
    print(coef)

Below we can see that, with minimal fine-tuning, our stacking model performs substantially better than any of our other models, with a whopping correlation of .909. We got fantastic results despite the fact that none of the models were optimized, and some were even harmful to the overall model, showing negative correlations with the target; with the help of the meta-learner that wasn't an issue. With more fine-tuning and pruning of poorly performing models you can easily get even better results.

Here you can see how each individual model impacts the predictions of the meta-learner. The combination of all these coefficients helps mask or promote the strengths and weaknesses of each individual model.

Conclusion:

Creating stacking models makes it trivial to squeeze every last bit of performance out of your models. In some data science problems every bit of performance matters substantially, so stacking can be a quick and convenient way to achieve it.

However, keep in mind that stacked models usually take substantially longer to train and have much higher prediction latency than single models. So, if you need rapid predictions sent to your users, stacking models may not be ideal.
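
If latency is a concern, it's easy to measure directly; here's a rough sketch using the model and validation set from above:

import time

start = time.perf_counter()
_ = model.predict(X_val)
elapsed = time.perf_counter() - start
# Every base model plus the meta-learner runs at prediction time,
# so expect this to be much slower than a single model
print(f"Predicted {len(X_val)} rows in {elapsed:.3f} seconds")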

Anyways, thanks for reading! Remember to clap if you enjoyed my brief tutorial.

Fully reproducible notebook below:

https://colab.research.google.com/drive/11TUd7Yc6hEyAotGMBoAG0xB_NWQXOkD6#scrollTo=qimy45csuiAf
