Multiple Model Prediction [Machine Learning]

Roshan Alwis
Published in Tech Vision
2 min read, Oct 26, 2016

I assume that everyone reading this article has some insight into what machine learning is. Often we get stuck in a position where we need more accurate results, but the system itself cannot provide them.

There can be many reasons for that, for example:
1. The dataset might not be large enough to train a stable model.
2. Some features in the training data set might not contribute to the final result.
3. The training data might contain noise.
4. The selected algorithm might not fit the scenario.

The approach suggested below, which is closely related to ensemble learning, addresses these problems by generating multiple models instead of a single one. You can run a grid search to tune the hyper-parameters of each model. Note, however, that the method involves intermediate calculations, which can add overhead in real-time applications.

Multiple Model Architecture

Step 1

Split the data into training and validation frames.
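A minimal sketch of this split in plain Python. The function name `train_validation_split`, the 80/20 ratio and the fixed seed are illustrative assumptions, not values from the article; a library helper such as scikit-learn's `train_test_split` would do the same job.

```python
import random

def train_validation_split(rows, validation_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split them into training and validation frames.

    Hypothetical helper: the 0.2 fraction and seed are assumptions for illustration.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_val = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]

rows = [[x, 2 * x + 1] for x in range(10)]
train, validation = train_validation_split(rows)
print(len(train), len(validation))  # 8 2
```

Shuffling before splitting avoids a validation frame that only contains the last rows of an ordered file.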

Step 2

Create an autoencoder model and measure the reconstruction error for each row in the training data set. Alternatively, you can use a different approach to detect the anomalies.
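Training a real autoencoder needs a deep-learning library, so the sketch below is a dependency-free stand-in, not the article's model. It uses a known fact: a *linear* autoencoder with one hidden unit, trained on squared error, reconstructs each row by projecting it onto the leading principal component. `principal_direction` finds that direction by power iteration, and `reconstruction_errors` (a hypothetical name) encodes, decodes and measures the per-row mean squared error. A nonlinear autoencoder would replace the projection, but the error would be measured the same way.

```python
import math

def principal_direction(rows, iterations=200):
    """Column means and leading principal direction (power iteration on the covariance)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def reconstruction_errors(rows):
    """Per-row mean squared reconstruction error through a 1-unit linear bottleneck."""
    mean, v = principal_direction(rows)
    errors = []
    for row in rows:
        code = sum((x - m) * b for x, m, b in zip(row, mean, v))  # encode to 1 value
        recon = [m + code * b for m, b in zip(mean, v)]            # decode back
        errors.append(sum((x - y) ** 2 for x, y in zip(row, recon)) / len(row))
    return errors

# Ten rows on the line y = 2x, plus one row pushed off that line.
rows = [[float(t), 2.0 * t] for t in range(10)] + [[6.5, 8.0]]
errors = reconstruction_errors(rows)
print(errors.index(max(errors)))  # 10 (the off-line row)
```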

Step 3

Based on the reconstruction error, remove the anomalies from the training data set: set a threshold error value and remove the rows whose reconstruction error exceeds it.

Note: This can also remove valid data, which can lead to an unstable model.
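The article does not say how to pick the threshold. One option, assumed here, is to take a quantile of the errors, so that only a chosen share of the rows can be dropped; this limits how much valid data is lost. Both function names are hypothetical.

```python
import math

def quantile_threshold(errors, q=0.95):
    """Nearest-rank quantile of the reconstruction errors, used as the cut-off (assumed rule)."""
    ordered = sorted(errors)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def remove_anomalies(rows, errors, threshold):
    """Keep only the rows whose reconstruction error does not exceed the threshold."""
    return [row for row, err in zip(rows, errors) if err <= threshold]

rows = ["a", "b", "c", "d", "e"]
errors = [0.10, 0.20, 0.15, 5.00, 0.12]
print(remove_anomalies(rows, errors, threshold=1.0))  # ['a', 'b', 'c', 'e']
print(quantile_threshold(errors, q=0.5))              # 0.15
```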

Step 4

Apply feature engineering to fine-tune the data set. If you want, you can split the feature-engineered data set into training and validation sets again to validate the models.
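The article leaves the feature-engineering steps open. As one common example (an assumption, not the author's procedure), the sketch below standardizes each column using statistics computed on the training rows only, then applies the same transform to any other rows, so no information leaks from validation or test data.

```python
import statistics

def fit_standardizer(train_rows):
    """Per-column mean and population standard deviation from the training rows."""
    columns = list(zip(*train_rows))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # constant column: avoid / 0
    return means, stds

def standardize(rows, means, stds):
    """Apply the fitted (x - mean) / std transform to any rows."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

train = [[1.0, 10.0], [3.0, 10.0]]
means, stds = fit_standardizer(train)
print(standardize(train, means, stds))          # [[-1.0, 0.0], [1.0, 0.0]]
print(standardize([[5.0, 12.0]], means, stds))  # [[3.0, 2.0]]
```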

Step 5

Train multiple models on the training data; here we created 10 models. If you want, you can build the models using multiple algorithms. Then select the five models with the lowest root mean squared error.
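The article does not say how the 10 models differ. This sketch assumes a bagging-style setup: each candidate is a one-feature least-squares line trained on a bootstrap resample, and the five with the lowest RMSE on the validation frame are kept. `fit_line`, `train_candidates` and `select_best` are hypothetical names; in practice each candidate could be a different algorithm or a grid-searched configuration.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b (single feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    b = my - a * mx
    return lambda x: a * x + b

def train_candidates(xs, ys, n_models=10, seed=0):
    """Train n_models lines, each on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in xs]  # sample rows with replacement
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def select_best(models, val_x, val_y, k=5):
    """Return the k (validation RMSE, model) pairs with the lowest RMSE."""
    scored = [(rmse(val_y, [m(x) for x in val_x]), m) for m in models]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]

# Usage on synthetic data: y = 2x + 1 plus Gaussian noise.
noise = random.Random(1)
xs = [float(i) for i in range(30)]
ys = [2.0 * x + 1.0 + noise.gauss(0, 1) for x in xs]
best = select_best(train_candidates(xs[:24], ys[:24]), xs[24:], ys[24:])
print([round(err, 3) for err, _ in best])
```

Keeping each model's RMSE next to it is useful later, because the final weighting step needs it.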

Step 6

Apply the test data to the selected models.

Step 7

Store the predicted results.
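Steps 6 and 7 together amount to running every selected model on every test row and keeping the outputs. In this sketch, `predict_and_store` is a hypothetical helper that assumes the selected models are `(rmse, model)` pairs; it stores each prediction alongside its model's RMSE so the final weighting can use it.

```python
def predict_and_store(selected, test_rows):
    """Apply every selected model to every test row.

    selected: list of (validation_rmse, model) pairs, e.g. the five best models.
    Returns one list per test row of (prediction, rmse) pairs.
    """
    return [[(model(row), err) for err, model in selected] for row in test_rows]

selected = [(0.5, lambda x: x + 1), (1.0, lambda x: 2 * x), (2.0, lambda x: x * x)]
print(predict_and_store(selected, [3]))  # [[(4, 0.5), (6, 1.0), (9, 2.0)]]
```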

Step 8

Sort the predicted results and select the middle three values. This drops the highest and lowest predictions, so outlier predictions do not affect the result.
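A sketch of this trimming step for one test row, assuming the predictions are stored as `(prediction, rmse)` pairs (`middle_three` is a hypothetical name). With five models it discards exactly one value at each end.

```python
def middle_three(predictions):
    """Sort (prediction, rmse) pairs by prediction and keep the middle three."""
    if len(predictions) < 3:
        raise ValueError("need at least three predictions")
    ordered = sorted(predictions, key=lambda pair: pair[0])
    start = (len(ordered) - 3) // 2
    return ordered[start:start + 3]

preds = [(10.0, 0.5), (1.0, 0.2), (3.0, 0.3), (2.0, 0.1), (4.0, 0.4)]
print(middle_three(preds))  # [(2.0, 0.1), (3.0, 0.3), (4.0, 0.4)]
```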

Step 9

Calculate the weighted average of the above three values, weighting each by the root mean squared error (RMSE) of the model it came from: models with a higher RMSE are assigned lower weights, and vice versa. The result is the final prediction.
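The article does not give the exact weighting formula. This sketch assumes inverse-RMSE weights, final = sum(p_i / RMSE_i) / sum(1 / RMSE_i), which satisfies "higher RMSE, lower weight"; 1 / RMSE² would be another reasonable choice. `rmse_weighted_average` is a hypothetical name.

```python
def rmse_weighted_average(selected):
    """Weighted mean of (prediction, rmse) pairs with weight = 1 / rmse (assumed scheme)."""
    weights = [1.0 / max(err, 1e-12) for _, err in selected]  # guard against rmse == 0
    total = sum(w * pred for w, (pred, _) in zip(weights, selected))
    return total / sum(weights)

print(rmse_weighted_average([(2.0, 1.0), (5.0, 2.0)]))  # 3.0
```

In the example, the model with RMSE 1.0 gets twice the weight of the one with RMSE 2.0, which pulls the average toward its prediction of 2.0.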


Software Engineer at Sysco Labs. (Computer Science & Engineering Graduand at University of Moratuwa)