Model Building

Prabu Palanisamy
3 min read · Feb 1, 2022


Once we are confident about the data requirements, metrics, and cross-validation setup, model building is the next step. The model building process is well studied, and Kaggle and other competition platforms are good places to learn and practise.

Build a Baseline

  • Before you start with a cool deep learning model, the first step in model building is to create a baseline model. The baseline sets the benchmark that every later model has to beat.
  • For the sake of simplicity, we consider logistic regression as the baseline for all four models (see the sketch after this list).
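A minimal baseline sketch with scikit-learn; the synthetic dataset and the f1_macro scoring are illustrative stand-ins for your real data and metric:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data as a stand-in for the real dataset of each problem
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Logistic regression as the baseline; its cross-validated score is the
# benchmark that every fancier model has to beat
baseline = LogisticRegression(max_iter=1000)
scores = cross_val_score(baseline, X, y, cv=5, scoring="f1_macro")
print(f"Baseline F1 (5-fold CV): {scores.mean():.3f} +/- {scores.std():.3f}")
```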

Different models require different feature transformations. For example, neural networks need feature normalization to perform well, while a decision tree usually does not. Feature transformation is therefore specific to the model.
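For instance, a scikit-learn sketch (the specific estimators and parameters are illustrative):

```python
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# A neural network is paired with a scaler because it needs normalized inputs
nn_model = make_pipeline(StandardScaler(), MLPClassifier(max_iter=500))

# A decision tree splits on raw thresholds, so it can work on unscaled features
tree_model = DecisionTreeClassifier(max_depth=5)
```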

Different families of models that are discussed are:

  • Linear Models
    - Models that create linear decision boundaries; they are not capable of learning complex boundaries.
    - Examples are Logistic Regression and Linear SVM.
  • Tree-Based Models
    - The model is built as a tree of if/else rules.
    - Examples are Decision Tree and XGBoost.
  • Bag-of-Words Models
    - Learning does not take the order of the input into account: “Can you help me” and “help me, Can you” (Yoda style) look identical (see the sketch after this list).
    - Examples are Naive Bayes and SVM.
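A quick sketch of that order-insensitivity, reusing the two example sentences; CountVectorizer is one common bag-of-words implementation:

```python
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(["Can you help me", "help me, Can you"])

# Both sentences map to the exact same bag-of-words vector, so an
# order-insensitive model cannot tell them apart
print((bow[0] != bow[1]).nnz == 0)  # True
```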

Different models work in different ways, and some models will fail to learn what we expect them to.

Model Fine-Tuning:

Once we have the metrics, cross-validation setup, and transformed features, the next step is to try different models and fine-tune them. There are two ways to do it.

Manual Search

We define manual search as hand-picking parameters and experimenting with different values to get the best model. This technique is not to be underestimated; I have personally seen many Kaggle Masters and Grandmasters do manual search. It is better to start with manual search to get a clear picture of how the model behaves.

I follow this order:

  1. Address underfitting problem
  2. Address overfitting problem
  3. Repeat this process until you have a balanced model

Parameter tuning for the underfitting case typically means increasing model capacity: deeper trees, more features, or weaker regularization. For the overfitting case, we just have to do the opposite.
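A hand-tuning sketch with a decision tree on synthetic data (the parameter values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Sweep one capacity parameter by hand and watch both scores:
# a low train score means underfitting (raise max_depth);
# a high train score with a much lower validation score means overfitting
for depth in (2, 5, 10, None):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_tr, y_tr)
    print(depth, round(model.score(X_tr, y_tr), 3), round(model.score(X_val, y_val), 3))
```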

If model performance is still not satisfactory, explore whether new features need to be added or whether additional training data is required.

Automatic Fine-Tuning:

  • There are many automatic fine-tuning techniques such as Grid Search, Random Search, Genetic Algorithms, etc. The parameter tuning can be done in a few lines of code, as shown below. Please check out this link for more.
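For example, a grid search sketch with scikit-learn (the model and parameter grid are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Grid search exhaustively tries every parameter combination with cross-validation
param_grid = {"max_depth": [3, 5, 10], "min_samples_leaf": [1, 5, 20]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```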

Model Selection:

Once the different models are parameter-tuned and a report is generated, which model should we choose?

Consider these two cases:

  • XGBoost gives 82% and Decision Tree gives 80%.
  • ULMFiT gives 90% and BERT gives 92%.

The answer is that the metric is not the only way to decide on the best model. This is where designing ML features differs from a Kaggle competition. To be honest, customers may not notice the 2% difference.

Three criteria that can be considered are:

  1. Model Size
  2. Ease of retraining the model
  3. Model Interpretability

My decision is to choose the Decision Tree and ULMFiT, because:

  • Decision Tree and ULMFiT can be easily retrained with fewer resources.
  • Decision Tree and ULMFiT have a smaller memory footprint and are easier to fit than BERT or XGBoost.
  • Decision Tree and ULMFiT are easier to tweak and to debug when issues arise.
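To put rough numbers on the first two criteria, a measurement sketch (pickle size and fit time are only crude proxies; the model and data are illustrative):

```python
import pickle
import time

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Wall-clock fit time as a rough proxy for ease of retraining,
# serialized size as a rough proxy for the model's memory footprint
start = time.time()
model = DecisionTreeClassifier(max_depth=5).fit(X, y)
print(f"fit time: {time.time() - start:.2f}s")
print(f"model size: {len(pickle.dumps(model))} bytes")
```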

This concludes the article on model building. The next article will focus on building ML features in the B2B space. Please click here for the next article.

Please click here to go to the first article


Prabu Palanisamy

I have 10 years of experience building ML features in the B2B space. Connect with me on LinkedIn: https://www.linkedin.com/in/prabu-palanisamy-ml/