Four Oversampling and Under-Sampling Methods for Imbalanced Classification Using Python
Random Oversampling, SMOTE, Random Under-Sampling, and NearMiss
Oversampling and under-sampling are techniques for changing the class ratio in an imbalanced modeling dataset. This step-by-step tutorial explains how to use oversampling and under-sampling in the Python imblearn
library to adjust imbalanced classes for machine learning models. We will compare the following four methods against the baseline random forest model results:
- Random Oversampling
- SMOTE (Synthetic Minority Oversampling Technique)
- Random Under-Sampling
- Near Miss Under-Sampling
Resources for this post:
- Video tutorial on YouTube
- Python code at the end of the post (also available as a Colab notebook)
- More video tutorials on imbalanced modeling and anomaly detection
- More blog posts on imbalanced modeling and anomaly detection
Step 0: What is imbalanced classification?
First off, what is imbalanced classification? Imbalanced classification is also called rare event modeling. When the target label for a…