Four Oversampling and Under-Sampling Methods for Imbalanced Classification Using Python

Amy @GrabNGoInfo · Published in GrabNGoInfo · 13 min read · Feb 19, 2022

Random Oversampling, SMOTE, Random Under-sampling, and NearMiss

Picture owned by GrabNGoInfo

Oversampling and under-sampling are techniques for changing the class ratio in an imbalanced modeling dataset. This step-by-step tutorial explains how to use the oversampling and under-sampling methods in the Python imblearn library to adjust imbalanced classes for machine learning models. We will compare the following four methods against the baseline random forest model results:

  • Random Oversampling
  • SMOTE (Synthetic Minority Oversampling Technique)
  • Random Under-Sampling
  • Near Miss Under-Sampling


Step 0: What is an imbalanced classification?

First off, what is an imbalanced classification? Imbalanced classification is also called rare event modeling. When the target label for a classification dataset is heavily skewed toward one class, for example fraud detection, where fraudulent transactions may account for less than 1% of records, the dataset is imbalanced, and a model trained on it tends to favor the majority class.
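To see why this matters, a quick sketch: on rare-event data, a trivial classifier that always predicts the majority class scores high accuracy while never detecting the event. The 1% event rate below is an assumption for illustration:

```python
# Why imbalance is a problem: a "predict the majority class" baseline
# looks accurate but never identifies a single rare event.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical rare-event labels: about 1% positives (assumed rate).
y = (rng.random(10_000) < 0.01).astype(int)

# Baseline that always predicts the majority class (0).
y_pred = np.zeros_like(y)
accuracy = (y_pred == y).mean()
print(f"Positives: {y.sum()}, baseline accuracy: {accuracy:.3f}")
```

Accuracy here is roughly 99%, yet recall on the rare class is zero, which is why resampling (and metrics beyond accuracy) are needed for imbalanced problems.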
