
How to deal with an imbalanced dataset

Models trained on imbalanced datasets tend to perform poorly on minority classes because most classification algorithms implicitly assume that the classes are roughly balanced. Handling the imbalance incorrectly, or evaluating the model with unsuitable metrics, can cause severe problems when business decisions rely on the model’s output. This article shows a few tricks for working with such datasets.

Rahul Pandey
Published in DSciEr
9 min read · May 16, 2021


P.S. I made this banner

Classification algorithms are machine learning techniques that assign data to classes. Classification is a form of supervised machine learning, in which algorithms learn from labeled data. Because the algorithms learn from labeled examples, the distribution of classes plays an important role. For example, training on a severely skewed dataset, also known as an imbalanced dataset, can produce a model that performs poorly on the minority classes. Fraud detection, churn prediction, and spam detection are real-world examples of problems with imbalanced datasets. Building models for such datasets therefore requires special techniques…
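To make the evaluation problem concrete, here is a minimal sketch in plain Python. The label counts (990 legitimate, 10 fraudulent) are invented for illustration and do not come from any real dataset. The "model" is a baseline that always predicts the majority class, which is roughly what a classifier can collapse to on severely skewed data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model found."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical, made-up labels: 0 = legitimate, 1 = fraud (the minority class).
y_true = [0] * 990 + [1] * 10

# Baseline "model": always predict whichever class is most common.
majority_label = Counter(y_true).most_common(1)[0][0]
y_pred = [majority_label] * len(y_true)

baseline_accuracy = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred, positive=1)

print(f"accuracy:        {baseline_accuracy:.2f}")
print(f"minority recall: {minority_recall:.2f}")
```

Under these assumptions the baseline is right on 990 of 1,000 samples yet detects none of the 10 fraud cases, which is why accuracy alone is a misleading metric on imbalanced data.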


Rahul Pandey, editor for DSciEr

MLOps Practitioner | Cloud AI and Data Architect | Leading ML Innovations at adidas 🖖