
Handling Class Imbalance in Machine Learning: Practical Techniques That Actually Work

Seavleu Heang (Genevieve) · 4 min read · Apr 18, 2025

Since I am currently working on a bidding project, I thought I'd write down what I've been tackling so far, not only in the hope that it might help a lost soul who runs into the same issue, but also to reach experienced folks who might be able to lend a hand.

Class imbalance is one of those things that can make even a good model look horrible (I'm lowkey scared of how low a percentage we had).

If you've ever trained a classification model where one class shows up way more than the others (like fraud vs. legit transactions, or passed vs. failed bids), you know the struggle. Your accuracy looks great, but your model completely ignores the minority class, which is the one you probably care about most.

Let’s break down some real techniques I used to deal with this properly.

Wait, why does class imbalance matter?

In imbalanced datasets, your model tends to optimize for the majority class, since that minimizes the loss most easily. But in real-world cases such as medical diagnosis, fraud detection, or bid prediction, missing the minority class can cost you big time.

So instead of chasing accuracy, we aim for a better balance: improving recall, F1-score, and precision for the minority class without sacrificing overall reliability.

1. Resampling: Balance Your Data First

Oversampling

This means increasing the presence of the minority class.

  • Random Oversampling: Easiest to try. You just duplicate minority samples. Can work, but might overfit.
  • SMOTE (Synthetic Minority Over-sampling Technique): Creates new synthetic samples based on feature-space similarities.
  • ADASYN: A smarter version of SMOTE. It focuses more on samples the model struggles with.

👉 Tip: If your features are numerical and reasonably structured, SMOTE/ADASYN tends to outperform raw duplication.
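Here's a minimal sketch of what that looks like with the imbalanced-learn library. The dataset is just a toy stand-in (make_classification with a 95:5 split), so swap in your own features and labels:

```python
# pip install imbalanced-learn
from collections import Counter

from imblearn.over_sampling import ADASYN, SMOTE
from sklearn.datasets import make_classification

# Toy imbalanced dataset standing in for real bid/fraud features
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=42)
print("Before:", Counter(y))

# SMOTE: interpolates new minority samples between existing neighbours
X_sm, y_sm = SMOTE(random_state=42).fit_resample(X, y)
print("After SMOTE:", Counter(y_sm))

# ADASYN: similar idea, but generates more samples in regions the model finds hard
X_ad, y_ad = ADASYN(random_state=42).fit_resample(X, y)
print("After ADASYN:", Counter(y_ad))
```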

Undersampling

Here, you reduce the number of majority samples.

  • Random Undersampling: Just drop random majority samples. Risky, since you might lose important info.
  • Tomek Links: Deletes overlapping pairs from different classes, helping clean the decision boundary.
  • Cluster Centroids: Replace majority-class clusters with their centroids, a smart way to preserve structure.

👉🏻 Tip: Try combining undersampling with oversampling. For instance, clean up the noisy boundary with Tomek, then use ADASYN to boost hard minority samples.

👉🏻 Extra boost: I’d suggest doing K-fold cross-validation whenever you resample, and applying the resampling inside each training fold only, so you can be sure the resampling isn’t causing overfitting (see the sketch below).
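A rough sketch of that combo, using imbalanced-learn's Pipeline so the Tomek + ADASYN steps only ever touch the training folds. The random forest and the toy dataset are placeholders, not the exact setup from my project:

```python
from imblearn.over_sampling import ADASYN
from imblearn.pipeline import Pipeline
from imblearn.under_sampling import TomekLinks
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=42)

# Tomek first to clean the boundary, then ADASYN to boost hard minority samples.
# Because the samplers sit inside the pipeline, they are applied to the training
# folds only, so the validation folds stay untouched and the scores stay honest.
pipe = Pipeline([
    ("tomek", TomekLinks()),
    ("adasyn", ADASYN(random_state=42)),
    ("clf", RandomForestClassifier(n_estimators=300, random_state=42)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="f1")
print("F1 per fold:", scores.round(3), "| mean:", round(scores.mean(), 3))
```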

2. Adjusting Algorithms to Care More

Cost-Sensitive Learning

Instead of treating every mistake equally, you assign higher weights to minority class errors.

This tells your model: “Hey, getting this class wrong is more costly.”

Many ML libraries like sklearn, XGBoost, and LightGBM support this via class_weight or scale_pos_weight.
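For example, in scikit-learn it's just the class_weight argument. A quick sketch (the toy data and the 0.25 test split are my own placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=42
)

# 'balanced' reweights each class inversely to its frequency;
# you could also pass an explicit dict of per-class costs, e.g. {0: 1, 1: 10}.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_train, y_train)
```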

Adjust Decision Thresholds

Models often default to a threshold of 0.5. But in imbalanced cases, you might want to lower it to catch more minority cases.

👉🏻 After training, run a threshold sweep and tune for the best F1 score or recall, depending on your use case.
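A minimal threshold sweep could look like this, continuing from the fitted classifier and held-out split in the sketch above (any model with predict_proba works):

```python
import numpy as np
from sklearn.metrics import f1_score

# Probabilities for the positive (minority) class on the held-out set
proba = clf.predict_proba(X_test)[:, 1]

best_t, best_f1 = 0.5, 0.0
for t in np.linspace(0.05, 0.95, 19):
    f1 = f1_score(y_test, (proba >= t).astype(int))
    if f1 > best_f1:
        best_t, best_f1 = t, f1

print(f"Best threshold: {best_t:.2f} | F1: {best_f1:.3f}")
```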

3. Ensemble Methods to the Rescue

Random Forest

This bagging-based method builds multiple trees on bootstrapped samples. Since it naturally averages predictions and exposes class weights, it’s surprisingly robust even with imbalance.
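For instance, a sketch reusing the X_train/y_train split from earlier, with the balanced_subsample option doing the class weighting:

```python
from sklearn.ensemble import RandomForestClassifier

# 'balanced_subsample' recomputes class weights for every bootstrap sample,
# so each tree pays proper attention to the minority class.
rf = RandomForestClassifier(
    n_estimators=500,
    class_weight="balanced_subsample",
    random_state=42,
)
rf.fit(X_train, y_train)
```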

XGBoost (and LightGBM)

Boosting models train sequentially, each one correcting the mistakes of the last. So they inherently focus more on misclassified (minority) samples.

Tune scale_pos_weight or is_unbalance=True to help it do even better.
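A common starting point is to set scale_pos_weight to the negative-to-positive ratio. A sketch, assuming a binary label where 1 is the minority class and reusing the earlier split:

```python
# pip install xgboost
from xgboost import XGBClassifier

# A common starting point: ratio of negative to positive samples
ratio = (y_train == 0).sum() / (y_train == 1).sum()

xgb = XGBClassifier(
    n_estimators=400,
    learning_rate=0.05,
    scale_pos_weight=ratio,
    eval_metric="logloss",
)
xgb.fit(X_train, y_train)
# Rough LightGBM equivalent: LGBMClassifier(is_unbalance=True)
```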

Bagging

Train multiple base models on random subsets. This reduces variance and helps stabilize predictions on rare classes.

👉🏻 Ensemble Trick: Mix multiple strategies — e.g., oversample your data, feed it into a weighted XGBoost, and then blend results via voting or stacking.
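One way to sketch that blend, with soft voting over an oversampled random forest and a cost-weighted XGBoost (both just illustrative choices, again on the earlier toy split):

```python
from imblearn.over_sampling import ADASYN
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from xgboost import XGBClassifier

# Strategy 1: oversample with ADASYN, then a plain random forest
rf_over = Pipeline([
    ("adasyn", ADASYN(random_state=42)),
    ("rf", RandomForestClassifier(n_estimators=300, random_state=42)),
])

# Strategy 2: no resampling, but a cost-weighted XGBoost
xgb_weighted = XGBClassifier(
    n_estimators=400,
    scale_pos_weight=(y_train == 0).sum() / (y_train == 1).sum(),
    eval_metric="logloss",
)

# Blend the two strategies with soft voting over their predicted probabilities
blend = VotingClassifier(
    estimators=[("rf_over", rf_over), ("xgb_weighted", xgb_weighted)],
    voting="soft",
)
blend.fit(X_train, y_train)
```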

4. Evaluation: Accuracy Is Unreliable. Use These Instead.

When 95% of your data is one class, 95% accuracy means nothing. In that case, I’d suggest considering these alternatives:

  • Precision & Recall: especially important when False Negative (FN) or
  • False Positive (FP)
  • F1-score: the harmonic mean of precision and recall. Balanced.
  • Precision-Recall Curve: Better than ROC AUC when your data is skewed.

👉🏻 Always report macro and weighted scores, because they show how your model performs across all classes, not just the dominant one. Also, if you are still doubting the model’s numbers, I suggest coming up with a custom evaluation metric based on your intuition and the project direction.
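Here's roughly how I'd pull those numbers with scikit-learn, continuing from the blended model and held-out split sketched above:

```python
from sklearn.metrics import average_precision_score, classification_report

pred = blend.predict(X_test)
proba = blend.predict_proba(X_test)[:, 1]

# Per-class precision/recall/F1 plus macro and weighted averages in one report
print(classification_report(y_test, pred, digits=3))

# Area under the precision-recall curve (more honest than ROC AUC on skewed data)
print("PR AUC:", round(average_precision_score(y_test, proba), 3))
```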

My Go-To Strategy (When it’s real-world and messy)

Below are the basic principles you have probably already applied, but I will list them down again.

  • Clean your data
  • Run EDA to see how imbalanced it is
  • Try Tomek Links + ADASYN
  • Train with XGBoost + scale_pos_weight
  • Fine-tune the threshold based on validation F1
  • Use cross-validation with stratified folds
  • Log everything. Plot everything. Don’t trust a single metric.

Final Thoughts

Handling class imbalance is a non-negotiable in many real-world machine learning problems. Don’t let the model cheat by only predicting the easy class.

This is my first time too, and there were so many failed approaches I tried, but it comes down to one thing: with the right combo of resampling, algorithm tuning, and proper evaluation, I believe you can help the model see what really matters (the target output).

Lastly, if you’re experimenting with a high imbalance like 95:5 or worse, try stacking all the techniques and evaluating across different folds and seeds.

That’s it for today, I will see you guys in the next post;)
