Bad, Good, and Great Practices in Machine Learning: Theory and Practice
Mastering Machine Learning: Avoiding Pitfalls, Implementing Best Practices, and Achieving Real-World Impact
Machine learning models are often used in the real world to solve complex problems. However, the success of these models should be judged not only by statistical metrics but also by real-world performance and impact. In this article, we will discuss common mistakes, best practices, and advanced approaches in machine learning, examining each in detail and supporting it with practical examples.
CHAPTER 1: Bad Machine Learning Practices
1.1 Ignoring Imbalances in Data
Imbalanced data occurs when one class has far more examples than the other. It is especially common in fraud detection, where fraudulent cases are a tiny fraction of the total. If you build a model without checking the distribution of your target and then evaluate it with accuracy, a model that ignores the minority class entirely can still score very high, and you will be led down the wrong path.
Imbalanced data example:
df['Class'].value_counts()
0 284315
1 492
Name: Class, dtype: int64
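To make the problem with accuracy concrete, here is a minimal sketch that reuses the class counts shown above and scores a hypothetical baseline that predicts class 0 for every row. The baseline and the variable names are assumptions added for illustration; they are not part of the original dataset or any model from this article.

```python
# Class counts taken from the value_counts() output above.
n_negative = 284315  # rows with Class == 0
n_positive = 492     # rows with Class == 1

# Hypothetical baseline (an assumption for illustration):
# always predict the majority class, 0.
y_true = [0] * n_negative + [1] * n_positive
y_pred = [0] * len(y_true)

# Accuracy: share of all rows predicted correctly.
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

# Recall on class 1: share of minority-class rows the model catches.
true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = true_positives / n_positive

print(f"accuracy: {accuracy:.4f}")
print(f"recall on class 1: {recall:.4f}")
```

With these counts, the baseline reaches roughly 99.8% accuracy while catching none of the 492 minority-class rows. That gap is why it is worth checking a metric on the minority class, such as recall or precision, alongside accuracy.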