I’m sure that every Data Scientist or ML Practitioner has faced the challenge of missing values in their dataset. Handling them is a common data cleaning step, but frankly, one that is often overlooked and neglected. However, an effective missing value strategy can have a significant impact on your model’s performance.
The reason missing values occur is often specific to the problem domain. However, most of the time they arise from the following scenarios:
I’m sure you have come across a few of the following scenarios:
Well, congratulations, because you might have outliers in your data!
In statistics, an outlier is a data point that differs significantly from other observations. From the figure above, we can clearly see that while most points lie on or around the linear hyperplane, a single point diverges from the rest. This point is an outlier.
For example, take a look at the list…
Logistic Regression is essentially a must-know for any aspiring Data Scientist or Machine Learning Practitioner. It is most likely the first classification model you have encountered. But the question is, how does it really work? What does it do? Why is it used for classification? In this article, I hope to answer all these questions, and by the time you finish reading, you will have:
So, get ready for the wild adventure ahead, partner! …