Common Issues that Will Make or Break Your Data Science Project
A helpful guide on spotting data problems, why they can be detrimental, and how to properly address them
Most people in the field have heard of the survey suggesting that data scientists spend about 80% of their time preparing and managing data. That’s 4 out of 5 days in the workweek!
Though this may sound insane (or boring), you quickly realize why the trend exists once you start working with real data, and I think it goes to show the importance of data cleaning and data validation.
Rubbish in, rubbish out.
Getting your data right is more than half the battle in any analytics project. In fact, no model, however fancy or complicated, can compensate for low-quality data.
For beginners who are just starting out in this field (certainly the case for me), I understand it can be difficult to know what exactly to look out for when dealing with a new dataset.
With this in mind, I want to present a guide to common data issues that you will stumble upon at some point in your journey, along with a framework for dealing with them properly and the trade-offs each approach involves.