Confessions of a Data Scientist: My Worst Model — and What It Taught Me
Handling Class Imbalance in Python
In my years as a data scientist, I’ve worked on a wide variety of projects and built solutions that meet business and consumer needs.
When working with data, especially large datasets, it is essential to build and deploy models that deliver reliable results and improve the overall quality of your workflow.
However, things often don’t go according to plan.
Recently, I worked on a project where I thought I had struck gold: a simple binary classification task to predict customer churn for a mid-sized subscription service.
I was excited because it was exactly the kind of problem I had hoped to solve with my data science and programming skills.
It was a data scientist’s dream: real impact, real stakes.
With excitement and confidence, I spun up a fresh Jupyter notebook, loaded my CSV, sipped my espresso and got to work.
The task wasn’t new to me; I had worked on similar problems during my training. I did everything by the book: encoded the categorical variables, handled missing values, split the data…
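For readers who want to picture those “by the book” steps, here is a minimal sketch in pandas and scikit-learn. It is an illustration, not my actual notebook: the toy DataFrame standing in for the churn CSV, its column names (`plan`, `monthly_charges`, `churned`), and the choices of median imputation and one-hot encoding are all assumptions made for the example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the churn CSV (column names are assumptions).
df = pd.DataFrame({
    "plan": ["basic", "premium", "basic", None, "premium",
             "basic", "basic", "premium", "basic", "basic"],
    "monthly_charges": [20.0, 55.5, None, 22.0, 60.0,
                        19.5, 21.0, 58.0, None, 20.5],
    "churned": [0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
})

# Handle missing values: median for the numeric column,
# a placeholder category for the categorical one.
df["monthly_charges"] = df["monthly_charges"].fillna(df["monthly_charges"].median())
df["plan"] = df["plan"].fillna("unknown")

# Encode the categorical variable as one-hot (dummy) columns.
X = pd.get_dummies(df.drop(columns="churned"), columns=["plan"])
y = df["churned"]

# Split into training and test sets (80/20), fixed seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
# Class proportions of the target column.
print(y.value_counts(normalize=True))
```

The last line prints the share of each class in the target, a cheap check that is easy to skip when everything else looks routine.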