How I need you.
To store finished models,
for my CPU.
As data scientists, a key part of our workflow is generating models. More often than not, we fit multiple models to our data to find which one works best, then provide analysis based on the results. With 30,000 rows and 100 columns of data, which isn’t atypical, fitting can take a very long time. If we’re also grid-searching for the best hyperparameters for each model, the time it takes to fit a model and get results increases significantly. And there’s the problem: rerunning those models every time you run your code is computationally expensive and wastes a lot of time. …
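To make the idea of storing a finished model concrete, here is a minimal sketch that uses Python's standard-library `pickle` module. The helper name `load_or_fit`, the file name `model.pkl`, and the toy `fit_mean_model` function are all made up for illustration; a real workflow would swap in an actual estimator or grid search.

```python
import pickle
from pathlib import Path


def load_or_fit(path, fit_fn):
    """Return the model saved at `path`; if none exists, fit, save, and return it.

    `load_or_fit` is a hypothetical helper, not part of any library.
    """
    path = Path(path)
    if path.exists():
        # A saved model is already on disk, so skip the expensive fit.
        with path.open("rb") as f:
            return pickle.load(f)
    model = fit_fn()  # the slow step, run only when no saved copy exists
    with path.open("wb") as f:
        pickle.dump(model, f)
    return model


def fit_mean_model():
    """Toy stand-in for a slow fit: 'learns' the mean of a tiny dataset."""
    data = [1.0, 2.0, 3.0]
    return {"mean": sum(data) / len(data)}


if __name__ == "__main__":
    # First run fits and saves; later runs load the saved copy from disk.
    model = load_or_fit("model.pkl", fit_mean_model)
    print(model)
```

One caveat with this approach: unpickling can execute arbitrary code, so only load pickle files you created yourself or otherwise trust.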
As a twenty-something-year-old trying to figure out this whole “adulting” thing, cleaning has unfortunately fallen by the wayside, like a chip forgotten down the side of the driver’s seat. I mean, I meant to pick up the chip a week ago; things just got… busy. So discovering that my desired career path is 80% cleaning was definitely a reality check. Looks like I’ll have to whip out that car vacuum cleaner after all.
As a data scientist, I deal with data, and it can get very messy. Missing values, bad column names, typos: you name it, data can describe it (inaccurately). Needless to say, working with messy data will cause more harm than good, so it needs to be cleaned. …