Chris Kim, Model Maker

Chris Kim
Jul 10, 2017 · 2 min read

I find and make models.

I think they’re models with pretty interesting looks and traits. I train them. I test them. Even though I try to make the best model, they sometimes don’t score very highly. You’ve probably guessed already, but I’m talking about predictive models.

I made a function that lets me choose, from an original dataset, the features, target, NLP vectorizer, and classifier. With all these inputs, it returns an accuracy score against the actual target. Then a helper function builds a dataframe listing which vectorizer and classifier were used, sorted in descending order by accuracy score. All this work to find out which model and parameters produced the most accurate score.
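The embedded code isn’t reproduced here, but the idea can be sketched roughly like this. Everything below is my own reconstruction, not the original code: the names `score_model` and `compare_models` are hypothetical, and I’m assuming scikit-learn and pandas.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def score_model(texts, target, vectorizer, classifier):
    """Vectorize the text feature, fit the classifier, return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, target, test_size=0.25, random_state=42)
    X_train_vec = vectorizer.fit_transform(X_train)
    X_test_vec = vectorizer.transform(X_test)  # transform only: no test leakage
    classifier.fit(X_train_vec, y_train)
    return accuracy_score(y_test, classifier.predict(X_test_vec))

def compare_models(texts, target, vectorizers, classifiers):
    """Score every vectorizer/classifier pair and return a dataframe
    sorted in descending order by accuracy."""
    rows = []
    for vec in vectorizers:
        for clf in classifiers:
            rows.append({"vectorizer": type(vec).__name__,
                         "classifier": type(clf).__name__,
                         "accuracy": score_model(texts, target, vec, clf)})
    return pd.DataFrame(rows).sort_values("accuracy", ascending=False)
```

The grid of pairs is small enough to brute-force; the sorted dataframe makes the winning combination the first row.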

I used these functions in my last project, Data Scientist Seeking Data Scientist. They work well for analyzing text data to see whether NLP can help predict classes.

Things to improve on: rebuild the function as a pipeline or as an object. I’ve done some pipelining and object-oriented coding, but it’s not as natural to me as writing a straight function. There are great examples at the bottom of Scikit-Learn’s website that I’ll need to go back and read through again. For those interested, here is my code:

[Code gist: definitely not pretty, but easy for me to understand — choosing which models, vectorizers, and features to use]
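The pipeline rewrite mentioned above might look something like the sketch below. This is my own guess at the shape, assuming scikit-learn; `make_text_pipeline` is a hypothetical name, not from the original gist.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

def make_text_pipeline(vectorizer, classifier):
    """Bundle a vectorizer and classifier into one estimator that
    fits and predicts directly on raw text."""
    return Pipeline([("vectorize", vectorizer),
                     ("classify", classifier)])

pipe = make_text_pipeline(TfidfVectorizer(), MultinomialNB())
# cross_val_score refits the whole pipeline per fold, so the vectorizer
# never sees held-out text during fitting.
```

One advantage of the pipeline form: tools like `cross_val_score` and `GridSearchCV` treat the vectorizer and classifier as a single model, which removes the manual transform/fit bookkeeping in the straight-function version.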

I promise to update this blog once I rewrite my code in a pipeline/class. Hopefully, promising this makes me accountable!
