OK, so what is a model exactly?

Emily Glauser
Aug 9, 2017 · 3 min read

“All models are wrong, but some are useful” — George Box

Model, modeling, algorithms: these are the hottest buzzwords in data right now, which of course means there is enormous misunderstanding of what they actually are. I’ve found that there are a few important questions to ask when dealing with modeling:

  • When to use a model?
  • What kind of model to employ?
  • Do I need to use a model here?
  • Also, WTF is a model, anyway?

Let’s answer that last one first. A model is a formula for looking at your data. It’s a method. The goal of a model is to get specific about the question you’re asking of your data. Models are used to draw inferences from data and to identify relationships and associations. Math is necessary. A full exploratory data analysis (EDA) is necessary. Advanced technical skills? Not so much.

When to use a model

This is the first question I find valuable when looking at my data. If I want to know how many users clicked on a banner ad on my website in March of 2016, I don’t need a model for that. If I want to predict how many users will click on a banner ad based on its position on a page, that’s where a model will come in handy. Essentially, data modeling exists to tell us stories within our data that are not surface level. A well-specified logistic regression model can predict whether an applicant will get into university based on certain criteria. A clustering model can segment a user base according to patterns in behavior. Surface-level facts can mislead here: just because a customer makes their first order on the day they subscribe doesn’t mean they will be a valuable long-term customer, nor does it mean they have anything in common with other customers who make a purchase on the same day they subscribe.
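To make the banner ad example concrete, here is a minimal sketch in plain Python. The click log, its column layout, and the helper names (`fit_logistic`, `predict_click_probability`) are all made up for illustration. The first question is answered by counting; the second is answered by fitting a tiny logistic regression with gradient descent, which is one of several ways you could approach it.

```python
import math
from datetime import date

# Hypothetical click log: (day, banner position on the page, clicked?).
# Position 1 is the top slot. Every row here is invented for illustration.
log = [
    (date(2016, 3, 1), 1, 1), (date(2016, 3, 2), 1, 1),
    (date(2016, 3, 5), 2, 1), (date(2016, 3, 9), 2, 0),
    (date(2016, 3, 12), 3, 1), (date(2016, 3, 20), 3, 0),
    (date(2016, 3, 25), 4, 0), (date(2016, 4, 2), 4, 0),
    (date(2016, 4, 3), 1, 1), (date(2016, 4, 7), 5, 0),
]

# Question 1: how many clicks in March 2016? No model needed, just count.
march_clicks = sum(
    clicked for day, _, clicked in log
    if day.year == 2016 and day.month == 3
)

# Question 2: how likely is a click at a given position? That needs a model.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit p(click) = sigmoid(w * x + b) by plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

positions = [pos for _, pos, _ in log]
clicks = [clicked for _, _, clicked in log]
w, b = fit_logistic(positions, clicks)

def predict_click_probability(position):
    return sigmoid(w * position + b)

print("March clicks:", march_clicks)
for pos in (1, 3, 5):
    print(f"position {pos}: p(click) ~ {predict_click_probability(pos):.2f}")
```

The count is a fact about the past. The fitted curve is a guess about slots and visitors you haven’t seen yet, and ten rows of data is nowhere near enough to trust it.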

What kind of model to employ

This is where you get creative and statistics comes in handy. Say you have survey data from a sample population of people who each answered a 500-question survey about their own characteristics. That’s 500 variables to consider when viewing and analyzing your data. In order to gather patterns or insights from this data, it’s important to use dimensionality reduction simply to make it understandable, not to mention useful. How many different methods can one try for dimensionality reduction? Dozens? Linear classification modeling? Weighted averages? It depends on what your data looks like. This step is absolutely key, and I have yet to meet a data scientist who is great at answering this question right the first time. It helps to know stats, but even PhDs struggle with this. Study the basics and employ the basics first. Try a linear regression and see what happens. Your first model is a test.
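Here is a rough sketch of that “basics first” approach, using only the Python standard library. Everything in it is an assumption for illustration: the six-question survey (standing in for 500), the grouping of questions into two themes, and the hiking-trips outcome. It collapses the answers into averaged composite scores, which is about the simplest dimensionality reduction there is, and then fits a one-variable linear regression as a first test.

```python
from statistics import mean

# Hypothetical survey: each row is one respondent's 1-5 answers to six
# questions. The grouping of questions into themes is assumed, not derived.
answers = [
    [5, 4, 5, 1, 2, 1],
    [4, 4, 4, 2, 2, 3],
    [2, 1, 2, 4, 5, 4],
    [1, 2, 1, 5, 4, 5],
    [3, 3, 4, 3, 2, 3],
    [2, 2, 1, 4, 4, 5],
]
themes = {"outdoorsy": [0, 1, 2], "homebody": [3, 4, 5]}

# Dimensionality reduction by averaging: six columns become two scores.
def composite_scores(row):
    return {name: mean(row[i] for i in cols) for name, cols in themes.items()}

reduced = [composite_scores(row) for row in answers]

# Hypothetical outcome per respondent: hiking trips taken last year.
trips = [12, 9, 2, 1, 6, 2]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

xs = [row["outdoorsy"] for row in reduced]
slope, intercept = fit_line(xs, trips)
r2 = r_squared(xs, trips, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.2f}")
```

Averaging by hand-picked themes bakes in your own assumptions about which questions belong together. If the first pass looks promising, that is the moment to try a method that learns the grouping from the data instead.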

Do I even need a model here

Logic would tell us that this question should likely come first. Unfortunately, that is not always the case. You’ll sometimes find that weeks’ worth of building, testing, frustration and tears (did I say tears? I meant joyful perseverance) will reveal to you that your efforts were useless and you should just stop, pack it up and quietly go home. A model doesn’t always have an answer for you, and it’s entirely reasonable to admit that maybe there isn’t a great forecasting method or predictive indicator, or simply that your data is total shit. Modeling doesn’t always work out, and even when it does, it doesn’t always tell you something compelling. So get used to that.
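One cheap way to find this out before the tears: compare your model against a do-nothing baseline on data it hasn’t seen. The sketch below is only an illustration with invented numbers and hypothetical names. It fits a line on a small training set, then checks whether it predicts held-out points any better than always guessing the training average.

```python
from statistics import mean

# Invented data where x carries no real signal about y.
train_x, train_y = [1, 2, 3, 4], [5, 3, 5, 3]
test_x, test_y = [5, 6], [5, 3]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def mean_absolute_error(predictions, actuals):
    return mean(abs(p - a) for p, a in zip(predictions, actuals))

# The "model": a fitted line.
slope, intercept = fit_line(train_x, train_y)
model_mae = mean_absolute_error(
    [slope * x + intercept for x in test_x], test_y
)

# The baseline: always predict the training average, ignoring x entirely.
baseline_guess = mean(train_y)
baseline_mae = mean_absolute_error([baseline_guess] * len(test_y), test_y)

print(f"model MAE={model_mae:.2f}, baseline MAE={baseline_mae:.2f}")
if model_mae >= baseline_mae:
    print("The model is no better than guessing the average. Pack it up.")
```

With this made-up data the line loses to the average, which is exactly the situation described above. On real data you would want far more than two held-out points before drawing that conclusion.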

One of the better treatments I’ve read of formal modeling and its use cases is in Roger Peng and Elizabeth Matsui’s guide, “The Art of Data Science.” They outline modeling for the layperson, and even if you don’t currently live in that world, it’s a good scraping of the surface. Or at least something to take your mind off your failed modeling efforts.

100,000 Hours

Mapping the road to “Data Scientist”
