The Model You Don’t Build May Be Your Best One

Colin Davy
Published in State of Analytics
Jan 16, 2017

Models are all the rage right now, and why wouldn’t they be? Their applications and capabilities seem to double every year, the toolboxes keep growing, and you can’t throw a rock without hitting ten articles about their use cases every week. It’s inevitable, then, that clients will ask for more and more models, since they’re hearing all about them. Along with that comes a tendency to ask for the model first and examine the underlying business question later, when it should be the other way around. Understanding the question the client is trying to answer is a prerequisite to selecting the right model for the problem, and oftentimes there’s an underrated answer to that business question: not building a model at all.

Case in point: a while back, I had finished a proof-of-concept with a k-means clustering model to help a client understand their customers better. In a follow-up project, they wanted to know more about a different set of customers, so naturally they asked for another k-means model. I could have built exactly what they asked for, using the customer dimensions they had already provided, and delivering exactly what was requested is usually the success benchmark for any deliverable. However, after probing a little to understand the question behind the request, I realized the answer they were looking for could be provided with a simple SQL query and a data visualization, which I pushed for as the better solution. It’s not always advisable to tell a client that what they explicitly asked for isn’t what they really want, but in the case of models, I increasingly look for ways to not give them a model if they don’t have to have one.
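To make the idea concrete: a hedged sketch of what “a simple SQL query instead of a model” can look like. The table, columns, and segment cutoffs below are hypothetical illustrations, not the client’s actual data or rules; the point is that a plain GROUP BY can answer “what kinds of customers do we have?” without fitting anything.

```python
import sqlite3

# Hypothetical customer data; ids, columns, and values are made up for illustration.
rows = [
    ("a1", 3, 120.0), ("a2", 18, 940.0), ("a3", 1, 35.0),
    ("a4", 25, 1310.0), ("a5", 7, 260.0), ("a6", 2, 80.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id TEXT, orders INTEGER, total_spend REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)

# Segment customers with explicit, explainable rules instead of learned clusters.
# The order-count thresholds here are arbitrary placeholders.
query = """
SELECT CASE
         WHEN orders >= 10 THEN 'frequent'
         WHEN orders >= 3  THEN 'occasional'
         ELSE 'rare'
       END                        AS segment,
       COUNT(*)                   AS n_customers,
       ROUND(AVG(total_spend), 2) AS avg_spend
FROM customers
GROUP BY segment
ORDER BY avg_spend DESC
"""
for segment, n_customers, avg_spend in conn.execute(query):
    print(segment, n_customers, avg_spend)
```

Unlike cluster assignments, every row of this output can be traced back to a rule anyone on the client’s team can read, and the query keeps working after the consultant leaves.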

Models are great, but they’re also incredibly cumbersome to maintain. They require someone with data science expertise to keep them running, their internal fitting processes are not intuitive, and if someone asks about the methodology after you leave, it doesn’t lend itself to an easy explanation. Sometimes, when the application has a clear-cut objective, like predicting a quantity, a model can be well worth the hassle, especially if it’s benchmarked against an existing process’s accuracy and/or labor. But for more abstract applications, like clustering and customer segmentation, it’s not immediately clear that a model’s insight will outperform simpler statistics and methods. All else being equal, why would you not opt for the simpler solution?

I get it: having a model can feel cutting-edge. The client is happy they’re doing futuristic stuff, you’re happy to put that data science training to use, and everyone can marvel at how smart they feel. But doing what’s right for everyone isn’t about making people feel smart; it’s about getting things done with the least short- and long-term cost. There’s no shortage of legitimate use cases for advanced data science and modeling techniques, but the worst tendency to fall into is becoming the hammer that sees every problem as a nail. Doing what’s right for the client takes many forms, and sometimes that means recognizing that the best model for the job is the one you don’t build.


Colin is a consulting data scientist in San Francisco, a two-time winner of the Sloan Sports Analytics Conference Hackathon, and a Jeopardy champion.