Win First, Explain Later

How wanting to feel clever keeps you from the next level

Josh Embree
Sep 25, 2022
Photo by Danny Howe on Unsplash

The stories we tell

Do you like digging into a dataset to find hidden treasures that everyone else missed? I love that feeling. It’s like walking up to a big unfinished puzzle on a family trip and quietly completing it when no one else is around. In data science, those puzzle pieces are insights or artifacts in the feature space. You might notice an interaction that boosts prediction accuracy when encoded properly, or you might find a signal in non-random missing values. This is an important exercise and a valuable skill, but it can feed an “if I can’t understand it, it must not be real” bias that keeps us from next-level results.

Our minds rely on stories to organize the world. Someone asks about my work day and I have no problem recounting only the relevant details leading up to an exciting incident that was conveniently resolved by me, the obvious hero of the story. We all do this constantly. The same thing happens when you find a feature that improves model performance and you explain it. “We noticed that when it rains in the summer on the west coast, people shop less, but shopping on the east coast is unaffected by rain overall. Combining local weather with geography and time of year improved our demand predictions by 7%.” Contrast that with, “We added UV index and zip code to the features and the black box algorithm improved demand predictions by 9%.” You know the latter is better, but somehow it’s unsatisfying or even suspicious.
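The rain-on-the-west-coast story is, in feature-engineering terms, a hand-crafted interaction. A minimal sketch of what encoding it might look like, using hypothetical column names (`rain`, `region`, `season`) that are illustrative only, not from any real dataset:

```python
import pandas as pd

# Hypothetical sample data; the column names and values are invented
# for illustration, not taken from the article's actual project.
df = pd.DataFrame({
    "rain": [1, 0, 1, 1, 0, 1],
    "region": ["west", "west", "east", "west", "east", "east"],
    "season": ["summer", "summer", "summer", "winter", "summer", "winter"],
})

# Encode the story's insight explicitly: a flag that is 1 only when
# it rains in the summer in the west region.
df["west_summer_rain"] = (
    (df["rain"] == 1)
    & (df["region"] == "west")
    & (df["season"] == "summer")
).astype(int)

print(df["west_summer_rain"].tolist())  # → [1, 0, 0, 0, 0, 0]
```

The clean narrative and the engineered feature are the same object; the story is just the human-readable label we attach to the column.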

Engineer the hell out of it

It’s good to like solving puzzles and feeling clever is a nice reward, but if you want to be a bona fide data science professional, you can’t depend solely on clean explanations. If using your imagination to identify clever insights that solve a problem is one extreme, the opposite is using tools to engineer the hell out of a solution. I can build a regression model with carefully crafted features that theoretically should influence the outcome variable or I can try all pairwise interactions of features in every possible subset through brute force. The thoughtful approach requires brain power while the brute force approach requires compute power. Which is better depends on the actual problem, but understanding the difference is critical.
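The brute-force end of that spectrum can be made concrete. A minimal sketch of generating all pairwise interaction features, on a small hypothetical feature matrix (the subset-search part described above is omitted; this shows only the combinatorial expansion):

```python
from itertools import combinations

import numpy as np

# Hypothetical feature matrix: 8 samples, 5 features, random for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))

# Brute force: one product column per pair of features. With n features
# this adds n*(n-1)/2 columns — cheap here, combinatorial at scale,
# which is why this approach trades brain power for compute power.
pairs = list(combinations(range(X.shape[1]), 2))
interactions = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])

X_augmented = np.hstack([X, interactions])
print(X_augmented.shape)  # (8, 15): 5 original + 10 pairwise products
```

From here you would hand `X_augmented` to a model and let regularization or feature selection decide what matters, rather than theorizing about each column in advance.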

As you consider these extremes, you might realize your bias is to pursue quick and dirty solutions or elegant ones. I was once asked to write a script to delete a bunch of unused virtual machines. The script would take about 20 minutes to write, but I’d have to wait a few hours to get the right permissions to run it, so instead I did a few manual deletions and estimated I could do it the dirty way in less than half an hour. I silenced my notifications, set a timer, and 17 minutes later it was done. If this were something I needed to do more than once, I probably would have written the script, but it was a one-time cleanup. Quick and dirty was the way to go. This is a simple example and the stakes were low, but it raises the question: how do you adopt a mindset that allows you to approach problems with more flexibility?

Earn your way to clever

My general advice is to follow the engineered solutions to their limits, then get creative from there. For the engineers, don’t just engineer the hell out of everything, because someone will eventually find an elegant approach that makes your brute force unnecessary and wasteful. Be curious about different domains, use your imagination, and take some creative risks on the margins. For everyone else, get your hands dirty before trying to be clever. You don’t need to build everything yourself, but you need to understand what’s been done and how it works. Exponential improvements in computing power and tooling make this a never-ending process of evolution. The road to obscurity begins when you disengage from the dirty work.

In data science, elite practitioners used to be highly valued for their ability to do thoughtful feature engineering that drove performance while remaining explainable to a non-technical audience. This is not nearly as valued today. For example, advances in computer vision have demonstrated that convolutional neural networks armed with enough data and compute power can outperform decades of human-guided feature engineering in just a few days. While there are still plenty of applications where careful feature engineering is important, it’s usually due to a lack of data or prohibitive costs. The critical insight is that many valuable skills today will be reduced to a few lines of code in the future. The question you have to ask yourself is how near that future is.

In practice

As a practitioner and a human being, you have to manage the tension between your desire to explain things and your performance objectives. Sometimes, the best solutions fit nicely into a clean narrative that’s simple, generalizable, and satisfying. If you can find solutions like this, savor them. But when you can improve performance without a clear explanation, that doesn’t need to be a drawback. While I can’t give you a strategy for how to approach all problems, I can leave you with some guidance that will hopefully put you in a better position to sort these things out for yourself.

Start with the broader context and approximate the relative value of pure performance. If you’re predicting price variance for day trading stocks, explainability has zero value. No one will care if you can’t explain something that generates consistent profits. Alternatively, if you’re predicting the onset of a disease, understanding factors that are likely to cause it might help people make decisions to prevent it. Given the broader context, invest time and effort proportionally. If understanding is more valuable or more likely to drive performance, understand the data first and then engineer solutions. If brute force performance is likely to work, prioritize engineering infrastructure first and then get creative. The problem space should shape your approach, not your desire to tell a good story about the solution.
