
4 Things I Overlooked in Data Science

Lessons I Didn’t See Coming — Until I Did

5 min read · May 18, 2025

Opening Thoughts

When I look back on my journey in data science, it’s not the technical milestones that stand out — it’s the mindset shifts. The things I overlooked, misunderstood, or underestimated ended up shaping my growth far more than I anticipated.

This isn’t a reflection on textbook knowledge or tutorial walkthroughs. It’s about the lessons that only surfaced with experience — gradually, and often in hindsight. Here are four things I came to realize I had been overlooking.

Simplicity Isn’t Boring — It’s Effective

In my early days, I often rushed into using complex models — neural networks, gradient boosting, and support vector machines — because they sounded impressive. They were trendy, “smart” tools that felt like the mark of a competent data scientist. But looking back, that approach wasn’t always rooted in understanding the problem. It was more about proving something, often at the cost of clarity and relevance.

The truth is, complexity doesn’t guarantee better performance. A well-applied linear regression can outperform a black-box model if used in the right context. But simple models often feel underwhelming. You think, “Surely I can do more than this.” I struggled to accept that sometimes the best solution is also the simplest one.

I’ve since learned to choose models based on the problem — not how advanced they sound. Model complexity should never be a badge of honor. If anything, simpler solutions are preferable: they’re more interpretable, more transparent, and easier to communicate. Dismissing them is often just pride getting in the way.
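One habit that keeps me honest here is benchmarking a simple baseline against whatever fancier model I'm tempted to reach for, under the same cross-validation split. Below is a minimal sketch of that habit using scikit-learn; the synthetic data and model choices are illustrative assumptions, not a recipe from any particular project.

    # A rough sketch: compare a plain linear baseline against a more complex
    # model under the same cross-validation split. The synthetic dataset is a
    # hypothetical stand-in for your own features and target.
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import cross_val_score

    X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

    models = [
        ("linear baseline", LinearRegression()),
        ("gradient boosting", GradientBoostingRegressor(random_state=0)),
    ]

    for name, model in models:
        scores = cross_val_score(model, X, y, cv=5, scoring="r2")
        print(f"{name}: mean R^2 = {scores.mean():.3f} (+/- {scores.std():.3f})")

If the gap between the two turns out to be small, the interpretability of the baseline usually wins.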

Sometimes, the easy way out is the smart way in.

What Are We Willing to Give Up?

In a sense, data science is an attempt to explain human behavior through numbers. But how much can data really tell us about people? Can we extract meaningful insights, or are we just fitting patterns to noise?

We rely on theoretical models to represent problems of interest, often bringing along a host of assumptions — some explicit, others subtle. In practice, this might include:

  • Independence of observations — Each data point is treated as unrelated to the others.
  • Stationarity — In time series, we often assume statistical properties (like mean and variance) don’t change over time.
  • No missingness bias — We assume missing data is random, not systematically absent.
  • Sufficient data — We believe we have enough data to generalize from.
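Most of these assumptions can at least be probed before modeling. As a hedged illustration (not a description of any specific project of mine), here is a small Python sketch using pandas and statsmodels to check two of them, missingness and stationarity; the DataFrame and the "sales" column are made-up stand-ins.

    # A rough sketch of sanity-checking two assumptions before modeling.
    # `df` is a hypothetical DataFrame holding a numeric series called "sales".
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.stattools import adfuller

    rng = np.random.default_rng(42)
    df = pd.DataFrame({"sales": rng.normal(100, 5, size=200)})
    df.loc[df.sample(frac=0.05, random_state=42).index, "sales"] = np.nan  # simulated gaps

    # No missingness bias: at minimum, know how much is missing before assuming it's random.
    missing_share = df["sales"].isna().mean()
    print(f"Missing share of 'sales': {missing_share:.1%}")

    # Stationarity: an Augmented Dickey-Fuller test on the observed values.
    adf_stat, p_value, *_ = adfuller(df["sales"].dropna())
    print(f"ADF p-value: {p_value:.3f}  (a small p-value suggests stationarity)")

Checks like these don't prove the assumptions hold, but they surface the violations you'd otherwise discover only after the model misbehaves.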

Early on, I didn’t always take these assumptions seriously. I often focused more on getting a working model than questioning the foundation it stood on. But over time, I’ve come to realize that overlooking assumptions isn’t just a technical oversight — it’s a philosophical one.

We need to pause and ask: Are these assumptions reasonable in the context of the problem? Are the results still valid even if they aren’t perfectly met?

The truth is, we can never fully capture the real world in a model. Human behavior is messy, context-dependent, and intangible. The theories we apply in practice are just tools for approximating that complexity. They don’t reflect reality perfectly — they only bring us closer to understanding it.

And that’s the point. It’s not about finding the “perfect” model — it’s about recognizing what trade-offs we’re making. Every model simplifies reality. The real question is: what are we willing to give up in exchange for insight?

Right Answer, Wrong Question

What drives business? It’s not just accuracy — it’s speed and impact. Sometimes, a mediocre model delivered quickly is more valuable than a high-performing one that takes weeks to deploy. The same goes for solutions that are technically impressive but hard to implement. In the real world, the question always circles back to: How does this help the business make money?

Coming from an academic background, I didn’t fully grasp this at first. I was used to working with clean, curated datasets — often artificial or simplified for the sake of learning. In that environment, I had the freedom to explore every angle, optimize every metric, and push for the “best” solution in a theoretical sense.

But the transition to the real world caught me off guard. It’s not just about building the most accurate model or exploring every statistical nuance. It’s about practicality, speed, and alignment with business needs. Most importantly: Is my solution answering the right question?

We often spend so much energy optimizing models that we forget to ask whether we’re solving the right problem in the first place. And in the business world, I’ve learned that asking the right question is often more valuable than knowing all the right answers.

Speaking the Right Language

When I first started out, I didn’t know how to communicate with stakeholders without diving deep into the technical weeds. I’d talk about loss functions, regularization techniques, model tuning — all the intricate details I had spent hours working through. But to the people in the room, none of that really mattered. They weren’t data scientists — they were product managers, executives, marketers. What they wanted to know was: What did we learn? What does it mean for the business? What do we do now?

As data scientists, we live in a world of nuance and precision. We obsess over statistical significance, model selection, and data caveats. But it’s easy to forget that stakeholders are operating on limited time, focused on outcomes and action, not methodology. Our responsibility isn’t just to find insights — it’s to communicate them in a way that drives decisions.

Over time, I learned to shift my focus from the “how” to the “so what.” Instead of walking through the entire modeling process, I started distilling my work into key takeaways: What problem were we solving? What did the data say? How confident are we in the result? What’s the next step?

This wasn’t about oversimplifying or sugar-coating. It was about meeting people where they are, and framing insights in a way that’s relevant to their goals. Your work doesn’t just need to be right — it needs to be practical and faithful to the business context.

The ability to bridge the gap between technical depth and business clarity isn’t just a “soft skill” — it’s a core part of being an effective data scientist.

Closing Remarks

These aren’t the kinds of lessons you pick up in a classroom. They come from working with messy data, real-world constraints, and real people. School teaches you the tools, but experience teaches you how to use them with purpose.

The most valuable things I’ve learned in data science weren’t technical — they were the things I didn’t even know I was missing.
