Simulation Gone Wrong

Over Simplification Can Be No Joking Matter

Decision-First AI
Published in Charting Ahead
3 min read · Nov 5, 2018

Real world data is hard to come by, though many entry-level analysts remain blissfully unaware of it. One of my favorite interview questions used to be:

So what is the biggest data set you ever worked with?

The wrong answer was a lie, but the unimpressive truth sounded something like this:

< 1 million rows

< 20 columns

anything that included Kaggle

This is not REAL world data… not really. It is a bit like your surgeon telling you she has logged 10,000 hours on the game of Operation. I’ll pass…

Medical simulations are the real deal, or at least a lot closer to it than most open “real world” data sets. Medical schools spend hundreds of thousands of dollars creating meaningful simulations. Even the data sets found behind the paywalls of software companies or provided during boot camps tend to fall woefully short of something meaningful.

You would be forgiven for believing the opposite, assuming you had never experienced it firsthand. Mimicking the human system so that very non-digital tools like stethoscopes can be used is a somewhat daunting task. By comparison, simulating real world data should be a walk in the park. It is, after all, completely digital. And yet, medical science is winning.

Also winning: flight simulators. They build cockpits, provide full-screen vistas, and on occasion simulate turbulence, all of which goes well beyond real world data.

Even video games are getting more immersive and more real world. It is impressive enough when you consider Madden Football or NBA 2K-whatever, but even Fortnite and Halo are feeling more “real” than many training data sets.

The issue is one of feedback. Pilots, doctors, astronauts, and Pop Warner football players all know what the real world looks and feels like. Aspiring analysts have little idea what qualifies as real world data until they experience it firsthand. They are often put off when they find their data to be big, dirty, and sometimes duplicated. And so, the joke is on them.

If you want to learn real world data skills, you need real world data sets. Until a great simulator comes along, focus on apprenticeships and internships. Real data will always trump even the best simulation, but hopefully only by a little — once someone gets it right.

We are working on it:

Decision-First AI
Charting Ahead

FKA Corsair's Publishing - Articles that engage, educate, and entertain through analogies, analytics, and … occasionally, pirates!