What is Data Science?

A log of my current understanding.


To anyone reading this looking for a textbook definition, I think you might have to look elsewhere. As a student learning more about data science every day, I thought it would be a cool idea to periodically log what I actually think data science is. Tracking progression, as many data scientists know, can lead to some really cool insights, and I hope to give my future self some entertainment along the way. So as of May 11th, 2014, these are my thoughts on data science.

The bread and butter

Data science, to me, is the process of making sense of large sets of data. It’s strange that it is called a ‘science,’ because I feel that it is more of a procedure than a study. I am 100% sure that the field has every reason to be called a science, but I think of data science more as a collection of skills and tools aimed at extracting value from data.

I am a total fighting game fanatic (games like Street Fighter, Marvel vs. Capcom), and players often use a term called ‘bread and butter’ to describe a particularly reliable combo. These combos usually do solid damage in the game, and players can probably perform their bread and butter combos in their sleep.

Strangely enough, I think data science has some interesting parallels with fighting games. Data scientists often have to come up with a strategy to ‘defeat’ their opponent, and they use all their skills and ‘bread and butter’ combos to carry it out. In Street Fighter, you can’t just flail and rely on long combos to win; successful players, like data scientists, know what they need to look for and focus all their skills on a particular strategy. In the data scientist’s case, the bread and butter can be a variety of things: statistics knowledge, Python chops, machine learning techniques. The main takeaway is that they ALWAYS point towards a winning strategy.

What comprises an actual data scientist?

Switching gears to something a little more academic than Street Fighter: a data scientist, in my humble opinion, needs three specific areas of expertise to be successful. In no particular order:

  • Programming or software know-how
  • Statistics sense
  • Business intuition

The first bit, I must admit, clearly reflects the bias of the company I want to work for, Twitch. After getting a big chunk of advice from one of their data scientists, I realized that data science is simply not possible without the programming chops to facilitate the process. Whether the tool is Python, R, SAS, or even Excel, data scientists need a way to tame all the mess in big datasets. Data just doesn’t organize itself!

Secondly, data scientists need a solid foundation in statistics to properly conduct analysis on data. People with a statistics background will just have an easier time spotting trends, catching anomalies, and choosing the proper techniques to use. Take, for example, two lists: [10, 8, 12] and [1, 28, 1]. For the sake of simplicity, let’s just imagine that these two lists are too long to size up with mental math. A typical summary would report that both datasets have an average of 10, but we can clearly see that these two lists are vastly different. Someone without a statistical mindset might be inclined to report that both datasets are similar based on the average alone, and would miss the huge amount of variance in the second list. This is referred to as the flaw of averages, which is a pretty big deal.
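To make the two-list example concrete, here is a quick sketch in Python using only the standard library. The names steady, spiky, and summarize are just labels I made up for illustration, and I am assuming the population standard deviation is a fair way to measure "spread" here; other measures would tell the same story.

```python
from statistics import mean, pstdev

# The two lists from the example above (names are made up for illustration).
steady = [10, 8, 12]
spiky = [1, 28, 1]


def summarize(values):
    """Return (mean, population standard deviation) for a list of numbers."""
    return mean(values), pstdev(values)


steady_mean, steady_spread = summarize(steady)
spiky_mean, spiky_spread = summarize(spiky)

# Both means are identical, so the average alone cannot tell the lists apart.
print("steady:", steady_mean, round(steady_spread, 2))
print("spiky: ", spiky_mean, round(spiky_spread, 2))
```

Both lists come out with a mean of 10, but the spread is roughly 1.6 for the first list and roughly 12.7 for the second. Reporting the spread next to the average is a cheap way to avoid calling the two datasets similar.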

Lastly, data scientists need to have a clear business intuition to be successful. Above all, data scientists extract VALUE from large sets of data, and that ‘value’ isn’t just an interesting trend or a weird data point. Data scientists must provide business value, something that the company as a whole can use. The insight must also be actionable, and must make decisions easier for the people who have to make them.

And that’s what I think of data science so far! As a graduating senior, I like to see my graduation not as an end to learning, but rather as just another milestone in an ongoing process. Please do not take this blog entry as law. I would love it if anyone would correct me on anything I’ve written!

Happy hacking,

Christian