The Data Manifesto: Why Data Science Matters

Data science is important. Heck, it is important enough that many CEOs are now asking their CFOs or CMOs “are we doing any big data?” And it’s common enough that even my mom knows that I work in “data science”.

Most of us have read at least one story about a company that used data science to predict the future, revolutionize its business, or even disrupt an industry. While some know that data scientists “create models”, “code” and “make predictions”, scarcely anyone knows what that means or what data scientists really do.

The answer is that we, data scientists, find patterns in events.

Why is finding patterns important? Imagine a world where most people can describe the events around them, while a few possess the skills to see the connections between those events.

The first group sees that after heavy rains, the rivers flood and people have to take shelter on higher ground. They describe it as:

Flood happens -> People Flee -> Go to higher ground

1607 Flood — Burnham Woodcut

The second group notices that while the flood brings misery, the floodplains tend to produce more crops the following year:

Flood happens -> Floodplains produce more food next year

Egyptians farming in the floodplains after a flood

The second group are the people whose skill set will eventually be known as data science. In a nutshell, pattern finding leads to an understanding of the connections between events, and to predictions that can help solve a problem or find a new resource (“productive floodplains”) to improve lives.

Human beings are good at finding patterns, as our early survival and progress depended on it. If you were an early hunter-gatherer, your well-being depended on finding patterns in the migratory routes of game animals (“more food”), or on seeing a shadow in the bush and deciding it was a mountain lion (“don’t get eaten”). Unfortunately, since survival favored overestimating the danger in the second case, we also became good at seeing patterns where none exist. Consequently, even today, we tend to produce a lot more “false positives” when we do mental estimations.

The other reason we produce more false positives is that most of us are not made for experimentation such as the “A/B test”, another favorite tool of data scientists. This again has to do with human evolution: imagine the same mountain lion, now real, staring at two of our forefathers, Joe and Jeff. Joe sees the lion and runs for his life, while Jeff reasons: “let me run an A/B test: instead of running like Joe, I will stand here and see what happens”. Early A/B test experimenters like Jeff, if they existed, were largely taken out of the gene pool because of their fondness for running A/B tests.

What has this story got to do with data science? With the help of modern tools and technologies, data scientists do two things well: (i) find patterns in a codified manner that can be shared across organizations and industries, and (ii) create “experiments” where the proposed models can be tested with real-life data and improved over time. Data science is radical because it enables flexible, computationally driven pattern finding, and because it brings scientific experimentation, consistency and objectivity to problem solving.
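To make the “false positive” idea concrete, here is a minimal sketch in Python of what a codified experiment can look like. It simulates many A/B tests in which both variants are, by construction, identical, and counts how often a standard two-proportion z-test still declares a “winner”. Everything in it is an assumption chosen for illustration: the function names, the 200 visitors per variant, the 10% conversion rate, and the conventional 1.96 cutoff (roughly a 5% significance level).

```python
import math
import random


def two_proportion_z(success_a, n_a, success_b, n_b):
    """Z statistic for the difference between two observed proportions."""
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No variation at all (e.g. zero conversions in both groups).
        return 0.0
    return (success_a / n_a - success_b / n_b) / se


def false_positive_rate(trials=1000, n=200, true_rate=0.10, seed=0):
    """Share of simulated A/B tests that look 'significant' by pure chance.

    Both variants convert at the same true_rate, so every detected
    difference is a false positive.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        a = sum(rng.random() < true_rate for _ in range(n))
        b = sum(rng.random() < true_rate for _ in range(n))
        if abs(two_proportion_z(a, n, b, n)) > 1.96:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"False positive rate: {false_positive_rate():.1%}")
```

Even with no real difference between the variants, a small share of the simulated tests should cross the cutoff. The point of running the experiment in code is that this error rate is known and controlled in advance, rather than left to our lion-spotting instincts.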
