The Future of Data Science

Oleg Rogynskyy
Dec 18, 2015

The modern data era began with exactly what its name suggests: a deluge of data. For the past several years, organizations have struggled just to store the amount of data they generate, much less do anything with it. Business Insider quotes Professor Patrick Wolfe, Executive Director of University College London’s Big Data Institute, as saying that only 0.5% of the world’s data is currently being analyzed. But all that is about to change.

I believe that when historians look back, they will divide the current data era into three stages: diagnosis, prediction, and prescription. Today, many organizations are still struggling with the first stage, diagnosis. By diagnosis I mean the ability to look at an organization’s entire dataset and draw basic conclusions about what’s going on (using techniques such as clustering and anomaly detection), much as a doctor does with a patient. For example, while working with a large credit card provider, we found that a one-cent purchase was a strong indicator of home furniture purchases. Why? Because the US Postal Service requires people registering a change of address to accept a one-cent charge to verify that the request is legitimate. People are much more likely to buy home furnishings shortly after moving, hence the relationship between the one-cent charge and home furnishing purchases.
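A finding like the one-cent signal is, at bottom, an association between two events in transaction data. One simple way such a signal could be quantified is "lift": the furniture-purchase rate among customers who had a one-cent charge, divided by the overall furniture-purchase rate. The sketch below assumes hypothetical per-customer flags and invented toy data; the `Customer` and `lift` names are illustrative, and this is not the method used in the engagement described above.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical per-customer flags derived from transaction history.
    had_one_cent_charge: bool
    bought_furniture: bool


def lift(customers: list[Customer]) -> float:
    """Return P(furniture | one-cent charge) / P(furniture).

    A lift well above 1.0 means customers with a one-cent charge buy
    furniture at a higher-than-average rate.
    """
    total = len(customers)
    buyers = sum(c.bought_furniture for c in customers)
    flagged = [c for c in customers if c.had_one_cent_charge]
    if total == 0 or buyers == 0 or not flagged:
        raise ValueError("need customers, furniture buyers, and flagged customers")
    p_furniture = buyers / total
    p_furniture_given_flag = sum(c.bought_furniture for c in flagged) / len(flagged)
    return p_furniture_given_flag / p_furniture


# Hypothetical toy data: 10 customers, 3 of whom had a one-cent charge.
toy = (
    [Customer(True, True)] * 2
    + [Customer(True, False)]
    + [Customer(False, True)]
    + [Customer(False, False)] * 6
)
print(round(lift(toy), 2))  # → 2.22
```

In this toy data, 2 of the 3 flagged customers bought furniture versus 3 of 10 overall, so the flagged group buys at roughly 2.2 times the baseline rate.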

Call me an optimist, but I believe that the machine learning technology needed to draw conclusions like the one above will soon become ubiquitous. However, that’s just the first step. What some forward-looking (and resource-rich) organizations like Google and Facebook are now doing goes beyond simple diagnosis and into prediction. Prediction is the true promise of machine learning: the ability of organizations to make educated “guesses” about future events based on current datasets. For example, Cisco uses H2O to score the 60,000 models it runs to predict customer purchase decisions. The idea behind these models is simple: take terabytes of data on customer actions, plug it into what amounts to a giant calculator, and have it tell you what a new or returning customer is likely to buy next, and when.
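To make "scoring a model" concrete: a trained model takes a new customer's features and returns a probability of some future event, such as a purchase. The sketch below trains a tiny logistic regression with plain batch gradient descent and then scores two customers. It does not use H2O, and the features, labels, and function names are hypothetical stand-ins chosen only to illustrate the idea at toy scale.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(w, b, x) -> float:
    """Predicted probability that a customer with features x buys again."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features: [purchases last quarter, support tickets opened]
X = [[5, 0], [4, 1], [6, 0], [0, 3], [1, 2], [0, 4]]
y = [1, 1, 1, 0, 0, 0]  # 1 = bought again (hypothetical labels)
w, b = train_logistic(X, y)

print(f"active customer:   {score(w, b, [5, 1]):.2f}")
print(f"inactive customer: {score(w, b, [0, 3]):.2f}")
```

A production system running tens of thousands of models would differ in almost every detail, but each individual scoring call reduces to the same step: features in, probability out.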

The value of being able to make these kinds of predictions is self-evident, and organizations that fail to adapt to the new data era by leveraging prediction are destined to fall behind. However, the ultimate promise of the data era is not prediction but prescription. Prescription goes a step further than prediction: instead of just telling you what is going to happen, it tells you what you need to do to achieve the best possible outcome. In other words, prescription uses your data to evaluate every possible way you could respond to a specific event and tells you which response will produce the best results.
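One simple reading of "evaluate every possible way you could respond" is: for each candidate action, use a predictive model to estimate the outcome, then pick the action with the highest expected value. In the sketch below, the actions, predicted conversion probabilities, and net values are all hypothetical, and the two lookup tables stand in for whatever predictive model an organization actually has; `prescribe` and `expected_value` are illustrative names, not an existing API.

```python
from typing import Callable, Sequence

# Hypothetical model output: predicted conversion probability per response.
PREDICTED_CONVERSION = {"do nothing": 0.05, "email discount": 0.12, "sales call": 0.30}
# Hypothetical net value of a conversion after the cost of each response.
NET_VALUE = {"do nothing": 100.0, "email discount": 90.0, "sales call": 60.0}


def expected_value(action: str) -> float:
    """Expected payoff of an action: P(conversion) times net value."""
    return PREDICTED_CONVERSION[action] * NET_VALUE[action]


def prescribe(actions: Sequence[str], value_fn: Callable[[str], float]) -> str:
    """Evaluate every candidate action and return the one with the best expected value."""
    if not actions:
        raise ValueError("no actions to evaluate")
    return max(actions, key=value_fn)


choice = prescribe(list(PREDICTED_CONVERSION), expected_value)
print(choice, expected_value(choice))
```

With these made-up numbers the sales call wins (0.30 × 60 = 18, versus about 10.8 for the discount email and 5 for doing nothing): its higher conversion rate outweighs its lower margin. The prediction supplies the probabilities; the prescription step is the comparison across actions.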

For example, imagine our Cisco models not just drawing conclusions about what customers are going to do, but also identifying which specific products and features the company should be developing right now to serve them in the future. The idea is that when Joe Sixpack at Company X calls to place a large order of routers, you already have exactly what he needs ready for him before he even picks up the phone. Prediction anticipates future needs; prescription tells you how to start serving those needs before they even arise.

We’re just beginning to understand how data can improve our lives. Although the current options may seem overwhelming, given time they will become as natural as using a search engine to find answers to your questions online. I would suggest that in the future you won’t need to ask those questions. Your algorithms will do it for you.
