Is the Gravity of Data More than We Bargained For?
At our machine learning announcement in New York a few weeks ago, I tweeted this:
“Because it makes sense to follow the gravity of data for machine learning”
That tweet was retweeted several times; it clearly resonated with a lot of people, and it got me thinking…
Are we fighting the gravity of data with the push to the cloud?
Now don’t get me wrong, we all agree the cloud is the destination of choice. The ability to be more agile, to stop worrying about standing up or managing clusters, the instant gratification of a solution with only a few clicks… I mean, who doesn’t want that? But as it turns out, the road to the cloud is a bit more of a journey than we expected. Why?
Maybe the gravity of data is a bigger force than we bargained for.
Here’s an interesting statistic for you: 90% of the world’s data is not something you can Google.1 That’s right, it’s not sitting in some cloud just waiting for you to come and get it. Most of our data is locked up tight in our own on-premises systems.
Consider the operational data on mainframes alone, mostly from financial services, credit card companies, and the like. Pretty much every time you swipe your credit card, that transaction lands on a mainframe. In fact, most of the world’s structured data sits on a mainframe, which means much of that data is probably not being leveraged today. Essentially, it’s dark data to the companies that own it: piles and piles of data, getting bigger every day.
And this is the case with most of our operational systems everywhere. Transactions happen; it’s a lot of small data, but it’s fast data, and it’s incredibly valuable to a business. It can tell you about logistics issues, operational issues, and problems on the manufacturing floor; fraud hides in these transactions, and so do cyber attacks. This is data about our business: the way the business is running and how healthy it is. The fact is, most firms don’t fully leverage this data today, or they pull only a subset into a data warehouse and limit access by department and application.
Well, so much for the democratization of data, eh? Let’s start by opening up what we have and using that data to make us smarter. I mean, it’s there for the taking, it’s free, and it’s ready to whisper secrets about our businesses. We just need to listen.
This is exactly why we initially released machine learning on the System z platform. http://www.ibm.com/analytics/us/en/events/machine-learning/
It just made sense to start where a big chunk of really valuable data was sitting in the dark. By the way, if your company has operational data on a System z, you should also check out how data virtualization can shine a light on that dark data; @Rocket Software offers a great solution for this: http://www.rocketsoftware.com/products/rocket-data/rocket-data-virtualization
Let’s face it, you need lots of data to train a model. So let’s go where data gravity is pulling us to leverage analytics and machine learning: let’s start with our on-premises gold mines and go from there.
Ok, so for things like machine learning and even data science, I think we can all agree that we need a hybrid architecture. The true challenge isn’t getting to the cloud; it’s building a bridge so that all of your data, wherever it lives, can be leveraged.
If your destination is the cloud today (or tomorrow), don’t fight the gravity. Focus on hybrid: build a data architecture and governance model that lets you democratize and leverage data no matter where it lives.