How much is your virtual data mattress costing you?
By Andrew C. Oliver
Want to know the most common business mistake made today? Data hoarding.
Thanks to the buzz of Big Data, organizations are stuffing data into socks and drawers and mattresses, waiting for that rainy day that will likely never come. This stored data doesn’t ask the right questions or solve the right problems; it just sits and waits. The whispered promises of Big Data are rarely fulfilled in these old piles of moldy data. To really deliver value, a data strategy has to do more than follow the mantra of “store first, ask questions later.”
Ask questions first
Compliance aside, does this data bring you joy? What is it and why are you storing it? What value does it add to your business today, in real time? Is this data stored because of the concept of Big Data, or is it actually of use? The answers will go a long way towards determining the value of the data and minimizing the cost of storing and managing it from the outset.
There are questions that require n-dimensional cubes in order to deliver efficient answers, but many can and should be answered in real time, because those answers hold their maximum value if delivered today. Today’s sales compared to yesterday’s, trending patterns in behavior or fluctuations in demand, patients waiting for beds, or the status of a supply chain: these are questions whose answers have value today. They also reinforce the fact that data has to be relevant in order to do its work.
The truth is that most data expires. It has a lifecycle that, once exceeded, leaves you with little more than a puddle of ones and zeroes that takes up space and, more importantly, requires financial resources for its upkeep that would be better spent elsewhere. Big Data may have ignited innovation in storage and analytics and insights, but it has also given everyone the excuse to keep on with bad data habits. Maybe you will find a use for it one day. Maybe. “Let’s keep it on the off chance it’ll become useful,” or “Wouldn’t want to erase it and then find we could have used it.” You know the psychology all too well.
Stop hoarding, build a more robust data strategy, and pay attention to the questions
A long-term and resilient data strategy focuses on what the data is, what its proximate and ancillary values are, and the cost of storing it in operational, analytical and offline systems.
Such a strategy does more than examine the value of the data; it also includes a plan to carefully dispose of the data once the cost of storing and managing it has exceeded its value. This cost is becoming an increasingly important consideration. Budgets will be far less forgiving of data hoarding as the cloud takes over. The costs of transporting data to cloud storage and storing data there for the long term will increase dramatically, rapidly eroding any value that the data can deliver as it decays in the dark.
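To make the disposal rule concrete, here is a minimal back-of-the-envelope sketch in Python. Everything in it is an assumption made for illustration, not something from a real system: the `Dataset` fields, the exponential "half-life" model of decaying value, and every number in the example. It simply finds the first month in which keeping a dataset costs more than the dataset is estimated to be worth.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    """Hypothetical description of a stored dataset (all fields are assumptions)."""
    name: str
    size_gb: float
    monthly_cost_per_gb: float    # assumed storage + management cost per GB
    initial_monthly_value: float  # assumed business value in month 0
    half_life_months: float       # assumed: value halves every N months


def monthly_cost(ds: Dataset) -> float:
    """Flat monthly cost of keeping the dataset around."""
    return ds.size_gb * ds.monthly_cost_per_gb


def monthly_value(ds: Dataset, age_months: int) -> float:
    """Estimated value in a given month under the assumed exponential decay."""
    return ds.initial_monthly_value * 0.5 ** (age_months / ds.half_life_months)


def disposal_month(ds: Dataset, horizon_months: int = 120) -> Optional[int]:
    """Return the first month in which cost exceeds value, or None if
    that never happens within the planning horizon."""
    for month in range(horizon_months + 1):
        if monthly_cost(ds) > monthly_value(ds, month):
            return month
    return None


if __name__ == "__main__":
    # Illustrative numbers only.
    clickstream = Dataset("clickstream", size_gb=5000, monthly_cost_per_gb=0.03,
                          initial_monthly_value=800, half_life_months=6)
    print(clickstream.name, "dispose after month:", disposal_month(clickstream))
```

The model is deliberately crude. The point is not the formula but the habit: write down what the data costs, write down what it is worth, and decide in advance when the first number overtakes the second.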
The business questions that are being answered through platforms such as Teradata and Hadoop should be answered at the operational level, with an operational database that supports massively parallel processing (MPP). There needs to be a way to replicate the data in real time and to segment resources so that analytical queries don’t affect customers.
These systems exist! They’re not the figment of an anti-hoarding imagination. But they do require a fundamental change in how the organization approaches data and the systems that surround it. Give it a thought or two.
Andrew C. Oliver learned to code when he was 8. He founded the Apache POI project and served on the board of the Open Source Initiative. He writes a column for InfoWorld and is the Director of Product Marketing and Evangelism for Couchbase.