Data management needs a new ideology
Today we shove growing data into new servers and clouds, sequester it in applications for compliance reasons and discard it in tape libraries, forever blind to what that data is actually composed of. It’s not entirely our fault. The enterprise environment is different from what it used to be. It is an evolving entity that, from the perspective of those in charge of managing it, grows and changes organically. Machines, not just users, are creating a tidal wave of new data. Line-of-business employees, not just IT, can provision a new application in seconds. No longer is data neatly compartmentalized in infrastructure that sits in the room with us. The average organization has 977 cloud applications. I dare you to name 977 cloud applications.
At Veritas Technologies we’ve watched the velocity and complexity of data growth consume our customers’ resources. It is clear that the industry should at least explore new ways of managing data. For enterprise organizations, the ideology has always been infrastructure-led: more data means more disk, more compute and more power. Moore’s law and downward pressure on disk prices have historically reinforced that thinking, but digital transformation is now accelerating far faster than those models can support.
So we thought we should start with the data instead. Rather than adding infrastructure first, we should begin by understanding the data we create, store and manage every day. What are its core attributes? What is its profile? What is the data about our data that helps us understand our environment’s genetics? Could we better diagnose and solve the challenges data brings us if we looked first at the nature of that data?

To answer these questions, we launched a community of like-minded data scientists, industry experts and thought leaders to begin the Data Genomics Project with us. The Data Genomics Project is an initiative that seeks to change the way we think about managing data. Starting from these initial questions, we believe this community will identify the data genome that matters for information management, help us build it, and share the discussion with a world struggling to cope with tremendous data growth.
At Veritas we are uniquely positioned to move our industry down this path. Because we work with tens of thousands of organizations around the globe, backing up, archiving and analyzing exabytes of data, we are the only company that can glean the defining characteristics of organizations’ environments at scale.
The first step in this project is to benchmark accurate details of real environments, from file type composition to age distribution to the size distribution of individual files. We’re pleased to contribute the Data Genomics Index to the cause. For this report, Veritas analyzed tens of billions of files and their attributes, drawn directly from many of our customers’ unstructured data environments in 2015, to provide the first accurate view of what the average environment looks like. The data about our data may surprise you. Head over to www.datagenomicsproject.org to check it out, and stay tuned for updates on the work of the Data Genomics Project.
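To make the three benchmark attributes (file type, age and size) more concrete, here is a minimal sketch of how one might profile a single directory tree along the same dimensions. It is not the methodology behind the Data Genomics Index. The bucket boundaries, the use of last-modified time as a stand-in for file age, and the names profile_tree, bucket, AGE_BUCKETS and SIZE_BUCKETS are all hypothetical choices made for illustration.

```python
import os
import time
from collections import Counter
from pathlib import Path

# Hypothetical buckets chosen for illustration only; they are not the
# categories used by the Data Genomics Index.
AGE_BUCKETS = [(1, "<1y"), (3, "1-3y"), (7, "3-7y"), (float("inf"), "7y+")]  # years
SIZE_BUCKETS = [(1 << 20, "<1MB"), (100 << 20, "1-100MB"), (float("inf"), "100MB+")]  # bytes

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def bucket(value, buckets):
    """Return the label of the first bucket whose upper bound exceeds value."""
    for upper, label in buckets:
        if value < upper:
            return label
    return buckets[-1][1]


def profile_tree(root, now=None):
    """Count files under root by extension, age bucket and size bucket.

    Age is approximated from the last-modified time (st_mtime), which is an
    assumption; other definitions of "age" are possible.
    """
    now = time.time() if now is None else now
    types, ages, sizes = Counter(), Counter(), Counter()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip files that vanished or cannot be read
            # Normalize extensions so ".PDF" and ".pdf" are counted together.
            ext = Path(name).suffix.lower() or "(none)"
            types[ext] += 1
            age_years = (now - st.st_mtime) / SECONDS_PER_YEAR
            ages[bucket(age_years, AGE_BUCKETS)] += 1
            sizes[bucket(st.st_size, SIZE_BUCKETS)] += 1
            total += 1
    return {"total": total, "types": types, "ages": ages, "sizes": sizes}
```

Calling profile_tree("/path/to/share") returns a total file count plus Counter objects keyed by extension, age bucket and size bucket, which could then be compared across environments.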