Dark data is data that has been collected and stored but is not currently being used, typically because it is unstructured. It accumulates continuously without being organized through categorization, labeling, or any other effective method. This large store of unstructured data could yield valuable insights if it were organized and then analyzed, but for now it remains in the “dark”. Although it could strongly influence a business’s decision-making, dark data often waits indefinitely to be evaluated through data analytics.
Examples of Dark Data
One example of dark data is a customer call record. These records can hold valuable information about a customer’s opinions and geolocation, and they are routinely captured and stored, but they are rarely organized or analyzed. Another example is a website log file. These logs can hold valuable information about visitor behavior and traffic, and they are routinely collected, but they are rarely analyzed in any organized or meaningful way.
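To make the log file example concrete, here is a minimal Python sketch of how raw log lines might be turned into structured records and summarized. It assumes the logs follow the Common Log Format, which real servers may not use; the sample lines, addresses, and paths are made up for illustration, and `parse_line` and `hits_per_path` are hypothetical helper names.

```python
import re
from collections import Counter

# Pattern for the Common Log Format (an assumption; real logs may differ).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def hits_per_path(lines):
    """Count requests per URL path, skipping lines that fail to parse."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[record["path"]] += 1
    return counts


# Hypothetical sample lines, invented for this example.
sample = [
    '203.0.113.5 - - [10/Oct/2011:13:55:36 +0000] "GET /pricing HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2011:13:56:01 +0000] "GET /pricing HTTP/1.1" 200 2326',
    '198.51.100.7 - - [10/Oct/2011:13:57:12 +0000] "GET /contact HTTP/1.1" 404 512',
    'not a log line',
]

print(hits_per_path(sample).most_common())
```

Even this small step, giving each line named fields, is what moves a log from stored-but-unused toward something that can be queried for visitor behavior.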
Growth of Dark Data
According to a 2011 IDC study, 90% of digital data is unstructured data, or dark data. The study also found that the world’s digital data is doubling every two years, a pace it characterized as faster than Moore’s Law. New technologies and technological advancements are paving the way for low-cost ways to capture and store massive amounts of information. By 2011, the overall cost of capturing and storing large amounts of unstructured information had dropped to just one-sixth of its 2005 level. As capture and storage become cheaper, a cost-effective method for analyzing dark data is closer than ever.
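The two figures above can be turned into rough rates with a little arithmetic. The sketch below assumes both trends are smooth and constant, which the study does not claim, and treats 2005 to 2011 as six years; the function names are hypothetical.

```python
def growth_factor(years, doubling_period=2.0):
    """How many times larger the data volume is after `years`,
    assuming it doubles every `doubling_period` years."""
    return 2.0 ** (years / doubling_period)


def annual_cost_ratio(total_ratio, years):
    """Constant year-over-year cost ratio that compounds to
    `total_ratio` over `years` years."""
    return total_ratio ** (1.0 / years)


# Doubling every two years means a 32-fold increase over a decade.
print(growth_factor(10))  # 32.0

# A drop to one-sixth over six years implies each year costs
# roughly three-quarters of the previous year.
print(round(annual_cost_ratio(1 / 6, 6), 2))
```

Under these assumptions, data volume grows far faster than storage cost falls in any single year, which is one way to see why so much collected data ends up unanalyzed.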
Issues with Dark Data
With growing awareness and use of big data and data analytics, there is now strong demand to organize dark data and make it usable. However, this type of data is often complex, very large, and stored in multiple locations, which makes analysis difficult and costly. Nonetheless, the potential value of analyzing unstructured dark data is considerable, and it has prompted many proposed big data solutions.
Solutions to the Dark Data Problem
- machine learning, or using a form of artificial intelligence in which a program changes and improves based on a constant supply of new unstructured data
- open data, or making unstructured data available for everyone to analyze and explore
- software that converts dark data to graphics, or a program that automatically organizes data into easy-to-understand graphics
These methods are all feasible, and various companies are now actively exploring and attempting them. In the race to acquire and use the newest and most valuable big data, new and better technologies will continue to emerge. Because the field is still developing, the potential value and insights of large amounts of dark data remain undetermined.