Big Data Defined

In simple terms, as IBM put it: “Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few. This data is Big Data.”

Put another way, Big Data usually refers to data sets whose size is beyond the ability of commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time. Big Data sizes are a constantly moving target; as of 2012, they ranged from a few dozen terabytes to many petabytes of data in a single data set.
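To get a rough sense of scale, the figures above can be converted between units. The following is a minimal sketch in Python, assuming decimal (SI) units where 1 terabyte is 10^12 bytes; binary units would give slightly different numbers. The helper name to_unit is illustrative only.

```python
# Decimal (SI) byte units. This is an assumption: binary units (TiB, PiB)
# would give slightly different figures.
TB = 10**12  # terabyte
PB = 10**15  # petabyte
EB = 10**18  # exabyte

# 2.5 quintillion bytes, the daily figure quoted above (2.5 * 10^18).
daily_bytes = 25 * 10**17


def to_unit(n_bytes, unit):
    """Express a byte count in the given unit."""
    return n_bytes / unit


print(to_unit(daily_bytes, EB))  # 2.5
print(to_unit(daily_bytes, PB))  # 2500.0
print(to_unit(daily_bytes, TB))  # 2500000.0
```

In other words, the quoted daily volume is 2.5 exabytes, which is 2,500 petabytes, or about a thousand times the "many petabytes" of a single large data set.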

Big Data spans five dimensions:

1. Volume

Many factors contribute to the increase in data volume: transaction-based data stored through the years, text data constantly streaming in from social media, increasing amounts of sensor data being collected, and so on. In the past, excessive data volume created a storage issue. With today’s decreasing storage costs, that issue has largely receded.

2. Velocity

According to Gartner, velocity “means both how fast data is being produced and how fast the data must be processed to meet demand.” RFID (radio frequency ID) tags and smart metering are driving an increasing need to deal with torrents of data in near-real time. Reacting quickly enough to keep up with that velocity is a challenge for most organizations.

3. Variety

Data today comes in all types of formats, from traditional databases to hierarchical data stores created by end users and OLAP (online analytical processing) systems, to text documents, email, meter-collected data, video, audio, stock ticker data and financial transactions. By some estimates, 80 percent of an organization’s data is not numeric, yet it still must be included in analyses and decision making.

4. Variability

In addition to the increasing velocities and varieties of data, data flows can be highly inconsistent, with periodic peaks. Is something big trending on social media? Is a high-profile IPO looming? Is there news about an upcoming event? Daily, seasonal and event-triggered peak data loads can be challenging to manage, especially when social media is involved.

5. Complexity

When you deal with huge volumes of data, it comes from multiple sources. It is quite an undertaking to link, match, cleanse and transform data across systems. However, it is necessary to connect and correlate relationships, hierarchies and multiple data linkages or your data can quickly spiral out of control. Data governance can help you determine how disparate data relates to common definitions and how to systematically integrate structured and unstructured data assets to produce high-quality information that is useful, appropriate and up-to-date.

A few examples of Big Data (figures as of 2012):

  • 10,000 payment card transactions are made every second around the world.
  • Walmart handles more than 1 million customer transactions an hour.
  • 340 million tweets are sent per day (nearly 4,000 tweets per second).
  • Facebook has more than 901 million active users generating social interaction data.
  • More than 5 billion people are calling, texting, tweeting and browsing websites on mobile phones.
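
The per-second, per-hour and per-day figures in the list above are easier to compare on a common time base. The following is a minimal sketch in Python that only converts the numbers already quoted; the function and variable names are illustrative.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR  # 86,400


def per_second(count, period_seconds):
    """Average rate per second for `count` events spread over a period."""
    return count / period_seconds


# Figures taken from the list above.
tweets_per_second = per_second(340_000_000, SECONDS_PER_DAY)
walmart_per_second = per_second(1_000_000, SECONDS_PER_HOUR)
card_transactions_per_day = 10_000 * SECONDS_PER_DAY

print(round(tweets_per_second))    # 3935
print(round(walmart_per_second))   # 278
print(card_transactions_per_day)   # 864000000
```

These are averages only. As the Variability dimension notes, real loads arrive in peaks, so actual per-second rates at busy moments can be well above these values.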