Big Data
What is Big Data?
In simple terms, a very large amount of data is called big data.
Let me explain it from a different angle:
Every day, devices such as your mobile phone and computer produce data, much of which is then stored on remote computers known as the ‘Cloud’. Now think for a minute: it is not just your devices. Millions of people use mobile phones and other devices, so the amount of data produced every day is enormous.
The data is not uniform: it can be video, audio, text, numbers, or any other format. Simply having a huge amount of data is of no value. Without the ability to process it, it is just data occupying space. The real value, or the ‘Magic’ per se, lies in processing it efficiently and extracting insights from it. If you have come across the term ‘Big Data Analytics’, this is what it refers to.
The volume of data generated is so large that it cannot be stored in traditional systems such as a relational database on a single computer. For big data, a monolithic approach is rarely the best choice in terms of performance for the money spent.
A good example is the logs generated by systems (servers) running 24x7. The volume of logs is huge, and the structure of a log can differ from system to system. Since enterprise deployments are heterogeneous, the logs can be structured, semi-structured, or unstructured. They cannot all be stored in a single system that enforces strict rules on data storage, e.g. a database.
Let’s explore these categories of data in more detail.
Structured Data
Data that follows a strict schema (a set of rules) is called structured data. It is the easiest kind of data to process: you write a regular expression or a rule to extract information, and it applies to the entire data set. Since the data obeys the schema and the extraction/processing rule is written with respect to that schema, the rule is valid throughout.
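As a minimal sketch of this idea, the snippet below applies one regular expression to every record of a small data set. The log layout (date, level, service, message), the sample lines, and the names `LINE_RULE` and `parse_structured` are all made up for illustration; they do not come from any particular system.

```python
import re

# Hypothetical log lines that all follow one strict schema:
# <date> <LEVEL> <service> <message>
STRUCTURED_LOGS = [
    "2024-01-05 INFO auth User logged in",
    "2024-01-05 ERROR billing Payment declined",
    "2024-01-06 WARN auth Password retry limit reached",
]

# A single rule written against the schema; it is valid for every record.
LINE_RULE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<service>\w+) "
    r"(?P<message>.+)$"
)


def parse_structured(lines):
    """Apply the one schema rule to each line and return a list of dicts."""
    records = []
    for line in lines:
        match = LINE_RULE.match(line)
        if match is None:
            # With a strict schema, a non-matching line is an error, not a special case.
            raise ValueError(f"line violates the schema: {line!r}")
        records.append(match.groupdict())
    return records


records = parse_structured(STRUCTURED_LOGS)
# Extract the services that reported an error.
error_services = [r["service"] for r in records if r["level"] == "ERROR"]
```

Because every line obeys the same layout, the parser needs no branching: one rule covers the whole data set.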
Semi-Structured Data
Data that follows a schema loosely is called semi-structured data. The data follows a schema overall, but the schema itself is liberal enough to accommodate variations in the data. This type of data is slightly harder to parse, because the liberal parts of the schema allow special cases, and the processing code has to account for them.
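The special cases can be sketched with a few JSON records, a common semi-structured format. The field names (`user`, `tags`, `age`), the sample events, and the `normalize` helper are assumptions made up for this example. Every record has a `user`, but the other fields may be missing or may change type.

```python
import json

# Hypothetical events: "user" is always present, everything else varies.
SEMI_STRUCTURED_EVENTS = [
    '{"user": "alice", "tags": ["admin", "ops"], "age": 31}',
    '{"user": "bob", "tags": "guest"}',
    '{"user": "carol", "age": "unknown"}',
]


def normalize(raw):
    """Parse one event and smooth over the variations the loose schema allows."""
    event = json.loads(raw)

    # Special case 1: "tags" may be missing, a single string, or a list.
    tags = event.get("tags", [])
    if isinstance(tags, str):
        tags = [tags]

    # Special case 2: "age" may be missing or may not be a number.
    age = event.get("age")
    if not isinstance(age, int):
        age = None

    return {"user": event["user"], "tags": tags, "age": age}


normalized = [normalize(raw) for raw in SEMI_STRUCTURED_EVENTS]
```

Compared with structured data, most of the code here deals with variations rather than with the extraction itself.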
Unstructured Data
Data that does not obey any schema is called unstructured data. It is the hardest to process, since there is no schema to refer to. Representation-level operations can be done, but higher-level operations are costly in terms of time and processing capacity.
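As a rough illustration, the sketch below runs a few cheap operations on a piece of free text. It reads "representation-level operations" as operations on the raw bytes and words, which is one interpretation rather than a fixed definition. The sample note and variable names are made up.

```python
from collections import Counter

# Hypothetical free-text note: no schema, no fields to refer to.
NOTE = "Server rebooted at night. Nobody knows why the server rebooted."

# Representation-level operations: they only look at bytes and words.
size_in_bytes = len(NOTE.encode("utf-8"))
words = [word.strip(".,").lower() for word in NOTE.split()]
word_counts = Counter(words)

# The most frequent words are easy to get...
most_common = word_counts.most_common(2)
# ...but a higher-level question such as "what caused the reboot?" cannot be
# answered by counting; it needs far heavier processing than this sketch does.
```

Counting and measuring work on any input, but they say little about what the text actually means.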
Big data is worthless unless there is a way to process it!