Saradalakshmi8074
May 27, 2020

Data Journalism

Facets of Data Science…

Visualize your data with facets

In Data Science and Big Data you’ll come across many different types of data, and each of them tends to require different tools and techniques. The main categories of data are these:

  • Structured
  • Unstructured
  • Natural Language
  • Machine-generated
  • Graph-based
  • Audio, video and images
  • Streaming

Let’s explore each of these data types.

Structured Data

Fig: An Excel sheet is an example of structured data

Structured data is data that depends on a data model and resides in a fixed field within a record. It’s often easy to store structured data in tables within databases or Excel files. SQL, or Structured Query Language, is the preferred way to manage and query data that resides in databases. You may also come across structured data that is hard to store in a traditional relational database.

Hierarchical data such as a family tree is one such example. The world isn’t made up of structured data, though; structure is imposed upon it by humans and machines.
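To make the idea of fixed fields concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and rows are invented for illustration and are not taken from any real dataset.

```python
import sqlite3

# In-memory database; the table and rows are made-up illustration data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", "Sales", 52000.0), ("Ben", "Sales", 48000.0), ("Chen", "IT", 61000.0)],
)

# Every record has the same fixed fields, so SQL can address them by name.
rows = conn.execute(
    "SELECT department, AVG(salary) FROM employees "
    "GROUP BY department ORDER BY department"
).fetchall()
print(rows)
conn.close()
```

Because each row follows the same model, a single query can group and average the salaries without any custom parsing.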

Unstructured Data

Unstructured data is data that isn’t easy to fit into a data model because the content is context-specific or varying. One example of unstructured data is your regular email. Although email contains structured elements such as the sender, title, and body text, it’s a challenge to find the number of people who have written an email complaint about a specific employee because so many ways exist to refer to a person, for example. The thousands of different languages and dialects out there further complicate this.
A human-written email is also a perfect example of natural language data.
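The split between the structured and unstructured parts of an email can be sketched with Python's standard email module. The message below is invented, and the name "John Smith" is a hypothetical employee used only to show why a simple text search falls short.

```python
from email import message_from_string

# A made-up complaint email: headers first, then a blank line, then free text.
raw = """\
From: customer@example.com
Subject: Complaint

I spoke to Mr. J. Smith yesterday. John was unhelpful, and Smith never called back.
"""

msg = message_from_string(raw)

# The structured elements are easy to read by field name.
sender = msg["From"]
subject = msg["Subject"]

# The body is free text; the same person is referred to in three different ways.
body = msg.get_payload()
mentions = body.count("John Smith")

print(sender, subject, mentions)
```

The headers come out cleanly, but an exact search for the full name finds nothing, even though the whole message is about that person.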

Natural Language

Natural language is a special type of unstructured data; it’s challenging to process because it requires knowledge of specific data science techniques and linguistics.

The natural language processing community has had success in entity recognition, topic recognition, summarization, text completion, and sentiment analysis, but models trained in one domain don’t generalize well to other domains. Even state-of-the-art techniques aren’t able to decipher the meaning of every piece of text. This shouldn’t be a surprise though: humans struggle with natural language as well. It’s ambiguous by nature. The concept of meaning itself is questionable here. Have two people listen to the same conversation. Will they get the same meaning? The meaning of the same words can vary when coming from someone upset or joyous.

Machine-generated Data

Machine-generated data is information that’s automatically created by a computer, process, application or other machine without human intervention. Machine-generated data is becoming a major data resource and will continue to be one.

The analysis of machine data relies on highly scalable tools because of its high volume and speed.

Examples are web server logs, call detail records, network event logs and telemetry.
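As a rough illustration of working with a web server log, the sketch below parses one line with a regular expression. The log line is invented and assumed to follow the widely used Common Log Format; real logs often carry extra fields, so the pattern would need adjusting.

```python
import re

# A made-up line, assumed to be in Common Log Format.
line = '192.0.2.10 - - [27/May/2020:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 5120'

# Named groups turn the raw text into fields that can go into a table.
pattern = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

match = pattern.match(line)
record = match.groupdict() if match else {}
print(record)
```

Since a machine writes every line in the same layout, one pattern is enough to extract the fields from the whole file.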

Fig: Example of machine-generated data

A classic table structure is not the best approach for highly interconnected or “networked” data, where the relationships between entities have a valuable role to play.

Graph-based or Network Data

“Graph data” can be a confusing term because any data can be shown in a graph. “Graph” in this case points to mathematical graph theory. In graph theory, a graph is a mathematical structure to model pair-wise relationships between objects. Graph or network data is, in short, data that focuses on the relationship or adjacency of objects.

Graph structures use nodes, edges, and properties to represent and store graph data.

Fig: Friends in a social network are an example of graph-based data

Graph-based data is a natural way to represent social networks, and its structure allows you to calculate specific metrics such as the influence of a person and the shortest path between two people.
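A small sketch of the shortest-path idea, using a breadth-first search over a plain Python dictionary. The people and friendships are invented, and counting friends is used here only as a crude stand-in for influence, not as an established metric.

```python
from collections import deque

# Made-up friendships: each person (node) maps to their friends (edges).
friends = {
    "Ana": ["Bo", "Cy"],
    "Bo": ["Ana", "Dee"],
    "Cy": ["Ana", "Dee"],
    "Dee": ["Bo", "Cy", "Eli"],
    "Eli": ["Dee"],
}

def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        person = path[-1]
        if person == goal:
            return path
        for friend in graph[person]:
            if friend not in visited:
                visited.add(friend)
                queue.append(path + [friend])
    return None

path = shortest_path(friends, "Ana", "Eli")

# Crude influence proxy (an assumption): the person with the most friends.
most_connected = max(friends, key=lambda person: len(friends[person]))

print(path, most_connected)
```

Breadth-first search explores friends, then friends of friends, so the first path that reaches the goal uses the fewest hops.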

Graph databases are used to store graph-based data and are queried with specialized query languages such as SPARQL.

Graph data poses its challenges, but for a computer interpreting audio and image data, it can be even more difficult.

Audio, Images and Videos

Audio, image, and video are data types that pose specific challenges to a data scientist. Tasks that are trivial for humans, such as recognizing objects in pictures, turn out to be challenging for computers.
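One way to see why this is hard: to a program, an image is only a grid of numbers. The sketch below uses an invented 3x3 grayscale image stored as nested lists, with an arbitrary brightness threshold of 128 chosen for illustration.

```python
# A made-up 3x3 grayscale image: 0 is black, 255 is white.
image = [
    [0, 0, 0],
    [0, 255, 0],
    [0, 0, 0],
]

height = len(image)
width = len(image[0])

# Find the bright pixels; the threshold of 128 is an arbitrary assumption.
bright = [
    (row, col)
    for row in range(height)
    for col in range(width)
    if image[row][col] > 128
]
print(bright)
```

Finding a bright pixel is easy. Deciding that a pattern of millions of such numbers shows a cat or a car is the part that takes specialized techniques.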

Multimedia data in the form of audio, video, images and sensor signals have become an integral part of everyday life. Moreover, they have revolutionized product testing and evidence collection by providing multiple sources of data for quantitative and systematic assessment.

Various libraries, programming languages and tools are commonly used in the field, such as:

  • MATLAB
  • OpenCV
  • ImageJ
  • Python
  • R
  • Java
  • C
  • C++
  • C#

Streaming Data

While streaming data can take almost any of the previous forms, it has an extra property. The data flows into the system when an event happens instead of being loaded into a data store in a batch. Although it isn’t really a different type of data, we treat it here as such because you need to adapt your process to deal with this type of information.

Examples are the “What’s trending” on Twitter, live sporting or music events and the stock market.
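The difference from batch processing can be sketched with Python generators. The event source below is a stand-in that yields invented hashtags one at a time; a real stream would come from a network connection or a message queue.

```python
from collections import Counter

def event_stream():
    """Stand-in for a live source; yields one made-up hashtag per event."""
    for tag in ["#data", "#ai", "#data", "#python", "#data", "#ai"]:
        yield tag

def trending(events):
    """Update the counts on every event and yield the current leader."""
    counts = Counter()
    for tag in events:
        counts[tag] += 1
        yield counts.most_common(1)[0]

# Each snapshot is available as soon as its event arrives.
snapshots = list(trending(event_stream()))
print(snapshots[-1])
```

The result is refreshed after every event, so there is never a point where the whole dataset has to be loaded first.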

These are the seven important facets of Data Science…

“Information is the oil of the 21st century, and analytics is the combustion engine.”
