Does Your Data Have a Structure?

Haripriya Sakethapuram
Apr 27 · 3 min read
Photo by David Clode on Unsplash

Historically, organizations have mainly focused on structured data for any kind of visualization or analysis. Traditional systems and reporting still rely on this form of data.

However, as the world evolves and semi-structured and unstructured data sources grow rapidly, businesses are evolving too! They are now exploring all three kinds of data, interpreting them and acting on the insights.

This article walks you through what structured, semi-structured and unstructured data are, with examples. It also presents a few differences between them to help you understand them even better.

Before we proceed, let’s see what data and a data structure are.

Data is a set of facts, such as numbers, descriptions and observations, and it is fundamental to business decisions. It comes in many forms.

A data structure is a collection of data values, the relationships among them and the operations that can be applied to the data.

Structured Data

Structured data is information that has been formatted and transformed into a well-defined data model. It resides in predefined formats and is typically stored in a relational database (RDBMS). This means it conforms to a tabular format with rows and columns.

This kind of data is created by both humans and machines. You will find structured data in financial data, bar codes, machine logs, customer star ratings and so on.

Example: Excel files and SQL databases
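
As a minimal sketch of what "rows and columns" means in practice, the Python snippet below stores a few customer star ratings in an in-memory SQLite table. The table name `ratings`, its columns and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database; the table, columns and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratings (customer_id INTEGER, product TEXT, stars INTEGER)"
)
rows = [(1, "kettle", 5), (2, "kettle", 3), (3, "toaster", 4)]
conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", rows)

# Because every row follows the same schema, querying is straightforward.
avg_by_product = dict(
    conn.execute(
        "SELECT product, AVG(stars) FROM ratings GROUP BY product ORDER BY product"
    ).fetchall()
)
print(avg_by_product)
conn.close()
```

Every row has the same three fields, which is what lets a single query summarize the whole table.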

Unstructured Data

Unstructured data is stored in its native format until it is extracted for analysis. It lacks a predefined format and, because of its nature, does not fit neatly into the rows and columns of a relational database. This lack of structure makes it more difficult to search, manage and analyze, so in the past businesses often discarded it. However, the ability to store and process unstructured data has grown greatly in recent years. It is typically stored in data lakes, which keep data in its raw form, rather than in traditional data warehouses.

The amount of unstructured data is much larger than that of structured data. You will find unstructured data in social media activity, surveillance imagery, audio files, satellite imagery and so on.

Example: Word documents, media logs, text files and PDFs
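
To illustrate why unstructured data is harder to analyze, here is a small Python sketch. A free-text review has no fields to query, so any structure has to be extracted first. The review text and the regular expressions are illustrative assumptions, not a general-purpose extraction method.

```python
import re

# A made-up piece of free text: there are no columns or keys to query.
review = "Bought the kettle on 2021-04-27. Works great, I'd give it 5 stars!"

# Structure has to be pulled out with pattern matching (or heavier tools).
date_match = re.search(r"\d{4}-\d{2}-\d{2}", review)
stars_match = re.search(r"(\d) stars", review)

extracted = {
    "date": date_match.group(0) if date_match else None,
    "stars": int(stars_match.group(1)) if stars_match else None,
}
print(extracted)
```

The patterns only work for text phrased this particular way, which hints at why unstructured data takes more effort to process than a table.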

Semi-Structured Data

Data may not always be structured or unstructured; there is another category between them called semi-structured data. Semi-structured data is a mix of both structured and unstructured data. It has definite characteristics but does not conform to the rigid structure that a relational database requires.

Photos on a smartphone are semi-structured data. Every photo captured with a smartphone contains unstructured image content. However, it is also tagged with a location, time stamp and device ID. After being saved on the device, it can be tagged further, for example as a pet or dog, or with a person’s name. In this case, you find structure in otherwise unstructured data. Interesting, isn’t it?

Example: XML data, JSON documents, EDI and CSV
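
The photo example can be sketched in Python with a small JSON document. The file names, keys and values below are hypothetical; the point is only that each record is labeled with keys, yet the records do not all share the same fields.

```python
import json

# Two made-up photo records: both have keys, but not the same keys.
raw = """
[
  {"file": "IMG_001.jpg", "timestamp": "2021-04-27T10:15:00", "location": "Austin", "tags": ["dog"]},
  {"file": "IMG_002.jpg", "timestamp": "2021-04-27T11:02:00", "device_id": "A1B2"}
]
"""
photos = json.loads(raw)

# Keys give the data some structure, yet each record may carry different
# fields, so missing values must be handled explicitly.
for photo in photos:
    print(photo["file"], photo.get("location", "unknown"), photo.get("tags", []))

# The union of keys across records is larger than any single record's keys.
all_keys = sorted({key for photo in photos for key in photo})
print(all_keys)
```

A relational table would need a fixed set of columns up front, whereas here each record simply carries whatever fields it has.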

To further explain our topic without any technical terms, let’s use the analogy of a job interview process.

For context, a structured interview is one in which the type of questions and the order in which they are asked are predetermined by the HR team. This remains the same for every candidate.

On the other hand, an unstructured interview is one in which the type of questions and the order in which they are asked are purely at the interviewer’s discretion and could be entirely different for each candidate.

In a semi-structured interview, the interviewer combines the structured and unstructured approaches. Here you will find a few predetermined questions, but in a flexible order. The interviewer also has a window for building rapport and asking follow-up questions.

Now let’s look at a few differences between structured, unstructured and semi-structured data.

*A schema influences how data is stored and accessed.

*A tuple is one of the four built-in data types in Python used to store collections of data (the others are list, set and dictionary). A tuple is ordered and unchangeable.
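
As a brief illustration of "ordered and unchangeable", the Python sketch below reads an item from a tuple by position and then tries to modify it. The sample values are made up.

```python
# A tuple keeps its items in order and cannot be changed after creation.
record = ("kettle", 5, "2021-04-27")  # made-up sample values

first_item = record[0]  # items are accessed by position

try:
    record[1] = 4  # attempting to modify an item
    modified = True
except TypeError:
    modified = False  # tuples are immutable, so this branch runs

print(first_item, modified)
```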

Businesses have access to massive amounts of structured, semi-structured and unstructured data from different sources in all formats. Separating the data according to its type is the first step toward getting the most out of it. Hope you enjoyed reading this article!

Originally published at https://www.numpyninja.com on April 27, 2021.
