Industry-Level Data Handling

This will show you how you can leverage your company data effectively and efficiently

8 min readAug 7, 2023

What is data?

Based on Oxford Dictionary, data is facts and statistics collected together for reference or analysis. I repeat, data is facts.

Data is the source of truth. So, can we lie with data? Not really. We will discuss further about 6 dimensions of data quality that are mostly discussed in a lot of data management books, but in a more practical way that was found in the industry.

Data is everywhere. Data has become our closest ally (please don’t say enemy), and thanks to today's emerging technology, it is much easier to leverage data. From your daily shopping cart, Spotify playlist, to Netflix movie favorites.

Let’s check how the data is in the wilds. Covid-19 Pandemic data from COVID-19 Tracking Project and NYTimes can be a good reference for this case.

Let’s first understand what the is the meaning of these tables.

The date column represents when is the observation happen.
The state column represents where the observation happens (in the state of the US).
The cases column represents the occurrence of positive covid cases.

So we know the meaning of that data, we can interpret it like this:

On Jan, 21st 2023, the state of Washington has one positive Covid-19 cases.

Can we call this a data? Yes, because it is telling us a fact. And surely we can do a lot of analysis with this data.

Data handling is the art of managing the information from data collecting to the insight for making the right decision.

Poor data handling can lead to biased or even wrong analysis that might lead us toward making a wrong decisions. It can make us do a lot of tedious tasks because our data is so messed up that we need to fix our data first before processing it. And we don’t want that.

This short post will be dedicated to industry level data handling, meaning that it is intended for data frequently used in industry. This guide will light up your way to handle your data and make the most of it.

Tabular First

Tabular data is a row based collection of data, each row tells a single distinct information between others. This type of data is mostly found in a database, CSV, or spreadsheet. Usually, the first row of tabular data contains the header (or title) of each column.

As a rule of thumbs, stay as long as possible with tabular forms.

Let’s distinguish ourselves with the most common type of data found in the universe. To my best knowledge, there are two distinct forms of storing data:

Flat Report Style (Don’t know, but it’s what I personally called), which looks like this:

This form comes with a lots of merge cells, and the orientation is more of like columns-based data form. This type of data form is much more suitable for summary or final report that you would like to email to your managers, or even printed.

While this form of data storing might be more compact at the first glance, it will be harder to do some further analysis. You need to take an extra step to preprocess the data before you do some analysis, which is a tedious task to do.

And as soon as your data grow more prominent, you will get a headache to do the preprocessing task. I’m not really a fan of this type of data storing technique, the column based approach is not suitable to scale up. And the world has been introduced to pivot style form, which is a powerful tool to summarize your data.

Important Tips:

Please avoid using merge cells. Merge cells can easily breakdown your spreadsheet formula. Please store each value in a single cell. For visual reason, please use format cells > horizontal alignment > center across selection instead.

Use Horizontal Alignment, instead of merge cells (for visual purpose).

2. Tabular Style, looked like this:

A tabular data form is a way of storing data in a single row, and each single row contain one single information. Some people called it a database, because most database model stored in a tabular way.

Tabular data form is the most friendly type of data storing that will give you the ease of use to do further analysis. The row-based approach is suitable to scale up as soon as your data grow bigger. You just need to enter each data in a new row, no need to create new column.

Another advantages of using tabular form is that in Excel you can use filter or simply subtotal the column that contains numbers.

Every flat report style can be transformed into tabular data style, and vice versa. You can see in the picture above, that we transformed flat style into tabular by creating a new columns which are WH, Qty Pcs, and Qty Plt. We don’t need to separate each Qty Pcs and Qty Plt as they share the same information.

If you used a flat report style frequently, I highly recommend you switch as soon as possible to a tabular one. Instead, you can generate that flat report style with pivot table in less than a minutes.

1st Dimension: Completeness

This is the first dimension of data quality, which is completeness. Completeness means that our data store a minimum information about something we need. In our example, we have inventory-level information about each warehouses, in pieces quantity and pallet quantity. Completeness depends on what is your final goal. For this cases, it is enough to get the information of quantity in pieces or in pallet, per SKU to get the current inventory level of the warehouses.

Here is another practical issue when it come to completeness,

The Example of the Incompleteness of Data

In tabular format, each rows must be completely filled. You might want to redesign your tables if many cells are empty.

The form is acceptable because it is in tabular forms, but at some point date values are missing. The one that are making this report might argue that person with ID SC-0045 will work on Shift 1, on August 10 2023. And person with ID SC-0056, will be having shift 1 on August 11 2023. The report maker can easily interpret the report, but other people might not. The data itself is not clear and create ambiguity. So, please fill all the rows value completely.

Stay as Unique

One of your managers contact you, and assigned you a project to make a warehouse personnel data in your company. This data will be used by your colleague in another project to schedule personnel shift, not limited to other department that might want your data in the future.

So, your approach should be,create a data in tabular form, and start the data collection process. As the process finished, the data will be looked like this:

As you progressed, you might see that your data is complete. But, there is something missing. The void of indexing.

Here is the 2nd Dimension of data quality introduced: Uniqueness.

As we store our data in a single row, we need to create a unique key (index) for each personnel record. This is a critical steps, as this uniqueness will state that each record have a unique index to reference to. In data modeling terminologies, it is called a primary key, but let’s stay simple with unique key (UK).

Important Tips:

Indexing (unique key) at each record is a must. Without indexing, finding your data will be like finding needle in a haystack.

You can create the unique key (UK) all you want but the most common rule is that each key must be unique between each other, and no duplication allowed. UK should be permanent, meaning no change allowed. You might be tempted to use email or phones number as a unique key, but they are vulnerable to changes. You might want to avoid that.

For this example, you might assign unique key SC-0023 to John Doe, SC-0045 to Daniel Dane, and SC-0098 to Justin Petterson.

Warehouse Personnel Data, but with Unique Key

Be Consistent

In today collaborative era, each of individual in the company can work together to achieve one or two things. Let’s say, you are in production department and you are in charge in creating a production plan for the next two weeks.

Inconsistency happened in the organization

In your report, you stated “Latte Machine” to have a production target of 10 tons for the next two weeks. Then, there is Jessica from Finance Dept. that needed your data to calculate the budget and the pricing. Jessica has been used the word “Latte Wood Machine” instead. Then, there is a guy, named Jared. He need your data to calculate the resources needed for the production capacity. And he used “Latte” (OH PLEASE).

This chain of inconsistency is like a snowball effect. Each person used their own term, while referencing to the same things. Imagine when you want to create a report that consolidate from multiple person datas.

This is the third dimension of data quality: Consistency.

Be consistent, make sure your data is consistent on data that stored elsewhere.

There is another data quality that are interesting to be explained together with consistency, which is Validity. Validity will ensure that your data meet its spesific format. For example, in stock level, there will never be minus inventory. Another example is when dealing with dates. Dates is a tricky datatypes. In US, the dates commonly in MM/DD/YYYY, then in Europe the dates become DD/MM/YYYY.

Example of invalid data when the date doesn’t conform to the same format.

Other Data Quality

There are other data quality that will not be explained in details such as accuracy, and timeliness.

When Product Qty of 34 units, we want our data to show that the true quantity of product in our warehouse is exactly 34 units. Same when the datas show routing distance of 10 km, of course we want the actual distance as close to 10 km. Accurate is dominantly used during the data collection phase

Timeliness is how your data is actually get updated. When you entry the data into the ERP, then ERP system inputed in 15 hours, late 2 hours from normal setting. So, timeliness will measure how realiable your data in realtime. We will not cover it as it is much more technical side.

End Notes

Thanks for reading, follow me to read more about Supply Chain, Data Analytics, and Optimization.

Industry-Level Data Handling

This will show you how you can leverage your company data effectively and efficiently

Tabular First

Important Tips:

Stay as Unique

Important Tips:

Be Consistent

Other Data Quality

End Notes

Written by Rihot Gusron