How I cleaned my First Dataset Using Microsoft Excel

Bolanle Olajide
5 min readDec 9, 2023

--

So, I just dived into the world of data analysis, and let me tell you about cleaning my very first dataset in Microsoft Excel.

Introduction:
In order to ensure accurate and trustworthy findings from data analysis, cleaning dirty or unclean data is an essential step in the process. We will go over step-by-step methods in this guide for efficiently cleaning and organizing data in Microsoft Excel.

Step by Step Process into Cleaning a dirty data.

The fact that the material I used is in a text format must be noted first.

1. As can be seen in the picture below, I opened my Microsoft Excel application first in order to clean my unclean data. I next selected Open, followed by Browse.

2. After that, I selected the "dirty file" and then clicked Open. The image below shows the prompt that appeared after selecting the "open" option.

3 The Delimited option is automatically selected by Excel in the image above. I then clicked on Next, which brought up another prompt and automatically selected the Tab option. Next, I selected the semicolon option, and last, I clicked on the Finish function.

After that, the dirty data is delivered and prepared for cleaning!

In order to know what we will be working with, it is expected of a data analyst to thoroughly examine the data. In this case, the data relates to every employee in the organization.

Now let’s clean up the dirty data properly. I started by changing the names in the first column from upper case to proper case. I changed it to a proper case by clicking on column B first, then using the insert function to insert another column, which brought up an empty column B.

Next, I clicked on the cell I wanted to effect and the proper function (=PROPER(A1)).

Finally, I hit the tab key on my keyboard. The outcome of this is shown in the picture below. We may observe that the name has changed.

I then used autofill to make the necessary changes in the other cells. The content in column B was then pasted into column A using the copy and paste function. Next, I thicked the Value option. Below, you can see pictures

Next, I selected Column C, then I selected Insert to add an additional column, and last I selected Cell C1 and entered the (=value (cell B1)).
Next, I pressed the tab key on my keyboard.This is the result that appears below after I autofill the result.

Then I selected copy and paste special into cell B.I then changed the format to Short date by clicking on the number tab, as you can see below.

The outcome

I cleaned Column C using the same method as Column B, but I replaced the format of Short Date with Currency. The outcome can be seen below.

I proceeded to Column D and made the same case changes to the characters as I had in Column A. Below is the outcome.

I then used “substitute function" to change the abbreviated words to full words. Using =SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(PROPER(D1), “Admin”, “Administration”), “Mktg”, “Marketing”), “R & D”, “Research and Development”) . The result is seen in the picture below.

Regarding Column E, I used the same method. I created a new column, changed the text in column E1 to proper case using the PROPER function, and then affected the remaining text in column E using Autofill. Then, just like I did in the previous columns, we copy and paste special.

At this point, the dataset already appears appealing. Simply glance at it .

With each column, I made a header. Here’s the final product after I highlighted every character and used the Ctrl T function to turn everything into a table. And this below is the Clean Data

--

--