My First Data Cleanup Journey using Excel

Oluwaseyi Akinsola
4 min readDec 7, 2023

--

A man doing data cleaning

INTRODUCTION

This is my first time cleaning up data, and it feels interesting. The data comes in a .txt file; seems confusing at first, but with the help of my instructor, I’m exploring this new world.

My job is to organize this digital mess. With a lot of curiosity, I’m starting by putting the data into Excel. Follow along as I figure out this new adventure of cleaning data, turning all the confusion into something clear and understandable.

Data Cleaning

Data cleaning, also known as data cleansing or data scrubbing, is the process of identifying and correcting errors, inconsistencies, and inaccuracies in datasets. This mean filling in missing details, correcting errors, or making sure everything follows the same rules. See it like cleaning up your room — you want everything in the right place and looking good. Similarly, data cleaning makes sure that the information is in good shape, so when people use it, they can trust it to be right.

OBJECTIVE OF THE PROJECT

The objective of this project is cleanup the dirty data. Find out if there is any inconsistencies, incomplete data

Data Quality issues observed

Wrong data types and inconsistency of data

Incomplete values

Steps in Data Cleaning

Step 1: Open Excel and Start a New Worksheet

Open Microsoft Excel and start a new worksheet where I want to import the data.

Step 2: Locate and Open the .txt File

Click on the “File” tab and select “Open.” Navigate to the location where your .txt file is stored and open it.

Step 3: Import Wizard Opens

Excel opens the Text Import Wizard since it recognizes I am trying to open a text file. The wizard helps me set up the import.

Step 4: Choose Delimited and Click “Next”

In the Text Import Wizard, select “Delimited” as the data type and click “Next.” This tells Excel that your data is separated by specific characters.

Step 5: Select Delimiters

Text Import Wizard

I Choose Semicolon as the delimiters that I used in the .txt file. Click “Next.” and then “Finish”.

Step 6: Data Cleaning

Here is where the real work begins. First, I familiarize myself with the dataset, which pertains to employee information. Then, I commence the data cleaning process. I observe an irregularity in the Date column, and to address this, I use the =VALUE(B1) formula to convert its values. Subsequently, I format all rows to adhere to the date format.

Upon reviewing the Salary column, I format it to display currency.

For consistency, I prefer to have my text formatted as sentence case. To achieve this, I utilize the =PROPER function, ensuring that all text initials are capitalized.

I also notice abbreviations in the Department column, specifically “MKTG” for Marketing and “R & D” for Research and Development. I employ the SUBSTITUTE function to replace these abbreviations with their corresponding full forms.

These steps contribute to the refinement of the dataset, making it more uniform and conducive for further analysis.

=SUBSTITUTE(SUBTITUTE(PROPER(D1), “Admin”, “Administration), “Mktg”, “Marketing”), “R & D”, “Research and Development”)

Substitute Function at work

My data is almost ready for analysis. I’ve defined clear headings for each column, including “Full Name,” “Date of Appointment,” “Salary,” “Department,” and “Roles.” To enhance organization, I’ve transformed the entire dataset into a table. This not only improves visual clarity but also provides a structured format for efficient data handling. With these steps, the dataset is now well-organized and primed for further exploration and insights.

Structured the entire dataset into a table format

Now, the data looks much better and is easier to work with. The dates, salary, and text are all in good shape. Departments make more sense, and the table layout is clean. These changes set us up for better analysis and understanding of the information. The data is ready for the next steps!

Thank you for following along as I shared how I turned messy data into clean data.

--

--