Data Cleaning and Preparation Techniques

Abdurrahman Elkhadrawy
Data 100
Published in
5 min readAug 15, 2024
Image from dataversity.net

Data Cleaning is very important when it comes to effectively analyzing our data. Imagine going through a full scale analysis just to find that your data wasn’t accurate or had duplicate values that skewed your data to a certain direction. Ensuring the quality of our data and making sure our data is prepared for analysis is just as important as the analysis itself. Lets explore some data cleaning techniques and ways we can find and replace values and more in this article!

Date Functions (TODAY, YEAR, MONTH)

Calculating Duration

Lets say we wanted to calculate how long a customer has been with the company from the begin date to the anniversary date.

  1. In Excel we can do this by simply subtracting the cell with the anniversary date with the cell that contains begin date to get the total amount of days that have passed.
  2. However we are more interested in the years instead so all we have to do is wrap the formula is parenthesis and so some basic arithmetic and divide it by 365. So for example it will look something like this “=(C2-B2)/365” and there you go.

Extracting Parts of a Date

DAY Function: Used like so DAY(cell that contains date) to extract the day as a number from that cell.

MONTH Function: Used like so MONTH(cell that contains date) to extract the month as a number from that cell.

YEAR Function: Used like so YEAR(cell that contains date) to extract the year as a number from that cell.

TEXT Function: If instead we want a text of the day for example instead of a number like 22 but a word like Tuesday. We can do this, TEXT(select cell that contains date, “dddd”). We can put 4 d’s to signify day or 4 m’s to signify month and finally 4 y’s to signify year.

Date Functions Usage ( Click to enlarge GIF)

IF Statements

IF Function can be used to apply logic similar to conditional formatting. Lets say we wanted to figure out which of the cells is over budget or under budget. We would normally have to go through each one and see if it is and type our answer. If we have 1000’s of rows you can see why this starts to become a problem. Lets see how we can use IF functions to fix that.

Syntax: The Syntax is as follows “IF(logical_test, value_if_true, value_if_false)”.

For example in the case of finding which employee is over budget or under it could be applied like so. Assuming the budget column is B and the amount spent column is C here is the formula. “IF(C3<B2,”Under Budget”, “Over Budget”)”.

IF Statement Usage ( Click to enlarge GIF)

Freezing Panes

Sometimes while scrolling to either your right or most commonly down we lose track of the header rows. This causes to forget what our data looks like and what values belong to which column. To help avoid this issue lets use the freezing panes feature provided by excel.

  1. Lets say we wanted to freeze the first two rows of our table that contain headers. We first select the cell below it so A3.
  2. Then go to the view tab then Freeze Panes, and select Freeze Panes.
  3. If you wanted to just freeze the one row at the top we can do that without selecting any cells. Go to the view tab then Freeze Panes then simply select Freeze Top Row.
  4. To freeze columns, select the cell to the right of the columns to be frozen and apply Freeze Panes. Remember though this will also freeze the rows above it almost like the cell you choose creates a intersection. Freezing what's to the left and above it.
Freezing Panes Usage (Click to enlarge GIF)

Find and Replace

Whether you want to find particular keywords to help with your analysis and errors or replace text and values with formatting and more this tool got you covered. Lets see how we can do that.

  1. Find Function: First go to Home > Editing > Find & Select or use CTRL + F. Click options to search within the sheet, workbook, or by rows. Then its highly recommend to click Find All to list all occurrences instead of seeing them one by one. Highlight all found items using CTRL down. After that you can even use the fill to fill all the items highlighted by a specific color
  2. Replace Function: Use CTRL + H to access Replace. You can use this to replace text or formatting. We can also use formatting when it comes to find function but be careful. It will not just copy the fill color for example but the bold styling as well. so Be cautious with formatting to avoid unintended changes.
Utilizing the Find/Replace Tool ( Click to enlarge)

Removing Duplicates

One of the most common things to do during the data cleaning process is to remove duplicates. That could be rows that are entirely identical or a row that contains a single column that's the same as another. Lets see how we can fix such cases.

Highlight Duplicates: First like highlight our range that we will be working with. Then go to Conditional Formatting > Highlight Cell Rules > Duplicate Values. This will use conditional formatting to expose which values are duplicated. Which will give us a sense of what we need to clean up.

Remove Duplicates: Lets first Select data and go to the Data tab, then click Remove Duplicates. Then check columns to identify which duplicates to remove. By default all columns will be selected.

Understand that only rows with duplicates in the selected columns will be removed. That means if their are 3 columns and all 3 boxes are checked only fully identical rows will be deleted. That means if their was a duplicate in only one of the columns that won’t be deleted so we need to account for that.

Removing Duplicates ( Click to enlarge GIF)

Overall, from using DATE functions to extract specific information from our dates to using IF statements creating powerful conditionals to speed up our workflow. Excel provides for us many ways to prepare and clean our data.

I encourage you to practice these techniques especially removing duplicates to make sure that your data is well cleaned and organized before proper analysis. Once the data is cleaned we can delve deeper to organize our data to find insights we couldn't see before. Which is what we will do in the next article. Stay Tune! Good morning, Good evening and Good night.

--

--