Top 5 Excel Data Cleaning Skills I Use Everyday

MargaretEfron
Learning Data
5 min readJul 31, 2023

--

Photo by Anton on Unsplash

Learning how to clean up messy data is an essential skill for any data analyst.

I wish the data coworkers sent me was always clean and ready for interpretation! But unfortunately, as a data analyst, you will be sent Excel spreadsheets from years ago, from people who no longer work at your organization, with spelling errors, incorrect classifications, and many other issues. Time to roll up your sleeves and get to work data cleaning.

Before embarking on data cleaning, make sure to create a backup copy of the original data in a separate workbook, in case you need to double-check it later.

Below are my top 5 Excel data cleaning functions that I use every day in my work as a business systems analyst.

1. Replace cell values using the “Find and Replace” feature:

If you want to replace certain cells with specific values, you can use the “Find and Replace” feature.

Press ‘Ctrl + H’ to open the “Find and Replace” dialog box, enter the values you want to find, and put the desired value in the “Replace with” field.

Click on “Replace All” or “Replace” to replace the values.

Example: Yesterday, I was analyzing student survey information where students entered their country in a text box. They entered varying values, including “United States of America.” I wanted to clean up the data so it only listed “United States.”

I used the “Find & Replace” dialog box to correct “United States of America” entries to just list “United States.”

Unfortunately, Excel does NOT currently support entering multiple values in the “Find” box, so if there are multiple values you want to replace, you will have to do the find & replace multiple times.

Note: Google Sheets is more powerful than Excel for replacing multiple values because it allows you to use full “regular expressions” in your find/replace. This is way more flexible than what Excel gives. (See this article on how to use REGEXREPLACE in Google Sheets.)

2. PROPER function to correct improper capitalization:

If you have a column with inconsistent capitalization, PROPER can help! I often use the PROPER function to correct capitalization in user-entered fields for first name, last name, and/or city and state. This is particularly helpful for cleaning up Qualtrics data that has been downloaded in an Excel spreadsheet.

Add a new column next to the column with incorrect capitalizations.

Write the function = PROPER( , highlight the cell you want to correct for capitalization, close the parenthesis, and hit enter.
You will see the proper capitalization of the cell value.

Drag down this function to apply to all the rows you wish.

Example: In this case, I needed to correct the capitalization of the “city” column, so I used the =PROPER(A2) function.

Some students incorrectly capitalized their city names! No problem, I can use the PROPER function to fix it in an adjacent column.
Once I drag down the PROPER function to cover the remaining rows, all the cities have fixed capitalization.

3. IF Function:

The IF function checks for multiple conditions and returns different values based on different variables. You can check for one value — for example, you can check if a cell value contains “Georgia,” and if so, have a corresponding column list the region as “South.”

You can also check for multiple states and return corresponding regions using nested “IF” functions.

Example: See this sample code:

=IF(A2= “California”, “West”, IF(A2= “Georgia”, “South”, “”))

In this code, Excel checks cell A2. If A2 contains “California”, it returns “West.” If it contains “Georgia,” it returns “South. If neither condition is met, it returns an empty string (“”), leaving the cell blank.

Here, I use nested IF functions to list the corresponding region for each state.
Once I apply the nested IF functions and drag down the formula to the remaining cells, the corresponding regions are listed for each of the states.

4. CONCAT Function:

This is often used when you want to create a single “Full Name” column by combining a “First Name” column and a “Last Name” column. To combine first and last names, use the CONCATENATE function.

Example: First, add a column next to the “Last Name” column called “Full Name.”

Then, enter the formula: = CONCAT(A2, “ “, B2)

I used the CONCAT function to combine the first and last names in the “full name” column.

This CONCAT function combines the first name and the last name and adds a space in between.

Press Enter, and then drag the formula down so it covers the whole column.

Now that I’ve dragged the CONCAT function down to the remaining cells, everyone has their full name listed.

5. Custom Sorting to find extreme outliers in the data:

For student-submitted survey data, often they enter information incorrectly, including their annual salary. There are a few different ways to handle this: you can set up conditional formatting so that cells above or below the average salary are highlighted.

You can also use custom sorting to sort the column values from smallest to largest, and then from largest to smallest.

Example: I custom sorted the annual salaries in this spreadsheet from largest to smallest to find any extreme outliers. Tom Smith is now listed at the top as a clear outlier — he is making $1 million a year, while his coworkers are making between $50,000-$60,000.

Custom sort the annual salaries in the spreadsheet from largest to smallest to find any extreme outliers.
After custom sorting, we can see that Tom Smith leads the pack with a $1M annual salary (or perhaps he entered his salary incorrectly.)

For correctness, be sure to custom-sort your values from smallest to largest to find any extreme outliers in the other direction as well. How you handle these outliers will depend on your organization’s practices.

— -

I hope these Excel data-cleaning tips were useful! These are the Excel functions and features I use the most often to standardize the data, correct formatting and spelling errors, and find extreme outliers. When cleaning your data, Microsoft recommends completing tasks that don’t require column manipulation first, such as spell-checking or using the Find & Replace dialog box.

Pro Tip: Use ChatGPT to enhance your Excel data-cleaning skills! Ask ChatGPT to provide sample dummy Excel data and provide the # of rows and columns you’d like. Practice cleaning the dummy data using tips from ChatGPT.

Example ChatGPT prompt: “Can you produce an Excel spreadsheet with dummy data in two columns: city and state. For city, include cities with improper capitalization so I can practice cleaning them up. Provide me with 5 rows of dummy data.”

Additional sources: Microsoft — Top 10 ways to clean your data

--

--

MargaretEfron
Learning Data

I love all things data and write about Excel, Power BI, and SQL. I currently work as a Business Systems Analyst at the Darden School of Business.