Simple Ways to Do Data Cleaning in MS-Excel

Lakshmi Prakash
Design and Development
8 min readJul 1, 2022

It’s easy to manually edit and clean a small dataset, but what when you have several tens of columns and several tens of thousands of rows or more? Excel spreadsheets are one of the most commonly used tools to store, read, and analyze data, so let’s start with a few tips today to clean large datasets on MS-Excel.

Why Should You Clean Your Data?

One of the major reasons to clean your dataset is to get accurate results. You want to know that your datasets don’t have several duplicates or irrelevant values that you knowingly take into account. The added advantage of a clean dataset is that it becomes easily readable, hence easily comprehensible for anyone in your team or anyone you’re planning to present your data to. (Of course, we are only talking about datasets on Excel spreadsheets here and basic data visualization.)

Simple Data Cleaning in MS-Excel

Making Data Readable:

To begin with, consider the column widths — does the data look so congested that you can’t even read the contents of many rows because the column width is the default one or it’s shorter than the content length? You can easily change the width of the columns in Excel. For example, let’s look at this one below: you can’t make out what is written in columns G and H, no? What you can do now is clicking Ctrl + A to “select all” and then choose some column’s header, put your cursor on the right side of the header cell, and drag it towards the right. This would re-arrange the entire dataset for you, increasing the widths of all columns just enough to make the content of all cells readable.

A dataset with data too congested and column widths being set to default

Now, see, the data looks far more readable and easy on the eyes.

The same dataset with column widths increased to make it more readable

Deleting Blank Rows:

It’s only common to have blank rows in datasets. You might want to delete blank rows because if you don’t, not only would the data not look consistent and readable but it will also stop you from performing certain actions that Excel is very well known for!

For example, see, Ctrl + A should let me select all the data, but see, when I click on a cell randomly and use Ctrl + A, it selects only a part of the data, that is the blank rows above and below the selected area act like blockers, preventing Excel from selecting them all. This is just an example.

Ctrl + A not selecting the entire dataset because of blank rows

Nobody’s lying when they are saying Excel is highly powerful with a lot many in-built functions. It makes life really, really easy for statisticians, researchers, analysts, accountants, and those who have to work with data.

Removing Duplicates:

Excel has an in-built function which will help you remove duplicates, too. And that’s often needed in data analysis. The same person could have made two or more entries in a survey by mistake. You could have several people who might have made such mistakes. In those cases, you must delete or remove duplicates from your dataset because your data analysis wouldn’t be showing actual, honest results. You can easily do this by going to “Data” and clicking the “Remove Duplicates” button.

Removing Duplicates

In this example, you can see that row 3 is a duplicate of row 2. There is only one value that separates row 6 from rows 2 and 3, and that is the “size” value. Now, when you hit “Remove Duplicates”, Excel will give you options, letting you pick your own definition of “duplicates”. If you want a row to be deleted *only* if every single value is a duplicate of the original entry, then let all the boxes/column names be ticked (which is the default option you will see). But if you want to have only one “staples” entry, then uncheck all other column names, and let only “Item” be ticked. This will tell Excel that when the same Item name is repeated anywhere in the list, it has to be treated as a duplicate (irrespective of what other columns’ values are) and be deleted. You also get to draw boundaries and select a range within which you want duplicates deleted. Now, because there is an empty row between the first six rows and row 8, when the selected cell is anywhere above the empty row and “Remove Duplicates” is used, it won’t affect any information o row 8. This way, you can pick and choose which duplicate data you need to remove.

Automatically Rewriting Commonly Used Words to Avoid Typos:

Let’s say whoever created the dataset is either not all that good in English or they were in a hurry, maybe, so there are far more typos than you can imagine. Well, if it’s commonly used words in Excel, we can easily get Excel to rewrite them for us. You can see what I mean -

You have to just rewrite or edit the content in first cell in the list (first row or first column), then use the fill-handle and drag it, so Excel will understand that that’s the spelling, spacing, and format you want for all those in the list, with just the numbers alone changed accordingly.

Data with Typos
Using the Fill option for automatic filling

Tired of Dragging the Fill-Handle?

Tired of dragging the fill-handle down long spreadsheets? Well, this can be annoying if you’re working hard to meet a deadline. When you know that starting from a certain row onwards, everything below that row should contain the same information, then instead of dragging, just double-click the fill-handle. Be they empty columns or pre-filled columns where you want the entries changed, this move will automatically change it all for you.

Deleting Unwanted/Outdated Data:

Let’s say you figure that a certain number of rows are no longer required or make sense in your dataset anymore. This could be wrong information or irrelevant or outdated, which happens often. Removing unnecessary data is part of data cleaning. Now, how can you delete/remove a certain group of data?

Going back to the stationery store dataset, let us say you are no longer interested in notepads and you want to remove them all from your dataset.

You can select the header of the first column, go to “Data”, then click “Filter”. This will add a down-facing arrow/triangle to the column header. If you click the same, you will see the default option showing a list of items that come under this section, with every box ticked. What you need to do is uncheck them and select only “notepad”. This will result in only the rows which have their first cell entry as “notepad”. Now, you can delete them.

Using the Filter function in Excel

Rewriting/Changing Values to Update Your Dataset:

Using the same example we see above, we can use the filter function to separate a select group of data. Now, if there are updates I want to make or typos I want to clear or changes to be made to all of these rows, since I already have it filtered, I can go make the change in the first row, then use the fill tool to drag and update all the values.

For example, if I no longer deal with “medium” sized notepads, then I can change it to “large” by juts making the change in the first row and using fill. Once changed/updated/modified, I can remove the filter by clicking the triangle/arrow and checking “select all”.

You can do this in different ways. You can use “Find & Replace” on the Home page. You can select a set/range of data, go to home, then click “Find & Replace”, then type the exact word that you want to find, click “Find All”, then open the Replace tab on the same pop-up, enter the word, and click “Replace All”. This is the preferred option for many.

Using “Find & Replace” on MS-Excel

You can also present your cleaned up data as a simple bar chart or pie chart. All you have to do is select your data, right click and choose “Quick Analysis” and open the “Charts” tab there. This would give you a list of options. Click “More” and open the “All Charts” tab. You can pick the style you want, and you’ll get to see a quick, simple graphical representation of the data. See, you don’t need to master advanced tools for data visualization if your project ideas are simple. Excel will do!

A Simple Bar Chart on MS-Excel

You can do much more powerful data cleaning using more advanced techniques. These cover only the basics, meant for simple and easy data cleaning, say for your school project or something. If you are interested, you might want to consider learning advanced MS-Excel. You will surely not regret it, especially if you aim to become a data analyst or data scientist, and in that case, this is a good place to start.

--

--

Lakshmi Prakash
Design and Development

A conversation designer and writer interested in technology, mental health, gender equality, behavioral sciences, and more.