Data Cleaning Checklist in Excel

Esther Wairimu
2 min readJul 28, 2024

--

Creating a Checklist in Microsoft Excel to help in Cleaning and Preparing the Brazilian E-commerce Data for Analysis

image from pixabay

This article gives an introduction to the analysis project.

To use the Olist Brazilian E-commerce dataset required me to clean the datasets and use them for analysis in SQL and Tableau. Since there were nine datasets, I created a cleaning checklist to avoid forgetting which steps I took to clean the spreadsheets and save time. Thankfully, the people who make the datasets available in Kaggle do a great job in providing clean and well-described datasets, and I did not have a lot of cleaning to do. It must have been a lot of work cleaning the raw data and making it presentable. Yes, let people get paid.

I got lots of tips on how to clean spreadsheets on YouTube from Edgecate, Alex the Analyst, and Kenji Explains. These videos by Excel Campus-Jon and Chester Tugwell helped use Text to Columns, to convert dates into date format. There are many more useful tips on Tik Tok too, but these are what I used.

Here is what the checklist looks like:

I had a third column that showed whether the task was complete or pending.

A challenge I encountered was trying to translate the reviews into English. Microsoft Excel has a limit to the number of rows it can translate, and using the GOOGLETRANSLATE function in Google Sheets would cause the spreadsheet to hang. In the long run, I did not translate the reviews dataset.

However, I used the reviews score field and counted the rows that had review comments when I created visualizations for the Tableau dashboard. I was still able to continue with my analysis in Microsoft SQL Server and Tableau using the other datasets and extract meaningful insights.

To view the articles on SQL and Tableau analysis, click here and here.

Thank you for reading.

--

--