How to Use Data Cleaning to Clean up Your Customer Data

Noah Wieder | US Data API Blogs
6 min readJun 4, 2019

--

Too long to read? Listen to audio version!

There’s a joke in data management that data scientists spend 80% of their time cleaning data, and 20% of their time complaining about data cleaning.

It’s hyperbole. But, the reality is that most businesses have really dirty data. This is especially true with customer data. Customer data is easily falsified. It’s plagued by data entry errors. It decays incredibly fast. And customer data is difficult to correct.

That’s why, on average, 30% of a company’s customer data is incorrect. And one-fifth of companies believe that over half of their customer data is inaccurate. This presents a strong case for data cleaning.

Clean customer data improves almost every aspect of your business. Any customer profile that has incorrect or missing information is essentially a dead lead since you can’t contact that customer with the data you have. Dead leads in your database are wasted space.

Cleaning your customer data turns incomplete and incorrect leads into usable ones. This means:

  • More efficient customer acquisition.
  • Smoother customer communication.
  • Better overall customer experiences.
  • More productive business operations.
  • Better data analysis for strategic and tactical decision making.
  • Increased revenue.

As you can imagine, bad data will have a negative impact on all of these areas. So, it pays to know about data cleaning and to implement good data cleaning tools and processes.

I’ll answer all the important questions in this article, starting with the most essential one: what is data cleaning?

What is Data Cleaning?

Data cleaning — also known as data cleansing — is finding incorrect records in a dataset and removing or replacing them with clean data.

This can be tricky with customer data because there’s no master database for validating the data. And manual data verification is extremely time-intensive, not to mention ineffective. So, you’ll need data cleansing tools in order to cleanse your customer lists.

While it’s true that there’s no single source of all customer data, the data is available. It’s just dispersed throughout a lot of databases. And that’s the trouble with getting correct personal information.

Data cleansing tools automate the process of checking all these separate databases and retrieving the information you need. Good data cleaning tools check millions of data sources to get accurate data.

But, even with good data cleaning tools, there’s still a process for cleaning data. The complete data cleaning process can be broken down into two broad data cleaning steps:

  1. Identify and fill in missing values.
  2. Correct existing data.

That’s the gist of cleaning customer data. As you might have guessed, these steps break down into smaller steps. Those steps are the next topic.

Data Cleaning Steps

Even with the right tools, there are right and wrong ways to go about cleaning your data. A good data cleaning process will get the best results. Proper data cleansing goes like this:

1. Standardize data organization and formatting. Before you can use any data cleaning tools, your data needs to be properly organized. Each piece of personal information (name, address, phone number, email, etc.) needs to be in its own column. Mixed information in a single column often confounds data cleaning systems.

First, export your data from your database, or if you already have it in Excel, you’re ready to standardize your data.

When you do this, also ensure that each piece of information is correctly formatted and that the entries are valid. Remove entries that have letters or non-numeric values where there should be only numbers (such as zip codes and phone numbers) and entries with invalid characters (like @ or ‘ symbols in names or physical addresses).

You can do all of this for free with the built-in data validation functions in a spreadsheet editor.

2. Append missing data. Missing data and incorrect data are equally unusable. However, finding missing data and correcting existing data are slightly different processes.

Adding missing information to an incomplete data-set is called “data append.” To do this correctly, you’ll need to separate your customer lists based on what data is missing. Create separate lists of entries missing phone numbers, entries missing addresses, and so on.

Data append services use the data you have, to find the data you’re missing. You’ll need to specify which pieces of information you need (i.e. addresses, phone numbers, etc.). If you upload a list that’s a mishmash of different data append needs, you’ll get an error or poor results.

Only remove entries from your list if they can’t be completed through a data append service and can’t be used without the missing information. Otherwise, you could lose a lot of customer information.

Once you’ve appended all the missing data to your lists of incomplete entries, these lists will be good to go. The data append service always adds the most up-to-date information it finds. So, there’s no need to clean the information that’s been appended to your lists.

If you want, you can use these lists in the next step to check the accuracy of the data you already had for each entry.

3. Update and correct existing data. Now you’ll be left with a list or lists of complete customer data entries that need to be checked for accuracy and updated. Even though you’re cleaning this data, you’ll use the same data append service that you used for completing your incomplete entries.

First, you’ll need to make a duplicate of your list. This gives you a reference, so you can tell which entries have been updated, even if you don’t keep the old list.

Open the duplicate list you created. Remove the information you want to update from all the entries. So, if you want to ensure all the addresses are correct, remove all the addresses from the list. The data append tool will add the most up-to-date information available to these fields.

You may have to run your list a couple of times if you need to update every piece of information. Typically, you can update several fields at a time, it just depends on what they are.

If you have entries complete with names, addresses, phone numbers, and emails, you’d need to run your list twice — once to update the phone numbers and addresses, and one more time to update the email addresses.

If you plan on using the email addresses or phone numbers for outbound marketing, you should also verify the email addresses and phone numbers. Data cleaning will gather the emails and phone numbers associated with the names and addresses you have.

But, you also want to know if phone numbers are on the do not call list if the email is an unmonitored inbox or a spam trap, and other details that cause issues if you’re not aware of them.

Once you’ve processed your lists and updated all the information, you have two choices:

  1. Simply replace all the old information on your original list with the updated information.
  2. Compare your lists, analyze your data accuracy, and only replace outdated information.

Either method will work for correcting data inaccuracies. The second just gives you more information for identifying issues in your data collection and management processes.

Once you’ve cleaned your lists, you should have a lot more usable customer profiles. You can plug your clean lists into your telemarketing, email marketing, direct mail campaigns, and any other customer outreach channels you use.

You’ll get a much better return after you’ve cleaned your lists because you’ll hit far fewer dead ends.

If you need data to append and list cleaning tools to round out your data cleaning strategy, check out the Searchbug batch services and API.

Or learn more about improving your data quality.

--

--

Noah Wieder | US Data API Blogs

Exploring US data APIs. Everything from identity verification, to lead gen, to compliance. Follow for insights to help businesses leverage data effectively.