MarTech in the Time of Data Deluge

In the digital marketing era, data cleansing is more important than ever. Here’s how to clean it up.

Andy Hoover
Slalom Data & AI
6 min read · Sep 29, 2022

Photo by Mikael Blomkvist from Pexels

Data is critical to growth in the digital age

You need it. You need lots of it. And actually, you probably already have more than you know what to do with!

Worried? Don’t be. You can work with the data deluge. New software becomes available every day, and much of it can help you mix, mingle, and match data so you can surface your target customer and deliver an optimized message, personalized just for them.

But, what does that look like?

Today, we have the ability to merge disparate data points to understand a single person so holistically that our MarTech forebears would blush. With the right data, you know that “Jane” subscribes to your emails, is a frequent web shopper, and favors patterns and specific colors. With the right data, you can learn that Jane shops every three weeks, buys a lot of red clothes, and wears a size 12. Shoes? Size 6. With the right data, you can target Jane while she’s shopping, selectively recommending only items in her size and favorite colors.

Ooh, that’s nice. Right?

Not so fast — clean that data!

Choosing, buying, and implementing a CRM, CDP, or other data warehouse is exciting — so exciting, in fact, that data hygiene is often overlooked. Whether you’re using Oracle or Salesforce or a dozen Excel sheets in a bedazzled three-ring binder, you must not ignore the importance of data cleansing and quality. If you want to win Jane’s interest, purchase selection, and brand loyalty in the digital age, remember these four nonnegotiable tenets of data hygiene: field-level validation, primary keys, duplicate prevention, and regular checkups.

1. Field-level validation: Data that makes sense

Wherever you collect your data, you must set up some basic parameters (constraints that define the valid values for a field).

How?

Make sure you have a rule that prevents junk like “email@a.a” from entering your pristine database. Instead, confirm that email addresses end in a valid top-level domain such as .com, .org, or .edu. Confirm it’s com and not c0m or cm. Apply similar constraints for phone numbers and any preference or personal data entered by users. Drop-down pick lists, as opposed to freeform text fields, are great for this. Field-level validation is quick, easy, and worth the extra 5% of effort when configuring data collectors.
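Here is a minimal sketch of what such a rule could look like in Python. The allow-list of endings and the function name is_valid_email are assumptions made for illustration, not part of any particular platform, and the pattern is deliberately simple rather than a full email grammar.

```python
import re

# Hypothetical allow-list of top-level domains; extend it to fit your audience.
ALLOWED_TLDS = {"com", "org", "edu"}

# Deliberately simple shape check: local part, "@", domain labels, letters-only ending.
EMAIL_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+([A-Za-z]{2,})$")


def is_valid_email(address: str) -> bool:
    """Return True if the address has a plausible shape and an allowed ending."""
    match = EMAIL_PATTERN.match(address.strip())
    if match is None:
        return False
    return match.group(1).lower() in ALLOWED_TLDS
```

With this rule, “email@a.a” fails the shape check, while “jane@example.c0m” and “jane@example.cm” fail because their endings are not on the list. A real form would likely need a longer list than three endings.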

2. Primary keys: Keys to the kingdom

A primary key is a database field (or combination of fields) that defines a unique record — some attribute of your data that every record must have and that cannot be duplicated in a table. Most marketers and database administrators use email address or an ID number. You’ll need to decide what defines a unique record and what is a duplicate.

For example, if partners Jane and Joe share one email address, is each of them stored as a separate customer record? Or will one record suffice since it’s just one inbox, one address to which you’ll send your latest email campaign or newsletter? Ultimately, the nature of your business and digital ecosystem should guide your decision.

A primary key can’t be duplicated within a table, so it’s great for bouncing forgetful returners, email sharers, or fakes. A primary key does not have to be an email address either. If you decide Jane and Joe should represent two independent records, you can:

  • Assign each record a globally unique identifier (GUID), a randomly generated 128-bit value, when a user hits “submit” on your e-commerce site.
  • Assign a “composite” primary key — itself defined as the unique combination of multiple fields — such as email + first name. Using a composite primary key means records for Jane and Joe can co-exist harmoniously without the extra effort of implementing a random ID maker.
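Both options can be sketched with Python’s built-in sqlite3 and uuid modules. The table names, column names, and the shared address family@example.com are made up for illustration; your own database will have its own schema and its own way of generating IDs.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")

# Option 1: a random identifier generated when the record is created.
conn.execute(
    "CREATE TABLE customers_guid (id TEXT PRIMARY KEY, email TEXT, first_name TEXT)"
)
for first_name in ("Jane", "Joe"):
    conn.execute(
        "INSERT INTO customers_guid VALUES (?, ?, ?)",
        (str(uuid.uuid4()), "family@example.com", first_name),
    )

# Option 2: a composite primary key on (email, first_name).
conn.execute(
    "CREATE TABLE customers_composite ("
    "email TEXT NOT NULL, first_name TEXT NOT NULL, "
    "PRIMARY KEY (email, first_name))"
)
conn.execute("INSERT INTO customers_composite VALUES (?, ?)", ("family@example.com", "Jane"))
conn.execute("INSERT INTO customers_composite VALUES (?, ?)", ("family@example.com", "Joe"))

# A second "Jane" at the same address violates the key and is rejected.
try:
    conn.execute("INSERT INTO customers_composite VALUES (?, ?)", ("family@example.com", "Jane"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

In both tables Jane and Joe live side by side on one inbox. Note that the composite key compares text exactly, so “jane” and “Jane” would count as different people unless you normalize values before inserting them.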

Whatever you decide, define it early and stick with it, across all sources and databases.

What about two databases with different primary keys? Please, don’t do that. Why? Because now Jane exists twice, and you can’t say which Jane is the “best” Jane, because one is ‘Jane0345’ and the other is ‘Jane@Jane.Jane’.

And now you’re realizing you probably should have implemented that .com validation, huh?

@jane.jane

Really?

3. Preventing duplicates: Cleanliness = Godliness

Much like soap keeps the stink away, clean data has no unintended duplicates, no bad data that breaks logic, and no missing data that turns database rows into deadweight. Duplicates are among the worst data quality issues and are to be prevented above all else. Primary keys help, but — by design — they can permit certain kinds of duplicates to exist.

When a prospect or customer is represented two or more times in your database, the picture of them fractures — half their data accrues in one record and half in another. Instead of Jane-the-triweekly-shopper-in-red-dress-size-12, we get Jane-the-size-12. Good, but not great.

Fear not! The Salesforce Customer Data Platform (CDP) is one of many platforms with powerful rules engines that help to merge once-disparate records. Whether you’re using CDP or just a simple query and your own two eyes, what matters most is that you prioritize the effort.
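If you are on the “simple query and your own two eyes” end of the spectrum, a small script can do the first pass. The sketch below is one possible approach, not how any CDP works internally: it assumes records are plain dictionaries with an email field, matches them on a trimmed, lowercased email, and keeps the first non-empty value it finds for each field.

```python
from collections import defaultdict


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so trivially different spellings match."""
    return email.strip().lower()


def merge_duplicates(records: list[dict]) -> list[dict]:
    """Group records by normalized email and merge each group into one record.

    For every field, the first non-empty value encountered wins.
    """
    groups = defaultdict(list)
    for record in records:
        groups[normalize_email(record["email"])].append(record)

    merged = []
    for email, group in groups.items():
        combined = {"email": email}
        for record in group:
            for field, value in record.items():
                if field != "email" and value and not combined.get(field):
                    combined[field] = value
        merged.append(combined)
    return merged


# Two half-pictures of the same (made-up) Jane.
records = [
    {"email": "Jane@Example.com", "dress_size": "12", "favorite_color": ""},
    {"email": "jane@example.com ", "dress_size": "", "favorite_color": "red"},
]
merged = merge_duplicates(records)
```

Here the two fractured records collapse into one Jane with both a dress size and a favorite color. “First non-empty value wins” is the simplest possible merge rule; a real rules engine would also weigh which source is more recent or more trustworthy.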

4. Regular checkups: Wash, rinse, repeat

The work of data cleansing is never done. To maintain all the amazing results from your earlier efforts in field-level validation configuration, primary key definition, and de-duplication, you must audit your data-capture mechanisms. Stress-test your compliance/privacy functionality. Continuously hunt down and resolve duplicates. (Bonus points for ferreting out how duplicates were created and implementing a fix to prevent them programmatically.)

While database maintenance is more than just clone hunting, don’t fret. Everything layers very cleanly when done systematically and regularly. Take time to proactively plan your approach, rather than react to emergencies.

For example, are you checking for bots or junk emails? It’s a little tricky, but email addresses that are all numbers (for example, ‘10102565@yoyomail’) should be a red flag. Yellow, at the very least. Balance a suspicious email by checking the domain and the end user’s engagement: open rates for emails are noisy (thank you, Apple), but click rates are great. If you have rich alternative data, consider looking for a purchase or login activity, either of which should help alleviate concern.
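That balancing act can be written down as a tiny rule. The sketch below is only one way to do it: the function name suspicion_flag and the exact mapping to “red”, “yellow”, and “ok” are assumptions for illustration, and open rates are left out on purpose because they are noisy.

```python
def suspicion_flag(email: str, clicks: int, purchases: int, logins: int) -> str:
    """Classify an address as "ok", "yellow", or "red".

    An all-digit local part (the text before "@") raises suspicion;
    real engagement (clicks, purchases, logins) lowers it again.
    """
    local_part = email.split("@", 1)[0]
    if not local_part.isdigit():
        return "ok"
    engaged = clicks > 0 or purchases > 0 or logins > 0
    return "yellow" if engaged else "red"
```

An all-number address with no clicks, purchases, or logins comes back “red”, while the same address with a purchase on file is downgraded to “yellow” and is probably worth a human look rather than automatic removal.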

I also recommend frequently checking for bad domains from your users.

yoyomail

Seriously?

Field-level validation can help, but something like yoyomail.com can appear legit to a field check. If you see a domain repeated with lots of easily generated addresses, you might have a problem. Even Gmail can be abused if you start seeing things like greyskull1+1@gmail.com followed by greyskull1+2, greyskull1+3, etc. Bad domains are not usually a big source of data quality challenges, and paid tools can help you keep a list of them. And remember: quick, regular, manual checks will surface problems more easily than you think.
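A quick manual check like this is easy to script. The sketch below strips the “+something” suffix from each address and counts how often the same base address and the same domain appear. The function names and the threshold of three are arbitrary choices for illustration, and what counts as “too many” depends on the size of your list.

```python
from collections import Counter


def base_address(email: str) -> str:
    """Lowercase the address and drop any "+tag" suffix from the local part."""
    local_part, _, domain = email.strip().lower().partition("@")
    return f"{local_part.split('+', 1)[0]}@{domain}"


def repeated_base_addresses(emails: list[str], threshold: int = 3) -> dict[str, int]:
    """Return base addresses that occur at least `threshold` times."""
    counts = Counter(base_address(email) for email in emails)
    return {address: n for address, n in counts.items() if n >= threshold}


def domain_counts(emails: list[str]) -> Counter:
    """Count signups per domain so unusual spikes are easy to spot."""
    return Counter(email.strip().lower().partition("@")[2] for email in emails)


emails = [
    "greyskull1+1@gmail.com",
    "greyskull1+2@gmail.com",
    "greyskull1+3@gmail.com",
    "jane@example.com",
]
flagged = repeated_base_addresses(emails)
by_domain = domain_counts(emails)
```

In this sample, flagged contains only greyskull1@gmail.com with a count of three. Keep in mind that plenty of real people use “+” addresses legitimately, so treat a hit as a prompt to look closer, not as proof of abuse.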

Finally, set rules for retention and engagement. Not every record must persist forever. As with blueberries, regular pruning will produce better fruit. So much better, in fact, that “cast a wide net” will leave your vocabulary. Auto-removal based on time alone is a bit heavy-handed, but when paired with other data (clicks, purchases, logins, downloads, or anything trackable) you can reduce weight and maintenance while boosting efficiency.
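As a rough sketch of such a rule, the function below removes a record only when it has gone a full window without any tracked engagement. The name should_remove, its parameters, and the 365-day default are all assumptions for illustration; pick a window that matches your own sales cycle and privacy obligations.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_remove(
    created_at: datetime,
    last_engaged_at: Optional[datetime],
    now: datetime,
    max_idle_days: int = 365,
) -> bool:
    """Return True when a record has shown no engagement within the window.

    `last_engaged_at` is the most recent click, purchase, login, or download,
    or None if the record has never engaged; in that case the idle clock
    starts at `created_at`.
    """
    reference = last_engaged_at or created_at
    return now - reference > timedelta(days=max_idle_days)
```

An old record with a recent click survives, a brand-new record that has not engaged yet gets time to do so, and only the long-idle deadweight is pruned.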

Conclusion

Giant databases brimming with intriguing data are not new. So why should data cleansing be front of mind now? Because of where digital marketing is headed and how important clean data will be for that process. The future — and increasingly, the present — requires lethally accurate one-to-one marketing. The business that sends target-optimized communications that feel as personal as a handwritten birthday card is going to win. Period.

To win this race, you’ll need to merge many data sources, manifesting a perfect picture of a target individual. You need to know that Jane is that Jane. You need to know that order #1567 is a match to jane@jane.jane who is subscribed to your weekly ad and doesn’t want the monthly flash sale notification. In order to learn that Jane likes red and tends to place an order about every three weeks, you need to know that order #1567 had red shoes (size 6) and a red shirt (size large), and so did the three orders before.

Clean data will get you there. Anything less will not.

Slalom is a global consulting firm that helps people and organizations dream bigger, move faster, and build better tomorrows for all. Learn more and reach out today.

Andy Hoover
Slalom Data & AI

Senior Consultant for Slalom Global Marketing Cloud Team. Have been working with Salesforce Marketing Cloud for 8ish years.