Masking Data to Protect Personal Information in Salesforce (Part 1)
Why mask your data?
Once upon a time, all your data was on-premises, in a place you either owned or had physical access to. And that data was kept in systems you directly controlled and secured. And that data was used by your developers and testers who were employed by you and worked in your buildings, using your desktops and laptops.
To help keep things simple, maybe you used production data in your development and testing processes. Development and testing access could be viewed as “just as secure” as your real, live systems, so data loss risks were mitigated to a satisfactory degree. Sound familiar?
Admittedly, this probably reflects a world that has not existed for some time now. The reality of where your data is, how you control access to it, who’s working with it, and why they’re working with it is almost certainly very different today. Data that was once on-premises is now being moved to cloud applications (like Salesforce!), your developers and testers could be consultants or freelancers on short contracts, and those developers and testers are almost certainly working remotely and using their own equipment. Let’s also bear in mind that the development lifecycle that these developers are using will no doubt include use of your sandbox Salesforce environments. Sandbox user accounts usually come with elevated levels of data access (compared to access granted in production) to support development access needs. That brings with it the likelihood that access to personal data, protected through sharing controls in production, is not restricted in those sandbox environments.
All this means there is a good chance that any personal data you are using in development and testing is making its way out of your controlled environments and into the wrong hands. Whether the data loss is accidental or not, minimizing your exposure to such risk by controlling who has access to personal data while still supporting realistic use cases should be a top priority.
So, simply put: it’s time to think about masking your data so you can be more flexible in how you approach development and testing activities without increasing your exposure to data loss risks. This could mean the difference between getting the latest enhancement out there to support a fast response to an emerging situation in your industry or missing the opportunity altogether.
A note regarding the scope of this series: while there are plenty of sound regulatory and risk management reasons to look at data masking in your production systems, this series is focused on development and testing processes. However, many of the concepts discussed are applicable to production environments as well.
What is data masking?
Let’s use data masking as the umbrella term for a collection of techniques that enable you to control how your data is stored and how it appears to users.
Replace your identifiable data values with randomly generated values and you’ve anonymized the data. If I turn “Richard Booth” into “hftwppfbvmkfex”, I’m confident no one will be able to work out that it’s meant to be a person’s name. This works best when you need data present for processing reasons, but not necessarily in a format that makes it easily used by human beings. (We like things to make some sort of contextual sense when we look at them, but computers really don’t care.) Data values like ID numbers and bank account details are usually good candidates for anonymization.
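To make the idea concrete, here is a minimal sketch of anonymization in Python. It is an illustration only, not how any Salesforce feature works: the `anonymize` function name is made up for this example, and keeping the original length is an assumption you may not want in practice.

```python
import secrets
import string


def anonymize(value: str) -> str:
    """Replace every character with a random lowercase letter.

    Only the length of the input survives; nothing else about the
    original value can be recovered from the result.
    """
    return "".join(secrets.choice(string.ascii_lowercase) for _ in value)


masked_name = anonymize("Richard Booth")
print(masked_name)  # random letters, different on every run
```

Because the output is random, the same input produces a different value each time. That is fine for fields nobody needs to match on, but less so for values used to join records together.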
You pseudonymize your data by replacing a data value with a value that looks like it could be the real thing, but in fact is not. Why do this instead of anonymization? It helps with contextual recognition of data in processes. For example, if “email@example.com” is pseudonymized and turned into “firstname.lastname@example.org”, we can still see it’s meant to be an email address and treat it accordingly. The fact that it’s no longer the real owner’s email address is what matters here; their personal data has been protected.
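A pseudonymized email address could be produced along these lines. This is a sketch under stated assumptions: the name pools, the `SECRET_KEY` value, and the `pseudonymize_email` function are all hypothetical, and the keyed hash is just one way to make the same real address always map to the same fake one.

```python
import hashlib
import hmac

# Hypothetical values for illustration; a real project would use a much
# larger name pool and keep the key out of source control.
SECRET_KEY = b"replace-with-a-real-secret"
FIRST_NAMES = ["alex", "sam", "jordan", "casey", "morgan"]
LAST_NAMES = ["lee", "patel", "garcia", "nguyen", "okafor"]


def pseudonymize_email(real_email: str) -> str:
    """Return a plausible but fake address for the given real address.

    The same input always yields the same output, so relationships
    between records that share an address are preserved.
    """
    normalized = real_email.strip().lower().encode("utf-8")
    digest = hmac.new(SECRET_KEY, normalized, hashlib.sha256).digest()
    first = FIRST_NAMES[digest[0] % len(FIRST_NAMES)]
    last = LAST_NAMES[digest[1] % len(LAST_NAMES)]
    suffix = int.from_bytes(digest[2:4], "big")  # lowers the chance of collisions
    return f"{first}.{last}{suffix}@example.org"


print(pseudonymize_email("someone@real-company.com"))
```

The result still looks like an email address, so validation rules and page layouts that expect one should keep working, and the `example.org` domain is reserved for documentation, so no real mailbox receives test messages.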
Don’t want anyone to see certain records or fields? Then delete them from your Salesforce development and testing orgs, or just don’t load them in the first place. Either way, the data loss risk for that particular subset of data through development and testing processes just became zero. The downside is that any processes that rely on a deleted field value or particular record will break without some attention.
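If you take the "don't load them in the first place" route, the filtering step might look something like the sketch below. The field names in `FIELDS_TO_DROP` and the `strip_fields` helper are hypothetical, and records are assumed to be plain dictionaries exported from production.

```python
# Hypothetical field names chosen for illustration only.
FIELDS_TO_DROP = frozenset({"Birthdate", "SSN__c"})


def strip_fields(records, fields_to_drop=FIELDS_TO_DROP):
    """Return copies of the records without the listed fields.

    The input records are left untouched.
    """
    return [
        {name: value for name, value in record.items() if name not in fields_to_drop}
        for record in records
    ]


exported = [{"LastName": "Doe", "Birthdate": "1990-01-01", "SSN__c": "000-00-0000"}]
print(strip_fields(exported))  # [{'LastName': 'Doe'}]
```

Anything downstream that expects `Birthdate` or `SSN__c` to be populated will need attention, which is exactly the trade-off described above.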
How is data masking different from encryption?
Your data can be encrypted in transit and at rest. Encryption in transit prevents unauthorized access to data while it’s moving from one place to another; for example, HTTPS traffic between Salesforce.com and your browser is encrypted so no one in between can snoop on the network traffic and steal the data moving between those points. Encryption at rest prevents unauthorized access to data stored on digital media, including solid state or hard disk drives, magnetic tapes, flash drives, and so on.
Salesforce provides you with two options for at-rest encryption: Classic Encryption and Shield Platform Encryption, part of Salesforce Shield. Classic Encryption enables you to encrypt data in special custom fields using 128-bit AES keys. It also gives you an option to present that data back to users in a masked format. Shield Platform Encryption enables you to encrypt a wide variety of fields using 256-bit AES keys but does not provide for any data masking. Both options are available for any of your organizations, production and sandboxes alike. For more details, see How Shield Platform Encryption Works.
Generally speaking, you mask your data to stop authorized users from seeing data you don’t need or want them to see. You encrypt data to stop unauthorized users from accessing your data. The use cases for masking and encryption are usually different, although the two are sometimes used in combination. For example, consider credit and debit card number management for PCI compliance: you encrypt the card number at rest and mask a portion of the number when displaying it to users.
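The display half of that card number example can be sketched in a few lines of Python. This covers masking only, not the at-rest encryption, and the `mask_card_number` function with its defaults (last four digits visible, `*` as the mask character) is an assumption made for illustration.

```python
def mask_card_number(card_number: str, visible: int = 4, mask_char: str = "*") -> str:
    """Hide all but the last `visible` digits of a card number.

    Spaces and dashes are ignored. If `visible` is zero, negative, or
    covers the whole number, every digit is masked.
    """
    digits = "".join(ch for ch in card_number if ch.isdigit())
    if 0 < visible < len(digits):
        shown = digits[len(digits) - visible:]
    else:
        shown = ""
    return mask_char * (len(digits) - len(shown)) + shown


# 4111 1111 1111 1111 is a widely used test number, not a real card.
print(mask_card_number("4111 1111 1111 1111"))  # 12 asterisks followed by 1111
```

The stored value stays intact (and, ideally, encrypted); only what the user sees changes.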
Choosing to encrypt data is not a decision to take lightly. It will have implications for accessing and processing data, so it’s important to take time to work out whether it’s really what you need.
I’m writing this as we’re in the midst of a global crisis, all playing our part in battling the COVID-19 pandemic. I’m seeing organizations moving at pace to address some very serious and pressing issues. From my own experience working with nonprofit and education organizations, I see big changes happening very quickly, from the virtual delivery of services and educational programs, to rapid increases in the scale of support operations and, in some cases unfortunately, the closure of some functions and activities.
Across the breadth of Salesforce customers, new team members are being drafted in to help with the load, whether they’re trusted volunteers, temporarily reassigned staff from other departments, or paid-for external resources. Change management and governance processes are being flexed to accommodate the rate of change required. Of course, protecting personal data is always a priority, but during all of this wouldn’t it be great if you didn’t have to worry about it for development and testing because your data was automatically protected?
Part 2 of this series covers how to mask data in Salesforce and tips to get your project started.
About the author
Richard Booth is a Customer Success Architect at Salesforce.org in EMEA. He helps nonprofit and higher education organizations make the best possible use of Salesforce technologies to deliver impact and support their mission. You can connect with Richard on LinkedIn.