From Google Cloud Blog: “Improving security, compliance, and governance with cloud-based DLP data discovery”
I’ve been blogging on the Google Cloud blog, with most posts connected to products, launches, etc. However, I am also writing a fun blog series on DLP in the cloud. Blog 1 is here, and blog 2 is here; you can also see a long quote from the second one below.
Note that our DLP (called Cloud DLP because we loooove creative product names here) can do a lot of very cool “tricks” related to data transformation, data de-identification and even re-identification risk analysis (due to its privacy origins). These cool capabilities will be covered in the next blogs in the series because, frankly, they are quite magical and deserve to be better known and more widely used. They also map well to the real threat models and data security use cases people have today.
For now, check out “Improving security, compliance, and governance with cloud-based DLP data discovery” with a long quote below.
One of the more critical, but sometimes forgotten, questions in data security is how to find the data you need to secure. To security newcomers, this may sound counterintuitive; surely you have that valuable data firmly in your hands?
In reality, many types of sensitive, personal and even regulated data get “misplaced” at some organizations. For example, cases where payment data — credit card numbers, in particular — is found outside of the formally defined Cardholder Data Environment (CDE) have been strikingly common over many years. Sadly, they often come to light during a post-breach investigation or, somewhat better, during a PCI DSS assessment by a QSA.
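To make the credit card example concrete, here is a minimal sketch of how a discovery scan might flag likely card numbers in free text. It is an illustration only, not how Cloud DLP works internally: the names `CANDIDATE`, `luhn_valid` and `find_card_numbers` are hypothetical, and the approach (a loose digit pattern followed by the Luhn checksum) is just one common way to cut down false positives.

```python
import re

# Hypothetical candidate pattern: 13 to 19 digits, optionally separated
# by single spaces or hyphens. Real detectors use richer context.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings in text that look like card numbers."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

Running `find_card_numbers` over log files, exports or test databases outside the CDE is the kind of check that can surface misplaced payment data before a breach investigation or a QSA does. A sketch like this will still miss numbers in unusual formats and can flag random digit runs that happen to pass the checksum.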
Similarly, recent attention to personally identifiable information (driven, for example, by GDPR or CCPA) has led to cases where personal data was discovered in unexpected places. Furthermore, the accelerating pace of cloud migrations means that there are more cases of personal data being uploaded to the public cloud. Sometimes this happens without the necessary controls and, in fact, without the awareness of security and privacy teams. For example, a test instance of a data analysis application may be moved from the data center to the cloud without anyone considering that the test instance used production customer data. Perhaps it was acceptable to use personal data for testing while the application was developed and then deployed internally, but the move to the public cloud changed things.
These and other similar cases have elevated the importance of data discovery, a key component of DLP technology. As we noted in our previous blog, sensitive data discovery is critically important for security, compliance and privacy initiatives. Thus, there is value in knowing where your sensitive data is at any time, whether it is in the cloud or not.
Perhaps surprisingly, one can still see situations where sensitive data discovery is a “hard sell” with security leaders. Some leaders see the value in preventing the leaks (and theft) of valuable data across the perimeter, but not necessarily the discovery of the data inside the perimeter.
However, such thinking has become outdated in the cloud era! The perimeter has morphed in many ways, so simply sitting at the border (that is, if you can find the border to sit on) looking for departing data is no longer realistic (that is, if you assume that it ever was). In light of this, some organizations consider a broad accidental disclosure of sensitive data inside their organization to be “an internal data breach”, even though the data was never seen departing from the company. In fact, in a global organization, such internal disclosure may violate rules because it may make the data visible to employees in other countries.
Hence, the only approach that works today is protecting sensitive data by starting with knowing where it exists. This may have been conceptually true for years, but today this is also true operationally. Cloud has made this true!
Still, there is substantial debate about sensitive data in the cloud. One survey found that “71% of organizations report that the majority of their cloud-resident data is sensitive.” However, the real challenge is that many organizations very likely have sensitive data in the cloud without knowing what that data is or where in the cloud it resides. Gartner recently noted that data discovery plays a role in Data Access Governance (DAG).
Hence, even though discovery on its own does not make the data “more secure”, it is a critical first step to take. It can make decisions about the data (approving access requests, sharing, retention, etc.) more informed and thus more secure.
What to discover?
The definition of sensitive data remains the subject of some debate in the security community. Some define it as data that, if revealed, will cause harm; some focus on data that others may want to steal; and some use the pure regulatory definition (hence substituting “regulated” data for “sensitive” — perhaps not a very logical change).
Still, there are some types of data where there is broad agreement that such data is considered sensitive (even though the universal definition of “sensitive data” perhaps remains elusive):
- Regulated data such as payment data, personally identifiable information (PII) and many types of protected health information (PHI).
- Corporate secrets and other data that is sensitive because it is clearly valuable for business.
- Data that, if made public, will cause harm, negative PR or other damage to a company and/or its brand.
It is very likely that entire industries and even specific companies can identify many other types of data considered sensitive. Note that valuing data as a business asset is an area of much research.
When to discover?
Our conversation here focuses on sensitive data in the cloud, so it is useful to relate our discovery activities to cloud migration. Sensitive data discovery has value across the entire migration process.
- Before cloud migration: this helps plan what data can be moved to the public cloud and whether additional controls will be needed when it moves to specific cloud services. This ultimately helps organizations make an informed decision about sensitive data in the cloud.
- During cloud migration: this focuses on validating that the data being migrated is moved into properly secured areas. It also checks for data classification mistakes (e.g. moving secret data to an open environment by mistake or moving regulated data into an environment without the prescribed controls). This may also be used to drive data transformation (masking, tokenization, de-identification) to reduce risk.
- After cloud migration: this looks for mistakes in placing the data, accidental moves of data from more protected to less protected areas, and many other use cases. This activity evolves into an ongoing set of discovery activities that continue indefinitely. The security and compliance responses may include changing permissions, moving data to more protected areas and, of course, encrypting it.
To migrate and operate sensitive data workloads in the cloud, you would very likely utilize a combination of all three of the above…
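The "during" and "after" checks above can be sketched as a simple comparison between what discovery found and what the destination environment provides. This is a toy illustration under stated assumptions: the category names, the control names, the `REQUIRED_CONTROLS` table and the `placement_violations` function are all hypothetical, and a real deployment would take its findings from a discovery tool and its control inventory from the cloud platform.

```python
# Hypothetical policy: controls each data category requires at its destination.
REQUIRED_CONTROLS = {
    "payment": {"encryption", "restricted_access", "audit_logging"},
    "pii": {"encryption", "restricted_access"},
    "public": set(),
}


def placement_violations(objects, destinations):
    """Return (object name, missing controls) for each misplaced object.

    objects: list of dicts with "name", "category" and "destination" keys,
             where "category" is what discovery found in the object.
    destinations: dict mapping environment name to its set of controls.
    """
    violations = []
    for obj in objects:
        required = REQUIRED_CONTROLS[obj["category"]]
        present = destinations[obj["destination"]]
        missing = required - present
        if missing:
            violations.append((obj["name"], sorted(missing)))
    return violations


# Example: a test environment that only has encryption enabled.
destinations = {"analytics-test": {"encryption"}}
objects = [
    {"name": "orders.csv", "category": "payment", "destination": "analytics-test"},
    {"name": "logo.png", "category": "public", "destination": "analytics-test"},
]
print(placement_violations(objects, destinations))
```

Each violation then maps to one of the responses mentioned above: add the missing control, move the data to a more protected area, or transform it (for example, by masking or tokenization) so that it no longer needs the control.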