How to Protect Your Google Cloud Organization from Security Misconfigurations Using Cloud Native Auto Remediation

Don Santos
Google Cloud - Community
Apr 20, 2022

Header photo from Pixabay

Why do we need auto remediation?

Many companies today are finding out the expensive way that security deserves more emphasis during architecture planning. For example, an S3 bucket exposed personal information for over 500,000 students in Ghana ¹. Another S3 bucket exposed images of 12,000 patients of a Japanese healthcare firm ². Sephora was recently found to have an S3 bucket that leaked the private information of over 490,000 customers, including card numbers, full names, email addresses, and phone numbers ³. According to Trend Micro, breaches caused by cloud misconfiguration cost companies nearly US$5 trillion in 2018 and 2019.

Let’s talk about that. Say a developer creates a SQL instance in a Google Cloud project you manage, and your organization has a security policy that says all traffic should be encrypted. At the time of this post, Google Cloud does not have any way to enforce SSL-only connections for SQL instances via Organization Policy constraints. Microsoft Azure provides a way to enforce this at the API level via Azure Policy. Amazon Web Services (AWS) allows you to create granular IAM policies that deny resources that aren’t allowed, and it also provides AWS Config, which lets you auto remediate noncompliant resources. At the moment, Google Cloud does not have any native services that remediate resources like Azure Policy or AWS Config; so… here we are.

Lines of Defense Against Misconfigured Resources

Let’s talk about the different areas where we can prevent and detect the deployment of misconfigured resources. We’ll call them your “lines of defense”.

Your first line of defense is compliant Terraform modules that have been blessed by the security team. These modules meet the standards that the security team has defined and contain the correct configurations to be baseline secure.

If for some reason users are able to check in Terraform code that has not been approved by security, your next line of defense is the set of security gates in your CI/CD pipeline. Centralizing where you run Terraform not only allows organizations to control what can be deployed, but also gives them the ability to add security checks before actual deployment. In our case, we run IaC (Infrastructure as Code) security checks, using tools like tfsec or Snyk, to scan our Terraform code before it gets applied.
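To make the idea of a pipeline gate concrete, here is a minimal sketch of the kind of check a scanner performs. It is not how tfsec or Snyk work internally. It assumes the pipeline has already exported the plan with `terraform show -json`, and that the Google provider version in use exposes the `require_ssl` attribute under `settings.ip_configuration`; the function name is hypothetical.

```python
from typing import Any, Dict, List


def find_sql_instances_without_ssl(plan: Dict[str, Any]) -> List[str]:
    """Return the addresses of planned Cloud SQL instances that do not require SSL.

    `plan` is the parsed output of `terraform show -json <planfile>`.
    """
    offenders: List[str] = []
    for change in plan.get("resource_changes", []):
        if change.get("type") != "google_sql_database_instance":
            continue
        after = (change.get("change") or {}).get("after")
        if after is None:
            # The resource is being destroyed, so there is nothing to check.
            continue
        # Plan JSON renders nested blocks as lists of objects.
        settings = (after.get("settings") or [{}])[0]
        # Assumption: the provider version in use exposes `require_ssl` here.
        ip_config = (settings.get("ip_configuration") or [{}])[0]
        if ip_config.get("require_ssl") is not True:
            offenders.append(change.get("address", "<unknown>"))
    return offenders
```

A pipeline step could load the plan file with `json.load`, call this function, and fail the build whenever the returned list is not empty.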

Here is where things get a little interesting. We can say that the next lines of defense are your Organization Policies and VPC Service Controls. However, if neither of them can cover what we’re trying to prevent, we should build the guardrail for it ourselves.

We can look at two example use cases here. The first is enforcing SSL-only connections for SQL instances, which I’ve already covered in the intro. The other is a scenario that can happen in any organization: a super admin, or someone else with excessive permissions, tries to change something in your organization. Let’s use IAM roles as an example. The user grants someone the Owner role… for the entire organization. If it was a valid change, sure, we can let it slide. But that’s for a different discussion. In our case, it wasn’t a valid change.

This is where the guardrails come in. We essentially have an organization-level log sink that filters for specific API calls, such as an IAM call that grants the Owner role. When a matching call occurs, its log entry is sent to a Pub/Sub topic, which triggers Cloud Functions that revert the change and notify the security team. This remediation may have just prevented someone from getting access to data they should not have had access to.
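The flow above could be sketched roughly as follows. This is an illustration under assumptions, not the exact functions we deployed: the sink filter string, the `policyDelta.bindingDeltas` field path in the audit log, and every function name here are assumptions for the example. The sketch only covers the pure logic (decoding the Pub/Sub message and computing the corrected policy) so that it runs without any Google Cloud client library.

```python
import base64
import json
from typing import Any, Dict, List

OWNER_ROLE = "roles/owner"

# Assumed log sink filter; adjust it to match the audit logs in your organization.
SINK_FILTER = (
    'protoPayload.methodName="SetIamPolicy" '
    'AND protoPayload.serviceData.policyDelta.bindingDeltas.action="ADD" '
    'AND protoPayload.serviceData.policyDelta.bindingDeltas.role="roles/owner"'
)


def decode_log_entry(event: Dict[str, Any]) -> Dict[str, Any]:
    """Decode the log entry JSON carried in a Pub/Sub message's base64 data field."""
    return json.loads(base64.b64decode(event["data"]).decode("utf-8"))


def added_owner_members(log_entry: Dict[str, Any]) -> List[str]:
    """List the members that the logged call added to the Owner role."""
    deltas = (
        log_entry.get("protoPayload", {})
        .get("serviceData", {})
        .get("policyDelta", {})
        .get("bindingDeltas", [])
    )
    return [
        d["member"]
        for d in deltas
        if d.get("action") == "ADD" and d.get("role") == OWNER_ROLE and "member" in d
    ]


def strip_owner_members(policy: Dict[str, Any], members: List[str]) -> Dict[str, Any]:
    """Return a copy of an IAM policy with the given members removed from Owner."""
    to_remove = set(members)
    bindings = []
    for binding in policy.get("bindings", []):
        if binding.get("role") == OWNER_ROLE:
            kept = [m for m in binding.get("members", []) if m not in to_remove]
            if not kept:
                continue  # Drop the binding entirely if nobody is left in it.
            binding = {**binding, "members": kept}
        bindings.append(binding)
    return {**policy, "bindings": bindings}


def handle_event(event: Dict[str, Any], context: Any = None) -> List[str]:
    """Hypothetical Pub/Sub-triggered entry point; returns the members to revoke."""
    return added_owner_members(decode_log_entry(event))
```

In a deployed function, the members returned by `handle_event` would be passed to `strip_owner_members` along with the policy read through the Resource Manager `getIamPolicy` method, and the result written back with `setIamPolicy`, followed by a notification to the security team.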

The Value of Remediating Things Natively

But Don, can’t we do this with vendor tools??? Yeah, sure, but with this native capability, you’re only paying for the services you’re using in Google Cloud. Reaction and remediation are near real time: you don’t have to wait for logs to reach a third-party tool, since the functions are triggered at the time of the API call. There are no licensing costs or other overhead to worry about besides the cost of the services used in Google Cloud. You have visibility into the code running in your functions. And you have the power to write your own functions based on your organization’s use cases.
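As one example of a custom function, the SSL use case from the intro might be handled with a helper like the one below. It is a sketch under assumptions: it expects an instance resource shaped like the Cloud SQL Admin API's representation, with a `requireSsl` flag under `settings.ipConfiguration`, and the function name is hypothetical.

```python
from typing import Any, Dict, Optional


def ssl_patch_for(instance: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return a patch body that turns on requireSsl, or None if already compliant."""
    ip_config = instance.get("settings", {}).get("ipConfiguration", {})
    if ip_config.get("requireSsl") is True:
        return None  # Nothing to remediate.
    # Assumption: this body is sent through the Cloud SQL Admin API's
    # instances.patch method by the surrounding Cloud Function.
    return {"settings": {"ipConfiguration": {"requireSsl": True}}}
```

Returning `None` for compliant instances keeps the function idempotent, so a repeated trigger on the same instance does not issue a redundant patch.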

About Don

Don is a Security Manager at Accenture, focused on Application Security, Cloud Security, and DevSecOps. He’s taken on roles with Fortune 100 firms to run cloud and application security assessments, while also implementing security testing and controls across the client’s ecosystem. Interested in deploying this out to your environment? Let’s connect!

Disclaimer:

My postings reflect my own views and do not necessarily represent the views of my employer, Accenture.

The information in this blog post is general in nature and does not take into account the specific needs of your IT ecosystem and network, which may vary and require unique action. You should independently assess your specific needs in deciding to use any of the tools mentioned. Tools such as tfsec and Snyk are not Accenture tools. Accenture makes no representation that it has vetted or otherwise endorses these tools and Accenture disclaims any liability for their use, effectiveness or any disruption or loss arising from use of these tools.

Originally published at https://www.linkedin.com.
