New GCP firewall (3.0) and secure tags
Firewall 3.0 and secure tags have been out for some time, but I still often see confusion about how they work. Given the growing demand for them, some colleagues and I decided to dig deeper and share some key learnings here.
I’ll split this journey into two parts: this one covers the theory, and a follow-up will walk through a quick lab.
Firewall 3.0 improvements
Firewall 3.0 brings lots of great additions that cover different gaps. I’ll go through the ones that look most relevant to me.
Network firewall policies
The new network firewall policies (which are not the legacy firewall rules) allow or deny traffic on a VPC. These come with new APIs, have dedicated Terraform resources, different gcloud commands and even a different GUI.
More generally, a firewall policy is a collection of firewall policy rules, which are (again) very similar to legacy firewall rules, but not the same. Unlike legacy firewall rules, firewall policy rules are uniquely identified within a policy by their priority, and they support more and different filtering mechanisms, depending on where they are configured.
Network firewall policies can be either global or regional. A regional policy applies only to workloads living in a single region.
Network firewall policies are also hierarchical, just like the hierarchical firewall policies that you apply to organizations and folders. The way I think about them is that there are hierarchical policies that you can apply at multiple levels throughout the infrastructure: orgs, folders and VPCs.
Starting from the top, this is the evaluation order: org, folder, legacy firewall rules, global network firewall policies, regional network firewall policies. At each level, you can either enforce an action (allow, deny) or delegate the decision to the levels further down the chain (using goto_next). This allows administrators to enforce infrastructure-wide rules or to safely delegate decisions locally.
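To make the chain more concrete, here is a minimal Python sketch of the evaluation order described above. It is a toy model, not the GCP API: the `Rule` class, the `evaluate` function and the level names are all made up for illustration, every level is modeled the same way (including the legacy rules, which is a simplification), and the implied rules that apply when nothing matches are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Evaluation order described in the text, from first to last.
LEVELS = [
    "organization policy",
    "folder policy",
    "legacy firewall rules",
    "global network firewall policy",
    "regional network firewall policy",
]


@dataclass
class Rule:
    priority: int                    # lower number = evaluated first
    matches: Callable[[dict], bool]  # hypothetical packet matcher
    action: str                      # "allow", "deny" or "goto_next"


def evaluate(levels: dict, packet: dict) -> tuple:
    """Return (action, level) for the first rule that decides the packet."""
    for level in LEVELS:
        for rule in sorted(levels.get(level, []), key=lambda r: r.priority):
            if not rule.matches(packet):
                continue
            if rule.action == "goto_next":
                break  # delegate the decision to the next level
            return rule.action, level
    # Nothing decided: in a real VPC the implied rules would apply here.
    return "no match", "implied rules"


levels = {
    "organization policy": [
        Rule(100, lambda p: p["port"] == 22, "deny"),  # enforced org-wide
        Rule(200, lambda p: True, "goto_next"),        # delegate the rest
    ],
    "global network firewall policy": [
        Rule(1000, lambda p: p["port"] == 443, "allow"),
    ],
}

print(evaluate(levels, {"port": 22}))
print(evaluate(levels, {"port": 443}))
```

In this sketch the org blocks SSH for everybody, while the decision on HTTPS is left to whoever owns the network firewall policy.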
During a migration, it’s important to stress that GCP evaluates legacy firewall rules before network firewall policies, whatever their priority is.
As for network firewall policy associations, you can bind one policy to multiple VPCs, but a VPC can’t have more than one network firewall policy attached.
A (not so) new GCP resource: secure tags
Imagine you have three VMs: a database, a backend and a frontend server.
The backend should talk to the database. The frontend should communicate with the backend. Using legacy firewall rules, you can associate strings (aka network tags) with VMs and create two firewall rules to allow them to communicate. For example, you could say: let the instances tagged “backend” talk to the ones tagged “database”. While this works, it comes with significant security caveats: whoever can modify the VMs’ configuration (e.g. compute admins) can also modify their tags. This means somebody who is not necessarily in charge of security can easily change the “frontend” tag to “backend”, thus gaining unauthorized access to the database machine.
Firewall 3.0 introduces secure tags: these are the same tags already present at the organization level, but they have been extended so that they can be associated with VPCs. Secure tags have their own IAM permissions, allowing the administrators in charge of them to delegate their use on specific resources (e.g. projects or VMs) to specific identities.
Let’s see what the most common roles associated with secure tags are:
- Tag administrator: you give it at the org level and it allows you to create, manage and delete tags (including policy tags and secure tags)
- Tag user: determines who is able to use a tag (key or value). Administrators have to assign it on the tag itself (either on the key or on the value) and on the resource where the tag will be bound (e.g. a VM or a project)
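The double assignment required for the Tag user role is easy to miss, so here is a small Python sketch of the check as I understand it. It is only a model of the rule described above, not a call to IAM: `can_bind`, the grant tuples and the resource names are hypothetical, and I’m assuming a grant on the key also covers its values.

```python
from __future__ import annotations

TAG_USER = "roles/resourcemanager.tagUser"


def can_bind(identity: str, grants: set, tag_key: str, tag_value: str,
             resource: str) -> bool:
    """Check both halves of the Tag user requirement.

    grants is a set of (identity, role, target) tuples, where target is a
    tag key, a "key/value" string, or a resource name (all hypothetical).
    """
    # Half 1: Tag user on the tag itself (on the value or on its key).
    on_tag = any(
        (identity, TAG_USER, target) in grants
        for target in (tag_key, f"{tag_key}/{tag_value}")
    )
    # Half 2: Tag user on the resource the tag will be bound to.
    on_resource = (identity, TAG_USER, resource) in grants
    return on_tag and on_resource


grants = {
    ("alice", TAG_USER, "vpc-prod/backend"),
    ("alice", TAG_USER, "vm-backend-1"),
    ("bob", TAG_USER, "vm-frontend-1"),  # bob has no grant on the tag
}

print(can_bind("alice", grants, "vpc-prod", "backend", "vm-backend-1"))
print(can_bind("bob", grants, "vpc-prod", "backend", "vm-frontend-1"))
```

This is exactly what closes the gap left by network tags: being able to edit a VM is not enough to attach the “backend” tag to it.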
Secure tag bindings
While legacy network tags and service accounts could only be associated with a whole VM, you can now bind secure tags to individual VM network interfaces (NICs). This allows you to enforce different rules on different NICs.
To better understand how the binding process works, keep in mind secure tags are key-(multi)value pairs. You can associate secure tag keys to VPCs and tag values either to projects or to VMs. Given that VMs can’t have more than one NIC in the same VPC, the combination of these two bindings associates a key/value to a NIC.
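A short Python sketch may help visualize how the two bindings combine. It only models the logic described above: `TagValue`, `nic_tags` and all the key, network and NIC names are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TagValue:
    key: str    # the tag key this value belongs to
    value: str


def nic_tags(key_network: dict, vm_values: list, vm_nics: dict) -> dict:
    """Resolve which key/value pairs end up on each NIC of a VM.

    key_network: tag key -> VPC the key is associated with
    vm_values:   tag values bound to the VM
    vm_nics:     NIC name -> VPC the NIC is attached to
    """
    result = {nic: set() for nic in vm_nics}
    for tag in vm_values:
        network = key_network[tag.key]
        for nic, nic_network in vm_nics.items():
            # A VM has at most one NIC per VPC, so at most one NIC matches.
            if nic_network == network:
                result[nic].add(f"{tag.key}/{tag.value}")
    return result


key_network = {"vpc-prod": "prod-net", "vpc-mgmt": "mgmt-net"}
vm_values = [TagValue("vpc-prod", "backend"), TagValue("vpc-mgmt", "managed")]
vm_nics = {"nic0": "prod-net", "nic1": "mgmt-net"}

print(nic_tags(key_network, vm_values, vm_nics))
```

Here the same VM ends up with “backend” on the NIC in the production VPC and “managed” on the NIC in the management VPC, so each NIC can be matched by different firewall policy rules.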
You can use secure tags in network firewall policies only, meaning they can’t be used for hierarchical firewall policies associated with organizations or folders.
Using secure tags over VPC peering
Network tags and service accounts don’t pass through peerings. This means you cannot, for example, filter ingress traffic by referring to a network tag (or a service account) associated with a VM living in a VPC peered with yours.
With network firewall policies, instead, you can filter traffic by referring to secure tags associated with VM NICs attached to VPCs peered with yours.
Design notes
While I still have limited experience deploying and managing secure tags and firewall policies at scale, a few ideas came up during the first brainstorming sessions. I realize these may not apply to all cases, but they could be a place to start.
Given the values we see in the official documentation at the time of writing (max 1000 tag keys per org, max 1000 tag values per key) and the ability to associate only one network with each tag key, using the name of the VPC as the tag key would seem appropriate. Values may instead represent the role or the security profile of a certain group of machines within that network (e.g. backends, frontends, NVAs, critical, production and so on).
I think that for most deployments, where a dedicated team is in charge of security, you can greatly simplify the role assignments: the security team would be in charge of creating secure tags, assigning them to the relevant NICs and creating the network firewall policies (meaning they’ll need to be tag administrators on the org, and tag users and security admins on the projects they manage, where the workloads run). The application/compute teams would keep the permissions they already have and would play no active part in the process.
In more complex scenarios, the compute team may get the freedom to assign certain tags autonomously. In these scenarios, it’s important to understand that you’ll need to assign roles more carefully (thus possibly adding management overhead). On the other hand, if you assign permissions too generously, you run the risk of soon ending up in the same situation you were in when using network tags.
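As a rough illustration of this naming scheme, the Python sketch below turns a list of VPCs and roles into “key/value” pairs and checks them against the limits quoted above. The function name and the VPC and role names are made up, and the two constants simply mirror the numbers in this article, so check the documentation for the current quotas.

```python
from __future__ import annotations

# Limits quoted in the text at the time of writing; they may have changed.
MAX_TAG_KEYS_PER_ORG = 1000
MAX_TAG_VALUES_PER_KEY = 1000


def plan_secure_tags(vpcs: dict) -> list:
    """Map VPC name -> roles into sorted "key/value" strings.

    The VPC name is used as the tag key, each role as a tag value.
    """
    if len(vpcs) > MAX_TAG_KEYS_PER_ORG:
        raise ValueError("too many tag keys for one org")
    planned = []
    for vpc, roles in sorted(vpcs.items()):
        unique_roles = sorted(set(roles))
        if len(unique_roles) > MAX_TAG_VALUES_PER_KEY:
            raise ValueError(f"too many tag values for key {vpc}")
        planned.extend(f"{vpc}/{role}" for role in unique_roles)
    return planned


plan = plan_secure_tags({
    "prod-net": ["frontend", "backend"],
    "mgmt-net": ["nva"],
})
print(plan)
```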
It seems clear that a good architecture will probably be a mix of hierarchical firewall policies (which don’t use secure tags) and of network firewall policies, possibly using secure tags.
Please, show me some code
Deal! Let’s put what we saw into practice in the next article (coming soon!).
Conclusions
This was just a quick overview, and the topic deserves a closer read of the official documentation. As always, refer to it for the latest information and updated numbers.
Hope you enjoyed the reading. Stay tuned for more stories and insights from GCP.