Despite the title, most readers probably already know what a DMZ (demilitarized zone) is in network security. In this article I’m going to look at it in some detail. We will see how the concept translates to GCP (Google Cloud Platform), and for that we need to understand not only the design but also the intent of a DMZ. My goal is to analyze DMZs and the broader concept of network segmentation.
A DMZ is a network typically exposing public services like web, DNS or email functions, in a subnetwork separated from the internal network of a company. The main purpose of this subnet separation is to protect internal systems while providing these services to untrusted networks like the Internet.
External access to these services makes them a target for malicious actors trying to exfiltrate data from a network or cause damage in some way. They could exploit vulnerabilities in the services' software to gain access to a system, and from there launch further attacks against other internal systems. By isolating these public services from the private network in a DMZ or perimeter network, we add a layer of security that helps stop these kinds of attacks.
The essential element to control this segmentation is the firewall. Firewalls filter traffic between subnets, allowing or denying it based on a set of rules that look at packet addresses, ports, header fields or even message content.
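As a rough illustration of rule-based filtering, the following Python sketch models a firewall as an ordered list of rules evaluated first match wins, with a default deny. The `Packet` and `Rule` types, the field names and the address ranges are all hypothetical and do not correspond to any vendor's API; real firewalls match on many more fields.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    protocol: str
    dst_port: int


@dataclass(frozen=True)
class Rule:
    action: str                     # "allow" or "deny"
    src_net: str                    # source range in CIDR notation
    dst_net: str                    # destination range in CIDR notation
    protocol: str
    dst_port: Optional[int] = None  # None matches any port

    def matches(self, packet: Packet) -> bool:
        return (
            ip_address(packet.src) in ip_network(self.src_net)
            and ip_address(packet.dst) in ip_network(self.dst_net)
            and packet.protocol == self.protocol
            and (self.dst_port is None or self.dst_port == packet.dst_port)
        )


def filter_packet(rules, packet, default="deny"):
    """Return the action of the first matching rule, or the default."""
    for rule in rules:
        if rule.matches(packet):
            return rule.action
    return default


# Hypothetical ranges: DMZ is 10.1.0.0/24, private network is 10.2.0.0/24.
RULES = [
    Rule("allow", "0.0.0.0/0", "10.1.0.0/24", "tcp", 443),  # public HTTPS to the DMZ
    Rule("deny", "10.1.0.0/24", "10.2.0.0/24", "tcp"),      # DMZ may not reach private hosts
]

print(filter_packet(RULES, Packet("203.0.113.7", "10.1.0.10", "tcp", 443)))
print(filter_packet(RULES, Packet("10.1.0.10", "10.2.0.5", "tcp", 5432)))
```

Anything that matches no rule falls through to the default, which is why a default of deny is the safer choice.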
The basic firewall deployment architecture to create a DMZ is based on a two-tier design. The figure below shows an example.
The firewall has three interfaces, each attached to a different network. Hosts in the DMZ would need to traverse the firewall to reach systems in the private network, but the firewall doesn't allow any incoming connections to that network. The private network can access the Internet, and it can access the DMZ to use its services or manage the servers.
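The policy of this three-interface design can be summarized as a small table of which zone may open connections to which. The sketch below encodes it in Python. The zone names are hypothetical labels, and the "internet to dmz" entry is an assumption based on the DMZ hosting public services.

```python
# Pairs (source zone, destination zone) allowed to initiate connections.
ALLOWED_FLOWS = {
    ("internet", "dmz"),      # assumed: public services are reachable from outside
    ("private", "dmz"),       # internal users consume or manage DMZ services
    ("private", "internet"),  # internal users browse the Internet
}


def may_initiate(src_zone: str, dst_zone: str) -> bool:
    """Return True if the firewall lets src_zone open a connection to dst_zone."""
    return (src_zone, dst_zone) in ALLOWED_FLOWS


for src, dst in [("internet", "dmz"), ("dmz", "private"), ("internet", "private")]:
    print(f"{src} -> {dst}: {'allowed' if may_initiate(src, dst) else 'blocked'}")
```

Note that no entry has the private zone as a destination, which is the whole point of the design: a compromised DMZ host has no allowed path inward.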
Other DMZ designs use two serial firewalls, with the DMZ in between. This setup is more complicated but adds another layer of security. In the first design, if the firewall is compromised due to a vulnerability or is somehow misconfigured, the private network might be exposed to attacks. With two firewalls between the external and the private network, two devices would need to be compromised to gain access. We can add even more security by employing two different firewall vendors, so that an exploit found in one is unlikely to be present in the other, although this also adds more complexity.
Sometimes companies need to connect to third-party networks, or they may use multitier designs with firewalls between every layer of their applications. Real-world networks can get complicated, but in general the concepts shown here still apply.
The GCP world
Can we implement such DMZ designs in GCP? Yes. The firewall devices will be software appliances deployed on Virtual Private Cloud (VPC) networks, and we will see that instead of subnetwork segregation we will need a different VPC for each network. The following figure shows an example.
You may not be familiar with GCP, but I assume you will understand the figure above. It is quite similar to figure 1 and behaves similarly, except that hosts are replaced by virtual machines (VMs) and, instead of using only subnets, we use full VPC networks, which are isolated and independent network domains.
The question is: is that the right way to implement the DMZ concept in GCP? Well, that's another story. You may have heard that GCP is different, that things aren't done as if it were merely an extension of your data center. I would tell you exactly that. You may think this is simply a sales pitch, but the truth is that GCP really is different. Let me explain why.
GCP uses a Software Defined Networking (SDN) approach. This model means there are substantial differences with a traditional data center network, some of which I will address here.
To start with, a GCP network is a layer 3 (L3) network, if anything, not a layer 2 (L2) network. A classical network uses subnetting to create multiple logical networks: hosts in the same subnet talk to each other directly, with the help of ARP to discover the corresponding MAC addresses, and go through a router to reach a different subnet.
In GCP we have VPC networks, subnets and VM instances attached to them, but subnets here are simply an organizational tool to group instances and control your IP address space. Subnetting has no effect on whether or how a VM can reach another since all the VMs in a VPC have direct visibility to each other no matter which subnet they are in. Indeed, no matter which region of the world they are in. A VPC network provides a full mesh of global reachability, so from the point of view of an instance any other VM is just one hop away. The SDN or network virtualization stack that runs in the host of every VM handles this internally.
This doesn’t mean the whole VPC network is a broadcast domain. It is just the opposite. We can observe this by inspecting the network mask applied to VMs. If we create a subnet with range 10.0.0.0/24 and attach VMs to it, every VM is assigned a /32 mask, as if it were the only host in its subnet:
eth0 Link encap:Ethernet HWaddr 42:01:0A:00:00:02
inet addr:10.0.0.2 Bcast:10.0.0.2 Mask:255.255.255.255
When sending a packet, the instance sends it to the subnet gateway's MAC address regardless of whether the destination IP is outside or within the subnet range (remember it was declared as /24). The instance makes an ARP request to resolve the gateway’s MAC address, and that is all ARP is used for:
$ ping 10.0.0.5
PING 10.0.0.5 (10.0.0.5) 56(84) bytes of data.
64 bytes from 10.0.0.5: icmp_seq=1 ttl=64 time=2.01 ms
Address HWtype HWaddress Flags Mask Iface
10.0.0.1 ether 42:01:0a:00:00:01 C eth0
In reality there is no separate gateway in a VPC's subnet. Everything goes through the SDN stack, which routes packets to their destination, but this arrangement satisfies the L3 and L2 requirements of the VM's network stack.
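The effect of the /32 mask on the VM's forwarding decision can be reproduced with Python's standard `ipaddress` module. This is a simplified sketch of a host's "is the destination on-link?" check, not the actual Linux routing code; the `next_hop` function is hypothetical, and the addresses are the ones from the example above.

```python
from ipaddress import ip_address, ip_interface


def next_hop(interface_cidr: str, gateway: str, destination: str) -> str:
    """Decide whether a host delivers directly (ARP for the peer) or via the gateway."""
    iface = ip_interface(interface_cidr)
    dst = ip_address(destination)
    if dst == iface.ip:
        return "local"
    if dst in iface.network:
        return "direct"          # on-link: the host would ARP for the destination itself
    return f"via {gateway}"      # off-link: the host only needs the gateway's MAC


# Classical host configured with the real /24 mask: the neighbour is on-link.
print(next_hop("10.0.0.2/24", "10.0.0.1", "10.0.0.5"))
# GCP-style VM configured with a /32 mask: the same neighbour goes via the gateway.
print(next_hop("10.0.0.2/32", "10.0.0.1", "10.0.0.5"))
```

With a /32 mask the interface's network contains only the VM's own address, so no other destination is ever considered on-link, which matches the single gateway entry seen in the ARP table.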
So you see there are no bridges or switches in GCP networking. There is routing, with some differences, and there is Cloud Router, but I won't elaborate on that here.
Global reachability is a big differentiator and an advantage for network designs: we don’t have to work on interconnecting different regional subnets. It also has notable consequences. In particular, a full mesh means traffic within a VPC can’t be redirected or intercepted. Taking the DMZ example, it wouldn’t be possible to implement it inside a single VPC using subnetting only, since the DMZ subnet and the private subnet would see each other and the SDN routing can’t be overridden. For the firewall to be able to filter the traffic between networks, each network has to be in a different VPC, with the firewall appliance attached to both to connect them, as shown in figure 2.
GCP Firewall is another big difference compared to firewalls in classical networks. As with routing, firewalling is implemented in the SDN stack. We have already seen designs using firewall devices or appliances in the DMZ examples. GCP Firewall is not a device; it is a distributed firewall function enforced at every VM.
As you can see, for a VM to reach another, traffic needs to traverse not one but two firewalls, one at the sending VM (egress rules) and one at the receiving VM (ingress rules), even if both VMs are within the same subnet. Of course, traffic from outside the network to any VM always goes through a firewall too. It is as if every VM were in its own DMZ!
That is completely different from how classical L2 networks work. Those need different networks or subnets for a firewall to be able to filter traffic, and they offer no protection against traffic within a subnet. In GCP, subnetting has no effect for firewalling purposes.
It is worth noting that every VPC network has two implied rules: an allow egress rule and a deny ingress rule. The egress rule lets any instance send traffic to any destination, although with some caveats; for example, an instance needs a public IP (or a NAT service) to reach the Internet. The ingress rule blocks incoming connections to all instances, so you need to explicitly allow the desired traffic.
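To make the interplay between the two enforcement points and the implied rules concrete, here is a minimal Python model. It is a sketch under simplifying assumptions, not GCP's implementation: the `FirewallRule` type and its fields are hypothetical, rules are evaluated lowest priority number first, the implied rules sit at the lowest possible priority (65535), and tie-breaking between rules of equal priority and connection state are ignored.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class FirewallRule:
    direction: str              # "INGRESS" or "EGRESS"
    action: str                 # "allow" or "deny"
    priority: int               # lower number is evaluated first
    peer_range: str             # source range for ingress, destination range for egress
    protocol: str = "all"
    port: Optional[int] = None  # None matches any port


# The two implied rules described in the text.
IMPLIED_RULES = [
    FirewallRule("EGRESS", "allow", 65535, "0.0.0.0/0"),
    FirewallRule("INGRESS", "deny", 65535, "0.0.0.0/0"),
]


def evaluate(rules, direction, peer_ip, protocol, port):
    """Return the action of the first matching rule in priority order."""
    ordered = sorted(list(rules) + IMPLIED_RULES, key=lambda r: r.priority)
    for rule in ordered:
        if (
            rule.direction == direction
            and ip_address(peer_ip) in ip_network(rule.peer_range)
            and rule.protocol in ("all", protocol)
            and (rule.port is None or rule.port == port)
        ):
            return rule.action
    return "deny"


def vm_to_vm_allowed(rules, src_ip, dst_ip, protocol, port):
    """Traffic must pass egress checks at the sender and ingress checks at the receiver."""
    egress = evaluate(rules, "EGRESS", dst_ip, protocol, port)
    ingress = evaluate(rules, "INGRESS", src_ip, protocol, port)
    return egress == "allow" and ingress == "allow"


allow_https = FirewallRule("INGRESS", "allow", 1000, "10.0.0.0/24", "tcp", 443)
print(vm_to_vm_allowed([], "10.0.0.2", "10.0.0.5", "tcp", 443))
print(vm_to_vm_allowed([allow_https], "10.0.0.2", "10.0.0.5", "tcp", 443))
```

With no user rules, two VMs in the same subnet still cannot talk to each other, because the implied deny ingress rule applies at the receiver.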
Taking advantage of the GCP Firewall, let's redesign the DMZ example.
The DMZ per se is gone! The public servers and internal systems are in the same VPC. They can reach each other, and there is no firewall appliance in between or facing the Internet. Still, all traffic is subject to GCP firewall inspection. Compare this to figure 2.
Using the GCP Firewall has more advantages. It is a cloud native solution with powerful abstractions. Besides writing classical firewall rules based on IP ranges, you can use network tags (or service accounts) to define how firewall rules apply to VMs. A network tag is an attribute you can associate with instances for the purpose of filtering traffic. For example, you could create a group of instances to serve web traffic with a tag "web-server", and one single firewall rule to allow HTTPS traffic to all of them.
You could do something similar with IP addresses, but then you would need to carefully organize IP allocation to minimize the number of rules to write. This gets more complicated as networks grow. And of course, rules based on IP addresses are not as expressive as rules based on tags: they are difficult to reason about, and it is easy to make mistakes.
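The difference between tag-based and IP-based targeting can be sketched in a few lines of Python. The `Instance` and `TagRule` types below are hypothetical stand-ins, not the GCP API; the sketch assumes a rule applies to any instance carrying at least one of its target tags, and to every instance when it has no target tags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    name: str
    ip: str
    tags: frozenset = frozenset()


@dataclass(frozen=True)
class TagRule:
    name: str
    target_tags: frozenset
    protocol: str
    port: int


def instances_covered(rule, instances):
    """Names of the instances a rule applies to, selected by tag rather than by IP."""
    if not rule.target_tags:
        return [i.name for i in instances]
    return [i.name for i in instances if rule.target_tags & i.tags]


# The web servers live in unrelated IP ranges, yet one rule covers them all.
INSTANCES = [
    Instance("web-1", "10.0.0.2", frozenset({"web-server"})),
    Instance("db-1", "10.0.0.7", frozenset({"database"})),
    Instance("web-2", "10.0.3.9", frozenset({"web-server"})),
]
ALLOW_HTTPS = TagRule("allow-https", frozenset({"web-server"}), "tcp", 443)

print(instances_covered(ALLOW_HTTPS, INSTANCES))
```

A new web server only needs the tag to be covered, wherever its IP address happens to fall, whereas an IP-based rule set would have to be revisited.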
There are more benefits to using a cloud native solution like the GCP Firewall:
- It is a centralized solution: you don't need to configure each instance. Rules are applied to every VM in the VPC, even future ones.
- As part of GCP, it is integrated with IAM so you can control who has access to it in a unified way with the rest of cloud resources. No need to handle sysadmin logins on appliances.
- GCP is designed to be robust and scalable, and Google takes care of its management and security for you.
- Configuration of rules becomes part of your infrastructure as code under version control, and it can integrate nicely with your CI/CD processes. You can control changes, write security policies, require approvals, and quickly roll back changes if things go wrong.
- Logging, monitoring and auditing are integrated with Cloud Operations (formerly Stackdriver). You can log interesting events, monitor the firewall activity, and write runtime policies to detect policy or compliance violations.
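As an example of the kind of security policy that can run in a CI/CD pipeline, the sketch below scans a list of firewall rule definitions and flags ingress rules that expose SSH or RDP to the whole Internet. The dictionary layout is an assumption loosely modeled on how firewall rules are commonly exported as JSON; the field names, the rule names and the choice of sensitive ports are illustrative only.

```python
SENSITIVE_PORTS = {22, 3389}  # SSH and RDP, chosen for illustration


def _port_in_spec(port, spec):
    """Check a port against a spec such as "443" or "20-25"."""
    if "-" in spec:
        low, high = spec.split("-", 1)
        return int(low) <= port <= int(high)
    return int(spec) == port


def find_violations(rules):
    """Return (rule name, port) pairs for sensitive ports open to 0.0.0.0/0."""
    violations = []
    for rule in rules:
        if rule.get("direction", "INGRESS") != "INGRESS":
            continue
        if "0.0.0.0/0" not in rule.get("sourceRanges", []):
            continue
        for allowed in rule.get("allowed", []):
            if allowed.get("IPProtocol") not in ("tcp", "all"):
                continue
            ports = allowed.get("ports")  # a missing list means every port
            for port in sorted(SENSITIVE_PORTS):
                if ports is None or any(_port_in_spec(port, s) for s in ports):
                    violations.append((rule["name"], port))
    return violations


RULES = [
    {"name": "allow-https", "direction": "INGRESS", "sourceRanges": ["0.0.0.0/0"],
     "allowed": [{"IPProtocol": "tcp", "ports": ["443"]}]},
    {"name": "allow-admin", "direction": "INGRESS", "sourceRanges": ["0.0.0.0/0"],
     "allowed": [{"IPProtocol": "tcp", "ports": ["20-25"]}]},
]

print(find_violations(RULES))
```

A pipeline step could fail the build whenever the returned list is not empty, so a risky rule never reaches production without an explicit exception.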
As we have seen, DMZs are segmented networks intended to provide security by isolating traffic and restricting user access to specific segments. But network segmentation can also help improve other aspects of a network.
Grouping systems that often communicate in the same segment, and those that rarely communicate in different segments, can improve performance. Communication problems like congestion can be reduced, and network device failures or broadcast storms can be better isolated.
Sometimes companies are tempted to recreate their networks on GCP, since they already applied this knowledge when designing their network topology. But the concerns mentioned above are mainly related to L2 networks, and we have explained that GCP is not an L2 network, so they don't apply directly to GCP.
However, it is good to consider how segmentation can help in the cloud. A VPC network is a global network that is divided into regions and zones. Although reachability is global, traffic crossing zones and regions will suffer higher latency than traffic that stays within the same zone. And Google will charge you for traffic leaving a zone. Hence, grouping systems that interact often in the same zone will improve both performance and cost.
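A quick way to compare placements is to add up the traffic that would cross a zone boundary. The following Python sketch does just that; the service names, zone labels and byte counts are made up for illustration, and no pricing is modeled.

```python
def cross_zone_bytes(placement, flows):
    """Sum the bytes of all flows whose endpoints sit in different zones."""
    total = 0
    for src, dst, nbytes in flows:
        if placement[src] != placement[dst]:
            total += nbytes
    return total


# Hypothetical traffic matrix: (source service, destination service, bytes).
FLOWS = [("web", "db", 500), ("web", "cache", 200), ("batch", "db", 50)]

scattered = {"web": "zone-a", "db": "zone-b", "cache": "zone-a", "batch": "zone-b"}
grouped = {"web": "zone-a", "db": "zone-a", "cache": "zone-a", "batch": "zone-b"}

print(cross_zone_bytes(scattered, FLOWS))
print(cross_zone_bytes(grouped, FLOWS))
```

Placing the chattiest pair (web and db) in the same zone removes most of the cross-zone traffic in this example, although in practice this has to be balanced against availability requirements.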
Also, some VPC features like NAT or network flow logs are configured per subnet, so you may want to create additional subnets to take advantage of these features.
GCP Firewall is an L3/L4 stateful firewall, meaning that once a connection is allowed, return traffic for that connection is allowed as well. Every rule has several components, including the common 5-tuple to match: source IP, destination IP, protocol, source port and destination port.
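The idea of stateful filtering on a 5-tuple can be sketched as a small connection tracker: a new flow is checked against the rules, and once it is accepted, packets of the reverse flow are accepted without consulting the rules again. This is a conceptual Python model, not how GCP implements it; the `StatefulFilter` class is hypothetical, and entry expiry is left out.

```python
class StatefulFilter:
    """Toy connection tracker keyed on the 5-tuple."""

    def __init__(self, allow_new):
        self._allow_new = allow_new  # callable: 5-tuple -> bool, decides on new flows
        self._tracked = set()

    def accept(self, src_ip, dst_ip, protocol, src_port, dst_port):
        flow = (src_ip, dst_ip, protocol, src_port, dst_port)
        reverse = (dst_ip, src_ip, protocol, dst_port, src_port)
        if flow in self._tracked or reverse in self._tracked:
            return True              # part of an already allowed connection
        if self._allow_new(flow):
            self._tracked.add(flow)  # remember it so replies are accepted
            return True
        return False


def https_to_web_server(flow):
    """Hypothetical rule: only new TCP connections to 10.0.0.5:443 are allowed."""
    _, dst_ip, protocol, _, dst_port = flow
    return dst_ip == "10.0.0.5" and protocol == "tcp" and dst_port == 443


fw = StatefulFilter(https_to_web_server)
print(fw.accept("10.0.0.2", "10.0.0.5", "tcp", 50000, 443))  # new connection, allowed by rule
print(fw.accept("10.0.0.5", "10.0.0.2", "tcp", 443, 50000))  # reply, allowed by state
print(fw.accept("10.0.0.5", "10.0.0.2", "tcp", 443, 50001))  # unsolicited, no matching state
```

The reply packet would not pass the rule on its own; it is accepted only because the tracker has seen the request that it answers.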
GCP Firewall doesn't offer the advanced capabilities that deep packet inspection or application-level firewalls do. The design examples shown previously were kept simple to explain the concepts, but companies often need more advanced security features such as URL filtering, threat detection, or protection against L7 attacks like cross-site scripting (XSS) or SQL injection. In those cases third-party solutions from the Google Cloud Marketplace can be used in addition to GCP firewall rules. This may lead back to multiple-VPC designs and the DMZ segmentation model.
However, even with advanced security needs sometimes it is possible to implement network topologies which don't require the DMZ segmentation:
- In some cases not all traffic requires advanced firewalling, possibly only public-facing services do. Typically, these services will not be exposed directly but through load balancers like GCLB, which already add some protection by hiding your systems behind hardened components managed by Google. Then you can direct traffic through your third-party security appliance before it reaches your services. Instances running public and internal services can all live in the same VPC, leveraging GCP firewall rules to control allowed connections.
- If it fits your needs, you could use Cloud Armor instead of a third-party solution for ingress traffic. Cloud Armor offers you DDoS protection and helps protect your workloads against OWASP Top 10 risks. In this case the network design is simplified further, and of course you get the benefits of a managed solution.
DMZs and network segmentation are well established techniques to improve various aspects of a network like its efficiency and security. We have reviewed how these concepts should be translated to GCP.
DMZs are security perimeters that rely on how L2 networks work, but GCP is a virtual L3 network with enough differences to warrant reconsidering the applicability of these techniques. In some cases, security perimeters can be implemented in a way that makes the DMZ model unnecessary. And GCP products improve over time, so maybe in the future new generations of cloud security engineers will ask: “A DMZ, what is that?” ;)