Beauty of routing in GCP — how to achieve VPC transitivity
The absence of VPC transitivity in the realm of the public cloud is often a significant concern. However, with the right utilization of VPC Peering and effective network design, this becomes a non-issue. But what if you encounter a scenario where the network topology is already established and unchangeable, and certain services reachable only via VPC Peering are inaccessible to you? In GCP, numerous managed services rely on VPC Peering and PSC doesn’t have broad support (yes, I’m talking to you, AlloyDB), so it’s not entirely impossible to face this situation.
What are VPC, VPC Peering, and VPC transitivity?
VPC stands for Virtual Private Cloud, and it is a cloud concept that allows users to create and manage their isolated virtual networks within a public cloud environment. It enables organizations to have control over their networking resources, such as IP addresses, subnets, route tables, and network gateways, while keeping their cloud infrastructure isolated and secure from other users in the same cloud provider’s environment.
VPC Peering is a networking feature provided by cloud providers. It allows you to establish a direct private connection between two separate VPCs within the same cloud provider’s infrastructure. This connection enables the VPCs to communicate with each other as if they were part of the same network, even though they might belong to different accounts. VPC Peering is typically used to share resources or facilitate communication between VPCs owned by the same organization or within a multi-tier application architecture.
VPC transitivity refers to the ability to route traffic between two VPCs through a common, intermediary VPC. In other words, if VPC A is peered with VPC B and VPC B is peered with VPC C, transitive peering would allow VPC A to communicate with VPC C via VPC B, even though A and C don’t have a direct peering connection. However, VPC transitivity is not natively supported by all cloud providers. In AWS and GCP, for example, VPC peering is not transitive by default. Each VPC must have its own separate peering connection to communicate with other VPCs. Therefore, direct peering connections between all required VPCs are necessary to achieve full communication in non-transitive VPC environments.
A familiar example?
Let’s consider the following example:
As you can see, we have a direct VPC Peering between VPC Left and VPC Right, then another Peering between VPC Right and the Google Managed Tenant that hosts several managed services such as Cloud SQL and the control plane for GKE.
Due to the lack of VPC transitivity, communication between VPC Left and the services hosted on the Google Managed Tenant cannot happen. As mentioned earlier, there isn’t any magic GCP-native solution to such an issue, so having a correct Network Design becomes a must.
Perhaps the native routing capability of GCP can help?
Replacing the VPC Peering between VPC Left and VPC Right with a routing element can alleviate the situation.
In this scenario, when any service connected to VPC Left needs to access, for instance, Cloud SQL hosted on the Google Managed Tenant (which is in VPC Peering with VPC Right), the traffic will go through the router.
From a GCP routing standpoint, only a few simple things are needed:
- On VPC Left, two custom routes are required to reach the Google Managed Tenant and VPC Right, using the routing instance as the next hop;
- Similarly, on VPC Right, a custom route to reach VPC Left, using the routing instance as the next hop;
- Thanks to the Exchange custom routes feature of VPC Peering, we need to ensure the VPC Right peering instance imports and exports any defined custom routes;
- Lastly, for PSA (see next section), Exchange custom routes is needed on the servicenetworking-googleapis-com peering side (a gcloud sketch of these steps follows):
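To make this concrete, here is a minimal gcloud sketch of those steps. It assumes the routing VM (created in the next section) already exists; the network names, zone, and the 10.x ranges are placeholders I made up for illustration, while the 192.168.245.0/24 PSA range and the servicenetworking-googleapis-com peering name come from this example.

# Custom routes on VPC Left towards VPC Right and the Google Managed Tenant,
# using the routing VM as next hop (names and ranges are placeholders).
gcloud compute routes create to-vpc-right \
  --network=vpc-left \
  --destination-range=10.130.100.0/24 \
  --next-hop-instance=router-vm \
  --next-hop-instance-zone=europe-west1-b

gcloud compute routes create to-managed-tenant \
  --network=vpc-left \
  --destination-range=192.168.245.0/24 \
  --next-hop-instance=router-vm \
  --next-hop-instance-zone=europe-west1-b

# Return route on VPC Right towards VPC Left.
gcloud compute routes create to-vpc-left \
  --network=vpc-right \
  --destination-range=10.100.0.0/24 \
  --next-hop-instance=router-vm \
  --next-hop-instance-zone=europe-west1-b

# Import/export custom routes on VPC Right's PSA peering.
gcloud compute networks peerings update servicenetworking-googleapis-com \
  --network=vpc-right \
  --import-custom-routes \
  --export-custom-routes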
On the routing element, we can make this as complicated as we want. I chose the KISS approach, going down the Linux VM path with the following options defined (a gcloud sketch follows the startup script below):
- IP Forwarding allowed;
- NIC0 connected to VPC Left;
- NIC1 connected to VPC Right;
- gVNIC enabled, hoping for as low a latency as possible;
- A startup-script with the following content:
#!/bin/bash
# Enable IPv4 forwarding so the VM can route packets between NIC0 and NIC1.
sysctl -w net.ipv4.ip_forward=1
# Route the Google Managed Tenant (PSA) range out of NIC1, via VPC Right's subnet gateway.
ip route add 192.168.245.0/24 via 10.130.100.1
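For reference, a hedged sketch of how such a VM could be created with gcloud; the machine type, image, zone, and subnet names are illustrative assumptions, not the exact setup used here.

# Dual-NIC routing VM with IP forwarding and gVNIC (all names are placeholders).
gcloud compute instances create router-vm \
  --zone=europe-west1-b \
  --machine-type=n2-standard-4 \
  --image-family=debian-12 \
  --image-project=debian-cloud \
  --can-ip-forward \
  --network-interface=subnet=left-subnet,nic-type=GVNIC,no-address \
  --network-interface=subnet=right-subnet,nic-type=GVNIC,no-address \
  --metadata-from-file=startup-script=startup.sh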
The simplicity of this approach is, honestly, marvelous. No NAT-ing, no proxying, no complex routing layers, no stratified configurations. It’s just some routing on GCP, IP Forward enabled on the VM, and plain-and-simple routing in Linux. It’s lean yet powerful.
Using a dedicated router, fully integrated with the GCP API, may yield the ability to auto-discover such custom routes. This is also doable through a shell script invoking the gcloud CLI.
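As a rough sketch of that idea, something along these lines could run on the routing VM to look up the PSA ranges reserved on VPC Right and install matching Linux routes. The network name and the gateway address are assumptions carried over from the example above, and the VM would need gcloud plus suitable IAM permissions.

#!/bin/bash
# Illustrative only: discover the ranges reserved for Private Services Access
# on VPC Right and route each of them via NIC1's subnet gateway.
GATEWAY=10.130.100.1
for name in $(gcloud services vpc-peerings list \
    --network=vpc-right \
    --format="value(reservedPeeringRanges)" | tr ';' ' '); do
  cidr=$(gcloud compute addresses describe "$name" --global \
    --format="value(address,prefixLength)" | tr '\t' '/')
  ip route replace "$cidr" via "$GATEWAY"
done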
Re-usability
Thanks to Private Services Access (the Service Networking API), the same addressing is re-used by GCP across many managed services (Memorystore, Cloud SQL, Filestore, Vertex AI, etc.). This considerably reduces the amount of configuration required. GKE, unfortunately, doesn’t follow the same PSA approach. You could come up with some clever routing schema that defines which ranges are used by this scope, but given you’re here in the first place, perhaps it’s a bit late for a clever routing schema.
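A quick way to see which ranges are involved is to list the global addresses reserved for VPC peering; this is a generic check, not something specific to this setup.

# List the address ranges allocated for Private Services Access in the project.
gcloud compute addresses list --global \
  --filter="purpose=VPC_PEERING" \
  --format="table(name,address,prefixLength,network)"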
Scalability
I’m sure the sharpest readers will wonder how well a single GCE instance can handle all the traffic. Google Cloud limits outbound (egress) bandwidth using per-VM maximum egress rates, based on the machine type of the VM sending the packet and on whether the packet’s destination is reachable via routes within a VPC network or via routes outside of a VPC network. These upper boundaries are pre-defined and well-documented. Generally speaking, for anything that isn’t an E2 and has a low vCPU count, you’re limited to 10Gbps. This can be further increased through TIER_1 networking up to 100Gbps. The newer C3 instances, thanks to the IPU architecture, start at 23Gbps rather than 10Gbps and can go up to 200Gbps (see my other write-up about it).
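If a single instance needs more headroom, per-VM Tier_1 networking can be requested at creation time. The machine type below is only an illustration, since Tier_1 requires gVNIC and a large enough machine of a supported series; all names are placeholders.

# Larger routing VM with per-VM Tier_1 networking enabled.
gcloud compute instances create big-router-vm \
  --zone=europe-west1-b \
  --machine-type=n2-standard-32 \
  --can-ip-forward \
  --network-interface=subnet=left-subnet,nic-type=GVNIC,no-address \
  --network-interface=subnet=right-subnet,nic-type=GVNIC,no-address \
  --network-performance-configs=total-egress-bandwidth-tier=TIER_1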
From a pure latency standpoint, unless you’re running a DPDK router like VPP, you’re going to see an increase due to the interrupt-driven nature of Linux and of all other non-carrier-grade routers. Rather than going for a Cattle VM, a way to reduce latency and scale throughput is to have at least one instance per zone, and perhaps more than one where needed. This is enabled by iLB; see the next section.
Addressing the elephant in the room: SPoF
While this single routing instance is good for a PoC or a Medium post, how about something a bit more reliable?
How about Internal Network Load Balancers and Managed Instance Groups? With this approach, we gain true availability thanks to two persistent iLBs (one for VPC Left and another for VPC Right). This also solves performance scalability, together with the in-zone low-latency topic covered above. Just make sure to select the appropriate Session Affinity, like Client IP:
The GCE availability is taken care of by a MIG:
The only slightly janky aspect of this setup is the warning shown by one of the two iLBs, complaining that it forwards traffic only to instances whose NICs are in its network.
It’s a false warning because the routing instance is connected to both VPC networks; the internal iLB check is probably performed considering only the instance’s NIC0. Yet this is fully functional.
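For completeness, here is a hedged gcloud sketch of the VPC Left side of this design (the VPC Right side mirrors it). Every name, region, and range is a placeholder, and the instance template for the routing VMs is assumed to already exist.

# Regional MIG of routing VMs (instance template assumed to exist).
gcloud compute instance-groups managed create router-mig \
  --region=europe-west1 \
  --template=router-template \
  --size=2

# Health check, backend service, and internal forwarding rule on the VPC Left side.
gcloud compute health-checks create tcp router-hc \
  --region=europe-west1 \
  --port=22

gcloud compute backend-services create router-be-left \
  --load-balancing-scheme=INTERNAL \
  --protocol=TCP \
  --region=europe-west1 \
  --health-checks=router-hc \
  --health-checks-region=europe-west1 \
  --session-affinity=CLIENT_IP

gcloud compute backend-services add-backend router-be-left \
  --region=europe-west1 \
  --instance-group=router-mig \
  --instance-group-region=europe-west1

gcloud compute forwarding-rules create router-fr-left \
  --load-balancing-scheme=INTERNAL \
  --network=vpc-left \
  --subnet=left-subnet \
  --region=europe-west1 \
  --ip-protocol=TCP \
  --ports=ALL \
  --backend-service=router-be-left \
  --backend-service-region=europe-west1

# Point the custom route at the iLB instead of a single instance.
gcloud compute routes create to-managed-tenant-ilb \
  --network=vpc-left \
  --destination-range=192.168.245.0/24 \
  --next-hop-ilb=router-fr-left \
  --next-hop-ilb-region=europe-west1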
For comprehensive documentation, see the official GCP guide.
Considerations
To summarize, the lack of VPC transitivity in the public cloud can present difficulties in facilitating communication across multiple VPCs. Nevertheless, by employing a routing instance, as demonstrated in this case, it becomes possible to overcome this limitation. Although this approach may not be flawless, it offers a practical solution without disrupting the entire network infrastructure. It is worth noting that even the latest Network Connectivity Center VPC Spokes configuration does not facilitate the exchange of static and dynamic routes. I hope this discussion assists you in resolving networking challenges and highlights the remarkable aspects of network routing management in GCP, including its limitations and flexibility.