Google Networking Deep Dive — What’s behind the scenes

Arnaud Redon
Google Cloud - Community
6 min read · May 2, 2024

For a while now, I’ve been fascinated by the inner workings of Google’s data center network. What powers the vast Google Cloud Platform (GCP) and seamlessly connects users worldwide? Specifically, I wanted to discuss how Google builds the network inside its data centers 📐✏️ 💻

Today, we delve into the world of Google’s Virtual Private Clouds (VPCs), peering, and the magic behind it all. Enjoy the read!🧑‍💻

Google Innovations

Most of Google’s networking implementations in its data centers are based on Google innovations (Maglev, Jupiter, Andromeda, Espresso, …).

Google innovations over time

All these distributed systems in the network required significant bandwidth. Google couldn’t buy a commercially available network with enough capacity to meet its needs, so it built its own network.

The central question is: how can Google scale, control, and trace all network elements across all of its data centers? 🤯 One of the answers: SDN (software-defined networking).

The control of networking: SDN (Software-Defined Networking)

SDN is a centralized control plane that provisions and controls network elements to provide end-to-end connectivity and enforce policy. Andromeda, Espresso, and Orion are all built on an SDN architecture.

How does Google implement all of this? Let’s dive in.

How is the Google network built?

Google’s network architecture is built in onion-like layers, as represented in the picture:

Google Network

Google’s data center network infrastructure:

  • Jupiter Fabric: This is the internal network within Google’s data centers, operating at speeds of 40 Gbps per link and capable of handling a total bandwidth of 1 Petabit per second (Pbps). It leverages Software-Defined Networking (SDN) for centralized control and management.
  • WAN B4: This is Google’s wide-area network (WAN) that connects its data centers globally. It also utilizes SDN for efficient traffic routing and boasts high throughput in the terabit range (Tbps).
  • B2: This network segment connects Google’s data centers to the global internet backbone, with very high Service Level Agreements (SLAs) guaranteeing performance and reliability for user-facing traffic entering the Google network.
  • Espresso: This is the SDN controller for Google’s peering edge network. It dynamically selects the most efficient routes to deliver customer traffic based on real-time measurements of availability and latency.
  • Orion: This is the SDN platform deployed across both the Jupiter fabrics and the B4 WAN, i.e. Google’s entire network, not just GCP. It acts as the central control point for provisioning, configuring, and managing network resources: it is the control plane that tells the network equipment (switches, routers) how to route traffic and manages overall network behavior. Think of Orion as the central traffic control system, directing data flow across the entire Google network.
  • Andromeda: This is the SDN platform that focuses on network virtualization. It’s an internal system within Google Cloud Platform (GCP) that creates and manages virtual networks on top of the physical network infrastructure. Imagine it as carving out dedicated virtual highways within a massive physical road network for GCP users.

Regarding encapsulation: Each forwarding element within the network likely encapsulates the customer’s IP packet within another IP packet with a transport header. This additional header is likely used for internal routing within Google’s network and wouldn’t be part of the final delivery to the customer.
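
To make that idea concrete, here is a minimal sketch using Scapy. The addresses, ports, and the choice of a UDP transport header are my own assumptions for illustration, not Google’s actual encapsulation format.

```python
# pip install scapy
from scapy.all import IP, TCP, UDP, Raw

# The customer's original packet: VM-to-VM traffic inside a virtual network.
inner = IP(src="10.0.0.2", dst="10.0.0.3") / TCP(dport=443) / Raw(load=b"customer payload")

# A hypothetical outer IP + transport header added by a forwarding element,
# used only to carry the packet between physical hosts inside the provider network.
outer = IP(src="172.16.1.10", dst="172.16.2.20") / UDP(sport=49152, dport=6081)

encapsulated = outer / inner
print(encapsulated.summary())  # IP / UDP / IP / TCP ...

# On the destination host the outer headers are stripped before delivery,
# so the VM only ever sees the original inner packet.
delivered = encapsulated[UDP].payload
print(delivered.summary())
```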

VPC functioning

A Google global VPC is controlled by Andromeda (software-defined network virtualization), which provides isolation, performance, services, and velocity.

Within a Google Global VPC, you can create multiple virtual networks. These virtual networks act as isolated segments where your virtual machines (VMs) reside and can communicate with each other. VM IP addresses are independent of the underlying physical network, providing flexibility and portability. A virtual switch (vSwitch) within the virtual network handles forwarding traffic and maintains a mapping between VM addresses and the actual physical host IPs. That’s why VM live migration is so easy on GCP!
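
Here is a toy sketch of that mapping idea; the class and method names are my own assumptions, not Andromeda’s actual data structures:

```python
from dataclasses import dataclass, field


@dataclass
class VirtualSwitch:
    """Toy model of a per-network forwarding table: virtual VM IP -> physical host IP."""
    vm_to_host: dict[str, str] = field(default_factory=dict)

    def attach_vm(self, vm_ip: str, host_ip: str) -> None:
        self.vm_to_host[vm_ip] = host_ip

    def migrate_vm(self, vm_ip: str, new_host_ip: str) -> None:
        # Live migration only needs a mapping update; the VM keeps its virtual IP.
        self.vm_to_host[vm_ip] = new_host_ip

    def resolve(self, dst_vm_ip: str) -> str:
        # The physical destination the encapsulated packet is sent to.
        return self.vm_to_host[dst_vm_ip]


vswitch = VirtualSwitch()
vswitch.attach_vm("10.0.0.2", host_ip="172.16.1.10")
vswitch.attach_vm("10.0.0.3", host_ip="172.16.2.20")

print(vswitch.resolve("10.0.0.3"))            # 172.16.2.20
vswitch.migrate_vm("10.0.0.3", "172.16.9.5")  # live migration: same VM IP, new host
print(vswitch.resolve("10.0.0.3"))            # 172.16.9.5
```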

Andromeda architecture

The Andromeda architecture is a two-plane system consisting of a control plane and a data plane.

The control plane consists of controller VMs. These VMs receive a network representation that includes firewall rules, routes, subnets, and VM information. The controllers translate this information into OpenFlow commands and send them to vSwitches through the OpenFlow frontend proxy. Importantly, the control plane is stateless, meaning it doesn’t store network configuration information. This simplifies upgrades and avoids impacting vSwitches during updates.
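
As a rough illustration of what “translating the network representation into flow rules” could look like, here is a hedged sketch; the field names and the OpenFlow-like dictionary format are assumptions for illustration, not Andromeda’s real schema or protocol messages:

```python
network_representation = {
    "subnets": [{"cidr": "10.0.1.0/24", "network": "prod-vpc"}],
    "routes": [{"dest": "0.0.0.0/0", "next_hop": "default-internet-gateway"}],
    "firewall_rules": [
        {"action": "allow", "protocol": "tcp", "port": 443, "source": "0.0.0.0/0"},
    ],
}


def to_flow_entries(representation: dict) -> list[dict]:
    """Flatten the declarative config into match/action entries for vSwitches.

    A stateless controller keeps no configuration of its own, so it can
    recompute every entry from the representation at any time.
    """
    entries = []
    for rule in representation["firewall_rules"]:
        entries.append({
            "match": {"ip_proto": rule["protocol"], "tp_dst": rule["port"],
                      "nw_src": rule["source"]},
            "action": rule["action"],
            "priority": 1000,
        })
    for route in representation["routes"]:
        entries.append({
            "match": {"nw_dst": route["dest"]},
            "action": f"forward:{route['next_hop']}",
            "priority": 100,
        })
    return entries


for entry in to_flow_entries(network_representation):
    print(entry)  # in the real system these go to vSwitches via the OpenFlow frontend proxy
```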

The data plane runs on physical hardware resources and can process over 3 million packets per second.

To ensure scalability, Andromeda utilizes sharded VM controllers with VM replication. This means there’s typically one master controller running with two standby controllers for redundancy.
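
A tiny sketch of what sharding with replication might look like in principle (the shard count, naming scheme, and hashing choice are all assumptions for illustration):

```python
import hashlib

NUM_SHARDS = 4
REPLICAS_PER_SHARD = 3  # 1 master + 2 standbys


def shard_for_vm(vm_id: str) -> int:
    """Deterministically map a VM to a controller shard."""
    digest = hashlib.sha256(vm_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def controllers_for_vm(vm_id: str) -> list[str]:
    shard = shard_for_vm(vm_id)
    # Replica 0 acts as the master; replicas 1 and 2 are hot standbys.
    return [f"vm-controller-{shard}-replica-{r}" for r in range(REPLICAS_PER_SHARD)]


print(controllers_for_vm("projects/demo/instances/web-1"))
# e.g. ['vm-controller-2-replica-0', 'vm-controller-2-replica-1', 'vm-controller-2-replica-2']
```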

Andromeda Data Plane

Andromeda operates in user space, eliminating the need for root access. This reduces the attack surface and potentially improves code execution efficiency. Isolation in Andromeda is achieved through address virtualization, which keeps tenants’ traffic separate, and packet encapsulation, which carries that traffic across the physical network.

Functional isolation
Andromeda network services

Andromeda is designed around a flexible hierarchy of flow-processing paths. Flows are mapped to a programming path dynamically based on feature and performance requirements. Google introduced Hoverboards, which use gateways for the long tail of low-bandwidth flows and enable the control plane to program network connectivity for tens of thousands of VMs in seconds. Andromeda sends packets that do not match a flow rule on the VM host to Hoverboards, dedicated gateways that perform virtual network routing.
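
Here is a toy sketch of that host-side decision: flows the control plane has programmed on the host take the direct path, and everything else falls back to a Hoverboard gateway. The names and addresses are assumptions for illustration, not Andromeda’s implementation.

```python
HOVERBOARD_GATEWAY = "hoverboard-gw.internal"

# Flows the control plane has programmed directly on this host
# (destination VM IP -> physical host address).
installed_flows = {
    "10.0.0.3": "172.16.2.20",  # a high-bandwidth peer, offloaded to the host path
}


def forward(dst_vm_ip: str) -> str:
    """Return where the encapsulated packet should be sent."""
    if dst_vm_ip in installed_flows:
        return installed_flows[dst_vm_ip]  # direct host-to-host path
    return HOVERBOARD_GATEWAY              # long tail: let the gateway route it


print(forward("10.0.0.3"))   # 172.16.2.20 (programmed flow)
print(forward("10.0.0.99"))  # hoverboard-gw.internal (default Hoverboard path)
```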

The Hoverboard model enables Andromeda to achieve scalability and agility while maintaining high performance and efficient resource utilization. This allows for scaling to large virtual networks (100k+ VMs) without overwhelming the control plane.

The Hoverboard path enables control plane scaling by processing the long tail of mostly idle flows on dedicated gateways

What about Internal Load Balancing?

Internal load balancing is directly embedded and programmed in the VMs and sends packets to the appropriate backend services. The work is distributed across all VMs in the data centers, so there is no chokepoint 👊

The VM controllers program the OpenFlow rules into the client VMs and associate the backends with the VIP.
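
Conceptually, the client-side choice could look like the sketch below; the VIP, backend addresses, and weighted-random selection are my own illustrative assumptions, not the actual programmed rules:

```python
import random

# The host's programmed rules map a virtual IP (VIP) to a set of backends,
# and each new connection from the client VM picks a backend locally,
# with no load-balancing middlebox in the path.
vip_backends = {
    "10.200.0.1": [            # internal VIP of a backend service
        ("10.0.1.10", 1.0),    # (backend VM IP, weight)
        ("10.0.1.11", 1.0),
        ("10.0.1.12", 0.5),    # e.g. a smaller instance gets less traffic
    ],
}


def pick_backend(vip: str) -> str:
    backends = vip_backends[vip]
    ips = [ip for ip, _ in backends]
    weights = [w for _, w in backends]
    return random.choices(ips, weights=weights, k=1)[0]


# Each VM makes this decision on its own host, so the "load balancer"
# scales with the number of client VMs instead of being a chokepoint.
print(pick_backend("10.200.0.1"))
```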

Firewall rules

Just like load balancing, firewall rules are deployed directly on the Compute Engine VMs. This approach scales without creating a bottleneck.
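
As a final sketch, here is a toy version of host-local rule evaluation; the rule fields and default-deny behavior loosely mirror how VPC firewall rules are documented, but the code is purely illustrative:

```python
from ipaddress import ip_address, ip_network

# Rules programmed for this VM; enforcement happens on the VM's host,
# so it scales with the fleet instead of funneling traffic through a device.
rules = [
    {"action": "allow", "protocol": "tcp", "port": 443, "source": "0.0.0.0/0", "priority": 1000},
    {"action": "allow", "protocol": "tcp", "port": 22,  "source": "10.0.0.0/8", "priority": 900},
]


def evaluate(src_ip: str, protocol: str, port: int) -> str:
    # Lower priority number wins, so check rules in ascending priority order.
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if (rule["protocol"] == protocol
                and rule["port"] == port
                and ip_address(src_ip) in ip_network(rule["source"])):
            return rule["action"]
    return "deny"  # implied default-deny for ingress


print(evaluate("203.0.113.7", "tcp", 443))  # allow
print(evaluate("203.0.113.7", "tcp", 22))   # deny (SSH only from internal ranges)
```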

I hope you enjoyed reading this. See you next time 👋
