Container Platform Networking at Cruise

Using Google Kubernetes Engine with a multi-cloud, private hybrid network.

This post is part of our container platform series on GKE in a private hybrid network:
  1. Building a Container Platform
  2. Container Platform Security

Why Private? Why Hybrid?

In order for Cruise to move quickly (but safely) towards our goal of launching a ride-hailing service using fully autonomous vehicles, we need access to huge amounts of hardware (both virtual and physical) to run a wide variety of workloads. These workloads include massive test pipelines, machine learning clusters, and data science analysis in addition to multiple distributed backend systems to facilitate ride sharing, mapping, and fleet management. To satisfy the variety and scale of hardware requirements, we use both on-premises data centers and multiple public clouds.

For example, one workflow that requires hybrid connectivity is the process of analyzing recordings from car cameras, lidars, radars, and other sensors. Many people assume that this kind of data is streamed from the car to the cloud, but in reality, so much data is generated from these instruments that trying to stream it all over LTE or even 5G in real-time is simply impossible — there’s just not enough bandwidth. Instead, sensor data is buffered on local disks in the car. When the car returns to a hub location, the recorded data is extracted and uploaded to the cloud over private fiber lines.

GCP Hybrid Connectivity Options

In order to connect our on-premises garages, offices, and data centers with remote cloud providers, we needed a way to extend our private wide area network (WAN) backbone.

Dedicated Interconnect

With a dedicated interconnect, traffic between your datacenter and the cloud traverses a private, physical line installed by a service provider (or between cages in a shared colocation facility) to connect your site to Google's edge network. You pay (at minimum) monthly port costs to Google for maintaining the connection, as well as any associated service provider costs.

  • Covered by GCP networking SLA
  • Supports up to 200 Gbps (2 x 100 Gbps lines) or 80 Gbps (8 x 10 Gbps lines) per individual connection
  • Port and egress costs apply
  • Internal VPC subnet prefixes only (no public peering option)
  • Requires presence in an internet exchange (IX) facility
  • Slow provisioning lead times if new lines need to be installed

Partner Interconnect

With a partner interconnect, traffic between your datacenter and the cloud goes over existing lines owned by a service provider. That provider then sells you capacity on their shared line.

  • Covered by GCP networking SLA
  • Fast provisioning, thanks to capacity pre-provisioned by third-party vendors
  • Supports 50 Mbps to 10 Gbps per individual connection
  • Port and egress costs apply
  • Internal VPC subnet prefixes only (no public peering option)
  • Lower bandwidth than dedicated interconnect
  • Third-party network traversal and costs required

Virtual Private Networking

With a Virtual Private Network (VPN), traffic between your datacenter and the cloud is encrypted over the public internet, using your existing Internet connection and bandwidth. VPN termination requires that you host an on-premises device capable of static or route-based VPN.

  • Slower throughput due to IPSec overhead
  • Expensive at scale due to internet access and crypto acceleration costs

Direct Peering

With direct peering, traffic between your datacenter and Google traverses a direct connection, like Dedicated Interconnect, but it only peers public IPs, not private IPs. You don't even need to be a Google Cloud customer. Direct peering is the traditional way to peer with Google for all of its public services (YouTube, G Suite, etc.), not just GCP.

  • Free public peering with Google at 10 Gbps or 100 Gbps
  • No port or traffic costs
  • Reduced internet egress rates from your GCP projects to your network
  • Not covered by GCP networking SLA
  • Public Google IP prefixes only (not VPC-aware)
  • Requires presence in an internet exchange (IX) facility
  • Requires provider-independent IP address space (PI) — a minimum /24 of public IPv4 space registered with a public autonomous system number (ASN)
  • Integrating GCP with your on-premises network can be a manual process
  • Not all interested parties are accepted by Google for direct peering

Cruise Hybrid Connectivity Choices

Cruise uses a mix of these connectivity options for cost-efficiency and redundancy: dedicated interconnects for high-bandwidth access to GCP VPCs, VPNs as a cheap, low-bandwidth emergency fallback, and direct peering for free high-bandwidth access to public GCP services.

Reliability and redundancy are critically important so that services running in GKE are always accessible to the rest of our network.

One of our most important network metrics is throughput capability to Google Cloud Storage (GCS), which we use as a data lake. However, because GCS is a public service, traffic to GCS from on-premises won’t traverse GCP interconnects by default, instead using (potentially slower) public ISP connections.

Private Network Backbone

After evaluating our options, we decided to deploy a point-of-presence (PoP) in IX colocation facilities where Google and other cloud providers also have a presence. These PoPs could then be integrated with our WAN backbone, allowing us to easily (and quickly) connect to most cloud providers, carriers, and Internet service providers (ISPs) with minimal effort.

Figure: Hybrid, Multi-Cloud Interconnectivity

Internal GCP Routing

Once physical connections are established between Cruise and GCP, we divide those links into logical connections known as VLAN Attachments. This ultimately allows us to establish BGP peering sessions between our physical routers and the regional Cloud Routers that advertise our VPC subnets.

Figure: Cruise GCP Shared VPC Interconnect Design
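As a small illustration, the VLAN attachments in a region, and the Cloud Router each one peers with, can be listed with the google-cloud-compute Python client; the project and region names below are hypothetical placeholders.

```python
# Sketch: list the VLAN attachments in one region and the Cloud Router each
# one is bound to. Project and region names are hypothetical placeholders.
from google.cloud import compute_v1

attachments_client = compute_v1.InterconnectAttachmentsClient()
for attachment in attachments_client.list(project="my-gcp-project", region="us-central1"):
    # Each attachment maps a slice of a physical interconnect to a Cloud Router.
    print(attachment.name, "->", attachment.router)
```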

Subnetworks

The Kubernetes network model gives each service and pod its own IP address, which simplifies application development but complicates platform configuration and operation, especially when managing multiple Kubernetes clusters. GKE offers two options for implementing this model: route-based networking and VPC-native networking. We use VPC-native networking, in which pod and service IPs are assigned from secondary (alias) IP ranges on the subnet. Alias IPs have several advantages:

  1. Alias IPs are natively routable within the network, including peered networks.
  2. Alias IPs can be announced through BGP by Cloud Router over interconnects & VPNs, if desired.
  3. Alias IPs can’t be used by other compute resources on the network, which prevents conflict and allows for dedicated firewall rules.
  4. Alias IPs can access GCP hosted services without egressing through a NAT gateway.

How much alias IP space you need depends on several factors; a rough sizing sketch follows this list:

  1. Number of clusters
  2. Number of regions
  3. Number of nodes
  4. Number of pods per node
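
As a rough illustration of that math, here is a minimal sizing sketch in Python. It assumes GKE's default of a /24 alias range per node (up to 110 pods per node); the node and service counts are hypothetical.

```python
# Back-of-the-envelope sizing for one VPC-native cluster's secondary ranges.
# Assumes GKE's default of a /24 alias range per node (max 110 pods per node).
import math

nodes = 200        # hypothetical peak node count for one cluster
services = 1024    # hypothetical number of Kubernetes Services

pod_ips_needed = nodes * 256  # one /24 (256 IPs) per node under the default
pod_prefix = 32 - math.ceil(math.log2(pod_ips_needed))
svc_prefix = 32 - math.ceil(math.log2(services))

print(f"Pod secondary range:     /{pod_prefix}")   # /16 for 200 nodes
print(f"Service secondary range: /{svc_prefix}")   # /22 for 1024 services
```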

Secondary IP Ranges

Initially, we decided on a strategy of having one subnet per region, each shared by multiple clusters. However, while the primary IP range is expandable after creation, secondary IP ranges can’t be expanded while in use by a GKE cluster. Even if you could expand them, you would need to leave contiguous unallocated IPs available on the network, at which point they might as well be pre-allocated to the subnet.

Planning For Change

Another challenge is that having clusters share pod IP ranges means that we can’t delete a pod IP range without deleting all the clusters using it, thus making it hard to change which IPs are used by a cluster. To make things simpler and easier to change, we switched to provisioning a subnet for each GKE cluster. It means a little more CIDR math up front, but is a good architectural choice to keep things easy to change in the future.
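
To make that CIDR math concrete, here is a minimal sketch using Python's ipaddress module. The parent block and the per-cluster range sizes are hypothetical, not our production allocations.

```python
# Sketch: carve a regional block into per-cluster slots, then split each slot
# into node, pod, and service ranges. All sizes here are hypothetical.
import ipaddress

region_block = ipaddress.ip_network("10.128.0.0/14")  # hypothetical regional allocation

def cluster_ranges(slot):
    pods, rest = slot.subnets(new_prefix=19)           # /19 for pod IPs
    nodes, services, *_ = rest.subnets(new_prefix=21)  # /21 each; remainder spare
    return {"nodes": nodes, "pods": pods, "services": services}

# One /18 slot per GKE cluster gives 16 clusters per /14.
for i, slot in enumerate(region_block.subnets(new_prefix=18)):
    print(f"cluster-{i}:", cluster_ranges(slot))
```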

Ingress

GKE comes with a suite of ingress integrations that should be good enough for most basic use cases. However, one thing to consider is that the public ingress options (from the internet to the cluster) are more robust and mature than the private ingress options (from the intranet to the cluster).

  1. External HTTP(S) Load Balancer provides public ingress that supports HTTP(S) and uses layer 7 load balancing (reverse proxies).
  2. Internal HTTP(S) Load Balancer (Beta) provides private ingress that supports HTTP(S) and uses layer 7 load balancing (reverse proxies).
  3. Network TCP/UDP Load Balancer (NLB) provides public ingress that supports TCP or UDP and uses regional layer 4 routing (IP translation).
  4. Internal TCP/UDP Load Balancer (ILB) (Beta) provides private ingress that supports TCP or UDP and uses regional layer 4 routing (IP translation).

Public Ingress

For TCP/UDP public ingress, the NLB can be configured by Cruise PaaS tenants using a standard Kubernetes Service resource of type LoadBalancer. The controller that provides the integration is baked into the upstream Kubernetes GCP cloud provider. The NLB is implemented fairly low in the networking stack, so it doesn't provide advanced features like session stickiness, built-in authentication, or path-based routing. Generally, we only use the NLB for non-HTTP traffic, unless it fronts an application-layer HTTP proxy, like Nginx or Envoy.
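
For illustration, here is a minimal sketch of creating such a Service with the official Kubernetes Python client; the Service name, label selector, namespace, and ports are hypothetical.

```python
# Sketch: a Service of type LoadBalancer, which GKE backs with a Network
# TCP/UDP Load Balancer (NLB). All names, labels, and ports are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in-cluster
core_v1 = client.CoreV1Api()

nlb_service = client.V1Service(
    metadata=client.V1ObjectMeta(name="my-tcp-app"),
    spec=client.V1ServiceSpec(
        type="LoadBalancer",
        selector={"app": "my-tcp-app"},
        ports=[client.V1ServicePort(port=443, target_port=8443, protocol="TCP")],
    ),
)
core_v1.create_namespaced_service(namespace="default", body=nlb_service)
```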

Private Ingress

For TCP/UDP private ingress, the ILB can be configured by Cruise PaaS tenants using a Kubernetes Service resource of type LoadBalancer, the same mechanism used to configure the NLB, except with an annotation to make it private (cloud.google.com/load-balancer-type: Internal). The ILB is very similar to the NLB, except that it is only accessible from within the VPC, and by default only from the same region. So, like the NLB, we generally use the ILB for non-HTTP traffic.
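
The equivalent private Service only differs by the annotation mentioned above; a minimal sketch follows, again with hypothetical names and ports.

```python
# Sketch: the same Service made private with the internal load balancer
# annotation, so GKE provisions an ILB instead of an NLB.
from kubernetes import client

ilb_service = client.V1Service(
    metadata=client.V1ObjectMeta(
        name="my-internal-app",  # hypothetical name
        annotations={"cloud.google.com/load-balancer-type": "Internal"},
    ),
    spec=client.V1ServiceSpec(
        type="LoadBalancer",
        selector={"app": "my-internal-app"},
        ports=[client.V1ServicePort(port=80, target_port=8080, protocol="TCP")],
    ),
)
# Created the same way as the NLB sketch:
# client.CoreV1Api().create_namespaced_service(namespace="default", body=ilb_service)
```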

Figure: Private GKE Ingress using Nginx Ingress Controller, Google ILB, and External DNS

Egress

If your GKE cluster nodes have public IPs, internet egress works out of the box, but if you deploy your nodes on private subnets for added security (as we do), then your public egress traffic has to transit a NAT gateway to reach the internet. When we originally deployed GKE, we had to deploy our own NAT gateways, using a fork of Google's NAT Gateway Terraform Module. However, in early 2019 Google launched Cloud NAT, a fully managed solution that has reduced this management overhead for our engineering team.
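
Cloud NAT is configured on the regional Cloud Router rather than on individual instances. As a small sketch (project, region, and router names are hypothetical), the NAT configs attached to a router can be inspected with the google-cloud-compute client:

```python
# Sketch: inspect the Cloud NAT configs attached to a regional Cloud Router.
# Project, region, and router names are hypothetical placeholders.
from google.cloud import compute_v1

routers_client = compute_v1.RoutersClient()
router = routers_client.get(project="my-gcp-project", region="us-central1", router="my-cloud-router")

for nat in router.nats:
    # Each entry describes one NAT gateway (name, IP allocation mode, etc.).
    print(nat.name, nat.nat_ip_allocate_option)
```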

Network Isolation

It’s worth noting that Kubernetes doesn’t natively provide any Quality of Service (QoS) features to isolate ingress or egress. All network traffic shares the individual node’s resources, and more broadly, the network’s resources. As a result, it’s pretty easy for a single Kubernetes pod to consume all the bandwidth of a shared node or the shared NAT gateways and cause a bottleneck, if you’re not careful.

If you absolutely need ingress and egress isolation now, you may have to peel back the higher layer abstractions and use a lower level abstraction instead.

Upgrading to Skylake VMs helped with some of our networking bottlenecks, because Google raised the egress bandwidth cap to 32 Gbps (on 16+ core instances). However, the highest speeds are limited to same-zone VM-to-VM traffic. While the new architecture is still faster than the previous Broadwell architecture, it’s not possible to achieve 32 Gbps on all traffic within a multi-zonal cluster or between regions/clouds.

Scale Considerations

Google recently published an article with guidelines for creating scalable GKE clusters, a helpful resource that pulls together many of the constraints strewn throughout the GCP and GKE documentation. It also builds on the scalability thresholds put together by the Kubernetes Scalability Special Interest Group (SIG).

To Be Continued…

In this post, we've explained how Cruise deploys Kubernetes clusters on private networks, how we connect those networks to our on-premises datacenters, and how we deploy scalable ingress and egress for our clusters. In the next post in this series, we'll look at observability for our clusters and the workloads running on them.

Cruise is building the world’s most advanced self-driving vehicles to safely connect people with the places, things and experiences they care about. Join us in solving the engineering challenge of a generation: https://getcruise.com/careers
