Building an external k8s API Gateway & LoadBalancer

Adam Dunstan
ThermoKline
Apr 18, 2023


The access infrastructure used by cloud providers for k8s differs from the access solutions used in on-premise deployments. While on-premise deployments use in-cluster mechanisms, cloud providers' LoadBalancers and API Gateways run outside the cluster, directing traffic to clusters deployed within their cloud complex.

Placing API Gateways and LoadBalancers (Gateways) outside of the k8s cluster has significant feature and operational benefits. We have developed an external API Gateway and LoadBalancer for k8s that interoperates with both Cloud providers and on-premise clusters.

  • k8s controller for the ServiceAPI (LoadBalancer) and the GatewayAPI
  • Operate with Cloud Providers and on-premise clusters
  • Multi-tenant
  • Transparent to NATs and Firewalls
  • Fully configurable Proxy Engine
  • IPv4 & IPv6
  • Performance & reliability by design

User Operation

A cluster user creates either an L7 API Gateway or an L4 TCP proxy using the Gateway API, or an L4 (TCP/UDP) proxy using the Service API. All operations result in the creation and programming of proxy engines instantiated in the External Gateway. Gateway configuration is derived from both in-cluster configuration and configuration held in the External Gateway. In addition to creating the gateway, the External Gateway allocates IP addresses, updates DNS information and configures the External Gateway network to advertise the public address.

Gateways are created on demand from configuration templates stored in the External Gateway, and each gateway's configuration is stored in a custom resource on the External Gateway. The configuration on the k8s cluster depends upon the k8s API used; however, the platform design separates the configuration and operational tasks of k8s cluster users, k8s cluster managers, and Gateway, Network or Security engineers.

Once configured, requests are received by the External Gateway proxy instances and forwarded directly to the PODs hosting the applications; no proxies are required in the target clusters.

Platform Components

A number of core and ancillary components are used to create and integrate the Gateway platform with Kubernetes.

k8s Controller

A single controller enables the creation of Gateways using either the Gateway API for API gateways or the Service API for LoadBalancers.

The benefit of using the standard k8s APIs is that cluster users requesting gateways do not need to understand proprietary configuration. A user requiring a simple LoadBalancer can use the Service API; where an API Gateway with HTTP routing is required, the Gateway API is used. Using one controller that leverages both standard APIs is an important part of limiting the complexity of the operational workflow.

The specific external gateway configurations are contained in the Gateway Platform as creation templates. The in-cluster controller uses custom resource templates for gateway creation. This enables standardized gateway configurations, separating detailed gateway configuration from k8s cluster operation and enabling cluster users to create predefined gateways on demand.

Gateway Platform

The Gateway Platform itself runs a simplified upstream version of k8s as its control plane foundation, providing configuration management and storage infrastructure. The cluster controllers communicate with the gateway; using k8s in the Gateway creates a common API structure, simplifying the development of all of the components and providing a common infrastructure to manage. k8s manages the creation of the Gateway proxy instances, but the CNI does not provide the traffic forwarding path. k8s also provides the isolation necessary for multi-tenant operation and contributes to gateway platform redundancy.

Gateway Controllers

k8s controllers implement the Gateway Platform. They implement the API used by the k8s cluster controllers and the infrastructure that manages the interaction with k8s, configuration, storage and creation of the gateways. These controllers also implement IPAM for the public IP addresses allocated to Gateways.

Each Gateway Platform node also requires configuration undertaken on the host, including the configuration of gateway-to-k8s-node transport and IP route management; a daemonset controller on each node is responsible for this configuration.

Proxy Engine

There are many proxy engines to choose from; however, the two most prevalent general-purpose proxies for k8s are Nginx and Envoy. Our platform is designed so it can support any proxy engine, but infrastructure is required to configure each proxy engine variant. Both were considered; Envoy was the better fit, with a modern configuration interface and a complete (non-paywalled) feature set.

Envoy's xDS configuration API enables the proxy to be configured dynamically. This includes startup configuration and changes to downstream target clusters as the controller adds and removes downstream k8s cluster pods.
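As a sketch of what dynamic configuration looks like, a minimal Envoy bootstrap can delegate its listener and cluster configuration to an xDS management server over ADS. The node id, cluster names and xDS server address below are illustrative assumptions, not the platform's actual values:

```yaml
# Minimal Envoy v3 bootstrap delegating listeners and clusters to an
# xDS management server. Names and addresses are illustrative.
node:
  id: gateway-proxy-0          # assumed node id; the platform sets its own
  cluster: tenant-gateways
dynamic_resources:
  ads_config:
    api_type: GRPC
    transport_api_version: V3
    grpc_services:
      - envoy_grpc:
          cluster_name: xds_cluster
  lds_config: { ads: {} }      # listeners delivered dynamically
  cds_config: { ads: {} }      # downstream target clusters delivered dynamically
static_resources:
  clusters:
    - name: xds_cluster        # the management server is the only static cluster
      type: STRICT_DNS
      connect_timeout: 1s
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          explicit_http_config:
            http2_protocol_options: {}   # xDS is gRPC, which requires HTTP/2
      load_assignment:
        cluster_name: xds_cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: xds.example.internal, port_value: 18000 }
```

With this bootstrap in place, everything beyond the xDS connection itself — listeners, routes and the downstream POD endpoints — can change at runtime without restarting the proxy.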

The Gateway Platform can support multiple versions of Envoy. Each version, either a standard or a custom build, is placed inside a container configured for use with the Gateway. This way any version of Envoy, and multiple versions concurrently, can be supported.

Proxy Engine Configuration

During the creation of each tenant namespace, two controllers are created that provide the initial proxy bootstrap and operational configuration. The Gateway custom resource (CR), created from the Gateway template CR, embeds Envoy configurations in their native form, with Go templates containing information populated by the Gateway controllers, such as downstream endpoints.

The Gateway CR configuration is parsed into an Envoy configuration and stored in a lightweight Envoy configuration manager called Marin3r. Each tenant namespace has its own instance of Marin3r. One of the key components of Marin3r is configuration syntax checking. Envoy's configuration syntax is known to be somewhat cumbersome and error prone; Marin3r checks configurations before they are applied and only applies valid configurations, while keeping a history of previous valid and invalid configurations.
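For illustration, a Marin3r EnvoyConfig resource wraps native Envoy resources and publishes them over xDS after validation. The sketch below is a hedged approximation of marin3r's upstream CRD shape; the names, namespace and endpoint address are hypothetical:

```yaml
# Hedged sketch of a marin3r EnvoyConfig carrying a native Envoy cluster.
# marin3r validates the embedded Envoy YAML before publishing it over xDS.
apiVersion: marin3r.3scale.net/v1alpha1
kind: EnvoyConfig
metadata:
  name: gateway-proxy          # hypothetical name
  namespace: tenant-a          # one marin3r instance per tenant namespace
spec:
  nodeID: gateway-proxy-0      # must match the Envoy bootstrap node id
  serialization: yaml
  envoyResources:
    clusters:
      - value: |               # native Envoy v3 cluster, populated by the controllers
          name: service-backends
          connect_timeout: 1s
          type: STATIC
          load_assignment:
            cluster_name: service-backends
            endpoints:
              - lb_endpoints:
                  - endpoint:
                      address:
                        socket_address: { address: 10.244.1.15, port_value: 8080 }
```

If the embedded Envoy YAML fails validation, the previous valid revision stays in service, which is the rollback behaviour described above.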

Once the gateway is created, the Gateway CR can be updated to change the Envoy configuration; if the change is validated by Marin3r, it is dynamically applied.

Gateway Platform Networking

Each container hosting the Envoy proxy engine requires specific network configuration. The Gateway Platform controller's IPAM function allocates an IP address for a gateway from a public pool of IPv4 or IPv6 addresses that will be used to access the gateway. The request traffic to and from the proxy engine pods does not transit the k8s CNI network; the CNI network is used solely for control plane traffic.

Multus

The Multus controller enables additional network interfaces to be added to PODs on creation. The Gateway Platform controller passes the IP address allocated by the Gateway IPAM to Multus, which adds that address to the additional POD network interface. During POD creation, the routing tables in each POD are updated to ensure that control plane traffic transits the CNI, and a default route is added so that all other traffic transits the Multus-provisioned interface. A gateway is a collection of proxy pods spread over the gateway cluster; each gateway POD gets the same external IP address, enabling router-based load balancing.
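As a hedged sketch of the Multus mechanics: an attachment is described by a NetworkAttachmentDefinition and requested via a pod annotation. The names, CNI plugin choice and addresses below are illustrative, not the platform's actual configuration:

```yaml
# NetworkAttachmentDefinition describing the extra (non-CNI) interface.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: gateway-public         # hypothetical name
  namespace: tenant-a
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth1",
      "ipam": { "type": "static" }
    }
---
# Proxy pod requesting the attachment. The IPAM-allocated public address
# is passed per pod, so every proxy pod in a gateway carries the same IP.
apiVersion: v1
kind: Pod
metadata:
  name: envoy-proxy-0
  namespace: tenant-a
  annotations:
    k8s.v1.cni.cncf.io/networks: |
      [{ "name": "gateway-public", "ips": ["198.51.100.10/32"] }]
spec:
  containers:
    - name: envoy
      image: envoyproxy/envoy:v1.24-latest
```

The shared per-gateway address is what makes the router-based (ECMP) load balancing described below possible.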

Routing

The Gateway Platform controller adds routes for the IPAM-allocated public address on each Gateway Platform node where proxy pods exist. Each node in the Gateway Platform has a Routing POD, running BIRDv2. BIRD is configured to read the routes added by the Gateway Platform controller and distribute them using BGP to an upstream switch operating as an Internet router. The switch is configured to support Equal Cost Multipath, therefore traffic destined for the proxies is distributed among the nodes and proxy PODs for each gateway.
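A minimal BIRD v2 configuration for this role might look like the following sketch; the ASNs, neighbor address and export filter are assumptions for illustration, not the platform's actual routing policy:

```
# Hedged BIRD v2 sketch: learn the kernel routes installed by the
# Gateway Platform controller and announce them upstream over BGP.
protocol device {
}

protocol kernel {
  learn;                     # pick up routes added outside of BIRD
  ipv4 {
    import all;
    export none;
  };
}

protocol bgp uplink {
  local as 64512;            # illustrative private ASN
  neighbor 192.0.2.1 as 64500;
  ipv4 {
    import none;
    export where source = RTS_INHERIT;  # only re-advertise learned kernel routes
  };
}
```

With the upstream switch doing Equal Cost Multipath across the nodes advertising the same gateway address, request traffic is spread over all proxy PODs of a gateway.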

eBPF GUE Encapsulation

The k8s cluster controller updates the Gateway controller with node and service endpoints from either the Gateway or Service API. The Service Endpoint addresses are the POD CNI addresses within the target k8s cluster. The CNI POD addresses are added to the Envoy cluster configuration dynamically. Envoy distributes requests to cluster PODs based upon the configured LoadBalancer policy.

To transport traffic between the Gateway Cluster and the k8s cluster, Generic UDP Encapsulation (GUE) is used. There are two variants of GUE: variant 0 includes a control header, while variant 1 is direct encapsulation without the option for a control header, and is the only variant Linux implements. eBPF programs in the k8s cluster and the Gateway Platform implement GUE variant 0.

The eBPF program intercepts the traffic addressed directly to each POD and encapsulates the traffic sending it directly to the cluster node hosting the pod using information provided by the k8s cluster controller.

However, it is likely that the node addresses are located behind Network Address Translation (NAT). One of the key reasons the GUE variant 0 header is required is to address this problem. When a Gateway is configured in the target node and a service is associated, the k8s cluster node agent begins sending GUE hello packets that contain a unique key for the destination. This key is used to identify the target node; the source address of the GUE hello packets is the NAT address that maps back to the cluster node, so sending packets to this address results in translation to the correct node destination. The GUE hello packets are sent at a regular interval to ensure that state in the NAT is maintained.

During gateway creation, the k8s cluster controller also receives addressing information about the Gateway Platform node that is hosting the proxy for the gateway. In addition to sending GUE hello packets, the eBPF program is programmed to decapsulate traffic received from the Gateway Platform. The decapsulation simply removes the outer header leaving the POD CNI address which is then forwarded using routing configured by the k8s CNI. The traffic will arrive at the Node where the POD resides, however it is possible to configure the platform to pass traffic via a limited set of accessible nodes with those nodes forwarding to destination PODs over the k8s cluster CNI.

The eBPF programs are relatively simple from a functional perspective (though they are not easy to code). Attached at the Traffic Control ingress and egress attachment points, the programs expand the packet and add new header information at encap, and remove that information at decap. The k8s node program maintains a dynamic table of decapsulated incoming requests that is used to encapsulate replies. The Gateway Platform node program maintains a dynamic table of verified node addresses; all other configuration is undertaken by the controllers on their respective nodes. This simple program adds very minimal overhead to packet forwarding.

Gateway Configuration

The configuration in the Gateway platform is uniform irrespective of the k8s Cluster API or type of Gateway that needs to be configured. Teams or Organizations are assigned to individual namespaces and Service accounts are added. Gateway Platforms are referenced by controlplane-address/namespace/service-account.

The only difference between L4 TCP/UDP and L7 HTTP/API Gateways is their configuration, derived from the Gateway templates.

Gateways are created from templates, stored in each namespace and defined by Custom Resources. These templates contain the Gateway Platform configuration and the Envoy configuration. Gateway Platform parameters include:

  • IPAM pool providing gateway addresses.
  • Enable multi-cluster gateway sharing
  • Number of proxy POD instances created for this gateway. (today this is static, however dynamic capabilities can be easily added)
  • Version of Envoy POD to be loaded for this gateway

The template contains a section called “envoy-template” which mirrors a standard Envoy configuration. To add gateway- and cluster-specific information to the configuration, the template can include Go templates. By using Go templates in the Gateway templates, it is straightforward to understand the resulting Envoy configuration. These templates define how the filter chains and Envoy routes are created, providing total configuration flexibility.
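To show the idea, here is a hedged sketch of such a template. The CRD group, kind and field names are illustrative assumptions about the schema; only the "envoy-template" section and the list of parameters above come from the text:

```yaml
# Hypothetical Gateway template CR: API group, kind and field names are
# illustrative, not the platform's actual schema.
apiVersion: epic.example.org/v1
kind: GatewayTemplate
metadata:
  name: http-gateway
  namespace: tenant-a
spec:
  ipamPool: public-v4            # pool providing the gateway address
  proxyReplicas: 2               # static number of proxy POD instances
  envoyVersion: v1.24            # Envoy container to load for this gateway
  envoy-template: |              # native Envoy config with Go-template actions
    listeners:
      - name: http
        address:
          socket_address: { address: "{{ .GatewayIP }}", port_value: 80 }
    clusters:
      - name: backends
        type: STATIC
        load_assignment:
          cluster_name: backends
          endpoints:
            - lb_endpoints:
              {{- range .Endpoints }}
              - endpoint:
                  address:
                    socket_address: { address: "{{ .IP }}", port_value: {{ .Port }} }
              {{- end }}
```

Because the template body mirrors native Envoy YAML, reading the template tells you almost exactly what configuration the proxy will end up with once the controllers fill in the endpoints.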

Gateway API

The Gateway API provides a significantly richer interface, but is slightly more complicated. Its structure attempts to address different user roles, such as Infrastructure provider, Cluster Operator and Application developer.

Once the controller is added to the k8s cluster, a GatewayClassConfig and a GatewayClass are added to enable the creation of a predefined gateway by an Application developer. The GatewayClassConfig is specific to the Gateway Platform and contains the Gateway, Organization, Service Account and desired Gateway Platform template. The GatewayClass is the standard definition that binds the GatewayClassConfig to the Gateway API configuration objects. Each GatewayClassConfig/GatewayClass pair represents a different Gateway Platform gateway configuration; therefore many different API gateway configurations can be supported.

Application developers create gateways by creating a Gateway object in their namespace, resulting in the creation of a gateway and its associated proxy engines based upon the template defined in the GatewayClassConfig. To attach the gateway to PODs, routes are created that reference services.

A key function of API gateways is HTTP routing and its associated manipulations. HTTPRoute in the Gateway API provides this functionality. It enables the functionality necessary to direct URLs to the PODs in the target k8s clusters providing the applications. The configurable actions include redirects/rewrites and traffic splitting. When the Gateway Platform has multi-cluster enabled, these PODs can be distributed over multiple clusters to provide a further mechanism for reliability and migration.

GatewayClassConfig -> GatewayClass -> Gateway -> HTTPRoute -> Service
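The developer-facing half of this chain can be sketched with standard Gateway API resources; the class, namespace and service names are illustrative:

```yaml
# A Gateway created from a predefined class, plus an HTTPRoute binding a
# Service to it. Names are illustrative; resources follow the upstream
# Gateway API v1beta1 schema current at the time of writing.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: web-gateway
  namespace: app-team
spec:
  gatewayClassName: epic-http      # hypothetical class backed by a GatewayClassConfig
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: web-route
  namespace: app-team
spec:
  parentRefs:
    - name: web-gateway            # a route can bind to more than one Gateway
  rules:
    - matches:
        - path: { type: PathPrefix, value: /api }
      backendRefs:
        - name: web-backend        # Service whose POD endpoints the external Envoy targets
          port: 8080
```

Applying the Gateway object triggers proxy creation in the External Gateway; the HTTPRoute then supplies the URL-to-service mapping that is rendered into the Envoy route configuration.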

The Gateway Platform and the Gateway API also support TCPRoute for creating L4 TCP proxies. These are similar to Service LoadBalancers, with the routes referencing TCP services.

One aspect should be pointed out: the central object in a configuration is the Route, not the Gateway. The Route object binds a collection of services, but can also bind to more than one Gateway, enabling additional levels of configuration flexibility. Changing routes updates the gateway but does not delete it; the Gateway is a longer-lived object, while routes are expected to change based upon the application's needs.

You can learn more about the Gateway API at https://gateway-api.sigs.k8s.io

LoadBalancer with Service API

Using the LoadBalancer component of the Service API is a simpler solution, but by default it has limited flexibility. The Gateway Platform creates the same proxies using the same mechanism; however, how they are configured from the cluster is different. Most importantly, the Service API does not provide HTTP routing.

The Gateway Controller installed on the k8s cluster also supports the Service API Service Loadbalancer.

Generally speaking, the Service API is a good alternative if all that's required is a simple L4 TCP/UDP proxy. Annotations in the Service object select the template used in the Gateway Platform to create the proxy. If other controllers are listening to this API, such as PureLB, MetalLB or a cloud provider's controller, the loadBalancerClass should be specified to ensure that the gateway is selected.
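For illustration, a minimal Service of type LoadBalancer targeting the platform might look like this; the annotation key and loadBalancerClass value are assumptions, not the platform's documented values:

```yaml
# Service requesting an L4 proxy from the Gateway Platform.
# Annotation key and loadBalancerClass value are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: tcp-app
  annotations:
    epic.example.org/template: l4-tcp          # selects the Gateway Platform template
spec:
  type: LoadBalancer
  loadBalancerClass: epic.example.org/gateway  # keeps PureLB/MetalLB/cloud LBs from claiming it
  selector:
    app: tcp-app
  ports:
    - protocol: TCP
      port: 5432
      targetPort: 5432
```

Setting loadBalancerClass matters on clusters where more than one load balancer controller watches Service objects, since only controllers matching the class will act on the Service.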

Additional Components

To make operation simpler, a number of other components are added to the Gateway Platform; in some cases they required customization to enable operation in this use case:

  • Certificate Management
    Cert Manager and Emberstack. These tools create and manage both system certs and Gateway certs; Emberstack is used to replicate certs over multiple namespaces.
  • DNS
    ExternalDNS. DNS names are created from the go-templates located in the Gateway templates, ExternalDNS distributes them to DNS servers. Gateway Platform uses a modified version.
  • Access to Control Plane API
    Contour & PureLB. The Gateway Platform uses a standard in-cluster access model to provide access to the Gateway Platform control-plane API: PureLB, an open source Service LoadBalancer (we are the authors), combined with the Contour proxy.
  • Prototype Gateway Platform UI
    This provides a way for users to access our trial system located in a US datacenter and give the Gateway Platform a try.

It’s Open Source

You can learn more and try the EPIC platform at https://www.epic-gateway.org/

Adam Dunstan
ThermoKline

Tech enthusiast, infrastructure specialist, leader & engineer