A decentralized approach to API Gateways for OpenShift on your way to service mesh
I meet with a lot of customers trying to solve service-to-service communication challenges in their cloud-native architectures. My particular interest is in service mesh technology like Linkerd, Istio, or Consul (full disclosure: I’m writing Istio in Action). At Solo.io, where I work, we help organizations successfully adopt and manage these systems, but we are noticing an interesting trend with our customers:
- Legacy API Management solutions are being used to solve some of the same service-to-service communication challenges that service mesh now solves better.
- Legacy API Management has become a bottleneck in their architecture and processes, and a decentralized approach is desired.
So we end up seeing folks build their service mesh adoption strategy around replacing their legacy API Management solutions for these service-to-service communication challenges.
These folks are not wrong; however, service mesh isn’t an API Management solution. Although service mesh can nicely solve difficult service-to-service communication challenges, it still leaves some gaps to be filled. For example, we still need to handle end-user authentication and authorization. We may need to integrate with existing security flows. We may need some kind of light protocol or message transformation. We may need more domain-specific rate limiting, among other things. For this, we still need API Gateway functionality in our architecture, but we don’t want to just re-create the same old centralized, ESB-like bottlenecks we had with our old API Management solutions. For this, we need some Gloo: an API Gateway that nicely complements service mesh (for when you’re ready to adopt one) and solves these problems, allowing you to jettison your bloated API middleware.
The reality of Kubernetes infrastructure in enterprise
A lot of our customers deploy their microservices to OpenShift as their flavor of Kubernetes. Sometimes, the reality of a large enterprise organization dictates which features and processes of a container platform (i.e., Kubernetes-based platforms) get used and how they get used. You OpenShift users know what I’m talking about :)
For example, OpenShift has a nice source-code-to-Docker-container build pipeline out of the box. In my experience, organizations prefer to have more control over how builds get done and what that supply chain looks like, so they disable this feature. In other scenarios, a security or networking team will come in and constrain an OpenShift installation to fit existing networking rules and policies. One such example: when a team deploys an application (usually a set of services), they deploy it into one namespace and completely lock down network ingress/egress for that namespace. They use OpenShift’s multi-tenant SDN plugins to do this. In some cases, traffic destined for another service inside the cluster is forced out of the cluster, to external load balancers or API Management software, and back into the cluster. Not ideal in many ways, but this keeps the network and security folks happy.
A decentralized approach using Gloo
Gloo, an API Gateway built on Envoy Proxy, fits nicely as a decentralized API Gateway even within these (at times) uncomfortable constraints, and it also acts as a nice stepping stone to your favorite service mesh.
To be clear, the rest of the article will focus on how to:
- build a decentralized API gateway infrastructure
- fit nicely within some of these inconvenient realities
- support new protocols that traditional API Management typically doesn’t
- provide a stepping stone to full service mesh, at your own pace
Straightforward Gloo deployment
OpenShift comes out of the box with a Router component, which is the main ingress point to the cluster. This Router is based on HAProxy and basically acts as an L4 reverse proxy and connection load balancer. It can do TLS termination and collect basic metrics. For a basic deployment of Gloo, we can add it behind the OpenShift Router.
In this scenario, although we’re taking an additional hop, we get access to API Gateway functionality like end-user authentication/authorization, request caching, and request/response transformation, as well as important L7 network control for things like traffic shadowing, traffic shifting, and canary deployments. This is a huge gain and is also a step closer to what a service mesh provides. On the other hand, within the cluster, this is still a shared gateway. In the next sections we’ll see how to deploy in a more decentralized way.
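To make traffic shifting a bit more concrete, here is a minimal conceptual sketch of weighted routing for a canary rollout. It is not how Gloo or Envoy implement it, and the service names and the 90/10 split are made up for illustration; it only shows the idea of sending a small share of requests to a new version.

```python
import random
from collections import Counter


def pick_destination(weights, rng):
    """Pick one destination name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical service versions: most requests stay on v1, a small share goes to the canary.
canary_split = {"reviews-v1": 90, "reviews-v2": 10}

rng = random.Random(42)  # seeded so repeated runs give the same counts
counts = Counter(pick_destination(canary_split, rng) for _ in range(10_000))
print(counts)
```

Shifting more traffic to the canary is then just a matter of changing the weights, which is the kind of change a gateway lets you make without touching the services themselves.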
Also note that, although in this case Gloo is running in OpenShift (a flavor of Kubernetes), Gloo can still route to external APIs not hosted in OpenShift, including Function-as-a-Service platforms like AWS Lambda and Google Cloud Functions.
As an alternative to running behind the OpenShift Router, we could run Gloo on infrastructure nodes and expose it as a NodePort. This has the advantage of directly exposing the API Gateway and eliminating the HAProxy hop, but has the drawback that network folks don’t typically like NodePort.
You could also use something like BGP routing or MetalLB to expose Gloo through a LoadBalancer directly. Please see this blog for more.
This gives us basic API Gateway functionality with minimal fuss; however, we want to explore ways to decentralize this deployment. At the moment, it’s still fairly centralized and shared, although much less so from a process perspective, because we can use GitOps and other declarative, SCM-driven approaches to make the configuration of Gloo self-service. Gloo’s configuration is declarative and defined as CRDs in Kubernetes/OpenShift.
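As a rough sketch of what "configuration as CRDs" can look like, the snippet below builds a Gloo VirtualService manifest as a plain Python dict and prints it as JSON, which could then be committed to Git and applied by a pipeline. The field names follow the Gloo 1.x VirtualService schema as I understand it and may differ between Gloo versions, so check them against the CRD installed in your cluster. The resource name, namespace, domain, and upstream are all hypothetical.

```python
import json


def virtual_service(name, namespace, domain, prefix, upstream_name, upstream_namespace):
    """Build a Gloo VirtualService manifest as a plain dict.

    Assumption: field names follow the Gloo 1.x schema; verify against your CRD.
    """
    return {
        "apiVersion": "gateway.solo.io/v1",
        "kind": "VirtualService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "virtualHost": {
                "domains": [domain],
                "routes": [
                    {
                        # Match every request whose path starts with `prefix`...
                        "matchers": [{"prefix": prefix}],
                        # ...and send it to a single upstream.
                        "routeAction": {
                            "single": {
                                "upstream": {
                                    "name": upstream_name,
                                    "namespace": upstream_namespace,
                                }
                            }
                        },
                    }
                ],
            }
        },
    }


# All values below are hypothetical placeholders.
manifest = virtual_service(
    name="orders-api",
    namespace="team-orders",
    domain="orders.example.com",
    prefix="/orders",
    upstream_name="team-orders-orders-8080",
    upstream_namespace="team-orders",
)
print(json.dumps(manifest, indent=2))
```

Because the whole route is just data, each team can keep its own manifests in its own repository and review changes like any other code.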
If we need further isolation, we can also use proxy sharding, which Gloo supports, to assign certain APIs to their own gateways. This involves a slight management overhead, but allows you to separate failure domains for higher-value APIs.
This solves a problem experienced with legacy API Management vendors, where a single API could take down the gateway for an entire set of APIs because isolation or bulkheading is not enforced.
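The bulkheading idea can be sketched in a few lines. This is only a conceptual model, not Gloo’s sharding mechanism, and the API and proxy names are invented: higher-value APIs get a dedicated proxy, everything else shares one, and we can then ask which APIs are affected if a given proxy goes down.

```python
from collections import defaultdict


def shard_apis(apis, dedicated, shared_proxy="shared-gateway"):
    """Map each proxy name to the list of APIs it serves.

    APIs listed in `dedicated` get their own proxy; all others share one.
    """
    proxies = defaultdict(list)
    for api in apis:
        proxy = f"{api}-gateway" if api in dedicated else shared_proxy
        proxies[proxy].append(api)
    return dict(proxies)


def affected_by_outage(proxies, failed_proxy):
    """Return the APIs that become unreachable if `failed_proxy` fails."""
    return proxies.get(failed_proxy, [])


# Hypothetical APIs; "payments" is treated as the high-value one.
apis = ["payments", "catalog", "search", "reviews"]
proxies = shard_apis(apis, dedicated={"payments"})

print(proxies)
print(affected_by_outage(proxies, "shared-gateway"))
```

In this model an outage of the shared proxy never touches the payments API, which is exactly the separation of failure domains that sharding buys you in exchange for running one more proxy.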
One Gloo Gateway per namespace
We could also deploy a single gateway per namespace. As we said, some OpenShift environments don’t allow traffic across namespaces directly except through well-known egress/ingress points (typically controlled by multi-tenant SDN or network policy).
In this scenario, each proxy would have its own API Gateway configuration and be controlled by its own team. We have access to the full feature set of an API Gateway, like rate limiting, authN/authZ, caching, and traffic routing/splitting, and the gateway fits nicely within a locked-down namespace. Notice how this starts to form a simple routing mesh within the cluster, but one that is configured and controlled by the respective project teams, not by a centralized configuration store. This approach gives teams a lot of flexibility and ownership of the API Gateway with minimal contention points within the organization.
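For readers who haven’t seen one of these locked-down namespaces, here is a minimal sketch of the network-policy flavor of that lockdown, built as a Python dict for a standard Kubernetes NetworkPolicy. The namespace name is hypothetical. It only admits traffic from pods in the same namespace; a real setup would also need a rule that lets the OpenShift Router reach the gateway pods, which I’ve left out because the selector depends on how your cluster labels its namespaces.

```python
import json


def allow_same_namespace_only(namespace):
    """Build a NetworkPolicy that only admits ingress from pods in `namespace`."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-same-namespace", "namespace": namespace},
        "spec": {
            # Empty selector: the policy applies to every pod in the namespace.
            "podSelector": {},
            "policyTypes": ["Ingress"],
            "ingress": [
                # A podSelector without a namespaceSelector matches pods
                # in the policy's own namespace only.
                {"from": [{"podSelector": {}}]}
            ],
        },
    }


policy = allow_same_namespace_only("team-orders")  # hypothetical namespace
print(json.dumps(policy, indent=2))
```

With a policy like this in place, the per-namespace gateway becomes the natural (and only) front door for the services a team owns.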
One Gloo Gateway + Control plane per namespace
In the previous scenario, each OpenShift project/namespace had its own proxy, but the Gloo control plane was still shared. For even stricter environments, this may not be ideal, and you can just run the entire control plane per project/namespace. Gloo’s control plane is very slim and doesn’t take many resources, so this could be a great approach to get full multi-tenancy and live comfortably within the constraints that imposes. Users or services calling these applications would come into the cluster through the OpenShift Router (HAProxy) and then be routed to the specific application’s API Gateway.
In this blog, we explored some realities of how organizations deploy OpenShift as well as how they utilize legacy API Management infrastructure. We also saw that, because of the reality of some OpenShift deployments, the restrictions become a high hurdle to deploying a service mesh (like Istio), which helps explain why OpenShift hasn’t supported Istio yet. Using Gloo, not only can we play nicely in these environments and get you closer to a service mesh, but we can also fill the gaps between existing API Management functionality and what you want to get out of a service mesh. Lastly, Gloo plugs in natively with any service mesh and can play the role of ingress or shared gateway within your mesh once you get to it. My personal recommendation has always been to adopt service mesh capabilities iteratively, and this is one great example of using Gloo (which is based on the same Envoy technology as most service meshes) to do that. Follow along @soloio_inc and @christianposta for more on these topics!