Thinking of Moving Your WSO2 Deployment On to Kubernetes?

Moving from a Virtual Machine based deployment to a Container Clustering environment can cost quite a few sleepless nights. The cloud deployment artifacts for WSO2 products reduce that number, because they are tried and tested. These include Dockerfiles, Puppet based configurations, and Kubernetes and Mesos deployment artifacts. As much as these artifacts ease the redesign of an existing deployment, let us still walk through some points to keep in mind for a WSO2 deployment on top of Kubernetes.

Configuration Automation

If you don’t already have automated configuration management, you should get it in place today! Besides saving the day every time an important server goes down, it discourages the bad practice of making configuration changes on live servers and ending up with Snowflake instances that break everything the moment an OutOfMemory error occurs. Not only would you have a grip on how change is propagated from the Development stage to Live, you’d also have your configuration stored outside the production system.

WSO2 provides Puppet modules for most of its product stack. These make use of Hiera to manage multiple deployment patterns with the same set of Puppet Manifests and Templates.

If you already have a set of Configuration Automation scripts for your WSO2 deployment, you can use those as well. However, it is worth comparing them against the WSO2 provided scripts to see if anything new can be added. WSO2 Puppet Modules provide options to configure almost every detail of a deployment. In addition, they make use of Hiera’s hierarchical lookup of configuration data to support multiple deployment platforms like Kubernetes, Mesos or even bare-metal instances.
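To make the hierarchical lookup concrete, the following is a minimal sketch of a Hiera 3 style “hiera.yaml”. The fact names (“platform”, “product_name”, “product_version”, “product_profile”) and the data directory are assumptions made for illustration, and may not match the layout that the WSO2 Puppet Modules actually use.

```yaml
---
# Hypothetical hiera.yaml (Hiera 3 format); fact names and paths are illustrative only.
:backends:
  - yaml
:yaml:
  :datadir: /etc/puppet/hieradata
:hierarchy:
  # Most specific level first: overrides for one profile on one platform
  - "%{::platform}/%{::product_name}/%{::product_version}/%{::product_profile}"
  # Defaults for a product version on a given platform (e.g. kubernetes, mesos)
  - "%{::platform}/%{::product_name}/%{::product_version}/default"
  # Values shared by every product and platform
  - common
```

With a hierarchy of this shape, changing the value of the platform fact changes which data files are read, while the Puppet Manifests and Templates stay the same.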

A configuration automation pipeline matters because it makes the transition from a VM or physical deployment to a Container Cluster environment much easier.

WSO2 Dockerfiles

WSO2 Dockerfiles provide artifacts and helper scripts to easily configure and build Docker images for various WSO2 products. These scripts use several methods of “provisioning” to configure the WSO2 product inside the Docker image at build time. Two provisioning methods are shipped by default: “default” and “puppet”. With the “default” provisioning method, you provide an already configured product pack to the builder script, which simply includes it in the Docker image.

The more interesting “puppet” provisioning method will make use of WSO2 Puppet Modules to configure a given product inside the Docker image. This is where the advantage of having Automated Configuration comes in. Since WSO2 Puppet Modules have different Hiera data-sets to support different Containerization platforms, Docker images for any of those can be configured with the “puppet” provisioning method.

Kubernetes Deployment

Kubernetes can be deployed on a plethora of platforms. AWS, GCE, or any other provider that supports CoreOS images is all that is needed to deploy a Kubernetes cluster. However, it is a lot easier to manage these deployments with an Infrastructure as Code tool like Terraform, so that in an emergency you can quickly get back on your feet without worrying about how to contact that Ops guy who left for better compensation.

Some points to consider during the deployment stages are:

  1. Keep the initial Kubernetes Cluster small. Testing a few instance types does not need more than 2 or 3 Nodes and 1 small Master. This keeps infrastructure costs down while you are still finding the edge cases of the new deployment at the clustering and infrastructure level.
  2. Figure out load balancing first. The overlay networking found in Kubernetes has to be explicitly exposed to outside traffic. This can be done in several ways; however, in my experience the Kubernetes provided “Service LoadBalancer” is the simplest and most effective approach. More on this later.
  3. Logging and Monitoring are a huge deal. Compared to the existing deployment, where access to server logs is ridiculously easy, log aggregation and server monitoring can seem a bit tricky, depending on how “indoctrinated” you are in Container Clustering. Plan for different methods of log aggregation in the new deployment, because you will not be able to just SSH into a running instance and download logs.
  4. Adjust client expectations accordingly. This is non-technical but crucial. The move into a Container Clustered environment is not just a deployment upgrade. From the deployment’s point of view it’s a paradigm shift, and from your stakeholders’ point of view, the outcomes of the old system might change in quite a few ways. This might be the way your clients access the system, or the way your managers view it. In any case, as the champion of the migration, you will have to do some maneuvering to manage stakeholder expectations and move clients’ thinking to the new deployment.

WSO2 Kubernetes Artifacts

WSO2 Kubernetes Artifacts provide Replication Controllers, Service definitions, and helper scripts to deploy WSO2 products on Kubernetes. These, along with WSO2 Puppet Modules and WSO2 Dockerfiles, will contribute to a nicely integrated deployment pipeline to both deploy new products and roll out upgrades smoothly.
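As an illustration of what such a Replication Controller can look like, here is a minimal sketch for WSO2 API Manager. The names, labels, image location and tag are hypothetical and are not taken from the WSO2 Kubernetes Artifacts; only the port numbers come from the product ports discussed in this article.

```yaml
# Hypothetical Replication Controller; names, labels and image are assumptions.
apiVersion: v1
kind: ReplicationController
metadata:
  name: wso2am-default
spec:
  replicas: 1
  # Pods carrying this label are managed by the controller
  selector:
    name: wso2am-default
  template:
    metadata:
      labels:
        name: wso2am-default
    spec:
      containers:
        - name: wso2am-default
          # Assumed image built with the "puppet" provisioning method
          image: registry.example.com/wso2am-kubernetes:1.10.0
          ports:
            - containerPort: 9443
            - containerPort: 9763
            - containerPort: 8243
            - containerPort: 8280
            - containerPort: 7611
            - containerPort: 7711
```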

Solution Architecture

A typical WSO2 deployment on Kubernetes

If you’re already familiar with how Kubernetes works, the above diagram shouldn’t take much to decipher. However, there are some key points to discuss here.

Load Balancing and Exposing Services to Outside Access

Since Pod IP addresses are internal to the Kubernetes Overlay network, the WSO2 services are exposed to outside access through an HAProxy Pod. The HAProxy ports are exposed via HostPorts or NodePorts, and can then be mapped to ports on the External Load Balancer that faces the outside traffic. Therefore, all traffic that comes from outside is directed through the HAProxy. In addition to the HAProxy binary, this “Service LoadBalancer” Pod also runs a small job that constantly polls the Kubernetes API Server and updates the HAProxy configuration, so that each Service and the IP addresses of the Pods belonging to it are included in the HAProxy “frontends” and “backends”. If the Service definition contains the following annotation, its value is used as the backend name; otherwise, HAProxy is loaded with the Service name as the backend.

apiVersion: v1
kind: Service
metadata:
  annotations:
    serviceloadbalancer/lb.host: mgt-gateway.am.wso2.com

The resulting “backend” entry in the HAProxy configuration would look like the following.

backend mgt-gateway.am.wso2.com
    option httplog
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http
    balance roundrobin

    reqrep ^([^\ :]*)\ /apps-preprod-80[/]?(.*) \1\ /\2

    cookie SERVERID insert indirect nocache

    server 10.2.17.3:9763 10.2.17.3:9763 cookie s0 check port 9763 inter 5
    server 10.2.17.4:9763 10.2.17.4:9763 cookie s0 check port 9763 inter 5

In this case, the Pod IPs that serve the “mgt-gateway.am.wso2.com” backend are 10.2.17.3 and 10.2.17.4.

A key point to remember is that the Service Load Balancer loads every Service port defined in a Service that carries the above annotation. Therefore, for WSO2 servers, a separate Service should be added for each port that needs to be exposed. For example, for WSO2 API Manager, instead of one Service with all the ports (9443, 9763, 8243, 8280, 7611, 7711), six Services, each exposing just one Service port, should be added. The resulting list of Service definitions looks like the following.

wso2am-default-7611-service.yaml
wso2am-default-7711-service.yaml
wso2am-default-8243-service.yaml
wso2am-default-8280-service.yaml
wso2am-default-9443-service.yaml
wso2am-default-9763-service.yaml

The Service definition for the default Carbon port 9443 looks like the following. Notice that the Service name has been changed to reflect the port that it exposes.

serviceloadbalancer/lb.sslTerm
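A minimal sketch of such a per-port Service definition, using the three “serviceloadbalancer” annotations named in this section, could look like the following. The Service name is derived from the file name “wso2am-default-9443-service.yaml”; the label, selector, port name and annotation values are assumptions for illustration.

```yaml
# Hypothetical Service for the Carbon HTTPS port; selector and values are assumptions.
apiVersion: v1
kind: Service
metadata:
  # The name mirrors the single port that this Service exposes
  name: wso2am-default-9443
  annotations:
    serviceloadbalancer/lb.host: mgt-gateway.am.wso2.com
    # Annotation values must be strings, hence the quotes
    serviceloadbalancer/lb.sslTerm: "true"
    serviceloadbalancer/lb.cookie-sticky-session: "true"
spec:
  selector:
    name: wso2am-default   # assumed Pod label
  ports:
    - name: port-9443
      protocol: TCP
      port: 9443
      targetPort: 9443
```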

The definition also carries the following two additional Service annotations.

  1. serviceloadbalancer/lb.sslTerm — If set to “true”, the Service Load Balancer will terminate SSL and perform backend communication without SSL verification.
  2. serviceloadbalancer/lb.cookie-sticky-session — If set to “true” the Service Load Balancer will set and check a cookie to handle Session Affinity based on it.

One complication of exposing different ports through different Services is that it makes inter-Service communication a bit more complex. Some configurations in WSO2 servers ask only for the host name of another component (e.g. the Key Manager) and then use that host name to talk to a number of ports on that host. Exposing different ports through different Services breaks this scenario. To overcome this, we can add another Service that exposes all the ports, but make the Service Load Balancer exclude that particular Service from load balancing (since we don’t need load balancing between different ports). The typical approach is to introduce a new Service annotation and modify the Service Load Balancer binary to ignore Services with that annotation.
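The all-ports Service described above could be sketched as follows. The annotation “serviceloadbalancer/lb.ignore” is a made-up name used only for illustration; it has no effect unless the Service Load Balancer is modified to honor it. The Service name and selector are assumptions as well.

```yaml
# Hypothetical internal Service exposing every port under one host name.
apiVersion: v1
kind: Service
metadata:
  name: wso2am-default
  annotations:
    # Made-up annotation: a modified Service Load Balancer would skip this Service
    serviceloadbalancer/lb.ignore: "true"
spec:
  selector:
    name: wso2am-default   # assumed Pod label
  ports:
    # Multi-port Services need a unique name per port
    - name: port-9443
      port: 9443
    - name: port-9763
      port: 9763
    - name: port-8243
      port: 8243
    - name: port-8280
      port: 8280
    - name: port-7611
      port: 7611
    - name: port-7711
      port: 7711
```

Other WSO2 components inside the cluster can then use the single Service name as the host name and reach any of the ports through it.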

Another important fact to keep in mind is that the Service Load Balancer should be tied to one fault tolerant Node. This should be done especially if the HAProxy ports are being exposed via HostPorts. To do this, Kubernetes Node Affinity can be used.
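As a rough sketch, pinning the Service Load Balancer Pod to a labeled Node could look like the following. It assumes a Kubernetes version that supports the “affinity” field in the Pod spec, a Node that has been labeled “role=loadbalancer”, and a hypothetical image name; the exposed ports are illustrative.

```yaml
# Hypothetical Pod spec; the Node label, image and ports are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: service-loadbalancer
  labels:
    app: service-loadbalancer
spec:
  affinity:
    nodeAffinity:
      # Schedule only on Nodes that carry the label role=loadbalancer
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: role
                operator: In
                values:
                  - loadbalancer
  containers:
    - name: haproxy
      image: registry.example.com/service-loadbalancer:latest
      ports:
        # HostPorts bind directly on the Node, so the Node must stay the same
        - containerPort: 80
          hostPort: 80
        - containerPort: 443
          hostPort: 443
```

The Node label itself can be applied with “kubectl label nodes <node-name> role=loadbalancer”, after which the External Load Balancer can be pointed at that Node.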

Deployment Monitoring

Monitoring the state of the Kubernetes deployment and the WSO2 deployment on top of it calls for a different approach. A key difference between a Containerized approach and a VM approach to a WSO2 deployment is that, unlike VMs, Containers, especially Clustered Containers, are not guaranteed to run for long periods of time. The contract in Kubernetes is to keep the Service available, not the Pod, and not the Containers inside the Pods. Containers in Kubernetes also don’t have externally visible IP addresses. Furthermore, Containers tend to be designed with only a single process in mind, and background jobs such as Nagios agents would disrupt the Container life-cycle if they stay alive after the main process, i.e. the WSO2 server, goes down.

One popular approach to monitoring a Kubernetes based deployment is Heapster. Heapster collects usage information for Pods and Nodes and uses InfluxDB as the storage backend. The included Grafana dashboard displays the CPU, Memory, and Network usage metrics for the collected Pods and Services. This way, even if a Container keeps crashing (e.g. because of an OOM error), selecting the Pod in the dashboard will conveniently visualize its memory usage.

One concern here is to provide InfluxDB with consistent Persistence. Heapster can be deployed so that only the Heapster Pod is inside the Kubernetes deployment, and InfluxDB and Grafana are moved outside. However, this takes considerable effort, since installing InfluxDB from the provider offers only the latest few versions, which are incompatible with the version provided by Kubernetes. It looks like the Kubernetes provided Heapster stack contains an older version of InfluxDB, and the provided Grafana dashboard queries are only compatible with that version. If this later changes and Kubernetes ships a newer compatible version of InfluxDB, it would be better to move InfluxDB to a more Persistence friendly location. Until that happens, the InfluxDB Pod should also be tied to a single Node by defining Node Affinity. Furthermore, since there is rarely a need to browse through really old metrics (e.g. memory usage from twelve months back), it is better to periodically flush the InfluxDB storage.

Docker Registry

Although Docker Hub provides a simple Docker registry, for production deployments it is a better idea to own and manage a Private Docker Registry. This makes rolling out upgrades easier, and corporate data can be handled better when private Docker images are not stored on public servers. For better security, make this Registry accessible only from the Kubernetes Nodes and from the Docker builder machine.

Log Aggregation

Aggregating WSO2 server logs becomes more important when the ephemeral nature of Containers is taken into account. The moment a Pod or a Container goes down, all of its logs are lost forever and you wouldn’t know what happened.

While WSO2 server logs can be collected as Log4J Events directly to Logstash, better alternatives exist in the Kubernetes eco-system.

  • FluentD: FluentD is deployed as a DaemonSet, which basically runs one Pod on every Node. It reads all log files inside “/var/log/containers” and “/var/lib/docker/containers” and publishes each entry to ElasticSearch. For this to happen, “/var/log” and “/var/lib/docker/containers” from the host should be mounted into the FluentD Pod. FluentD’s Kubernetes Metadata Filter also enriches each log event by polling the Kubernetes API to populate metadata like Pod Name, Service Name etc.
FluentD Log Aggregation
  • FileBeat: FluentD only reads the above mentioned locations, and strictly speaking these are not the log files of the WSO2 servers themselves. The log files inside “/var/lib/docker/containers” (and their sym-links inside “/var/log/containers”) contain what is written to the stdout of the Pods. For WSO2 Docker images built using WSO2 Dockerfiles, this is the content of the “wso2carbon.log” file itself, as the main process of the Docker image is the “wso2server.sh” command. However, if any logs need to be collected that are not written to the stdout of the Pod (such as the http_access logs), using FileBeat to publish logs to Logstash is the best option. To achieve this, we could include FileBeat inside the WSO2 Container itself, but the better option is to add a separate FileBeat container to each WSO2 Pod and share a Volume mount of the log locations.
Filebeat based Log Aggregation
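The two approaches can be sketched as the manifests below. The first is a WSO2 Pod with a FileBeat sidecar sharing a log Volume, and the second is a FluentD DaemonSet mounting the host log directories. All names, images and the WSO2 log directory path are assumptions for illustration, the FileBeat and FluentD configuration files are left out, and the DaemonSet assumes a Kubernetes version that serves it under “apps/v1”.

```yaml
# Hypothetical WSO2 Pod with a FileBeat sidecar sharing the log directory.
apiVersion: v1
kind: Pod
metadata:
  name: wso2am-default
spec:
  volumes:
    - name: wso2-logs
      emptyDir: {}            # lives as long as the Pod does
  containers:
    - name: wso2am
      image: registry.example.com/wso2am-kubernetes:1.10.0   # assumed image
      volumeMounts:
        - name: wso2-logs
          mountPath: /mnt/wso2am/repository/logs   # assumed Carbon log directory
    - name: filebeat
      image: registry.example.com/filebeat:latest            # assumed image
      volumeMounts:
        - name: wso2-logs
          mountPath: /var/log/wso2
          readOnly: true
---
# Hypothetical FluentD DaemonSet reading container logs from each Node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
        - name: fluentd
          image: registry.example.com/fluentd-elasticsearch:latest   # assumed image
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: dockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: dockercontainers
          hostPath:
            path: /var/lib/docker/containers
```

Keeping FileBeat in its own container means the WSO2 image stays unchanged, and the sidecar can be upgraded or reconfigured independently of the product.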

As easy as it sounds, log aggregation might turn out to be a headache, depending on how you choose to look at the new deployment. The key point to keep in mind is that this is not just a deployment upgrade. You have just stepped into the completely different paradigm that is Container Clustering. You would not expect your new tractor to produce offspring after being paired with a stud the way your old cow did, and neither should your clients. You now operate with different parameters. As long as you keep that in mind, managing the above mentioned concerns will be a walk in the park with the right technical tools.