Evaluating and improving the performance of your ingress controller on Oracle Kubernetes Engine with Locust

I’ve written about using different Ingress Controllers with OKE before. I’ve also written about using many of them simultaneously in the same cluster. In this article, we’ll look at how you can evaluate and improve the performance of your ingress controller.

Let’s set up our cluster first with Terraform. One thing you’ll notice in the cluster configuration is that we have four node pools:

node_pools = {
  np1 = { shape = "VM.Standard.E4.Flex", ocpus = 2, memory = 32, node_pool_size = 1, label = { app = "nginx" } }
  np2 = { shape = "VM.Standard.E4.Flex", ocpus = 2, memory = 32, node_pool_size = 1, label = { app = "monitoring" } }
  np3 = { shape = "VM.Standard.E4.Flex", ocpus = 2, memory = 32, node_pool_size = 3, label = { app = "acme" } }
  np4 = { shape = "VM.Standard.E4.Flex", ocpus = 2, memory = 32, node_pool_size = 5, label = { app = "roadrunner" } }
}

Node pool np1 is where we’ll run our ingress controller, np2 will run our monitoring infrastructure such as Grafana and Prometheus, np3 will run our service, in this case a simple website for Acme Corp, and np4 will run Locust to generate the load.

Ingress controllers also run as pods; that is, they run on a worker node somewhere. If we want to study their performance and subsequently improve it, we need to isolate them and their traffic from everything else we are running in this test, such as the monitoring stack and the load generator. This means ensuring only the ingress controller pods land on our selected worker nodes (and therefore node pools) while keeping other pods such as Prometheus, Grafana, and Locust out. To achieve this, we’ll use OKE’s ability to configure the initial node labels on each node pool. My colleague Tim Graves is a Road Runner fan, and since we’ll be accessing Acme Corp’s website as our sample application, he very helpfully suggested I use roadrunner as the label instead of locust. Here goes then.
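Once the cluster is up, you can quickly check how the pools are labelled by listing the nodes with the app label (the key used in the node pool definitions above) shown as an extra column:

kubectl get nodes -L app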

Add the helm repo for kube-prometheus-stack and generate the helm manifest:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm show values prometheus-community/kube-prometheus-stack > kps.yaml

Use the generated manifest to locate the nodeSelectors and add the following:

app: monitoring

Also, change the following to false:

serviceMonitorSelectorNilUsesHelmValues: false
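Put together, the relevant parts of kps.yaml end up looking roughly like this (a sketch assuming the chart’s default layout; only the keys we change are shown):

grafana:
  nodeSelector:
    app: monitoring

prometheusOperator:
  nodeSelector:
    app: monitoring

prometheus:
  prometheusSpec:
    nodeSelector:
      app: monitoring
    serviceMonitorSelectorNilUsesHelmValues: false

alertmanager:
  alertmanagerSpec:
    nodeSelector:
      app: monitoring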

You can now install kube-prometheus-stack:

helm install kps --namespace monitoring prometheus-community/kube-prometheus-stack -f kps.yaml --create-namespace
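Once the release is up, it’s worth checking that the monitoring pods actually landed on the nodes labelled app=monitoring; the NODE column of a wide listing shows where each pod is running:

kubectl --namespace monitoring get pods -o wide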

In order to understand what’s going on with the Ingress Controller, let’s connect to Grafana:

kubectl --namespace monitoring port-forward svc/kps-grafana 3000:80

and import a couple of dashboards with the following IDs: 9614 (NGINX Ingress controller) and 14314 (NGINX Ingress controller NextGen).

For the purpose of this exercise, we’ll be using the community NGINX Ingress controller but you can use any other controller of your choice. Add the helm repo and generate the helm manifest:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm show values ingress-nginx/ingress-nginx > nginx.yaml

Configure the following parameters:

controller:
  nodeSelector:
    app: nginx
  admissionWebhooks:
    patch:
      nodeSelector:
        app: nginx
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
defaultBackend:
  nodeSelector:
    app: monitoring

Setting the node selectors ensures that the NGINX controller pods land on the worker nodes with the matching label, in this case app=nginx. You can now install the ingress controller:

helm install nginx ingress-nginx/ingress-nginx --namespace nginx -f nginx.yaml --create-namespace

and verify that the ingress controller pod has landed on your selected worker node by first identifying the node with the correct label:

kubectl get nodes --show-labels | grep nginx
10.0.109.8 Ready node 23h v1.24.1 app=nginx,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=VM.Standard.E4.Flex

Then, check which node the controller pod is running on and confirm it matches:

kubectl --namespace nginx get pod nginx-ingress-nginx-controller-b8d7d4dd9-86pn4 -o json | jq '.status.hostIP'
"10.0.109.8"

Next, we’ll install the sample Acme Corp’s website:

wget https://raw.githubusercontent.com/hyder/okesamples/master/ingresscontrollers/acme/acme-website.yaml
wget https://raw.githubusercontent.com/hyder/okesamples/master/ingresscontrollers/nginx/acme-website-ingress.yaml

Modify the deployment to add the node selector:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: acme-website
spec:
  selector:
    matchLabels:
      app: acme-website
  replicas: 3
  template:
    metadata:
      labels:
        app: acme-website
    spec:
      nodeSelector:
        app: acme
      containers:
        - name: acme-website
          image: lmukadam/acmewebsite:latest
          ports:
            - containerPort: 80
          resources:
            limits:
              memory: "128Mi"
              cpu: "500m"

Change the host parameter in the ingress manifest to an FQDN you control. If you don’t have one, you can always use nip.io. For example, my public IP address for the load balancer is 192.9.171.147 (it shows up in the EXTERNAL-IP column when you run kubectl --namespace nginx get svc), so with nip.io I can use “192.9.171.147.nip.io” as the host parameter in the ingress, as shown in the sketch below.
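The ingress manifest then looks roughly like this (a sketch; the resource and service names in the downloaded acme-website-ingress.yaml may differ):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: acme-website
spec:
  ingressClassName: nginx
  rules:
    - host: 192.9.171.147.nip.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: acme-website
                port:
                  number: 80

Create the deployment and the ingress: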

kubectl --namespace nginx apply -f acme-website.yaml
kubectl --namespace nginx apply -f acme-website-ingress.yaml

Verify that you can now access Acme Corp’s website using the FQDN in a browser.

Finally, let’s install Locust. First, create a locustfile:

from locust import FastHttpUser, task, between

class AcmeUser(FastHttpUser):
    # Each simulated user waits 1-3 seconds between requests
    wait_time = between(1, 3)
    # Replace with your ingress FQDN, e.g. http://192.9.171.147.nip.io
    host = "replace.me"

    @task
    def hello_world(self):
        self.client.get("/")

Notice we are using FastHttpUser instead of the default HttpUser. This should give us 5–6 times more requests per worker. Next, create a configmap for the locustfile:

kubectl create namespace locust
kubectl --namespace locust create configmap acme-locust --from-file ./locustfile.py

And get the default values for the Locust chart:

helm repo add deliveryhero https://charts.deliveryhero.io/
helm show values deliveryhero/locust > locust.yaml

Edit the locust.yaml:

locust_locustfile_configmap: "acme-locust"

# worker
worker:
  replicas: 200
  nodeSelector:
    app: roadrunner

You can now install locust:

helm --namespace locust install locust deliveryhero/locust -f locust.yaml

If you list the pods in the locust namespace, you should see one pod per worker plus one for the master (with the values above, 201 pods: 200 workers + 1 master). Below is a useful command to count the number of Locust worker pods, courtesy of Tim Graves and Julien Silverston:

expr `kubectl -n locust get pods --no-headers | wc -l` - 1

From a terminal, port-forward to the locust service:

kubectl -n locust port-forward svc/locust 8089:8089

Start a new test by entering the number of users and the spawn rate, then hit “Start Swarming” and click on “Charts”. You should see the progress of your load test. Let the test run for around five minutes and then hit “STOP”.
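If you prefer to drive the test without the web UI, Locust can also run headless from your own machine against the same host (a sketch, not part of the setup above; the user count, spawn rate and host are placeholders):

locust -f locustfile.py --headless -u 500 -r 50 --run-time 5m --host http://192.9.171.147.nip.io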

On the Grafana NGINX Ingress Controller Dashboard, you should be able to see changes in ingress volume, latency, memory and CPU usage. Let’s try to understand what’s going on and see if there’s any scope for improvement.

Similarly, on the Ingress NextGen Dashboard, we can check the latency panel and we see that the average latency for the 90th percentile is 276ms. Can we improve this?

Let’s try enabling TCP BBR. The good news is that it’s already included in the Linux kernel (since version 4.9), so we only have to turn it on!

TCP BBR is a feature that can be used to achieve higher bandwidth and lower latency for internet traffic and can offer significant performance improvements for internet-based applications. BBR (Bottleneck Bandwidth and Round-trip propagation time) is a congestion control algorithm: it regulates TCP’s transmit rate by continuously estimating the bottleneck bandwidth and round-trip time of the path, which keeps buffers from filling up and thereby reduces latency.

Edit /etc/sysctl.conf on the worker node running the ingress controller (the np1 node) and add the following:

net.core.default_qdisc=fq
net.ipv4.tcp_congestion_control=bbr

Then reload it:

sysctl -p

and check if it’s effective:

sysctl net.ipv4.tcp_congestion_control

and this should give us:

net.ipv4.tcp_congestion_control = bbr
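Note that editing /etc/sysctl.conf requires SSH access to the worker node. If you’d rather stay inside Kubernetes, one common alternative (not what I used here) is a small privileged DaemonSet that applies the same sysctls on the ingress nodes; a minimal sketch:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: enable-bbr
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: enable-bbr
  template:
    metadata:
      labels:
        name: enable-bbr
    spec:
      nodeSelector:
        app: nginx             # only the ingress controller nodes
      hostNetwork: true        # so the sysctls apply to the node's network stack
      initContainers:
        - name: sysctl
          image: busybox:1.36
          securityContext:
            privileged: true   # required to write sysctls
          command:
            - sh
            - -c
            - sysctl -w net.core.default_qdisc=fq && sysctl -w net.ipv4.tcp_congestion_control=bbr
      containers:
        - name: pause          # keeps the pod alive after the init container has run
          image: registry.k8s.io/pause:3.9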

Let’s run locust again with the same parameters and see the effect on latency if any. Immediately, we see a more stable response time chart compared to our first attempt:

Similarly, on Grafana, we see much lower latency across all percentiles:

For the 90th percentile, the average latency is now 82.6ms, an improvement by a factor of about 3.3. At the 95th percentile, the latency is almost halved. Only at the 99th percentile is the improvement negligible. This is a relatively simple change, but using this method we can test the system, gather the metrics and use them to locate possible problems, whether in settings, architecture or technology choice, with a data-driven approach. Naturally, the changes you make depend on your use case and many other variables.

In this article, we deployed ingress-nginx as our ingress controller, generated load with Locust and captured the resulting metrics. We then analysed them using Grafana, identified a potential latency issue, and addressed it by enabling TCP BBR. Finally, we ran the test again to see whether this resulted in better performance or whether we needed to revisit our technology stack.

Obviously, there’s a lot more to explore and many different directions you can take this, e.g. testing your application, storage, cluster size and so on. If you want to get started with performance techniques, I strongly suggest reading this thesis. It provides a concise overview of the different performance techniques and will stand you in good stead before you go on to more complicated exercises.

I hope you enjoyed this article.

Further reading:

Pain(less?) NGINX Ingress

Tuning NGINX for performance

Optimizing web servers for high throughput and low latency

Optimizing HTTP/2 prioritization with BBR and tcp_notsent_lowat

BBR, the new kid on the block

Let’s chat about it on the Developer Slack!
