Cloudprober

Daz Wilkin
Google Cloud - Community
5 min read · Mar 31, 2018

An architect at a customer asked for tools to help him diagnose the source of tail latencies between services in a Kubernetes Engine cluster. My colleague pointed me to Cloudprober. In my often-futile attempt to stay one step ahead of my customers, I thought I’d spend my deep-learning time today exploring it. It’s Google OSS. It’s written in Golang. It surfaces Prometheus endpoints. What’s not to like!? It’s neat!

Docker

Cloudprober is available as a container image *but* I think the image needs a small tweak to make it easier to provide custom configurations and to play better with Kubernetes. If you have challenges with the cloudprober/cloudprober image, you may use dazwilkin/cloudprober:0.9.3-32-g21f071e. I’ll use my image from here on.

The tweak is simply to use ENTRYPOINT to run cloudprober rather than CMD, as this permits us the freedom to pass different flags to cloudprober.
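For illustration, the tweaked image’s Dockerfile amounts to something like this sketch (the base image and binary path here are my assumptions, not the upstream project’s exact build):

```dockerfile
# Assumes a statically-linked cloudprober binary in the build context
FROM busybox
COPY cloudprober /cloudprober
# ENTRYPOINT (rather than CMD) means any flags appended to `docker run`
# become arguments to cloudprober itself, e.g. --config_file=...
ENTRYPOINT ["/cloudprober"]
```

With CMD, arguments after the image name would *replace* the command entirely; with ENTRYPOINT, they’re appended to it.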

Kubernetes

I wrote yesterday about Stackdriver Profiler and provided two samples (Golang, Node.JS) that I deployed as services to Kubernetes. Picking up from there (though hopefully you can adapt this for your own Kubernetes services), I’m going to configure probers for the HTTP endpoints (on port 8080) that these services surface, primarily to test Cloudprober.

My services are both deployed to the default (I’m lazy) namespace. The Golang profiler is exposed as a service called golang-profiler (this service has an in-cluster DNS name golang-profiler.default.svc). The Node.JS profiler is exposed as a service called nodejs-profiler (this service has an in-cluster DNS name of nodejs-profiler.default.svc).

Kubernetes Dashboard: “Default” namespace services

Aside: Port-Forward

You may wish to skip this step but, in order to test Cloudprober locally, a hacky-but-nice trick is to port-forward to the services’ NodePorts through ssh, check the config and then, when happy with it, deploy Cloudprober to Kubernetes with more confidence that it will work.

The following bash commands grab the instance name of the first node in our cluster and the NodePorts for each of the two services, and then set up an SSH port-forward using gcloud so that these services are available on those same ports on localhost:

NODE_HOST=$(\
  kubectl get nodes \
    --output=jsonpath="{.items[0].metadata.name}")
GOLANG_PORT=$(\
  kubectl get services/golang-profiler \
    --output=jsonpath="{.spec.ports[0].nodePort}")
NODEJS_PORT=$(\
  kubectl get services/nodejs-profiler \
    --output=jsonpath="{.spec.ports[0].nodePort}")
gcloud compute ssh ${NODE_HOST} \
  --project=${GOOGLE_PROJECT_ID} \
  --ssh-flag="-L ${GOLANG_PORT}:localhost:${GOLANG_PORT}" \
  --ssh-flag="-L ${NODEJS_PORT}:localhost:${NODEJS_PORT}"

So we may now write a Cloudprober config file (temporarily using these ports) to test Cloudprober:

cloudprober.profilers.cfg:
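The file defines two HTTP probes against the port-forwarded services. Something like the following sketch (the interval and timeout values here are arbitrary choices of mine):

```
probe {
  name: "golang-profiler"
  type: HTTP
  targets {
    host_names: "localhost"
  }
  # in-cluster: golang-profiler.default.svc.cluster.local
  http_probe {
    port: ${GOLANG_PORT}
  }
  interval_msec: 10000
  timeout_msec: 5000
}

probe {
  name: "nodejs-profiler"
  type: HTTP
  targets {
    host_names: "localhost"
  }
  # in-cluster: nodejs-profiler.default.svc.cluster.local
  http_probe {
    port: ${NODEJS_PORT}
  }
  interval_msec: 10000
  timeout_msec: 5000
}
```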

NB You will need to replace the ${GOLANG_PORT} in line 9 and ${NODEJS_PORT} in line 23 with their values on your system.

NB The host_names values in lines 5 and 19 are both localhost because we’ve port-forwarded these Kubernetes NodePorts locally. I’ve left, as comments, the values we’ll need to use (for my services) when I deploy to Kubernetes.

OK. I think this won’t work unless you’re using my tweaked version of the Cloudprober container image. So:

docker run \
  --net=host \
  --volume=$PWD/cloudprober.profilers.cfg:/cloudprober.cfg \
  dazwilkin/cloudprober:0.9.3-32-g21f071e \
  --config_file=/cloudprober.cfg \
  --logtostderr

NB The --net=host flag is required because the container needs to access the host’s ${GOLANG_PORT} and ${NODEJS_PORT} ports. It also makes --publish redundant: with host networking, the container’s port 9313 is already the host’s port 9313.

All being well, you will see logging output. More usefully, you may wish to access Cloudprober’s Prometheus metrics endpoint. This will be available on:

http://localhost:9313/metrics

Cloudprober’s Prometheus metrics endpoint

And, if you’d like to see this data through Prometheus:

prometheus.yml:
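A sketch of the scrape configuration (the job names are arbitrary; the first two jobs target the port-forwarded services, and the last scrapes Cloudprober’s own metrics endpoint on 9313):

```yaml
scrape_configs:
  - job_name: "golang-profiler"
    static_configs:
      - targets:
          - "localhost:${GOLANG_PORT}"
  - job_name: "nodejs-profiler"
    static_configs:
      - targets:
          - "localhost:${NODEJS_PORT}"
  - job_name: "cloudprober"
    static_configs:
      - targets:
          - "localhost:9313"
```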

NB As before, please replace ${GOLANG_PORT} in line 5 and ${NODEJS_PORT} in line 9 with the values on your system.

Then you may run Prometheus locally (!) against these local endpoints:

docker run \
  --net=host \
  --volume=$PWD/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus

and then access the Prometheus Targets page to confirm they’re correctly configured:

http://localhost:9090/targets

Prometheus “Targets”

I’ll leave you to play with the graphing awesomeness but:

Prometheus “Graph”

OK. Everything works.

If you’re done here, you may terminate the Prometheus container, terminate the gcloud compute ssh port-forward, tidy up, pat yourself on the back and move on.

Let’s deploy to Kubernetes!

Kubernetes

We need only revise the Cloudprober config to reflect the in-cluster names of our services.
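The revised probe definitions look something like this sketch (again, the interval and timeout values are arbitrary choices of mine):

```
probe {
  name: "golang-profiler"
  type: HTTP
  targets {
    host_names: "golang-profiler.default.svc.cluster.local"
  }
  # was: localhost (via the SSH port-forward)
  http_probe {
    port: 8080
  }
  interval_msec: 10000
  timeout_msec: 5000
}

probe {
  name: "nodejs-profiler"
  type: HTTP
  targets {
    host_names: "nodejs-profiler.default.svc.cluster.local"
  }
  # was: localhost (via the SSH port-forward)
  http_probe {
    port: 8080
  }
  interval_msec: 10000
  timeout_msec: 5000
}
```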

NB I’ve revised the host_names values in lines 5 and 19 to reflect the full (!) in-cluster DNS names. I’ve changed the port values in lines 9 and 23 to the service port (8080) instead of the NodePorts used for off-cluster access.

We need to deploy the Cloudprober image to Kubernetes but, before we do that, we need to upload the configuration. We’ll use a ConfigMap to hold the configuration file:

kubectl create configmap cloudprober-profilers-config \
--from-file=cloudprober.cfg=cloudprober.profilers.cfg

Now you just need a deployment.yaml:
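A minimal sketch of it: a single-replica Deployment running the tweaked image with the ConfigMap mounted as the config file, plus a NodePort Service exposing the metrics port. The names and labels here match the kubectl commands below; the mount path is an arbitrary choice:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudprober-profilers
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cloudprober-profilers
  template:
    metadata:
      labels:
        app: cloudprober-profilers
    spec:
      containers:
        - name: cloudprober
          image: dazwilkin/cloudprober:0.9.3-32-g21f071e
          args:
            - --config_file=/config/cloudprober.cfg
            - --logtostderr
          ports:
            - containerPort: 9313
          volumeMounts:
            - name: cloudprober-config
              mountPath: /config
      volumes:
        - name: cloudprober-config
          configMap:
            name: cloudprober-profilers-config
---
apiVersion: v1
kind: Service
metadata:
  name: cloudprober
spec:
  type: NodePort
  selector:
    app: cloudprober-profilers
  ports:
    - port: 9313
      targetPort: 9313
```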

You may:

kubectl apply --filename=deployment.yaml

OK… If you’d like, you may port-forward the cloudprober service’s NodePort as we did before. In this case, though, let’s just have a look at its logs:

POD=$(\
  kubectl get pods \
    --selector=app=cloudprober-profilers \
    --namespace=default \
    --output=jsonpath="{.items[0].metadata.name}")
kubectl logs pods/${POD} --container=cloudprober --follow

and here is an example of (redacted) logs:

cloudprober 1522451121742369580 1522451141 labels=ptype=http probe=nodejs-profiler dst=nodejs-profiler.default.svc.cluster.local total=1 success=1 latency=97108.784 timeouts=0 resp-code=map:code,200:1 resp-body=map:resp
cloudprober 1522451121742369585 1522451141 labels=ptype=http probe=golang-profiler dst=golang-profiler.default.svc.cluster.local total=1 success=1 latency=92348.214 timeouts=0 resp-code=map:code,200:1 resp-body=map:resp
cloudprober 1522451121742369592 1522451161 labels=ptype=http probe=nodejs-profiler dst=nodejs-profiler.default.svc.cluster.local total=2 success=2 latency=197668.887 timeouts=0 resp-code=map:code,200:2 resp-body=map:resp
cloudprober 1522451121742369593 1522451161 labels=ptype=http probe=golang-profiler dst=golang-profiler.default.svc.cluster.local total=2 success=2 latency=180111.898 timeouts=0 resp-code=map:code,200:2 resp-body=map:resp

and the accompanying service provides an endpoint on which Prometheus metrics are served. I’ll leave you to deploy Prometheus to your cluster and configure it.

Golang

There’s no Golang code today :-( Well, that’s not entirely true. It’s just not yet ready. I was thinking that it would be useful to have Cloudprober auto-configure itself based on services published in specific namespace(s). I’ve a rough cut of the code working and hope to publish something here next week. I’ll update this post when I do.

Conclusion

Cloudprober is an interesting tool and I intend to spend more time getting to grips with it over the coming week.
