access and debug your kubernetes service / docker container

Alex Leonhardt
Aug 26, 2018

so you’ve followed the basic steps to get a service deployed into kubernetes; let’s say the service is Nginx, because why not …

$ kubectl create deployment ngx --image=nginx:latest

let’s scale that up to 2 because, really, running just 1 pod is a bit too basic

$ kubectl scale deployment --replicas 2 ngx

now you have 2 Nginx pods running on your k8s “cluster” (I use the built-in k8s installation that comes with Docker Edge)

$ kubectl get pods
NAME                   READY     STATUS    RESTARTS   AGE
ngx-5cb59c856c-cmn8p   1/1       Running   0          52m
ngx-5cb59c856c-s65nh   1/1       Running   0          52m

now that we have 2 pods, we need some way to access the service so we can do some basic checking ourselves, e.g. verify that things work as expected, before we make it publicly available via a LoadBalancer or Ingress; so, for now, we expose it as an internal service

$ kubectl expose deployment ngx --port 8080 --target-port 80
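
before going further, it’s worth a quick check that the service exists and has endpoints for both pods; these are just standard kubectl commands, nothing specific to this setup

$ kubectl get service ngx
$ kubectl get endpoints ngx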

using

$ kubectl proxy &

you can then access your service (securely; it’s actually an encrypted tunnel) via

http://localhost:8001/api/v1/namespaces/default/services/ngx:8080/proxy/

details of how that works are in the Kubernetes documentation, but in a nutshell, you can access any internal service via

http://localhost:8001/api/v1/namespaces/<namespace>/services/<service-name>:<port>/proxy/
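
for example, to hit the ngx service from above (default namespace, port 8080) while kubectl proxy is running, something like this should do

$ curl http://localhost:8001/api/v1/namespaces/default/services/ngx:8080/proxy/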

so this is where you notice that one of the containers is behaving strangely: it sometimes seems unable to connect to a backend, or only works intermittently, etc. (something’s wrong, basically). You decide you’d like to run ngrep or tcpdump or strace to figure out what’s going on, but you also don’t want to modify the container image … so now what do you do?

As long as you have access to the node running the container instance, you’re in luck; in this example we’ll just use the local Docker for Mac installation, but it should work on any node running Docker containers.

To log on to the node running the container, you first need to find out which node that is, which you can do with

$ kubectl describe pod ngx-5cb59c856c-cmn8p
Name:           ngx-5cb59c856c-cmn8p
Namespace:      default
Node:           docker-for-desktop/192.168.65.3
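
if all you want is the node name, a jsonpath query does the same job in one go (again, plain kubectl, nothing custom)

$ kubectl get pod ngx-5cb59c856c-cmn8p -o jsonpath='{.spec.nodeName}'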

I initially tried editing the pod with

$ kubectl edit pod ngx-5cb59c856c-cmn8p

but saving that config will fail, as you cannot add/remove containers from a pod :( Anyway, since we’re on the node that runs the container, we can create a custom debug container locally and run it inside the same pid and network namespace as the existing ngx-5cb59c856c-cmn8p.

The Dockerfile could be something like this (shamelessly copied/used from https://medium.com/@rothgar/how-to-debug-a-running-docker-container-from-a-separate-container-983f11740dc6)

FROM alpine
RUN apk update && apk add strace
CMD ["strace", "-p", "1"]

run

$ docker build -t strace .

and once the container is built (in reality, you’d build a consistent debug container and make it available to pull anytime from GCR or ECR or wherever), you run it with --privileged so you’re able to do all the things and don’t have to fight permissions (yes, we don’t care about security in this example; see Justin Garrison’s post on how to do this in a more restrictive way).
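
pushing it to a registry would look roughly like this; the gcr.io path below is only a placeholder, substitute whatever repo you actually use

$ docker tag strace gcr.io/my-project/strace:latest    # placeholder registry/repo
$ docker push gcr.io/my-project/strace:latest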

To attach to the pid and net namespaces, you need the container ID or name; that’s easy to find with

$ docker ps | grep nginx
6b6e65ebc7c8   nginx   "nginx -g 'daemon of…"   About an hour ago   Up About an hour   k8s_nginx_ngx-5cb59c856c-cmn8p_default_402e0d53-a933-11e8-93cb-025000000001_0
e245d91ba045   nginx   "nginx -g 'daemon of…"   About an hour ago   Up About an hour   k8s_nginx_ngx-5cb59c856c-s65nh_default_36a3a1e7-a933-11e8-93cb-025000000001_0

So we’ll use the first one, which is 6b6e65ebc7c8
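
as a side note, instead of grepping you can let Docker filter on the kubelet’s container naming convention and stick the ID into a variable, then use $CID in place of the literal ID below

$ CID=$(docker ps --filter "name=k8s_nginx_ngx-5cb59c856c-cmn8p" --format '{{.ID}}')
$ echo $CID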

$ docker run -ti --pid=container:6b6e65ebc7c8 --net=container:6b6e65ebc7c8 --privileged strace /bin/ash

Once you’re in, you need the PID that is actually doing the work. PID 1 is the parent Nginx process, but it doesn’t process any requests; it just manages the child processes

/ # ps -ef
PID USER TIME COMMAND
1 root 0:00 nginx: master process nginx -g daemon off;
6 101 0:00 nginx: worker process
44 root 0:00 /bin/ash
50 root 0:00 ps -ef
/ #
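
if you’d rather not eyeball the process list, busybox’s pgrep (which should be included in the alpine image) can pick out the worker for you; here it would print 6

/ # pgrep -f 'nginx: worker'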

okay, so let’s do an strace on PID 6 as that is actually doing work …

/ # strace -fp 6
strace: Process 6 attached
gettimeofday({tv_sec=1535294355, tv_usec=751153}, NULL) = 0
epoll_wait(8, [{EPOLLIN, {u32=2097902081, u64=139627189727745}}], 512, 61010) = 1
gettimeofday({tv_sec=1535294359, tv_usec=908313}, NULL) = 0
recvfrom(3, "GET / HTTP/1.1\r\nHost: localhost:"..., 1024, 0, NULL, NULL) = 213
stat("/usr/share/nginx/html/index.html", {st_mode=S_IFREG|0644, st_size=612, ...}) = 0
open("/usr/share/nginx/html/index.html", O_RDONLY|O_NONBLOCK) = 11
fstat(11, {st_mode=S_IFREG|0644, st_size=612, ...}) = 0
writev(3, [{iov_base="HTTP/1.1 200 OK\r\nServer: nginx/1"..., iov_len=238}], 1) = 238
sendfile(3, 11, [0] => [612], 612) = 612
write(5, "10.1.0.1 - - [26/Aug/2018:14:39:"..., 111) = 111
close(11) = 0
epoll_wait(8, [{EPOLLIN, {u32=2097902081, u64=139627189727745}}], 512, 65000) = 1
gettimeofday({tv_sec=1535294361, tv_usec=971440}, NULL) = 0
recvfrom(3, "GET / HTTP/1.1\r\nHost: localhost:"..., 1024, 0, NULL, NULL) = 213
stat("/usr/share/nginx/html/index.html", {st_mode=S_IFREG|0644, st_size=612, ...}) = 0
open("/usr/share/nginx/html/index.html", O_RDONLY|O_NONBLOCK) = 11
fstat(11, {st_mode=S_IFREG|0644, st_size=612, ...}) = 0
writev(3, [{iov_base="HTTP/1.1 200 OK\r\nServer: nginx/1"..., iov_len=238}], 1) = 238
sendfile(3, 11, [0] => [612], 612) = 612
write(5, "10.1.0.1 - - [26/Aug/2018:14:39:"..., 111) = 111
close(11) = 0
epoll_wait(8,
^Cstrace: Process 6 detached
<detached ...>

And there it is: 2 requests to this particular Nginx instance, captured without having to

  • redeploy the Pod with an additional debug container (you could argue that this would be better, but you may not be able to reproduce the issue straight away and may need to run it for a long time, which costs resources)
  • modify the Dockerfile in any way (install debug tools)
  • change privileges on the running container; it can keep running in its more secure environment, while the debug container gets the additional capabilities

the nice thing about this pattern is that you can create yourself a debug container that you can re-use to debug applications running on any node that runs Docker (ECS, on-prem K8S, EKS, AKS, GKE).
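
the same pattern covers the other tools mentioned at the start; a tcpdump-flavoured image, for example, is just a small variation of the Dockerfile above (tcpdump is available via apk, and the image name below is just my own choice)

FROM alpine
RUN apk update && apk add tcpdump
CMD ["tcpdump", "-i", "any", "port", "80"]

build and run it against the same container as before

$ docker build -t tcpdump-debug .
$ docker run -ti --net=container:6b6e65ebc7c8 --privileged tcpdump-debug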

Happy debugging!

Credits & Resources:

  • How to debug a running Docker container from a separate container, by Justin Garrison: https://medium.com/@rothgar/how-to-debug-a-running-docker-container-from-a-separate-container-983f11740dc6
