Fixing a crashed container on OpenShift

Xavier Coulon
4 min read · May 31, 2018

Photo by Philip Swinburn on Unsplash

This article describes how I diagnosed and fixed a pod that would not start because of a configuration error in the command to run. There are certainly other ways to diagnose the cause of the problem, but I’m sharing this one since it worked for me.

Lately, I’ve been trying out the NATS Streaming Server to learn more about its support for durable subscriptions. Since the streaming server couldn’t be deployed on OpenShift with an operator yet, I opted for a regular deployment using the existing Docker image.

The deployment and service manifests that I used looked like this:

$ cat nats-streaming-server.yaml
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nats-streaming-server
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: nats-streaming-server
    spec:
      containers:
      - name: "nats-streaming-server"
        image: "nats-streaming:0.9.2"
        command: ["nats-streaming-server"]
        args: ["-m","8222"]
---
apiVersion: v1
kind: Service
metadata:
  labels:
    name: nats-streaming-server
  name: nats-streaming-server
spec:
  ports:
  - name: client
    port: 4222
    protocol: TCP
    targetPort: 4222
  - name: mgmt
    port: 8222
    protocol: TCP
    targetPort: 8222
  selector:
    name: nats-streaming-server
  type: ClusterIP

Note that I needed to specify the command and args values in the spec.template.spec.containers element to enable the management port, so I could later monitor the activity on the platform, but that’s another story.

After running the oc apply command, I checked the pods and…

$ oc get pods
NAME READY STATUS
nats-streaming-server-6d56df9445-tjwdw 0/1 RunContainerError

Ahem…

Ok then, time to look at what’s happening with this pod. First, let’s see if the pod’s events can provide us with some information:

$ oc describe pod/nats-streaming-server-6d56df9445-tjwdw
Name: nats-streaming-server-6d56df9445-tjwdw
...
Events:
Type     Reason                 Message
----     ------                 -------
Normal   Scheduled              Successfully assigned nats-streaming-server-6d56df9445-tjwdw to localhost
Normal   SuccessfulMountVolume  MountVolume.SetUp succeeded for volume "default-token-bvdmj"
Normal   Pulled                 Container image "nats-streaming:0.9.2" already present on machine
Normal   Created                Created container
Warning  Failed                 Error: failed to start container "nats-streaming-server": executable not found in $PATH
Warning  BackOff                Back-off restarting failed container

So here is the problem: failed to start container "nats-streaming-server": executable not found in $PATH

The command value ["nats-streaming-server"] in the deployment manifest is somehow wrong, but what should it be, then? Can we just oc rsh into the container to compare the value of $PATH with the path to nats-streaming-server?

$ oc rsh nats-streaming-server-6d56df9445-tjwdw
error: unable to upgrade connection: container not found ("nats-streaming-server")

Nope.

`docker export` to the rescue

The docker export command exports a container’s filesystem as a tar file on the host, so it’s easy to check its content afterwards. But first, the CLI needs to be configured so that the docker command targets the Docker daemon running on OpenShift (well here, Minishift):

# configure the docker environment variables
$ eval $(minishift docker-env)
# retrieve the id of the nats-streaming-server image
$ docker images | grep nats-streaming
nats-streaming 0.9.2 bf688abfd477 8 weeks ago 10.7MB
# retrieve the id of the container created from this
# 'nats-streaming' image
$ docker ps -a | grep bf688abfd477
c55940575b59 bf688abfd477 "nats-streaming-serv…" Created
# export the content of the container in a tar file
$ docker export -o nats-streaming.tar c55940575b59
# inspect the content of the tar file
$ tar tvf nats-streaming.tar | grep nats-streaming-server
-rwxrwxr-x 10725344 Apr 3 21:40 nats-streaming-server

Note that the container is in a Created state, which explains why the oc rsh command tried earlier could not work.
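The lookup, export and search steps above can be folded into one helper. This is only a sketch: the function name find_in_container is hypothetical, it assumes the docker CLI already targets the Minishift daemon, and it simply takes the first container that docker ps lists for the image, whatever its state.

```shell
#!/bin/sh
# find_in_container IMAGE NAME   (hypothetical helper, not part of any CLI)
# Lists the entries in the filesystem of a container created from IMAGE
# whose path contains NAME. The container does not need to be running.
find_in_container() {
    image=$1
    name=$2
    # First container listed for this image, in any state (-a), id only (-q).
    cid=$(docker ps -a -q --filter "ancestor=$image" | head -n 1)
    if [ -z "$cid" ]; then
        echo "no container found for image $image" >&2
        return 1
    fi
    # Stream the archive straight into tar instead of writing it to disk.
    docker export "$cid" | tar -tf - | grep -- "$name"
}

# Example:
#   find_in_container nats-streaming:0.9.2 nats-streaming-server
```

Streaming the export into tar avoids leaving a tar file behind, at the cost of not being able to inspect the archive again later.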

On the one hand, the archive contains the expected nats-streaming-server binary, but it is located at the root of the filesystem. On the other hand, the value of the PATH environment variable itself can be found by inspecting the container, in particular the Config.Env value (an array of strings):

$ docker inspect c55940575b59 -f '{{ range $index, $env := .Config.Env }}{{ println $env }}{{ end }}' | grep "PATH="
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
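To compare a directory against that PATH value mechanically rather than by eye, a small shell function is enough. The name dir_in_path is made up for this sketch; the PATH value is the one printed above, and "/" is the directory where the tar listing showed the binary.

```shell
#!/bin/sh
# dir_in_path DIR PATH_VALUE   (hypothetical helper)
# Succeeds if DIR is one of the colon-separated entries of PATH_VALUE.
dir_in_path() {
    # Wrap the value in colons so every entry is delimited on both sides.
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# PATH value reported by `docker inspect` for the container.
container_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

# The binary sits at the root of the exported filesystem, so check "/".
if dir_in_path "/" "$container_path"; then
    echo "/ is in PATH"
else
    echo "/ is not in PATH"
fi
```

With the values above, the script prints "/ is not in PATH".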

So that’s it! The nats-streaming-server file at the root of the container is not in $PATH. It’s now just a matter of changing the value of the command element to ["/nats-streaming-server"] in the deployment manifest and applying it again, and the new pod is now running \o/

$ oc apply -f openshift/nats-streaming-deployment.yaml
deployment "nats-streaming-server" configured
service "nats-streaming-server" unchanged
$ oc get pods
NAME READY STATUS
nats-streaming-server-66d45c8746-wwf7l 1/1 Running
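Another possible route, which I did not use here, is to ask the image itself what it would run by default. The sketch below wraps docker inspect in a hypothetical image_entrypoint function and prints the Entrypoint and Cmd fields of the image configuration as JSON; it assumes the docker CLI targets the same daemon as before.

```shell
#!/bin/sh
# image_entrypoint IMAGE   (hypothetical helper)
# Prints the entrypoint and default arguments recorded in the image metadata.
image_entrypoint() {
    docker inspect -f 'entrypoint={{json .Config.Entrypoint}} cmd={{json .Config.Cmd}}' "$1"
}

# Example:
#   image_entrypoint nats-streaming:0.9.2
```

In a container spec, command replaces the image's entrypoint and args replaces its default arguments, so if the image reports an absolute path as its entrypoint, that path is a good candidate for the command value.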

Xavier Coulon

Halftime dad of two. Swimmer/cyclist/runner and occasionally triathlete. I develop tools for developers on OpenShift.