OpenShift 3.11 Day Two Operations — Environment health checks

Tosin Akinosho
5 min read · Dec 26, 2019


Now that we have deployed OpenShift 3.11 in production, what's next?

This article discusses different ways to monitor your OpenShift environment after it has been deployed.

Run a sanity check — A sanity script or smoke test determines whether your environment is running properly after a deployment.

Example Sanity Script
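
A minimal smoke test in the same spirit might look like the following sequence (the project and application names are placeholders, and the httpd image stream is assumed to be available in the cluster):

$ oc new-project smoke-test          # throwaway project for the test
$ oc new-app --name smoke httpd      # deploy a simple application
$ oc rollout status dc/smoke         # wait for the deployment to finish
$ oc expose svc/smoke                # create a route through the router
$ oc get route smoke                 # confirm the route was admitted
$ oc delete project smoke-test       # clean up

If every step succeeds, scheduling, image pulls, service networking, and the router are all working.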

Environment Health Checks

Check host health — These checks verify that your cluster is up and running. The command oc get nodes can be run on a master instance to verify this.
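
For example, on a healthy cluster every node reports a Ready status (node names, roles, and ages below are illustrative):

$ oc get nodes
NAME                            STATUS    ROLES            AGE       VERSION
qbn-ocp3-master01.lab.example   Ready     infra,master     30d       v1.11.0+d4cacc0
qbn-ocp3-node01.lab.example     Ready     compute          30d       v1.11.0+d4cacc0
qbn-ocp3-node02.lab.example     Ready     compute          30d       v1.11.0+d4cacc0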

Check router and registry health — Verify that the router and registry services are running. The router allows external users to reach your applications, and the registry stores the images your deployments use.
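
On a default OpenShift 3.11 installation both run in the default project, so a quick check of their deployment configs and pods (revision and replica counts below are illustrative) looks like this:

$ oc -n default get dc router docker-registry
NAME              REVISION   DESIRED   CURRENT   TRIGGERED BY
router            1          1         1         config
docker-registry   1          1         1         config

$ oc -n default get pods -o wide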

Check network connectivity on masters — Master nodes keep their state synchronized using the etcd key-value store. Communication between the masters and etcd is critical and occurs on ports 2379 and 2380.
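
A quick reachability check from each master covers both ports, and the etcd health endpoint can be queried with the master's etcd client certificates (the hostname and certificate paths below assume a default installation; the output is illustrative):

$ timeout 3 bash -c '</dev/tcp/qbn-ocp3-master01.lab.example/2379' && echo "2379 open"
$ timeout 3 bash -c '</dev/tcp/qbn-ocp3-master01.lab.example/2380' && echo "2380 open"

$ sudo curl -s --cacert /etc/origin/master/master.etcd-ca.crt \
    --cert /etc/origin/master/master.etcd-client.crt \
    --key /etc/origin/master/master.etcd-client.key \
    https://qbn-ocp3-master01.lab.example:2379/health
{"health": "true"}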

Check connectivity on master hosts — A very good thing to test in your OpenShift environment is DNS resolution on each node.

Example

$ dig +short docker-registry.default.svc.cluster.local
172.30.150.7

The API service and web console share the same port on OpenShift, either 8443 or 443 depending on your setup. This port must be reachable from within the cluster and by everyone who needs to work with the environment. Below is an example query on a master node.

$ curl -k https://internal-master.example.com:443/version
{
"major": "1",
"minor": "6",
"gitVersion": "v1.6.1+5115d708d7",
"gitCommit": "fff65cf",
"gitTreeState": "clean",
"buildDate": "2017-10-11T22:44:25Z",
"goVersion": "go1.7.6",
"compiler": "gc",
"platform": "linux/amd64"
}

Here is another example, run from a laptop or another machine outside the cluster.

$ curl -k https://master.example.com:443/healthz
ok

Check connectivity on node instances — SDN pod-to-pod communication uses VXLAN on UDP port 4789 by default. You can verify host functionality by creating a new application; for example, the sanity test script from above can be run to verify functionality.
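
As a spot check, the SDN and OVS daemon set pods (found in the openshift-sdn project on a default 3.11 installation) should be Running on every node, and the node firewall should allow VXLAN traffic on UDP 4789 (the exact rule output depends on your firewall configuration):

$ oc get pods -n openshift-sdn -o wide
$ sudo iptables -nL | grep 4789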

Storage — Master instances need at least 40 GB of free space in the /var directory. Node instances need at least 15 GB of free space in /var. Docker uses this directory for its storage, and if it fills up, containers will fail to run on the affected nodes.
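
A quick df on each host shows whether /var has enough headroom (the device name and sizes below are illustrative):

$ df -h /var
Filesystem              Size  Used Avail Use% Mounted on
/dev/mapper/rhel-var     60G   22G   38G  37% /var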

Check Docker storage — Docker storage can be backed by one of two options. The first is a thin-pool logical volume managed by Device Mapper. The second is the overlay2 file system, available on Red Hat Enterprise Linux 7.4 and above. The overlay2 file system is recommended for its better performance.

Check the storage type configured on each host.

$ cat /etc/sysconfig/docker-storage
DOCKER_STORAGE_OPTIONS='--storage-driver overlay2'

Verify the storage driver Docker is currently using.

$ sudo docker info
Containers: 29
Running: 27
Paused: 0
Stopped: 2
Images: 13
Server Version: 1.13.1
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Authorization: rhel-push-plugin
Swarm: inactive
Runtimes: docker-runc runc
Default Runtime: docker-runc
Init Binary: /usr/libexec/docker/docker-init-current
containerd version: (expected: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1)
runc version: 9c3c5f853ebf0ffac0d087e94daef462133b69c7 (expected: 9df8b306d01f59d3a8029be411de015b7304dd8f)
init version: fec3683b971d9c3ef73f284f176672c44b448662 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
seccomp
WARNING: You're not using the default seccomp profile
Profile: /etc/docker/seccomp.json
selinux
Kernel Version: 3.10.0-1062.4.1.el7.x86_64
Operating System: Employee SKU
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 3
CPUs: 4
Total Memory: 15.51 GiB
Name: qbn-ocp3-master01.lab.example
ID: FVHG:VKWZ:RW33:K5JY:LBDJ:ERON:LHIP:GMAP:CQUU:KRCH:NOQX:YGA2
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://registry.redhat.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Registries: registry.redhat.io (secure), docker.io (secure), docker.io (secure)

API Service Status

The OpenShift API service runs on all master instances. To check the status of the service, look at its pods in the kube-system namespace (project).

$ oc get pod -n kube-system -l openshift.io/component=api
NAME READY STATUS RESTARTS AGE
master-api-qbn-ocp3-master01.lab.example 1/1 Running 0 1d

The API service exposes a health check that can be queried externally using the API host name.

$ oc get pod -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
master-api-qbn-ocp3-master01.lab.example 1/1 Running 0 1d 192.168.1.161 qbn-ocp3-master01.lab.example <none>
master-controllers-qbn-ocp3-master01.lab.example 1/1 Running 0 1d 192.168.1.161 qbn-ocp3-master01.lab.example <none>
master-etcd-qbn-ocp3-master01.lab.example 1/1 Running 0 1d 192.168.1.161 qbn-ocp3-master01.lab.example <none>

$ curl -k https://qbn-ocp3-master01.lab.example:8443/healthz
ok


Tosin Akinosho

Associate Principal Solution Architect @RedHat. Cloud & DevOps Enthusiast. AI Integrator. Passionate about sharing knowledge and driving innovation.