Install OpenShift 4.3 (UPI) on PowerVM using PowerVC

Yussuf Shaikh
17 min read · Apr 20, 2020


In this post, I will explain the detailed steps required for installing OpenShift 4.3 on PowerVM using PowerVC. I will also share the Terraform module created for the UPI mode of installation on PowerVC.

OpenShift 4.3 is now GA and available to install from https://cloud.redhat.com/openshift/install/power/user-provisioned. Please refer to the OpenShift on IBM Power blog for more information.


Introduction

To install OCP 4.3 on PowerVM we refer to the UPI documentation from OpenShift. Here the bastion node and cluster nodes are created by the user, and OpenShift is installed on them. We also need to set up DHCP, DNS and HAProxy services for the cluster to work; the bastion node will be used for this purpose. Also, note that we will be using an HTTP server on the bastion node to host the bootstrap ignition file required for booting up the bootstrap node.

Also, to create the resources on PowerVC we will make use of the OpenStack client, which is assumed to be already installed and configured. The client is convenient since we will be working with the CLI throughout, but you could use alternative ways as well.

Collecting Information

Let’s gather all the details of the PowerVC server where we will deploy the cluster.

Ensure the PowerVC server has the RHEL image and the RHCOS image available to boot. For more information on how to import the RHCOS image into PowerVC, follow these steps. Note that we use RHCOS for the worker nodes, as supported on Power.

We will use a single network for all the nodes and the traffic to the cluster will always flow from the bastion node.

Make use of appropriate flavors for the machines as recommended below. Note that we are using RHCOS for all the cluster nodes.

Node Information

Things to keep handy:

  • Cluster Name: A name for your cluster. Keep it short and simple. export CLUSTER_NAME=<Cluster Name>
  • Base Domain: Domain name for your cluster. eg: example.com export BASE_DOMAIN=<Base Domain>
  • Pull Secret: Secret required for downloading OCP images. You can get this from https://cloud.redhat.com/openshift/install/pull-secret. In case you are planning to use a private repo for the release image, add the repo token to the list.
  • SSH Public Key: SSH public key content to inject to the cluster nodes. eg: cat /home/user/.ssh/id_rsa.pub
  • The number of master nodes: For this blog, we will use 3.
  • The number of worker nodes: For this blog, we will use 2. Increase this number as per your workloads.
  • Network Name: Name of the network where the cluster nodes will be deployed. export NETWORK_NAME="<Network Name>"
  • RHCOS Image ID: Image id of the RHCOS image you have imported to PowerVC. export RHCOS_IMAGE_ID="<RHCOS Image ID>"
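Before moving on, you can optionally sanity-check some of these values against PowerVC with the OpenStack client, for example:

# Confirm the network and RHCOS image referenced above actually exist on PowerVC
openstack network show "$NETWORK_NAME" -c id -c name
openstack image show "$RHCOS_IMAGE_ID" -c id -c name -c status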

Install Configuration

Bastion node

To start with the installation, we will need to create a RHEL 8 or RHEL 8.1 PowerVM LPAR; this will be our bastion node. 8 GB of memory is typically more than enough for this purpose. On this machine, we will run all the back-bone services as well as the oc client to access the cluster once we complete the installation.

Installer Binary

On bastion, download the installer binary. You can find the current 4.3 GA builds here. We will use the 4.3.18 installer tarball. Download it and ensure it is available in your $PATH.

wget https://mirror.openshift.com/pub/openshift-v4/ppc64le/clients/ocp/4.3.18/openshift-install-linux-4.3.18.tar.gz
sudo tar -xzf openshift-install-linux-4.3.18.tar.gz -C /usr/local/bin
openshift-install version

Let’s export the bastion IP address as an environment variable for using it later.

export BASTION_IP=<Bastion_IP>

Install Config

Create and change to the installer directory. We will use this directory to run all the commands. This is the place where we create the install-config.yaml and let the installer create the required files.

mkdir install && cd install

Use this template to create the install-config.yaml (ensure you change the file extension to yaml); a minimal sketch is also shown after the list below. Replace the following variables with actual values:

  • ${cluster_domain} : <Base Domain>
  • ${master_count} : <No of master nodes>
  • ${cluster_id} : <Cluster Name>
  • ${pull_secret} : <Pull Secret>
  • ${public_ssh_key} : <SSH Public Key>
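For reference, a minimal install-config.yaml for this kind of UPI install typically looks like the sketch below; the values are illustrative placeholders, and the template linked above remains the source of truth.

apiVersion: v1
baseDomain: example.com            # ${cluster_domain}
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0                      # workers are created manually later in UPI
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3                      # ${master_count}
metadata:
  name: test-ocp                   # ${cluster_id}
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: '<Pull Secret>'        # ${pull_secret}
sshKey: '<SSH Public Key>'         # ${public_ssh_key}

Note that compute replicas is commonly set to 0 in UPI installs, since the worker nodes are added by hand after the control plane is up.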

Installer directory will look like:

# ls
install-config.yaml

Manifests

Once we have the install-config.yaml ready, we can create the manifest files and update values to customize the cluster. For now, we will make minimal changes at this step. You could also perform other customizations here (only if you know what you are doing). Before proceeding, back up install-config.yaml somewhere, as this file will be deleted once the manifests are created.

To create the manifests, run the following command from your installer directory.

openshift-install create manifests

Next, remove the machine and machineset definitions, as we will be creating the compute and control-plane nodes ourselves:

rm -f openshift/99_openshift-cluster-api_master-machines-*.yaml openshift/99_openshift-cluster-api_worker-machineset-*.yaml

Last, make the control-plane nodes unschedulable by running:

sed -i 's/mastersSchedulable: true/mastersSchedulable: false/g' manifests/cluster-scheduler-02-config.yml

Ignition Config

Once we have the manifests ready, we will create the ignition files required for booting the RHCOS nodes.

You can optionally set the OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE environment variable to use the release image from a private repo.

Example:

export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE="quay.io/openshift-release-dev/ocp-release:4.3.18-ppc64le"

To create the ignition files, run the following command from your installer directory. This will create metadata.json, auth, bootstrap.ign, master.ign and worker.ign files in your installer directory.

openshift-install create ignition-configs

Installer directory will look like:

# ls
auth metadata.json bootstrap.ign master.ign worker.ign

We need to update these files before using them. There are some configurations we need to add to the ignition files. Use the sections below directly from the documentation to update bootstrap.ign and master.ign. You may want to back up these files before making changes. Follow the links given below:

  1. Capture Infra ID
  2. Update Bootstrap Ignition
  3. Update Master Ignition

The node hostnames are constructed using the infra ID, e.g. bootstrap hostname: ‘$INFRA_ID-bootstrap’, first master hostname: ‘$INFRA_ID-master-0’. Note that the example assumes three master nodes.
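As a quick reference, the infra ID used in these hostnames can be read from metadata.json in the installer directory, for example with jq:

export INFRA_ID=$(jq -r .infraID metadata.json)
echo "$INFRA_ID"    # e.g. yusupi-n8qvz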

Installer directory will now look like:

# ls
auth  bootstrap.ign  bootstrap_update.py  master.ign  metadata.json  worker.ign
yusupi-n8qvz-master-0-ignition.json  yusupi-n8qvz-master-1-ignition.json  yusupi-n8qvz-master-2-ignition.json

HTTP Config

On the bastion node, install the Apache HTTP server using the command below:

sudo yum install -y httpd

Since we will be using port 80 for the load balancer (HAProxy), let’s use port 8080 for the HTTP service. To achieve this, run:

sudo sed -i 's/Listen 80/Listen 8080/g' /etc/httpd/conf/httpd.conf

Now enable firewall rules for allowing HTTP connections and start the service by running:

sudo firewall-cmd --permanent --add-port=8080/tcp
sudo firewall-cmd --reload
sudo systemctl enable --now httpd
sudo systemctl start httpd

Once the HTTP service is set up, let’s host the bootstrap ignition config. Copy the ignition file updated in the previous step to /var/www/html/ with file permission 755 and restart the service:

sudo cp bootstrap.ign /var/www/html/bootstrap.ign
sudo chmod 755 /var/www/html/bootstrap.ign
sudo systemctl restart httpd
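To confirm the hosted file is reachable before booting anything, a quick check (assuming BASTION_IP is still exported) is:

curl -I http://$BASTION_IP:8080/bootstrap.ign

You should get an HTTP 200 response.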

Network Configuration

Network Ports

We need to create network ports for each cluster node (RHCOS). These network port addresses will be used in the DHCP configuration later.

To create the network ports, use the OpenStack client and run the commands below:

openstack port create --network "$NETWORK_NAME" "$INFRA_ID-bootstrap-port"
for index in $(seq 0 2); do openstack port create --network "$NETWORK_NAME" "$INFRA_ID-master-port-$index"; done
for index in $(seq 0 1); do openstack port create --network "$NETWORK_NAME" "$INFRA_ID-worker-port-$index"; done
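Optionally, verify the ports were created as expected:

openstack port list --network "$NETWORK_NAME" -c Name -c "MAC Address" -c "Fixed IP Addresses" | grep "$INFRA_ID"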

DHCP Config

Set up a DHCP service so that the cluster nodes can pick up their network addresses while booting.

On bastion node, install DHCP using the command:

sudo yum install -y dhcp-server

Now enable firewall rules for allowing DHCP connections by running:

sudo firewall-cmd --add-service=dhcp --permanent
sudo firewall-cmd --reload

Let’s configure the DHCP service so that the nodes pick up the network addresses of the ports we created earlier. Copy and paste the following on the bastion node:

mkdir -p /etc/dhcp/
cat <<EOF >/etc/dhcp/dhcpd.conf
#
# DHCP Server Configuration file.
# see /usr/share/doc/dhcp-server/dhcpd.conf.example
# see dhcpd.conf(5) man page
default-lease-time 900;
max-lease-time 7200;
subnet <cluster_subnet> netmask <cluster_subnet_mask> {
  option routers <gateway_ip_address>;
  option subnet-mask <cluster_subnet_mask>;
  option domain-name-servers $BASTION_IP;
  next-server $BASTION_IP;
  # filename "{{ grub_filename }}";
}
host bootstrap {
  hardware ethernet <Bootstrap MAC>;
  fixed-address <Bootstrap IP>;
  option host-name "$INFRA_ID-bootstrap";
}
host master-0 {
  hardware ethernet <1st Master MAC>;
  fixed-address <1st Master IP>;
  option host-name "$INFRA_ID-master-0";
}
host master-1 {
  hardware ethernet <2nd Master MAC>;
  fixed-address <2nd Master IP>;
  option host-name "$INFRA_ID-master-1";
}
host master-2 {
  hardware ethernet <3rd Master MAC>;
  fixed-address <3rd Master IP>;
  option host-name "$INFRA_ID-master-2";
}
host worker-0 {
  hardware ethernet <1st Worker MAC>;
  fixed-address <1st Worker IP>;
  option host-name "$INFRA_ID-worker-0";
}
host worker-1 {
  hardware ethernet <2nd Worker MAC>;
  fixed-address <2nd Worker IP>;
  option host-name "$INFRA_ID-worker-1";
}
EOF

You can get the required values by running corresponding commands:

  • network_id: export NETWORK_ID=$(openstack network show "$NETWORK_NAME" -c id -f json | jq -r .id)
  • subnet_id: export SUBNET_ID=$(openstack network show "$NETWORK_NAME" -c subnets -f json | jq -r .subnets[0])
  • cluster_subnet: openstack subnet show $SUBNET_ID -c cidr -f json | jq -r .cidr | xargs ipcalc -n | cut -d '=' -f2
  • cluster_subnet_mask: openstack subnet show $SUBNET_ID -c cidr -f json | jq -r .cidr | xargs ipcalc -m | cut -d '=' -f2
  • gateway_ip_address: openstack subnet show $SUBNET_ID -c gateway_ip -f json | jq -r .gateway_ip

You can get the MAC and IP information by running below example commands:

  • Bootstrap MAC: openstack port show $INFRA_ID-bootstrap-port -c mac_address -f json | jq -r '.mac_address'
  • Bootstrap IP: openstack port show $INFRA_ID-bootstrap-port -c fixed_ips -f json | jq -r '.fixed_ips[0].ip_address'
  • 1st Master MAC: openstack port show $INFRA_ID-master-port-0 -c mac_address -f json | jq -r '.mac_address'
  • 1st Master IP: openstack port show $INFRA_ID-master-port-0 -c fixed_ips -f json | jq -r '.fixed_ips[0].ip_address'
  • 1st Worker MAC: openstack port show $INFRA_ID-worker-port-0 -c mac_address -f json | jq -r '.mac_address'
  • 1st Worker IP: openstack port show $INFRA_ID-worker-port-0 -c fixed_ips -f json | jq -r '.fixed_ips[0].ip_address'

Similarly, fetch the values for other master and worker nodes.
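If you prefer, a small loop like the sketch below prints the MAC and IP for every cluster port in one go, ready to paste into dhcpd.conf:

# Print "<port name>: <MAC>  <IP>" for the bootstrap, master and worker ports
for port in "$INFRA_ID-bootstrap-port" "$INFRA_ID"-master-port-{0..2} "$INFRA_ID"-worker-port-{0..1}; do
  echo -n "$port: "
  openstack port show "$port" -c mac_address -c fixed_ips -f json | jq -r '"\(.mac_address)  \(.fixed_ips[0].ip_address)"'
done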

Once the configuration is complete, restart the DHCP service.

sudo chmod 644 /etc/dhcp/dhcpd.conf
sudo systemctl enable --now dhcpd
sudo systemctl restart dhcpd
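Optionally, you can verify the syntax of the generated configuration at any time, since dhcpd ships with a test mode:

sudo dhcpd -t -cf /etc/dhcp/dhcpd.conf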

DNS Config

Now that we have all the network addresses and hostnames handy, let’s set up DNS before creating the cluster nodes so that the bootstrap process goes smoothly. We will use the named service for DNS.

On bastion node, install named using the command:

sudo yum install bind-chroot -y

Now enable firewall rules for allowing DNS connections and start the service by running:

sudo firewall-cmd --add-service=dns --permanent
sudo firewall-cmd --reload
sudo systemctl enable --now named
sudo systemctl start named

Let’s configure the DNS service for nodes to resolve the cluster hostnames.

First, add a new zone entry in /etc/named.conf. Run below commands:

cp /etc/named.conf /etc/named.conf.back
cat <<EOF >/etc/named.conf
options {
  listen-on port 53 { any; };
  listen-on-v6 port 53 { ::1; };
  directory "/var/named";
  dump-file "/var/named/data/cache_dump.db";
  statistics-file "/var/named/data/named_stats.txt";
  memstatistics-file "/var/named/data/named_mem_stats.txt";
  allow-query { any; };
  recursion yes;
  forward only;
  forwarders { 8.8.8.8; };
  dnssec-enable yes;
  dnssec-validation no;
  managed-keys-directory "/var/named/dynamic";
  pid-file "/run/named/named.pid";
  session-keyfile "/run/named/session.key";
  include "/etc/crypto-policies/back-ends/bind.config";
};
logging {
  channel default_debug {
    file "data/named.run";
    severity dynamic;
  };
};
zone "." IN {
  type hint;
  file "named.ca";
};
zone "$CLUSTER_NAME.$BASE_DOMAIN" IN {
  type master;
  file "/etc/named/zones/cluster-zone.db";
};
include "/etc/named.rfc1912.zones";
include "/etc/named.root.key";
EOF

Second, create the zone file referred to in the previous step. Don’t forget to replace the bootstrap, master and worker IP addresses. Run the commands below:

mkdir /etc/named/zones
cat <<EOF >/etc/named/zones/cluster-zone.db
\$TTL 12H
@ IN SOA ns hostmaster (
$(date +"%y%m%d%H%M") ; serial
1H ; refresh (1 hour)
15M ; retry (15 minutes)
1W ; expiry (1 week)
1H ) ; nx = nxdomain RRL (1 hour)
IN NS ns
ns IN A $BASTION_IP
api IN A $BASTION_IP
api-int IN A $BASTION_IP
*.apps IN A $BASTION_IP
etcd-0 IN A <1st Master IP>
etcd-1 IN A <2nd Master IP>
etcd-2 IN A <3rd Master IP>
_etcd-server-ssl._tcp IN SRV 0 10 2380 etcd-0.$CLUSTER_NAME.$BASE_DOMAIN.
_etcd-server-ssl._tcp IN SRV 0 10 2380 etcd-1.$CLUSTER_NAME.$BASE_DOMAIN.
_etcd-server-ssl._tcp IN SRV 0 10 2380 etcd-2.$CLUSTER_NAME.$BASE_DOMAIN.
master-0 IN A <1st Master IP>
master-1 IN A <2nd Master IP>
master-2 IN A <3rd Master IP>
worker-0 IN A <1st Worker IP>
worker-1 IN A <2nd Worker IP>
bootstrap IN A <Bootstrap IP>
EOF
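Before restarting named, it is worth validating both files with the checker tools that ship with bind:

sudo named-checkconf /etc/named.conf
sudo named-checkzone "$CLUSTER_NAME.$BASE_DOMAIN" /etc/named/zones/cluster-zone.db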

Third, apply a workaround for a known issue where the bind (named) service goes down. Run the commands below:

named_systemd_dir=/usr/lib/systemd/system/named.service.d
sudo mkdir -p $named_systemd_dir
sudo chmod 755 $named_systemd_dir
echo "[Service]" | sudo tee $named_systemd_dir/restart.conf
echo "Restart=always" | sudo tee -a $named_systemd_dir/restart.conf
echo "RestartSec=3" | sudo tee -a $named_systemd_dir/restart.conf

Once the configuration is complete, restart the named service.

sudo systemctl daemon-reload
sudo systemctl restart named

HAProxy Config

We need to provision load balancers for API (6443, 22623) and Ingress (443, 80).

On bastion node, install HAProxy using the command:

sudo yum install -y haproxy

Let’s configure the load balancers by replacing the IPs and running:

cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.orig
cat <<EOF >/etc/haproxy/haproxy.cfg
global
    log 127.0.0.1 local2
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    daemon
    stats socket /var/lib/haproxy/stats
    ssl-default-bind-ciphers PROFILE=SYSTEM
    ssl-default-server-ciphers PROFILE=SYSTEM

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 3000

frontend ocp4-kubernetes-api-server
    mode tcp
    option tcplog
    bind *:6443
    default_backend ocp4-kubernetes-api-server

frontend ocp4-machine-config-server
    mode tcp
    option tcplog
    bind *:22623
    default_backend ocp4-machine-config-server

frontend ocp4-router-http
    mode tcp
    option tcplog
    bind *:80
    default_backend ocp4-router-http

frontend ocp4-router-https
    mode tcp
    option tcplog
    bind *:443
    default_backend ocp4-router-https

backend ocp4-kubernetes-api-server
    mode tcp
    balance source
    server $INFRA_ID-bootstrap <Bootstrap IP>:6443 check
    server $INFRA_ID-master-0 <1st Master IP>:6443 check
    server $INFRA_ID-master-1 <2nd Master IP>:6443 check
    server $INFRA_ID-master-2 <3rd Master IP>:6443 check

backend ocp4-machine-config-server
    mode tcp
    balance source
    server $INFRA_ID-bootstrap <Bootstrap IP>:22623 check
    server $INFRA_ID-master-0 <1st Master IP>:22623 check
    server $INFRA_ID-master-1 <2nd Master IP>:22623 check
    server $INFRA_ID-master-2 <3rd Master IP>:22623 check

backend ocp4-router-http
    mode tcp
    balance source
    server $INFRA_ID-worker-0 <1st Worker IP>:80 check
    server $INFRA_ID-worker-1 <2nd Worker IP>:80 check

backend ocp4-router-https
    mode tcp
    balance source
    server $INFRA_ID-worker-0 <1st Worker IP>:443 check
    server $INFRA_ID-worker-1 <2nd Worker IP>:443 check
EOF

Now, configure the firewall and start the service:

sudo chmod 644 /etc/haproxy/haproxy.cfg
sudo firewall-cmd --permanent --add-service=http --add-service=https
sudo firewall-cmd --permanent --add-port=6443/tcp --add-port=443/tcp --add-port=22623/tcp
sudo firewall-cmd --reload
sudo setsebool -P haproxy_connect_any=1
sudo systemctl enable --now haproxy
sudo systemctl restart haproxy
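If HAProxy fails to start, a configuration check usually points at the offending line:

sudo haproxy -c -f /etc/haproxy/haproxy.cfg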

Cluster Bootstrap

Bootstrap Ignition Shim

The bootstrap ignition file (bootstrap.ign) tends to be too large to pass to the server directly (the user data limit is 64KB). We will create a smaller ignition file that refers to the main ignition file hosted on the HTTP server.

Change to your installer directory and create the bootstrap-ignition.json.

cat <<EOF >bootstrap-ignition.json
{
  "ignition": {
    "config": {
      "append": [
        {
          "source": "http://$BASTION_IP:8080/bootstrap.ign",
          "verification": {}
        }
      ]
    },
    "security": {},
    "timeouts": {},
    "version": "2.2.0"
  },
  "networkd": {},
  "passwd": {},
  "storage": {},
  "systemd": {}
}
EOF
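A quick way to confirm the shim is valid JSON and points at the right URL is to pass it through jq:

jq . bootstrap-ignition.json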

Bootstrap Server

Create the bootstrap server by using the following openstack command:

openstack server create --image $RHCOS_IMAGE_ID --flavor m1.xlarge --user-data "bootstrap-ignition.json" --port "$INFRA_ID-bootstrap-port" "$INFRA_ID-bootstrap"

If you are unable to connect to the bootstrap node, check the server console to see whether it is getting the proper ignition config. Once the server is active, log in to check the bootstrapping progress.

$ ssh core@<bootstrap_ip>
[core@yusupi-n8qvz-bootstrap ~]$ journalctl -b -f -u bootkube.service

Control Plane

There is no need to host or create an ignition shim for master nodes since the ignition file size is small (around 2KB).

Create the master servers by using the following command:

for index in $(seq 0 2); do
openstack server create --image $RHCOS_IMAGE_ID --flavor m1.xlarge --user-data "$INFRA_ID-master-$index-ignition.json" --port "$INFRA_ID-master-port-$index" "$INFRA_ID-master-$index"
done

Note the for loop seq is 0 to 2 (three master nodes in total).

Wait for Bootstrap

Once the master servers are active, they will start running pods, bring up etcd, and join the bootstrap node. Run the following command from the installer directory on the bastion node to monitor this process:

openshift-install wait-for bootstrap-complete

After the above command completes successfully, it prints the message:

INFO It is now safe to remove the bootstrap resources

This means we can remove the bootstrap server and port we created, which are no longer needed for running the cluster. Before deleting them, I suggest you copy the oc client from the bootstrap node to the bastion node. The client can also be downloaded at any time from here (the recommended way).

On the bastion run: scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null core@<bootstrap_ip>:/bin/oc /tmp/oc; sudo mv /tmp/oc /bin/oc

You can now access the cluster from bastion using the oc client:

export KUBECONFIG="$PWD/auth/kubeconfig"
oc get nodes
oc get pods -A

Delete Bootstrap

Run the following OpenStack commands to delete the bootstrap resources:

openstack server delete "$INFRA_ID-bootstrap"
openstack port delete "$INFRA_ID-bootstrap-port"

You may also stop the HTTP service and delete the bootstrap ignition file from the bastion node.

Compute Nodes

Now is the time to add the worker nodes to the cluster. The steps are similar to those for the master nodes, but we need to approve certificate signing requests (CSRs) for the workers to join.

Create the worker servers by using the following command:

for index in $(seq 0 1); do
openstack server create --image $RHCOS_IMAGE_ID --flavor m1.large --user-data "worker.ign" --port "$INFRA_ID-worker-port-$index" "$INFRA_ID-worker-$index"
done

Note the for loop seq is 0 to 1 (two worker nodes in total).

Once the worker nodes are up, you need to approve the worker CSRs. Run the command oc get csr to check for any Pending requests. You can approve a CSR after validating it by using the command: oc adm certificate approve <CSR Name>

I use the following command to approve all CSRs at once (use with care).

oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve

Once each worker’s CSRs are approved, the node will start showing up in the oc get nodes output and move to the Ready state.
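Note that each worker typically generates more than one CSR (a client request followed by a serving request), so you may need to approve requests a couple of times. A rough sketch of a loop that keeps approving pending requests until you interrupt it:

# Re-check and approve pending CSRs every 30 seconds; stop with Ctrl+C once both workers are Ready
while true; do
  oc get csr -o json | jq -r '.items[] | select(.status == {}) | .metadata.name' | xargs --no-run-if-empty oc adm certificate approve
  oc get nodes
  sleep 30
done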

Wait for Install

After pods start running on the worker nodes, you can check the cluster operator progress by running the command oc get co.

Run the following command to verify cluster deployment status from the bastion node:

openshift-install --log-level debug wait-for install-complete

Once all the cluster operators are available, the above command will print the web console URL and the user credentials to log in. For example:

INFO Install complete!
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/root/install/auth/kubeconfig'
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.yusupi.rh.com
INFO Login to the console with user: kubeadmin, password: 2iwLW-Vzwtd-nczVf-chs5T

For access via a web browser, you will need to add the host entries below to the /etc/hosts file on the machine running your browser. (Replace with actual values)

<bastion_ip> api.<Cluster Name>.<Base Domain> console-openshift-console.apps.<Cluster Name>.<Base Domain> integrated-oauth-server-openshift-authentication.apps.<Cluster Name>.<Base Domain> oauth-openshift.apps.<Cluster Name>.<Base Domain> prometheus-k8s-openshift-monitoring.apps.<Cluster Name>.<Base Domain> grafana-openshift-monitoring.apps.<Cluster Name>.<Base Domain> example.apps.<Cluster Name>.<Base Domain>  cp-console.apps.<Cluster Name>.<Base Domain>

Below is a snapshot of the landing page with kubeadmin login:

OpenShift 4.3 UI dashboard

Common Issues

Below are some common issues reported or encountered during the install, along with troubleshooting steps.

Not able to boot the bootstrap server

Troubleshoot: Check the bootstrap console log to see whether it is picking up the proper ignition URL hosted on the bastion node; if not, verify the ignition passed as user-data.

Unable to access or download the bootstrap ignition file hosted on the bastion

Troubleshoot:

  • Double-check whether the bootstrap ignition file can be downloaded from other locations.
  • Check the bootstrap.ign file permission at /var/www/html/.
  • Restart the httpd service and try a remote curl.

Unable to SSH to bootstrap node

Troubleshoot:

  • Check the server status on the PowerVC console or openstack client.
  • Verify if the IP assigned to the server is the same as the bootstrap port fixed-ip.
  • Check DHCP service status, restart if not active.
  • Check DHCP service logs for messages similar to below for successful IP assignment:
dhcpd[9048]: DHCPDISCOVER from fa:16:3e:30:ba:a8 via env32
dhcpd[9048]: DHCPOFFER on 9.9.9.9 to fa:16:3e:30:ba:a8 via env32
dhcpd[9048]: DHCPREQUEST for 9.9.9.9 (9.8.8.8) from fa:16:3e:30:ba:a8 via env32
dhcpd[9048]: DHCPACK on 9.9.9.9 to fa:16:3e:30:ba:a8 via env32

Getting “Permission denied” error when SSH to bootstrap

Troubleshoot:

  • Ensure you are using the proper ssh private key.
  • The public key corresponding to the key-pair you are using should match with what you have provided in install-config.yaml
  • Use command ssh -i <keypath> core@<bootstrap_ip>

The bootstrap log is empty

Troubleshoot: After ssh to the bootstrap node (ssh core@<bootstrap_ip>), run journalctl -b -f -u bootkube.service to get the bootstrap log. If you see no output or the logs are not progressing, check the following:

  1. Are you able to download the release image? Run sudo podman pull <release_image> to check that the image pull does not fail.
  2. Are you able to connect to your private repo or quay.io server? Ping google.com, quay.io or your repo host, it should resolve the address.
  3. Did you provide proper pull-secret in install-config.yaml? If you are using private repo, ensure the token is added to the pull-secret. Also, check to see if the token has expired.
  4. Is the DNS server on the bastion running properly and able to resolve cluster addresses? Log in to the bastion and check the named service status, verify the zone file configuration, and add a DNS forwarder if needed (see the example below).
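For example, you can query the local named directly from the bastion to confirm the cluster records resolve (using the variables exported earlier):

dig +short api.$CLUSTER_NAME.$BASE_DOMAIN @$BASTION_IP
dig +short -t srv _etcd-server-ssl._tcp.$CLUSTER_NAME.$BASE_DOMAIN @$BASTION_IP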

Repeated messages in bootstrap logs

Troubleshoot: Are you seeing “Error: unhealthy cluster” messages in the bootstrap log?

  • Wait for some time. It takes a few minutes for the bootstrap to start all the pods and for all the masters to boot and join the cluster.
  • If you are not able to ssh to the master nodes, check the console logs for any ignition error.
  • Is the bootstrap server working fine? See above.
  • Is the HAProxy service running properly?
  • Login to bastion and check the HAProxy service status.
  • Login to each master node ssh core@<master_ip> to check if all the pods/containers are running.

Install never completes

Troubleshoot:

  • Wait for the prescribed time for the wait-for install-complete command to finish.
  • Once the above command times out, it will display the reason for the failure.
  • You can also check the cluster operators’ status by running the command oc get co.

Cannot login with oc client

Troubleshoot:

  • Ensure that KUBECONFIG env variable is set: export KUBECONFIG="$PWD/auth/kubeconfig"
  • You can also copy the <installer_dir>/auth/kubeconfig file to ~/.kube/config.
  • The kubeadmin password can also be found in the <installer_dir>/auth/kubeadmin-password file.

Cannot access Web UI Console

Troubleshoot:

  • Check to see if HAProxy service is running properly on the bastion node.
  • Ensure you have added the hosts file entries for all routes or using the DNS from the bastion node.
  • Remove any proxy used by your Browser.
  • If you are using host entries and trying to access any app, ensure you add the <app_name>.apps.<Cluster Name>.<Base Domain> entry to your hosts file.

Worker nodes are not displaying in oc get nodes or status is NotReady

Troubleshoot:

  • Check whether all the worker nodes are up and whether you can ssh to them using ssh core@<worker-ip>.
  • Check on the worker nodes if all the pods/containers are running.
  • Verify that there are no Pending certificates to approve using oc get csr

Conclusion

This blog contains my personal experience installing OpenShift 4.3 on PowerVC. The steps have been validated and work for dev/test environments.

That’s it! We have set up an OpenShift cluster with 3 masters and 2 workers, and we know how to access the cluster using both the CLI and the web console.

Bonus

If you have used Terraform, or are even willing to try it, I bet you will love this.

For installing OCP 4.3 on PowerVC we have a Terraform repo hosted at https://github.com/ppc64le/ocp4_upi_powervm. This automates all the steps given above and adds many more features. Explore further by going through the README.
