CP4D-WKC (3.5.6) Installation Issue: 0072-iis Module Stuck Leading to DeadlineExceeded

Introduction

IBM® Cloud Pak for Data (CP4D) is a cloud-native solution that enables you to put your data to work quickly and efficiently. It does so by enabling you to connect to your data, govern it, find it, and use it for analysis. Cloud Pak for Data also enables all your data users to collaborate from a single, unified interface that supports many services that are designed to work together.

Watson Knowledge Catalog (WKC) is one of the components in CP4D that provides a secure enterprise Catalog management platform that is supported by a data governance framework. A Catalog connects people to the data and knowledge that they need. The data governance framework ensures that data access and data quality are compliant with your business rules and standards.

This blog describes an issue with the 0072-iis module that surfaced while installing Watson Knowledge Catalog on Cloud Pak for Data 3.5.6 (CP4D 3.5.6 WKC) for a customer, along with the workaround that was needed to get past it. The environment specification is given in the Environment section below.

The blog aims to serve as a prescriptive guide in case a similar issue occurs in a comparable environment.

The Environment

Platform

Red Hat OpenShift Container Platform (version 4.6.39)

Cluster

  • 3 master nodes, each with 8 CPU cores and 32 GB RAM
  • 6 worker nodes, each with 16 CPU cores and 64 GB RAM

Virtualization

VMware vSphere (Client v7.0.2)

Cloud Pak

Cloud Pak for Data, version 3.5.6

NFS

Nutanix Files on CentOS 7

Ownership

Exports Configuration

OpenShift Project

Name: cpd35

ID Ranges

0072-iis Post Installation Job Issue

Symptoms

The installation of the 0072-iis module never completed; the installer kept waiting on it indefinitely.

Cause

One of the module's jobs had been pending for more than 3 hours, as shown in the log snippet below.

time="2021-08-28T09:27:36Z" level=info msg="0072-iis Resource Status: Job: 3/4 - Pending: [iis-post-delete-job]"

oc describe job iis-post-delete-job

    Volumes:
      iis-post-delete-scripts:
        Type:      ConfigMap (a volume populated by a ConfigMap)
        Name:      iis-post-delete-config
        Optional:  false
    Events:
      Type     Reason            Age                  From            Message
      Normal   SuccessfulDelete  177m                 job-controller  Deleted pod: iis-post-delete-job-lb8j6
      Warning  DeadlineExceeded  177m (x2 over 177m)  job-controller  Job was active longer than specified deadline
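The same information can be read from the job object directly rather than from the event list. The following is a minimal sketch and was not part of the original troubleshooting session: the helper name job_deadline_info is hypothetical, and the explicit -n flag assumes the job lives in the cpd35 project. It prints the job's configured deadline (spec.activeDeadlineSeconds) followed by the reason recorded in its status conditions.

```shell
# Hypothetical helper: print a job's deadline (in seconds) and the
# reason(s) recorded in its status conditions, separated by a tab.
# $1 = job name, $2 = project (assumed to default to cpd35).
job_deadline_info() {
  oc get job "$1" -n "${2:-cpd35}" \
    -o jsonpath='{.spec.activeDeadlineSeconds}{"\t"}{.status.conditions[*].reason}{"\n"}'
}

# Example call:
#   job_deadline_info iis-post-delete-job cpd35
```

A reason of DeadlineExceeded here matches the warning event above and indicates that the job ran out of time rather than failing inside its own script.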

Further investigation showed that two of the PVCs associated with this module (0072-iis-en-dedicated-pvc and iis-secrets-pv) were stuck in the Terminating state and were never deleted. This caused the job to get stuck.

Note: The screen capture shows the status as Bound; however, it was Terminating when the issue took place.
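A quick way to spot PVCs in this state is to filter the STATUS column of oc get pvc and then look at what is holding each claim. The sketch below is an illustration under assumptions, not the exact commands used during the investigation: both function names are hypothetical, and the project is assumed to be cpd35. PVCs normally carry the kubernetes.io/pvc-protection finalizer, which holds back deletion while the claim is still considered in use, so the finalizer list is the first thing worth checking.

```shell
# Hypothetical helper: print the names of PVCs whose STATUS column
# reads "Terminating". $1 = project (assumed to default to cpd35).
list_terminating_pvcs() {
  oc get pvc -n "${1:-cpd35}" --no-headers |
    awk '$2 == "Terminating" { print $1 }'
}

# Hypothetical helper: print a PVC's name, deletion timestamp and
# finalizers on one tab-separated line. $1 = PVC name, $2 = project.
show_pvc_finalizers() {
  oc get pvc "$1" -n "${2:-cpd35}" \
    -o jsonpath='{.metadata.name}{"\t"}{.metadata.deletionTimestamp}{"\t"}{.metadata.finalizers}{"\n"}'
}

# Example calls:
#   list_terminating_pvcs cpd35
#   show_pvc_finalizers iis-secrets-pv cpd35
```

A non-empty deletion timestamp together with a non-empty finalizer list means the delete request was accepted but is still being held back.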

Workaround

The workaround was to patch the PVCs in question to remove their finalizers, as follows:

oc patch pvc 0072-iis-en-dedicated-pvc -p '{"metadata":{"finalizers":null}}'

oc patch pvc iis-secrets-pv -p '{"metadata":{"finalizers":null}}'
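When more than one PVC is affected, the same patch can be applied in a loop. The following is a minimal sketch under assumptions: the function name remove_pvc_finalizers is hypothetical, and the -n and --type=merge flags are additions that were not in the commands used above (the project is assumed to be cpd35).

```shell
# Hypothetical helper: clear metadata.finalizers on each named PVC so
# that its pending deletion can complete.
# $1 = project, remaining arguments = PVC names.
remove_pvc_finalizers() {
  project="$1"
  shift
  for pvc in "$@"; do
    oc patch pvc "$pvc" -n "$project" --type=merge \
      -p '{"metadata":{"finalizers":null}}' || return 1
  done
}

# Example call:
#   remove_pvc_finalizers cpd35 0072-iis-en-dedicated-pvc iis-secrets-pv
```

Because clearing finalizers bypasses the in-use protection on the claim, it is worth confirming first that no running pod still mounts the PVC. Once the patch is applied, the PVC should disappear from oc get pvc.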
