CP4D-WKC (3.5.6) Installation Issue: Solr Pods Stuck in PodInitializing State

Introduction

IBM® Cloud Pak for Data (CP4D) is a cloud-native solution that enables you to put your data to work quickly and efficiently. It does this by letting you connect to your data, govern it, find it, and use it for analysis. Cloud Pak for Data also lets all your data users collaborate from a single, unified interface that supports many services designed to work together.

Watson Knowledge Catalog (WKC) is one of the components in CP4D that provides a secure enterprise Catalog management platform that is supported by a data governance framework. A Catalog connects people to the data and knowledge that they need. The data governance framework ensures that data access and data quality are compliant with your business rules and standards.

This blog outlines an issue related to the Zookeeper component that surfaced while installing CP4D 3.5.6 WKC (Cloud Pak for Data, Watson Knowledge Catalog) for a customer, and the workaround required to get past it. The environment specification is given in the Environment section below.

The blog aims to provide a prescriptive guide to follow if a similar issue is encountered in a comparable environment.

The Environment

Platform

Red Hat OpenShift Container Platform (version 4.6.39)

Cluster

  • 3 master nodes each having 8 CPU cores, 32 GB RAM
  • 6 worker nodes each having 16 CPU Cores, 64 GB RAM

Virtualization

VMware vSphere (Client v7.0.2)

Cloud Pak

Cloud Pak for Data — Version 3.5.6

NFS

Nutanix Files on CentOS 7

Ownership

Exports Configuration

OpenShift Project

Name: cpd35

ID Ranges

Solr Pod Issue

Symptoms

The Solr pod remained in the PodInitializing state because its init container was not completing successfully. A similar issue was seen with the “Sample Data Job” pods, and the same workaround applies to them.

Cause

The init container (ug-config) copies data from a temporary location to a mounted location with the preserve-permissions option (p), as shown below:

cp -rpf /tmp/configsets /solrconfig;

The files were copied to the target location; however, the copy was not allowed to set permissions on them. As a result, the operation failed, and the init container failed with it.
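To make the role of the p option concrete, here is a minimal sketch that runs both forms of the copy against throwaway temporary directories. The directory names and the demo file are assumptions made for illustration; they are not the real /tmp/configsets or /solrconfig mounts, and on an ordinary local filesystem both copies succeed. The point is only to show which attributes -p tries to carry over, which is the step a restrictive NFS export can refuse.

```shell
#!/usr/bin/env bash
# Sketch only: shows what cp's -p flag changes, using temp dirs (hypothetical paths).
set -eu

work=$(mktemp -d)
mkdir -p "$work/configsets" "$work/with_p" "$work/without_p"
echo "demo" > "$work/configsets/solrconfig.xml"

# Give the source file an old timestamp so that preservation is visible.
touch -t 200101010000 "$work/configsets/solrconfig.xml"
src="$work/configsets/solrconfig.xml"

# -p asks cp to carry mode, ownership and timestamps over to the target.
cp -rpf "$work/configsets" "$work/with_p"

# Without -p the copy gets the invoking user's ownership and a fresh timestamp.
cp -rf "$work/configsets" "$work/without_p"

# A copy that is newer than the source did not keep the source timestamp.
if [ "$work/with_p/configsets/solrconfig.xml" -nt "$src" ]; then
  preserved_with_p=no
else
  preserved_with_p=yes
fi

if [ "$work/without_p/configsets/solrconfig.xml" -nt "$src" ]; then
  preserved_without_p=no
else
  preserved_without_p=yes
fi

echo "timestamp preserved with -p:    $preserved_with_p"
echo "timestamp preserved without -p: $preserved_without_p"

rm -rf "$work"
```

When the target filesystem rejects the attribute changes that -p requests, cp reports an error and exits non-zero even though the file contents have been written, which matches the behaviour described above.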

Here is the output of the oc describe pod solr-0 command.
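For readers who want to collect the same information on their own cluster, the following is a small sketch of the kind of commands involved. The helper name inspect_init_container is hypothetical. The namespace, pod and container names (cpd35, solr-0, ug-config) come from this post and will differ elsewhere. It assumes you are already logged in with oc.

```shell
# Hypothetical helper: gather status, events and init container logs for one pod.
inspect_init_container() {
  local ns="$1" pod="$2" container="$3"

  # Overall pod status (for example Init:0/1 or PodInitializing).
  oc get pod "$pod" -n "$ns"

  # Events and per-container state, including init container exit codes.
  oc describe pod "$pod" -n "$ns"

  # Output of the init container itself, where the cp error would appear.
  oc logs "$pod" -c "$container" -n "$ns"
}

# Usage, with the names from this post:
#   inspect_init_container cpd35 solr-0 ug-config
```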

Workaround

It is a hack!

1. Untarred the chart bundle (/home/watson/cpd/cpd-cli-workspace/modules/0072-iis/x86_64/13.5.1082/iis-3.5.1082.tgz).

2. Located the solr template and modified it to remove the p option from the init container’s script, so that the line

cp -rpf /tmp/configsets /solrconfig;

became

cp -rf /tmp/configsets /solrconfig;

3. Tarred the charts again.

4. Deleted the module 0072-iis from within the operator pod.

oc rsh

helm delete --purge 0072-iis --tls

5. Reinstalled the module.

./cpd-cli adm --repo ./repo.yaml --assembly wkc --namespace cpd35 --apply --accept-all-licenses

./cpd-cli install --repo ./repo.yaml --assembly wkc --namespace cpd35 --storageclass managed-nfs-storage --transfer-image-to image-registry-openshift-image-registry.apps.ibmcpd.lifeinsjv.com/cpd35 --cluster-pull-prefix image-registry.openshift-image-registry.svc:5000/cpd35 --target-registry-password $(oc whoami -t) --target-registry-username kubeadmin --latest-dependency --verbose --accept-all-licenses --insecure-skip-tls-verify

This did the trick.
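Steps 1 to 3 can be scripted roughly as follows. This is a sketch under several assumptions, not the exact procedure that was run: patch_chart is a hypothetical helper, the template is found by searching for the cp command rather than by a known file name, the archive is assumed to contain a single top-level chart directory, and sed -i is used in its GNU form. It keeps a .orig backup of the bundle before overwriting it.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-3. patch_chart is a hypothetical helper, not part of cpd-cli.
set -eu

patch_chart() {
  local chart_tgz="$1"
  local work top
  work=$(mktemp -d)

  # Keep a backup of the original bundle before touching it.
  cp "$chart_tgz" "$chart_tgz.orig"

  # Step 1: unpack the chart.
  tar -xzf "$chart_tgz" -C "$work"

  # Step 2: find every file that contains the copy command and drop the p option.
  grep -rlF 'cp -rpf /tmp/configsets' "$work" | while IFS= read -r f; do
    sed -i 's|cp -rpf /tmp/configsets|cp -rf /tmp/configsets|' "$f"
  done

  # Step 3: repack, keeping the single top-level chart directory (assumption).
  top=$(ls "$work")
  tar -czf "$chart_tgz" -C "$work" "$top"

  rm -rf "$work"
}

# Usage, with the path from step 1:
#   patch_chart /home/watson/cpd/cpd-cli-workspace/modules/0072-iis/x86_64/13.5.1082/iis-3.5.1082.tgz
```

Searching for the command text avoids hard-coding a template file name, at the cost of also rewriting any other file in the chart that happens to contain the same line.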
