Kubernetes Mastery: Day 3 (Conglomerate Issues and Probable Solutions)

Prakhar Gandhi
Google Cloud - Community
3 min read · Apr 29, 2024

In this article, we will work through the issues we might face due to various kinds of failures. Rather than treating it like a forum thread, the idea is to pin down the type of issue first and then hit the bullseye with a targeted fix.

Pod Scheduling Issues:
Scenario: Some pods are failing to schedule due to resource constraints and affinity/anti-affinity rules.

Probable Solutions:

  1. Set up a Kubernetes cluster with multiple nodes using Minikube or kind.
  2. Deploy a sample application with resource requests and limits specified.
  3. Introduce resource constraints by saturating node resources or setting tight resource limits.
  4. Modify pod specifications to include affinity/anti-affinity rules and observe their impact on pod scheduling (a sample spec follows this list).
  5. Use kubectl describe commands to inspect pod scheduling failures and diagnose the root cause.
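As a rough sketch of steps 2 and 4 (the pod name, image, and the disktype label are placeholders, not from the original scenario), the spec below combines resource requests/limits with a required node-affinity rule. If no node carries the matching label or has enough free CPU, the pod stays Pending:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app                 # placeholder name
spec:
  containers:
    - name: web
      image: nginx:1.25
      resources:
        requests:
          cpu: "500m"            # scheduler must find a node with this much free CPU
          memory: "256Mi"
        limits:
          cpu: "1"
          memory: "512Mi"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype    # assumed node label; pod is unschedulable if no node has it
                operator: In
                values: ["ssd"]
```

Running kubectl describe pod demo-app then surfaces a FailedScheduling event naming the blocking predicate (Insufficient cpu, node affinity mismatch, and so on).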

Networking Problems:
Scenario: Pods are experiencing network connectivity issues, preventing communication between services.

Probable Solutions:

  1. Deploy a multi-tier application with frontend, backend, and database services.
  2. Introduce network policies to restrict traffic between certain pods or namespaces (a NetworkPolicy sketch follows this list).
  3. Use kubectl exec to troubleshoot DNS resolution and verify connectivity between pods.
  4. Deploy an ingress controller and create ingress resources to expose services externally.
  5. Use network troubleshooting tools like traceroute or tcpdump to diagnose network connectivity issues.
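As an illustrative sketch for step 2 (namespace and label names are assumptions), the NetworkPolicy below lets the backend pods accept ingress only from pods labelled app=frontend; traffic from anywhere else simply times out:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
  namespace: demo                # placeholder namespace
spec:
  podSelector:
    matchLabels:
      app: backend               # the pods this policy protects
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend      # only frontend pods may connect
```

Step 3 can then be exercised with kubectl exec, assuming the container image ships the usual tools:

```sh
# Check DNS resolution and reachability from a frontend pod (names are placeholders)
kubectl exec -it deploy/frontend -n demo -- nslookup backend
kubectl exec -it deploy/frontend -n demo -- wget -qO- --timeout=2 http://backend:8080/
```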

Resource Exhaustion:

Scenario: Nodes are running out of CPU or memory resources, impacting pod performance.

Probable Solutions:

  1. Deploy a workload with excessive resource requests to simulate resource exhaustion.
  2. Monitor node resource utilization using kubectl top and the node metrics API (example commands follow this list).
  3. Identify pods consuming excessive resources and consider strategies to optimize resource usage (e.g., vertical or horizontal scaling, resource quotas).
  4. Use kubectl drain to evict pods from a node for maintenance or resource rebalancing.
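A few representative commands for steps 2 and 4 (the node name is a placeholder, and kubectl top only works once the metrics-server add-on is installed):

```sh
# Inspect current node and pod resource usage (requires metrics-server)
kubectl top nodes
kubectl top pods --all-namespaces --sort-by=memory

# See how much is already requested/limited on a given node
kubectl describe node worker-1 | grep -A 8 "Allocated resources"

# Evict pods from the node for maintenance or rebalancing, then re-enable scheduling
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data
kubectl uncordon worker-1
```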

Cluster Congestion:

Scenario: High API server load or pod churn is causing cluster congestion and performance degradation.

Probable Solutions:

  1. Generate a high volume of API requests using kubectl or custom scripts to simulate API server load (a rough sketch follows this list).
  2. Monitor API server performance metrics and identify bottlenecks using tools like kube-state-metrics.
  3. Analyze pod creation and deletion rates using Kubernetes events and audit logs.
  4. Implement strategies to mitigate cluster congestion, such as optimizing resource requests/limits, scaling out control plane components, or offloading workloads to separate namespaces.
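A rough sketch of steps 1 and 2 (the request count is arbitrary, and reading /metrics requires suitable RBAC); the metric names are the API server's standard Prometheus metrics:

```sh
# Crude load generator: fire many list requests at the API server in parallel
for i in $(seq 1 500); do
  kubectl get pods --all-namespaces > /dev/null &
done
wait

# Sample raw API server metrics for request latency and in-flight requests
kubectl get --raw /metrics | grep -E "apiserver_request_duration_seconds_count|apiserver_current_inflight_requests" | head

# Review recent events to gauge pod churn
kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp | tail -n 20
```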

Configuration Errors:

Scenario: Pods are failing to start due to misconfigurations in pod specifications or service definitions.

Probable Solutions:

  1. Introduce syntax errors or invalid configurations in pod specifications and service definitions.
  2. Use kubectl apply to deploy misconfigured resources and observe the resulting errors.
  3. Validate YAML files using linting tools such as kubeval or yamllint to catch common configuration errors (example commands follow this list).
  4. Use kubectl logs and describe commands to inspect pod initialization failures and diagnose configuration issues.
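For steps 2 to 4, a handful of commands worth keeping at hand (file and pod names are placeholders); a server-side dry run asks the API server to validate the manifest without persisting anything:

```sh
# Catch YAML syntax problems before deploying
yamllint deployment.yaml

# Validate against the API server without creating anything
kubectl apply -f deployment.yaml --dry-run=server

# If the pod was created but will not start, check events and previous logs
kubectl describe pod my-app-7d4b9c-xyz        # placeholder pod name
kubectl logs my-app-7d4b9c-xyz --previous
```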

Rolling Update Failure:

Scenario: A rolling update of a deployment is failing, causing application downtime.
Probable Solutions:

  1. Deploy a sample application with a rolling update strategy configured.
  2. Introduce a bug or compatibility issue in the updated container image.
  3. Monitor the deployment rollout using kubectl rollout status and observe any failures.
  4. Rollback the deployment using kubectl rollout undo in case of issues and investigate the root cause of the failure.
  5. Implement strategies for handling rolling updates, such as readiness probes, health checks, or canary deployments (a readiness-probe sketch follows this list).
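The commands below cover steps 3 and 4, and the snippet sketches a readiness probe for step 5 (the deployment name, port, and path are assumptions):

```sh
kubectl rollout status deployment/web      # waits until the rollout succeeds or fails
kubectl rollout history deployment/web     # list previous revisions
kubectl rollout undo deployment/web        # revert to the last working revision
```

```yaml
# Readiness probe sketch: a new pod only receives traffic once /healthz answers
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
```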

Service Discovery Issues:

Scenario: Pods are unable to discover and connect to backend services due to DNS resolution problems.
Probable Solutions:

  1. Deploy a frontend application that relies on DNS-based service discovery to communicate with backend services.
  2. Introduce DNS misconfigurations or network issues to disrupt service discovery.
  3. Use nslookup or dig commands from within pods to troubleshoot DNS resolution (example commands follow this list).
  4. Verify service endpoints and DNS records using kubectl get services and describe commands.
  5. Implement solutions for DNS-related issues, such as configuring CoreDNS or using alternative service discovery mechanisms.
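For steps 3 and 4, the following commands (service and namespace names are placeholders; the CoreDNS pods carry the standard k8s-app=kube-dns label) check what a pod actually resolves and whether the Service has healthy endpoints behind it:

```sh
# Resolve the backend Service from inside a frontend pod (assumes nslookup exists in the image)
kubectl exec -it deploy/frontend -- nslookup backend.demo.svc.cluster.local

# Confirm the Service exists and has endpoints
kubectl get svc backend -n demo
kubectl get endpoints backend -n demo

# Check that the cluster DNS (CoreDNS) pods are running
kubectl get pods -n kube-system -l k8s-app=kube-dns
```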

Persistent Volume Mount Failure:

Scenario: Pods are failing to mount persistent volumes, resulting in data loss or application errors.
Probable Solutions:

  1. Deploy an application that requires persistent storage using PersistentVolumeClaims (PVCs); a minimal PVC and pod sketch follows this list.
  2. Introduce misconfigurations or permission issues in PV/PVC definitions.
  3. Monitor pod events and describe pod output to diagnose volume mount failures.
  4. Inspect storage class configurations and provisioner status to troubleshoot provisioning issues.
  5. Implement solutions for persistent volume mount failures, such as fixing permissions, provisioning storage, or troubleshooting storage plugins.
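A minimal sketch of step 1 (claim name, storage class, and size are placeholders): a PVC plus a pod that mounts it. If the claim stays Pending or the mount fails, kubectl describe on both the pod and the PVC usually names the provisioning or permission error:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim               # placeholder claim name
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard     # assumed class; check with `kubectl get storageclass`
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: data-app
spec:
  containers:
    - name: app
      image: nginx:1.25
      volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-claim
```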

With these exercises in hand, we can trace a problem back to its root cause, identify the offending component, and resolve the issue much faster.
