Kubernetes Mastery : Day 5 : Troubleshooting begins

Prakhar Gandhi
Google Cloud - Community
2 min read · May 6, 2024

In this article we look at three common Kubernetes problems, the steps to troubleshoot each one, and the commands and fixes that resolve them.

Issue 1 : Investigating Pod Startup Failure

Scenario: An application pod is failing to start, causing service downtime.

Troubleshooting Steps:

  • Run ‘kubectl describe pod <pod-name>’ to inspect the pod’s status, conditions, and recent events; the Events section usually names the failing step (scheduling, image pull, or container start).
  • Check container logs with ‘kubectl logs <pod-name>’ for error messages or initialization failures; add ‘--previous’ to read the logs of a container that has already crashed and restarted.
  • Review the logs of the cluster components (‘kubelet’, ‘kube-scheduler’, ‘kube-apiserver’) for scheduling or runtime issues.
  • Validate the pod specification (e.g., image, command, environment variables) of the running pod against the deployment manifest.
  • Query the Kubernetes API for the pod’s full state and associated resources, for example with ‘kubectl get pod <pod-name> -o yaml’ and ‘kubectl get events’.
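
The status checks above can be partly automated. The following is a minimal sketch, assuming the pod has been exported with ‘kubectl get pod <pod-name> -o json’ and loaded into a Python dict. The function name `summarize_pod` and the hint texts are illustrative assumptions, not part of any Kubernetes tooling; the reason strings are the ones Kubernetes reports in a pod’s container statuses.

```python
# Map common container state reasons to a likely cause.
# The hint texts are illustrative; they are not an official list.
HINTS = {
    "ImagePullBackOff": "image name/tag or registry credentials are wrong",
    "ErrImagePull": "image name/tag or registry credentials are wrong",
    "CrashLoopBackOff": "container starts and then exits; check 'kubectl logs --previous'",
    "CreateContainerConfigError": "a referenced ConfigMap or Secret is missing",
    "OOMKilled": "container exceeded its memory limit",
}


def summarize_pod(pod: dict) -> list[str]:
    """Return one finding per scheduling problem or unhealthy container."""
    findings = []
    status = pod.get("status", {})

    # A pod with PodScheduled=False was never placed on a node.
    for cond in status.get("conditions", []):
        if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
            findings.append(f"unschedulable: {cond.get('message', 'no message')}")

    statuses = status.get("initContainerStatuses", []) + status.get("containerStatuses", [])
    for cs in statuses:
        state = cs.get("state", {})
        waiting = state.get("waiting")
        terminated = state.get("terminated")
        if waiting:
            reason = waiting.get("reason", "Unknown")
        elif terminated and terminated.get("exitCode", 0) != 0:
            reason = terminated.get("reason", "Error")
        else:
            continue  # running, or exited cleanly (e.g. a finished init container)
        hint = HINTS.get(reason, "see the Events section of 'kubectl describe pod'")
        findings.append(f"{cs.get('name', '?')}: {reason} ({hint})")
    return findings
```

An empty result means no container is waiting or failed; anything else points at the container and reason to investigate first.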

Resolution:

  • Correct the misconfiguration found during troubleshooting in the deployment manifest and re-apply it.
  • Verify that the pod starts successfully by monitoring its status and logs.
  • Prevent similar issues in the future by adding automated manifest testing and validation to the CI/CD pipeline.
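
As one possible form of the CI/CD validation mentioned above, here is a small sketch that checks a Deployment manifest (already parsed into a dict) for two common mistakes. The function name `validate_deployment` and the choice of checks are assumptions made for illustration; the sketch only handles ‘matchLabels’ selectors and is not a substitute for server-side validation.

```python
def validate_deployment(manifest: dict) -> list[str]:
    """Return a list of problems found in a parsed Deployment manifest."""
    errors = []
    spec = manifest.get("spec", {})
    template = spec.get("template", {})

    # A Deployment's selector must match the labels of its pod template.
    selector = spec.get("selector", {}).get("matchLabels", {})
    labels = template.get("metadata", {}).get("labels", {})
    for key, value in selector.items():
        if labels.get(key) != value:
            errors.append(f"selector {key}={value} does not match the pod template labels")

    containers = template.get("spec", {}).get("containers", [])
    if not containers:
        errors.append("no containers defined")
    for container in containers:
        name = container.get("name", "<unnamed>")
        image = container.get("image", "")
        if not image:
            errors.append(f"container {name} has no image")
        # Look for a tag or digest only after the last '/', so a registry
        # port such as localhost:5000 is not mistaken for a tag.
        elif ":" not in image.rsplit("/", 1)[-1] and "@" not in image:
            errors.append(f"container {name} image '{image}' has no tag and defaults to 'latest'")
    return errors
```

A pipeline step could fail the build whenever the returned list is non-empty.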

Issue 2 : Scaling Application Pods

Scenario: An application experiences increased demand, necessitating the scaling of its pod replicas.

Troubleshooting Steps:

  • Monitor CPU and memory utilization for application pods and cluster nodes using Prometheus and Grafana.
  • Look for autoscaling events and scaling activity in Kubernetes events and logs, for example with ‘kubectl describe hpa’.
  • Evaluate whether the cluster has enough capacity to schedule the additional pods; pods that stay in Pending indicate that it does not.
  • Review the Kubernetes Horizontal Pod Autoscaler (HPA) configuration and metrics to confirm that the scaling criteria and thresholds are correct.
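
When reviewing HPA thresholds it helps to know how the replica count is derived. The Kubernetes documentation gives the rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance (0.1 by default). The sketch below reproduces that calculation; the function name and the default minimum and maximum are illustrative assumptions, and the real controller applies further rules (stabilization windows, unready pods) that are left out here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the replica count the HPA would aim for."""
    ratio = current_metric / target_metric
    # Inside the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # The result is always kept between minReplicas and maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas averaging 90% CPU against a 60% target
print(desired_replicas(4, 90, 60))  # prints 6
```

Running the numbers by hand like this makes it easier to tell whether an unexpected replica count comes from the threshold or from the metric itself.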

Resolution:

  • If manual intervention is required, scale the application pods with ‘kubectl scale’ (for example ‘kubectl scale deployment <name> --replicas=<count>’) or by updating the deployment’s replica count.
  • Configure a Horizontal Pod Autoscaler (HPA) to automatically adjust the number of pod replicas based on predefined metrics and thresholds.
  • Monitor the effectiveness of autoscaling policies and adjust parameters as needed to optimize resource utilization and application performance.
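
To make the HPA configuration step concrete, here is a sketch that builds an ‘autoscaling/v2’ HorizontalPodAutoscaler manifest for CPU utilization and prints it as JSON. The helper name `build_hpa`, the ‘-hpa’ name suffix, and the example values (deployment ‘web’, 2 to 10 replicas, 70% CPU) are assumptions chosen for illustration.

```python
import json


def build_hpa(deployment, namespace, min_replicas, max_replicas, cpu_percent):
    """Build an autoscaling/v2 HPA manifest that targets a Deployment."""
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa", "namespace": namespace},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    # Average CPU utilization across pods, as a percentage
                    # of the containers' CPU requests.
                    "target": {"type": "Utilization", "averageUtilization": cpu_percent},
                },
            }],
        },
    }


print(json.dumps(build_hpa("web", "default", 2, 10, 70), indent=2))
```

The printed JSON can be saved to a file and applied with ‘kubectl apply -f’. Utilization targets are measured against CPU requests, so the target containers need CPU requests set for this metric to work.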

Issue 3 : Debugging Network Connectivity Issues

Scenario: Pods behind a Kubernetes Service are unable to communicate with each other.

Troubleshooting Steps:

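  • Verify pod-to-pod connectivity by using ‘kubectl exec’ to ping or open a connection from one pod to another; use ‘kubectl port-forward’ to check from your workstation whether an individual pod responds on its port.
  • Inspect the Service and its endpoints with ‘kubectl get svc’ and ‘kubectl get endpoints’; a Service with no endpoints has no ready pods matching its selector. Check cluster DNS separately by resolving the Service name from inside a pod.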
  • Review network policies, firewall rules, and ingress/egress configurations affecting pod-to-pod communication.
  • Analyze network traffic and packet captures using tools like Wireshark or tcpdump to diagnose network-level issues.
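
A frequent cause of an empty endpoints list is a Service selector that does not match the pod labels. The sketch below compares the two, assuming the Service and the pods have been exported with ‘kubectl get ... -o json’ and parsed into dicts. The function names are hypothetical, and the check is simplified: it ignores pod readiness and assumes all objects are in the same namespace.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """True if every selector key/value pair is present in the pod labels."""
    # A Service without a selector does not select any pods automatically.
    return bool(selector) and all(labels.get(k) == v for k, v in selector.items())


def pods_backing_service(service: dict, pods: list[dict]) -> list[str]:
    """Return the names of the pods whose labels satisfy the Service selector."""
    selector = service.get("spec", {}).get("selector") or {}
    return [
        pod["metadata"]["name"]
        for pod in pods
        if selector_matches(selector, pod["metadata"].get("labels", {}))
    ]
```

If the returned list is empty while the pods are running, the selector or the labels are the first thing to correct.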

Resolution:

  • Correct misconfigured service definitions, endpoint subsets, or DNS settings to ensure proper service discovery and communication.
  • Update network policies or firewall rules to allow necessary traffic flows between application components.
  • Implement service mesh solutions (e.g., Istio, Linkerd) to enhance network observability, security, and reliability.
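
For the network policy update mentioned above, this sketch generates a ‘networking.k8s.io/v1’ NetworkPolicy that allows ingress to one set of pods from another on a single TCP port. The helper name, the policy name, and the labels in the example (‘app: api’ receiving traffic from ‘app: web’ on port 8080) are assumptions for illustration; the policy only takes effect if the cluster’s network plugin enforces NetworkPolicy.

```python
import json


def allow_ingress_policy(name, namespace, target_labels, source_labels, port):
    """Build a NetworkPolicy allowing TCP ingress from source pods to target pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods this policy applies to (the receivers of the traffic).
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],
            "ingress": [{
                # Pods in the same namespace that are allowed to connect.
                "from": [{"podSelector": {"matchLabels": source_labels}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


policy = allow_ingress_policy("allow-web-to-api", "default",
                              {"app": "api"}, {"app": "web"}, 8080)
print(json.dumps(policy, indent=2))
```

As with any manifest, the JSON output can be reviewed and then applied with ‘kubectl apply -f’.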

Prakhar Gandhi
Google Cloud - Community

Google Developer Educator for Jetpack Compose | Google Cloud Innovator | Geek | Cybersecurity | Code | Strategy