Kubernetes Deployment Failures: kubectl Debugging Commands

How to Debug Kubernetes Deployment Failures Using kubectl

Modern engineering teams rely on Kubernetes to deploy and scale applications efficiently. But when deployments fail, debugging can quickly consume valuable engineering time.

Kubernetes deployment failures often occur due to CrashLoopBackOff, ImagePullBackOff, or misconfigured probes. This SupportSages guide provides essential kubectl troubleshooting commands to quickly debug pods, logs, and cluster issues.

Pro Tip: If your application worked in staging but failed in production, start by checking Secrets, ConfigMaps, and environment-specific configurations. Misconfiguration is one of the most common causes of production deployment failures.

Why Kubernetes Deployments Fail

A Kubernetes deployment can fail due to issues across multiple layers. Identifying the affected layer first helps isolate the root cause much faster.

Layer	Typical Symptom	What to Check
Infrastructure	Pending	Nodes, CPU, memory, scheduling, taints, affinity
Container	ImagePullBackOff	Image tag, registry access, imagePullSecrets, image availability
Application	CrashLoopBackOff	Startup errors, configuration, Secrets, runtime exceptions
Health Check	Not Ready	Readiness probe, liveness probe, startup probe
Networking	Connection Errors	Services, Ingress, DNS, SSL, network policies

Troubleshooting Tip: Always troubleshoot Kubernetes deployments layer by layer. Start with infrastructure, then verify the container, application, health checks, and finally networking. This systematic approach eliminates guesswork and speeds up root cause analysis.

Step 1: The Quick Look

The first command every engineer should run:

kubectl get pods

Check the STATUS column. It often immediately tells you where the issue is.
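In a namespace with many pods, filtering on the STATUS column makes the problem pods stand out. The snippet below is a minimal sketch, assuming the default kubectl get pods column order (NAME, READY, STATUS, RESTARTS, AGE); the helper name unhealthy_pods is hypothetical and not part of kubectl.

```shell
# Print the name and status of pods that are neither Running nor Completed.
# Reads `kubectl get pods --no-headers` output on stdin.
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 "\t" $3 }'
}

# Usage (requires cluster access):
#   kubectl get pods --no-headers | unhealthy_pods
```

A pod can show Running while its READY column is still 0/1, so this filter is a first pass, not a full health check.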

Common Pod Status Errors

Pending

If the pod stays in Pending, Kubernetes cannot schedule it.

Typical Causes

No available nodes
CPU or memory requests too high
Node selector mismatch
Missing tolerations for node taints
Affinity / anti-affinity restrictions

Diagnose

kubectl describe pod <pod-name>

Look for:

0/5 nodes are available: 5 Insufficient memory.
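The describe output is long, so it can help to pull out only the scheduler's messages. This is a rough sketch that assumes the events come from the default scheduler (shown as default-scheduler in the From column); the helper name scheduling_failures is hypothetical.

```shell
# Print the unique scheduler failure messages from `kubectl describe pod` output.
# Assumes the event source is named "default-scheduler".
scheduling_failures() {
  grep 'FailedScheduling' | sed 's/.*default-scheduler[[:space:]]*//' | sort -u
}

# Usage (requires cluster access):
#   kubectl describe pod <pod-name> | scheduling_failures
```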

Fixes

Reduce resource requests
Add nodes or scale the cluster
Correct node selectors
Update tolerations

CrashLoopBackOff

The pod starts, crashes, and Kubernetes keeps restarting it.

Typical Causes

App startup failure
Missing environment variables
Database connection failure
Wrong command or entrypoint
Dependency service unavailable

Diagnose

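kubectl logs <pod-name>

If the container has already restarted, add --previous to see the output of the instance that crashed:

kubectl logs <pod-name> --previous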
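When the logs are empty or inconclusive, the exit code of the last terminated container often points toward one of the causes listed above. The mapping below is a sketch based on conventional shell and signal exit codes; container runtimes may report other values, and the helper name explain_exit_code is hypothetical.

```shell
# Map a container exit code to a likely cause (conventional meanings only).
explain_exit_code() {
  case "$1" in
    0)   echo "exited cleanly; check why the main process ends" ;;
    1)   echo "application error; read the logs" ;;
    126) echo "command found but not executable; check entrypoint permissions" ;;
    127) echo "command not found; check command or entrypoint" ;;
    137) echo "killed with SIGKILL; often OOMKilled" ;;
    143) echo "terminated with SIGTERM" ;;
    *)   echo "unrecognized; read the logs" ;;
  esac
}

# Usage (requires cluster access; reads the first container's last exit code):
#   code=$(kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}')
#   explain_exit_code "$code"
```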

Fixes

Correct startup command
Validate configs and secrets
Check external dependencies
Patch runtime exceptions

ImagePullBackOff

Kubernetes cannot pull the container image.

Typical Causes

Wrong image tag
Private registry authentication issue
Image not pushed
Network restrictions

Diagnose

kubectl describe pod <pod-name>

Look in Events for:

Failed to pull image
403 Forbidden
Image not found

Fixes

Correct image tag
Verify registry credentials
Add imagePullSecrets
Confirm image exists
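The event message usually indicates whether the pull failed on authentication or on a missing image, which decides which of the fixes above applies. The sketch below matches on common wording only; exact messages vary by registry and container runtime, and the helper name classify_pull_error is hypothetical.

```shell
# Classify an image pull failure from event text read on stdin.
# Matching is keyword-based and intentionally loose.
classify_pull_error() (
  events=$(cat)
  if printf '%s\n' "$events" | grep -qiE 'unauthorized|forbidden|denied'; then
    echo "auth: verify registry credentials and imagePullSecrets"
  elif printf '%s\n' "$events" | grep -qiE 'not found|manifest unknown'; then
    echo "missing: verify the image name and tag exist in the registry"
  else
    echo "unknown: check network access to the registry"
  fi
)

# Usage (requires cluster access):
#   kubectl describe pod <pod-name> | grep 'Failed' | classify_pull_error
```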

Step 2: Deep Diagnostic

If pod status alone doesn’t reveal enough, move deeper.

A. Image Issues

Diagnose

kubectl describe pod <pod-name>

Look For

403 Forbidden
The node or workload identity doesn't have permission to pull the container image.

Cloud Provider Checks

Platform	Verify
AWS	IAM Role attached to the worker node or IRSA permissions
Azure	Managed Identity and Azure Container Registry (ACR) permissions
GCP	Workload Identity or node Service Account permissions

B. Resource Exhaustion

Diagnose

kubectl describe pod <pod-name>

Look For

OOMKilled
The container exceeded its configured memory limit.

Recommended Actions

Increase memory requests and limits.
Investigate memory leaks.
Reduce cache usage.
Review application concurrency.
Use realistic resource requests for scheduling.
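To catch containers before they are OOMKilled, current usage can be compared against the configured limit. This is a minimal sketch, assuming metrics-server is installed (kubectl top depends on it) and that memory is reported in Mi; the helper name near_memory_limit and the 90% threshold are illustrative choices.

```shell
# Print pods using at least 90% of a memory limit given in Mi.
# Reads `kubectl top pod --no-headers` output (NAME CPU MEMORY) on stdin.
near_memory_limit() {
  awk -v limit="$1" '{
    mem = $3
    sub(/Mi$/, "", mem)   # assumes the MEMORY column uses the Mi suffix
    if (mem + 0 >= limit * 0.9) print $1 "\t" $3
  }'
}

# Usage (requires cluster access and metrics-server):
#   kubectl top pod --no-headers | near_memory_limit 512
```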

C. Readiness Probe

Example Configuration

readinessProbe:
  httpGet:
    path: /health
    port: 8080

Diagnose

kubectl describe pod <pod-name>

Look For

Readiness probe failed
The application is running but failing health checks, so Kubernetes doesn't send traffic to the pod.

Recommended Actions

Verify the health endpoint.
Increase initialDelaySeconds.
Increase the probe's timeoutSeconds.
Ensure dependent services are available before the probe starts.
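One way to verify the health endpoint is to request it through the API server's pod proxy, which avoids needing curl inside the container. The sketch below only builds the proxy path; it assumes your account is allowed to use the pods/proxy subresource, and the helper name pod_proxy_path is hypothetical.

```shell
# Build the API server proxy path for an HTTP endpoint on a pod.
# Arguments: namespace, pod name, container port, request path (with leading slash).
pod_proxy_path() {
  printf '/api/v1/namespaces/%s/pods/%s:%s/proxy%s\n' "$1" "$2" "$3" "$4"
}

# Usage (requires cluster access), matching the probe configuration above:
#   kubectl get --raw "$(pod_proxy_path default <pod-name> 8080 /health)"
```

If this request succeeds while the probe keeps failing, the timing settings are a more likely culprit than the endpoint itself.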

Deep Diagnostic Tip: Always investigate deployment issues in this order: Image → Resources → Health Checks → Application Logs. This structured workflow quickly eliminates the most common Kubernetes deployment failures before diving into application-level debugging.

Step 3: The Network Wall

If pods are healthy but users still can’t access the app, check networking.

A. Service Check

Run:

kubectl get svc

Then verify selectors:

kubectl describe svc <service-name>

Does the service selector match pod labels?

Example mismatch:

selector:
  app: frontend

But pod label:

labels:
  app: web

The Service will have no endpoints, so requests to it have no pod to reach.

Fix:

Align labels and selectors.
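A selector mismatch shows up as an empty endpoint list, which is quick to check across a namespace. This is a minimal sketch, assuming the default kubectl get endpoints column order (NAME, ENDPOINTS, AGE); the helper name services_without_endpoints is hypothetical.

```shell
# Print Services whose endpoint list is empty.
# Reads `kubectl get endpoints --no-headers` output on stdin.
services_without_endpoints() {
  awk '$2 == "<none>" { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get endpoints --no-headers | services_without_endpoints
# To see which pods a selector would match:
#   kubectl get pods -l app=frontend
```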

B. Ingress Check

If the service works internally but not externally:

kubectl get ingress
kubectl describe ingress <ingress-name>

Check for:

Invalid TLS certificate
Wrong backend service
Host mismatch
Ingress controller errors
502 / 503 upstream failures
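Upstream failures are easier to judge when counted per status code in the ingress controller's access log. The sketch below assumes an nginx-style log line in which the status code follows the quoted request; other controllers and custom log formats will need a different parser. The helper name count_5xx, and the namespace and deployment names in the usage line, are assumptions.

```shell
# Count 5xx responses per status code in nginx-style access logs read on stdin.
# Expected line shape: 1.2.3.4 - - [date] "GET / HTTP/1.1" 502 150 ...
count_5xx() {
  awk -F'"' '{
    split($3, f, " ")                        # text after the quoted request
    if (f[1] ~ /^5[0-9][0-9]$/) n[f[1]]++    # first token is the status code
  }
  END { for (c in n) print c, n[c] }' | sort
}

# Usage (requires cluster access; names depend on how the controller was installed):
#   kubectl logs -n ingress-nginx deploy/ingress-nginx-controller | count_5xx
```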

Recommended Troubleshooting Workflow

Use this sequence every time:

1. kubectl get pods
2. kubectl describe pod <pod-name>
3. kubectl logs <pod-name>
4. Check resources
5. Check probes
6. Check service selectors
7. Check ingress/controller logs
8. Compare prod vs staging configs

Production Best Practices to Prevent Deployment Failures

Use GitOps: Track every config change in version control.
Standardize Health Checks: Use common readiness/liveness patterns across services.
Validate Resources: Use requests/limits baselines for each service type.
Use Secrets Management: Avoid manual secret injection drift between environments.
Add Alerting for:
1. CrashLoopBackOff
2. Pending pods
3. OOMKilled containers
4. 5xx ingress spikes

Example Real-World Scenario

Problem

Deployment completed successfully, but the website remained inaccessible.

Root Cause

All Pods were running and healthy, but the Service selected none of them: its selector labels didn't match the Pod labels, so it had no endpoints to route traffic to.

Resolution

The Service selector was updated to match the Pod labels:

selector:
  app: web

Final Thoughts

Most Kubernetes deployment failures are not random; they follow predictable patterns. A structured troubleshooting flow helps engineers reduce downtime, avoid guesswork, and restore services faster.

Instead of manually chasing symptoms, inspect:

Pod State
Events
Logs
Resources
Probes
Networking

When teams use a repeatable diagnostic process, Kubernetes becomes easier to operate at scale.
