Questions
Pods are stuck in the Pending state, containers keep getting OOMKilled, and nodes are hitting memory pressure. Fix this GKE cluster.
The Scenario
Your production GKE cluster is experiencing issues. Some pods are stuck in Pending:
$ kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
api-server-7d9f8c-abc   0/1     Pending   0          45m
api-server-7d9f8c-def   1/1     Running   0          2h
worker-5c6b7d-ghi       0/1     Pending   0          30m
$ kubectl describe pod api-server-7d9f8c-abc
Events:
Warning FailedScheduling 2m default-scheduler
0/5 nodes are available: 2 Insufficient cpu, 3 Insufficient memory,
5 node(s) had taint {node.kubernetes.io/memory-pressure: NoSchedule}
Meanwhile, pods are being evicted and nodes are becoming unhealthy under memory pressure:
$ kubectl get nodes
NAME                                  STATUS                     ROLES    AGE
gke-cluster-default-pool-abc123-def   Ready,SchedulingDisabled   <none>   5h
gke-cluster-default-pool-abc123-ghi   NotReady                   <none>   3h
The Challenge
Diagnose the root cause of pod scheduling failures and node instability. Implement fixes for resource management, autoscaling, and node pool configuration.
A junior engineer might manually delete pending pods, increase node size without understanding the cause, disable resource limits entirely, or restart nodes. These approaches mask symptoms without fixing root causes and often make things worse.
A senior engineer investigates systematically (resource requests vs. actual usage, node allocatable resources, the causes of memory pressure, autoscaler configuration) and then implements proper resource management with requests/limits, PodDisruptionBudgets, and priority classes.
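Those fixes pair naturally with a PodDisruptionBudget, so node drains, upgrades, and autoscaler scale-downs never take out too many replicas at once. A minimal sketch, assuming the api-server pods carry an app: api-server label (as in the Deployment shown in Step 4):
# Sketch: PodDisruptionBudget for api-server
# (assumes pods labeled app: api-server and 3 replicas, per Step 4)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
spec:
  minAvailable: 2        # with 3 replicas, voluntary disruptions leave at least 2 running
  selector:
    matchLabels:
      app: api-server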
Step 1: Understand Why Pods Are Pending
# Check why pods can't be scheduled
kubectl describe pod api-server-7d9f8c-abc | grep -A 20 Events
# Check node resources
kubectl describe nodes | grep -A 10 "Allocated resources"
# Example output:
# Allocated resources:
#   (Total limits may be over 100 percent, i.e., overcommitted.)
#   Resource   Requests      Limits
#   --------   --------      ------
#   cpu        3800m (95%)   8000m (200%)
#   memory     14Gi (95%)    20Gi (133%)
# See what's consuming resources
kubectl top nodes
kubectl top pods --all-namespaces --sort-by=memory
Step 2: Check Node Memory Pressure
# Check node conditions
kubectl get nodes -o custom-columns=\
'NAME:.metadata.name,'\
'MEMORY_PRESSURE:.status.conditions[?(@.type=="MemoryPressure")].status,'\
'DISK_PRESSURE:.status.conditions[?(@.type=="DiskPressure")].status'
# Check kubelet logs for evictions
gcloud logging read \
'resource.type="k8s_node" AND
textPayload:"eviction"' \
--limit=50 \
--format="table(timestamp, textPayload)"
# Check what's being evicted
kubectl get events --sort-by='.lastTimestamp' | grep -i evict
Step 3: Analyze Pod Resource Configuration
# Find pods without resource limits (dangerous!)
kubectl get pods --all-namespaces -o json | \
jq -r '.items[] | select(.spec.containers[].resources.limits == null) |
"\(.metadata.namespace)/\(.metadata.name)"'
# Check current resource configuration
kubectl get pod api-server-7d9f8c-def -o yaml | \
yq '.spec.containers[].resources'
# Typical problematic config:
# resources:
#   requests:
#     memory: "256Mi"   # Too low a request
#     cpu: "100m"
#   limits:
#     memory: "8Gi"     # 32x the request - causes overcommit!
#     cpu: "2000m"
Step 4: Fix Resource Requests and Limits
# Proper resource configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
      - name: api
        image: api-server:v1          # placeholder; use your real image
        resources:
          requests:
            memory: "512Mi"           # Based on actual P95 usage
            cpu: "250m"
          limits:
            memory: "1Gi"             # 2x request is reasonable
            cpu: "500m"               # Limit CPU bursting
        # Probes: restart unhealthy pods, send traffic only when ready
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
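The 512Mi request above should come from observed usage, not guesswork. One hedged way to get usage-based numbers is a VerticalPodAutoscaler in recommendation-only mode; this sketch assumes vertical Pod autoscaling is enabled on the GKE cluster (or the open-source VPA is installed) and targets the api-server Deployment above:
# Sketch: VPA in recommendation-only mode - it reports suggested
# requests without ever evicting or resizing pods
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"    # recommendations only, no automatic changes
Afterwards, kubectl describe vpa api-server-vpa should list target and upper-bound recommendations you can fold back into the Deployment's requests.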
Step 5: Configure Cluster Autoscaler
# Check current autoscaler status
gcloud container clusters describe my-cluster \
--zone=us-central1-a \
--format="yaml(autoscaling)"
# Enable cluster autoscaler with proper limits
gcloud container clusters update my-cluster \
--zone=us-central1-a \
--enable-autoscaling \
--min-nodes=2 \
--max-nodes=20 \
--node-pool=default-pool
# Check autoscaler events
kubectl get events -n kube-system | grep cluster-autoscaler
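Scale-up is only half the story; nodes often fail to scale back down because pods block node drain. For workloads that are genuinely disposable, one option is the cluster autoscaler's safe-to-evict annotation, sketched here on a hypothetical batch worker (the container name and resource values are illustrative):
# Sketch: allow the cluster autoscaler to evict this pod during
# scale-down (pod-template excerpt; "batch-worker" is illustrative)
spec:
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      containers:
      - name: batch-worker
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"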
Step 6: Create Properly Sized Node Pools
# Terraform: Create node pools with appropriate sizing
resource "google_container_node_pool" "primary" {
  name     = "primary-pool"
  cluster  = google_container_cluster.main.name
  location = "us-central1"

  # Autoscaling configuration
  autoscaling {
    min_node_count = 2
    max_node_count = 20
  }

  node_config {
    machine_type = "e2-standard-4" # 4 vCPU, 16GB RAM
    # GKE reserves resources for system daemons:
    # Allocatable = Total - Reserved
    # e2-standard-4: ~3.5 CPU, ~14GB allocatable

    labels = {
      workload = "general"
    }

    # Only pods that tolerate this taint can schedule on this pool
    taint {
      key    = "dedicated"
      value  = "general"
      effect = "NO_SCHEDULE"
    }

    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }
}

# Separate pool for memory-intensive workloads
resource "google_container_node_pool" "memory_optimized" {
  name     = "memory-pool"
  cluster  = google_container_cluster.main.name
  location = "us-central1"

  autoscaling {
    min_node_count = 0
    max_node_count = 10
  }

  node_config {
    machine_type = "n2-highmem-4" # 4 vCPU, 32GB RAM

    labels = {
      workload = "memory-intensive"
    }

    # Pods need a matching toleration and nodeSelector to land here
    # (see the example after this block)
    taint {
      key    = "workload"
      value  = "memory-intensive"
      effect = "NO_SCHEDULE"
    }
  }
}
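For a workload to land on the tainted memory-pool, it must both tolerate the taint and select the pool's label. A pod-template sketch (the worker container and its request sizes are illustrative; the label and taint values mirror the Terraform above; note that Kubernetes spells the effect NoSchedule, not NO_SCHEDULE):
# Sketch: run a memory-hungry workload on the memory-pool nodes
# (pod-template excerpt; values mirror the Terraform labels/taints above)
spec:
  template:
    spec:
      nodeSelector:
        workload: memory-intensive      # matches the pool's node label
      tolerations:
      - key: "workload"
        operator: "Equal"
        value: "memory-intensive"
        effect: "NoSchedule"            # Kubernetes spelling of NO_SCHEDULE
      containers:
      - name: worker
        resources:
          requests:
            memory: "8Gi"               # illustrative; sized for n2-highmem-4
            cpu: "1"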
Step 7: Implement Resource Quotas and Limit Ranges
# Namespace resource quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: "40Gi"
    limits.cpu: "40"
    limits.memory: "80Gi"
    pods: "50"
---
# Default limits for pods without explicit resources
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - default:
      memory: "512Mi"
      cpu: "500m"
    defaultRequest:
      memory: "256Mi"
      cpu: "100m"
    max:
      memory: "4Gi"
      cpu: "2"
    min:
      memory: "64Mi"
      cpu: "50m"
    type: Container
Step 8: Set Up Priority Classes
# High priority for critical workloads
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "Critical production workloads"
---
# Default priority
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: default-priority
value: 0
globalDefault: true
description: "Default priority for all pods"
---
# Low priority for batch jobs (can be preempted)
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: -1000
globalDefault: false
preemptionPolicy: Never
description: "Batch jobs that can be preempted"

# Use priority class in deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  template:
    spec:
      priorityClassName: high-priority
      containers:
      - name: api
        # ...
GKE Resource Debugging Cheatsheet
| Symptom | Likely Cause | Fix |
|---|---|---|
| Pods Pending | Insufficient resources | Right-size requests or add nodes |
| Node MemoryPressure | Overcommitted memory | Reduce limits:requests ratio |
| OOMKilled pods | Memory limit too low | Increase limit based on actual usage |
| Slow scaling | Autoscaler configuration | Reduce scale-down delay |
| Uneven distribution | No PodAntiAffinity | Add topology spread constraints (see example below) |
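For the last cheatsheet row, a minimal sketch of topology spread constraints on the api-server template from Step 4 (the maxSkew of 1 and the ScheduleAnyway policy are illustrative choices):
# Sketch: spread api-server pods across nodes and zones
# (pod-template excerpt; assumes the app: api-server label from Step 4)
spec:
  template:
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname        # spread across nodes
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: api-server
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone   # spread across zones
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: api-server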
Useful Debugging Commands
# Real-time resource monitoring
kubectl top pods --containers
kubectl top nodes
# Check why the autoscaler isn't scaling (self-managed autoscaler only;
# on GKE the autoscaler runs on the managed control plane, so check
# pending-pod events and Cloud Logging instead)
kubectl -n kube-system logs -l app=cluster-autoscaler --tail=100
# Find resource hogs
kubectl get pods -A -o json | jq -r '
  .items[] |
  "\(.metadata.namespace)/\(.metadata.name):\n" +
  "  CPU: \(.spec.containers[0].resources.requests.cpu // "none")\n" +
  "  MEM: \(.spec.containers[0].resources.requests.memory // "none")"'
Practice Question
Why does having memory limits much higher than requests (e.g., 256Mi request, 8Gi limit) cause node memory pressure issues?