Questions
Pods are being OOMKilled in production. How do you diagnose and prevent this?
The Scenario
It’s Monday morning. Your analytics service—which processes customer data for reporting—keeps crashing. The on-call logs show pods restarting every 10-15 minutes during peak hours.
When you check the cluster:
$ kubectl get pods -n analytics
NAME READY STATUS RESTARTS AGE
analytics-worker-7d9f8c-abc 0/1 OOMKilled 12 45m
analytics-worker-7d9f8c-def 1/1 Running 8 45m
analytics-worker-7d9f8c-ghi 0/1 OOMKilled 10 45m
The logs don’t show application errors—pods just suddenly restart. Users are complaining that reports are incomplete or missing data. Your VP of Product is asking for an ETA on the fix.
Current deployment configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
name: analytics-worker
namespace: analytics
spec:
replicas: 3
template:
spec:
containers:
- name: worker
image: company/analytics-worker:v2.1
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "512Mi" # Same as request
cpu: "1000m"
The Challenge
- Diagnose: Prove that OOMKilled is actually the issue and identify the memory consumption pattern
- Immediate fix: Get the service stable ASAP
- Root cause: Why is memory usage increasing?
- Long-term solution: Prevent this from happening again
Walk through your complete debugging and remediation process.
Wrong approach (summary): Just increase memory without investigation. Bump the memory limit to 4Gi and hope the problem goes away. This doesn't address the root cause (a memory leak still exists), wastes cluster resources, returns as soon as usage reaches 4Gi, sets up no monitoring or alerting, and never establishes WHY the pods are being OOMKilled.
Right approach (summary): Debug systematically. Confirm the OOMKill via exit code 137, check resource metrics with kubectl top, analyze historical memory usage patterns, examine application logs, apply an immediate fix with a proper Burstable QoS configuration, investigate the root cause (memory leaks), set up monitoring and alerts, add an HPA for horizontal scaling, and optimize the application's memory usage with chunking and streaming.
Wrong Approach: Just Increase Memory
The wrong approach is to blindly increase the memory limit:
resources:
limits:
memory: "4Gi" # Just make it bigger

Problems with this approach:
- Doesn’t address root cause (memory leak still exists)
- Wastes cluster resources
- Problem will return when memory reaches 4Gi
- No monitoring or alerting setup
- Doesn’t understand WHY pods are OOMKilled
Right Approach: Systematic Debugging and Comprehensive Solution
This is one of the most common production issues in Kubernetes. Here’s how senior SREs handle it:
Phase 1: Confirm OOMKilled (30 seconds)
# Check pod status
kubectl get pods -n analytics
# Describe pod to see exit code
kubectl describe pod analytics-worker-7d9f8c-abc -n analytics
# Look for this in the output:
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Mon, 15 Jan 2024 09:00:00 +0000
Finished: Mon, 15 Jan 2024 09:12:34 +0000

Exit Code 137 = 128 + 9 (SIGKILL): definitive proof that the container was OOMKilled.
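To confirm this across all replicas at once, the same information can be pulled straight from each pod's status (a quick sketch; the jsonpath assumes single-container pods):
# Print each pod's last termination reason and exit code
kubectl get pods -n analytics -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].lastState.terminated.reason}{"\t"}{.status.containerStatuses[0].lastState.terminated.exitCode}{"\n"}{end}'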
Phase 2: Check Resource Metrics (Next 2 minutes)
# Check current memory usage across all pods
kubectl top pods -n analytics
NAME CPU(cores) MEMORY(bytes)
analytics-worker-7d9f8c-abc 450m 498Mi
analytics-worker-7d9f8c-def 380m 512Mi # At limit!
analytics-worker-7d9f8c-ghi 520m 501Mi
# Check node capacity
kubectl top nodes
# Get detailed resource info
kubectl describe node <node-name> | grep -A 5 "Allocated resources"

Phase 3: Analyze Historical Memory Usage
If you have Prometheus + Grafana, run these queries:
# Memory usage over time
container_memory_usage_bytes{
namespace="analytics",
pod=~"analytics-worker.*"
}
# Memory usage as % of limit
(container_memory_usage_bytes / container_spec_memory_limit_bytes) * 100

What to look for (a slope query sketch follows the list):
- Gradual increase: Memory leak in application
- Sudden spikes: Processing large datasets
- Cyclic pattern: Batch jobs running periodically
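A rough way to tell a steady leak apart from bursty processing is to look at the slope of memory usage over time (a sketch, using the same cAdvisor metric as above; window size is illustrative):
# Approximate memory growth rate in bytes/second over the last hour;
# a persistently positive slope that survives restarts suggests a leak
deriv(
  container_memory_usage_bytes{namespace="analytics", pod=~"analytics-worker.*"}[1h]
)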
Phase 4: Examine Application Logs
# Check logs from the crashed pod
kubectl logs analytics-worker-7d9f8c-abc -n analytics --previous
# Look for memory-related errors before crash
# Common patterns:
# - "heap out of memory"
# - "cannot allocate memory"
# - Processing large files: "loading 10GB CSV file"

Immediate Fix: Increase Memory Limits
Short-term solution (deploy immediately):
apiVersion: apps/v1
kind: Deployment
metadata:
name: analytics-worker
namespace: analytics
spec:
replicas: 3
template:
spec:
containers:
- name: worker
image: company/analytics-worker:v2.1
resources:
requests:
memory: "512Mi" # Request stays same (guaranteed memory)
cpu: "500m"
limits:
memory: "2Gi" # Increased 4x (max burst capacity)
cpu: "2000m"

Why this works:
- Gives pods more headroom during memory spikes
- Prevents OOMKilled during peak processing
- Pods can burst above request but stay within limit
Deploy the fix:
kubectl apply -f deployment.yaml
kubectl rollout status deployment/analytics-worker -n analytics
kubectl get pods -n analytics -w

Understanding Kubernetes QoS Classes
Kubernetes assigns pods to QoS (Quality of Service) classes based on resources:
1. Guaranteed (Highest Priority)
resources:
requests:
memory: "1Gi"
cpu: "1000m"
limits:
memory: "1Gi" # Same as request
cpu: "1000m" # Same as request

- Gets evicted last
- Best for critical workloads
- But: No burst capacity!
2. Burstable (Medium Priority)
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "2Gi" # Higher than request
cpu: "2000m"

- Can burst above request
- Good for variable workloads
- Gets evicted after BestEffort
3. BestEffort (Lowest Priority - AVOID)
resources: {} # No requests or limits

- Gets evicted first
- Never use in production!
Our fix uses Burstable QoS (verify with the command after this list):
- Guaranteed 512Mi (request)
- Can burst to 2Gi (limit) during processing
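After the rollout, double-check which QoS class Kubernetes actually assigned (a quick verification sketch):
# Expect "Burstable" for every analytics-worker pod after the fix
kubectl get pods -n analytics -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.qosClass}{"\n"}{end}'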
Root Cause Analysis: Memory Leak Investigation
Check application code for memory leaks:
// ❌ MEMORY LEAK - Array grows forever
const processedRecords = [];
async function processData() {
while (true) {
const batch = await fetchNextBatch();
processedRecords.push(...batch); // Never cleared!
await generateReport(processedRecords);
}
}
// ✅ FIX - Clear array after processing
const processedRecords = [];
async function processData() {
while (true) {
const batch = await fetchNextBatch();
processedRecords.push(...batch);
await generateReport(processedRecords);
processedRecords.length = 0; // Clear memory
}
}

Common memory leak patterns (a bounded-cache sketch follows the list):
- Event listeners not removed
- Caching without eviction policy
- Large objects kept in memory
- Database connections not closed
- Timers/intervals not cleared
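The caching pattern in particular shows up constantly in reporting workers. A minimal bounded-cache sketch (illustrative only, not code from the analytics service):
// ❌ LEAK - an unbounded cache holds every generated report in memory forever
// const reportCache = new Map(); reportCache.set(key, report);

// ✅ Bounded cache with simple oldest-entry eviction
const MAX_ENTRIES = 1000;
const reportCache = new Map();

function cacheReport(key, report) {
  if (reportCache.has(key)) reportCache.delete(key); // refresh insertion order
  reportCache.set(key, report);
  if (reportCache.size > MAX_ENTRIES) {
    // Map iterates in insertion order, so the first key is the oldest entry
    const oldestKey = reportCache.keys().next().value;
    reportCache.delete(oldestKey);
  }
}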
Node.js specific debugging:
# Enable the V8 inspector on the running Node.js process (PID 1 in most containers);
# on SIGUSR1, Node starts listening for a debugger on 127.0.0.1:9229
kubectl exec -it analytics-worker-7d9f8c-abc -n analytics -- kill -USR1 1
# Port-forward to access the debugger locally
kubectl port-forward analytics-worker-7d9f8c-abc 9229:9229 -n analytics
# Open Chrome DevTools (chrome://inspect) → Memory tab → Take heap snapshot
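If attaching DevTools to a production pod is impractical, Node's built-in v8 module can write a heap snapshot on demand. A sketch, assuming you can add a small signal handler to the worker (the SIGUSR2 choice is arbitrary):
const v8 = require('v8');

// Write a heap snapshot to the working directory when the process receives SIGUSR2;
// copy it out with kubectl cp and load it in Chrome DevTools → Memory → Load
process.on('SIGUSR2', () => {
  const file = v8.writeHeapSnapshot(); // returns the generated filename
  console.log(`Heap snapshot written to ${file}`);
});
Trigger it from outside the pod with: kubectl exec analytics-worker-7d9f8c-abc -n analytics -- kill -USR2 1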
Long-Term Solutions
1. Implement Memory Monitoring with Alerts
# Prometheus alert rule
groups:
- name: kubernetes-memory
rules:
- alert: PodHighMemoryUsage
expr: |
(container_memory_usage_bytes / container_spec_memory_limit_bytes) > 0.9
for: 5m
labels:
severity: warning
annotations:
summary: "Pod {{ $labels.pod }} is using > 90% memory"
description: "Memory usage: {{ $value | humanizePercentage }}"
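Alongside the usage alert, it also helps to alert on OOMKills themselves. If kube-state-metrics is running in the cluster, a rule along these lines can be added to the same group (a sketch; note the metric stays at 1 until the container terminates again for another reason, so many teams pair it with the restart counter):
- alert: PodOOMKilled
  expr: |
    kube_pod_container_status_last_terminated_reason{namespace="analytics", reason="OOMKilled"} == 1
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} was last killed by the OOM killer"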
2. Set Appropriate Resource Requests and Limits
Best practice formula:
Request = Average usage + 20% buffer
Limit = Peak usage + 30% buffer

Example calculation:
Average memory usage: 400Mi (from metrics)
Request: 400Mi * 1.2 = 480Mi → Round to 512Mi
Peak usage during processing: 1.5Gi (from metrics)
Limit: 1.5Gi * 1.3 = 1.95Gi → Round to 2Gi
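The average and peak numbers should come from real metrics rather than guesses. With the same Prometheus metrics as above, something like this gives both (window size is illustrative):
# Average memory per pod over the past 7 days (basis for the request)
avg_over_time(container_memory_usage_bytes{namespace="analytics", pod=~"analytics-worker.*"}[7d])
# Peak memory per pod over the past 7 days (basis for the limit)
max_over_time(container_memory_usage_bytes{namespace="analytics", pod=~"analytics-worker.*"}[7d])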
3. Implement Horizontal Pod Autoscaling
Instead of making pods bigger, make more pods:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: analytics-worker-hpa
namespace: analytics
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: analytics-worker
minReplicas: 3
maxReplicas: 10
metrics:
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 70 # Scale out when average usage exceeds 70% of the memory request
behavior:
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 100 # Double pods at once if needed
periodSeconds: 60
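Once the HPA is applied, watch how it reacts to memory pressure before trusting it in production:
# Current vs. target utilization and replica count
kubectl get hpa analytics-worker-hpa -n analytics --watch
# Scaling decisions and events (useful when it refuses to scale)
kubectl describe hpa analytics-worker-hpa -n analytics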
4. Optimize Application Memory Usage
// Process data in smaller chunks instead of all at once
async function processDataOptimized() {
const CHUNK_SIZE = 1000;
while (true) {
// Fetch small batch
const batch = await fetchNextBatch(CHUNK_SIZE);
if (batch.length === 0) break;
// Process batch
await processBatch(batch);
// Batch is garbage collected after loop iteration
}
}
// Use streams for large files
const fs = require('fs');
const readline = require('readline');
async function processLargeFile(filePath) {
const fileStream = fs.createReadStream(filePath);
const rl = readline.createInterface({
input: fileStream,
crlfDelay: Infinity
});
for await (const line of rl) {
// Process one line at a time
await processLine(line);
// Memory is released after each line
}
}
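Since the worker in these examples is Node.js, one more application-side guard is worth considering (an assumption about your runtime and limits): cap V8's old-space heap a bit below the container limit, so the process fails with a catchable JavaScript heap error and a stack trace instead of a silent SIGKILL.
# Hypothetical container env snippet: keep V8's heap below the 2Gi container limit
env:
- name: NODE_OPTIONS
  value: "--max-old-space-size=1536"  # MiB; leaves headroom for buffers and native allocations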
5. Enable Vertical Pod Autoscaler (VPA) for Automatic Tuning
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: analytics-worker-vpa
namespace: analytics
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: analytics-worker
updatePolicy:
updateMode: "Auto" # Automatically adjust requests/limits
resourcePolicy:
containerPolicies:
- containerName: worker
minAllowed:
memory: "512Mi"
maxAllowed:
memory: "4Gi"
controlledResources: ["memory"]

VPA will (see the inspection command after this list):
- Monitor actual memory usage
- Automatically adjust requests/limits
- Prevent OOMKills by increasing limits proactively
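Once the VPA has observed the workload for a while (and assuming the VPA components are installed in the cluster), its current recommendation is visible on the object itself:
# Shows target, lower-bound, and upper-bound memory recommendations
kubectl describe vpa analytics-worker-vpa -n analytics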
Production Checklist: Preventing OOMKilled
- Set memory requests based on average usage + buffer
- Set memory limits based on peak usage + buffer
- Use Burstable QoS (requests < limits) for variable workloads
- Implement memory usage alerts (> 80% for warning, > 90% for critical)
- Enable HPA to scale out instead of up
- Consider VPA for automatic resource tuning
- Profile application for memory leaks
- Process large datasets in chunks/streams
- Monitor with Prometheus + Grafana
- Test under production-like load before deploying
Practice Question
A pod has memory request of 512Mi and limit of 1Gi. The node has 2Gi total memory. What happens when the pod tries to use 1.5Gi?