
A container keeps exiting immediately after starting. How do you debug it?


The Scenario

You deploy a new version of your API service and the container keeps crashing:

$ docker ps -a
CONTAINER ID   IMAGE          STATUS                     NAMES
a1b2c3d4e5f6   api:v2.1.0     Exited (1) 2 seconds ago   api-service

Every time you start it, it exits within seconds. The previous version (v2.0.0) works fine. Production is partially down and you need to fix this quickly.

The Challenge

Walk through your systematic debugging process. What commands would you run, in what order, and why? How do you identify whether this is a code issue, configuration problem, or missing dependency?

Wrong Approach

A junior engineer might immediately rerun with docker run -it hoping to see output, rebuild the image assuming the build failed, check whether the port is already in use, or simply roll back without understanding the issue. None of these works: -it shows nothing if the process crashes before producing any output, rebuilding without a diagnosis just reproduces the problem, a port conflict would surface a different error, and rolling back does nothing to stop the issue from recurring.

Right Approach

A senior engineer follows a systematic approach: first check the exit code and container logs, then inspect the container configuration, compare it with the working version, and, if needed, override the entrypoint to get shell access for live debugging. The exit code reveals the failure type, and the logs usually contain the actual error message.

Step 1: Check Exit Code and Logs (First 30 seconds)

# Check the exit code
docker inspect api-service --format='{{.State.ExitCode}}'
# Exit code 1 = application error
# Exit code 137 = OOM killed (128 + 9 SIGKILL)
# Exit code 139 = Segmentation fault
# Exit code 143 = SIGTERM (graceful shutdown)

# Get container logs
docker logs api-service

# If the container keeps restarting, capture stderr too and tail the most recent output
docker logs api-service 2>&1 | tail -100
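The exit-code cheatsheet above is easy to wrap in a small helper so you don't have to memorize it. A minimal sketch (the explain_exit function name is made up here, not a standard tool):

```shell
#!/bin/sh
# explain_exit: map a container exit code to its likely meaning.
# The mappings mirror the comments above.
explain_exit() {
  case "$1" in
    0)   echo "success (CMD finished - process was not a daemon)" ;;
    1)   echo "application error - check docker logs" ;;
    126) echo "permission denied - script not executable" ;;
    127) echo "command not found - bad CMD/ENTRYPOINT path" ;;
    137) echo "SIGKILL (128+9) - likely OOM killed" ;;
    139) echo "SIGSEGV (128+11) - segmentation fault" ;;
    143) echo "SIGTERM (128+15) - graceful shutdown" ;;
    *)   echo "unknown exit code: $1" ;;
  esac
}

# Usage: feed it the code reported by docker inspect
explain_exit 137
```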

Common log patterns:

  • Error: Cannot find module 'express' → Missing dependency
  • EADDRINUSE → Port already in use
  • permission denied → File/directory access issue
  • ECONNREFUSED → Can’t reach database/dependency
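These patterns are also easy to scan for automatically. A sketch of a log classifier (classify_logs and its hint messages are assumptions for illustration, not a standard tool):

```shell
#!/bin/sh
# classify_logs: read container logs on stdin and print a hint for the
# first known error pattern found. Patterns mirror the list above.
classify_logs() {
  logs=$(cat)
  case "$logs" in
    *"Cannot find module"*) echo "missing dependency - check npm install in Dockerfile" ;;
    *EADDRINUSE*)           echo "port already in use" ;;
    *"permission denied"*)  echo "file/directory access issue" ;;
    *ECONNREFUSED*)         echo "cannot reach database/dependency" ;;
    *)                      echo "no known pattern - read the logs manually" ;;
  esac
}

# Usage: docker logs api-service 2>&1 | classify_logs
echo "Error: Cannot find module 'express'" | classify_logs
```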

Step 2: Inspect Container Configuration

# Check full container details
docker inspect api-service

# Key things to look for:
docker inspect api-service --format='
CMD: {{.Config.Cmd}}
Entrypoint: {{.Config.Entrypoint}}
Env: {{range .Config.Env}}{{.}} {{end}}
WorkingDir: {{.Config.WorkingDir}}
'

# Compare with working version
docker inspect api:v2.0.0 --format='{{.Config.Cmd}}'
docker inspect api:v2.1.0 --format='{{.Config.Cmd}}'
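The comparison becomes much faster if you diff the formatted inspect output of both versions. A sketch, where the two variables are sample stand-ins for the output of the docker inspect commands above (the CMD change shown is hypothetical):

```shell
#!/bin/sh
# Diff the key config fields of two images. In practice, each variable
# would come from `docker inspect <image> --format=...`; inline sample
# data is used here so the sketch is self-contained.
old_config='CMD: [node server.js]
Env: NODE_ENV=production'
new_config='CMD: [node dist/server.js]
Env: NODE_ENV=production'

tmp_old=$(mktemp); tmp_new=$(mktemp)
printf '%s\n' "$old_config" > "$tmp_old"
printf '%s\n' "$new_config" > "$tmp_new"

# diff exits 1 when the inputs differ; capture the changes instead of failing
changes=$(diff "$tmp_old" "$tmp_new") || true
printf '%s\n' "$changes"
rm -f "$tmp_old" "$tmp_new"
```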

Step 3: Get Shell Access for Live Debugging

# Override entrypoint to get shell access
docker run -it --entrypoint /bin/sh api:v2.1.0

# Inside the container, manually run the command
$ node server.js
# Now you'll see the actual error in real-time

# Check if files exist
$ ls -la /app
$ cat /app/package.json

# Check environment variables
$ env | grep -i database

Step 4: Check Resource Constraints

# Was it killed due to memory limits?
docker inspect api-service --format='{{.State.OOMKilled}}'

# Check memory limit vs usage (only reports while the container is running)
docker stats api-service --no-stream

# Check if there are resource limits
docker inspect api-service --format='
Memory Limit: {{.HostConfig.Memory}}
CPU Shares: {{.HostConfig.CpuShares}}
'
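Note that HostConfig.Memory is reported in bytes, with 0 meaning no limit, so a quick converter helps sanity-check the value (bytes_to_mib is an illustrative helper, not a docker feature):

```shell
#!/bin/sh
# bytes_to_mib: convert the byte count printed by docker inspect to MiB.
# Docker reports 0 when no memory limit is set.
bytes_to_mib() {
  if [ "$1" -eq 0 ]; then
    echo "no limit"
  else
    echo $(($1 / 1048576))
  fi
}

bytes_to_mib 536870912   # a 512 MiB limit
```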

Step 5: Compare Image Layers

# Check what changed between versions
docker history api:v2.0.0
docker history api:v2.1.0

# Use dive tool for detailed analysis
dive api:v2.1.0

Common Root Causes and Fixes

Exit Code | Meaning           | Common Causes                     | Fix
----------|-------------------|-----------------------------------|----------------------------------
0         | Success           | CMD finished (not a daemon)       | Ensure process runs in foreground
1         | General error     | Application crash, missing config | Check logs for specific error
126       | Permission denied | Script not executable             | chmod +x script.sh
127       | Command not found | Wrong CMD/ENTRYPOINT path         | Verify binary exists in image
137       | SIGKILL (OOM)     | Memory limit exceeded             | Increase memory or fix leak
139       | SIGSEGV           | Segmentation fault                | Debug application code
143       | SIGTERM           | Graceful shutdown                 | Check why container was stopped
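The signal-related codes all follow the 128 + signal-number convention, so the signal can be recovered arithmetically (exit_signal is an illustrative helper):

```shell
#!/bin/sh
# exit_signal: for exit codes above 128, print the signal number (code - 128).
exit_signal() {
  if [ "$1" -gt 128 ]; then
    echo $(($1 - 128))
  else
    echo "not a signal exit"
  fi
}

exit_signal 137   # SIGKILL is signal 9
```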

Real-World Example: Missing Environment Variable

Logs show:

Error: DATABASE_URL environment variable is required
    at validateConfig (/app/dist/config.js:15:11)
    at Object.<anonymous> (/app/dist/server.js:3:1)

Root Cause: New version added database URL validation, but the environment variable wasn’t set in the docker run command.

Fix:

# Add the missing environment variable
docker run -d \
  -e DATABASE_URL=postgres://user:pass@db:5432/myapp \
  --name api-service \
  api:v2.1.0
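To keep this class of failure from reaching production again, the container's startup can validate required variables before launching the app. A minimal sketch for an entrypoint script (require_env is a hypothetical helper, not part of Docker):

```shell
#!/bin/sh
# require_env: fail fast with a clear message if any named variable is
# unset or empty. Call it at the top of the container's entrypoint script.
require_env() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Error: $name environment variable is required" >&2
      return 1
    fi
  done
}

# Usage in an entrypoint script:
#   require_env DATABASE_URL || exit 1
#   exec node server.js
DATABASE_URL=postgres://user:pass@db:5432/myapp
require_env DATABASE_URL && echo "config ok"
```

Failing fast in the entrypoint turns a cryptic crash loop into a one-line error in docker logs.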

Debugging Quick Reference

# Full debugging workflow
docker logs <container>                          # Check logs
docker inspect <container> --format='{{.State}}' # Check state
docker diff <container>                          # Check filesystem changes
docker top <container>                           # Check running processes
docker exec -it <container> /bin/sh              # Get shell (if running)
docker run -it --entrypoint /bin/sh <image>      # Override entrypoint

Practice Question

A container exits with code 137. What is the most likely cause?