
A container keeps exiting immediately after starting. How do you debug it?


The Scenario

You deploy a new version of your API service and the container keeps crashing:

$ docker ps -a
CONTAINER ID   IMAGE          STATUS                     NAMES
a1b2c3d4e5f6   api:v2.1.0     Exited (1) 2 seconds ago   api-service

Every time you start it, it exits within seconds. The previous version (v2.0.0) works fine. Production is partially down and you need to fix this quickly.

The Challenge

Walk through your systematic debugging process. What commands would you run, in what order, and why? How do you identify whether this is a code issue, configuration problem, or missing dependency?

Wrong Approach

A junior engineer might immediately rerun with docker run -it hoping to see output, rebuild the image assuming the build failed, check whether the port is already in use, or simply roll back without understanding the issue. None of these works: -it shows nothing if the process crashes before producing any output, rebuilding without a diagnosis just reproduces the problem, a port conflict would surface a different error, and rolling back does nothing to stop the issue from recurring.

Right Approach

A senior engineer follows a systematic approach: first check the exit code and container logs, then inspect the container configuration, compare it with the working version, and, if needed, override the entrypoint to get shell access for live debugging. The exit code reveals the failure type, and the logs usually contain the actual error message.

Step 1: Check Exit Code and Logs (First 30 seconds)

# Check the exit code
docker inspect api-service --format='{{.State.ExitCode}}'
# Exit code 1 = application error
# Exit code 137 = OOM killed (128 + 9 SIGKILL)
# Exit code 139 = Segmentation fault
# Exit code 143 = SIGTERM (graceful shutdown)

# Get container logs
docker logs api-service

# If the container keeps restarting, capture stderr too and tail the most recent output
docker logs api-service 2>&1 | tail -100
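The exit-code cheatsheet above is easy to wrap in a small helper so you don't have to memorize it. A minimal sketch (the explain_exit function name is made up here, not a standard tool):

```shell
#!/bin/sh
# explain_exit: map a container exit code to its likely meaning.
# The mappings mirror the comments above.
explain_exit() {
  case "$1" in
    0)   echo "success (CMD finished - process was not a daemon)" ;;
    1)   echo "application error - check docker logs" ;;
    126) echo "permission denied - script not executable" ;;
    127) echo "command not found - bad CMD/ENTRYPOINT path" ;;
    137) echo "SIGKILL (128+9) - likely OOM killed" ;;
    139) echo "SIGSEGV (128+11) - segmentation fault" ;;
    143) echo "SIGTERM (128+15) - graceful shutdown" ;;
    *)   echo "unknown exit code: $1" ;;
  esac
}

# Usage: feed it the code reported by docker inspect
explain_exit 137
```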

Common log patterns:

  • Error: Cannot find module 'express' → Missing dependency
  • EADDRINUSE → Port already in use
  • permission denied → File/directory access issue
  • ECONNREFUSED → Can’t reach database/dependency
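These patterns are also easy to scan for automatically. A sketch of a log classifier (classify_logs and its hint messages are assumptions for illustration, not a standard tool):

```shell
#!/bin/sh
# classify_logs: read container logs on stdin and print a hint for the
# first known error pattern found. Patterns mirror the list above.
classify_logs() {
  logs=$(cat)
  case "$logs" in
    *"Cannot find module"*) echo "missing dependency - check npm install in Dockerfile" ;;
    *EADDRINUSE*)           echo "port already in use" ;;
    *"permission denied"*)  echo "file/directory access issue" ;;
    *ECONNREFUSED*)         echo "cannot reach database/dependency" ;;
    *)                      echo "no known pattern - read the logs manually" ;;
  esac
}

# Usage: docker logs api-service 2>&1 | classify_logs
echo "Error: Cannot find module 'express'" | classify_logs
```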

Step 2: Inspect Container Configuration

# Check full container details
docker inspect api-service

# Key things to look for:
docker inspect api-service --format='
CMD: {{.Config.Cmd}}
Entrypoint: {{.Config.Entrypoint}}
Env: {{range .Config.Env}}{{.}} {{end}}
WorkingDir: {{.Config.WorkingDir}}
'

# Compare with working version
docker inspect api:v2.0.0 --format='{{.Config.Cmd}}'
docker inspect api:v2.1.0 --format='{{.Config.Cmd}}'
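The comparison becomes much faster if you diff the formatted inspect output of both versions. A sketch, where the two variables are sample stand-ins for the output of the docker inspect commands above (the CMD change shown is hypothetical):

```shell
#!/bin/sh
# Diff the key config fields of two images. In practice, each variable
# would come from `docker inspect <image> --format=...`; inline sample
# data is used here so the sketch is self-contained.
old_config='CMD: [node server.js]
Env: NODE_ENV=production'
new_config='CMD: [node dist/server.js]
Env: NODE_ENV=production'

tmp_old=$(mktemp); tmp_new=$(mktemp)
printf '%s\n' "$old_config" > "$tmp_old"
printf '%s\n' "$new_config" > "$tmp_new"

# diff exits 1 when the inputs differ; capture the changes instead of failing
changes=$(diff "$tmp_old" "$tmp_new") || true
printf '%s\n' "$changes"
rm -f "$tmp_old" "$tmp_new"
```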

Step 3: Get Shell Access for Live Debugging

# Override entrypoint to get shell access
docker run -it --entrypoint /bin/sh api:v2.1.0

# Inside the container, manually run the command
$ node server.js
# Now you'll see the actual error in real-time

# Check if files exist
$ ls -la /app
$ cat /app/package.json

# Check environment variables
$ env | grep -i database

Step 4: Check Resource Constraints

# Was it killed due to memory limits?
docker inspect api-service --format='{{.State.OOMKilled}}'

# Check memory limit vs usage (only reports while the container is running)
docker stats api-service --no-stream

# Check if there are resource limits
docker inspect api-service --format='
Memory Limit: {{.HostConfig.Memory}}
CPU Shares: {{.HostConfig.CpuShares}}
'
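Note that HostConfig.Memory is reported in bytes, with 0 meaning no limit, so a quick converter helps sanity-check the value (bytes_to_mib is an illustrative helper, not a docker feature):

```shell
#!/bin/sh
# bytes_to_mib: convert the byte count printed by docker inspect to MiB.
# Docker reports 0 when no memory limit is set.
bytes_to_mib() {
  if [ "$1" -eq 0 ]; then
    echo "no limit"
  else
    echo $(($1 / 1048576))
  fi
}

bytes_to_mib 536870912   # a 512 MiB limit
```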

Step 5: Compare Image Layers

# Check what changed between versions
docker history api:v2.0.0
docker history api:v2.1.0

# Use dive tool for detailed analysis
dive api:v2.1.0

Common Root Causes and Fixes

Exit Code | Meaning           | Common Causes                     | Fix
----------|-------------------|-----------------------------------|----------------------------------
0         | Success           | CMD finished (not a daemon)       | Ensure process runs in foreground
1         | General error     | Application crash, missing config | Check logs for specific error
126       | Permission denied | Script not executable             | chmod +x script.sh
127       | Command not found | Wrong CMD/ENTRYPOINT path         | Verify binary exists in image
137       | SIGKILL (OOM)     | Memory limit exceeded             | Increase memory or fix leak
139       | SIGSEGV           | Segmentation fault                | Debug application code
143       | SIGTERM           | Graceful shutdown                 | Check why container was stopped
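The signal-related codes all follow the 128 + signal-number convention, so the signal can be recovered arithmetically (exit_signal is an illustrative helper):

```shell
#!/bin/sh
# exit_signal: for exit codes above 128, print the signal number (code - 128).
exit_signal() {
  if [ "$1" -gt 128 ]; then
    echo $(($1 - 128))
  else
    echo "not a signal exit"
  fi
}

exit_signal 137   # SIGKILL is signal 9
```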

Real-World Example: Missing Environment Variable

Logs show:

Error: DATABASE_URL environment variable is required
    at validateConfig (/app/dist/config.js:15:11)
    at Object.<anonymous> (/app/dist/server.js:3:1)

Root Cause: New version added database URL validation, but the environment variable wasn’t set in the docker run command.

Fix:

# Add the missing environment variable
docker run -d \
  -e DATABASE_URL=postgres://user:pass@db:5432/myapp \
  --name api-service \
  api:v2.1.0
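To keep this class of failure from reaching production again, the container's startup can validate required variables before launching the app. A minimal sketch for an entrypoint script (require_env is a hypothetical helper, not part of Docker):

```shell
#!/bin/sh
# require_env: fail fast with a clear message if any named variable is
# unset or empty. Call it at the top of the container's entrypoint script.
require_env() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Error: $name environment variable is required" >&2
      return 1
    fi
  done
}

# Usage in an entrypoint script:
#   require_env DATABASE_URL || exit 1
#   exec node server.js
DATABASE_URL=postgres://user:pass@db:5432/myapp
require_env DATABASE_URL && echo "config ok"
```

Failing fast in the entrypoint turns a cryptic crash loop into a one-line error in docker logs.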

Debugging Quick Reference

# Full debugging workflow
docker logs <container>                          # Check logs
docker inspect <container> --format='{{.State}}' # Check state
docker diff <container>                          # Check filesystem changes
docker top <container>                           # Check running processes
docker exec -it <container> /bin/sh              # Get shell (if running)
docker run -it --entrypoint /bin/sh <image>      # Override entrypoint

Practice Question

A container exits with code 137. What is the most likely cause?