This container doesn't shut down gracefully and loses in-flight requests. Fix it.

The Scenario

During deployments, your API container is losing requests:

$ docker stop api-service
# Takes 10 seconds (Docker's default timeout), then force kills

$ docker logs api-service
# Last lines:
[2024-01-15T10:30:45] Request received: POST /orders
[2024-01-15T10:30:45] Processing order...
# No completion log - request was killed mid-processing!

Users are seeing failed requests during every deployment. The engineering team has resorted to deploying only during low-traffic periods.

The Challenge

Implement graceful shutdown so the container finishes processing in-flight requests before exiting. Explain the SIGTERM/SIGKILL lifecycle and how to handle it properly.

Wrong Approach

A junior engineer might increase the stop timeout to 60 seconds and hope requests complete, add a sleep before stopping, or overlook why the application never receives the shutdown signal. These approaches fail: a longer timeout only delays the SIGKILL if the signal is never handled, a sleep adds delay without draining anything, and the real issue is signal handling in the application.

Right Approach

A senior engineer understands the container shutdown sequence: Docker sends SIGTERM, waits up to the stop timeout for the process to exit, then sends SIGKILL. The fix requires: 1) ensuring the application actually receives SIGTERM (an intermediate shell may not forward it), 2) implementing a signal handler that stops accepting new requests, 3) waiting for in-flight requests to complete, and 4) exiting cleanly. This requires both Dockerfile and application code changes.

Understanding the Shutdown Sequence

1. docker stop <container>
2. Docker sends SIGTERM to PID 1 in container
3. Container has 10 seconds (default) to exit gracefully
4. If still running, Docker sends SIGKILL (immediate termination)

Common Problem: If your app runs via a shell script, the shell (PID 1) receives SIGTERM but doesn’t forward it to your app!
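
The sequence above can be reproduced without Docker. The sketch below assumes a Linux or macOS host and uses only Node's built-in child_process module to imitate what docker stop does to PID 1. The names stopWithGrace, startChild, HANDLES_SIGTERM and IGNORES_SIGTERM are invented for this illustration and are not part of Docker or Node.

```javascript
const { spawn } = require('node:child_process');

// Mimics `docker stop`: send SIGTERM, then SIGKILL if the process outlives the grace period.
function stopWithGrace(child, graceMs) {
  return new Promise((resolve) => {
    const killTimer = setTimeout(() => child.kill('SIGKILL'), graceMs);
    child.once('exit', (code, signal) => {
      clearTimeout(killTimer);
      resolve({ code, signal });
    });
    child.kill('SIGTERM');
  });
}

// Illustrative helper: run inline Node source in a child and resolve once it prints a line.
function startChild(source) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', source], {
      stdio: ['ignore', 'pipe', 'inherit'],
    });
    child.once('error', reject);
    child.stdout.once('data', () => resolve(child));
  });
}

// A process that exits cleanly when it receives SIGTERM.
const HANDLES_SIGTERM = `
  process.on('SIGTERM', () => process.exit(0));
  setInterval(() => {}, 1000);
  console.log('ready');
`;

// A process that never reacts to SIGTERM, like an app behind a non-forwarding shell.
const IGNORES_SIGTERM = `
  process.on('SIGTERM', () => {});
  setInterval(() => {}, 1000);
  console.log('ready');
`;
```

Stopping a child started from HANDLES_SIGTERM should report exit code 0 well inside the grace period. A child started from IGNORES_SIGTERM should only end once the grace period runs out and SIGKILL is sent, which is the failure mode described in the scenario.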

Step 1: Fix the Dockerfile

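# WRONG: Shell form - Docker runs this as /bin/sh -c "npm start", so the shell is PID 1
CMD npm start

# WRONG: Wrapper script becomes PID 1 and does not forward signals (unless it ends with exec)
CMD ["./start.sh"]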

# RIGHT: Exec form - app is PID 1, receives signals directly
CMD ["node", "server.js"]
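
To make the difference between the two forms concrete, here is a rough JavaScript model of how a CMD instruction becomes the command executed in a Linux container. resolveCmd is a hypothetical function written for this illustration, not a Docker API. It only mirrors the documented rule that a valid JSON array of strings is exec form, and anything else is shell form and gets wrapped in /bin/sh -c.

```javascript
// Models how Docker interprets the arguments of a CMD instruction on Linux.
function resolveCmd(args) {
  const text = args.trim();
  try {
    const parsed = JSON.parse(text);
    if (Array.isArray(parsed) && parsed.every((item) => typeof item === 'string')) {
      // Exec form: argv is executed directly, so argv[0] becomes PID 1.
      return { form: 'exec', argv: parsed };
    }
  } catch {
    // Not valid JSON, so Docker falls back to shell form.
  }
  // Shell form: the shell is started first and typically remains PID 1.
  return { form: 'shell', argv: ['/bin/sh', '-c', text] };
}

console.log(resolveCmd('npm start'));
console.log(resolveCmd('["node", "server.js"]'));
// Single quotes are not valid JSON, so this is silently treated as shell form.
console.log(resolveCmd("['node', 'server.js']"));
```

In the shell-form case the first element of argv is /bin/sh, which is why SIGTERM lands on the shell rather than on Node.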

Step 2: Use a Proper Init System

FROM node:18-alpine

# Install tini (available from Alpine's package repository)
RUN apk add --no-cache tini

WORKDIR /app
COPY . .
RUN npm ci --omit=dev

# Tini handles signal forwarding and zombie reaping
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "server.js"]

Why tini?

  • Forwards signals to child processes
  • Reaps zombie processes
  • Lightweight (< 1MB)
  • Handles edge cases your app shouldn’t worry about

Step 3: Implement Signal Handler in Application

// server.js
const express = require('express');
const app = express();

let isShuttingDown = false;
let server;

// Middleware to reject requests during shutdown
app.use((req, res, next) => {
  if (isShuttingDown) {
    res.status(503).json({ error: 'Server is shutting down' });
    return;
  }
  next();
});

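// Parse JSON bodies so req.body is populated
app.use(express.json());

// Your routes (processOrder is assumed to be defined elsewhere in the app)
app.post('/orders', async (req, res) => {
  // Long-running request simulation
  await processOrder(req.body);
  res.json({ status: 'completed' });
});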

// Start server
server = app.listen(3000, () => {
  console.log('Server started on port 3000');
});

// Graceful shutdown handler
function gracefulShutdown(signal) {
  if (isShuttingDown) return; // Ignore repeated signals
  console.log(`Received ${signal}. Starting graceful shutdown...`);
  isShuttingDown = true;

  // Stop accepting new connections; the callback runs once open connections have closed
  server.close((err) => {
    if (err) {
      console.error('Error during shutdown:', err);
      process.exit(1);
    }

    console.log('All connections closed. Exiting.');
    process.exit(0);
  });

  // Force exit after timeout (safety net). Keep this shorter than the
  // stop timeout (30s in Step 4) so it fires before Docker's SIGKILL.
  setTimeout(() => {
    console.error('Shutdown timeout. Forcing exit.');
    process.exit(1);
  }, 25000).unref();
}

// Handle shutdown signals
process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
process.on('SIGINT', () => gracefulShutdown('SIGINT'));
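
The handler above relies on server.close() to wait for open connections, which can stall when clients hold idle keep-alive sockets. The following variant is a minimal sketch using only node:http that counts in-flight requests explicitly and releases idle sockets while draining. createDrainableServer, drain and inFlightCount are names made up for this example, and the closeIdleConnections() call assumes Node 18.2 or newer.

```javascript
const http = require('node:http');

// Illustrative helper: an http.Server that counts in-flight requests and can be drained.
function createDrainableServer(handler) {
  let inFlight = 0;
  let draining = false;

  const server = http.createServer((req, res) => {
    if (draining) {
      // Reached only through keep-alive connections that were open before close().
      res.writeHead(503, { Connection: 'close' });
      res.end('Server is shutting down');
      return;
    }
    inFlight += 1;
    res.once('close', () => {
      inFlight -= 1;
      // A connection that just finished its request is now idle; release it.
      if (draining) closeIdle();
    });
    handler(req, res);
  });

  // closeIdleConnections() exists from Node 18.2; older versions skip this step.
  function closeIdle() {
    if (typeof server.closeIdleConnections === 'function') {
      server.closeIdleConnections();
    }
  }

  // Stops accepting connections and resolves once every open connection has ended.
  function drain() {
    draining = true;
    return new Promise((resolve, reject) => {
      server.close((err) => (err ? reject(err) : resolve()));
      closeIdle();
    });
  }

  return { server, drain, inFlightCount: () => inFlight };
}
```

A SIGTERM handler could then call drain().then(() => process.exit(0)) and keep a force-exit timer as the safety net. The explicit counter is also handy for logging how many requests were still running when the signal arrived.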

Step 4: Configure Docker Stop Timeout

# docker-compose.yml
services:
  api:
    image: api:v2.0
    stop_grace_period: 30s  # Give the app 30 seconds to shut down

# Command line
docker stop --time 30 api-service
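
The 25-second force-exit timer from Step 3 only works because it is shorter than the 30-second grace period configured here. Docker does not expose the grace period inside the container by default, so this sketch assumes you pass it yourself through a hypothetical STOP_GRACE_SECONDS environment variable set alongside stop_grace_period. forceExitDelayMs is an illustrative name, not an existing API.

```javascript
// Derives the app's force-exit delay from the container's stop grace period.
// STOP_GRACE_SECONDS is a hypothetical variable you would set yourself.
function forceExitDelayMs(env = process.env, bufferSeconds = 5) {
  const graceSeconds = Number(env.STOP_GRACE_SECONDS ?? 10); // Docker's default is 10s
  if (!Number.isFinite(graceSeconds) || graceSeconds <= bufferSeconds) {
    throw new RangeError(
      `Grace period must exceed the ${bufferSeconds}s buffer, got "${env.STOP_GRACE_SECONDS}"`
    );
  }
  // Leave the buffer so the app exits on its own before SIGKILL arrives.
  return (graceSeconds - bufferSeconds) * 1000;
}

console.log(forceExitDelayMs({ STOP_GRACE_SECONDS: '30' })); // 25000
```

Deriving the delay from one value avoids the two numbers drifting apart when someone changes the grace period in the compose file but not in the code.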

Step 5: Handle Database Connections

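const pool = require('./db');
// redis and mqConnection are assumed to be clients created elsewhere in the app

async function gracefulShutdown(signal) {
  if (isShuttingDown) return; // Ignore repeated signals
  console.log(`Received ${signal}. Starting graceful shutdown...`);
  isShuttingDown = true;

  // Safety net: force exit before Docker's SIGKILL if cleanup hangs
  setTimeout(() => process.exit(1), 25000).unref();

  // 1. Stop accepting new requests; the callback runs after in-flight requests finish
  server.close(async () => {
    try {
      // 2. Close database connections
      await pool.end();
      console.log('Database pool closed');

      // 3. Close other resources (Redis, MQ, etc.)
      await redis.quit();
      await mqConnection.close();

      console.log('All resources closed. Exiting.');
      process.exit(0);
    } catch (err) {
      console.error('Error during cleanup:', err);
      process.exit(1);
    }
  });
}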
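
One weakness of a single try/catch around all cleanup calls is that the first failure skips every later resource. A possible refinement, sketched here under the assumption that each resource exposes some close function returning a promise, runs the steps in order with a per-step timeout and collects failures instead of stopping. closeInOrder and the step names are hypothetical.

```javascript
// Runs cleanup steps one after another; a failing or hanging step does not block the rest.
async function closeInOrder(steps, stepTimeoutMs = 5000) {
  const failures = [];
  for (const { name, close } of steps) {
    let timer;
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error(`${name} timed out`)), stepTimeoutMs);
    });
    try {
      // Promise.resolve().then(close) also captures synchronous throws from close().
      await Promise.race([Promise.resolve().then(close), timeout]);
    } catch (err) {
      failures.push({ name, error: err.message });
    } finally {
      clearTimeout(timer);
    }
  }
  return failures;
}

// Possible wiring inside the server.close() callback (clients assumed to exist):
//   const failures = await closeInOrder([
//     { name: 'database', close: () => pool.end() },
//     { name: 'redis', close: () => redis.quit() },
//     { name: 'mq', close: () => mqConnection.close() },
//   ]);
//   process.exit(failures.length === 0 ? 0 : 1);
```

The sum of the per-step timeouts should still fit inside the stop grace period, otherwise Docker's SIGKILL interrupts the cleanup anyway.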

Graceful Shutdown Checklist

Step  Action
1     Receive SIGTERM signal
2     Stop accepting new connections
3     Return 503 for new requests
4     Wait for in-flight requests to complete
5     Close database connections
6     Close message queue connections
7     Flush logs and metrics
8     Exit with code 0

Testing Graceful Shutdown

# Terminal 1: Start container
docker run --name api -p 3000:3000 api:v2.0

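# Terminal 2: Send long-running request (JSON body needs a matching Content-Type)
curl -X POST http://localhost:3000/orders -H 'Content-Type: application/json' -d '{"item":"test"}' &

# Terminal 3: Stop container while request is in progress
# (use the same 30-second timeout the app was designed for)
docker stop --time 30 api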

# Check logs - request should complete before shutdown
docker logs api
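
Besides reading the logs, the container's exit code tells you how it ended. It can be read with docker inspect --format '{{.State.ExitCode}}' api. The helper below is a small sketch that decodes such a code using the common convention that a process ended by a signal is reported as 128 plus the signal number. describeExitCode is an illustrative name, and the signal table comes from Node's os.constants.signals on the machine running the script.

```javascript
const { constants } = require('node:os');

// Interprets a container exit code using the convention 128 + signal number.
function describeExitCode(code) {
  if (code === 0) return 'clean exit';
  if (code > 128) {
    const signalNumber = code - 128;
    const name = Object.keys(constants.signals)
      .find((key) => constants.signals[key] === signalNumber);
    return name ? `terminated by ${name}` : `terminated by signal ${signalNumber}`;
  }
  return `application error (exit code ${code})`;
}

console.log(describeExitCode(0));
console.log(describeExitCode(137)); // 128 + 9 (SIGKILL): the grace period ran out
console.log(describeExitCode(143)); // 128 + 15 (SIGTERM): no handler, default action
```

After a successful graceful shutdown the exit code should be 0. A value of 137 suggests the process was still running when the stop timeout expired.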

Python Example (Flask)

import signal
import sys
from flask import Flask

app = Flask(__name__)
is_shutting_down = False

@app.before_request
def check_shutdown():
    if is_shutting_down:
        return {'error': 'Server shutting down'}, 503

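def graceful_shutdown(signum, frame):
    global is_shutting_down
    print(f'Received signal {signum}. Shutting down...')
    is_shutting_down = True
    # Caution: sys.exit() stops the development server immediately and does
    # not wait for in-flight requests. In production, run the app under a
    # WSGI server such as Gunicorn, which drains requests on SIGTERM.
    sys.exit(0)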

signal.signal(signal.SIGTERM, graceful_shutdown)
signal.signal(signal.SIGINT, graceful_shutdown)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=3000)

Practice Question

Why does using 'CMD npm start' in a Dockerfile often prevent graceful shutdown?