Lambda functions are timing out when accessing RDS in a VPC. Debug the connectivity issue.

The Scenario

Your Lambda function suddenly started timing out:

Task timed out after 30.00 seconds
START RequestId: abc-123
END RequestId: abc-123
REPORT RequestId: abc-123 Duration: 30003.45 ms Billed Duration: 30000 ms Memory Size: 128 MB Max Memory Used: 45 MB Init Duration: 2534.12 ms

The function was working fine until you moved it into a VPC to access an RDS database.

The Challenge

Debug why the Lambda function times out in a VPC, identify the root cause, and implement a fix that maintains security while enabling connectivity.

Wrong Approach

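A junior engineer might increase the timeout to 5 minutes, move the Lambda into a public subnet, or try to attach an Elastic IP to it. None of these addresses the root cause. A longer timeout only delays the same failure, and a public subnet does not help because Lambda's network interfaces are not assigned public IP addresses, so their traffic cannot leave through the Internet Gateway. Placing workloads in public subnets also weakens the security posture for no gain.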

Right Approach

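A senior engineer understands that a Lambda function in a VPC has no path to the internet or to public AWS service endpoints unless its subnets route through a NAT Gateway or the VPC has endpoints for those services. Reaching RDS in the same VPC needs neither, only correct security group rules and subnet routing. The fix is to keep the function in private subnets, provide a NAT Gateway or VPC endpoints for the AWS APIs it calls, and reuse database connections across invocations.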

Step 1: Understand the VPC Networking Issue

Lambda VPC Connectivity:
┌─────────────────────────────────────────────────────────────────┐
│                           VPC                                    │
│  ┌─────────────────┐              ┌─────────────────┐           │
│  │  Public Subnet  │              │  Private Subnet │           │
│  │                 │              │                 │           │
│  │  ┌───────────┐  │              │  ┌───────────┐  │           │
│  │  │    NAT    │  │              │  │  Lambda   │  │           │
│  │  │  Gateway  │◄─┼──────────────┼──│ Function  │  │           │
│  │  └─────┬─────┘  │              │  └───────────┘  │           │
│  │        │        │              │        │        │           │
│  └────────┼────────┘              └────────┼────────┘           │
│           │                                │                     │
│           ▼                                ▼                     │
│    Internet Gateway                 RDS (same VPC)              │
│           │                                                      │
└───────────┼──────────────────────────────────────────────────────┘

      AWS Services
   (Secrets Manager,
    CloudWatch, etc.)
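
The routing shown in the diagram can be checked mechanically. The following is a minimal sketch, assuming the input is the `Routes` list returned by `aws ec2 describe-route-tables` (with its `DestinationCidrBlock`, `GatewayId`, `NatGatewayId`, and `State` fields). The function names are hypothetical and exist only for illustration.

```python
from typing import Iterable, Mapping


def classify_default_route(routes: Iterable[Mapping[str, str]]) -> str:
    """Describe how 0.0.0.0/0 traffic leaves a subnet.

    Returns one of: 'nat', 'igw', 'blackhole', 'other', 'none'.
    """
    for route in routes:
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue  # only the default route matters here
        if route.get("State", "active") != "active":
            return "blackhole"  # e.g. the NAT Gateway was deleted
        if route.get("NatGatewayId"):
            return "nat"
        if route.get("GatewayId", "").startswith("igw-"):
            return "igw"
        return "other"
    return "none"


def lambda_can_reach_internet(routes: Iterable[Mapping[str, str]]) -> bool:
    """Only a NAT route works: Lambda interfaces have no public IP,
    so a direct Internet Gateway route does not give them internet access."""
    return classify_default_route(routes) == "nat"
```

A subnet whose table contains only the `local` route yields `'none'`, which matches the timeout in the scenario whenever the function calls a public AWS API.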

Step 2: Diagnose the Issue

# Check Lambda VPC configuration
aws lambda get-function-configuration \
  --function-name my-function \
  --query '{VpcConfig: VpcConfig, Timeout: Timeout}'

# Check if subnets have route to NAT Gateway
aws ec2 describe-route-tables \
  --filters "Name=association.subnet-id,Values=subnet-abc123" \
  --query 'RouteTables[].Routes[]'

# Verify security group allows outbound traffic
aws ec2 describe-security-groups \
  --group-ids sg-lambda123 \
  --query 'SecurityGroups[].{Egress: IpPermissionsEgress}'

# Check NAT Gateway status
aws ec2 describe-nat-gateways \
  --filter "Name=vpc-id,Values=vpc-123" \
  --query 'NatGateways[].{State: State, SubnetId: SubnetId}'
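
The CLI checks above inspect configuration. It can also help to test reachability from inside the function itself. Below is a minimal sketch of a TCP probe using only the Python standard library. The function name `probe_tcp` and the outcome labels are hypothetical; the idea is to call it from a temporary handler with the database host and port, or with an AWS service hostname and port 443.

```python
import socket
import time


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> dict:
    """Attempt a TCP connection and report how it ended."""
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except socket.timeout:
        # Packets silently dropped: typical of a security group block
        # or a missing route (no NAT Gateway / VPC endpoint).
        outcome = "timeout"
    except ConnectionRefusedError:
        # The host was reached, but nothing listens on that port.
        outcome = "refused"
    except socket.gaierror:
        # The hostname could not be resolved.
        outcome = "dns_failure"
    except OSError as exc:
        outcome = f"error: {exc}"
    elapsed = round(time.monotonic() - started, 3)
    return {"host": host, "port": port, "outcome": outcome, "seconds": elapsed}
```

Keeping `timeout` well below the function timeout means the result is logged before Lambda kills the invocation, which turns an opaque "Task timed out" into a specific network symptom.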

Step 3: Fix with Terraform - NAT Gateway Approach

# VPC Configuration
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true
}

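# Availability zones referenced by the subnets below
data "aws_availability_zones" "available" {
  state = "available"
}

# Public subnets (the NAT Gateway lives in the first one)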
resource "aws_subnet" "public" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "public-${count.index}"
  }
}

# Private subnets for Lambda
resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index + 10}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "private-${count.index}"
  }
}

# Internet Gateway
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

# Elastic IP for NAT Gateway
resource "aws_eip" "nat" {
  domain = "vpc"
}

# NAT Gateway in public subnet
resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public[0].id

  depends_on = [aws_internet_gateway.main]
}

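# Route table for public subnets
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}

# Associate public subnets with the public route table; without this the
# NAT Gateway's subnet has no route to the Internet Gateway
resource "aws_route_table_association" "public" {
  count          = 2
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}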

# Route table for private subnets (Lambda)
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id
  }
}

# Associate private subnets with private route table
resource "aws_route_table_association" "private" {
  count          = 2
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private.id
}

# Lambda security group
resource "aws_security_group" "lambda" {
  name   = "lambda-sg"
  vpc_id = aws_vpc.main.id

  # Allow all outbound traffic
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# RDS security group
resource "aws_security_group" "rds" {
  name   = "rds-sg"
  vpc_id = aws_vpc.main.id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.lambda.id]
  }
}

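# Lambda function
resource "aws_lambda_function" "main" {
  function_name = "my-function"
  runtime       = "python3.11"
  handler       = "handler.main"
  timeout       = 30
  memory_size   = 256

  # Placeholders: the deployment package and execution role are assumed to be
  # defined elsewhere. The role needs the AWSLambdaVPCAccessExecutionRole
  # managed policy so Lambda can create network interfaces in the VPC.
  filename = "lambda.zip"
  role     = aws_iam_role.lambda.arn

  vpc_config {
    subnet_ids         = aws_subnet.private[*].id
    security_group_ids = [aws_security_group.lambda.id]
  }

  environment {
    variables = {
      # .address is the hostname only; .endpoint also includes ":port"
      DB_HOST = aws_db_instance.main.address
    }
  }
}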
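
The security group pairing in the Terraform above (RDS accepts port 5432 only from the Lambda security group) can be verified against a live account. This is a minimal sketch, assuming the input is one entry of `SecurityGroups` from `aws ec2 describe-security-groups`. The function name is hypothetical, and the check deliberately covers only rules that reference a source security group, not CIDR-based rules.

```python
def allows_from_group(db_sg: dict, source_group_id: str, port: int) -> bool:
    """True if db_sg has an ingress rule admitting `port` from `source_group_id`."""
    for perm in db_sg.get("IpPermissions", []):
        protocol = perm.get("IpProtocol")
        if protocol == "-1":
            port_ok = True  # "all traffic" rule, no port range
        elif protocol == "tcp":
            port_ok = perm.get("FromPort", 0) <= port <= perm.get("ToPort", 65535)
        else:
            continue  # udp/icmp rules are irrelevant for a database connection
        sources = {pair.get("GroupId") for pair in perm.get("UserIdGroupPairs", [])}
        if port_ok and source_group_id in sources:
            return True
    return False
```

If this returns `False` for the RDS security group and the Lambda security group ID, the connection attempt is dropped silently and surfaces as a timeout rather than an error.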

Step 4: Better Approach - VPC Endpoints (No NAT costs)

# VPC Endpoint for Secrets Manager (Interface endpoint; billed hourly and per GB, unlike the free Gateway endpoints below)
resource "aws_vpc_endpoint" "secretsmanager" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.secretsmanager"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoint.id]
  private_dns_enabled = true
}

# VPC Endpoint for CloudWatch Logs
resource "aws_vpc_endpoint" "logs" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.logs"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoint.id]
  private_dns_enabled = true
}

# VPC Endpoint for S3 (Gateway endpoint - free)
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}

# VPC Endpoint for DynamoDB (Gateway endpoint - free)
resource "aws_vpc_endpoint" "dynamodb" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.dynamodb"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}

# Security group for VPC endpoints
resource "aws_security_group" "vpc_endpoint" {
  name   = "vpc-endpoint-sg"
  vpc_id = aws_vpc.main.id

  ingress {
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    security_groups = [aws_security_group.lambda.id]
  }
}
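
With `private_dns_enabled = true`, the regular service hostname should resolve to private addresses inside the VPC rather than to public ones. The following is a minimal sketch of that check, meant to run inside the function. The function names are hypothetical, and the `resolver` parameter exists only so the logic can be exercised without a real VPC.

```python
import ipaddress
import socket


def resolved_addresses(hostname, port=443, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses a hostname resolves to."""
    infos = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def resolves_inside_vpc(hostname, vpc_cidr, resolver=socket.getaddrinfo):
    """True if every resolved address falls inside the VPC CIDR block."""
    network = ipaddress.ip_network(vpc_cidr)
    addresses = resolved_addresses(hostname, resolver=resolver)
    return bool(addresses) and all(
        ipaddress.ip_address(addr) in network for addr in addresses
    )
```

For the VPC defined earlier, a call such as `resolves_inside_vpc("secretsmanager.us-east-1.amazonaws.com", "10.0.0.0/16")` would be expected to return `True` once the interface endpoint is active; the region in that hostname is an assumed example.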

Step 5: Optimize Cold Starts

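# handler.py - Connection reuse pattern
import json
import os

import psycopg2
from psycopg2 import pool

# Module-level state survives between invocations in a warm execution
# environment, so the pool is created once and then reused.
connection_pool = None

def get_connection_pool():
    global connection_pool
    if connection_pool is None:
        # An execution environment serves one invocation at a time,
        # so a single connection is usually all the pool ever hands out.
        connection_pool = pool.SimpleConnectionPool(
            minconn=1,
            maxconn=5,
            host=os.environ['DB_HOST'],
            database=os.environ['DB_NAME'],
            user=os.environ['DB_USER'],
            password=os.environ['DB_PASSWORD'],
            connect_timeout=5  # fail fast instead of running into the Lambda timeout
        )
    return connection_pool

def main(event, context):
    # Named db_pool so the local variable does not shadow the imported pool module
    db_pool = get_connection_pool()
    conn = db_pool.getconn()

    try:
        with conn.cursor() as cur:
            cur.execute("SELECT * FROM orders WHERE id = %s", (event['order_id'],))
            result = cur.fetchone()
        # default=str keeps Decimal and datetime columns JSON-serializable
        return {'statusCode': 200, 'body': json.dumps(result, default=str)}
    finally:
        db_pool.putconn(conn)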
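
The fixed `connect_timeout=5` in the handler works when the function has its full 30 seconds available. A minimal sketch of deriving the value from the time actually left is shown below. It relies on the Lambda context method `get_remaining_time_in_millis()`; the function name and the `reserve_ms`, `cap_seconds`, and `floor_seconds` parameters are hypothetical choices for illustration.

```python
def connect_timeout_seconds(context, reserve_ms=2000, cap_seconds=5, floor_seconds=1):
    """Pick a DB connect timeout that leaves time to log and return an error.

    reserve_ms is kept back for error handling, cap_seconds is the most
    time a connection attempt may take, and floor_seconds is the least.
    """
    remaining_ms = context.get_remaining_time_in_millis()
    budget_seconds = (remaining_ms - reserve_ms) // 1000
    return max(floor_seconds, min(cap_seconds, budget_seconds))
```

Because the connection attempt gives up before the invocation deadline, the logs show a database connection error instead of only "Task timed out".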

Step 6: Use RDS Proxy for Better Connection Management

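resource "aws_db_proxy" "main" {
  name                   = "rds-proxy"
  debug_logging          = false
  engine_family          = "POSTGRESQL"
  idle_client_timeout    = 1800
  require_tls            = true
  # Assumed to be defined elsewhere: an IAM role that lets the proxy read the secret
  role_arn               = aws_iam_role.rds_proxy.arn
  # Assumed to be defined elsewhere; the RDS security group must also allow ingress from it
  vpc_security_group_ids = [aws_security_group.rds_proxy.id]
  vpc_subnet_ids         = aws_subnet.private[*].id

  auth {
    auth_scheme               = "SECRETS"
    # REQUIRED means clients authenticate to the proxy with IAM tokens, not the DB password
    iam_auth                  = "REQUIRED"
    secret_arn                = aws_secretsmanager_secret.db_credentials.arn
  }
}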

resource "aws_db_proxy_default_target_group" "main" {
  db_proxy_name = aws_db_proxy.main.name

  connection_pool_config {
    max_connections_percent      = 100
    max_idle_connections_percent = 50
    connection_borrow_timeout    = 120
  }
}

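resource "aws_db_proxy_target" "main" {
  db_proxy_name          = aws_db_proxy.main.name
  target_group_name      = aws_db_proxy_default_target_group.main.name
  # identifier is the instance name; in AWS provider v5 the id attribute is the DBI resource ID
  db_instance_identifier = aws_db_instance.main.identifier
}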
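
Because the proxy above sets `iam_auth = "REQUIRED"`, the function connects with a short-lived IAM token in place of a stored password. Below is a minimal sketch that builds the keyword arguments for `psycopg2.connect` using boto3's `generate_db_auth_token`. The function name `proxy_connect_kwargs` is hypothetical, and the `rds_client` parameter is an assumption added so the logic can run without AWS credentials.

```python
def proxy_connect_kwargs(host, user, dbname, region, port=5432, rds_client=None):
    """Build psycopg2.connect() keyword arguments for IAM auth via RDS Proxy."""
    if rds_client is None:
        # boto3 is bundled with the Lambda Python runtime
        import boto3
        rds_client = boto3.client("rds", region_name=region)

    # The token is signed locally from the function's IAM credentials
    token = rds_client.generate_db_auth_token(
        DBHostname=host, Port=port, DBUsername=user, Region=region
    )
    return {
        "host": host,
        "port": port,
        "user": user,
        "dbname": dbname,
        "password": token,     # the token takes the place of the password
        "sslmode": "require",  # the proxy is configured with require_tls = true
    }
```

Usage would look like `psycopg2.connect(**proxy_connect_kwargs(proxy_host, "app_user", "orders", "us-east-1"))`, where the host, user, database, and region values are placeholders. Tokens expire after a short period, so a fresh one should be generated whenever a new connection is opened, and the function's role needs the `rds-db:connect` permission.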

Common Lambda VPC Issues

Issue                 | Symptom                   | Solution
No internet access    | Timeout calling AWS APIs  | Add NAT Gateway or VPC endpoints
ENI creation slow     | Cold starts over 10s      | Use provisioned concurrency
Connection exhaustion | "Too many connections"    | Use RDS Proxy
DNS resolution fails  | "Name resolution failed"  | Enable DNS support in VPC
Security group blocks | Timeout on specific ports | Check egress rules

Practice Question

Why do Lambda functions in a VPC lose internet connectivity by default?