Questions
Lambda functions are timing out when accessing RDS in a VPC. Debug the connectivity issue.
The Scenario
Your Lambda function suddenly started timing out:
Task timed out after 30.00 seconds
START RequestId: abc-123
END RequestId: abc-123
REPORT RequestId: abc-123 Duration: 30003.45 ms Billed Duration: 30000 ms Memory Size: 128 MB Max Memory Used: 45 MB Init Duration: 2534.12 ms
The function was working fine until you moved it into a VPC to access an RDS database.
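The REPORT line already hints at the kind of failure involved. As a rough sketch (the `summarize_report` helper and its field names are illustrative, not part of any AWS tooling), the line can be parsed to confirm that the run ended exactly at the configured timeout while memory use stayed low, which is typical of a function blocked on network I/O rather than doing heavy work:

```python
import re

# Matches the numeric fields of a Lambda REPORT log line.
REPORT_PATTERN = re.compile(
    r"Duration: (?P<duration>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed>\d+) ms\s+"
    r"Memory Size: (?P<memory_size>\d+) MB\s+"
    r"Max Memory Used: (?P<memory_used>\d+) MB"
)


def summarize_report(line, timeout_seconds):
    """Extract duration and memory figures and flag runs that hit the timeout."""
    match = REPORT_PATTERN.search(line)
    if match is None:
        raise ValueError("not a Lambda REPORT line")
    duration_ms = float(match["duration"])
    memory_size = int(match["memory_size"])
    memory_used = int(match["memory_used"])
    return {
        "duration_ms": duration_ms,
        "hit_timeout": duration_ms >= timeout_seconds * 1000,
        "memory_utilization": round(memory_used / memory_size, 2),
    }


report = (
    "REPORT RequestId: abc-123 Duration: 30003.45 ms Billed Duration: 30000 ms "
    "Memory Size: 128 MB Max Memory Used: 45 MB Init Duration: 2534.12 ms"
)
summary = summarize_report(report, timeout_seconds=30)
print(summary)
```

A run that hits the timeout with roughly a third of its memory in use points toward a stalled connection, not an under-provisioned function.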
The Challenge
Debug why the Lambda function times out in a VPC, identify the root cause, and implement a fix that maintains security while enabling connectivity.
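A junior engineer might increase the timeout to 5 minutes, move the Lambda into a public subnet, or try to attach an Elastic IP to it. None of these fix the root cause: a longer timeout only delays the failure, and a Lambda network interface in a VPC receives only a private IP address, so a public subnet's Internet Gateway route gives it no internet access while loosening the network boundary.
A senior engineer understands that traffic to RDS in the same VPC stays local and depends on security groups and subnet routing, while calls to AWS APIs (Secrets Manager, for example) need a NAT Gateway or VPC endpoints. They configure proper subnet routing and optimize for cold starts with connection reuse.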
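One way to tell these two paths apart from inside the function is a short TCP probe against each target. The sketch below uses only the standard library; `probe_tcp` is a hypothetical helper name, and the targets you pass in (the database host on port 5432, an AWS API hostname on port 443) are assumptions about your setup:

```python
import socket
import time


def probe_tcp(host, port, timeout=3.0):
    """Attempt a TCP connection and report the outcome and elapsed time."""
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "reachable"
    except socket.timeout:
        # Packets are being dropped silently: usually a missing route or a
        # security group that does not allow the traffic.
        outcome = "timed out"
    except OSError as exc:
        # For example ConnectionRefusedError: the host answered, but nothing
        # accepted the connection on that port.
        outcome = f"failed: {exc.__class__.__name__}"
    elapsed = time.monotonic() - started
    return {"target": f"{host}:{port}", "outcome": outcome, "seconds": round(elapsed, 2)}
```

If the probe to the database is reachable but the probe to an AWS API hostname times out, the problem is outbound routing (NAT Gateway or VPC endpoints), not the database. If the database probe itself times out, look at security groups first.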
Step 1: Understand the VPC Networking Issue
Lambda VPC Connectivity:
┌──────────────────────────────────────────────────────────────┐
│                             VPC                              │
│  ┌─────────────────┐              ┌─────────────────┐        │
│  │  Public Subnet  │              │ Private Subnet  │        │
│  │                 │              │                 │        │
│  │  ┌───────────┐  │              │  ┌───────────┐  │        │
│  │  │    NAT    │  │              │  │  Lambda   │  │        │
│  │  │  Gateway  │◄─┼──────────────┼──│ Function  │  │        │
│  │  └─────┬─────┘  │              │  └───────────┘  │        │
│  │        │        │              │        │        │        │
│  └────────┼────────┘              └────────┼────────┘        │
│           │                                │                 │
│           ▼                                ▼                 │
│    Internet Gateway                 RDS (same VPC)           │
│           │                                                  │
└───────────┼──────────────────────────────────────────────────┘
            ▼
       AWS Services
    (Secrets Manager,
    CloudWatch, etc.)
Step 2: Diagnose the Issue
# Check Lambda VPC configuration
aws lambda get-function-configuration \
  --function-name my-function \
  --query '{VpcConfig: VpcConfig, Timeout: Timeout}'
# Check if subnets have route to NAT Gateway
# (an empty result means the subnet has no explicit association and uses the VPC's main route table)
aws ec2 describe-route-tables \
  --filters "Name=association.subnet-id,Values=subnet-abc123" \
  --query 'RouteTables[].Routes[]'
# Verify security group allows outbound traffic
aws ec2 describe-security-groups \
  --group-ids sg-lambda123 \
  --query 'SecurityGroups[].{Egress: IpPermissionsEgress}'
# Check NAT Gateway status
aws ec2 describe-nat-gateways \
  --filter "Name=vpc-id,Values=vpc-123" \
  --query 'NatGateways[].{State: State, SubnetId: SubnetId}'
Step 3: Fix with Terraform - NAT Gateway Approach
# VPC Configuration
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true
}
# Availability zones referenced by the subnets below
data "aws_availability_zones" "available" {
  state = "available"
}
# Public subnet for NAT Gateway
resource "aws_subnet" "public" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]
  tags = {
    Name = "public-${count.index}"
  }
}
# Private subnets for Lambda
resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index + 10}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]
  tags = {
    Name = "private-${count.index}"
  }
}
# Internet Gateway
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}
# Elastic IP for NAT Gateway
resource "aws_eip" "nat" {
  domain = "vpc"
}
# NAT Gateway in public subnet
resource "aws_nat_gateway" "main" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public[0].id
  depends_on    = [aws_internet_gateway.main]
}
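# Route table for public subnets
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}
# Associate public subnets with public route table
# (without this, the NAT Gateway's subnet falls back to the main route table and has no path to the Internet Gateway)
resource "aws_route_table_association" "public" {
  count          = 2
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}
# Route table for private subnets (Lambda)
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main.id
  }
}
# Associate private subnets with private route table
resource "aws_route_table_association" "private" {
  count          = 2
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private.id
}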
# Lambda security group
resource "aws_security_group" "lambda" {
  name   = "lambda-sg"
  vpc_id = aws_vpc.main.id
  # Allow all outbound traffic
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
# RDS security group
resource "aws_security_group" "rds" {
  name   = "rds-sg"
  vpc_id = aws_vpc.main.id
  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.lambda.id]
  }
}
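# Lambda function
resource "aws_lambda_function" "main" {
  function_name = "my-function"
  runtime       = "python3.11"
  handler       = "handler.main"
  timeout       = 30
  memory_size   = 256
  # Assumed to be defined elsewhere; a VPC function's role needs the
  # AWSLambdaVPCAccessExecutionRole permissions to manage network interfaces
  role     = aws_iam_role.lambda.arn
  filename = "lambda.zip" # assumed path to the deployment package
  vpc_config {
    subnet_ids         = aws_subnet.private[*].id
    security_group_ids = [aws_security_group.lambda.id]
  }
  environment {
    variables = {
      # address is the hostname only; endpoint is "host:port" and would break the driver's host parameter
      DB_HOST = aws_db_instance.main.address
    }
  }
}
Step 4: Better Approach - VPC Endpoints (No NAT costs)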
# VPC Endpoint for Secrets Manager (Interface endpoint)
resource "aws_vpc_endpoint" "secretsmanager" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.secretsmanager"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoint.id]
  private_dns_enabled = true
}
# VPC Endpoint for CloudWatch Logs
# (needed when function code calls the Logs API directly; Lambda's own log delivery does not go through the VPC)
resource "aws_vpc_endpoint" "logs" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.logs"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoint.id]
  private_dns_enabled = true
}
# VPC Endpoint for S3 (Gateway endpoint - free)
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}
# VPC Endpoint for DynamoDB (Gateway endpoint - free)
resource "aws_vpc_endpoint" "dynamodb" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.dynamodb"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}
# Security group for VPC endpoints
resource "aws_security_group" "vpc_endpoint" {
  name   = "vpc-endpoint-sg"
  vpc_id = aws_vpc.main.id
  ingress {
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    security_groups = [aws_security_group.lambda.id]
  }
}
Step 5: Optimize Cold Starts
# handler.py - Connection reuse pattern
import json
import os

from psycopg2 import pool

# Initialize connection pool OUTSIDE handler (reused across invocations)
_connection_pool = None


def get_connection_pool():
    """Create the pool on first use; warm invocations reuse it."""
    global _connection_pool
    if _connection_pool is None:
        _connection_pool = pool.SimpleConnectionPool(
            minconn=1,
            maxconn=5,
            host=os.environ['DB_HOST'],
            database=os.environ['DB_NAME'],
            user=os.environ['DB_USER'],
            password=os.environ['DB_PASSWORD'],
            connect_timeout=5,  # fail fast instead of waiting for the Lambda timeout
        )
    return _connection_pool


def main(event, context):
    db_pool = get_connection_pool()
    conn = db_pool.getconn()
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT * FROM orders WHERE id = %s", (event['order_id'],))
            result = cur.fetchone()
        # default=str keeps Decimal and datetime columns JSON-serializable
        return {'statusCode': 200, 'body': json.dumps(result, default=str)}
    finally:
        # Always return the connection so the next invocation can reuse it
        db_pool.putconn(conn)
Step 6: Use RDS Proxy for Better Connection Management
resource "aws_db_proxy" "main" {
  name                   = "rds-proxy"
  debug_logging          = false
  engine_family          = "POSTGRESQL"
  idle_client_timeout    = 1800
  require_tls            = true
  # Assumed to be defined elsewhere; the role must be allowed to read the secret below
  role_arn               = aws_iam_role.rds_proxy.arn
  vpc_security_group_ids = [aws_security_group.rds_proxy.id]
  vpc_subnet_ids         = aws_subnet.private[*].id
  auth {
    auth_scheme = "SECRETS"
    # With REQUIRED, clients authenticate to the proxy with an IAM auth token, not the database password
    iam_auth    = "REQUIRED"
    secret_arn  = aws_secretsmanager_secret.db_credentials.arn
  }
}
resource "aws_db_proxy_default_target_group" "main" {
  db_proxy_name = aws_db_proxy.main.name
  connection_pool_config {
    max_connections_percent      = 100
    max_idle_connections_percent = 50
    connection_borrow_timeout    = 120
  }
}
resource "aws_db_proxy_target" "main" {
  db_proxy_name          = aws_db_proxy.main.name
  target_group_name      = aws_db_proxy_default_target_group.main.name
  # identifier is the instance name; in AWS provider v5 the id attribute is the DBI resource ID
  db_instance_identifier = aws_db_instance.main.identifier
}
Common Lambda VPC Issues
| Issue | Symptom | Solution |
|---|---|---|
| No internet access | Timeout calling AWS APIs | Add NAT Gateway or VPC endpoints |
| ENI creation slow | Cold starts over 10s (rare since ENI setup moved to function creation time in 2019) | Use provisioned concurrency |
| Connection exhaustion | "Too many connections" | Use RDS Proxy |
| DNS resolution fails | "Name resolution failed" | Enable DNS support in VPC |
| Security group blocks | Timeout on specific ports | Check egress rules |
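The "Security group blocks" row can be checked mechanically. The sketch below assumes the `IpPermissions` list has been loaded from `aws ec2 describe-security-groups` output for the database's security group; `allows_ingress_from_group` is a hypothetical helper, and the sample rule and group IDs are made up for illustration:

```python
def allows_ingress_from_group(ip_permissions, source_group_id, port, protocol="tcp"):
    """Return True if any ingress rule admits `port` from `source_group_id`."""
    for rule in ip_permissions:
        rule_protocol = rule.get("IpProtocol")
        if rule_protocol == "-1":
            # "-1" means all protocols and all ports.
            port_matches = True
        elif rule_protocol == protocol:
            port_matches = rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)
        else:
            continue
        if not port_matches:
            continue
        group_pairs = rule.get("UserIdGroupPairs", [])
        if any(pair.get("GroupId") == source_group_id for pair in group_pairs):
            return True
    return False


# Illustrative ingress rules for the RDS security group.
rds_ingress = [
    {
        "IpProtocol": "tcp",
        "FromPort": 5432,
        "ToPort": 5432,
        "UserIdGroupPairs": [{"GroupId": "sg-lambda123"}],
    }
]

print(allows_ingress_from_group(rds_ingress, "sg-lambda123", 5432))
print(allows_ingress_from_group(rds_ingress, "sg-other456", 5432))
```

This only covers rules that reference a source security group, which is the pattern used for the RDS security group in Step 3; CIDR-based rules would need a separate check against `IpRanges`.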
Practice Question
Why do Lambda functions in a VPC lose internet connectivity by default?
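A useful starting point for the answer: a Lambda network interface in a VPC gets only a private IP address, so a default route to an Internet Gateway does not help it, while a default route to a NAT Gateway does. The sketch below classifies a subnet from the `Routes` list returned by `aws ec2 describe-route-tables`; the function names and sample route IDs are hypothetical:

```python
def classify_subnet_routes(routes):
    """Classify a subnet by where its default route (0.0.0.0/0) points."""
    for route in routes:
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("State") != "active":
            return "isolated"  # e.g. a blackholed route to a deleted NAT Gateway
        if route.get("NatGatewayId"):
            return "private with NAT"
        if (route.get("GatewayId") or "").startswith("igw-"):
            return "public"
    return "isolated"


def lambda_has_internet(routes):
    """Lambda ENIs have no public IP, so only a NAT default route gives internet access."""
    return classify_subnet_routes(routes) == "private with NAT"


local_route = {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"}
nat_routes = [local_route, {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0abc", "State": "active"}]
igw_routes = [local_route, {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0abc", "State": "active"}]

for name, routes in [("nat", nat_routes), ("igw", igw_routes), ("local only", [local_route])]:
    print(name, classify_subnet_routes(routes), lambda_has_internet(routes))
```

Note that a subnet classified as "isolated" can still reach RDS in the same VPC through the local route, and can reach AWS services for which VPC endpoints exist.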