Design a multi-tier VPC architecture with public, private, and database subnets.
The Scenario
You’re designing the network architecture for a new application:
Requirements:
├── Web tier: Public-facing load balancer
├── App tier: Private EC2/ECS instances
├── Data tier: RDS and ElastiCache
├── Security: No direct internet access to app/data tiers
├── Compliance: All traffic logged
└── High availability: Multi-AZ deployment
The Challenge
Design a secure, scalable VPC architecture that properly isolates tiers, enables necessary connectivity, and follows AWS best practices.
Wrong Approach
A junior engineer might put everything in public subnets, use one security group for all resources, skip NAT Gateways to save costs, or use a single AZ. These approaches create security risks, violate least privilege, break high availability, and fail compliance audits.
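To make the contrast concrete, the flat layout described above might look roughly like the Terraform sketch below. The resource names (`flat`, `everything`, `shared`) and the specific open ports are illustrative assumptions, not taken from a real deployment; the block is here only to show what the later steps avoid.

```hcl
# ANTI-PATTERN, shown for contrast only. All names here are hypothetical.
resource "aws_vpc" "flat" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_internet_gateway" "flat" {
  vpc_id = aws_vpc.flat.id
}

# One public subnet in a single AZ holds every tier, so an AZ outage takes
# down the load balancer, the application, and the database together.
resource "aws_subnet" "everything" {
  vpc_id                  = aws_vpc.flat.id
  cidr_block              = "10.0.0.0/24"
  map_public_ip_on_launch = true # instances launched here get public IPs
}

# Default route to the Internet Gateway: no NAT Gateway, no private tier.
resource "aws_route_table" "flat" {
  vpc_id = aws_vpc.flat.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.flat.id
  }
}

resource "aws_route_table_association" "flat" {
  subnet_id      = aws_subnet.everything.id
  route_table_id = aws_route_table.flat.id
}

# One security group shared by web, app, and data resources: any port opened
# for one tier is opened for all of them.
resource "aws_security_group" "shared" {
  name        = "shared-sg"
  description = "Shared by every resource (anti-pattern)"
  vpc_id      = aws_vpc.flat.id

  ingress {
    description = "HTTPS, intended for the web tier only"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "PostgreSQL, exposed to the internet because the group is shared"
    from_port   = 5432
    to_port     = 5432
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

Each problem named above is visible here: no tier isolation, no per-resource security groups, a single AZ, and no flow logging.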
Right Approach
A senior engineer designs with proper subnet tiers, security groups per resource type, NAT Gateways for outbound traffic, VPC Flow Logs for auditing, and multi-AZ deployment for high availability.
Step 1: VPC Architecture Overview
Multi-Tier VPC Architecture:
┌─────────────────────────────────────────────────────────────────────────────┐
│                                     VPC                                     │
│                                 10.0.0.0/16                                 │
│                                                                             │
│         Availability Zone A                   Availability Zone B           │
│       ┌──────────────────────┐              ┌──────────────────────┐        │
│       │  Public Subnet       │              │  Public Subnet       │        │
│       │  10.0.1.0/24         │              │  10.0.2.0/24         │        │
│       │  ┌────────────────┐  │              │  ┌────────────────┐  │        │
│       │  │ NAT Gateway    │  │              │  │ NAT Gateway    │  │        │
│       │  │ ALB            │  │              │  │ ALB            │  │        │
│       │  └────────────────┘  │              │  └────────────────┘  │        │
│       └──────────────────────┘              └──────────────────────┘        │
│                                                                             │
│       ┌──────────────────────┐              ┌──────────────────────┐        │
│       │  Private Subnet      │              │  Private Subnet      │        │
│       │  10.0.11.0/24        │              │  10.0.12.0/24        │        │
│       │  ┌────────────────┐  │              │  ┌────────────────┐  │        │
│       │  │ EC2/ECS Tasks  │  │              │  │ EC2/ECS Tasks  │  │        │
│       │  │ Lambda         │  │              │  │ Lambda         │  │        │
│       │  └────────────────┘  │              │  └────────────────┘  │        │
│       └──────────────────────┘              └──────────────────────┘        │
│                                                                             │
│       ┌──────────────────────┐              ┌──────────────────────┐        │
│       │  Database Subnet     │              │  Database Subnet     │        │
│       │  10.0.21.0/24        │              │  10.0.22.0/24        │        │
│       │  ┌────────────────┐  │              │  ┌────────────────┐  │        │
│       │  │ RDS Primary    │  │◄────────────►│  │ RDS Standby    │  │        │
│       │  │ ElastiCache    │  │              │  │ ElastiCache    │  │        │
│       │  └────────────────┘  │              │  └────────────────┘  │        │
│       └──────────────────────┘              └──────────────────────┘        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Step 2: VPC and Subnet Configuration
# Availability Zones to spread the subnets across
# (referenced by the subnet resources below)
data "aws_availability_zones" "available" {
  state = "available"
}

# VPC
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "production-vpc"
  }
}

# Public Subnets (for ALB, NAT Gateway)
resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.${count.index + 1}.0/24"
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "public-${data.aws_availability_zones.available.names[count.index]}"
    Tier = "public"
  }
}

# Private Subnets (for application tier)
resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index + 11}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "private-${data.aws_availability_zones.available.names[count.index]}"
    Tier = "private"
  }
}

# Database Subnets (isolated tier)
resource "aws_subnet" "database" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index + 21}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "database-${data.aws_availability_zones.available.names[count.index]}"
    Tier = "database"
  }
}

# DB Subnet Group
resource "aws_db_subnet_group" "main" {
  name       = "main-db-subnet-group"
  subnet_ids = aws_subnet.database[*].id

  tags = {
    Name = "Main DB Subnet Group"
  }
}

Step 3: Internet Gateway and NAT Gateways
# Internet Gateway
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "main-igw"
  }
}

# Elastic IPs for NAT Gateways
resource "aws_eip" "nat" {
  count  = 2
  domain = "vpc"

  tags = {
    Name = "nat-eip-${count.index}"
  }
}

# NAT Gateways (one per AZ for HA)
resource "aws_nat_gateway" "main" {
  count         = 2
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = {
    Name = "nat-gw-${count.index}"
  }

  depends_on = [aws_internet_gateway.main]
}

Step 4: Route Tables
# Public Route Table
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "public-rt"
  }
}

resource "aws_route_table_association" "public" {
  count          = 2
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

# Private Route Tables (one per AZ for AZ-local NAT)
resource "aws_route_table" "private" {
  count  = 2
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[count.index].id
  }

  tags = {
    Name = "private-rt-${count.index}"
  }
}

resource "aws_route_table_association" "private" {
  count          = 2
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

# Database Route Table (no internet access)
resource "aws_route_table" "database" {
  vpc_id = aws_vpc.main.id

  # No default route - isolated from internet

  tags = {
    Name = "database-rt"
  }
}

resource "aws_route_table_association" "database" {
  count          = 2
  subnet_id      = aws_subnet.database[count.index].id
  route_table_id = aws_route_table.database.id
}

Step 5: Security Groups
# ALB Security Group
resource "aws_security_group" "alb" {
  name        = "alb-sg"
  description = "Security group for Application Load Balancer"
  vpc_id      = aws_vpc.main.id

  ingress {
    description = "HTTPS from internet"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "HTTP for redirect"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "alb-sg"
  }
}

# Application Security Group
resource "aws_security_group" "app" {
  name        = "app-sg"
  description = "Security group for application tier"
  vpc_id      = aws_vpc.main.id

  ingress {
    description     = "Traffic from ALB"
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "app-sg"
  }
}

# Database Security Group
resource "aws_security_group" "database" {
  name        = "database-sg"
  description = "Security group for database tier"
  vpc_id      = aws_vpc.main.id

  ingress {
    description     = "PostgreSQL from app tier"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
  }

  # No egress rules: security groups are stateful, so replies to the app tier
  # are allowed automatically. Omitting the egress block also makes Terraform
  # remove the default allow-all egress rule.

  tags = {
    Name = "database-sg"
  }
}

# ElastiCache Security Group
resource "aws_security_group" "cache" {
  name        = "cache-sg"
  description = "Security group for ElastiCache"
  vpc_id      = aws_vpc.main.id

  ingress {
    description     = "Redis from app tier"
    from_port       = 6379
    to_port         = 6379
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
  }

  tags = {
    Name = "cache-sg"
  }
}

Step 6: VPC Flow Logs
# CloudWatch Log Group for Flow Logs
resource "aws_cloudwatch_log_group" "flow_logs" {
  name              = "/aws/vpc/flow-logs"
  retention_in_days = 30
}

# IAM Role for Flow Logs
resource "aws_iam_role" "flow_logs" {
  name = "vpc-flow-logs-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "vpc-flow-logs.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy" "flow_logs" {
  name = "vpc-flow-logs-policy"
  role = aws_iam_role.flow_logs.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams"
      ]
      Resource = "*"
    }]
  })
}

# VPC Flow Logs
resource "aws_flow_log" "main" {
  vpc_id                   = aws_vpc.main.id
  traffic_type             = "ALL"
  log_destination_type     = "cloud-watch-logs"
  log_destination          = aws_cloudwatch_log_group.flow_logs.arn
  iam_role_arn             = aws_iam_role.flow_logs.arn
  max_aggregation_interval = 60

  tags = {
    Name = "main-vpc-flow-logs"
  }
}

Step 7: VPC Endpoints for AWS Services
# Region used to build the endpoint service names below.
# (Declaration added so that var.region resolves; supply your deployment region.)
variable "region" {
  description = "AWS region for VPC endpoint service names"
  type        = string
}

# Gateway Endpoints (free)
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"

  route_table_ids = concat(
    aws_route_table.private[*].id,
    [aws_route_table.database.id]
  )

  tags = {
    Name = "s3-endpoint"
  }
}

resource "aws_vpc_endpoint" "dynamodb" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.dynamodb"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = aws_route_table.private[*].id

  tags = {
    Name = "dynamodb-endpoint"
  }
}

# Interface Endpoints (for services that need them)
resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true

  tags = {
    Name = "ecr-api-endpoint"
  }
}

resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true

  tags = {
    Name = "ecr-dkr-endpoint"
  }
}

resource "aws_vpc_endpoint" "secretsmanager" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.secretsmanager"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true

  tags = {
    Name = "secretsmanager-endpoint"
  }
}

# Security Group for VPC Endpoints
resource "aws_security_group" "vpc_endpoints" {
  name        = "vpc-endpoints-sg"
  description = "Security group for VPC endpoints"
  vpc_id      = aws_vpc.main.id

  ingress {
    description = "HTTPS from VPC"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.main.cidr_block]
  }

  tags = {
    Name = "vpc-endpoints-sg"
  }
}

Step 8: Network ACLs (Defense in Depth)
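# Database NACL - additional layer of security
resource "aws_network_acl" "database" {
  vpc_id     = aws_vpc.main.id
  subnet_ids = aws_subnet.database[*].id

  # Allow inbound PostgreSQL from private subnets only
  ingress {
    protocol   = "tcp"
    rule_no    = 100
    action     = "allow"
    cidr_block = "10.0.11.0/24"
    from_port  = 5432
    to_port    = 5432
  }

  ingress {
    protocol   = "tcp"
    rule_no    = 101
    action     = "allow"
    cidr_block = "10.0.12.0/24"
    from_port  = 5432
    to_port    = 5432
  }

  # Allow inbound Redis (ElastiCache) from private subnets only
  ingress {
    protocol   = "tcp"
    rule_no    = 110
    action     = "allow"
    cidr_block = "10.0.11.0/24"
    from_port  = 6379
    to_port    = 6379
  }

  ingress {
    protocol   = "tcp"
    rule_no    = 111
    action     = "allow"
    cidr_block = "10.0.12.0/24"
    from_port  = 6379
    to_port    = 6379
  }

  # No VPC-wide ephemeral-port ingress rule: the range 1024-65535 includes
  # 5432 and 6379, so it would let any subnet reach the database ports at the
  # NACL layer. The egress rules below only permit replies to the private
  # subnets, so this tier cannot initiate connections and needs no inbound
  # return-traffic rule.

  # Allow return traffic to clients in the private subnets (ephemeral ports)
  egress {
    protocol   = "tcp"
    rule_no    = 100
    action     = "allow"
    cidr_block = "10.0.11.0/24"
    from_port  = 1024
    to_port    = 65535
  }

  egress {
    protocol   = "tcp"
    rule_no    = 101
    action     = "allow"
    cidr_block = "10.0.12.0/24"
    from_port  = 1024
    to_port    = 65535
  }

  tags = {
    Name = "database-nacl"
  }
}

VPC Design Best Practices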
| Component | Recommendation | Purpose |
|---|---|---|
| CIDR Block | /16 for VPC, /24 for subnets | Room for growth |
| AZs | Minimum 2, prefer 3 | High availability |
| NAT Gateway | One per AZ | AZ-independent failover |
| Security Groups | Per resource type | Least privilege |
| NACLs | Database tier | Defense in depth |
| Flow Logs | All traffic | Compliance and debugging |
| VPC Endpoints | S3, DynamoDB, ECR | Reduce NAT costs |
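The "/16 for VPC, /24 for subnets" and "prefer 3 AZs" rows can be combined by deriving the subnet CIDRs instead of hard-coding two per tier. The sketch below is one possible way to do that with Terraform's built-in `cidrsubnet` function; the variable names `vpc_cidr` and `az_count`, the local names, and the `subnet_plan` output are hypothetical and not part of the configuration above. It uses only variables, locals, and an output, so it can be evaluated without creating any AWS resources.

```hcl
# Hypothetical inputs; the steps above hard-code 10.0.0.0/16 and two AZs.
variable "vpc_cidr" {
  description = "VPC CIDR block (assumed to be a /16)"
  type        = string
  default     = "10.0.0.0/16"
}

variable "az_count" {
  description = "Number of Availability Zones to span"
  type        = number
  default     = 3

  validation {
    # Offsets 1, 11, and 21 leave ten /24 slots per tier before ranges collide.
    condition     = var.az_count >= 2 && var.az_count <= 10
    error_message = "az_count must be between 2 and 10."
  }
}

locals {
  # cidrsubnet(prefix, newbits, netnum): adding 8 bits to a /16 yields /24
  # blocks, and netnum selects which /24 (1 -> 10.0.1.0/24, 11 -> 10.0.11.0/24).
  public_cidrs   = [for i in range(var.az_count) : cidrsubnet(var.vpc_cidr, 8, i + 1)]
  private_cidrs  = [for i in range(var.az_count) : cidrsubnet(var.vpc_cidr, 8, i + 11)]
  database_cidrs = [for i in range(var.az_count) : cidrsubnet(var.vpc_cidr, 8, i + 21)]
}

output "subnet_plan" {
  value = {
    public   = local.public_cidrs
    private  = local.private_cidrs
    database = local.database_cidrs
  }
}
```

With the defaults this keeps the same numbering scheme as Step 2 and adds a third subnet per tier (10.0.3.0/24, 10.0.13.0/24, 10.0.23.0/24). A subnet resource could then use `count = var.az_count` and `cidr_block = local.public_cidrs[count.index]`; any rule that lists the private CIDRs explicitly, such as the database NACL, would need the same treatment.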
Practice Question
Why should you deploy NAT Gateways in multiple Availability Zones?
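One way to reason about this question is to compare which NAT Gateway each AZ's private route table depends on under the two layouts. The sketch below uses only variables and locals, so it can be evaluated without creating resources; `az_count`, `single_nat_gateway`, and the local and output names are hypothetical and do not appear in the steps above.

```hcl
# Hypothetical inputs; Steps 3 and 4 hard-code two AZs with one NAT Gateway each.
variable "az_count" {
  description = "Number of Availability Zones in use"
  type        = number
  default     = 2
}

variable "single_nat_gateway" {
  description = "If true, all private subnets share the NAT Gateway in the first AZ"
  type        = bool
  default     = false
}

locals {
  # How many NAT Gateways (and Elastic IPs) the layout would need.
  nat_gateway_count = var.single_nat_gateway ? 1 : var.az_count

  # For each AZ index, the index of the NAT Gateway its private route table targets.
  nat_index_by_az = [
    for az in range(var.az_count) : var.single_nat_gateway ? 0 : az
  ]

  # AZ indexes whose private subnets lose outbound access if the first AZ fails.
  azs_without_egress_if_az0_fails = [
    for az in range(var.az_count) : az if local.nat_index_by_az[az] == 0
  ]
}

output "nat_gateway_count" {
  value = local.nat_gateway_count
}

output "nat_index_by_az" {
  value = local.nat_index_by_az
}

output "azs_without_egress_if_az0_fails" {
  value = local.azs_without_egress_if_az0_fails
}
```

With the default one-per-AZ layout, only the first AZ depends on the gateway in the first AZ. Setting `single_nat_gateway = true` maps every AZ to index 0, so a failure of that AZ removes outbound access for all private subnets, and traffic from the other AZs has to cross an AZ boundary to reach the gateway, which typically incurs cross-AZ data transfer charges.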