DynamoDB is throttling requests and costs are high. Optimize the table design.
The Scenario
Your DynamoDB table is having issues:
Current Problems:
├── ProvisionedThroughputExceededException: 500+ per hour
├── Monthly cost: $2,400 (10x expected)
├── Hot partition detected: "USER#premium"
├── Table size: 50GB
├── Read capacity: 5,000 RCU (provisioned)
├── Write capacity: 1,000 WCU (provisioned)
└── GSI count: 5 (all with same partition key)
The Challenge
Optimize the DynamoDB table design to eliminate throttling, reduce costs, and handle traffic spikes without over-provisioning.
Wrong Approach
A junior engineer might simply increase provisioned capacity, enable auto-scaling with high maximums, or add more GSIs. These approaches raise costs without fixing the root cause: each partition has its own throughput limit, so extra table-level capacity cannot relieve a hot partition, and every additional GSI duplicates storage and write costs.
Right Approach
A senior engineer analyzes access patterns, fixes partition key design for even distribution, uses write sharding for hot keys, implements efficient GSIs, and chooses the right capacity mode based on traffic patterns.
Step 1: Analyze Current Access Patterns
# Check table metrics
aws cloudwatch get-metric-statistics \
--namespace AWS/DynamoDB \
--metric-name ConsumedReadCapacityUnits \
--dimensions Name=TableName,Value=orders \
--start-time 2024-01-01T00:00:00Z \
--end-time 2024-01-02T00:00:00Z \
--period 3600 \
--statistics Sum
# Check for throttling (use ReadThrottleEvents / WriteThrottleEvents; ThrottledRequests also requires an Operation dimension)
aws cloudwatch get-metric-statistics \
--namespace AWS/DynamoDB \
--metric-name ReadThrottleEvents \
--dimensions Name=TableName,Value=orders \
--start-time 2024-01-01T00:00:00Z \
--end-time 2024-01-02T00:00:00Z \
--period 300 \
--statistics Sum
# Describe table
aws dynamodb describe-table --table-name orders \
--query '{PartitionKey: Table.KeySchema, ItemCount: Table.ItemCount, Size: Table.TableSizeBytes}'
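The CLI output above shows aggregate consumption and throttling, but not which partition keys are hot. A minimal sketch, assuming the same orders table, that enables CloudWatch Contributor Insights so the most-accessed keys show up as a ranked rule in CloudWatch:
import boto3

dynamodb = boto3.client('dynamodb')

# Enable Contributor Insights on the base table; pass IndexName=... to cover a GSI as well.
dynamodb.update_contributor_insights(
    TableName='orders',
    ContributorInsightsAction='ENABLE'
)

# Status moves from ENABLING to ENABLED once the CloudWatch rules exist.
status = dynamodb.describe_contributor_insights(TableName='orders')
print(status['ContributorInsightsStatus'])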
Step 2: Fix Hot Partition with Write Sharding
# BEFORE: Hot partition key
# PK: "USER#premium" receives 80% of writes
# AFTER: Write sharding
import concurrent.futures
import random

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('orders')

SHARD_COUNT = 10

def get_sharded_pk(user_type: str) -> str:
    """Add random shard suffix to distribute writes."""
    shard = random.randint(0, SHARD_COUNT - 1)
    return f"{user_type}#{shard}"

def put_order(order: dict):
    """Write order with sharded partition key."""
    item = {
        'PK': get_sharded_pk(order['user_type']),
        'SK': f"ORDER#{order['order_id']}",
        'user_id': order['user_id'],
        'amount': order['amount'],
        'created_at': order['created_at'],
        # Store original PK for queries
        'user_type': order['user_type']
    }
    table.put_item(Item=item)

def query_orders_by_type(user_type: str) -> list:
    """Query all shards in parallel."""
    def query_shard(shard: int):
        response = table.query(
            KeyConditionExpression=Key('PK').eq(f"{user_type}#{shard}")
        )
        return response['Items']

    all_items = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=SHARD_COUNT) as executor:
        futures = [executor.submit(query_shard, i) for i in range(SHARD_COUNT)]
        for future in concurrent.futures.as_completed(futures):
            all_items.extend(future.result())
    return all_items
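One caveat with random suffixes: a single item can only be located by querying every shard. A variation worth sketching here (not part of the original steps) derives the shard from a stable attribute, so the partition key is recomputable at read time; it reuses SHARD_COUNT and table from the block above:
import hashlib

def get_calculated_shard_pk(user_type: str, order_id: str) -> str:
    """Derive the shard from a hash of the order id, so reads can recompute it."""
    digest = hashlib.md5(order_id.encode('utf-8')).hexdigest()
    return f"{user_type}#{int(digest, 16) % SHARD_COUNT}"

def get_order(user_type: str, order_id: str):
    """Single GetItem instead of fanning out across all shards."""
    response = table.get_item(
        Key={
            'PK': get_calculated_shard_pk(user_type, order_id),
            'SK': f"ORDER#{order_id}"
        }
    )
    return response.get('Item')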
resource "aws_dynamodb_table" "orders" {
name = "orders"
billing_mode = "PAY_PER_REQUEST" # On-demand for variable traffic
hash_key = "PK"
range_key = "SK"
attribute {
name = "PK"
type = "S"
}
attribute {
name = "SK"
type = "S"
}
attribute {
name = "GSI1PK"
type = "S"
}
attribute {
name = "GSI1SK"
type = "S"
}
# Single GSI with overloaded keys (instead of 5 GSIs)
global_secondary_index {
name = "GSI1"
hash_key = "GSI1PK"
range_key = "GSI1SK"
projection_type = "ALL"
}
# TTL for automatic cleanup
ttl {
attribute_name = "expires_at"
enabled = true
}
# Point-in-time recovery
point_in_time_recovery {
enabled = true
}
tags = {
Environment = "production"
}
}Step 4: Single Table Design Pattern
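The ttl block above enables expiry on expires_at, but nothing in the design sets that attribute. A minimal sketch, assuming the same orders table and an illustrative 90-day retention window, that writes the epoch-seconds timestamp DynamoDB TTL expects:
import time

import boto3

table = boto3.resource('dynamodb').Table('orders')

RETENTION_SECONDS = 90 * 24 * 3600  # illustrative retention, not from the original design

def put_with_ttl(item: dict):
    """TTL requires a Number attribute holding an epoch timestamp in seconds."""
    item['expires_at'] = int(time.time()) + RETENTION_SECONDS
    table.put_item(Item=item)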
"""
Single Table Design - Multiple entity types in one table
Entity Types:
- USER: User profile data
- ORDER: Order records
- PRODUCT: Product catalog
Access Patterns:
1. Get user by ID
2. Get all orders for a user
3. Get order by ID
4. Get orders by status (GSI)
5. Get products by category (GSI)
"""
# Item structure examples
items = [
# User entity
{
'PK': 'USER#user123',
'SK': 'PROFILE',
'entity_type': 'USER',
'email': 'user@example.com',
'name': 'John Doe',
'GSI1PK': 'USER', # For listing all users
'GSI1SK': 'user123'
},
# Order entity
{
'PK': 'USER#user123',
'SK': 'ORDER#2024-01-15#order456',
'entity_type': 'ORDER',
'order_id': 'order456',
'amount': 99.99,
'status': 'SHIPPED',
'GSI1PK': 'ORDER#SHIPPED', # For querying by status
'GSI1SK': '2024-01-15#order456'
},
# Product entity
{
'PK': 'PRODUCT#prod789',
'SK': 'DETAILS',
'entity_type': 'PRODUCT',
'name': 'Widget',
'price': 29.99,
'GSI1PK': 'CATEGORY#electronics', # For querying by category
'GSI1SK': 'PRODUCT#prod789'
}
]
# Access pattern implementations
def get_user(user_id: str):
"""Get user profile."""
response = table.get_item(
Key={'PK': f'USER#{user_id}', 'SK': 'PROFILE'}
)
return response.get('Item')
def get_user_orders(user_id: str, limit: int = 20):
"""Get all orders for a user, sorted by date."""
response = table.query(
KeyConditionExpression=Key('PK').eq(f'USER#{user_id}') & Key('SK').begins_with('ORDER#'),
ScanIndexForward=False, # Descending order
Limit=limit
)
return response['Items']
def get_orders_by_status(status: str, limit: int = 100):
"""Get orders by status using GSI."""
response = table.query(
IndexName='GSI1',
KeyConditionExpression=Key('GSI1PK').eq(f'ORDER#{status}'),
Limit=limit
)
return response['Items']
def get_products_by_category(category: str):
"""Get products by category using GSI."""
response = table.query(
IndexName='GSI1',
KeyConditionExpression=Key('GSI1PK').eq(f'CATEGORY#{category}')
)
return response['Items']Step 5: Batch Operations for Efficiency
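Each query above returns at most one page (up to 1 MB) of results. A hedged sketch, continuing with the table and Key objects used above, of following LastEvaluatedKey so a caller can read a user's full order history; paginate_user_orders is an illustrative name, not part of the original access-pattern list:
def paginate_user_orders(user_id: str, page_size: int = 100):
    """Yield every order for a user, following LastEvaluatedKey across pages."""
    kwargs = {
        'KeyConditionExpression': Key('PK').eq(f'USER#{user_id}') & Key('SK').begins_with('ORDER#'),
        'ScanIndexForward': False,
        'Limit': page_size,
    }
    while True:
        response = table.query(**kwargs)
        yield from response['Items']
        last_key = response.get('LastEvaluatedKey')
        if not last_key:
            break
        kwargs['ExclusiveStartKey'] = last_key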
Step 5: Batch Operations for Efficiency
import time

import boto3
from boto3.dynamodb.types import TypeSerializer

dynamodb = boto3.client('dynamodb')
serializer = TypeSerializer()

def batch_write_orders(orders: list):
    """Batch write up to 25 items at a time."""
    # DynamoDB limit: 25 items per batch
    batch_size = 25
    for i in range(0, len(orders), batch_size):
        batch = orders[i:i + batch_size]
        request_items = {
            'orders': [
                {
                    'PutRequest': {
                        'Item': {k: serializer.serialize(v) for k, v in order.items()}
                    }
                }
                for order in batch
            ]
        }
        response = dynamodb.batch_write_item(RequestItems=request_items)

        # Retry unprocessed items (throttling) with exponential backoff
        delay = 0.1
        unprocessed = response.get('UnprocessedItems', {})
        while unprocessed:
            time.sleep(delay)
            delay = min(delay * 2, 5.0)
            response = dynamodb.batch_write_item(RequestItems=unprocessed)
            unprocessed = response.get('UnprocessedItems', {})

def batch_get_orders(order_keys: list):
    """Batch get up to 100 items at a time."""
    batch_size = 100
    all_items = []
    for i in range(0, len(order_keys), batch_size):
        batch = order_keys[i:i + batch_size]
        request_items = {
            'orders': {
                'Keys': [
                    {
                        'PK': {'S': key['PK']},
                        'SK': {'S': key['SK']}
                    }
                    for key in batch
                ]
            }
        }
        response = dynamodb.batch_get_item(RequestItems=request_items)
        all_items.extend(response['Responses']['orders'])
        # UnprocessedKeys in the response should be retried the same way as
        # UnprocessedItems above if the batch is throttled.
    return all_items
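For writes, boto3's higher-level Table resource also provides batch_writer(), which handles the 25-item chunking and unprocessed-item retries shown above automatically; a minimal sketch assuming the same orders table:
import boto3

table = boto3.resource('dynamodb').Table('orders')

def batch_write_orders_simple(orders: list):
    """batch_writer() chunks requests and resends unprocessed items for you."""
    with table.batch_writer() as batch:
        for order in orders:
            # Note: the resource API requires Decimal (not float) for numeric attributes.
            batch.put_item(Item=order)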
Step 6: Cost Optimization Strategies
# On-demand for unpredictable traffic
resource "aws_dynamodb_table" "orders_ondemand" {
  name         = "orders"
  billing_mode = "PAY_PER_REQUEST"

  # Minimal key schema so the example validates
  hash_key = "PK"

  attribute {
    name = "PK"
    type = "S"
  }

  # Automatically scales
  # Pay only for what you use
  # Best for: variable traffic, new applications
}

# Provisioned with auto-scaling for predictable traffic
resource "aws_dynamodb_table" "orders_provisioned" {
  name           = "orders"
  billing_mode   = "PROVISIONED"
  read_capacity  = 100
  write_capacity = 50
  hash_key       = "PK"

  attribute {
    name = "PK"
    type = "S"
  }
}

resource "aws_appautoscaling_target" "read" {
  max_capacity       = 1000
  min_capacity       = 100
  resource_id        = "table/${aws_dynamodb_table.orders_provisioned.name}"
  scalable_dimension = "dynamodb:table:ReadCapacityUnits"
  service_namespace  = "dynamodb"
}

resource "aws_appautoscaling_policy" "read" {
  name               = "DynamoDBReadAutoScaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.read.resource_id
  scalable_dimension = aws_appautoscaling_target.read.scalable_dimension
  service_namespace  = aws_appautoscaling_target.read.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "DynamoDBReadCapacityUtilization"
    }
    target_value       = 70.0
    scale_in_cooldown  = 60
    scale_out_cooldown = 60
  }
}

# Reserved capacity for consistent workloads (save up to 76%)
# Must be purchased through AWS Console or CLI
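Switching an existing table between the two billing modes does not require recreating it. A minimal sketch using UpdateTable, assuming the orders table; note DynamoDB only allows one capacity-mode switch per table every 24 hours:
import boto3

dynamodb = boto3.client('dynamodb')

# Move to on-demand; to go back, pass BillingMode='PROVISIONED'
# together with a ProvisionedThroughput setting.
dynamodb.update_table(
    TableName='orders',
    BillingMode='PAY_PER_REQUEST'
)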
Step 7: Query Optimization
from decimal import Decimal

# Use ProjectionExpression to trim the response payload
# (consumed RCUs are still based on the full item size read)
def get_order_summary(user_id: str, order_id: str):
    """Get only needed attributes."""
    response = table.get_item(
        Key={'PK': f'USER#{user_id}', 'SK': f'ORDER#{order_id}'},
        ProjectionExpression='order_id, amount, #s, created_at',
        ExpressionAttributeNames={'#s': 'status'}  # 'status' is reserved
    )
    return response.get('Item')

# Use FilterExpression sparingly (still consumes full read capacity)
def get_recent_high_value_orders(user_id: str, min_amount: float):
    """Filter after query - use with caution."""
    response = table.query(
        KeyConditionExpression=Key('PK').eq(f'USER#{user_id}') & Key('SK').begins_with('ORDER#'),
        FilterExpression='amount >= :min_amount',
        # The resource API rejects float; numbers must be Decimal
        ExpressionAttributeValues={':min_amount': Decimal(str(min_amount))}
    )
    return response['Items']

# Better: use a GSI with amount in the sort key
# (assumes a 'GSI-ByAmount' index keyed on entity_type with amount as a Number sort key)
def get_high_value_orders_optimized(min_amount: float):
    """Query a GSI where amount is part of the key."""
    response = table.query(
        IndexName='GSI-ByAmount',
        KeyConditionExpression=Key('entity_type').eq('ORDER') & Key('amount').gte(Decimal(str(min_amount)))
    )
    return response['Items']
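To verify that these changes actually reduce consumed capacity, a request can ask DynamoDB to report it. A minimal sketch using ReturnConsumedCapacity on one of the queries above, reusing the user id from the earlier examples:
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('orders')

response = table.query(
    KeyConditionExpression=Key('PK').eq('USER#user123') & Key('SK').begins_with('ORDER#'),
    ReturnConsumedCapacity='TOTAL'
)
# ConsumedCapacity reports actual RCUs, e.g. {'TableName': 'orders', 'CapacityUnits': 0.5}
print(response['ConsumedCapacity'])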
DynamoDB Cost Optimization Summary
| Strategy | Savings | When to Use |
|---|---|---|
| On-demand billing | Variable | Unpredictable traffic |
| Auto-scaling | 30-50% | Predictable patterns |
| Reserved capacity | Up to 76% | Consistent baseline |
| Single table design | 50%+ storage | Multiple entity types |
| ProjectionExpression | Payload size (not RCUs) | Large items, few attributes needed |
| TTL | Storage costs | Temporary data |
Practice Question
Why does a hot partition cause throttling even when you have unused capacity in other partitions?