Questions
Design a Terraform architecture for managing 50+ AWS accounts with proper isolation and DRY principles.
The Scenario
Your company is scaling from 5 AWS accounts to 50+ using AWS Organizations. Current setup:
terraform/
├── account-dev/
│   ├── main.tf      # 500 lines, mostly copy-pasted
│   ├── vpc.tf
│   └── iam.tf
├── account-staging/
│   ├── main.tf      # Same 500 lines with different values
│   ├── vpc.tf
│   └── iam.tf
└── account-prod/
    ├── main.tf      # Same again
    ├── vpc.tf
    └── iam.tf
Problems:
- Each account is managed separately with duplicated code
- No centralized governance or compliance
- Engineers must manually assume roles in each account
- State files scattered across S3 buckets
The Challenge
Design a scalable multi-account architecture that centralizes management, enforces compliance, and allows teams to self-serve while maintaining security boundaries.
A junior engineer might create a single Terraform workspace with all 50 accounts, use a single state file for everything, or hardcode account IDs throughout. This leads to massive blast radius, slow plans, and security risks where one mistake affects all accounts.
A senior engineer designs a hub-and-spoke model with account vending automation, separate state per account/component, centralized modules with account-specific configurations, and proper IAM role assumption chains.
Architecture Overview
terraform-infrastructure/
├── modules/                   # Shared, versioned modules
│   ├── account-baseline/      # SCPs, CloudTrail, Config, GuardDuty
│   ├── vpc/
│   ├── iam-roles/
│   └── security-baseline/
├── live/
│   ├── organization/          # AWS Organizations, SCPs
│   │   ├── main.tf
│   │   └── accounts.tf
│   ├── shared-services/       # Central logging, networking
│   │   └── main.tf
│   └── workloads/             # Account-specific configs
│       ├── dev/
│       │   ├── account-1/
│       │   └── account-2/
│       ├── staging/
│       └── prod/
├── terragrunt.hcl             # Root config
└── account-vending/           # New account automation

Provider Configuration for Multi-Account
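The provider configuration in this section reads `var.aws_region`, `var.target_account_id`, `var.environment`, and `var.external_id` without showing their declarations. A minimal sketch of what a matching `variables.tf` might look like follows; the validation rules, the default region, and the allowed environment names are assumptions for illustration, not part of the original setup.

```hcl
# variables.tf (sketch; defaults and validation rules are assumptions)

variable "aws_region" {
  description = "Region used by the aliased provider in the target account"
  type        = string
  default     = "us-east-1"
}

variable "target_account_id" {
  description = "12-digit ID of the account to assume a role in"
  type        = string

  validation {
    condition     = can(regex("^[0-9]{12}$", var.target_account_id))
    error_message = "target_account_id must be exactly 12 digits."
  }
}

variable "environment" {
  description = "Environment name, used in the STS session name"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}

variable "external_id" {
  description = "Optional external ID expected by the target role's trust policy"
  type        = string
  default     = null # null leaves external_id unset in assume_role
  sensitive   = true
}
```

Validating the account ID at the variable boundary surfaces a mistyped ID during `terraform plan` instead of as an opaque STS error.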
# providers.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# Default provider (management account)
provider "aws" {
  region = "us-east-1"
}

# Assume role into target account
provider "aws" {
  alias  = "target"
  region = var.aws_region

  assume_role {
    role_arn     = "arn:aws:iam::${var.target_account_id}:role/TerraformExecutionRole"
    session_name = "terraform-${var.environment}"
    external_id  = var.external_id # Optional additional security
  }
}

Account Vending Machine
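The vending configuration in this section iterates over `var.accounts` and reads `name`, `email`, `ou_id`, `environment`, `team`, and `cost_center` from each entry. A sketch of how that variable might be declared, plus one placeholder entry, is shown here; the map keys, email address, and OU ID are made-up values for illustration only.

```hcl
# account-vending/variables.tf (sketch)
variable "accounts" {
  description = "Workload accounts to create, keyed by a stable logical name"
  type = map(object({
    name        = string
    email       = string # must be unique across all AWS accounts
    ou_id       = string
    environment = string
    team        = string
    cost_center = string
  }))
}

# account-vending/accounts.auto.tfvars (all values are placeholders)
accounts = {
  payments_prod = {
    name        = "payments-prod"
    email       = "aws+payments-prod@example.com"
    ou_id       = "ou-xxxx-xxxxxxxx"
    environment = "prod"
    team        = "payments"
    cost_center = "CC-0000"
  }
}
```

Keying the map by a logical name rather than a list index means that adding or removing one account does not shift the `for_each` addresses of the others.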
# account-vending/main.tf
# Automated new account creation with baseline configuration
resource "aws_organizations_account" "workload" {
  for_each = var.accounts

  name      = each.value.name
  email     = each.value.email
  parent_id = each.value.ou_id
  role_name = "OrganizationAccountAccessRole"

  lifecycle {
    ignore_changes = [role_name]
  }

  tags = {
    Environment = each.value.environment
    Team        = each.value.team
    CostCenter  = each.value.cost_center
  }
}

# Apply baseline to each new account
# NOTE: Terraform provider configurations are static, so aws.target points at a
# single account for every instance of this module. To baseline each account in
# its own context, apply the module from a per-account root configuration
# (for example, one Terragrunt unit per account) rather than from this for_each.
module "account_baseline" {
  source   = "../modules/account-baseline"
  for_each = aws_organizations_account.workload

  providers = {
    aws = aws.target
  }

  account_id   = each.value.id
  account_name = each.value.name
  environment  = var.accounts[each.key].environment

  # Baseline components
  enable_cloudtrail   = true
  enable_config       = true
  enable_guardduty    = true
  enable_security_hub = true

  # Central logging account
  logging_account_id  = var.logging_account_id
  logging_bucket_name = var.central_logging_bucket
}

Terragrunt for DRY Configuration
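The root configuration in this section interpolates `local.account_id`, `local.environment`, and `local.aws_region`. Terragrunt does not expose a child's `locals` to an included parent, so those names need to resolve from the root file itself. One common way to do that is sketched here, assuming a hypothetical `account.hcl` file in each account directory and component units in subdirectories beneath it (for example `account-1/vpc/terragrunt.hcl`); the file name and layout are assumptions, not part of the original tree.

```hcl
# live/workloads/prod/account-1/account.hcl (hypothetical file)
locals {
  account_id  = "111111111111"
  environment = "prod"
  aws_region  = "us-east-1"
}

# terragrunt.hcl (root): load the nearest account.hcl above the unit being run
locals {
  account_vars = read_terragrunt_config(find_in_parent_folders("account.hcl"))

  account_id  = local.account_vars.locals.account_id
  environment = local.account_vars.locals.environment
  aws_region  = local.account_vars.locals.aws_region
}
```

With this arrangement the account-specific values live in one file per account, and every component under that account picks them up through the root include.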
# terragrunt.hcl (root)
remote_state {
  backend = "s3"

  generate = {
    path      = "backend.tf"
    if_exists = "overwrite"
  }

  config = {
    bucket         = "company-terraform-state-${get_aws_account_id()}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

# NOTE: the local.* values below must be defined in a locals block in this root
# file; locals declared in a child terragrunt.hcl are not visible to an included
# parent. If the TerraformExecutionRole trust policy requires an external ID,
# the assume_role block also needs an external_id argument.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite"
  contents  = <<EOF
provider "aws" {
  region = "${local.aws_region}"

  assume_role {
    role_arn = "arn:aws:iam::${local.account_id}:role/TerraformExecutionRole"
  }

  default_tags {
    tags = {
      ManagedBy   = "Terraform"
      Environment = "${local.environment}"
      Repository  = "terraform-infrastructure"
    }
  }
}
EOF
}

# live/workloads/prod/account-1/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

locals {
  account_id  = "111111111111"
  environment = "prod"
  aws_region  = "us-east-1"
}

terraform {
  source = "../../../../modules//vpc"
}

inputs = {
  name               = "prod-vpc"
  cidr_block         = "10.1.0.0/16"
  enable_nat_gateway = true
  single_nat_gateway = false # HA for prod
}

Cross-Account IAM Role Chain
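The trust policy in this section covers only the target side of the chain. The calling role in the management account also needs permission to call `sts:AssumeRole` on the execution roles. A minimal sketch of that source-side policy follows, assuming `TerraformCIRole` already exists in the management account; the resource names and the wildcard account scope are illustrative assumptions.

```hcl
# Management account: allow the CI role to assume TerraformExecutionRole
# in any account (narrow the account wildcard if a fixed list is known).
data "aws_iam_policy_document" "assume_execution_role" {
  statement {
    sid       = "AssumeTerraformExecutionRole"
    effect    = "Allow"
    actions   = ["sts:AssumeRole"]
    resources = ["arn:aws:iam::*:role/TerraformExecutionRole"]
  }
}

resource "aws_iam_role_policy" "ci_assume_execution_role" {
  name   = "AssumeTerraformExecutionRole"
  role   = "TerraformCIRole" # assumed to exist already
  policy = data.aws_iam_policy_document.assume_execution_role.json
}
```

A cross-account assume succeeds only when both sides agree: this identity policy on the caller and the trust policy on the target role.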
# modules/iam-roles/main.tf
# Create execution role in each account
resource "aws_iam_role" "terraform_execution" {
  name = "TerraformExecutionRole"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          AWS = [
            "arn:aws:iam::${var.management_account_id}:role/TerraformCIRole",
            # Account root principal: trusts any identity in the management
            # account that is allowed to call sts:AssumeRole on this role
            "arn:aws:iam::${var.management_account_id}:root"
          ]
        }
        Action = "sts:AssumeRole"
        Condition = {
          StringEquals = {
            "sts:ExternalId" = var.external_id
          }
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "terraform_admin" {
  role       = aws_iam_role.terraform_execution.name
  policy_arn = "arn:aws:iam::aws:policy/AdministratorAccess"
}

# More restrictive policy for non-prod: an explicit Deny overrides the
# AdministratorAccess Allow for the listed actions
resource "aws_iam_role_policy" "terraform_restrictions" {
  count = var.environment != "prod" ? 1 : 0

  name = "TerraformRestrictions"
  role = aws_iam_role.terraform_execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Deny"
        Action = [
          "organizations:*",
          "account:*"
        ]
        Resource = "*"
      }
    ]
  })
}

Centralized State Management
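The backend configuration earlier names a DynamoDB table, `terraform-locks`, for state locking, but the table itself is not defined in this section. A sketch of that table and a public access block for the state bucket declared in this section follows; the billing mode and the resource labels are assumptions.

```hcl
# Lock table used by the S3 backend; the backend expects a string
# partition key named "LockID".
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST" # assumption: on-demand capacity

  hash_key = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

# State files can contain secrets, so block every form of public access
# on the state bucket.
resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```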
# One state bucket per account, centralized lock table
# State structure:
#   s3://company-terraform-state-{account_id}/
#   └── {component}/terraform.tfstate
#
# Alternative: Single bucket with account prefix
#   s3://company-terraform-state/
#   └── accounts/{account_id}/{component}/terraform.tfstate

# Account ID of the credentials Terraform is running with
data "aws_caller_identity" "current" {}

# KMS key referenced by the bucket encryption rule below
resource "aws_kms_key" "terraform_state" {
  description         = "Encrypts Terraform state objects"
  enable_key_rotation = true
}

resource "aws_s3_bucket" "terraform_state" {
  bucket = "company-terraform-state-${data.aws_caller_identity.current.account_id}"
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.terraform_state.arn
    }
  }
}

CI/CD Pipeline for Multi-Account
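The workflow in this section assumes a role named `GitHubActionsRole` in the management account. A sketch of how that role's trust policy might look is given here, assuming the role is assumed through GitHub's OIDC provider and that the provider is already registered in the account; the repository path `example-org/terraform-infrastructure` is a placeholder.

```hcl
# Look up the existing GitHub OIDC provider (assumed to be registered already)
data "aws_iam_openid_connect_provider" "github" {
  url = "https://token.actions.githubusercontent.com"
}

data "aws_iam_policy_document" "github_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [data.aws_iam_openid_connect_provider.github.arn]
    }

    # Token must be issued for AWS STS
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Restrict to one repository (placeholder org/repo)
    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:example-org/terraform-infrastructure:*"]
    }
  }
}

resource "aws_iam_role" "github_actions" {
  name               = "GitHubActionsRole"
  assume_role_policy = data.aws_iam_policy_document.github_trust.json
}
```

Scoping the `sub` claim to a single repository keeps other repositories in the same GitHub organization from assuming the role.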
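# .github/workflows/terraform.yml
name: Terraform Multi-Account

on:
  pull_request:
    paths:
      - 'live/**'
  push:
    branches: [main]
    paths:
      - 'live/**'

# Required for OIDC role assumption by configure-aws-credentials
permissions:
  id-token: write
  contents: read

jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      accounts: ${{ steps.changes.outputs.accounts }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Full history so the base revision is available to git diff
      - id: changes
        run: |
          # Compare against the PR base branch, or the previous commit on push
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            BASE="origin/${{ github.base_ref }}"
          else
            BASE="${{ github.event.before }}"
          fi
          # Detect which account directories changed
          ACCOUNTS=$(git diff --name-only "$BASE" HEAD | grep "^live/workloads" | cut -d'/' -f3-4 | sort -u | jq -R -s -c 'split("\n")[:-1]')
          echo "accounts=$ACCOUNTS" >> "$GITHUB_OUTPUT"

  plan:
    needs: detect-changes
    # Skip when no account directory changed (an empty matrix fails the job)
    if: needs.detect-changes.outputs.accounts != '[]'
    runs-on: ubuntu-latest
    strategy:
      matrix:
        account: ${{ fromJson(needs.detect-changes.outputs.accounts) }}
      fail-fast: false
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::${{ secrets.MGMT_ACCOUNT_ID }}:role/GitHubActionsRole
          aws-region: us-east-1
      # Assumes terraform and terragrunt are already installed on the runner
      - name: Terragrunt Plan
        run: |
          cd live/workloads/${{ matrix.account }}
          terragrunt plan -out=plan.tfplan

  apply:
    if: github.ref == 'refs/heads/main'
    needs: [detect-changes, plan]
    runs-on: ubuntu-latest
    strategy:
      matrix:
        account: ${{ fromJson(needs.detect-changes.outputs.accounts) }}
      max-parallel: 5 # Limit concurrent applies
    environment: ${{ matrix.account }} # Requires approval
    steps:
      - uses: actions/checkout@v4
      # Each job starts on a fresh runner, so credentials must be configured again
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::${{ secrets.MGMT_ACCOUNT_ID }}:role/GitHubActionsRole
          aws-region: us-east-1
      - name: Terragrunt Apply
        run: |
          cd live/workloads/${{ matrix.account }}
          terragrunt apply -auto-approve

Account Organization Structure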
Root
├── Security OU
│   ├── Log Archive (centralized logging)
│   └── Security Tooling (GuardDuty, Security Hub)
├── Infrastructure OU
│   ├── Shared Services (Transit Gateway, DNS)
│   └── Network (centralized networking)
├── Workloads OU
│   ├── Production OU
│   │   ├── prod-account-1
│   │   └── prod-account-2
│   ├── Staging OU
│   └── Development OU
└── Sandbox OU (isolated experimentation)
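The OU tree above could be expressed in Terraform roughly as follows. This is a sketch under a few assumptions: the OU names drop the " OU" suffix, the resource labels are arbitrary, and the single SCP shown (denying accounts from leaving the organization) is only an illustrative guardrail, not one the original design specifies.

```hcl
# live/organization/main.tf (sketch)
data "aws_organizations_organization" "this" {}

locals {
  root_id = data.aws_organizations_organization.this.roots[0].id
}

# Top-level OUs directly under the organization root
resource "aws_organizations_organizational_unit" "top_level" {
  for_each = toset(["Security", "Infrastructure", "Workloads", "Sandbox"])

  name      = each.key
  parent_id = local.root_id
}

# Environment OUs nested under Workloads
resource "aws_organizations_organizational_unit" "workloads" {
  for_each = toset(["Production", "Staging", "Development"])

  name      = each.key
  parent_id = aws_organizations_organizational_unit.top_level["Workloads"].id
}

# Illustrative SCP: member accounts cannot remove themselves from the org
resource "aws_organizations_policy" "deny_leave_org" {
  name = "DenyLeaveOrganization"

  content = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Deny"
        Action   = "organizations:LeaveOrganization"
        Resource = "*"
      }
    ]
  })
}

resource "aws_organizations_policy_attachment" "workloads_deny_leave" {
  policy_id = aws_organizations_policy.deny_leave_org.id
  target_id = aws_organizations_organizational_unit.top_level["Workloads"].id
}
```

Attaching the policy at the Workloads OU applies it to every account in the Production, Staging, and Development OUs through inheritance.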
Practice Question
Why should each AWS account have its own Terraform state file instead of one global state?