Design a Terraform architecture for managing 50+ AWS accounts with proper isolation and DRY principles.

The Scenario

Your company is scaling from 5 AWS accounts to 50+ using AWS Organizations. Current setup:

terraform/
├── account-dev/
│   ├── main.tf      # 500 lines, mostly copy-pasted
│   ├── vpc.tf
│   └── iam.tf
├── account-staging/
│   ├── main.tf      # Same 500 lines with different values
│   ├── vpc.tf
│   └── iam.tf
└── account-prod/
    ├── main.tf      # Same again
    ├── vpc.tf
    └── iam.tf

Problems:

  • Each account is managed separately with duplicated code
  • No centralized governance or compliance
  • Engineers must manually assume roles in each account
  • State files scattered across S3 buckets

The Challenge

Design a scalable multi-account architecture that centralizes management, enforces compliance, and allows teams to self-serve while maintaining security boundaries.

Wrong Approach

A junior engineer might put all 50 accounts in a single Terraform workspace, keep everything in one state file, or hardcode account IDs throughout. The result is a large blast radius, slow plans, and a security risk: one mistake can affect every account.

Right Approach

A senior engineer designs a hub-and-spoke model with account vending automation, separate state per account/component, centralized modules with account-specific configurations, and proper IAM role assumption chains.

Architecture Overview

terraform-infrastructure/
├── modules/                      # Shared, versioned modules
│   ├── account-baseline/        # CloudTrail, Config, GuardDuty, Security Hub
│   ├── vpc/
│   ├── iam-roles/
│   └── security-baseline/
├── live/
│   ├── organization/            # AWS Organizations, SCPs
│   │   ├── main.tf
│   │   └── accounts.tf
│   ├── shared-services/         # Central logging, networking
│   │   └── main.tf
│   └── workloads/              # Account-specific configs
│       ├── dev/
│       │   ├── account-1/
│       │   └── account-2/
│       ├── staging/
│       └── prod/
├── terragrunt.hcl              # Root config
└── account-vending/            # New account automation

Provider Configuration for Multi-Account

# providers.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# Default provider (management account)
provider "aws" {
  region = "us-east-1"
}

# Assume role into target account
provider "aws" {
  alias  = "target"
  region = var.aws_region

  assume_role {
    role_arn     = "arn:aws:iam::${var.target_account_id}:role/TerraformExecutionRole"
    session_name = "terraform-${var.environment}"
    external_id  = var.external_id  # Optional additional security
  }
}
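
The provider blocks above reference several variables without declaring them. A minimal sketch of those declarations follows, plus a read-back of the caller identity through the alias to confirm which account the role assumption lands in. The descriptions, defaults, validation rule, and the output name assumed_account_id are illustrative assumptions, not part of the original configuration.

```hcl
# variables.tf (illustrative declarations for the provider blocks above)
variable "target_account_id" {
  type        = string
  description = "12-digit ID of the account to deploy into"

  validation {
    condition     = can(regex("^[0-9]{12}$", var.target_account_id))
    error_message = "target_account_id must be a 12-digit AWS account ID."
  }
}

variable "aws_region" {
  type    = string
  default = "us-east-1"
}

variable "environment" {
  type = string
}

variable "external_id" {
  type    = string
  default = null # Leave unset when the trust policy does not require one
}

# Read the identity through the aliased provider to verify the assumed account
data "aws_caller_identity" "target" {
  provider = aws.target
}

output "assumed_account_id" {
  value = data.aws_caller_identity.target.account_id
}
```

After an apply, assumed_account_id should match target_account_id. A mismatch suggests the alias was not passed to the resource or module in question.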

Account Vending Machine

# account-vending/main.tf
# Automated new account creation with baseline configuration

resource "aws_organizations_account" "workload" {
  for_each = var.accounts

  name      = each.value.name
  email     = each.value.email
  parent_id = each.value.ou_id

  role_name = "OrganizationAccountAccessRole"

  lifecycle {
    ignore_changes = [role_name]
  }

  tags = {
    Environment = each.value.environment
    Team        = each.value.team
    CostCenter  = each.value.cost_center
  }
}

# Apply the baseline to a new account.
# Terraform binds provider configurations statically, so a module cannot use
# for_each to switch accounts per instance: every instance would run through
# the same aws.target role. Call the baseline once per target account instead.
module "account_baseline" {
  source = "../modules/account-baseline"

  providers = {
    aws = aws.target  # Points at var.target_account_id
  }

  account_id   = var.target_account_id
  account_name = var.account_name
  environment  = var.environment

  # Baseline components
  enable_cloudtrail   = true
  enable_config       = true
  enable_guardduty    = true
  enable_security_hub = true

  # Central logging account
  logging_account_id  = var.logging_account_id
  logging_bucket_name = var.central_logging_bucket
}
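
The vending resource iterates over var.accounts, which is not declared in the snippet. One plausible shape, inferred from the attributes the resource reads (name, email, ou_id, environment, team, cost_center), is sketched below. The map key, file name, and every value in the example are placeholders.

```hcl
# account-vending/variables.tf (shape inferred from the attributes used above)
variable "accounts" {
  description = "Accounts to create, keyed by a stable short name"
  type = map(object({
    name        = string
    email       = string # Must be unique across all AWS accounts
    ou_id       = string
    environment = string
    team        = string
    cost_center = string
  }))
}
```

```hcl
# account-vending/accounts.auto.tfvars (placeholder values)
accounts = {
  payments-prod = {
    name        = "payments-prod"
    email       = "aws+payments-prod@example.com"
    ou_id       = "ou-xxxx-xxxxxxxx"
    environment = "prod"
    team        = "payments"
    cost_center = "CC-0000"
  }
}
```

Keying the map by a stable name rather than a list index means adding or removing one account does not shift the addresses of the others in state.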

Terragrunt for DRY Configuration

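# terragrunt.hcl (root)
locals {
  # Locals defined in a child terragrunt.hcl are not visible here, so the
  # per-account values live in an account.hcl next to the child config.
  # In an included file, get_terragrunt_dir() resolves to the child's directory.
  account_vars = read_terragrunt_config("${get_terragrunt_dir()}/account.hcl")
  account_id   = local.account_vars.locals.account_id
  environment  = local.account_vars.locals.environment
  aws_region   = local.account_vars.locals.aws_region
}

remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite"
  }
  config = {
    # get_aws_account_id() returns the account of the calling credentials,
    # not the account of the role assumed in the provider below.
    bucket         = "company-terraform-state-${get_aws_account_id()}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite"
  contents  = <<EOF
provider "aws" {
  region = "${local.aws_region}"

  assume_role {
    role_arn = "arn:aws:iam::${local.account_id}:role/TerraformExecutionRole"
  }

  default_tags {
    tags = {
      ManagedBy   = "Terraform"
      Environment = "${local.environment}"
      Repository  = "terraform-infrastructure"
    }
  }
}
EOF
}

# live/workloads/prod/account-1/account.hcl
locals {
  account_id  = "111111111111"
  environment = "prod"
  aws_region  = "us-east-1"
}

# live/workloads/prod/account-1/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "../../../../modules//vpc"
}

inputs = {
  name               = "prod-vpc"
  cidr_block         = "10.1.0.0/16"
  enable_nat_gateway = true
  single_nat_gateway = false  # HA for prod
}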

Cross-Account IAM Role Chain

# modules/iam-roles/main.tf
# Create execution role in each account

resource "aws_iam_role" "terraform_execution" {
  name = "TerraformExecutionRole"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
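        Principal = {
          AWS = [
            # Must match the role the CI pipeline actually runs as
            "arn:aws:iam::${var.management_account_id}:role/TerraformCIRole",
            # Account root: admits any management-account principal that is
            # allowed to call sts:AssumeRole. Remove to restrict to the CI role.
            "arn:aws:iam::${var.management_account_id}:root"
          ]
        }
        Action = "sts:AssumeRole"
        # Every caller must pass the same external_id in its assume_role block
        Condition = {
          StringEquals = {
            "sts:ExternalId" = var.external_id
          }
        }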
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "terraform_admin" {
  role       = aws_iam_role.terraform_execution.name
  policy_arn = "arn:aws:iam::aws:policy/AdministratorAccess"
}

# Extra guardrail for non-prod: an explicit Deny overrides the Allow from AdministratorAccess
resource "aws_iam_role_policy" "terraform_restrictions" {
  count = var.environment != "prod" ? 1 : 0
  name  = "TerraformRestrictions"
  role  = aws_iam_role.terraform_execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Deny"
        Action = [
          "organizations:*",
          "account:*"
        ]
        Resource = "*"
      }
    ]
  })
}

Centralized State Management

# One state bucket per account, centralized lock table
# State structure:
# s3://company-terraform-state-{account_id}/
#   └── {component}/terraform.tfstate

# Alternative: Single bucket with account prefix
# s3://company-terraform-state/
#   └── accounts/{account_id}/{component}/terraform.tfstate

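data "aws_caller_identity" "current" {}

# Customer-managed key referenced by the encryption configuration below
resource "aws_kms_key" "terraform_state" {
  description         = "Encrypts Terraform state objects"
  enable_key_rotation = true
}

resource "aws_s3_bucket" "terraform_state" {
  bucket = "company-terraform-state-${data.aws_caller_identity.current.account_id}"
}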

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.terraform_state.arn
    }
  }
}
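
The backend configuration earlier names a DynamoDB table called terraform-locks, but the state resources shown here do not create it. A minimal sketch of the two pieces that usually accompany a state bucket follows: a public access block and the lock table. The billing mode is an assumption. The LockID string hash key is what the S3 backend expects for DynamoDB locking.

```hcl
# Keep the state bucket private regardless of bucket or object ACLs
resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket                  = aws_s3_bucket.terraform_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Lock table referenced as dynamodb_table = "terraform-locks" in the backend
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST" # Assumption: on-demand suits low lock traffic
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

These resources have to exist before any configuration can use the backend, so they are typically applied once from a small bootstrap configuration that keeps its own state locally or is migrated afterwards.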

CI/CD Pipeline for Multi-Account

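# .github/workflows/terraform.yml
# Assumes terraform and terragrunt are already on the runner's PATH.
name: Terraform Multi-Account

on:
  pull_request:
    paths:
      - 'live/**'
  push:
    branches: [main]
    paths:
      - 'live/**'

# Required for OIDC role assumption via configure-aws-credentials
permissions:
  id-token: write
  contents: read

jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      accounts: ${{ steps.changes.outputs.accounts }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history so the base commit is available
      - id: changes
        env:
          # Pull request: base branch commit. Push: commit before the push.
          BASE_SHA: ${{ github.event.pull_request.base.sha || github.event.before }}
        run: |
          # Detect which account directories changed
          ACCOUNTS=$(git diff --name-only "$BASE_SHA" HEAD | grep "^live/workloads" | cut -d'/' -f3-4 | sort -u | jq -R -s -c 'split("\n")[:-1]')
          echo "accounts=$ACCOUNTS" >> "$GITHUB_OUTPUT"

  plan:
    needs: detect-changes
    # An empty matrix fails the job, so skip when nothing changed
    if: needs.detect-changes.outputs.accounts != '[]'
    runs-on: ubuntu-latest
    strategy:
      matrix:
        account: ${{ fromJson(needs.detect-changes.outputs.accounts) }}
      fail-fast: false
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::${{ secrets.MGMT_ACCOUNT_ID }}:role/GitHubActionsRole
          aws-region: us-east-1

      - name: Terragrunt Plan
        run: |
          cd live/workloads/${{ matrix.account }}
          terragrunt plan -out=plan.tfplan

  apply:
    if: github.ref == 'refs/heads/main' && needs.detect-changes.outputs.accounts != '[]'
    needs: [detect-changes, plan]
    runs-on: ubuntu-latest
    strategy:
      matrix:
        account: ${{ fromJson(needs.detect-changes.outputs.accounts) }}
      max-parallel: 5  # Limit concurrent applies
    environment: ${{ matrix.account }}  # Requires approval
    steps:
      - uses: actions/checkout@v4

      # Each job starts on a fresh runner, so credentials must be set up again
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::${{ secrets.MGMT_ACCOUNT_ID }}:role/GitHubActionsRole
          aws-region: us-east-1

      - name: Terragrunt Apply
        run: |
          cd live/workloads/${{ matrix.account }}
          terragrunt apply -auto-approve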
Account Organization Structure

Root
├── Security OU
│   ├── Log Archive (centralized logging)
│   └── Security Tooling (GuardDuty, Security Hub)
├── Infrastructure OU
│   ├── Shared Services (Transit Gateway, DNS)
│   └── Network (centralized networking)
├── Workloads OU
│   ├── Production OU
│   │   ├── prod-account-1
│   │   └── prod-account-2
│   ├── Staging OU
│   └── Development OU
└── Sandbox OU (isolated experimentation)
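
The OU tree is where centralized governance gets enforced: a service control policy attached to an OU applies to every account beneath it. The sketch below would belong in live/organization and run with management-account credentials. The policy name, the chosen actions, and the variable workloads_ou_id are illustrative assumptions.

```hcl
variable "workloads_ou_id" {
  type        = string
  description = "ID of the Workloads OU (hypothetical input)"
}

# Deny tampering with audit tooling, even for account administrators
resource "aws_organizations_policy" "protect_audit_trail" {
  name = "protect-audit-trail"
  type = "SERVICE_CONTROL_POLICY"

  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid    = "DenyAuditTampering"
      Effect = "Deny"
      Action = [
        "cloudtrail:StopLogging",
        "cloudtrail:DeleteTrail",
        "guardduty:DeleteDetector",
        "organizations:LeaveOrganization"
      ]
      Resource = "*"
    }]
  })
}

# Attaching at the OU covers Production, Staging, and Development beneath it
resource "aws_organizations_policy_attachment" "workloads" {
  policy_id = aws_organizations_policy.protect_audit_trail.id
  target_id = var.workloads_ou_id
}
```

Because an SCP also binds TerraformExecutionRole inside those accounts, a policy like this blocks Terraform itself from removing the baseline trail or detector unless an exception condition is added.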

Practice Question

Why should each AWS account have its own Terraform state file instead of one global state?