DeployU
Your team argues about workspaces vs directories for dev/staging/prod. Design the right approach.

The Scenario

Your team has three environments (dev, staging, prod) and is debating how to structure Terraform:

Engineer A: “Let’s use Terraform workspaces! One codebase, and we switch environments with terraform workspace select prod.”

Engineer B: “No, workspaces are dangerous! Let’s use separate directories for each environment.”

Current structure being debated:

# Option A: Workspaces
terraform/
├── main.tf
├── variables.tf
└── terraform.tfvars  # Which environment's values?

# Option B: Directories
terraform/
├── dev/
│   ├── main.tf
│   └── terraform.tfvars
├── staging/
│   └── ...
└── prod/
    └── ...

The Challenge

Design an environment strategy that balances code reuse, safety, and operational clarity. Consider the pros/cons of each approach and when each is appropriate.

Wrong Approach

A junior engineer might pick one approach dogmatically without weighing the tradeoffs: using workspaces for environments whose infrastructure differs structurally, or copying everything into per-environment directories and losing the benefits of DRY. They might also combine workspace selection with manually chosen tfvars files, so that prod values can end up applied to the dev workspace, or the reverse.

Right Approach

A senior engineer understands that workspaces are best for identical infrastructure with different scale (same resources, different sizes), while directories are better when environments have structural differences. The ideal solution often combines both: shared modules with environment-specific root configurations.

Understanding Workspaces

# Workspaces create separate state files
terraform workspace new dev
terraform workspace new staging
terraform workspace new prod

# State files:
# s3://bucket/env:/dev/terraform.tfstate
# s3://bucket/env:/staging/terraform.tfstate
# s3://bucket/env:/prod/terraform.tfstate

# Switch environments
terraform workspace select prod
terraform apply  # Applies to prod state
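
The env:/ prefix in those paths comes from the S3 backend's workspace_key_prefix setting, which defaults to env:. A minimal sketch of a backend block that would produce the layout above, assuming a bucket literally named bucket and a key of terraform.tfstate (both placeholders, as is the region):

```hcl
terraform {
  backend "s3" {
    bucket = "bucket"            # placeholder bucket name
    key    = "terraform.tfstate" # object key used inside each workspace prefix
    region = "us-east-1"         # assumed region

    # Default value, shown explicitly. Non-default workspaces are stored at
    # <workspace_key_prefix>/<workspace>/<key>.
    workspace_key_prefix = "env:"
  }
}
```

The default workspace is the exception: its state lives directly at key, with no prefix.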

Workspace-aware configuration:

# main.tf
locals {
  env_config = {
    dev = {
      instance_type = "t3.micro"
      instance_count = 1
      multi_az = false
    }
    staging = {
      instance_type = "t3.small"
      instance_count = 2
      multi_az = false
    }
    prod = {
      instance_type = "t3.large"
      instance_count = 3
      multi_az = true
    }
  }

  config = local.env_config[terraform.workspace]
}

resource "aws_instance" "web" {
  count         = local.config.instance_count
  instance_type = local.config.instance_type
  # ...
}
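
With the lookup above, running in the default workspace (or a mistyped one) fails with a generic lookup error. A sketch of a more explicit guard, assuming Terraform 1.4 or later (for terraform_data and lifecycle preconditions); the trimmed env_config map and the workspace_guard name are illustrative:

```hcl
locals {
  # Trimmed, illustrative copy of the env_config map
  env_config = {
    dev  = { instance_type = "t3.micro", instance_count = 1 }
    prod = { instance_type = "t3.large", instance_count = 3 }
  }

  # null when the selected workspace has no entry, instead of a lookup error
  config = try(local.env_config[terraform.workspace], null)
}

# Manages no cloud infrastructure; exists only to carry the precondition
resource "terraform_data" "workspace_guard" {
  lifecycle {
    precondition {
      condition     = local.config != null
      error_message = "The selected workspace has no entry in env_config. Run 'terraform workspace select <env>' first."
    }
  }
}
```

Resources that read local.config will still fail in an unknown workspace; the guard's purpose is to add a message that tells the operator what to do next.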

When Workspaces Work Well

Good for:

  • Same infrastructure, different scale
  • Temporary environments (feature branches)
  • Simple projects with identical structure
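  • Single engineer/small team

# Feature branch environments
# terraform workspace new feature-123
# ... apply, test, then terraform destroy ...
# terraform workspace select default   # the current workspace cannot be deleted
# terraform workspace delete feature-123

resource "aws_instance" "web" {
  count = terraform.workspace == "prod" ? 3 : 1
  # ami, instance_type, ... omitted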

  tags = {
    Environment = terraform.workspace
  }
}

When Workspaces Fail

Bad for:

  • Environments with different resources
  • Different AWS accounts per environment
  • Different providers/regions
  • Large teams (workspace confusion risk)

# DANGER: Easy to forget which workspace is selected
$ terraform workspace select prod
# ... go to meeting ...
# ... come back ...
$ terraform destroy  # Oops, just destroyed prod!
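
One way to reduce this risk inside the configuration itself is to make the operator name the target environment on every run and compare it with the selected workspace. A sketch, assuming Terraform 1.4 or later; the environment variable and the workspace_match resource are hypothetical names:

```hcl
variable "environment" {
  type        = string
  description = "Environment the operator intends to change. No default, so it must be supplied on every run."
}

# Manages no cloud infrastructure; exists only to carry the precondition
resource "terraform_data" "workspace_match" {
  lifecycle {
    precondition {
      condition     = var.environment == terraform.workspace
      error_message = "The environment variable does not match the selected workspace. Check 'terraform workspace show' before continuing."
    }
  }
}
```

With this in place, terraform apply -var environment=dev fails at plan time if prod is still selected. It guards plan and apply; it is not a substitute for running terraform workspace show before a destroy.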

The Directory Approach

terraform/
├── modules/              # Shared modules
│   ├── vpc/
│   ├── ecs-cluster/
│   └── rds/
├── environments/
│   ├── dev/
│   │   ├── main.tf      # Uses modules
│   │   ├── backend.tf   # Dev state location
│   │   └── terraform.tfvars
│   ├── staging/
│   │   ├── main.tf
│   │   ├── backend.tf
│   │   └── terraform.tfvars
│   └── prod/
│       ├── main.tf
│       ├── backend.tf   # Prod state location (different bucket!)
│       └── terraform.tfvars
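
The environment roots stay small because each shared module declares its inputs once. A sketch of what modules/vpc/variables.tf could look like, assuming the three inputs passed by the environment configurations; the defaults, descriptions, and validation rule are illustrative, not taken from the original modules:

```hcl
# modules/vpc/variables.tf (hypothetical contents)
variable "cidr_block" {
  type        = string
  description = "CIDR range for the VPC, for example 10.0.0.0/16."

  validation {
    # cidrhost() errors on malformed input, so can() turns that into false
    condition     = can(cidrhost(var.cidr_block, 0))
    error_message = "The cidr_block value must be a valid CIDR range."
  }
}

variable "enable_nat_gateway" {
  type        = bool
  description = "Whether private subnets get outbound access through NAT."
  default     = true
}

variable "single_nat_gateway" {
  type        = bool
  description = "Use one shared NAT gateway (cheaper) rather than several (more resilient)."
  default     = true
}
```

Each environment then differs only in the values it passes, which keeps review diffs between dev and prod short.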

Environment-specific configuration:

# environments/prod/main.tf
module "vpc" {
  source = "../../modules/vpc"

  cidr_block         = "10.0.0.0/16"
  enable_nat_gateway = true
  single_nat_gateway = false  # HA for prod
}

module "ecs" {
  source = "../../modules/ecs-cluster"

  cluster_name   = "prod-cluster"
  instance_type  = "m5.xlarge"
  min_size       = 3
  max_size       = 10
}

# Prod-only resources
module "disaster_recovery" {
  source = "../../modules/dr"
  # Only prod has DR setup
}

# environments/dev/main.tf
module "vpc" {
  source = "../../modules/vpc"

  cidr_block         = "10.1.0.0/16"
  enable_nat_gateway = true
  single_nat_gateway = true  # Cost savings for dev
}

module "ecs" {
  source = "../../modules/ecs-cluster"

  cluster_name   = "dev-cluster"
  instance_type  = "t3.medium"
  min_size       = 1
  max_size       = 2
}

# No DR module in dev
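
For contrast, expressing the same prod-only module in a single workspace-driven configuration needs a count toggle, and every reference to the module then has to handle the case where it does not exist. A sketch, assuming a local ./modules/dr path and a hypothetical dr_enabled output:

```hcl
module "disaster_recovery" {
  source = "./modules/dr" # assumed path

  # Instantiate the module only in the prod workspace
  count = terraform.workspace == "prod" ? 1 : 0
}

output "dr_enabled" {
  # With count set, the module is a list of zero or one instances
  value = length(module.disaster_recovery) > 0
}
```

A handful of these toggles is manageable; once most resources carry one, separate root configurations tend to be easier to read.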

With Terragrunt, the same layout extends to multiple accounts and regions:

infrastructure/
├── modules/                    # Versioned modules
│   └── ...
├── terragrunt.hcl             # Root config
└── live/
    ├── dev/
    │   ├── us-east-1/
    │   │   ├── vpc/
    │   │   │   └── terragrunt.hcl
    │   │   └── ecs/
    │   │       └── terragrunt.hcl
    │   └── account.hcl
    ├── staging/
    │   └── ...
    └── prod/
        ├── us-east-1/
        │   └── ...
        ├── us-west-2/          # Prod has multi-region
        │   └── ...
        └── account.hcl
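
# terragrunt.hcl (root)
locals {
  # Assumes each account.hcl defines locals { account_id = "..." }
  account_vars = read_terragrunt_config(find_in_parent_folders("account.hcl"))
  account_id   = local.account_vars.locals.account_id
}

remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite"
  }
  config = {
    bucket         = "terraform-state-${local.account_id}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}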

# live/prod/us-east-1/vpc/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

include "env" {
  path = find_in_parent_folders("account.hcl")
}

terraform {
  source = "../../../../modules//vpc"
}

inputs = {
  cidr_block         = "10.0.0.0/16"
  enable_nat_gateway = true
  single_nat_gateway = false
}
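
The tree lists an account.hcl per environment, but its contents are not shown. A minimal sketch, assuming it carries only locals; the account_name key and the account ID value are placeholders:

```hcl
# live/prod/account.hcl (hypothetical contents)
locals {
  account_name = "prod"
  account_id   = "123456789012" # placeholder account ID
}
```

Values in a file like this are typically read with read_terragrunt_config(find_in_parent_folders("account.hcl")) and accessed through its locals attribute, for example to build the state bucket name.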

Safety Mechanisms

1. Different AWS accounts per environment:

# environments/prod/backend.tf
terraform {
  backend "s3" {
    bucket = "prod-terraform-state"  # In prod account
    key    = "infrastructure/terraform.tfstate"
    region = "us-east-1"
  }
}

# environments/prod/provider.tf
provider "aws" {
  region = "us-east-1"

  # Refuse to run against any other account
  allowed_account_ids = ["PROD_ACCOUNT_ID"]

  assume_role {
    role_arn = "arn:aws:iam::PROD_ACCOUNT_ID:role/TerraformRole"
  }
}

2. Account validation:

data "aws_caller_identity" "current" {}

locals {
  expected_account = "123456789012"  # Prod account ID
}

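# Requires Terraform 1.2+ for lifecycle preconditions
resource "null_resource" "account_check" {
  lifecycle {
    precondition {
      condition     = data.aws_caller_identity.current.account_id == local.expected_account
      error_message = "Wrong AWS account: the current credentials do not belong to the expected prod account."
    }
  }
}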

3. CI/CD environment isolation:

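# .github/workflows/terraform.yml
on: pull_request

jobs:
  plan-dev:
    runs-on: ubuntu-latest
    environment: dev
    env:
      AWS_ROLE_ARN: ${{ secrets.DEV_AWS_ROLE }}
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      # AWS credential setup for AWS_ROLE_ARN omitted
      - run: terraform init && terraform plan
        working-directory: environments/dev

  plan-prod:
    runs-on: ubuntu-latest
    environment: production
    env:
      AWS_ROLE_ARN: ${{ secrets.PROD_AWS_ROLE }}
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      # AWS credential setup for AWS_ROLE_ARN omitted
      - run: terraform init && terraform plan
        working-directory: environments/prod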

Decision Matrix

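Factor                      | Workspaces | Directories | Terragrunt
----------------------------|------------|-------------|-----------
Code duplication            | None       | Some        | Minimal
Environment isolation       | Low        | High        | High
Accidental wrong env        | High risk  | Low risk    | Low risk
Different resources per env | Hard       | Easy        | Easy
Different accounts          | Tricky     | Easy        | Easy
Learning curve              | Low        | Low         | Medium
CI/CD complexity            | Medium     | Low         | Medium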

Practice Question

Why are Terraform workspaces considered risky for managing production vs development environments?