Your Terraform modules have no tests. Implement a comprehensive testing strategy.

The Scenario

Your team has 15 Terraform modules used across 50+ projects. Problems:

# Module changes break consumers
$ terraform plan
Error: Invalid count argument
  on .terraform/modules/vpc/main.tf line 45:
  count = var.enable_nat_gateway ? length(var.azs) : 0

  The "count" value depends on resource attributes that cannot be determined
  until apply...

# No one knows if modules work until production
# Breaking changes slip through code review
# CI only runs terraform validate (catches syntax, not logic)

The Challenge

Implement a testing pyramid for Terraform: static analysis, unit tests, integration tests, and end-to-end tests. Balance test coverage with execution time and cost.

Wrong Approach

A junior engineer might only use terraform validate, skip tests because 'infrastructure is hard to test', or write tests that deploy real resources on every PR (expensive and slow). These approaches miss logic errors, leave critical code untested, or make CI prohibitively expensive.

Right Approach

A senior engineer builds a testing pyramid. Fast static analysis (TFLint, tfsec) runs on every commit; unit tests (terraform test) validate module logic on each PR without deploying; integration tests (Terratest) deploy to a sandbox on merge; and periodic E2E tests validate complete environments.

Testing Pyramid for Terraform

                    ▲
                   ╱ ╲
                  ╱   ╲     End-to-End Tests
                 ╱     ╲    (Weekly, full environment)
                ╱───────╲
               ╱         ╲   Integration Tests
              ╱           ╲  (On merge, deploy to sandbox)
             ╱─────────────╲
            ╱               ╲  Unit Tests
           ╱                 ╲ (On PR, no deployment)
          ╱───────────────────╲
         ╱                     ╲ Static Analysis
        ╱                       ╲(On every commit)
       ╱─────────────────────────╲
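
The trigger-to-layer mapping in the pyramid can be sketched as a small lookup. This is a minimal illustration, not a CI implementation: the event names ("commit", "pull_request", "merge", "weekly") are hypothetical labels, and the assumption that each trigger also runs every cheaper layer below it is a design choice made for this sketch.

```go
package main

import "fmt"

// Layer names mirror the pyramid above.
type Layer string

const (
	Static      Layer = "static-analysis"
	Unit        Layer = "unit"
	Integration Layer = "integration"
	EndToEnd    Layer = "end-to-end"
)

// LayersFor returns the test layers to run for a trigger.
// The event names are hypothetical labels for this sketch, and the
// cumulative behavior (each trigger also runs the cheaper layers)
// is an assumption, not something a CI system enforces for you.
func LayersFor(event string) []Layer {
	switch event {
	case "commit":
		return []Layer{Static}
	case "pull_request":
		return []Layer{Static, Unit}
	case "merge":
		return []Layer{Static, Unit, Integration}
	case "weekly":
		return []Layer{Static, Unit, Integration, EndToEnd}
	default:
		return nil // Unknown trigger: run nothing rather than guess
	}
}

func main() {
	for _, e := range []string{"commit", "pull_request", "merge", "weekly"} {
		fmt.Println(e, LayersFor(e))
	}
}
```

Keeping the mapping cumulative means an expensive layer never runs against code that would already have failed a cheaper one.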

Layer 1: Static Analysis (Every Commit)

# .github/workflows/static-analysis.yml
name: Static Analysis

on: [push, pull_request]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
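      - uses: actions/checkout@v4

      # Install the Terraform CLI explicitly rather than relying on the runner image
      - uses: hashicorp/setup-terraform@v3

      - name: Terraform Format
        run: terraform fmt -check -recursive

      # validate works per directory; modules/vpc is shown as an example module path
      - name: Terraform Validate
        working-directory: modules/vpc
        run: |
          terraform init -backend=false
          terraform validate

      - name: Setup TFLint
        uses: terraform-linters/setup-tflint@v4

      - name: TFLint
        run: |
          tflint --init
          tflint --recursive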

      - name: tfsec Security Scan
        uses: aquasecurity/tfsec-action@v1.0.0

      - name: Checkov Policy Check
        uses: bridgecrewio/checkov-action@v12

TFLint configuration:

# .tflint.hcl
plugin "aws" {
  enabled = true
  version = "0.27.0"
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

rule "terraform_naming_convention" {
  enabled = true
}

rule "terraform_documented_variables" {
  enabled = true
}

rule "terraform_documented_outputs" {
  enabled = true
}

# Catch common AWS mistakes
rule "aws_instance_invalid_type" {
  enabled = true
}

rule "aws_resource_missing_tags" {
  enabled = true
  tags    = ["Environment", "Owner", "Project"]
}

Layer 2: Unit Tests (Terraform Test Framework)

# tests/vpc_test.tftest.hcl
# Native Terraform testing (1.6+)

variables {
  name               = "test-vpc"
  cidr_block         = "10.0.0.0/16"
  azs                = ["us-east-1a", "us-east-1b"]
  public_subnets     = ["10.0.1.0/24", "10.0.2.0/24"]
  private_subnets    = ["10.0.11.0/24", "10.0.12.0/24"]
  enable_nat_gateway = true
}

run "vpc_creates_correct_subnets" {
  command = plan  # Don't apply, just plan

  assert {
    condition     = length(aws_subnet.public) == 2
    error_message = "Expected 2 public subnets"
  }

  assert {
    condition     = length(aws_subnet.private) == 2
    error_message = "Expected 2 private subnets"
  }

  assert {
    condition     = aws_vpc.main.cidr_block == "10.0.0.0/16"
    error_message = "VPC CIDR block incorrect"
  }
}

run "nat_gateway_disabled" {
  variables {
    enable_nat_gateway = false
  }

  command = plan

  assert {
    condition     = length(aws_nat_gateway.main) == 0
    error_message = "NAT gateway should not be created when disabled"
  }
}

# Run tests (initialize first so providers are installed)
terraform init
terraform test

# Output (abbreviated):
# tests/vpc_test.tftest.hcl... pass
#   run "vpc_creates_correct_subnets"... pass
#   run "nat_gateway_disabled"... pass
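
The same "assert on the plan, deploy nothing" idea can also be applied outside the native framework by inspecting the JSON form of a saved plan (produced with `terraform plan -out=tfplan` followed by `terraform show -json tfplan`). The sketch below is a minimal example under stated assumptions: it reads only the `resource_changes[].type` and `resource_changes[].change.actions` fields of that JSON, the helper name `CountCreates` is hypothetical, and `samplePlan` is hand-written illustrative input rather than real Terraform output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// plan models only the fields of `terraform show -json` output used here.
type plan struct {
	ResourceChanges []struct {
		Type   string `json:"type"`
		Change struct {
			Actions []string `json:"actions"`
		} `json:"change"`
	} `json:"resource_changes"`
}

// CountCreates returns, per resource type, how many resources the plan
// would create. A replacement (delete + create) also counts as a create.
func CountCreates(planJSON []byte) (map[string]int, error) {
	var p plan
	if err := json.Unmarshal(planJSON, &p); err != nil {
		return nil, fmt.Errorf("parse plan JSON: %w", err)
	}
	counts := make(map[string]int)
	for _, rc := range p.ResourceChanges {
		for _, action := range rc.Change.Actions {
			if action == "create" {
				counts[rc.Type]++
			}
		}
	}
	return counts, nil
}

// samplePlan is hand-written illustrative input, not real Terraform output.
const samplePlan = `{
  "resource_changes": [
    {"address": "aws_vpc.main", "type": "aws_vpc", "change": {"actions": ["create"]}},
    {"address": "aws_subnet.public[0]", "type": "aws_subnet", "change": {"actions": ["create"]}},
    {"address": "aws_subnet.public[1]", "type": "aws_subnet", "change": {"actions": ["create"]}},
    {"address": "aws_eip.existing", "type": "aws_eip", "change": {"actions": ["no-op"]}}
  ]
}`

func main() {
	counts, err := CountCreates([]byte(samplePlan))
	if err != nil {
		panic(err)
	}
	// Two subnets are planned; no NAT gateway appears in the sample plan.
	fmt.Println(counts["aws_subnet"], counts["aws_nat_gateway"]) // prints "2 0"
}
```

In a real test the JSON would come from a plan file rather than a constant, and the counts would be compared against the values the module's inputs imply.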

Layer 3: Integration Tests (Terratest)

// test/vpc_test.go
package test

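import (
    "fmt"
    "testing"

    "github.com/gruntwork-io/terratest/modules/aws"
    "github.com/gruntwork-io/terratest/modules/random"
    "github.com/gruntwork-io/terratest/modules/terraform"
    test_structure "github.com/gruntwork-io/terratest/modules/test-structure"
    "github.com/stretchr/testify/assert"
)

const awsRegion = "us-east-1"

func TestVpcModule(t *testing.T) {
    t.Parallel()

    // Use a unique name to avoid conflicts
    uniqueID := random.UniqueId()
    vpcName := fmt.Sprintf("test-vpc-%s", uniqueID)

    // Copy the module to a temp folder so parallel tests do not share local state
    moduleDir := test_structure.CopyTerraformFolderToTemp(t, "..", "modules/vpc")

    terraformOptions := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
        TerraformDir: moduleDir,
        Vars: map[string]interface{}{
            "name":               vpcName,
            "cidr_block":         "10.0.0.0/16",
            "azs":                []string{"us-east-1a", "us-east-1b"},
            "public_subnets":     []string{"10.0.1.0/24", "10.0.2.0/24"},
            "private_subnets":    []string{"10.0.11.0/24", "10.0.12.0/24"},
            "enable_nat_gateway": true,
        },
        EnvVars: map[string]string{
            "AWS_DEFAULT_REGION": awsRegion,
        },
    })

    // Clean up after test
    defer terraform.Destroy(t, terraformOptions)

    // Deploy
    terraform.InitAndApply(t, terraformOptions)

    // Validate outputs
    vpcID := terraform.Output(t, terraformOptions, "vpc_id")
    assert.NotEmpty(t, vpcID)

    // Validate actual AWS resources.
    // CidrBlock is a *string in recent Terratest releases, so dereference it.
    vpc := aws.GetVpcById(t, vpcID, awsRegion)
    if assert.NotNil(t, vpc.CidrBlock) {
        assert.Equal(t, "10.0.0.0/16", *vpc.CidrBlock)
    }

    // Validate subnets
    publicSubnets := terraform.OutputList(t, terraformOptions, "public_subnet_ids")
    assert.Equal(t, 2, len(publicSubnets))

    // Validate NAT gateway (one per AZ)
    natGatewayIPs := terraform.OutputList(t, terraformOptions, "nat_gateway_ips")
    assert.Equal(t, 2, len(natGatewayIPs))
}

func TestVpcModuleWithoutNAT(t *testing.T) {
    t.Parallel()

    // Separate temp copy: this test must not reuse the other test's state
    moduleDir := test_structure.CopyTerraformFolderToTemp(t, "..", "modules/vpc")

    terraformOptions := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
        TerraformDir: moduleDir,
        Vars: map[string]interface{}{
            // Assumes the module's remaining variables have defaults
            "name":               fmt.Sprintf("test-vpc-%s", random.UniqueId()),
            "enable_nat_gateway": false,
        },
        EnvVars: map[string]string{
            "AWS_DEFAULT_REGION": awsRegion,
        },
    })

    defer terraform.Destroy(t, terraformOptions)
    terraform.InitAndApply(t, terraformOptions)

    // Verify no NAT gateways created
    natGatewayIPs := terraform.OutputList(t, terraformOptions, "nat_gateway_ips")
    assert.Equal(t, 0, len(natGatewayIPs))
}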

# Run integration tests
cd test
go test -v -timeout 30m

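# Run a specific test. -run takes a regular expression, so anchor it:
# an unanchored TestVpcModule would also match TestVpcModuleWithoutNAT
go test -v -run '^TestVpcModule$' -timeout 30m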
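
Because these tests create billable resources, it can help to make them opt-in so a plain `go test ./...` on a laptop never deploys anything. The following is a minimal sketch of one such gate: the function name `ShouldRunIntegration` and the `RUN_INTEGRATION` environment variable are hypothetical names chosen for this example, while `testing.Short()` and the `-short` flag are standard Go.

```go
package main

import (
	"fmt"
	"os"
)

// ShouldRunIntegration decides whether deploy-to-cloud tests may run.
// short is the value of testing.Short() (set by `go test -short`);
// getenv is injected so the decision can be checked without touching
// the real environment. RUN_INTEGRATION is a hypothetical variable name.
func ShouldRunIntegration(short bool, getenv func(string) string) bool {
	if short {
		return false // -short always wins: never deploy in quick runs
	}
	return getenv("RUN_INTEGRATION") == "1"
}

func main() {
	// Reports whether the current environment opts in to integration tests.
	fmt.Println(ShouldRunIntegration(false, os.Getenv))
}
```

Inside a Terratest function the gate would typically be the first statement, for example `if !ShouldRunIntegration(testing.Short(), os.Getenv) { t.Skip("integration tests disabled") }`, with the CI integration job setting the variable.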

Layer 4: End-to-End Tests

# .github/workflows/e2e-test.yml
name: E2E Tests

on:
  schedule:
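    - cron: '0 2 * * 0'  # Weekly on Sunday at 02:00 UTC
  workflow_dispatch:

jobs:
  e2e:
    runs-on: ubuntu-latest
    permissions:
      id-token: write  # Required for OIDC role assumption
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.TEST_AWS_ROLE }}
          aws-region: us-east-1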

      - name: Deploy Complete Environment
        run: |
          cd examples/complete
          terraform init
          terraform apply -auto-approve

      - name: Run E2E Tests
        run: |
          # Test actual application behavior
          ./scripts/e2e-tests.sh

      - name: Cleanup
        if: always()
        run: |
          cd examples/complete
          terraform destroy -auto-approve

Test Cost Optimization

# tests/cost-optimized.tftest.hcl
# Use smaller instances for tests

variables {
  instance_type = "t3.micro"  # Override production default
  multi_az      = false       # Single AZ for tests
}

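# Mock the provider so expensive resources are never created (requires Terraform 1.7+).
# Note: mock_provider mocks every resource and data source of that provider;
# mock_resource only sets the computed values returned for the given type.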
mock_provider "aws" {
  mock_resource "aws_rds_cluster" {
    defaults = {
      endpoint = "mock-endpoint.cluster-xxx.us-east-1.rds.amazonaws.com"
      port     = 5432
    }
  }
}

CI/CD Integration

# .github/workflows/terraform-tests.yml
name: Terraform Tests

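on:
  pull_request:
    paths:
      - 'modules/**'
  push:
    branches: [main]  # Without this trigger the integration job below never runs
    paths:
      - 'modules/**'

jobs:
  # Fast: Run on every PR
  static:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - uses: terraform-linters/setup-tflint@v4
      - run: terraform fmt -check -recursive
      - run: |
          tflint --init
          tflint --recursive
      - uses: aquasecurity/tfsec-action@v1.0.0

  # Medium: Run on every PR
  unit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      # Plan-mode tests against the real AWS provider still need credentials;
      # tests that use mock_provider do not.
      - working-directory: modules/vpc  # Repeat for each module with a tests/ directory
        run: |
          terraform init
          terraform test

  # Slow: Run on merge to main
  integration:
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    permissions:
      id-token: write  # Required for OIDC role assumption
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: stable
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false  # The wrapper alters CLI output that Terratest parses
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.TEST_AWS_ROLE }}
          aws-region: us-east-1
      - run: |
          cd test
          go test -v -timeout 30m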

Testing Strategy Summary

Test Type           Runs When     Duration   Cost  Catches
Format/Lint         Every commit  Seconds    Free  Style, naming
tfsec/Checkov       Every commit  Seconds    Free  Security issues
terraform validate  Every commit  Seconds    Free  Syntax errors
terraform test      Every PR      Minutes    Free  Logic errors
Terratest           On merge      10-30 min  $$    Integration issues
E2E                 Weekly        Hours      $$$   Full system issues

Practice Question

Why should Terraform integration tests (that deploy real resources) run on merge rather than on every PR commit?
