Questions
Your Terraform modules have no tests. Implement a comprehensive testing strategy.
The Scenario
Your team has 15 Terraform modules used across 50+ projects. Problems:
# Module changes break consumers
$ terraform plan
Error: Invalid count argument
on .terraform/modules/vpc/main.tf line 45:
count = var.enable_nat_gateway ? length(var.azs) : 0
The "count" value depends on resource attributes that cannot be determined
until apply...
# No one knows if modules work until production
# Breaking changes slip through code review
# CI only runs terraform validate (catches syntax, not logic)
The Challenge
Implement a testing pyramid for Terraform: static analysis, unit tests, integration tests, and end-to-end tests. Balance test coverage with execution time and cost.
A junior engineer might only use terraform validate, skip tests because 'infrastructure is hard to test', or write tests that deploy real resources on every PR (expensive and slow). These approaches miss logic errors, leave critical code untested, or make CI prohibitively expensive.
A senior engineer implements a testing pyramid: fast static analysis (tflint, tfsec) runs on every commit, unit tests (terraform test) validate logic without deploying, integration tests (terratest) deploy to a sandbox on merge, and periodic E2E tests validate complete environments.
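The mapping from CI trigger to test layer can be made explicit. The following is a minimal Go sketch, not part of any tool: the `Trigger` names and the `layersFor` function are hypothetical. It also assumes the layers are cumulative, so a merge re-runs the cheaper layers too, which the description above does not state.

```go
package main

import "fmt"

// Trigger is the CI event that started a pipeline run (hypothetical names).
type Trigger string

const (
	Commit   Trigger = "commit"       // any push to a branch
	PR       Trigger = "pull_request" // pull request opened or updated
	Merge    Trigger = "merge"        // merge to the main branch
	Schedule Trigger = "schedule"     // periodic run, for example weekly
)

// layersFor returns the test layers to run for a trigger, cheapest first.
// Static analysis always runs; more expensive layers are added on top.
// Assumption: layers are cumulative (a merge also re-runs unit tests).
func layersFor(t Trigger) []string {
	layers := []string{"static-analysis"}
	switch t {
	case PR:
		layers = append(layers, "unit")
	case Merge:
		layers = append(layers, "unit", "integration")
	case Schedule:
		layers = append(layers, "unit", "integration", "e2e")
	}
	return layers
}

func main() {
	for _, t := range []Trigger{Commit, PR, Merge, Schedule} {
		fmt.Printf("%-12s -> %v\n", t, layersFor(t))
	}
}
```

Only the merge and schedule triggers reach layers that deploy real resources, which is what keeps per-commit feedback fast and free.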
Testing Pyramid for Terraform
             ▲
            ╱ ╲
           ╱   ╲             End-to-End Tests
          ╱     ╲            (Weekly, full environment)
         ╱───────╲
        ╱         ╲          Integration Tests
       ╱           ╲         (On merge, deploy to sandbox)
      ╱─────────────╲
     ╱               ╲       Unit Tests
    ╱                 ╲      (On PR, no deployment)
   ╱───────────────────╲
  ╱                     ╲    Static Analysis
 ╱                       ╲   (On every commit)
╱─────────────────────────╲
Layer 1: Static Analysis (Every Commit)
# .github/workflows/static-analysis.yml
name: Static Analysis
on: [push, pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Install the Terraform CLI explicitly rather than relying on the runner image
      - uses: hashicorp/setup-terraform@v3
      - name: Terraform Format
        run: terraform fmt -check -recursive
      - name: Install TFLint
        uses: terraform-linters/setup-tflint@v4
      - name: TFLint
        run: |
          tflint --init
          tflint --recursive
      - name: tfsec Security Scan
        uses: aquasecurity/tfsec-action@v1.0.0
      - name: Checkov Policy Check
        uses: bridgecrewio/checkov-action@v12
TFLint configuration:
# .tflint.hcl
plugin "aws" {
  enabled = true
  version = "0.27.0"
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}
rule "terraform_naming_convention" {
  enabled = true
}
rule "terraform_documented_variables" {
  enabled = true
}
rule "terraform_documented_outputs" {
  enabled = true
}
# Catch common AWS mistakes
rule "aws_instance_invalid_type" {
  enabled = true
}
rule "aws_resource_missing_tags" {
  enabled = true
  tags    = ["Environment", "Owner", "Project"]
}
Layer 2: Unit Tests (Terraform Test Framework)
# tests/vpc_test.tftest.hcl
# Native Terraform testing (1.6+)
variables {
  name               = "test-vpc"
  cidr_block         = "10.0.0.0/16"
  azs                = ["us-east-1a", "us-east-1b"]
  public_subnets     = ["10.0.1.0/24", "10.0.2.0/24"]
  private_subnets    = ["10.0.11.0/24", "10.0.12.0/24"]
  enable_nat_gateway = true
}
run "vpc_creates_correct_subnets" {
  command = plan # Don't apply, just plan
  assert {
    condition     = length(aws_subnet.public) == 2
    error_message = "Expected 2 public subnets"
  }
  assert {
    condition     = length(aws_subnet.private) == 2
    error_message = "Expected 2 private subnets"
  }
  assert {
    condition     = aws_vpc.main.cidr_block == "10.0.0.0/16"
    error_message = "VPC CIDR block incorrect"
  }
}
run "nat_gateway_disabled" {
  variables {
    enable_nat_gateway = false
  }
  command = plan
  assert {
    condition     = length(aws_nat_gateway.main) == 0
    error_message = "NAT gateway should not be created when disabled"
  }
}
# Run tests (the working directory must be initialized first)
terraform init
terraform test
# Output:
# tests/vpc_test.tftest.hcl... pass
#   run "vpc_creates_correct_subnets"... pass
#   run "nat_gateway_disabled"... pass
Layer 3: Integration Tests (Terratest)
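Before a Go test deploys anything, it can make a cheaper check against a saved plan. The sketch below uses only the standard library and assumes the plan was exported with `terraform plan -out=plan.out` followed by `terraform show -json plan.out`. It reads the `resource_changes` array of that JSON and counts planned creations per resource type. The `countCreates` helper and the `samplePlan` document are hypothetical, hand-written for illustration, and trimmed to the fields used here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// plan mirrors only the fields of `terraform show -json` output used here.
type plan struct {
	ResourceChanges []struct {
		Address string `json:"address"`
		Type    string `json:"type"`
		Change  struct {
			Actions []string `json:"actions"`
		} `json:"change"`
	} `json:"resource_changes"`
}

// countCreates returns how many resources of each type the plan would create.
// A replacement (delete plus create) is counted as a create as well.
func countCreates(planJSON []byte) (map[string]int, error) {
	var p plan
	if err := json.Unmarshal(planJSON, &p); err != nil {
		return nil, fmt.Errorf("parse plan JSON: %w", err)
	}
	counts := make(map[string]int)
	for _, rc := range p.ResourceChanges {
		for _, action := range rc.Change.Actions {
			if action == "create" {
				counts[rc.Type]++
			}
		}
	}
	return counts, nil
}

// samplePlan is a hand-written stand-in for real plan output (assumption).
const samplePlan = `{
  "resource_changes": [
    {"address": "aws_vpc.main", "type": "aws_vpc", "change": {"actions": ["create"]}},
    {"address": "aws_subnet.public[0]", "type": "aws_subnet", "change": {"actions": ["create"]}},
    {"address": "aws_subnet.public[1]", "type": "aws_subnet", "change": {"actions": ["create"]}}
  ]
}`

func main() {
	counts, err := countCreates([]byte(samplePlan))
	if err != nil {
		panic(err)
	}
	fmt.Println("subnets to create:", counts["aws_subnet"])
	fmt.Println("NAT gateways to create:", counts["aws_nat_gateway"])
}
```

A check like this costs nothing to run, so it suits the pull request stage, while the deploying tests that follow stay on the merge path.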
// test/vpc_test.go
package test

import (
	"fmt"
	"testing"

	"github.com/gruntwork-io/terratest/modules/aws"
	"github.com/gruntwork-io/terratest/modules/random"
	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestVpcModule(t *testing.T) {
	t.Parallel()

	// Use a unique name to avoid conflicts
	uniqueID := random.UniqueId()
	vpcName := fmt.Sprintf("test-vpc-%s", uniqueID)

	terraformOptions := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
		TerraformDir: "../modules/vpc",
		Vars: map[string]interface{}{
			"name":               vpcName,
			"cidr_block":         "10.0.0.0/16",
			"azs":                []string{"us-east-1a", "us-east-1b"},
			"enable_nat_gateway": true,
		},
		EnvVars: map[string]string{
			"AWS_DEFAULT_REGION": "us-east-1",
		},
	})

	// Clean up after test
	defer terraform.Destroy(t, terraformOptions)

	// Deploy
	terraform.InitAndApply(t, terraformOptions)

	// Validate outputs
	vpcID := terraform.Output(t, terraformOptions, "vpc_id")
	assert.NotEmpty(t, vpcID)

	// Validate actual AWS resources
	vpc := aws.GetVpcById(t, vpcID, "us-east-1")
	assert.Equal(t, "10.0.0.0/16", vpc.CidrBlock)

	// Validate subnets
	publicSubnets := terraform.OutputList(t, terraformOptions, "public_subnet_ids")
	assert.Equal(t, 2, len(publicSubnets))

	// Validate NAT gateway
	natGatewayIPs := terraform.OutputList(t, terraformOptions, "nat_gateway_ips")
	assert.Equal(t, 2, len(natGatewayIPs))
}

func TestVpcModuleWithoutNAT(t *testing.T) {
	t.Parallel()

	terraformOptions := &terraform.Options{
		TerraformDir: "../modules/vpc",
		Vars: map[string]interface{}{
			"enable_nat_gateway": false,
		},
	}

	defer terraform.Destroy(t, terraformOptions)
	terraform.InitAndApply(t, terraformOptions)

	// Verify no NAT gateways created
	natGatewayIPs := terraform.OutputList(t, terraformOptions, "nat_gateway_ips")
	assert.Equal(t, 0, len(natGatewayIPs))
}
# Run integration tests
cd test
go test -v -timeout 30m
# Run a specific test (-run takes a regex; anchor it so TestVpcModuleWithoutNAT is not matched too)
go test -v -run '^TestVpcModule$' -timeout 30m
Layer 4: End-to-End Tests
# .github/workflows/e2e-test.yml
name: E2E Tests
on:
  schedule:
    - cron: '0 2 * * 0' # Weekly on Sunday
  workflow_dispatch:
jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumes AWS credentials and the Terraform CLI are already available to this job
      - name: Deploy Complete Environment
        run: |
          cd examples/complete
          terraform init
          terraform apply -auto-approve
      - name: Run E2E Tests
        run: |
          # Test actual application behavior
          ./scripts/e2e-tests.sh
      - name: Cleanup
        if: always()
        run: |
          cd examples/complete
          terraform destroy -auto-approve
Test Cost Optimization
# tests/cost-optimized.tftest.hcl
# Use smaller instances for tests
variables {
  instance_type = "t3.micro" # Override production default
  multi_az      = false      # Single AZ for tests
}
# Use mocks for expensive resources (mock_provider requires Terraform 1.7+)
mock_provider "aws" {
  mock_resource "aws_rds_cluster" {
    defaults = {
      endpoint = "mock-endpoint.cluster-xxx.us-east-1.rds.amazonaws.com"
      port     = 5432
    }
  }
}
CI/CD Integration
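# .github/workflows/terraform-tests.yml
name: Terraform Tests
on:
  pull_request:
    paths:
      - 'modules/**'
  # Without a push trigger, the integration job's condition could never be true
  push:
    branches: [main]
    paths:
      - 'modules/**'
jobs:
  # Fast: Run on every PR
  static:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - uses: terraform-linters/setup-tflint@v4
      - run: terraform fmt -check -recursive
      - run: tflint --init && tflint --recursive
      # Run tfsec through its action; the CLI is not assumed to be on the runner
      - uses: aquasecurity/tfsec-action@v1.0.0
  # Medium: Run on every PR
  unit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform test
  # Slow: Run on merge to main
  integration:
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    permissions:
      id-token: write # needed when the role is assumed via GitHub OIDC
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false # the wrapper script interferes with Terratest output parsing
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.TEST_AWS_ROLE }}
          aws-region: us-east-1
      - run: |
          cd test
          go test -v -timeout 30m
Testing Strategy Summary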
| Test Type | Runs When | Duration | Cost | Catches |
|---|---|---|---|---|
| Format/Lint | Every commit | Seconds | Free | Style, naming |
| tfsec/Checkov | Every commit | Seconds | Free | Security issues |
| terraform validate | Every commit | Seconds | Free | Syntax errors |
| terraform test | Every PR | Minutes | Free | Logic errors |
| Terratest | On merge | 10-30 min | $$ | Integration issues |
| E2E | Weekly | Hours | $$$ | Full system issues |
Practice Question
Why should Terraform integration tests (that deploy real resources) run on merge rather than on every PR commit?