Terraform fails with 'Error: Cycle' involving security groups and EC2 instances. Debug and fix it.

The Scenario

You’re setting up a web application with frontend and backend servers that need to communicate. Terraform fails with:

$ terraform plan

Error: Cycle: aws_security_group.backend, aws_security_group.frontend

  on main.tf line 15:
  15: resource "aws_security_group" "frontend" {

Your configuration:

resource "aws_security_group" "frontend" {
  name = "frontend-sg"

  ingress {
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    cidr_blocks     = ["0.0.0.0/0"]
  }

  # Frontend needs to call backend
  egress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.backend.id]  # References backend
  }
}

resource "aws_security_group" "backend" {
  name = "backend-sg"

  # Backend accepts traffic from frontend
  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.frontend.id]  # References frontend
  }
}

The Challenge

Understand why dependency cycles occur and implement the correct pattern to break them while maintaining the required security group relationships.

Wrong Approach

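A junior engineer might reach for one of three workarounds: adding depends_on to force an order, replacing one of the references with CIDR blocks, or splitting the change across multiple terraform apply runs. None of them holds up. depends_on can only add dependency edges, so it cannot break a cycle. CIDR blocks weaken security because they match address ranges rather than security group membership. Multiple apply runs add operational complexity, because the configuration can no longer be applied from scratch in a single run.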

All three address symptoms, not the root cause.
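
To make the depends_on point concrete, here is a minimal sketch of that attempt, reusing the resource names from the scenario. It is intentionally still broken: the explicit dependency duplicates an edge the ingress block already creates, so running terraform validate or terraform plan on it should still report the cycle.

```hcl
# Sketch only: the depends_on workaround. This configuration still contains the cycle.
resource "aws_security_group" "frontend" {
  name = "frontend-sg"

  egress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.backend.id] # edge: frontend depends on backend
  }
}

resource "aws_security_group" "backend" {
  name = "backend-sg"

  # depends_on can only ADD a dependency. The ingress block below already makes
  # backend depend on frontend, so this line changes nothing about the loop.
  depends_on = [aws_security_group.frontend]

  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.frontend.id] # edge: backend depends on frontend
  }
}
```

The only way out is to remove one of the two edges between the groups, which is what the right approach does.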

Right Approach

A senior engineer recognizes this as a classic circular dependency and uses separate aws_security_group_rule resources instead of inline rules. This breaks the cycle because the security groups no longer reference each other: Terraform creates both groups first, and each rule then depends on the groups while no group depends on a rule.

Understanding the Cycle

frontend_sg ──references──► backend_sg
     ▲                           │
     │                           │
     └────────references─────────┘

Each group needs the other's ID before it can be created, so Terraform cannot find a valid creation order.

Solution: Separate Security Group Rules

# Step 1: Create security groups WITHOUT inline rules
resource "aws_security_group" "frontend" {
  name        = "frontend-sg"
  description = "Frontend web servers"
  vpc_id      = var.vpc_id

  tags = {
    Name = "frontend-sg"
  }
}

resource "aws_security_group" "backend" {
  name        = "backend-sg"
  description = "Backend API servers"
  vpc_id      = var.vpc_id

  tags = {
    Name = "backend-sg"
  }
}

# Step 2: Create rules as SEPARATE resources
resource "aws_security_group_rule" "frontend_ingress_http" {
  type              = "ingress"
  from_port         = 80
  to_port           = 80
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = aws_security_group.frontend.id
  description       = "HTTP from internet"
}

resource "aws_security_group_rule" "frontend_egress_to_backend" {
  type                     = "egress"
  from_port                = 8080
  to_port                  = 8080
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.backend.id  # For egress rules this is the destination group
  security_group_id        = aws_security_group.frontend.id
  description              = "To backend API"
}

resource "aws_security_group_rule" "backend_ingress_from_frontend" {
  type                     = "ingress"
  from_port                = 8080
  to_port                  = 8080
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.frontend.id  # And this!
  security_group_id        = aws_security_group.backend.id
  description              = "From frontend servers"
}

resource "aws_security_group_rule" "backend_egress_all" {
  type              = "egress"
  from_port         = 0
  to_port           = 0
  protocol          = "-1"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = aws_security_group.backend.id
  description       = "Allow all outbound"
}

Why This Works

Step 1: Create both security groups (no dependencies)
frontend_sg ──created──► (no references)
backend_sg  ──created──► (no references)

Step 2: Create rules (both SGs now exist)
frontend_egress_rule ──references──► backend_sg ✓ (exists)
backend_ingress_rule ──references──► frontend_sg ✓ (exists)
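
Recent versions of the AWS provider also offer one-resource-per-rule types, aws_vpc_security_group_ingress_rule and aws_vpc_security_group_egress_rule, which break the cycle the same way. The sketch below is an assumption about how the frontend-to-backend rules could look with those types; it assumes a provider version that includes them and a vpc_id variable like the one used in the solution. The provider documentation warns against mixing these resources with aws_security_group_rule or inline rules on the same security group, so treat this as a replacement for those rules rather than an addition.

```hcl
# Sketch under assumptions: requires an AWS provider version that ships the
# aws_vpc_security_group_*_rule resources.
variable "vpc_id" {
  type = string
}

resource "aws_security_group" "frontend" {
  name        = "frontend-sg"
  description = "Frontend web servers"
  vpc_id      = var.vpc_id
}

resource "aws_security_group" "backend" {
  name        = "backend-sg"
  description = "Backend API servers"
  vpc_id      = var.vpc_id
}

# Frontend may send traffic to the backend group on port 8080.
resource "aws_vpc_security_group_egress_rule" "frontend_to_backend" {
  security_group_id            = aws_security_group.frontend.id
  referenced_security_group_id = aws_security_group.backend.id
  from_port                    = 8080
  to_port                      = 8080
  ip_protocol                  = "tcp"
  description                  = "To backend API"
}

# Backend accepts traffic from the frontend group on port 8080.
resource "aws_vpc_security_group_ingress_rule" "backend_from_frontend" {
  security_group_id            = aws_security_group.backend.id
  referenced_security_group_id = aws_security_group.frontend.id
  from_port                    = 8080
  to_port                      = 8080
  ip_protocol                  = "tcp"
  description                  = "From frontend servers"
}
```

As before, both rules depend on both groups, and neither group depends on a rule.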

Alternative: Self-Referencing Security Group

For cases where instances in the same security group need to communicate:

resource "aws_security_group" "cluster" {
  name        = "cluster-sg"
  description = "Cluster nodes"
  vpc_id      = var.vpc_id
}

# self = true names this same group as the source, so no other resource is referenced
resource "aws_security_group_rule" "cluster_internal" {
  type              = "ingress"
  from_port         = 0
  to_port           = 65535
  protocol          = "tcp"
  self              = true # Allow traffic between members of this group
  security_group_id = aws_security_group.cluster.id
  description       = "Internal cluster communication"
}

Debugging Cycles

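# Visualize the dependency graph (dot is part of Graphviz)
terraform graph | dot -Tpng > graph.png

# Highlight cycles with colored edges; -draw-cycles needs an explicit -type
terraform graph -type=plan -draw-cycles | dot -Tpng > cycles.png

# Or inspect the raw DOT text output
terraform graph

# Look for a pair of edges that point in opposite directions
# Example cycle in graph output:
# "aws_security_group.frontend" -> "aws_security_group.backend"
# "aws_security_group.backend" -> "aws_security_group.frontend"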

Common Cycle Patterns and Fixes

Pattern 1: IAM Role and Policy

# Role and policy: safe as long as references flow in one direction
resource "aws_iam_role" "app" {
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

resource "aws_iam_policy" "app" {
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "s3:GetObject"
      Resource = "arn:aws:s3:::bucket/*"
    }]
  })
}

# The attachment depends on both the role and the policy, which is fine.
# A cycle appears only if the policy references the role (for example its ARN)
# while the role also references the policy.
resource "aws_iam_role_policy_attachment" "app" {
  role       = aws_iam_role.app.name
  policy_arn = aws_iam_policy.app.arn
}
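
The same role and policy can also be built with the aws_iam_policy_document data source, which keeps the policy text out of the resources and makes the direction of each reference easy to see. This is a sketch, not the only way to do it; the name values are hypothetical and the bucket ARN is the placeholder from the example above.

```hcl
# Trust policy: lets EC2 assume the role. It references no other resource.
data "aws_iam_policy_document" "assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

# Permissions policy: read objects from the placeholder bucket.
data "aws_iam_policy_document" "s3_read" {
  statement {
    effect    = "Allow"
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::bucket/*"]
  }
}

resource "aws_iam_role" "app" {
  name               = "app-role" # hypothetical name
  assume_role_policy = data.aws_iam_policy_document.assume_role.json
}

resource "aws_iam_policy" "app" {
  name   = "app-s3-read" # hypothetical name
  policy = data.aws_iam_policy_document.s3_read.json
}

# The only resource that knows about both sides is the attachment.
resource "aws_iam_role_policy_attachment" "app" {
  role       = aws_iam_role.app.name
  policy_arn = aws_iam_policy.app.arn
}
```

If the permissions policy ever needs the role's ARN, it can reference aws_iam_role.app.arn safely, provided the role itself never references the policy.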

Pattern 2: Lambda and CloudWatch Logs

# WRONG: Lambda needs log group, but log group name includes Lambda name
resource "aws_lambda_function" "app" {
  function_name = "my-function"
  # ...
  depends_on = [aws_cloudwatch_log_group.lambda]
}

resource "aws_cloudwatch_log_group" "lambda" {
  name = "/aws/lambda/${aws_lambda_function.app.function_name}"  # Cycle!
}

# RIGHT: Use a local or hardcode the name
locals {
  function_name = "my-function"
}

resource "aws_cloudwatch_log_group" "lambda" {
  name = "/aws/lambda/${local.function_name}"
}

resource "aws_lambda_function" "app" {
  function_name = local.function_name
  # ...
  depends_on    = [aws_cloudwatch_log_group.lambda]
}

Pattern 3: Route53 and ACM Certificate

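# Validation needs the DNS record to exist, and the DNS record is derived
# from the certificate's domain_validation_options (not its ARN).
# Keeping validation in its own aws_acm_certificate_validation resource means
# the creation order is one-way: certificate, then record, then validation.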

resource "aws_acm_certificate" "main" {
  domain_name       = "example.com"
  validation_method = "DNS"
}

resource "aws_route53_record" "cert_validation" {
  for_each = {
    for dvo in aws_acm_certificate.main.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
  }

  zone_id = var.zone_id
  name    = each.value.name
  type    = each.value.type
  records = [each.value.record]
  ttl     = 60
}

resource "aws_acm_certificate_validation" "main" {
  certificate_arn         = aws_acm_certificate.main.arn
  validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}

This is systematic, production-ready debugging: find the cycle in the graph, then restructure so that references flow in one direction.

Cycle Prevention Best Practices

| Pattern | Problem | Solution |
|---|---|---|
| Inline SG rules | Mutual references | Separate `aws_security_group_rule` resources |
| Resource names | Resource A name depends on B | Use `locals` for names |
| IAM policies | Policy references resource being created | Use `aws_iam_policy_document` data source |
| Module outputs | Module A needs Module B output and vice versa | Restructure modules or use data sources |
Practice Question

Why does using separate aws_security_group_rule resources break a cycle between two security groups?
