DeployU

Resources were changed manually in the AWS console. Terraform plan shows unexpected changes. Handle it.


The Scenario

During an incident, an engineer manually modified security group rules in the AWS console to temporarily allow traffic. Now terraform plan shows:

$ terraform plan

aws_security_group.api: Refreshing state... [id=sg-0abc123def456789]

Terraform will perform the following actions:

  # aws_security_group_rule.api_ingress will be destroyed
  - resource "aws_security_group_rule" "api_ingress" {
      - cidr_blocks       = ["10.0.0.0/8"]
      - from_port         = 443
      - protocol          = "tcp"
      - security_group_id = "sg-0abc123def456789"
      - to_port           = 443
      - type              = "ingress"
    }

Plan: 0 to add, 0 to change, 1 to destroy.

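The manually added rule is critical for the temporary fix, so running terraform apply now would break production.

A caveat on the plan output above: Terraform normally plans to destroy a standalone aws_security_group_rule only when that rule is still in state but has been removed from the configuration. A rule added in the console usually surfaces differently. If the group declares inline ingress blocks, it appears as an in-place update to aws_security_group that drops the extra entry. If every rule is its own aws_security_group_rule resource, it may not appear at all, because Terraform ignores rules it does not manage.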

The Challenge

Understand why drift happens, how to detect it, and the options for reconciliation. Decide whether to adopt the manual change into Terraform or revert it.

Wrong Approach

A junior engineer might blindly run terraform apply to 'fix' the drift, delete the manual change without understanding its purpose, or modify the state file directly. These approaches risk production outages, lose important context, and can corrupt state.

Right Approach

A senior engineer first investigates why the manual change was made, decides whether to adopt it into Terraform or revert it, uses the appropriate Terraform command to reconcile state, and implements drift detection to catch future manual changes early.

Step 1: Investigate the Drift

# See what Terraform thinks the current state is
terraform show

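# See what's actually in AWS
aws ec2 describe-security-groups --group-ids sg-0abc123def456789

# Ask Terraform to report only what changed outside of it
# (reads live infrastructure, proposes no config changes, writes nothing)
terraform plan -refresh-only

# Compare state with what AWS reports to understand the drift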
# Look for:
# - Rules in AWS but not in Terraform (manually added)
# - Rules in Terraform but not in AWS (manually deleted)
# - Rules with different values (manually modified)
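Once both views are exported, the comparison itself is a set difference. Below is a minimal Python sketch. It assumes each rule has already been normalized into a dict with hypothetical keys type, protocol, from_port, to_port and cidrs, for example from terraform show -json on one side and the AWS CLI output on the other. The normalization step is not shown. Rules are keyed by type, protocol and port range, so a rule whose CIDRs changed is reported as modified rather than as one removal plus one addition.

```python
from typing import Any, Dict, List, Set, Tuple

# Hypothetical normalized rule shape:
# {"type": "ingress", "protocol": "tcp", "from_port": 443, "to_port": 443, "cidrs": ["10.0.0.0/8"]}
Rule = Dict[str, Any]
RuleKey = Tuple[str, str, int, int]


def _index(rules: List[Rule]) -> Dict[RuleKey, Set[str]]:
    """Group rules by (type, protocol, from_port, to_port) -> set of CIDRs."""
    index: Dict[RuleKey, Set[str]] = {}
    for r in rules:
        key = (str(r["type"]), str(r["protocol"]), int(r["from_port"]), int(r["to_port"]))
        index.setdefault(key, set()).update(r["cidrs"])
    return index


def diff_rules(terraform: List[Rule], aws: List[Rule]) -> Dict[str, List[RuleKey]]:
    """Classify drift between the Terraform view and the live AWS view."""
    tf_idx, aws_idx = _index(terraform), _index(aws)
    return {
        # Present in AWS only: added outside Terraform
        "manually_added": sorted(aws_idx.keys() - tf_idx.keys()),
        # Present in Terraform only: deleted outside Terraform
        "manually_deleted": sorted(tf_idx.keys() - aws_idx.keys()),
        # Same protocol/ports but different CIDRs: modified outside Terraform
        "manually_modified": sorted(
            k for k in tf_idx.keys() & aws_idx.keys() if tf_idx[k] != aws_idx[k]
        ),
    }
```

With the scenario's rule present in AWS but missing from the Terraform view, manually_added contains ("ingress", "tcp", 443, 443).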

Step 2: Understand the Context

Before changing anything:

  • Check incident channel/tickets for why the change was made
  • Verify if the temporary fix is still needed
  • Confirm with the team whether to keep or revert
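CloudTrail usually answers "who changed this, and when?" The sketch below is a minimal example. It assumes you saved the JSON printed by aws cloudtrail lookup-events --lookup-attributes AttributeKey=ResourceName,AttributeValue=sg-0abc123def456789, which covers management events from roughly the last 90 days. It keeps only events that change security group rules and returns them oldest first. The function name and the event-name list are illustrative and not exhaustive.

```python
import json
from typing import Dict, List

# Event names that change security group rules (illustrative, not exhaustive)
SG_RULE_EVENTS = {
    "AuthorizeSecurityGroupIngress",
    "AuthorizeSecurityGroupEgress",
    "RevokeSecurityGroupIngress",
    "RevokeSecurityGroupEgress",
    "ModifySecurityGroupRules",
}


def who_changed(lookup_output: str) -> List[Dict[str, str]]:
    """Summarize rule-changing events from `aws cloudtrail lookup-events` JSON output."""
    events = json.loads(lookup_output).get("Events", [])
    summary = []
    for event in events:
        if event.get("EventName") not in SG_RULE_EVENTS:
            continue  # skip reads such as DescribeSecurityGroups
        summary.append({
            "time": str(event.get("EventTime", "")),
            "user": event.get("Username", "unknown"),
            "action": event["EventName"],
        })
    # Oldest first, so the result reads like an incident timeline
    # (assumes ISO 8601 timestamps with the same UTC offset)
    return sorted(summary, key=lambda e: e["time"])
```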

Option A: Adopt Manual Change into Terraform

If the manual change should become permanent:

# Add the new rule to your Terraform configuration
resource "aws_security_group_rule" "api_temp_ingress" {
  type              = "ingress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["10.0.0.0/8"]
  security_group_id = aws_security_group.api.id
  description       = "Temporary fix from incident INC-1234 - review for removal"
}

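Then import the existing rule so Terraform manages it instead of trying to create a duplicate. On Terraform 1.5+, an import block in the configuration does the same thing and is previewed by plan before it runs: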

# Import the manually-created rule
terraform import aws_security_group_rule.api_temp_ingress sg-0abc123def456789_ingress_tcp_443_443_10.0.0.0/8

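# Verify the import worked
terraform plan
# Expect: "No changes. Your infrastructure matches the configuration."
# If the console-created rule had no description, the plan may instead show
# an in-place update that adds the description from the configuration above.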
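Hand-typing the import ID is error-prone. Below is a minimal Python sketch of how that ID is assembled, following the layout the AWS provider documents for aws_security_group_rule imports: security group ID, type, protocol, from port, to port, then one or more sources, all joined with underscores. The function name is hypothetical.

```python
from typing import Sequence


def sg_rule_import_id(
    security_group_id: str,
    rule_type: str,
    protocol: str,
    from_port: int,
    to_port: int,
    sources: Sequence[str],
) -> str:
    """Build an aws_security_group_rule import ID.

    Layout: <sg-id>_<type>_<protocol>_<from_port>_<to_port>_<source>[_<source>...]
    where each source is a CIDR block, prefix list ID, source security group ID, or "self".
    """
    if rule_type not in ("ingress", "egress"):
        raise ValueError(f"rule_type must be 'ingress' or 'egress', got {rule_type!r}")
    if not sources:
        raise ValueError("at least one source is required")
    parts = [security_group_id, rule_type, protocol, str(from_port), str(to_port), *sources]
    return "_".join(parts)
```

For this scenario's rule, sg_rule_import_id("sg-0abc123def456789", "ingress", "tcp", 443, 443, ["10.0.0.0/8"]) returns the same string passed to terraform import above.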

Option B: Revert Manual Change

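If the manual change should be removed (only after confirming the temporary fix is no longer needed):

# Save the plan, review it, then apply exactly what you reviewed
terraform plan -out=revert.tfplan
terraform apply revert.tfplan

# The plan should list only the manual rule for removal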

Option C: Refresh State Only

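If you want to update state to match reality without changing any infrastructure:

# Terraform 0.15.4+ (replaces the deprecated terraform refresh)
terraform plan -refresh-only    # preview what state would record
terraform apply -refresh-only   # write it to state after confirmation

# This records the AWS reality in state but does not touch your config,
# so future plans still propose changes to match the config until you update it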
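Before accepting a refresh-only result, it helps to see exactly which attributes moved. Below is a minimal sketch. It assumes you ran terraform plan -refresh-only -out=refresh.tfplan and then terraform show -json refresh.tfplan, and it reads the resource_drift array that Terraform's JSON plan format uses for changes detected outside Terraform. The helper name is hypothetical. For a resource deleted outside Terraform, "after" is null, so every attribute is reported as changed.

```python
import json
from typing import Dict, List


def summarize_drift(plan_json: str) -> Dict[str, List[str]]:
    """Map each drifted resource address to the top-level attributes that changed."""
    plan = json.loads(plan_json)
    summary: Dict[str, List[str]] = {}
    for rc in plan.get("resource_drift", []):
        change = rc.get("change", {})
        before = change.get("before") or {}
        after = change.get("after") or {}
        # Attributes present on either side whose values differ
        changed = sorted(
            key for key in before.keys() | after.keys()
            if before.get(key) != after.get(key)
        )
        summary[rc["address"]] = changed
    return summary
```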

Implementing Drift Detection

Scheduled drift detection in CI/CD:

# .github/workflows/drift-detection.yml
name: Terraform Drift Detection

on:
  schedule:
    - cron: '0 */4 * * *'  # Every 4 hours
  workflow_dispatch:

jobs:
  detect-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

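      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          # Disable the wrapper so the step sees Terraform's raw exit code
          terraform_wrapper: false

      - name: Terraform Init
        run: terraform init -input=false

      - name: Detect Drift
        id: plan
        run: |
          # -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present.
          # $? after a pipeline is tee's status, so read Terraform's from PIPESTATUS.
          terraform plan -detailed-exitcode -input=false -no-color -out=plan.tfplan 2>&1 | tee plan.log
          echo "exitcode=${PIPESTATUS[0]}" >> "$GITHUB_OUTPUT"
        continue-on-error: true

      - name: Fail On Plan Error
        if: steps.plan.outputs.exitcode == '1'
        run: exit 1

      - name: Report Drift
        if: steps.plan.outputs.exitcode == '2'
        env:
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
        run: |
          echo "DRIFT DETECTED!"
          cat plan.log
          # Send alert to a Slack incoming webhook (add PagerDuty the same way)
          curl -sS -X POST "$SLACK_WEBHOOK" \
            -H 'Content-Type: application/json' \
            -d '{"text": "Terraform drift detected in production! Check GitHub Actions for details."}'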
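Alerting logic is easier to test outside YAML. The minimal sketch below shows a hypothetical helper that the workflow could call with Terraform's exit code. It maps the -detailed-exitcode values (0 no changes, 1 error, 2 changes present) to a status. It then builds a Slack incoming-webhook body with json.dumps, which avoids the quoting problems of hand-written JSON in shell. The workspace and run_url parameters are illustrative.

```python
import json
from typing import Optional

# Meanings of `terraform plan -detailed-exitcode`
EXIT_MEANINGS = {0: "no-drift", 1: "error", 2: "drift"}


def classify_plan_exit(code: int) -> str:
    """Translate a plan exit code into a drift status."""
    try:
        return EXIT_MEANINGS[code]
    except KeyError:
        raise ValueError(f"unexpected terraform exit code: {code}") from None


def slack_payload(code: int, workspace: str, run_url: str) -> Optional[str]:
    """Return a Slack webhook body for drift or errors, or None when clean."""
    status = classify_plan_exit(code)
    if status == "no-drift":
        return None
    headline = "Terraform drift detected" if status == "drift" else "Terraform drift check failed"
    return json.dumps({"text": f"{headline} in {workspace}. Details: {run_url}"})
```

Treating exit code 1 as its own alert matters: otherwise a broken plan (expired credentials, a locked state) looks the same as a clean run.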

Using Terraform Cloud (now named HCP Terraform):

# Terraform Cloud has built-in drift detection
terraform {
  cloud {
    organization = "company"
    workspaces {
      name = "prod-infrastructure"
    }
  }
}

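# In the workspace's health settings, enable health assessments.
# They run drift detection and continuous validation (check blocks and
# pre/postconditions) on a schedule and report the results in the UI.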

Preventing Drift

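1. AWS Service Control Policies (SCPs):

Use ArnNotLike (not StringNotEquals) for the allow-list, because only the "...Like" operators treat the "*" in the role ARN as a wildcard. SCPs do not restrict the organization's management account. A policy this strict also blocks incident responders like the engineer in this scenario, so keep a break-glass role in the allow-list (IncidentBreakGlassRole below is a hypothetical name):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyManualSecurityGroupChanges",
      "Effect": "Deny",
      "Action": [
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:AuthorizeSecurityGroupEgress",
        "ec2:RevokeSecurityGroupIngress",
        "ec2:RevokeSecurityGroupEgress",
        "ec2:ModifySecurityGroupRules"
      ],
      "Resource": "*",
      "Condition": {
        "ArnNotLike": {
          "aws:PrincipalArn": [
            "arn:aws:iam::*:role/TerraformExecutionRole",
            "arn:aws:iam::*:role/IncidentBreakGlassRole"
          ]
        }
      }
    }
  ]
}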
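Why the condition operator matters for an allow-list with a wildcard: StringNotEquals compares literally, so "*" only matches a literal asterisk, while ArnNotLike and StringNotLike treat "*" and "?" as wildcards. The rough Python model below uses fnmatch.fnmatchcase as a stand-in for IAM matching. This is an approximation: fnmatch also interprets [...] and lets "*" span colons, whereas ArnLike matches each ARN component separately. The account ID is the AWS documentation placeholder.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def string_not_equals(value: str, patterns: Sequence[str]) -> bool:
    """Literal comparison: true when the value equals none of the patterns."""
    return all(value != p for p in patterns)


def arn_not_like(value: str, patterns: Sequence[str]) -> bool:
    """Wildcard comparison (approximated with fnmatch): true when no pattern matches."""
    return all(not fnmatchcase(value, p) for p in patterns)


ALLOWED = ["arn:aws:iam::*:role/TerraformExecutionRole"]
terraform_role = "arn:aws:iam::111122223333:role/TerraformExecutionRole"

# With StringNotEquals the Deny condition is true even for Terraform's role,
# so the SCP would block Terraform itself.
print(string_not_equals(terraform_role, ALLOWED))  # True  -> denied
# With ArnNotLike the wildcard matches, so Terraform's role is exempt.
print(arn_not_like(terraform_role, ALLOWED))       # False -> not denied
```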

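2. AWS Config Rules (compliance guardrails that complement plan-based drift detection; they do not compare AWS against your Terraform code):

# Requires an AWS Config configuration recorder in this account and region.
# The recorder's configuration history also shows when and how a group changed.
resource "aws_config_config_rule" "security_group_drift" {
  name = "security-group-drift-detection"

  # AWS managed rule: flags non-default security groups that are not
  # attached to any network interface. It is a hygiene check, not a diff.
  source {
    owner             = "AWS"
    source_identifier = "EC2_SECURITY_GROUP_ATTACHED_TO_ENI_PERIODIC"
  }

  # Periodic rule, so it runs on a schedule rather than on each change
  maximum_execution_frequency = "TwentyFour_Hours"

  scope {
    compliance_resource_types = ["AWS::EC2::SecurityGroup"]
  }
}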

3. Lifecycle ignore_changes (use sparingly):

# For resources that are expected to change outside Terraform
resource "aws_autoscaling_group" "web" {
  # ...

  lifecycle {
    ignore_changes = [
      desired_capacity,  # Changed by autoscaling
      target_group_arns, # Changed by deployments
    ]
  }
}

Drift Reconciliation Decision Tree

Manual change detected
     │
     ▼
Is the change intentional?
     │
  ┌──┴─────────────┐
  │                │
 Yes               No
  │                │
  ▼                ▼
Should it be       terraform apply
permanent?         (reverts change)
  │
 ┌┴──────────────┐
 │               │
Yes              No (temporary)
 │               │
 ▼               ▼
Add to           Document & schedule
Terraform        removal
+ import
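The same decision logic can be written as a small Python function for reuse in a runbook script. The function name, parameters, and enum values are illustrative; the branches mirror the tree one-to-one.

```python
from enum import Enum


class Action(Enum):
    REVERT = "terraform apply (reverts change)"
    ADOPT = "add to Terraform + import"
    SCHEDULE_REMOVAL = "document & schedule removal"


def reconcile(intentional: bool, permanent: bool) -> Action:
    """Walk the drift reconciliation decision tree."""
    if not intentional:
        # Unintended change: let Terraform put things back
        return Action.REVERT
    if permanent:
        # Intended and here to stay: codify it and import the live resource
        return Action.ADOPT
    # Intended but temporary. Until it is removed, a plain `terraform apply`
    # would still revert it, so coordinate applies in the meantime.
    return Action.SCHEDULE_REMOVAL
```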

State Manipulation Commands

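Command                          Use case
terraform import                 Add an existing resource to state
terraform state rm               Remove a resource from state (it keeps running in AWS)
terraform state mv               Rename or move a resource within state
terraform apply -refresh-only    Update state to match real infrastructure without changing it
terraform apply -replace=ADDR    Force a resource to be recreated (Terraform 0.15.2+)
terraform taint                  Older way to mark a resource for recreation; deprecated in favor of -replace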

Practice Question

What command should you use to update Terraform state to match actual AWS resources without making any infrastructure changes?