Questions
Resources were changed manually in the AWS console. Terraform plan shows unexpected changes. Handle it.
The Scenario
During an incident, an engineer manually modified security group rules in the AWS console to temporarily allow traffic. Now terraform plan shows:
$ terraform plan
aws_security_group.api: Refreshing state... [id=sg-0abc123def456789]
Terraform will perform the following actions:
  # aws_security_group_rule.api_ingress will be destroyed
  - resource "aws_security_group_rule" "api_ingress" {
      - cidr_blocks       = ["10.0.0.0/8"]
      - from_port         = 443
      - protocol          = "tcp"
      - security_group_id = "sg-0abc123def456789"
      - to_port           = 443
      - type              = "ingress"
    }
Plan: 0 to add, 0 to change, 1 to destroy.
The manually-added rule is critical for the temporary fix. Running terraform apply would break production.
The Challenge
Understand why drift happens, how to detect it, and the options for reconciliation. Decide whether to adopt the manual change into Terraform or revert it.
A junior engineer might blindly run terraform apply to 'fix' the drift, delete the manual change without understanding its purpose, or modify the state file directly. These approaches risk production outages, lose important context, and can corrupt state.
A senior engineer first investigates why the manual change was made, decides whether to adopt it into Terraform or revert it, uses the appropriate Terraform command to reconcile state, and implements drift detection to catch future manual changes early.
Step 1: Investigate the Drift
# See what Terraform thinks the current state is
terraform show
# See what's actually in AWS
aws ec2 describe-security-groups --group-ids sg-0abc123def456789
# Compare the two to understand the drift
# Look for:
# - Rules in AWS but not in Terraform (manually added)
# - Rules in Terraform but not in AWS (manually deleted)
# - Rules with different values (manually modified)
Step 2: Understand the Context
Before changing anything:
- Check incident channel/tickets for why the change was made
- Verify if the temporary fix is still needed
- Confirm with the team whether to keep or revert
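If the incident channel does not say who changed the group or when, CloudTrail usually does. The following is a minimal sketch, assuming CloudTrail is enabled in the account and region and that the change falls within its 90-day event history. The function name `sg_change_events` and its 7-day default are illustrative and not part of any tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list recent CloudTrail events that touched one security group.
# Assumes the AWS CLI is already configured for the right account and region.
sg_change_events() {
  local sg_id="$1"
  local days="${2:-7}"
  local start
  # GNU date syntax; on macOS/BSD use: date -u -v-"${days}"d +%Y-%m-%dT%H:%M:%SZ
  start="$(date -u -d "${days} days ago" +%Y-%m-%dT%H:%M:%SZ)"

  aws cloudtrail lookup-events \
    --lookup-attributes "AttributeKey=ResourceName,AttributeValue=${sg_id}" \
    --start-time "${start}" \
    --query 'Events[].{time:EventTime,event:EventName,user:Username}' \
    --output table
}
```

Called as `sg_change_events sg-0abc123def456789 3`, it lists the last three days of events for the group. Entries named `AuthorizeSecurityGroupIngress` or `RevokeSecurityGroupIngress` around the incident window point to the manual edit and to the user who made it.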
Option A: Adopt Manual Change into Terraform
If the manual change should become permanent:
# Add the new rule to your Terraform configuration
resource "aws_security_group_rule" "api_temp_ingress" {
  type              = "ingress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["10.0.0.0/8"]
  security_group_id = aws_security_group.api.id
  description       = "Temporary fix from incident INC-1234 - review for removal"
}
Then import the existing resource:
# Import the manually-created rule
terraform import aws_security_group_rule.api_temp_ingress sg-0abc123def456789_ingress_tcp_443_443_10.0.0.0/8
# Verify import worked
terraform plan
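# Should report "No changes." if the configuration matches the live rule exactly;
# any differing attribute, such as the description, shows up as a pending change
Option B: Revert Manual Change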
If the manual change should be removed:
# Review the plan once more, then apply - Terraform will remove the drift
terraform plan
terraform apply
# The manual rule will be deleted
Option C: Refresh State Only
If you want to update state to match reality without changing anything:
# Terraform 0.15.4+ (replaces the deprecated `terraform refresh`)
terraform apply -refresh-only
# This updates state to match AWS reality
# Future plans will show changes needed to match your config
Implementing Drift Detection
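Detection with plain Terraform rests on `terraform plan -detailed-exitcode`, which exits 0 when there is no diff, 1 on error, and 2 when the plan contains changes. The following is a minimal sketch of a wrapper that keeps those three cases apart. `check_drift` is a hypothetical name, and the sketch assumes `terraform init` has already run in the working directory.

```bash
# Hypothetical helper: run a plan and classify the result by exit code.
check_drift() {
  local log="${1:-plan.log}"
  local rc=0
  # Capture terraform's own exit code; "|| rc=$?" also keeps `set -e` shells alive
  terraform plan -detailed-exitcode -input=false -no-color >"${log}" 2>&1 || rc=$?
  case "${rc}" in
    0) echo "no-drift" ;;
    2) echo "drift" ;;
    *) echo "error" ;;
  esac
  return "${rc}"
}
```

Exit code 2 only says that configuration and real infrastructure differ. To read it as drift, run the check against a branch whose configuration has already been applied; otherwise unapplied code changes are reported too.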
Scheduled drift detection in CI/CD:
# .github/workflows/drift-detection.yml
name: Terraform Drift Detection
on:
  schedule:
    - cron: '0 */4 * * *' # Every 4 hours
  workflow_dispatch:
jobs:
  detect-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false # expose terraform's real exit code
      - name: Terraform Init
        run: terraform init
      - name: Detect Drift
        id: plan
        shell: bash
        run: |
          # Exit codes: 0 = no changes, 1 = error, 2 = changes present (drift)
          set +e
          terraform plan -detailed-exitcode -out=plan.tfplan 2>&1 | tee plan.log
          # $? would be tee's status here; PIPESTATUS[0] is terraform's
          echo "exitcode=${PIPESTATUS[0]}" >> "$GITHUB_OUTPUT"
      - name: Fail on plan error
        if: steps.plan.outputs.exitcode == '1'
        run: exit 1
      - name: Report Drift
        if: steps.plan.outputs.exitcode == '2'
        env:
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
        run: |
          echo "DRIFT DETECTED!"
          cat plan.log
          # Send alert to Slack/PagerDuty
          curl -X POST "$SLACK_WEBHOOK" \
            -H 'Content-Type: application/json' \
            -d '{"text": "Terraform drift detected in production! Check GitHub Actions for details."}'
Using Terraform Cloud:
# Terraform Cloud has built-in drift detection
terraform {
  cloud {
    organization = "company"
    workspaces {
      name = "prod-infrastructure"
    }
  }
}
# In TFC settings, enable:
# - Health Assessments (drift detection)
# - Continuous Validation
Preventing Drift
1. AWS Service Control Policies (SCPs):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyManualSecurityGroupChanges",
      "Effect": "Deny",
      "Action": [
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:AuthorizeSecurityGroupEgress",
        "ec2:RevokeSecurityGroupIngress",
        "ec2:RevokeSecurityGroupEgress"
      ],
      "Resource": "*",
      "Condition": {
        "ArnNotLike": {
          "aws:PrincipalArn": [
            "arn:aws:iam::*:role/TerraformExecutionRole"
          ]
        }
      }
    }
  ]
}
2. AWS Config Rules:
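# AWS Config does not compare resources against Terraform state. This managed
# rule is a guardrail instead: it flags security groups that open ports other
# than the authorized ones to 0.0.0.0/0 or ::/0.
resource "aws_config_config_rule" "security_group_open_ports" {
  name = "security-group-authorized-ports-only"
  source {
    owner             = "AWS"
    source_identifier = "VPC_SG_OPEN_ONLY_TO_AUTHORIZED_PORTS"
  }
  input_parameters = jsonencode({ authorizedTcpPorts = "443" })
  scope {
    compliance_resource_types = ["AWS::EC2::SecurityGroup"]
  }
}
3. Lifecycle ignore_changes (use sparingly):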
# For resources that are expected to change outside Terraform
resource "aws_autoscaling_group" "web" {
  # ...
  lifecycle {
    ignore_changes = [
      desired_capacity,  # Changed by autoscaling
      target_group_arns, # Changed by deployments
    ]
  }
}
Drift Reconciliation Decision Tree
  Manual change detected
            │
            ▼
Is the change intentional?
            │
        ┌───┴───┐
        │       │
       Yes      No
        │       │
        ▼       ▼
    Should it   terraform apply
  be permanent? (reverts change)
        │
    ┌───┴───┐
    │       │
   Yes      No (temporary)
    │       │
    ▼       ▼
  Add to    Document & schedule
Terraform   removal
 + import
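The same tree can be written as a small function, which is handy in a runbook script that prints the next step. This is only an illustrative encoding; `drift_next_step` and its yes/no arguments are hypothetical.

```bash
# Usage: drift_next_step <intentional: yes|no> [permanent: yes|no]
drift_next_step() {
  local intentional="$1"
  local permanent="${2:-}"
  if [ "${intentional}" != "yes" ]; then
    # Unintentional change: let Terraform put things back
    echo "terraform apply (reverts the change)"
  elif [ "${permanent}" = "yes" ]; then
    # Intentional and permanent: codify it
    echo "add to Terraform config and import"
  else
    # Intentional but temporary: keep a record and a removal date
    echo "document and schedule removal"
  fi
}
```

For the incident in this scenario, `drift_next_step yes no` prints `document and schedule removal`.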
State Manipulation Commands
| Command | Use Case |
|---|---|
| terraform import | Add existing resource to state |
| terraform state rm | Remove resource from state (keeps in AWS) |
| terraform state mv | Rename/move resource in state |
| terraform apply -refresh-only | Update state without changing infra |
| terraform taint | Mark resource for recreation (deprecated since 0.15.2; prefer terraform apply -replace=ADDRESS) |
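Most of these commands write to state immediately, with no plan to review, so it is worth taking a copy of the state first. The following is a minimal sketch, assuming a backend that supports `terraform state pull`. `with_state_backup` and the backup file naming are hypothetical.

```bash
# Hypothetical wrapper: save a copy of the current state, then run the given command.
with_state_backup() {
  local backup
  backup="state-backup-$(date -u +%Y%m%dT%H%M%SZ).tfstate"
  # `terraform state pull` prints the current state (local or remote) to stdout
  terraform state pull >"${backup}" || return 1
  echo "state saved to ${backup}" >&2
  # Run the wrapped command exactly as given
  "$@"
}
```

A typical call is `with_state_backup terraform state rm aws_security_group_rule.api_ingress`. If the result is wrong, the saved file can be restored with `terraform state push`, which should itself be used with care because it overwrites the remote state.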
Practice Question
What command should you use to update Terraform state to match actual AWS resources without making any infrastructure changes?