Terraform apply is stuck because of a state lock. Another engineer's apply crashed mid-execution. Fix it.
The Scenario
Your teammate started a terraform apply but their laptop crashed mid-execution. Now everyone on the team sees this error:
$ terraform plan
Acquiring state lock. This may take a few moments...
Error: Error acquiring the state lock
Error message: ConditionalCheckFailedException: The conditional request failed
Lock Info:
ID: a1b2c3d4-5678-90ab-cdef-1234567890ab
Path: s3://company-terraform-state/prod/vpc/terraform.tfstate
Operation: OperationTypeApply
Who: john@company-laptop
Version: 1.5.0
Created: 2024-01-15 10:30:45.123456 +0000 UTC
Info:
Terraform acquires a state lock to protect against two processes
writing to state simultaneously. This error indicates that the
lock has not been released.
The original engineer is unavailable and production changes are blocked.
The Challenge
Safely release the state lock, verify state integrity, and prevent this from happening again. Explain the risks and safeguards.
A junior engineer might immediately run 'terraform force-unlock' without understanding the risks, skip state verification afterward, or not investigate whether the previous apply partially completed. These approaches risk state corruption, resource inconsistencies, or applying changes on top of a broken state.
A senior engineer first verifies the lock holder is truly unavailable, checks if any apply is actually running, uses force-unlock with the correct lock ID, immediately runs terraform plan to verify state consistency, and implements safeguards to prevent future orphaned locks.
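One input to the first of those checks is how long the lock has been held. The sketch below computes that age from the `Created` field of the lock info. It is only an illustration: `lock_age_seconds` is a hypothetical helper rather than a Terraform command, it assumes GNU `date` (Linux), and it assumes the timestamp is in UTC as in the error shown earlier.

```shell
# Hypothetical helper: seconds elapsed since the lock was created.
# Assumes GNU date and a UTC "Created" value copied from the lock info.
lock_age_seconds() (
  created=$1                    # e.g. "2024-01-15 10:30:45.123456 +0000 UTC"
  now=${2:-$(date -u +%s)}      # optional reference time, epoch seconds
  day=${created%% *}            # "2024-01-15"
  rest=${created#* }
  clock=${rest%% *}             # "10:30:45.123456"
  clock=${clock%%.*}            # drop fractional seconds
  created_epoch=$(date -u -d "$day $clock" +%s) || exit 1
  echo $(( now - created_epoch ))
)
```

Calling `lock_age_seconds "2024-01-15 10:30:45.123456 +0000 UTC"` prints the age relative to the current time. A lock that is hours old is more likely orphaned than one created a minute ago, but age alone is not proof that the apply has stopped.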
Step 1: Verify the Lock is Actually Orphaned
Before force-unlocking, confirm no apply is running:
# Check if the lock holder's process might still be running
# Contact the engineer if possible - their apply might be in progress!
# Check CloudWatch or your monitoring for recent Terraform activity
aws cloudwatch get-metric-statistics \
  --namespace AWS/DynamoDB \
  --metric-name ConsumedWriteCapacityUnits \
  --dimensions Name=TableName,Value=terraform-locks \
  --start-time 2024-01-15T10:00:00Z \
  --end-time 2024-01-15T11:00:00Z \
  --period 300 \
  --statistics Sum
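# Check DynamoDB directly for lock details
# The lock item's LockID is <bucket>/<key>, with no s3:// prefix.
# The separate "<bucket>/<key>-md5" item stores the state checksum, not the lock.
aws dynamodb get-item \
  --table-name terraform-locks \
  --key '{"LockID": {"S": "company-terraform-state/prod/vpc/terraform.tfstate"}}'
Step 2: Force Unlock the State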
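Copying the lock ID by hand invites typos, so it can help to pull it out of the error text instead. The function below is a minimal sketch: `extract_lock_id` is a hypothetical helper, and it assumes the error contains an `ID:` line like the one in the scenario, optionally preceded by the border characters that newer Terraform versions draw around diagnostics. The exact layout can differ between versions.

```shell
# Hypothetical helper: read Terraform's lock error on stdin, print the lock ID.
# Matches the first line that looks like "ID: <token>", ignoring leading
# whitespace or box-drawing characters.
extract_lock_id() {
  sed -n 's/^[^[:alnum:]]*ID:[[:space:]]*\([^[:space:]]\{1,\}\)[[:space:]]*$/\1/p' |
    head -n 1
}
```

A possible use is `lock_id=$(terraform plan -no-color 2>&1 | extract_lock_id)`, followed by a manual check that `$lock_id` matches what teammates see before it is passed to `terraform force-unlock`.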
# Use the Lock ID from the error message
terraform force-unlock a1b2c3d4-5678-90ab-cdef-1234567890ab
# You'll see a confirmation prompt:
Do you really want to force-unlock?
Terraform will remove the lock on the remote state.
This will allow local Terraform commands to modify this state, even though it
may still be in use. Only 'yes' will be accepted to confirm.
Enter a value: yes
# Output on success:
Terraform state has been successfully unlocked!
Step 3: Verify State Integrity
Critical: After force-unlock, immediately check for issues:
# Run plan to see current state vs reality
terraform plan
# If plan shows unexpected changes, the previous apply may have partially completed
# Look for resources that were being created/modified
# Check state list to see all managed resources
terraform state list
# For specific resources that might be affected:
terraform state show aws_vpc.main
Step 4: Handle Partial Apply Scenarios
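Whether the plan from Step 3 found anything can be made machine-readable with the `-detailed-exitcode` flag, which returns 0 for no changes, 1 for an error, and 2 when changes are pending. The sketch below maps those codes to labels; `classify_plan_exit` and its label names are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: translate the exit code of
# `terraform plan -detailed-exitcode` into a short label.
classify_plan_exit() {
  case "$1" in
    0) echo "no-changes" ;;       # state, config, and infrastructure agree
    2) echo "changes-pending" ;;  # possible partial apply, review the diff
    *) echo "plan-error" ;;       # plan failed, state may be unreadable
  esac
}
```

For example, after `terraform plan -detailed-exitcode`, running `classify_plan_exit $?` prints one of the three labels. A `changes-pending` result right after a crashed apply is the signal to look at which resources were mid-change.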
If the previous apply partially completed:
# Option 1: If resources were partially created, import them
terraform import aws_instance.web i-1234567890abcdef0
# Option 2: If resources are in a bad state, force replacement
# (terraform taint is deprecated since v0.15.2 in favor of -replace)
terraform apply -replace="aws_instance.web"
# Option 3: Refresh state to match reality
# (terraform refresh is deprecated since v0.15.4; use refresh-only mode)
terraform apply -refresh-only
Step 5: Implement Safeguards
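One safeguard that needs no infrastructure change is a bounded lock wait. Terraform's `-lock-timeout` flag makes a command retry for a fixed duration when the lock is held and then fail, rather than failing at once. The wrapper below is a sketch only: `tf_with_lock_timeout` and the `LOCK_TIMEOUT` variable are hypothetical names, and the 5m default is an arbitrary assumption.

```shell
# Hypothetical wrapper: run a Terraform subcommand with a bounded lock wait.
# LOCK_TIMEOUT is an illustrative variable, not one Terraform reads itself.
tf_with_lock_timeout() (
  if [ "$#" -lt 1 ]; then
    echo "usage: tf_with_lock_timeout <plan|apply> [args...]" >&2
    exit 2
  fi
  subcommand=$1
  shift
  terraform "$subcommand" -lock-timeout="${LOCK_TIMEOUT:-5m}" "$@"
)
```

With this in place, `tf_with_lock_timeout plan` waits up to five minutes for a teammate's short-lived lock to clear, while an orphaned lock still surfaces as an error once the wait expires.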
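Backend configuration with locking enabled (lock timeouts are set per command, not in the backend):
# backend.tf
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "prod/vpc/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
    # The S3 backend has no lock-timeout setting. Lock waiting is controlled
    # per command with the -lock-timeout flag (for example -lock-timeout=5m).
    # The default is 0s: fail immediately if the lock is already held.
  }
}
Use Terraform Cloud/Enterprise for better lock management: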
terraform {
  cloud {
    organization = "company"
    workspaces {
      name = "prod-vpc"
    }
  }
}
CI/CD pipeline with timeouts and orphaned-lock alerting:
# .github/workflows/terraform.yml
jobs:
  terraform:
    runs-on: ubuntu-latest
    timeout-minutes: 30 # Job-level timeout bounds how long a run can hold the lock
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Terraform Init
        run: terraform init -input=false
      - name: Terraform Apply
        run: terraform apply -auto-approve -input=false
        timeout-minutes: 20 # Step-level timeout
      # This step runs only when an earlier step has failed
      - name: Report Possible Orphaned Lock
        if: failure()
        run: |
          # Log for audit trail
          echo "Apply failed, checking for orphaned lock..."
          # Don't auto-unlock - notify team instead
          # terraform force-unlock would be dangerous here
State Locking Deep Dive
| Backend | Lock Mechanism | Lock Location |
|---|---|---|
| S3 | DynamoDB | Separate DynamoDB table |
| GCS | Native | Built into GCS |
| Azure Blob | Blob Lease | Built into Azure |
| Terraform Cloud | Native | Managed by TFC |
| Local | Filesystem | .terraform.tfstate.lock.info |
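For the S3 row, it helps to know which DynamoDB items belong to a given state file. As far as the S3 backend's behavior goes, the lock item is keyed by `<bucket>/<key>` and a second item with a `-md5` suffix holds the state checksum. The helper below only prints those two keys; `s3_backend_lock_keys` is a hypothetical name, and non-default workspaces (which change the key path) are not handled in this sketch.

```shell
# Hypothetical helper: print the DynamoDB LockID values associated with one
# state file. $1 = bucket, $2 = key, as written in the backend "s3" block.
s3_backend_lock_keys() {
  printf 'lock:   %s/%s\n' "$1" "$2"
  printf 'digest: %s/%s-md5\n' "$1" "$2"
}
```

For the scenario's backend, `s3_backend_lock_keys company-terraform-state prod/vpc/terraform.tfstate` prints the key to query for the lock and the key that should be left alone during an unlock.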
DynamoDB Lock Table Schema
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"
  attribute {
    name = "LockID"
    type = "S"
  }
  tags = {
    Purpose = "Terraform state locking"
  }
}
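With a table like this in place, every lock currently held across all state files can be listed in one call. The sketch below assumes that lock items carry an `Info` attribute (the JSON lock metadata) while checksum items do not, which is how the S3 backend is understood to write them. `list_held_locks` is a hypothetical helper, and it requires AWS CLI credentials with read access to the table.

```shell
# Hypothetical helper: list items in the lock table that represent held locks.
# $1 = table name (defaults to the table defined above).
# "#i" is an alias for the Info attribute, used to avoid any clash with
# DynamoDB reserved words.
list_held_locks() {
  aws dynamodb scan \
    --table-name "${1:-terraform-locks}" \
    --filter-expression 'attribute_exists(#i)' \
    --expression-attribute-names '{"#i": "Info"}' \
    --projection-expression 'LockID, #i' \
    --output json
}
```

An empty `Items` list in the result suggests no state is locked. Any entry that is present includes the holder and creation time inside its `Info` string.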
Emergency Procedures Runbook
#!/bin/bash
# emergency-unlock.sh - Use with extreme caution!
set -e
LOCK_ID=$1
STATE_PATH=$2
if [ -z "$LOCK_ID" ] || [ -z "$STATE_PATH" ]; then
  echo "Usage: ./emergency-unlock.sh <lock-id> <state-path>"
  exit 1
fi
echo "WARNING: About to force-unlock Terraform state"
echo "Lock ID: $LOCK_ID"
echo "State Path: $STATE_PATH"
echo ""
echo "Checklist before proceeding:"
echo "[ ] Confirmed lock holder is unavailable"
echo "[ ] Checked no apply is currently running"
echo "[ ] Have backup of current state"
echo ""
read -r -p "Type 'UNLOCK' to proceed: " confirm
if [ "$confirm" != "UNLOCK" ]; then
  echo "Aborted"
  exit 1
fi
# Create state backup first
terraform state pull > "state-backup-$(date +%Y%m%d-%H%M%S).json"
# Force unlock
terraform force-unlock -force "$LOCK_ID"
# Immediate verification
echo "Running terraform plan to verify state..."
terraform plan -detailed-exitcode || true
echo "Done. Review the plan output carefully!"
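The runbook trusts that `terraform state pull` produced a usable backup, yet the command can leave an empty file, for instance when no state exists. A cheap sanity check before unlocking is sketched below. `backup_looks_valid` is a hypothetical helper. It only confirms that the file is non-empty and contains the `lineage` and `serial` fields found in state snapshots, and it is not a full validation of the state.

```shell
# Hypothetical helper: succeed only if the file looks like a state snapshot.
# $1 = path to a file written by `terraform state pull`.
backup_looks_valid() {
  [ -s "$1" ] || return 1               # must exist and be non-empty
  grep -q '"lineage"' "$1" || return 1  # identifies the state's history
  grep -q '"serial"' "$1"               # increments on every state write
}
```

In the script above, this could run right after the backup is written, for example `backup_looks_valid "$backup_file" || exit 1`, where `backup_file` is an assumed variable holding the backup's path.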
Practice Question
Why is it dangerous to run 'terraform force-unlock' without first verifying the lock holder's status?