Design a deployment workflow with environment approvals, staging, and production rollbacks.
The Scenario
Your deployment process needs enterprise controls:
Requirements:
- Deploy to staging automatically on main branch push
- Require approval from 2 team leads for production
- Run smoke tests before promoting to production
- Enable one-click rollback to any previous version
- Track who deployed what and when
- Prevent deployments during maintenance windows
The Challenge
Design a comprehensive deployment workflow with proper environment gates, approvals, rollback capabilities, and audit trails.
Wrong Approach
A junior engineer might deploy directly to production without gates, use manual kubectl commands for rollback, or skip staging entirely for hot fixes. These approaches risk production outages, make rollbacks error-prone, and bypass safety checks.
Right Approach
A senior engineer implements GitHub Environments with protection rules, uses deployment_status events to feed deployment tracking, builds automated rollback mechanisms, and creates comprehensive audit logging.
Step 1: Configure GitHub Environments
# Configure in Repository Settings > Environments
# staging:
#   - No required reviewers
#   - Deployment branches: main
# production:
#   - Required reviewers: 2 from @org/release-managers
#     (note: GitHub proceeds once any one listed reviewer approves, so a strict
#     two-approval policy needs an additional gate in the workflow or org policy)
#   - Wait timer: 5 minutes
#   - Deployment branches: main only
#   - Environment secrets: PROD_* credentials
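These protection rules live in the repository settings UI, but they can also be scripted so every repository gets the same gates. A minimal sketch using the "create or update an environment" REST endpoint via the gh CLI; OWNER, REPO, and the reviewer team ID are placeholders, and the wait timer is in minutes:
# Create/update the production environment with a required reviewer team and a 5-minute wait timer
gh api -X PUT "repos/$OWNER/$REPO/environments/production" --input - <<'EOF'
{
  "wait_timer": 5,
  "reviewers": [
    { "type": "Team", "id": 123456 }
  ],
  "deployment_branch_policy": {
    "protected_branches": false,
    "custom_branch_policies": true
  }
}
EOF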
Step 2: Multi-Stage Deployment Workflow
name: Deploy
on:
  push:
    branches: [main]
  workflow_dispatch:
    inputs:
      environment:
        description: 'Target environment'
        required: true
        type: choice
        options:
          - staging
          - production
      version:
        description: 'Version to deploy (for rollback)'
        required: false
        type: string
permissions:
  contents: read
  id-token: write
  deployments: write
  packages: write  # needed to push the image to GHCR
jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      image: ${{ steps.build.outputs.image }}
      version: ${{ steps.version.outputs.version }}
    steps:
      - uses: actions/checkout@v4
      - name: Generate version
        id: version
        run: |
          VERSION="${{ inputs.version || github.sha }}"
          echo "version=$VERSION" >> $GITHUB_OUTPUT
      - name: Log in to GHCR
        run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
      - name: Build and push
        id: build
        run: |
          IMAGE="ghcr.io/${{ github.repository }}:${{ steps.version.outputs.version }}"
          docker build -t $IMAGE .
          docker push $IMAGE
          echo "image=$IMAGE" >> $GITHUB_OUTPUT
  deploy-staging:
    needs: build
    if: github.event_name == 'push' || inputs.environment == 'staging'
    runs-on: ubuntu-latest
    environment:
      name: staging
      url: https://staging.app.example.com
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1
      - name: Deploy to staging
        id: deploy
        run: |
          aws eks update-kubeconfig --name staging-cluster
          kubectl set image deployment/app app=${{ needs.build.outputs.image }} -n app
          kubectl rollout status deployment/app -n app --timeout=300s
      - name: Run smoke tests
        run: |
          # Wait for the deployment to be ready
          sleep 30
          curl -sf https://staging.app.example.com/health || exit 1
          npm ci
          npm run test:smoke -- --env=staging
      - name: Record deployment
        if: always()
        run: |
          echo '{
            "environment": "staging",
            "version": "${{ needs.build.outputs.version }}",
            "image": "${{ needs.build.outputs.image }}",
            "status": "${{ job.status }}",
            "timestamp": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'",
            "actor": "${{ github.actor }}",
            "run_id": "${{ github.run_id }}"
          }' | tee deployment-record.json
          # Store in deployment history
          aws s3 cp deployment-record.json \
            s3://deployments-bucket/staging/${{ github.run_id }}.json
  deploy-production:
    needs: [build, deploy-staging]
    # For a manual production dispatch the staging job is skipped, so check the
    # needed job results explicitly instead of relying on success()
    if: >-
      !cancelled() &&
      needs.build.result == 'success' &&
      (needs.deploy-staging.result == 'success' || needs.deploy-staging.result == 'skipped') &&
      (github.event_name == 'push' || inputs.environment == 'production')
    runs-on: ubuntu-latest
    environment:
      name: production
      url: https://app.example.com
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN_PROD }}
          aws-region: us-east-1
      - name: Pre-deployment checks
        run: |
          # Check maintenance window (runner clock is UTC)
          HOUR=$(date -u +%H)
          DAY=$(date -u +%u)
          if [[ $HOUR -ge 22 || $HOUR -lt 6 ]] && [[ $DAY -le 5 ]]; then
            echo "::error::Deployments blocked during maintenance window (10PM-6AM weekdays)"
            exit 1
          fi
          # Check for active incidents
          curl -sf https://status.example.com/api/incidents/active | jq -e '.count == 0' || {
            echo "::error::Cannot deploy during active incident"
            exit 1
          }
      - name: Deploy to production
        run: |
          aws eks update-kubeconfig --name production-cluster
          # Record pre-deployment state for rollback
          kubectl get deployment/app -n app -o json > pre-deploy-state.json
          aws s3 cp pre-deploy-state.json \
            s3://deployments-bucket/production/pre-deploy-${{ github.run_id }}.json
          # Roll out the new image (standard rolling update)
          kubectl set image deployment/app app=${{ needs.build.outputs.image }} -n app
          kubectl rollout status deployment/app -n app --timeout=600s
      - name: Post-deployment verification
        run: |
          # Health check: fail the job if the endpoint never becomes healthy
          HEALTHY=false
          for i in {1..5}; do
            curl -sf https://app.example.com/health && HEALTHY=true && break
            sleep 10
          done
          $HEALTHY || { echo "::error::Health check failed"; exit 1; }
          # Smoke tests
          npm ci
          npm run test:smoke -- --env=production
      - name: Record successful deployment
        run: |
          # Environment approvals are captured by GitHub itself and can be pulled from
          # the run's deployment review history via the REST API for the audit trail
          echo '{
            "environment": "production",
            "version": "${{ needs.build.outputs.version }}",
            "image": "${{ needs.build.outputs.image }}",
            "status": "success",
            "timestamp": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'",
            "actor": "${{ github.actor }}",
            "run_id": "${{ github.run_id }}"
          }' | aws s3 cp - s3://deployments-bucket/production/${{ github.run_id }}.json
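One gap worth closing in the Deploy workflow above: two rapid pushes to main could race each other through staging and production. A minimal sketch of a workflow-level concurrency block that would sit alongside the on: and permissions: keys (the group name is illustrative):
# Queue deployments instead of running them in parallel; never cancel a deploy mid-flight
concurrency:
  group: deploy-production-pipeline
  cancel-in-progress: false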
Step 3: Rollback Workflow
name: Rollback
on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Environment to rollback'
        required: true
        type: choice
        options:
          - staging
          - production
      target_version:
        description: 'Version to rollback to (leave empty for previous)'
        required: false
        type: string
      reason:
        description: 'Reason for rollback'
        required: true
        type: string
permissions:
  contents: read
  id-token: write
  deployments: write
jobs:
  rollback:
    runs-on: ubuntu-latest
    environment: ${{ inputs.environment }}
    steps:
      - name: Configure AWS
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1
      - name: Determine rollback target
        id: target
        env:
          # Pass the free-text input through the environment to avoid script injection
          TARGET_VERSION: ${{ inputs.target_version }}
        run: |
          if [ -n "$TARGET_VERSION" ]; then
            echo "version=$TARGET_VERSION" >> $GITHUB_OUTPUT
          else
            # Get previous successful deployment from the history bucket
            PREV=$(aws s3 ls s3://deployments-bucket/${{ inputs.environment }}/ | \
              sort -r | head -2 | tail -1 | awk '{print $4}')
            VERSION=$(aws s3 cp s3://deployments-bucket/${{ inputs.environment }}/$PREV - | jq -r '.version')
            echo "version=$VERSION" >> $GITHUB_OUTPUT
          fi
      - name: Perform rollback
        run: |
          aws eks update-kubeconfig --name ${{ inputs.environment }}-cluster
          IMAGE="ghcr.io/${{ github.repository }}:${{ steps.target.outputs.version }}"
          kubectl set image deployment/app app=$IMAGE -n app
          kubectl rollout status deployment/app -n app --timeout=300s
      - name: Verify rollback
        run: |
          curl -sf https://${{ inputs.environment == 'production' && 'app' || 'staging.app' }}.example.com/health
      - name: Record rollback
        env:
          REASON: ${{ inputs.reason }}
        run: |
          # Build the record with jq so the free-text reason cannot break the JSON or the shell
          jq -n \
            --arg environment "${{ inputs.environment }}" \
            --arg to_version "${{ steps.target.outputs.version }}" \
            --arg reason "$REASON" \
            --arg actor "${{ github.actor }}" \
            '{type: "rollback", environment: $environment, from_version: "current", to_version: $to_version, reason: $reason, actor: $actor, timestamp: (now | todate)}' \
            | aws s3 cp - s3://deployments-bucket/rollbacks/${{ github.run_id }}.json
      - name: Notify team
        uses: slackapi/slack-github-action@v2
        with:
          webhook: ${{ secrets.SLACK_WEBHOOK }}
          webhook-type: incoming-webhook
          payload: |
            {
              "text": "Rollback completed",
              "blocks": [
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": "*Rollback to ${{ inputs.environment }}*\nVersion: ${{ steps.target.outputs.version }}\nReason: ${{ inputs.reason }}\nBy: ${{ github.actor }}"
                  }
                }
              ]
            }
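If the deployment history bucket is unavailable, Kubernetes' own revision history offers a faster escape hatch. A minimal sketch using kubectl's built-in rollback, assuming the Deployment's revision history has not been pruned:
# Roll back to the immediately previous ReplicaSet revision
kubectl rollout undo deployment/app -n app
# Or inspect revisions and target a specific one
kubectl rollout history deployment/app -n app
kubectl rollout undo deployment/app -n app --to-revision=3
kubectl rollout status deployment/app -n app --timeout=300s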
Step 4: Blue-Green Deployment
name: Blue-Green Deploy
on:
  workflow_dispatch:
    inputs:
      environment:
        type: choice
        options: [staging, production]
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: ${{ inputs.environment }}
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1
      - name: Get current deployment color
        id: current
        run: |
          aws eks update-kubeconfig --name ${{ inputs.environment }}-cluster
          CURRENT=$(kubectl get service/app -n app -o jsonpath='{.spec.selector.color}')
          if [ "$CURRENT" == "blue" ]; then
            echo "current=blue" >> $GITHUB_OUTPUT
            echo "target=green" >> $GITHUB_OUTPUT
          else
            echo "current=green" >> $GITHUB_OUTPUT
            echo "target=blue" >> $GITHUB_OUTPUT
          fi
      - name: Deploy to inactive color
        run: |
          IMAGE="ghcr.io/${{ github.repository }}:${{ github.sha }}"
          kubectl set image deployment/app-${{ steps.current.outputs.target }} \
            app=$IMAGE -n app
          kubectl rollout status deployment/app-${{ steps.current.outputs.target }} \
            -n app --timeout=300s
      - name: Test inactive deployment
        run: |
          # Test the inactive deployment directly, before it receives live traffic
          POD=$(kubectl get pod -n app -l color=${{ steps.current.outputs.target }} -o jsonpath='{.items[0].metadata.name}')
          kubectl port-forward $POD 8080:8080 -n app &
          sleep 5
          curl -sf http://localhost:8080/health
      - name: Switch traffic
        run: |
          kubectl patch service/app -n app \
            -p '{"spec":{"selector":{"color":"${{ steps.current.outputs.target }}"}}}'
      - name: Verify switch
        run: |
          sleep 10
          curl -sf https://app.example.com/health
          npm ci
          npm run test:smoke
      - name: Keep old deployment for quick rollback
        run: |
          echo "Previous deployment (${{ steps.current.outputs.current }}) kept for rollback"
          echo "To rollback, run: kubectl patch service/app -n app -p '{\"spec\":{\"selector\":{\"color\":\"${{ steps.current.outputs.current }}\"}}}'"
Step 5: Deployment Dashboard Data
name: Update Deployment Dashboard
on:
  deployment_status:
jobs:
  update-dashboard:
    runs-on: ubuntu-latest
    steps:
      - name: Update deployment metrics
        run: |
          # Send metrics to monitoring system
          curl -X POST https://metrics.example.com/deployments \
            -H "Content-Type: application/json" \
            -d '{
              "repository": "${{ github.repository }}",
              "environment": "${{ github.event.deployment.environment }}",
              "status": "${{ github.event.deployment_status.state }}",
              "sha": "${{ github.event.deployment.sha }}",
              "creator": "${{ github.event.deployment.creator.login }}",
              "timestamp": "${{ github.event.deployment_status.created_at }}"
            }'
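Because the environment-scoped jobs above create GitHub deployment records automatically, the who/what/when audit trail can also be pulled straight from the API rather than only from S3. A sketch with the gh CLI; OWNER and REPO are placeholders and the jq projection is illustrative:
# List recent production deployments with creator and timestamp for the audit trail
gh api "repos/$OWNER/$REPO/deployments?environment=production&per_page=20" \
  --jq '.[] | {sha: .sha, creator: .creator.login, created_at: .created_at}'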
Deployment Pattern Comparison
| Pattern | Downtime | Rollback Speed | Resource Usage |
|---|---|---|---|
| Rolling | Zero | Minutes | 1x |
| Blue-Green | Zero | Seconds | 2x |
| Canary | Zero | Seconds | 1.1x |
| Recreate | Yes | Minutes | 1x |
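The production job in Step 2 performs a rolling update; a true canary keeps a small second Deployment on the new version behind the same Service before promoting. A minimal kubectl sketch, assuming a Deployment named app-canary that shares the app Service's selector; the image tag and replica split are placeholders:
# New image reference (placeholder tag)
IMAGE="ghcr.io/OWNER/REPO:NEW_VERSION"
# Ship the new image to the canary only (e.g., 1 canary replica vs 9 stable)
kubectl set image deployment/app-canary app=$IMAGE -n app
kubectl rollout status deployment/app-canary -n app --timeout=300s
# Watch error rates and smoke tests against the canary, then promote to the stable Deployment
kubectl set image deployment/app app=$IMAGE -n app
kubectl rollout status deployment/app -n app --timeout=600s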
Practice Question
What is the purpose of the 'wait timer' protection rule in GitHub Environments?