Questions
Implement a GitOps workflow for deploying applications to multiple environments (dev/staging/prod).
The Scenario
You’re the Platform Engineering Lead at a rapidly growing SaaS company. Your team manages 50+ microservices deployed across three environments:
- Development: 5 microservices, rapid deployments (50+ per day)
- Staging: 20 microservices, QA testing environment
- Production: 50 microservices, serving 1M+ users
Current problems with manual kubectl deployments:
- No audit trail - Can’t track who deployed what, when
- Configuration drift - Production configs don’t match Git
- Manual errors - Typos in kubectl commands caused 2 outages last month
- No rollback mechanism - Reverting bad deploys takes 30+ minutes
- Security risk - 15 developers have direct kubectl access to production
Your CTO’s mandate:
“Implement GitOps. All deployments must go through Git. No direct kubectl access to production. Full audit trail. Automatic rollback on failures.”
The Challenge
Design and implement a complete GitOps pipeline using ArgoCD or Flux that:
- Single source of truth: All configs in Git repositories
- Automated deployments: Git commit → automatic deployment
- Environment promotion: Tested changes flow from dev → staging → prod
- Rollback capability: Instant rollback via Git revert
- Security: Remove direct kubectl access, use RBAC for Git-based approvals
Show the complete implementation with repository structure, ArgoCD configs, and workflows.
How Different Experience Levels Approach This
Junior approach: basic CI/CD without GitOps principles. A Jenkins job runs kubectl apply whenever code is pushed. The result has no single source of truth (deployments happen from CI, not Git), allows configuration drift (cluster state may differ from Git), has no automatic healing (manual changes are not reverted), leaves no clear audit trail, still requires kubectl access from CI, has no environment-specific configs, and relies on a manual rollback process.
Senior approach: a production GitOps architecture using ArgoCD, with complete environment separation, automated sync policies, Kustomize overlays for environment-specific configs, AppProjects for RBAC, sync windows for production control, self-healing to prevent drift, and an integrated CI/CD pipeline that updates Git repos to trigger deployments.
Junior Approach: Basic CI/CD Without GitOps
The junior developer sets up a simple Jenkins pipeline:
// Jenkinsfile (scripted pipeline)
node {
    stage('Deploy') {
        // Applies whatever manifest is in the workspace, straight from CI
        sh 'kubectl apply -f deployment.yaml'
    }
}
Problems with this approach:
- No single source of truth (deployments happen from CI, not Git)
- Configuration drift (cluster state may differ from Git)
- No automatic healing (manual changes not reverted)
- No clear audit trail
- Still requires kubectl access from CI
- No environment-specific configs
- Manual rollback process
Senior Approach: Production GitOps Architecture
This mirrors what companies like Weaveworks, Intuit, and Adobe use. Here’s the complete solution:
GitOps Architecture Overview
┌─────────────────┐
│ Git Repository  │  (Source of Truth)
│ - Helm charts   │
│ - Manifests     │
│ - Kustomize     │
└────────┬────────┘
         │
         │ Git Commit
         ↓
┌─────────────────┐
│ ArgoCD          │  (Continuous Delivery)
│ - Monitors Git  │
│ - Syncs K8s     │
│ - Auto-heal     │
└────────┬────────┘
         │
         │ kubectl apply
         ↓
┌─────────────────┐
│ Kubernetes      │  (Runtime)
│ - Dev cluster   │
│ - Staging       │
│ - Production    │
└─────────────────┘
Repository Structure (GitOps Best Practice)
company-gitops/
├── apps/                          # Application definitions
│   ├── frontend/
│   │   ├── base/                  # Common configs
│   │   │   ├── deployment.yaml
│   │   │   ├── service.yaml
│   │   │   └── kustomization.yaml
│   │   └── overlays/
│   │       ├── dev/
│   │       │   ├── kustomization.yaml
│   │       │   └── config.yaml
│   │       ├── staging/
│   │       │   ├── kustomization.yaml
│   │       │   └── config.yaml
│   │       └── production/
│   │           ├── kustomization.yaml
│   │           └── config.yaml
│   ├── api-service/
│   └── payment-service/
│
├── infrastructure/                # Cluster configs
│   ├── namespaces/
│   ├── ingress/
│   ├── monitoring/
│   └── storage/
│
├── argocd/                        # ArgoCD application definitions
│   ├── apps/
│   │   ├── frontend-dev.yaml
│   │   ├── frontend-staging.yaml
│   │   └── frontend-prod.yaml
│   └── projects/
│       ├── dev-project.yaml
│       ├── staging-project.yaml
│       └── prod-project.yaml
│
└── README.md
Complete ArgoCD Installation
# 1. Install ArgoCD
kubectl create namespace argocd
kubectl apply -n argocd -f \
  https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
# 2. Expose ArgoCD UI
kubectl patch svc argocd-server -n argocd -p '{"spec": {"type": "LoadBalancer"}}'
# 3. Get initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d
# 4. Login to ArgoCD
argocd login <ARGOCD_SERVER>
argocd account update-password
ArgoCD Project Configuration (Environment Isolation)
---
# Production project - strict controls
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: production
  namespace: argocd
spec:
  description: Production environment
  # Which Git repos are allowed
  sourceRepos:
    - 'https://github.com/company/gitops.git'
  # Where apps can be deployed
  destinations:
    - namespace: 'prod-*'
      server: 'https://kubernetes.default.svc'
  # What cluster-scoped resources can be managed
  # (the wildcards allow everything; narrow this list for stricter control)
  clusterResourceWhitelist:
    - group: '*'
      kind: '*'
  # Restrict when production syncs may run
  syncWindows:
    - kind: allow
      schedule: '0 10 * * 1-5'   # Window opens Mon-Fri at 10:00
      duration: 8h               # and stays open until 18:00
      applications:
        - '*'
      manualSync: true           # Manual syncs remain permitted when the window is closed
  # Orphaned resource protection
  orphanedResources:
    warn: true
---
# Development project - more permissive
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: development
  namespace: argocd
spec:
  description: Development environment
  sourceRepos:
    - 'https://github.com/company/gitops.git'
  destinations:
    - namespace: 'dev-*'
      server: 'https://kubernetes.default.svc'
  clusterResourceWhitelist:
    - group: '*'
      kind: '*'
  # Auto-sync enabled for dev
  syncWindows:
    - kind: allow
      schedule: '* * * * *'      # Always allowed
      duration: 24h
      applications:
        - '*'
Application Configuration (Frontend Service Example)
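The staging Application in this section references a `staging` project, but only the production and development projects are defined. A minimal sketch of the missing project, assuming it follows the development project's pattern and uses a `staging-*` namespace prefix (which matches the `staging-frontend` destination used here). The narrowed cluster-resource list is an assumption, not something the original specifies:

```yaml
---
# Staging project - sketch modeled on the development project
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: staging
  namespace: argocd
spec:
  description: Staging environment
  sourceRepos:
    - 'https://github.com/company/gitops.git'
  destinations:
    - namespace: 'staging-*'   # assumed prefix, matches staging-frontend
      server: 'https://kubernetes.default.svc'
  # Assumption: cluster-scoped access limited to Namespace objects;
  # widen this list if staging apps need other cluster-scoped kinds
  clusterResourceWhitelist:
    - group: ''
      kind: Namespace
```

Leaving out `syncWindows` means syncs are never blocked for this project, which suits an environment that auto-syncs on every promotion commit.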
---
# ArgoCD Application for Production
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: frontend-production
  namespace: argocd
  # Finalizer ensures cascade delete
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  # Project for RBAC
  project: production
  # Git source
  source:
    repoURL: 'https://github.com/company/gitops.git'
    targetRevision: main
    path: apps/frontend/overlays/production
    # Use Kustomize for config management
    kustomize:
      namePrefix: prod-
      commonLabels:
        environment: production
      # The image tag is not overridden here: CI updates it in the overlay's
      # kustomization.yaml, and an override in the Application would mask that change
  # Destination cluster and namespace
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: prod-frontend
  # Sync policy
  syncPolicy:
    automated:
      prune: true        # Delete resources removed from Git
      selfHeal: true     # Auto-correct manual kubectl changes
      allowEmpty: false
    # Sync options
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
      - PruneLast=true
    # Retry strategy
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
  # Diff customization (fields ArgoCD should not treat as drift)
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas   # Ignore HPA-controlled replicas
---
# ArgoCD Application for Staging (auto-sync enabled)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: frontend-staging
  namespace: argocd
spec:
  project: staging
  source:
    repoURL: 'https://github.com/company/gitops.git'
    targetRevision: main
    path: apps/frontend/overlays/staging
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: staging-frontend
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
---
# ArgoCD Application for Development (fastest sync)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: frontend-dev
  namespace: argocd
spec:
  project: development
  source:
    repoURL: 'https://github.com/company/gitops.git'
    targetRevision: main   # Or 'dev' branch for dev environment
    path: apps/frontend/overlays/dev
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: dev-frontend
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
Application Manifests (Kustomize Structure)
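The repository tree lists a `kustomization.yaml` in `apps/frontend/base/`, which the overlays pull in through `../../base`, but its contents are not shown. A minimal sketch, assuming the base holds only the two manifest files named in the tree:

```yaml
# apps/frontend/base/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml   # drop this entry if the Service is kept inside deployment.yaml
```

Kustomize rejects duplicate resource IDs, so the Service should be defined in exactly one of the two files.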
Base deployment (apps/frontend/base/deployment.yaml):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 3   # Overridden by overlays
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
        - name: frontend
          image: company/frontend:latest   # Tag overridden by each overlay's images entry
          ports:
            - containerPort: 80
          resources:
            requests:
              memory: "256Mi"
              cpu: "200m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          env:
            - name: ENVIRONMENT
              value: "production"   # Overridden by overlays
---
# Service (listed as apps/frontend/base/service.yaml in the repository tree)
apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  selector:
    app: frontend
  ports:
    - port: 80
      targetPort: 80
Production overlay (apps/frontend/overlays/production/kustomization.yaml):
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
# Base resources
resources:
  - ../../base
# Production-specific namespace
namespace: prod-frontend
# Labels for all resources
commonLabels:
  environment: production
  managed-by: argocd
# Production-specific patches
# (patches replaces patchesStrategicMerge, which is deprecated in Kustomize v5)
patches:
  - path: deployment-patch.yaml
# Production replicas
replicas:
  - name: frontend
    count: 10   # Higher replicas for production
# Production image
images:
  - name: company/frontend
    newTag: v1.2.4   # Specific stable version
Production deployment patch:
# apps/frontend/overlays/production/deployment-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  template:
    spec:
      containers:
        - name: frontend
          env:
            - name: ENVIRONMENT
              value: "production"
            - name: API_URL
              value: "https://api.company.com"
            - name: LOG_LEVEL
              value: "info"
          resources:
            requests:
              memory: "512Mi"   # Higher for production
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
GitOps Workflow: CI/CD Pipeline Integration
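The pipeline in this section runs `kustomize edit set image` inside the dev overlay, so that overlay needs a `kustomization.yaml` for the command to edit. The file is not shown in this document. A sketch, assuming it mirrors the production overlay with the `dev-frontend` namespace and a single replica (both the replica count and the placeholder tag are assumptions):

```yaml
# apps/frontend/overlays/dev/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
namespace: dev-frontend
replicas:
  - name: frontend
    count: 1            # assumed; dev needs less capacity than production
images:
  - name: company/frontend
    newTag: latest      # placeholder; CI rewrites this to the commit SHA
```

Keeping the image tag as the only value CI touches means each promotion commit should stay small and easy to revert.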
# .github/workflows/deploy.yml
name: Deploy to Kubernetes
on:
  push:
    branches:
      - main
    paths:
      - 'src/**'
      - 'Dockerfile'
jobs:
  build-and-update:
    runs-on: ubuntu-latest
    steps:
      # Build Docker image
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Build and push Docker image
        # Assumes the runner is already logged in to the image registry
        run: |
          docker build -t company/frontend:${{ github.sha }} .
          docker tag company/frontend:${{ github.sha }} company/frontend:latest
          docker push company/frontend:${{ github.sha }}
          docker push company/frontend:latest
      # Update GitOps repository
      - name: Update image tag in GitOps repo
        # Assumes Git credentials with push access to the GitOps repo are configured
        run: |
          git clone https://github.com/company/gitops.git
          cd gitops
          # Update development environment (auto-deploy)
          cd apps/frontend/overlays/dev
          kustomize edit set image company/frontend:${{ github.sha }}
          # Commit and push
          git config user.name "GitHub Actions"
          git config user.email "actions@github.com"
          git add .
          git commit -m "Update frontend image to ${{ github.sha }}"
          git push
      # ArgoCD will automatically sync the change to dev cluster
  promote-to-staging:
    needs: build-and-update
    runs-on: ubuntu-latest
    # Runs only on main; "needs" already requires the build job to succeed
    if: github.ref == 'refs/heads/main'
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Run integration tests
        run: |
          # Wait for dev deployment. A fixed sleep can be shorter than ArgoCD's
          # 3-minute polling interval; a real pipeline should wait on app health.
          sleep 60
          # Run tests against dev environment
          npm run test:integration
      - name: Promote to staging
        if: success()
        run: |
          git clone https://github.com/company/gitops.git
          cd gitops/apps/frontend/overlays/staging
          kustomize edit set image company/frontend:${{ github.sha }}
          git config user.name "GitHub Actions"
          git config user.email "actions@github.com"
          git commit -am "Promote frontend to staging: ${{ github.sha }}"
          git push
  promote-to-production:
    needs: promote-to-staging
    runs-on: ubuntu-latest
    # Require manual approval for production
    # (the "production" environment must have required reviewers configured)
    environment: production
    steps:
      - name: Promote to production
        run: |
          git clone https://github.com/company/gitops.git
          cd gitops/apps/frontend/overlays/production
          kustomize edit set image company/frontend:${{ github.sha }}
          git config user.name "GitHub Actions"
          git config user.email "actions@github.com"
          git commit -am "Deploy frontend to production: ${{ github.sha }}"
          git push
      # ArgoCD syncs during the production sync window (Mon-Fri, 10:00-18:00)
Rollback Strategy
Instant rollback via Git revert:
# Check Git history
cd gitops
git log --oneline apps/frontend/overlays/production/kustomization.yaml
# Example output:
#   abc123 Deploy frontend to production: sha256abc
#   def456 Deploy frontend to production: sha256def   (last good version)
#   ghi789 Deploy frontend to production: sha256ghi
# Roll back by reverting the bad commit
git revert --no-edit abc123
git push
# ArgoCD picks up the revert on its next poll (every 3 minutes by default).
# Outside the production sync window automated syncs are blocked,
# so trigger the sync manually:
argocd app sync frontend-production
Monitoring GitOps Deployments
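Sync status alone does not show whether the pods that were rolled out are healthy. Alongside the sync alerts in this section, a health alert can be sketched on the same `argocd_app_info` metric, which also carries a `health_status` label. The rule name, group name, and the 5-minute threshold are assumptions:

```yaml
# Sketch: alert when an application reports Degraded health
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-health-alerts   # hypothetical name
  namespace: monitoring
spec:
  groups:
    - name: argocd-health
      rules:
        - alert: ArgocdAppDegraded
          expr: |
            argocd_app_info{health_status="Degraded"} > 0
          for: 5m                # assumed threshold
          labels:
            severity: critical
          annotations:
            summary: "ArgoCD app {{ $labels.name }} is degraded"
```

An alert like this is a natural trigger for the Git revert described under Rollback Strategy, although the revert itself remains a manual step in this design.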
# Prometheus alert for sync failures
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-alerts
  namespace: monitoring
spec:
  groups:
    - name: argocd
      rules:
        - alert: ArgocdAppOutOfSync
          expr: |
            argocd_app_info{sync_status="OutOfSync"} > 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "ArgoCD app {{ $labels.name }} is out of sync"
            description: "Application has been out of sync for 10+ minutes"
        - alert: ArgocdAppSyncFailed
          # argocd_app_sync_total is a counter, so alert on recent increases
          # rather than on its lifetime value
          expr: |
            increase(argocd_app_sync_total{phase=~"Error|Failed"}[10m]) > 0
          labels:
            severity: critical
          annotations:
            summary: "ArgoCD sync failed for {{ $labels.name }}"
Security and RBAC for GitOps
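The ArgoCD policy in this section controls what users may do through ArgoCD. The CTO's mandate also calls for removing direct kubectl write access to production, which is a Kubernetes RBAC concern. One way to sketch that, assuming developers authenticate to the cluster as members of a group named `developers` (a hypothetical name), is to bind that group to the built-in read-only `view` ClusterRole and remove any broader bindings:

```yaml
# Sketch: read-only cluster access for developers on the production cluster
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: developers-read-only   # hypothetical name
subjects:
  - kind: Group
    name: developers           # hypothetical group from the cluster's identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view                   # built-in role: read access, no writes, no Secrets
  apiGroup: rbac.authorization.k8s.io
```

Committing this binding under `infrastructure/` would keep the access policy itself under the same Git review process as the applications.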
# ArgoCD RBAC policy (restrict production access)
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  policy.csv: |
    # Developers: read-only on production
    p, role:developer, applications, get, production/*, allow
    p, role:developer, applications, sync, development/*, allow
    p, role:developer, applications, sync, staging/*, allow
    # Platform team: full access
    p, role:platform-admin, applications, *, */*, allow
    p, role:platform-admin, clusters, *, *, allow
    p, role:platform-admin, repositories, *, *, allow
    # Bind users to roles
    g, alice@company.com, role:platform-admin
    g, dev-team, role:developer
Complete Deployment Workflow Example
┌─────────────────────────────────────────────────────────┐
│ 1. Developer commits code to application repository     │
└────────────────────┬────────────────────────────────────┘
                     │
                     ↓
┌─────────────────────────────────────────────────────────┐
│ 2. CI builds Docker image: company/frontend:sha256abc   │
└────────────────────┬────────────────────────────────────┘
                     │
                     ↓
┌─────────────────────────────────────────────────────────┐
│ 3. CI updates GitOps repo: overlays/dev/kustomization   │
│    sets image: company/frontend:sha256abc               │
└────────────────────┬────────────────────────────────────┘
                     │
                     ↓
┌─────────────────────────────────────────────────────────┐
│ 4. ArgoCD detects Git change (polls every 3 minutes)    │
└────────────────────┬────────────────────────────────────┘
                     │
                     ↓
┌─────────────────────────────────────────────────────────┐
│ 5. ArgoCD syncs to dev cluster (automatic)              │
└────────────────────┬────────────────────────────────────┘
                     │
                     ↓
┌─────────────────────────────────────────────────────────┐
│ 6. Integration tests run against dev                    │
└────────────────────┬────────────────────────────────────┘
                     │
                     ↓ (tests pass)
┌─────────────────────────────────────────────────────────┐
│ 7. CI updates staging overlay with same image tag       │
└────────────────────┬────────────────────────────────────┘
                     │
                     ↓
┌─────────────────────────────────────────────────────────┐
│ 8. ArgoCD syncs to staging (automatic)                  │
└────────────────────┬────────────────────────────────────┘
                     │
                     ↓
┌─────────────────────────────────────────────────────────┐
│ 9. QA team approves staging deployment                  │
└────────────────────┬────────────────────────────────────┘
                     │
                     ↓ (manual approval)
┌─────────────────────────────────────────────────────────┐
│ 10. CI updates production overlay                       │
└────────────────────┬────────────────────────────────────┘
                     │
                     ↓
┌─────────────────────────────────────────────────────────┐
│ 11. ArgoCD syncs during sync window (manual trigger)    │
│     Production deployment complete!                     │
└─────────────────────────────────────────────────────────┘
GitOps Best Practices Checklist
- Separate Git repos: application code vs GitOps configs
- Use Kustomize or Helm for environment-specific configs
- Implement environment promotion (dev → staging → prod)
- Require manual approval for production deployments
- Use sync windows to control when prod deploys happen
- Enable auto-heal to prevent configuration drift
- Enable auto-prune to remove deleted resources
- Set up RBAC to restrict production access
- Monitor ArgoCD sync status with Prometheus alerts
- Use Git tags/branches for production stability
- Document rollback procedures
- Test GitOps workflow in dev before applying to prod
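The checklist item on Git tags for production stability can be sketched as a change to the production Application's source. This is a fragment rather than a complete manifest, and the tag name is hypothetical:

```yaml
# Fragment of the frontend-production Application (sketch)
spec:
  source:
    repoURL: 'https://github.com/company/gitops.git'
    targetRevision: prod-release-42   # hypothetical Git tag instead of the main branch
    path: apps/frontend/overlays/production
```

The trade-off is that commits to `main` no longer reach production on their own. Someone (or a pipeline step) has to move or create the tag and update `targetRevision`, which adds one more explicit gate to the promotion flow.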
Practice Question
Your ArgoCD Application has selfHeal: true enabled. A developer runs kubectl scale deployment frontend --replicas=20 directly on the production cluster to handle a traffic spike. What happens?
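When working through this, note that the outcome depends on whether the Application also ignores `/spec/replicas`, as the production example in this document does: an ignored field is not treated as drift, so self-heal has nothing to revert. Whatever the answer, the GitOps-friendly way to absorb a traffic spike is to commit an autoscaler rather than scale by hand. A sketch of such an HPA, assuming it is added to the production overlay and that a metrics server is available. The replica bounds and CPU target are illustrative assumptions:

```yaml
# Sketch: HorizontalPodAutoscaler for the frontend Deployment
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 10          # assumed; matches the overlay's baseline replica count
  maxReplicas: 30          # assumed upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # assumed target
```

With the HPA in Git, replica changes during a spike come from a reviewed, versioned policy instead of an ad hoc `kubectl scale`.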