DeployU

Your Azure bill increased 50% last month. Identify waste and implement cost controls.


The Scenario

Your Azure spending is out of control:

Monthly Bill Breakdown:
├── Virtual Machines: $45,000 (30%)
│   └── Many D-series running 24/7 including dev/test
├── Azure SQL: $30,000 (20%)
│   └── Premium tier for all databases
├── Storage: $22,500 (15%)
│   └── All data in Hot tier
├── AKS: $25,500 (17%)
│   └── Overprovisioned node pools
├── App Services: $15,000 (10%)
│   └── Premium V3 for all environments
└── Other: $12,000 (8%)

Total: $150,000/month
YoY Growth: 40%
Reserved Instance Coverage: 0%

Finance is asking for a 30% cost reduction without impacting performance.

The Challenge

Implement a comprehensive cost optimization strategy using Azure Cost Management, reservations, right-sizing, and architectural improvements.

Wrong Approach

A junior engineer might delete resources at random, downgrade everything to the smallest size, skip reserved instances out of fear of the commitment, or ignore the problem and hope it goes away. These approaches break applications, degrade performance, or leave the root causes untouched.

Right Approach

A senior engineer analyzes usage patterns with Azure Advisor and Cost Management, implements reserved instances for stable workloads, right-sizes resources based on metrics, uses auto-scaling, implements proper resource lifecycle management, and sets up budgets with alerts.

Step 1: Analyze Current Spending

# List per-resource usage records with their pre-tax cost
# (one row per record; instanceId contains the resource group path)
az consumption usage list \
  --start-date 2024-01-01 \
  --end-date 2024-01-31 \
  --query "[].{Resource:instanceName,Cost:pretaxCost}" \
  --output table

# Get Azure Advisor recommendations
az advisor recommendation list \
  --category Cost \
  --output table

# Query month-to-date cost grouped by resource group (requires the costmanagement CLI extension)
az costmanagement query \
  --type Usage \
  --scope "/subscriptions/{subscription-id}" \
  --timeframe MonthToDate \
  --dataset-grouping name=ResourceGroup type=Dimension \
  --dataset-aggregation '{"totalCost":{"name":"Cost","function":"Sum"}}'
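The Cost Management query returns raw rows rather than a ranked report. The Python sketch below sums cost per group and sorts the biggest spenders first. It assumes the result carries a "columns" list and a "rows" list, either at the top level or nested under "properties"; the column names depend on the query, so they are passed in rather than hard-coded. The sample rows are made up.

```python
from __future__ import annotations

from collections import defaultdict


def costs_by_group(response: dict, cost_col: str, group_col: str) -> list[tuple[str, float]]:
    """Sum cost per group and return (group, cost) pairs, largest first.

    Assumes a query result with "columns" and "rows", either at the top
    level or nested under "properties".
    """
    data = response.get("properties", response)
    names = [col["name"] for col in data["columns"]]
    cost_idx = names.index(cost_col)
    group_idx = names.index(group_col)

    totals: defaultdict[str, float] = defaultdict(float)
    for row in data["rows"]:
        totals[row[group_idx]] += float(row[cost_idx])

    # Largest spenders first, so the top rows show where to dig in.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def with_share(ranked: list[tuple[str, float]]) -> list[tuple[str, float, float]]:
    """Append each group's percentage of total spend."""
    grand_total = sum(cost for _, cost in ranked)
    if grand_total == 0:
        return [(group, cost, 0.0) for group, cost in ranked]
    return [(group, cost, 100 * cost / grand_total) for group, cost in ranked]


if __name__ == "__main__":
    # Made-up rows purely for illustration.
    sample = {
        "properties": {
            "columns": [{"name": "Cost"}, {"name": "ResourceGroup"}, {"name": "Currency"}],
            "rows": [
                [900.0, "rg-development", "USD"],
                [3000.0, "rg-production", "USD"],
                [100.0, "rg-development", "USD"],
            ],
        }
    }
    for group, cost, pct in with_share(costs_by_group(sample, "Cost", "ResourceGroup")):
        print(f"{group:15} ${cost:>10,.2f} {pct:5.1f}%")
```

To use it on real data, save the query output to a file and pass the loaded JSON along with whatever cost and grouping column names appear in it.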

Step 2: Implement Reserved Instances

// Reserved Instance savings calculator
// Standard D4s_v5 (4 vCPU, 16GB) in East US:
// - Pay-as-you-go: $140.16/month
// - 1-year reserved: $89.79/month (36% savings)
// - 3-year reserved: $57.67/month (59% savings)

// For 10 production VMs running 24/7:
// - Current: 10 × $140.16 = $1,401.60/month
// - With 3-year RI: 10 × $57.67 = $576.70/month
// - Annual savings: $9,899/year

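// Purchase based on Azure Advisor reservation recommendations.
// Reservations are billing transactions; treat this as a sketch of the
// purchase request fields (many teams buy them in the portal instead).
resource reservationOrder 'Microsoft.Capacity/reservationOrders@2022-11-01' = {
  name: 'ro-production-vms'
  location: 'eastus'  // Region of the VMs being covered
  sku: {
    name: 'Standard_D4s_v5'  // VM size being reserved
  }
  properties: {
    reservedResourceType: 'VirtualMachines'
    billingScopeId: subscription().id
    term: 'P3Y'  // 3-year term
    billingPlan: 'Monthly'
    quantity: 10
    displayName: 'Production VM Reservations'
    appliedScopeType: 'Shared'  // Apply to all eligible subscriptions in the billing scope
    renew: true
  }
}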

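// Azure SQL Database reserved capacity (vCore model)
// Applies to provisioned vCore compute (single databases, elastic pools,
// managed instances) in the reserved tier and hardware generation;
// it does not cover serverless or DTU-based databases.
resource sqlReservation 'Microsoft.Capacity/reservationOrders@2022-11-01' = {
  name: 'ro-sql-vcores'
  location: 'eastus'  // Region of the databases being covered
  // A sku naming the tier and hardware generation is also required
  properties: {
    reservedResourceType: 'SqlDatabases'
    billingScopeId: subscription().id
    term: 'P1Y'
    quantity: 24  // Total vCores
    displayName: 'SQL vCore Reservations'
    appliedScopeType: 'Shared'
  }
}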
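The reservation figures above reduce to a few ratios. A minimal Python sketch using the D4s_v5 East US prices quoted in this step; `breakeven_utilization` is an added helper (not from the original) showing why reservations suit only steady workloads: the reservation is billed every hour, while pay-as-you-go bills only running hours.

```python
def ri_savings_pct(payg_monthly: float, ri_monthly: float) -> float:
    """Percent saved per VM-month by reserving instead of paying as you go."""
    return 100 * (1 - ri_monthly / payg_monthly)


def annual_savings(payg_monthly: float, ri_monthly: float, vm_count: int) -> float:
    """Yearly savings for vm_count VMs that run 24/7 and are fully covered by reservations."""
    return (payg_monthly - ri_monthly) * vm_count * 12


def breakeven_utilization(payg_monthly: float, ri_monthly: float) -> float:
    """Fraction of the month a VM must run for the reservation to pay off.

    The reservation costs ri_monthly regardless of uptime; pay-as-you-go
    costs payg_monthly * utilization, so the reservation wins only when
    utilization exceeds ri_monthly / payg_monthly.
    """
    return ri_monthly / payg_monthly


if __name__ == "__main__":
    payg, ri_1y, ri_3y = 140.16, 89.79, 57.67  # D4s_v5 East US figures from above
    for label, ri in (("1-year", ri_1y), ("3-year", ri_3y)):
        print(f"{label}: {ri_savings_pct(payg, ri):.0f}% off, "
              f"break-even at {breakeven_utilization(payg, ri):.0%} uptime")
    print(f"10 VMs on 3-year RIs save ${annual_savings(payg, ri_3y, 10):,.2f}/year")
```

The output reproduces the 36% and 59% savings and the roughly $9,899 annual figure, and shows a 3-year reservation breaks even near 41% uptime. A dev VM on a weekday-only schedule runs about a third of the week, below that line, which is why reservations belong on 24/7 production VMs.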

Step 3: Right-Size Virtual Machines

// Collect VM performance counters with the Azure Monitor Agent
// (the rule must also be associated with each VM to take effect)
resource vmInsights 'Microsoft.Insights/dataCollectionRules@2022-06-01' = {
  name: 'dcr-vm-performance'
  location: location
  properties: {
    dataSources: {
      performanceCounters: [
        {
          name: 'VMPerformance'
          streams: ['Microsoft-Perf']
          samplingFrequencyInSeconds: 60
          counterSpecifiers: [
            '\\Processor Information(_Total)\\% Processor Time'
            '\\Memory\\% Committed Bytes In Use'
            '\\LogicalDisk(_Total)\\% Disk Read Time'
            '\\LogicalDisk(_Total)\\% Disk Write Time'
          ]
        }
      ]
    }
    destinations: {
      logAnalytics: [
        {
          workspaceResourceId: logAnalytics.id
          name: 'vmLogs'
        }
      ]
    }
    dataFlows: [
      {
        streams: ['Microsoft-Perf']
        destinations: ['vmLogs']
      }
    ]
  }
}

// KQL query to find oversized VMs (counter names match the collection rule above)
Perf
| where TimeGenerated > ago(30d)
| where ObjectName == "Processor Information" and CounterName == "% Processor Time" and InstanceName == "_Total"
| summarize AvgCPU = avg(CounterValue),
            MaxCPU = max(CounterValue),
            P95CPU = percentile(CounterValue, 95)
    by Computer
| where P95CPU < 20  // P95 CPU below 20%: candidate for downsizing
| order by AvgCPU asc

// Memory utilization
Perf
| where ObjectName == "Memory" and CounterName == "% Committed Bytes In Use"
| where TimeGenerated > ago(30d)
| summarize AvgMemory = avg(CounterValue),
            MaxMemory = max(CounterValue),
            P95Memory = percentile(CounterValue, 95)
by Computer
| where P95Memory < 40  // P95 memory below 40%: downsizing candidate if CPU also qualifies

# Resize VM based on analysis (the VM restarts, so plan a maintenance window)
az vm resize \
  --resource-group rg-production \
  --name vm-web-01 \
  --size Standard_D2s_v5  # Downsize from D4s_v5

# Savings: D4s_v5 ($140/mo) → D2s_v5 ($70/mo) = 50% per VM
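The two KQL queries are best read together: a VM with idle CPU but busy memory should not be halved. A Python sketch that flags a VM for one size down only when both P95 values sit under the thresholds used above. `VmStats` and `ONE_SIZE_DOWN` are hypothetical helpers, and the map covers only the Dsv5 sizes mentioned in this step.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VmStats:
    name: str
    size: str
    p95_cpu: float     # percent, e.g. the P95CPU column from the KQL query
    p95_memory: float  # percent, e.g. the P95Memory column


# Hypothetical one-step-down map; each step halves vCPUs and memory.
ONE_SIZE_DOWN = {
    "Standard_D8s_v5": "Standard_D4s_v5",  # 8 vCPU / 32 GiB -> 4 vCPU / 16 GiB
    "Standard_D4s_v5": "Standard_D2s_v5",  # 4 vCPU / 16 GiB -> 2 vCPU / 8 GiB
}


def downsize_candidates(vms: list[VmStats], cpu_threshold: float = 20.0,
                        memory_threshold: float = 40.0) -> list[tuple[str, str, str]]:
    """Return (name, current size, suggested size) for VMs under BOTH thresholds.

    Halving a VM roughly doubles its utilization: a P95 of 20% CPU and 40%
    memory becomes roughly 40% and 80%, which still leaves headroom.
    VMs with no known smaller size are skipped.
    """
    candidates = []
    for vm in vms:
        if vm.p95_cpu < cpu_threshold and vm.p95_memory < memory_threshold:
            target = ONE_SIZE_DOWN.get(vm.size)
            if target is not None:
                candidates.append((vm.name, vm.size, target))
    return candidates


if __name__ == "__main__":
    fleet = [
        VmStats("vm-web-01", "Standard_D4s_v5", p95_cpu=12.0, p95_memory=30.0),
        VmStats("vm-cache-01", "Standard_D4s_v5", p95_cpu=8.0, p95_memory=75.0),  # memory-bound
    ]
    for name, current, target in downsize_candidates(fleet):
        print(f"az vm resize --resource-group rg-production --name {name} --size {target}  # from {current}")
```

The "roughly doubles" rule is an approximation; recheck the metrics for a week or two after each resize.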

Step 4: Auto-Shutdown for Non-Production

// Auto-shutdown for dev/test VMs
resource autoShutdown 'Microsoft.DevTestLab/schedules@2018-09-15' = {
  name: 'shutdown-computevm-${vmName}'
  location: location
  properties: {
    status: 'Enabled'
    taskType: 'ComputeVmShutdownTask'
    dailyRecurrence: {
      time: '1900'  // 7 PM
    }
    timeZoneId: 'Eastern Standard Time'
    notificationSettings: {
      status: 'Enabled'
      timeInMinutes: 30
      emailRecipient: 'team@contoso.com'
    }
    targetResourceId: vm.id
  }
}

// Start VMs on schedule using Automation
resource automationRunbook 'Microsoft.Automation/automationAccounts/runbooks@2022-08-08' = {
  parent: automationAccount
  name: 'Start-DevVMs'
  location: location
  properties: {
    runbookType: 'PowerShell'
    logProgress: true
    logVerbose: false
    publishContentLink: {
      uri: 'https://raw.githubusercontent.com/contoso/runbooks/main/Start-DevVMs.ps1'
    }
  }
}

resource startSchedule 'Microsoft.Automation/automationAccounts/schedules@2022-08-08' = {
  parent: automationAccount
  name: 'StartDevVMsWeekday'
  properties: {
    startTime: '2024-01-01T08:00:00-05:00'  // 8 AM Eastern; must be a future time when deployed
    frequency: 'Week'
    interval: 1
    timeZone: 'Eastern Standard Time'
    advancedSchedule: {
      weekDays: ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
    }
  }
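}

// Link the schedule to the runbook; without this the schedule never starts it
resource startJobSchedule 'Microsoft.Automation/automationAccounts/jobSchedules@2022-08-08' = {
  parent: automationAccount
  name: guid(automationAccount.id, 'Start-DevVMs', 'StartDevVMsWeekday')
  properties: {
    runbook: {
      name: automationRunbook.name
    }
    schedule: {
      name: startSchedule.name
    }
  }
}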

// Savings: dev VMs run 8 AM to 7 PM on weekdays = 11 hrs × 5 days = 55 hrs/week
// vs 168 hrs/week around the clock = ~67% fewer compute hours (disks are still billed)
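The schedule savings are easy to misstate by mixing a weekly running figure with a monthly total. A minimal Python sketch that keeps both sides in hours per week, assuming the 8 AM start and 7 PM shutdown configured above on five weekdays. It counts compute hours only; managed disks keep billing while a VM is deallocated.

```python
from datetime import time

HOURS_PER_WEEK = 24 * 7


def weekly_running_hours(start: time, stop: time, days_per_week: int) -> float:
    """Hours per week a VM runs when started at `start` and stopped at `stop` on each running day."""
    start_h = start.hour + start.minute / 60
    stop_h = stop.hour + stop.minute / 60
    if stop_h <= start_h:
        raise ValueError("stop must be later than start on the same day")
    return (stop_h - start_h) * days_per_week


def compute_hours_saved_pct(running_hours_per_week: float) -> float:
    """Percent of compute hours removed compared with running 24/7."""
    return 100 * (1 - running_hours_per_week / HOURS_PER_WEEK)


if __name__ == "__main__":
    hours = weekly_running_hours(time(8, 0), time(19, 0), days_per_week=5)
    print(f"{hours:.0f} h/week running, {compute_hours_saved_pct(hours):.0f}% fewer compute hours")
```

A 10-hour weekday window gives about 70%; either way the result lands in the 50-90% range listed for auto-shutdown.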

Step 5: Optimize Azure SQL

-- Identify unused indexes (usage stats reset when the database restarts)
SELECT
    OBJECT_NAME(i.object_id) AS TableName,
    i.name AS IndexName,
    ius.user_seeks,
    ius.user_scans,
    ius.user_lookups,
    ius.user_updates
FROM sys.indexes i
JOIN sys.dm_db_index_usage_stats ius
    ON i.object_id = ius.object_id
    AND i.index_id = ius.index_id
    AND ius.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
    AND i.index_id > 1          -- skip heaps and clustered indexes
    AND i.is_primary_key = 0
    AND ius.user_seeks = 0
    AND ius.user_scans = 0
    AND ius.user_lookups = 0
ORDER BY ius.user_updates DESC;

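-- Check DTU/vCore utilization over the last hour
-- (sys.dm_db_resource_stats keeps roughly one hour of 15-second samples)
SELECT
    AVG(avg_cpu_percent) AS AvgCPU,
    MAX(avg_cpu_percent) AS MaxCPU,
    AVG(avg_data_io_percent) AS AvgIO,
    AVG(avg_memory_usage_percent) AS AvgMemory
FROM sys.dm_db_resource_stats;

-- For a 14-day view, run this in the master database
-- (sys.resource_stats keeps about 14 days at 5-minute granularity)
SELECT
    database_name,
    AVG(avg_cpu_percent) AS AvgCPU,
    MAX(avg_cpu_percent) AS MaxCPU,
    AVG(avg_data_io_percent) AS AvgIO
FROM sys.resource_stats
WHERE start_time > DATEADD(day, -14, GETUTCDATE())
GROUP BY database_name;
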
// Right-size based on analysis
resource sqlDatabase 'Microsoft.Sql/servers/databases@2023-05-01-preview' = {
  parent: sqlServer
  name: 'appdb'
  location: location
  sku: {
    // Before: Premium P4 (500 DTU) - $1,860/month
    // After: Standard S3 (100 DTU) - $150/month
    // Or: General Purpose 2 vCore - $370/month

    name: 'GP_S_Gen5'  // Serverless for variable workloads
    tier: 'GeneralPurpose'
    family: 'Gen5'
    capacity: 2
  }
  properties: {
    autoPauseDelay: 60  // Pause after 1 hour of inactivity
    minCapacity: json('0.5')  // Minimum 0.5 vCores while active (Bicep has no decimal literals)
    zoneRedundant: false // Disable for non-prod
  }
}

// Use Elastic Pools for multiple databases
resource elasticPool 'Microsoft.Sql/servers/elasticPools@2023-05-01-preview' = {
  parent: sqlServer
  name: 'pool-shared'
  location: location
  sku: {
    name: 'GP_Gen5'
    tier: 'GeneralPurpose'
    family: 'Gen5'
    capacity: 4  // 4 vCores shared across databases
  }
  properties: {
    perDatabaseSettings: {
      minCapacity: 0
      maxCapacity: 2
    }
  }
}
// 10 single databases × $370/month (GP 2 vCore) = $3,700/month
// vs one 4 vCore elastic pool: $740/month = 80% savings,
// provided the databases' combined peak load fits in 4 vCores
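The 80% pool saving depends on the databases peaking at different times. A Python sketch of that check, assuming per-database vCore usage exported as time-aligned series (for example, from 5-minute samples). `pool_fits` and `pool_savings_pct` are hypothetical helpers; the per-database cap mirrors `perDatabaseSettings.maxCapacity` in the pool definition.

```python
from __future__ import annotations


def pool_fits(usage_by_db: dict[str, list[float]], pool_vcores: float, per_db_max: float) -> bool:
    """Check whether databases' vCore usage fits a shared elastic pool.

    usage_by_db maps database name -> vCores used in each interval; every
    series must cover the same intervals in the same order. The pool fits
    when no database exceeds its per-database cap and the summed usage in
    every interval stays within the pool size.
    """
    series = list(usage_by_db.values())
    if not series or not series[0] or any(len(s) != len(series[0]) for s in series):
        raise ValueError("need at least one non-empty series, all of equal length")
    if any(max(s) > per_db_max for s in series):
        return False
    combined_peak = max(sum(interval) for interval in zip(*series))
    return combined_peak <= pool_vcores


def pool_savings_pct(db_count: int, single_db_monthly: float, pool_monthly: float) -> float:
    """Percent saved by replacing db_count single databases with one pool."""
    return 100 * (1 - pool_monthly / (db_count * single_db_monthly))


if __name__ == "__main__":
    # Made-up usage: each database peaks in a different interval.
    usage = {
        "db-orders": [1.8, 0.3, 0.2],
        "db-users": [0.4, 1.9, 0.3],
        "db-reports": [0.2, 0.4, 1.9],
    }
    print("fits 4 vCore pool:", pool_fits(usage, pool_vcores=4, per_db_max=2))
    print(f"savings: {pool_savings_pct(10, 370, 740):.0f}%")
```

If every database peaks at the same moment, the combined peak approaches the sum of the individual peaks and the pool has to grow, which erodes the saving.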

Step 6: Optimize AKS

// Right-size AKS node pools
resource aksCluster 'Microsoft.ContainerService/managedClusters@2023-05-01' = {
  name: aksName
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    dnsPrefix: aksName
    agentPoolProfiles: [
      {
        name: 'system'
        count: 2  // Reduced from 3
        vmSize: 'Standard_D2s_v5'  // Reduced from D4s_v5
        mode: 'System'
        enableAutoScaling: true
        minCount: 2
        maxCount: 3
      }
      {
        name: 'workload'
        count: 3
        vmSize: 'Standard_D4s_v5'
        mode: 'User'
        enableAutoScaling: true
        minCount: 2
        maxCount: 10  // Scale up only when needed
      }
      {
        // Separate spot pool for interruptible, non-critical workloads (e.g. batch jobs)
        name: 'spot'
        count: 1
        vmSize: 'Standard_D4s_v5'
        mode: 'User'
        enableAutoScaling: true
        minCount: 0
        maxCount: 10
        scaleSetPriority: 'Spot'
        spotMaxPrice: -1  // Pay up to the on-demand price; no price-based eviction
        scaleSetEvictionPolicy: 'Delete'
        nodeLabels: {
          'workload-type': 'batch'
        }
        nodeTaints: [
          // Pods need a matching toleration to be scheduled on spot nodes
          'kubernetes.azure.com/scalesetpriority=spot:NoSchedule'
        ]
      }
    ]
  }
}

// Spot instance savings: ~60-90% vs regular VMs, in exchange for eviction at any time

# Default CPU/memory requests and limits for containers that don't set their own
apiVersion: v1
kind: LimitRange
metadata:
  name: resource-limits
  namespace: production
spec:
  limits:
    - default:
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      type: Container

---
# Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
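To see how `averageUtilization: 70` behaves, the HPA's documented rule is desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), skipped when the ratio is within a tolerance of 1.0 (0.1 by default). Utilization is measured against the pods' CPU requests, so the LimitRange `defaultRequest` of 100m matters: a container using 90m reads as 90%. This Python sketch models only that formula and the min/max clamp, not stabilization windows or pod readiness.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Replica count the HPA formula produces for one CPU metric.

    current_utilization is average CPU as a percent of the pods' requests.
    Scaling is skipped when the ratio is within `tolerance` of 1.0.
    """
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Values mirror api-hpa: target 70%, min 2, max 20.
    for replicas, util in ((4, 90), (2, 72), (10, 10), (15, 140)):
        print(replicas, util, "->", desired_replicas(replicas, util, 70, 2, 20))
```

The scale-down case is where the savings come from: ten lightly loaded replicas collapse to the floor of two instead of holding capacity around the clock.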

Step 7: Set Up Cost Alerts and Budgets

// Subscription-level budget with alerts (deploy with targetScope = 'subscription')
resource budget 'Microsoft.Consumption/budgets@2023-05-01' = {
  name: 'monthly-budget'
  properties: {
    category: 'Cost'
    amount: 120000  // Target: $120K (20% reduction)
    timeGrain: 'Monthly'
    timePeriod: {
      startDate: '2024-01-01'
      endDate: '2025-12-31'
    }
    filter: {
      dimensions: {
        name: 'ResourceGroupName'
        operator: 'In'
        values: ['rg-production', 'rg-staging', 'rg-development']
      }
    }
    notifications: {
      Actual_GreaterThan_80_Percent: {
        enabled: true
        operator: 'GreaterThan'
        threshold: 80
        contactEmails: ['finance@contoso.com', 'platform@contoso.com']
        contactRoles: ['Owner', 'Contributor']
        thresholdType: 'Actual'
      }
      Forecasted_GreaterThan_100_Percent: {
        enabled: true
        operator: 'GreaterThan'
        threshold: 100
        contactEmails: ['finance@contoso.com', 'cto@contoso.com']
        thresholdType: 'Forecasted'
      }
    }
  }
}

// Resource group level budget: deploy from a resource-group-scoped file
// (or as a module with scope: resourceGroup('rg-development'))
resource rgBudget 'Microsoft.Consumption/budgets@2023-05-01' = {
  name: 'dev-budget'
  properties: {
    category: 'Cost'
    amount: 5000
    timeGrain: 'Monthly'
    timePeriod: {
      startDate: '2024-01-01'
    }
    notifications: {
      Actual_GreaterThan_90_Percent: {
        enabled: true
        operator: 'GreaterThan'
        threshold: 90
        contactEmails: ['dev-lead@contoso.com']
      }
    }
  }
}
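The two notification types fire at different times: forecast alerts can warn mid-month, before actual spend crosses a threshold. A Python sketch of that logic using the thresholds from the subscription budget above. The straight-line forecast is a simplification chosen for illustration; Azure computes its own forecast, and the mid-month spend figure is made up.

```python
from __future__ import annotations

import calendar
from datetime import date


def straight_line_forecast(spend_to_date: float, today: date) -> float:
    """Project month-end spend assuming the daily run rate so far continues."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


def fired_notifications(amount: float, actual: float, forecast: float,
                        rules: list[tuple[str, float, str]]) -> list[str]:
    """Return names of rules whose GreaterThan threshold is crossed.

    rules: (name, threshold_percent, threshold_type), where threshold_type
    is "Actual" or "Forecasted", mirroring the budget notifications above.
    """
    fired = []
    for name, threshold_pct, threshold_type in rules:
        value = actual if threshold_type == "Actual" else forecast
        if value > amount * threshold_pct / 100:
            fired.append(name)
    return fired


RULES = [
    ("Actual_GreaterThan_80_Percent", 80, "Actual"),
    ("Forecasted_GreaterThan_100_Percent", 100, "Forecasted"),
]

if __name__ == "__main__":
    spent, today = 70_000, date(2024, 1, 15)  # made-up mid-month figures
    forecast = straight_line_forecast(spent, today)
    print(f"Forecast ${forecast:,.0f}: {fired_notifications(120_000, spent, forecast, RULES)}")
```

Here only the forecasted alert fires: $70,000 spent by January 15 projects to well over $120,000, while actual spend is still below the 80% line.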

Step 8: Implement Cost Tagging

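// Tagging policy definition: deny resources missing cost tags
// (takes effect once assigned at a subscription or management group scope)
resource taggingPolicy 'Microsoft.Authorization/policyDefinitions@2021-06-01' = {
  name: 'require-cost-tags'
  properties: {
    policyType: 'Custom'
    mode: 'Indexed'
    displayName: 'Require CostCenter, Environment, and Owner tags'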
    policyRule: {
      if: {
        anyOf: [
          {
            field: 'tags[CostCenter]'
            exists: 'false'
          }
          {
            field: 'tags[Environment]'
            exists: 'false'
          }
          {
            field: 'tags[Owner]'
            exists: 'false'
          }
        ]
      }
      then: {
        effect: 'deny'
      }
    }
  }
}

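// utcNow() is only allowed as a parameter default value
param createdDate string = utcNow('yyyy-MM-dd')

// Tag the deployment scope (resource group or subscription) itself.
// Resources don't inherit these tags; a Modify policy such as the built-in
// "Inherit a tag from the resource group" can copy them down.
resource scopeTags 'Microsoft.Resources/tags@2021-04-01' = {
  name: 'default'
  properties: {
    tags: {
      Environment: environment
      CostCenter: costCenter
      Owner: ownerEmail
      Project: projectName
      CreatedBy: 'Bicep'
      CreatedDate: createdDate
    }
  }
}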
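A deny policy blocks new untagged resources, but resources that already exist stay in place and only show up as non-compliant. A Python sketch for finding them, assuming a list of resource objects with `id` and `tags` keys, such as the JSON from `az resource list --output json` (where `tags` is null for untagged resources). The comparison is exact and case-sensitive for simplicity; the sample resource ids are made up.

```python
from __future__ import annotations

REQUIRED_TAGS = ("CostCenter", "Environment", "Owner")


def missing_tags(resources: list[dict], required: tuple[str, ...] = REQUIRED_TAGS) -> dict[str, list[str]]:
    """Map each resource id to the required tags it lacks (exact, case-sensitive names)."""
    report: dict[str, list[str]] = {}
    for resource in resources:
        tags = resource.get("tags") or {}  # untagged resources report tags as null
        missing = [name for name in required if name not in tags]
        if missing:
            report[resource["id"]] = missing
    return report


if __name__ == "__main__":
    sample = [  # made-up resources for illustration
        {"id": "/subscriptions/x/resourceGroups/rg-development/vm-a", "tags": None},
        {"id": "/subscriptions/x/resourceGroups/rg-production/vm-b",
         "tags": {"CostCenter": "1234", "Environment": "prod", "Owner": "team@contoso.com"}},
    ]
    for resource_id, missing in missing_tags(sample).items():
        print(f"{resource_id}: missing {', '.join(missing)}")
```

For real data, load the saved `az resource list` output with `json.load` and pass the resulting list in.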

Cost Optimization Summary

Optimization Results:

BEFORE ($150,000/month):
├── VMs: $45,000
├── SQL: $30,000
├── Storage: $22,500
├── AKS: $25,500
├── App Services: $15,000
└── Other: $12,000

AFTER ($100,000/month):
├── VMs: $25,000 (-44%)
│   ├── Reserved instances: -$10,000
│   ├── Right-sizing: -$5,000
│   └── Auto-shutdown dev: -$5,000
├── SQL: $18,000 (-40%)
│   ├── Elastic pools: -$8,000
│   └── Serverless: -$4,000
├── Storage: $15,000 (-33%)
│   └── Lifecycle policies: -$7,500
├── AKS: $18,000 (-29%)
│   ├── Spot instances: -$5,000
│   └── Autoscaling: -$2,500
├── App Services: $12,000 (-20%)
│   └── Right-size non-prod: -$3,000
└── Other: $12,000

TOTAL SAVINGS: $50,000/month (33% reduction)
ANNUAL SAVINGS: $600,000
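The summary figures can be checked against Finance's 30% target with a few lines of arithmetic. A Python sketch using the before and after category totals listed above; `summarize` is a hypothetical helper name.

```python
from __future__ import annotations

BEFORE = {"VMs": 45_000, "SQL": 30_000, "Storage": 22_500, "AKS": 25_500,
          "App Services": 15_000, "Other": 12_000}
AFTER = {"VMs": 25_000, "SQL": 18_000, "Storage": 15_000, "AKS": 18_000,
         "App Services": 12_000, "Other": 12_000}


def reduction_pct(before: float, after: float) -> float:
    return 100 * (before - after) / before


def summarize(before: dict[str, float], after: dict[str, float], target_pct: float) -> dict:
    """Per-category and overall reductions, plus whether the target is met."""
    if before.keys() != after.keys():
        raise ValueError("before and after must list the same categories")
    total_before = sum(before.values())
    total_after = sum(after.values())
    overall = reduction_pct(total_before, total_after)
    return {
        "per_category": {k: reduction_pct(before[k], after[k]) for k in before},
        "total_before": total_before,
        "total_after": total_after,
        "monthly_savings": total_before - total_after,
        "overall_pct": overall,
        "meets_target": overall >= target_pct,
    }


if __name__ == "__main__":
    report = summarize(BEFORE, AFTER, target_pct=30)
    for name, pct in report["per_category"].items():
        print(f"{name:13} -{pct:.0f}%")
    print(f"Total: ${report['total_before']:,} -> ${report['total_after']:,} "
          f"(-{report['overall_pct']:.0f}%, target met: {report['meets_target']})")
```

The category sums give $100,000/month after optimization, a 33% reduction, which clears the 30% goal.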

Cost Optimization Strategies

Strategy | Savings | Effort | Risk
--- | --- | --- | ---
Reserved Instances | 30-60% | Low | Commitment
Spot Instances | 60-90% | Medium | Interruption
Right-sizing | 20-50% | Medium | Performance
Auto-shutdown | 50-90% | Low | Availability
Serverless | Variable | High | Architecture

Quick Wins Checklist

Action | Expected Savings
--- | ---
Delete unattached disks | $5-50/disk/month
Deallocate idle VMs | 100% of compute (disks still billed)
Resize oversized VMs | 30-50% per VM
Enable auto-shutdown | 60% for dev/test
Use reserved instances | 30-60% for prod
Implement lifecycle policies | 30-90% on storage
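Storage starts with all data in the Hot tier, and lifecycle policies account for the storage savings in the summary, but no policy is shown. A Python sketch that builds a blob lifecycle management policy document (Hot to Cool to Archive, then delete) in the JSON shape accepted by `az storage account management-policy create --account-name <account> --resource-group <rg> --policy @policy.json`. The day thresholds, rule name, and prefix are hypothetical; Cool and Archive carry minimum-retention charges (30 and 180 days), and archived blobs must be rehydrated before they can be read.

```python
from __future__ import annotations

import json


def lifecycle_policy(cool_days: int, archive_days: int, delete_days: int,
                     prefixes: list[str] | None = None) -> dict:
    """Build a lifecycle policy that tiers block blobs down by age, then deletes them."""
    if not 0 <= cool_days < archive_days < delete_days:
        raise ValueError("expected 0 <= cool_days < archive_days < delete_days")
    filters: dict = {"blobTypes": ["blockBlob"]}
    if prefixes:
        filters["prefixMatch"] = list(prefixes)  # container name first, e.g. "logs/"
    return {
        "rules": [
            {
                "enabled": True,
                "name": "tier-down-and-expire",
                "type": "Lifecycle",
                "definition": {
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {"daysAfterModificationGreaterThan": cool_days},
                            "tierToArchive": {"daysAfterModificationGreaterThan": archive_days},
                            "delete": {"daysAfterModificationGreaterThan": delete_days},
                        }
                    },
                    "filters": filters,
                },
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical thresholds and prefix; tune them to how the data is actually read.
    print(json.dumps(lifecycle_policy(30, 90, 365, prefixes=["logs/"]), indent=2))
```

Redirect the printed JSON to `policy.json` before running the CLI command. Data that is still read regularly should get a longer Cool threshold or be excluded by prefix.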

Practice Question

Why should you analyze at least 14-30 days of metrics before right-sizing a VM?