DeployU
Interviews / Cloud & DevOps / Cosmos DB is consuming 50,000 RU/s and costing $3K/day. Optimize the data model and queries.

Cosmos DB is consuming 50,000 RU/s and costing $3K/day. Optimize the data model and queries.

practical Data Services Interactive Quiz Code Examples

The Scenario

Your Cosmos DB costs have exploded:

Current Usage:
├── Provisioned: 50,000 RU/s
├── Cost: ~$3,000/day (~$90,000/month)
├── Database: orders-db
├── Container: orders (1TB data)
├── Partition key: /userId
└── Most expensive query: GetOrdersByDate

Query analysis:
- GetOrdersByDate: 15,000 RU per execution, runs 100x/hour
- GetUserOrders: 500 RU per execution
- CreateOrder: 10 RU per execution

The Challenge

Reduce RU consumption by 80% through data modeling, query optimization, indexing, and partition strategy improvements.

Wrong Approach

A junior engineer might just increase RU/s when queries are slow, use SELECT * everywhere, ignore partition key design, or disable indexing to 'save RUs'. These approaches increase costs, waste bandwidth, cause cross-partition queries, or make queries even more expensive.

Right Approach

A senior engineer analyzes query patterns, redesigns the partition key for query locality, creates composite indexes, denormalizes data for common access patterns, uses change feed for materialized views, and implements proper TTL for transient data.

Step 1: Analyze Current Query Costs

// Enable query metrics in SDK
const { resource: item, requestCharge } = await container.items
    .query({
        query: "SELECT * FROM orders o WHERE o.orderDate >= @startDate",
        parameters: [{ name: "@startDate", value: "2024-01-01" }]
    }, {
        populateQueryMetrics: true
    })
    .fetchAll();

console.log(`Query cost: ${requestCharge} RUs`);

// In Azure Portal: Metrics > Total Request Units
// Filter by OperationType to find expensive queries

Step 2: Fix the Partition Key Problem

// CURRENT: Partition key is /userId
// Query: GetOrdersByDate - scans ALL partitions!

// Query that causes cross-partition fan-out (EXPENSIVE)
SELECT * FROM orders o
WHERE o.orderDate >= '2024-01-01'
// This must check EVERY partition because orderDate isn't the partition key

// SOLUTION 1: Hierarchical partition key (if using latest SDK)
{
    "partitionKey": {
        "paths": ["/tenantId", "/userId"],
        "kind": "MultiHash"
    }
}

// SOLUTION 2: Use composite partition key in data model
{
    "id": "order-123",
    "userId": "user-456",
    "orderDate": "2024-01-15",
    "pk": "2024-01-user-456",  // Composite partition key
    "yearMonth": "2024-01"     // For date-range queries
}

Step 3: Create Composite Indexes

{
    "indexingPolicy": {
        "automatic": true,
        "indexingMode": "consistent",
        "includedPaths": [
            { "path": "/*" }
        ],
        "excludedPaths": [
            { "path": "/largeDescription/*" },
            { "path": "/metadata/*" }
        ],
        "compositeIndexes": [
            [
                { "path": "/userId", "order": "ascending" },
                { "path": "/orderDate", "order": "descending" }
            ],
            [
                { "path": "/status", "order": "ascending" },
                { "path": "/orderDate", "order": "descending" }
            ]
        ]
    }
}
# Apply indexing policy
az cosmosdb sql container update \
    --account-name mycosmosdb \
    --database-name orders-db \
    --name orders \
    --idx @indexing-policy.json

Step 4: Denormalize for Read-Heavy Patterns

// BEFORE: Normalized (requires joins/lookups)
// Order document
{
    "id": "order-123",
    "userId": "user-456",
    "items": ["item-1", "item-2"]  // Just IDs, need lookup
}

// AFTER: Denormalized (single read)
{
    "id": "order-123",
    "userId": "user-456",
    "customerName": "John Doe",        // Embedded
    "customerEmail": "john@example.com", // Embedded
    "items": [
        {
            "productId": "prod-1",
            "productName": "Widget",    // Embedded
            "price": 29.99,
            "quantity": 2
        }
    ],
    "totalAmount": 59.98,
    "shippingAddress": {                // Embedded
        "street": "123 Main St",
        "city": "Seattle"
    }
}

// Trade-off: Larger documents, but single read instead of 5 reads
// 1 RU vs 5+ RUs per order retrieval

Step 5: Use Change Feed for Materialized Views

// Create materialized view container for date-based queries
// Partition key: /yearMonth for efficient date range queries

public class OrderMaterializedView
{
    public string id { get; set; }
    public string yearMonth { get; set; }  // Partition key: "2024-01"
    public string orderId { get; set; }
    public string userId { get; set; }
    public DateTime orderDate { get; set; }
    public decimal totalAmount { get; set; }
    public string status { get; set; }
}

// Change Feed processor to maintain the view
var changeFeedProcessor = container
    .GetChangeFeedProcessorBuilder<Order>("OrderViewBuilder", HandleChangesAsync)
    .WithInstanceName("OrderViewProcessor")
    .WithLeaseContainer(leaseContainer)
    .Build();

async Task HandleChangesAsync(
    IReadOnlyCollection<Order> changes,
    CancellationToken cancellationToken)
{
    foreach (var order in changes)
    {
        var view = new OrderMaterializedView
        {
            id = order.id,
            yearMonth = order.orderDate.ToString("yyyy-MM"),
            orderId = order.id,
            userId = order.userId,
            orderDate = order.orderDate,
            totalAmount = order.totalAmount,
            status = order.status
        };

        await viewContainer.UpsertItemAsync(view,
            new PartitionKey(view.yearMonth));
    }
}

// Now GetOrdersByDate queries the view container
// Single partition query: ~5 RU instead of 15,000 RU!

Step 6: Optimize Query Patterns

// BEFORE: Expensive query patterns
// 1. SELECT * (returns all properties)
SELECT * FROM orders o WHERE o.userId = 'user-123'

// 2. No partition key in query
SELECT * FROM orders o WHERE o.status = 'pending'

// 3. Inefficient pagination
SELECT * FROM orders o ORDER BY o.orderDate OFFSET 100 LIMIT 10

// AFTER: Optimized queries
// 1. Project only needed fields
SELECT o.id, o.orderDate, o.totalAmount
FROM orders o
WHERE o.userId = 'user-123'

// 2. Include partition key
SELECT o.id, o.status
FROM orders o
WHERE o.userId = 'user-123' AND o.status = 'pending'

// 3. Use continuation tokens
const { resources, continuationToken } = await container.items
    .query(query, { maxItemCount: 10, continuationToken: previousToken })
    .fetchNext();

Step 7: Implement TTL for Transient Data

// Container-level TTL
{
    "id": "orders",
    "partitionKey": { "paths": ["/userId"] },
    "defaultTtl": 7776000  // 90 days in seconds
}

// Document-level TTL override
{
    "id": "temp-cart-123",
    "userId": "user-456",
    "items": [...],
    "ttl": 3600  // This cart expires in 1 hour
}

Step 8: Switch to Autoscale

// Instead of fixed 50,000 RU/s, use autoscale
resource container 'Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers@2023-04-15' = {
  parent: database
  name: 'orders'
  properties: {
    resource: {
      id: 'orders'
      partitionKey: {
        paths: ['/userId']
        kind: 'Hash'
      }
    }
    options: {
      autoscaleSettings: {
        maxThroughput: 50000  // Scales between 5,000-50,000 RU/s
      }
    }
  }
}

// Cost savings: Only pay for what you use
// Peak: 50,000 RU/s during business hours
// Off-peak: 5,000 RU/s at night (90% cost reduction)

Cost Comparison

BEFORE:
- 50,000 RU/s provisioned (constant)
- $2,920/day × 30 = ~$87,600/month

AFTER:
- Materialized view: GetOrdersByDate 5 RU (was 15,000)
- Composite indexes: Queries 50% faster
- Denormalization: Order reads 1 RU (was 5)
- Autoscale: Average 15,000 RU/s (was 50,000 constant)

- ~$876/day × 30 = ~$26,280/month
- SAVINGS: ~$61,320/month (70% reduction)

Cosmos DB Optimization Checklist

OptimizationRU ImpactImplementation
Partition key design10-100xData model change
Composite indexes2-10xIndex policy update
Project specific fields1.5-3xQuery change
Denormalization2-5xData model change
Materialized views10-1000xNew container + change feed
TTL for temp dataCost reductionContainer setting

Practice Question

Why does a query filtering by orderDate cost 15,000 RUs when the partition key is userId?