Questions
Cosmos DB is consuming 50,000 RU/s and costing $3K/day. Optimize the data model and queries.
The Scenario
Your Cosmos DB costs have exploded:
Current Usage:
├── Provisioned: 50,000 RU/s
├── Cost: ~$3,000/day (~$90,000/month)
├── Database: orders-db
├── Container: orders (1TB data)
├── Partition key: /userId
└── Most expensive query: GetOrdersByDate
Query analysis:
- GetOrdersByDate: 15,000 RU per execution, runs 100x/hour
- GetUserOrders: 500 RU per execution
- CreateOrder: 10 RU per execution
The Challenge
Reduce RU consumption by 80% through data modeling, query optimization, indexing, and partition strategy improvements.
A junior engineer might just increase RU/s when queries are slow, use SELECT * everywhere, ignore partition key design, or disable indexing to 'save RUs'. These approaches increase costs, waste bandwidth, cause cross-partition queries, or make queries even more expensive.
A senior engineer analyzes query patterns, redesigns the partition key for query locality, creates composite indexes, denormalizes data for common access patterns, uses change feed for materialized views, and implements proper TTL for transient data.
Step 1: Analyze Current Query Costs
// Enable query metrics in SDK
const { resource: item, requestCharge } = await container.items
.query({
query: "SELECT * FROM orders o WHERE o.orderDate >= @startDate",
parameters: [{ name: "@startDate", value: "2024-01-01" }]
}, {
populateQueryMetrics: true
})
.fetchAll();
console.log(`Query cost: ${requestCharge} RUs`);
// In Azure Portal: Metrics > Total Request Units
// Filter by OperationType to find expensive queriesStep 2: Fix the Partition Key Problem
// CURRENT: Partition key is /userId
// Query: GetOrdersByDate - scans ALL partitions!
// Query that causes cross-partition fan-out (EXPENSIVE)
SELECT * FROM orders o
WHERE o.orderDate >= '2024-01-01'
// This must check EVERY partition because orderDate isn't the partition key
// SOLUTION 1: Hierarchical partition key (if using latest SDK)
{
"partitionKey": {
"paths": ["/tenantId", "/userId"],
"kind": "MultiHash"
}
}
// SOLUTION 2: Use composite partition key in data model
{
"id": "order-123",
"userId": "user-456",
"orderDate": "2024-01-15",
"pk": "2024-01-user-456", // Composite partition key
"yearMonth": "2024-01" // For date-range queries
}Step 3: Create Composite Indexes
{
"indexingPolicy": {
"automatic": true,
"indexingMode": "consistent",
"includedPaths": [
{ "path": "/*" }
],
"excludedPaths": [
{ "path": "/largeDescription/*" },
{ "path": "/metadata/*" }
],
"compositeIndexes": [
[
{ "path": "/userId", "order": "ascending" },
{ "path": "/orderDate", "order": "descending" }
],
[
{ "path": "/status", "order": "ascending" },
{ "path": "/orderDate", "order": "descending" }
]
]
}
}# Apply indexing policy
az cosmosdb sql container update \
--account-name mycosmosdb \
--database-name orders-db \
--name orders \
--idx @indexing-policy.jsonStep 4: Denormalize for Read-Heavy Patterns
// BEFORE: Normalized (requires joins/lookups)
// Order document
{
"id": "order-123",
"userId": "user-456",
"items": ["item-1", "item-2"] // Just IDs, need lookup
}
// AFTER: Denormalized (single read)
{
"id": "order-123",
"userId": "user-456",
"customerName": "John Doe", // Embedded
"customerEmail": "john@example.com", // Embedded
"items": [
{
"productId": "prod-1",
"productName": "Widget", // Embedded
"price": 29.99,
"quantity": 2
}
],
"totalAmount": 59.98,
"shippingAddress": { // Embedded
"street": "123 Main St",
"city": "Seattle"
}
}
// Trade-off: Larger documents, but single read instead of 5 reads
// 1 RU vs 5+ RUs per order retrievalStep 5: Use Change Feed for Materialized Views
// Create materialized view container for date-based queries
// Partition key: /yearMonth for efficient date range queries
public class OrderMaterializedView
{
public string id { get; set; }
public string yearMonth { get; set; } // Partition key: "2024-01"
public string orderId { get; set; }
public string userId { get; set; }
public DateTime orderDate { get; set; }
public decimal totalAmount { get; set; }
public string status { get; set; }
}
// Change Feed processor to maintain the view
var changeFeedProcessor = container
.GetChangeFeedProcessorBuilder<Order>("OrderViewBuilder", HandleChangesAsync)
.WithInstanceName("OrderViewProcessor")
.WithLeaseContainer(leaseContainer)
.Build();
async Task HandleChangesAsync(
IReadOnlyCollection<Order> changes,
CancellationToken cancellationToken)
{
foreach (var order in changes)
{
var view = new OrderMaterializedView
{
id = order.id,
yearMonth = order.orderDate.ToString("yyyy-MM"),
orderId = order.id,
userId = order.userId,
orderDate = order.orderDate,
totalAmount = order.totalAmount,
status = order.status
};
await viewContainer.UpsertItemAsync(view,
new PartitionKey(view.yearMonth));
}
}
// Now GetOrdersByDate queries the view container
// Single partition query: ~5 RU instead of 15,000 RU!Step 6: Optimize Query Patterns
// BEFORE: Expensive query patterns
// 1. SELECT * (returns all properties)
SELECT * FROM orders o WHERE o.userId = 'user-123'
// 2. No partition key in query
SELECT * FROM orders o WHERE o.status = 'pending'
// 3. Inefficient pagination
SELECT * FROM orders o ORDER BY o.orderDate OFFSET 100 LIMIT 10
// AFTER: Optimized queries
// 1. Project only needed fields
SELECT o.id, o.orderDate, o.totalAmount
FROM orders o
WHERE o.userId = 'user-123'
// 2. Include partition key
SELECT o.id, o.status
FROM orders o
WHERE o.userId = 'user-123' AND o.status = 'pending'
// 3. Use continuation tokens
const { resources, continuationToken } = await container.items
.query(query, { maxItemCount: 10, continuationToken: previousToken })
.fetchNext();Step 7: Implement TTL for Transient Data
// Container-level TTL
{
"id": "orders",
"partitionKey": { "paths": ["/userId"] },
"defaultTtl": 7776000 // 90 days in seconds
}
// Document-level TTL override
{
"id": "temp-cart-123",
"userId": "user-456",
"items": [...],
"ttl": 3600 // This cart expires in 1 hour
}Step 8: Switch to Autoscale
// Instead of fixed 50,000 RU/s, use autoscale
resource container 'Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers@2023-04-15' = {
parent: database
name: 'orders'
properties: {
resource: {
id: 'orders'
partitionKey: {
paths: ['/userId']
kind: 'Hash'
}
}
options: {
autoscaleSettings: {
maxThroughput: 50000 // Scales between 5,000-50,000 RU/s
}
}
}
}
// Cost savings: Only pay for what you use
// Peak: 50,000 RU/s during business hours
// Off-peak: 5,000 RU/s at night (90% cost reduction)Cost Comparison
BEFORE:
- 50,000 RU/s provisioned (constant)
- $2,920/day × 30 = ~$87,600/month
AFTER:
- Materialized view: GetOrdersByDate 5 RU (was 15,000)
- Composite indexes: Queries 50% faster
- Denormalization: Order reads 1 RU (was 5)
- Autoscale: Average 15,000 RU/s (was 50,000 constant)
- ~$876/day × 30 = ~$26,280/month
- SAVINGS: ~$61,320/month (70% reduction) Cosmos DB Optimization Checklist
| Optimization | RU Impact | Implementation |
|---|---|---|
| Partition key design | 10-100x | Data model change |
| Composite indexes | 2-10x | Index policy update |
| Project specific fields | 1.5-3x | Query change |
| Denormalization | 2-5x | Data model change |
| Materialized views | 10-1000x | New container + change feed |
| TTL for temp data | Cost reduction | Container setting |
Practice Question
Why does a query filtering by orderDate cost 15,000 RUs when the partition key is userId?