Design a secure VPC architecture with private GKE clusters and controlled internet access.

The Scenario

You’re designing the network architecture for a healthcare application that must:

  • Keep all compute resources private (no public IPs)
  • Allow GKE pods to pull images from gcr.io and access GCP APIs
  • Enable controlled outbound internet access for specific services
  • Support multiple environments (dev/staging/prod) with isolation
  • Meet HIPAA compliance requirements

The Challenge

Design a VPC architecture using GCP networking primitives: Private Google Access, Cloud NAT, Shared VPC, and firewall rules. Explain the tradeoffs.

Wrong Approach

A junior engineer might assign public IPs to all resources for simplicity, use default firewall rules, create separate VPCs without connectivity, or skip Private Google Access. This creates security vulnerabilities, compliance violations, and operational complexity.

Right Approach

A senior engineer designs a Shared VPC with private subnets, enables Private Google Access for GCP API calls, configures Cloud NAT for controlled outbound access, uses hierarchical firewall policies, and implements VPC Service Controls for data exfiltration prevention.

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                         Host Project                             │
│                        (Shared VPC)                              │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │                        VPC Network                          │ │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐        │ │
│  │  │ us-central1 │  │ us-east1    │  │ europe-west1│        │ │
│  │  │ 10.0.0.0/20 │  │ 10.1.0.0/20 │  │ 10.2.0.0/20 │        │ │
│  │  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘        │ │
│  │         │                │                │                │ │
│  │         └────────────────┼────────────────┘                │ │
│  │                          │                                  │ │
│  │  ┌──────────────────────┴──────────────────────┐          │ │
│  │  │              Cloud Router                    │          │ │
│  │  │         + Cloud NAT (all regions)           │          │ │
│  │  └─────────────────────────────────────────────┘          │ │
│  └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

        ┌────────────────────┼────────────────────┐
        │                    │                    │
        ▼                    ▼                    ▼
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│  Dev Project  │    │Staging Project│    │ Prod Project  │
│ (Service Proj)│    │(Service Proj) │    │(Service Proj) │
│   GKE, GCE    │    │   GKE, GCE    │    │   GKE, GCE    │
└───────────────┘    └───────────────┘    └───────────────┘

Step 1: Create Shared VPC Host Project

# Enable Shared VPC in host project
gcloud compute shared-vpc enable host-project

# Associate service projects (one per environment)
gcloud compute shared-vpc associated-projects add dev-project \
  --host-project=host-project

gcloud compute shared-vpc associated-projects add staging-project \
  --host-project=host-project

gcloud compute shared-vpc associated-projects add prod-project \
  --host-project=host-project
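
Associating a project grants nothing by itself: the principals and service agents that create resources in a service project also need roles/compute.networkUser on the shared subnets they will use. A minimal Terraform sketch for the dev project's GKE service agent (the variable names are placeholders):

resource "google_compute_subnetwork_iam_member" "dev_gke_network_user" {
  project    = var.host_project
  region     = "us-central1"
  subnetwork = "subnet-us-central1"
  role       = "roles/compute.networkUser"

  # GKE service agent of the dev service project (project *number*, not ID)
  member = "serviceAccount:service-${var.dev_project_number}@container-engine-robot.iam.gserviceaccount.com"
}

For GKE on Shared VPC, the service agent additionally needs roles/container.hostServiceAgentUser on the host project; repeat both bindings per environment.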

Step 2: Create VPC and Subnets

# terraform/network/main.tf

resource "google_compute_network" "main" {
  name                    = "shared-vpc"
  project                 = var.host_project
  auto_create_subnetworks = false
  routing_mode            = "GLOBAL"
}

# Regional subnets with secondary ranges for GKE
resource "google_compute_subnetwork" "regional" {
  for_each = var.regions

  name                     = "subnet-${each.key}"
  project                  = var.host_project
  region                   = each.key
  network                  = google_compute_network.main.id
  ip_cidr_range            = each.value.primary_range
  private_ip_google_access = true  # Critical for private clusters!

  # Secondary ranges for GKE pods and services
  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = each.value.pods_range
  }

  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = each.value.services_range
  }

  log_config {
    aggregation_interval = "INTERVAL_5_SEC"
    flow_sampling        = 0.5
    metadata             = "INCLUDE_ALL_METADATA"
  }
}

variable "regions" {
  default = {
    "us-central1" = {
      primary_range  = "10.0.0.0/20"
      pods_range     = "10.100.0.0/14"
      services_range = "10.104.0.0/20"
    }
    "us-east1" = {
      primary_range  = "10.1.0.0/20"
      pods_range     = "10.108.0.0/14"
      services_range = "10.112.0.0/20"
    }
    # europe-west1 (10.2.0.0/20 in the diagram) would continue the same numbering pattern
    "europe-west1" = {
      primary_range  = "10.2.0.0/20"
      pods_range     = "10.116.0.0/14"
      services_range = "10.120.0.0/20"
    }
  }
}
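
Private Google Access only changes routing for traffic addressed to Google's VIPs; DNS for *.googleapis.com still resolves to public IPs. When private clusters are paired with VPC Service Controls (Step 6), the usual pattern is a private Cloud DNS zone that steers Google API names to the restricted VIP range. A sketch, assuming Cloud DNS is enabled in the host project:

resource "google_dns_managed_zone" "googleapis" {
  name       = "googleapis"
  project    = var.host_project
  dns_name   = "googleapis.com."
  visibility = "private"

  private_visibility_config {
    networks {
      network_url = google_compute_network.main.id
    }
  }
}

# restricted.googleapis.com resolves to 199.36.153.4/30, which is
# reachable via Private Google Access and enforces VPC Service Controls
resource "google_dns_record_set" "restricted" {
  name         = "restricted.googleapis.com."
  project      = var.host_project
  managed_zone = google_dns_managed_zone.googleapis.name
  type         = "A"
  ttl          = 300
  rrdatas      = ["199.36.153.4", "199.36.153.5", "199.36.153.6", "199.36.153.7"]
}

# Send every other Google API hostname to the restricted VIP
resource "google_dns_record_set" "wildcard" {
  name         = "*.googleapis.com."
  project      = var.host_project
  managed_zone = google_dns_managed_zone.googleapis.name
  type         = "CNAME"
  ttl          = 300
  rrdatas      = ["restricted.googleapis.com."]
}

Note that gcr.io needs an equivalent private zone if pods pull images by that hostname, per the scenario requirement.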

Step 3: Configure Cloud NAT for Outbound Access

# Cloud Router per region
resource "google_compute_router" "regional" {
  for_each = var.regions

  name    = "router-${each.key}"
  project = var.host_project
  region  = each.key
  network = google_compute_network.main.id
}

# Cloud NAT for outbound internet access
resource "google_compute_router_nat" "regional" {
  for_each = var.regions

  name                               = "nat-${each.key}"
  project                            = var.host_project
  router                             = google_compute_router.regional[each.key].name
  region                             = each.key
  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"

  log_config {
    enable = true
    filter = "ERRORS_ONLY"
  }

  # Timeouts for connection tracking
  tcp_established_idle_timeout_sec = 1200
  tcp_transitory_idle_timeout_sec  = 30
  udp_idle_timeout_sec             = 30
}
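
AUTO_ONLY is the low-maintenance choice, but the NAT IPs can change as Cloud NAT scales. If external partners must allowlist your egress addresses, reserve static IPs and switch to manual allocation; a sketch reusing the same var.regions map:

# Reserve a static egress IP per region so partners can allowlist it
resource "google_compute_address" "nat" {
  for_each = var.regions

  name    = "nat-ip-${each.key}"
  project = var.host_project
  region  = each.key
}

# Then, in google_compute_router_nat above, replace the allocation settings:
#   nat_ip_allocate_option = "MANUAL_ONLY"
#   nat_ips                = [google_compute_address.nat[each.key].self_link]

With MANUAL_ONLY you also take on capacity planning: each NAT IP provides a finite pool of source ports, so size the pool for your peak concurrent connections.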

Step 4: Create Private GKE Cluster

resource "google_container_cluster" "private" {
  name     = "private-cluster"
  project  = var.service_project
  location = "us-central1"

  # Use Shared VPC
  network    = "projects/${var.host_project}/global/networks/shared-vpc"
  subnetwork = "projects/${var.host_project}/regions/us-central1/subnetworks/subnet-us-central1"

  # Private cluster configuration
  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false  # Allow kubectl from authorized networks
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  # Use secondary ranges for pods/services
  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  # Authorized networks for master access
  master_authorized_networks_config {
    cidr_blocks {
      cidr_block   = "10.0.0.0/8"
      display_name = "Internal VPC"
    }
    cidr_blocks {
      cidr_block   = var.admin_cidr
      display_name = "Admin Access"
    }
  }

  # Workload Identity for pod service accounts
  workload_identity_config {
    workload_pool = "${var.service_project}.svc.id.goog"
  }

  # VPC-native cluster
  networking_mode = "VPC_NATIVE"

  # Terraform requires a default node pool at creation; remove it
  # and manage node pools as separate resources
  remove_default_node_pool = true
  initial_node_count       = 1
}
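
Because the default pool is removed, node capacity comes from a separate pool. A minimal sketch (machine type, node count, and var.node_service_account are placeholders):

resource "google_container_node_pool" "primary" {
  name       = "primary"
  project    = var.service_project
  location   = "us-central1"
  cluster    = google_container_cluster.private.name
  node_count = 3

  node_config {
    machine_type = "e2-standard-4"

    # Dedicated least-privilege service account for nodes
    service_account = var.node_service_account
    oauth_scopes    = ["https://www.googleapis.com/auth/cloud-platform"]

    # Route pod credentials through the GKE metadata server so the
    # Workload Identity pool configured above actually applies
    workload_metadata_config {
      mode = "GKE_METADATA"
    }

    # Matches the IAP SSH rule in Step 5
    tags = ["allow-iap"]
  }
}

Nodes get no public IPs (enable_private_nodes is set on the cluster), so image pulls and API calls ride Private Google Access, and all other egress goes through Cloud NAT.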

Step 5: Implement Hierarchical Firewall Policies

# Organization-level firewall policy
resource "google_compute_firewall_policy" "org_policy" {
  short_name = "org-security-policy"
  parent     = "organizations/${var.org_id}"
}

# Deny all ingress by default
resource "google_compute_firewall_policy_rule" "deny_ingress_default" {
  firewall_policy = google_compute_firewall_policy.org_policy.id
  priority        = 65534
  action          = "deny"
  direction       = "INGRESS"
  match {
    layer4_configs {
      ip_protocol = "all"
    }
  }
}

# Allow internal communication
resource "google_compute_firewall_policy_rule" "allow_internal" {
  firewall_policy = google_compute_firewall_policy.org_policy.id
  priority        = 1000
  action          = "allow"
  direction       = "INGRESS"
  match {
    src_ip_ranges = ["10.0.0.0/8"]
    layer4_configs {
      ip_protocol = "all"
    }
  }
}

# Allow GCP health checks
resource "google_compute_firewall_policy_rule" "allow_health_checks" {
  firewall_policy = google_compute_firewall_policy.org_policy.id
  priority        = 1001
  action          = "allow"
  direction       = "INGRESS"
  match {
    src_ip_ranges = [
      "35.191.0.0/16",    # Health check ranges
      "130.211.0.0/22"
    ]
    layer4_configs {
      ip_protocol = "tcp"
    }
  }
}

# VPC-level firewall rules for specific needs
resource "google_compute_firewall" "allow_iap" {
  name    = "allow-iap-ssh"
  project = var.host_project
  network = google_compute_network.main.name

  allow {
    protocol = "tcp"
    ports    = ["22"]
  }

  source_ranges = ["35.235.240.0/20"]  # IAP range
  target_tags   = ["allow-iap"]
}
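
One private-cluster gap these rules don't cover: the GKE control plane reaches nodes over a VPC peering, and only ports 443 and 10250 are opened by default. Admission webhooks listening on other ports (8443 and 9443 are common) need an explicit rule. A sketch, assuming nodes carry a gke-node network tag:

resource "google_compute_firewall" "allow_master_webhooks" {
  name    = "allow-gke-master-webhooks"
  project = var.host_project
  network = google_compute_network.main.name

  allow {
    protocol = "tcp"
    ports    = ["8443", "9443"]
  }

  # master_ipv4_cidr_block from the cluster in Step 4
  source_ranges = ["172.16.0.0/28"]
  target_tags   = ["gke-node"]
}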

Step 6: VPC Service Controls (Data Exfiltration Prevention)

# Service perimeter for sensitive data
resource "google_access_context_manager_service_perimeter" "healthcare" {
  parent = "accessPolicies/${var.access_policy_id}"
  name   = "accessPolicies/${var.access_policy_id}/servicePerimeters/healthcare"
  title  = "Healthcare Data Perimeter"

  status {
    resources = [
      "projects/${var.prod_project_number}"
    ]

    restricted_services = [
      "storage.googleapis.com",
      "bigquery.googleapis.com",
      "healthcare.googleapis.com"
    ]

    # Allow access from VPC
    vpc_accessible_services {
      enable_restriction = true
      allowed_services   = ["RESTRICTED-SERVICES"]
    }

    ingress_policies {
      ingress_from {
        sources {
          access_level = google_access_context_manager_access_level.corp_network.name
        }
      }
      ingress_to {
        resources = ["*"]
        operations {
          service_name = "storage.googleapis.com"
          method_selectors {
            method = "*"
          }
        }
      }
    }
  }
}
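
The perimeter's ingress policy references an access level that hasn't been defined yet. A minimal sketch, assuming the corporate egress range lives in var.corp_cidr:

resource "google_access_context_manager_access_level" "corp_network" {
  parent = "accessPolicies/${var.access_policy_id}"
  name   = "accessPolicies/${var.access_policy_id}/accessLevels/corp_network"
  title  = "Corporate Network"

  basic {
    conditions {
      # Assumption: your office/VPN egress CIDR(s)
      ip_subnetworks = [var.corp_cidr]
    }
  }
}

Start perimeters in dry-run mode and review the audit logs before enforcing; a misconfigured perimeter can lock legitimate workloads out of Cloud Storage and BigQuery.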

Network Design Summary

Component              Purpose                              Configuration
---------------------  -----------------------------------  -----------------------------------
Shared VPC             Centralized network management       Host + service projects
Private Google Access  Access GCP APIs without public IPs   Enabled on subnets
Cloud NAT              Controlled outbound internet         Per-region with logging
Private GKE            No public IPs on nodes               Private nodes + authorized networks
Firewall Policies      Hierarchical security rules          Org → Folder → Project
VPC Service Controls   Data exfiltration prevention         Service perimeters

Practice Question

Why is Private Google Access required for private GKE clusters to function properly?