Component Diagram - Multi-Tenant Backstage Plugin

High-Level System Architecture

Component Responsibilities

1. Edge Layer Components

Cloud Load Balancer

  • Purpose: Global HTTP(S) load balancing with SSL termination
  • Features:
    • Anycast IP for low-latency routing
    • SSL/TLS 1.3 termination
    • Backend health checks
    • Auto-scaling backend groups
  • Scaling: Automatic based on traffic

Cloud CDN

  • Purpose: Cache static catalog responses at edge locations
  • Cache Strategy:
    • Entity metadata: 5 minutes TTL
    • Static UI assets: 1 hour TTL
    • Cache invalidation via API
  • Hit Rate Target: > 70%
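
The TTLs above could be expressed as Cache-Control headers set by the backend, which Cloud CDN can honor. This is a minimal sketch, not the plugin's actual code: the `/static/` prefix and the `no-store` fallback are assumptions, and it does not address how tenant identity would enter the cache key.

```python
# TTLs taken from the cache strategy above.
ENTITY_METADATA_TTL_S = 5 * 60
STATIC_ASSET_TTL_S = 60 * 60


def cache_control_for(path: str) -> str:
    """Return a Cache-Control header value for a request path."""
    # "/static/" is a hypothetical prefix for UI assets.
    if path.startswith("/static/"):
        return f"public, max-age={STATIC_ASSET_TTL_S}"
    if path.startswith("/api/catalog/entities"):
        return f"public, max-age={ENTITY_METADATA_TTL_S}"
    # Assumption: anything else is not cached at the edge.
    return "no-store"
```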

Cloud Armor WAF

  • Purpose: DDoS protection and application-layer filtering
  • Rules:
    • IP allowlist/blocklist
    • Rate limiting (100 req/sec per IP)
    • OWASP Top 10 protection
    • Bot detection
  • Logging: All blocked requests

API Gateway

  • Purpose: Request routing, authentication, rate limiting
  • Features:
    • Tenant identification (JWT, API key)
    • Per-tenant rate limiting (5 req/sec)
    • Request/response logging
    • OpenAPI spec validation
  • Scaling: Managed by GCP
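
Per-tenant rate limiting is commonly implemented as a token bucket. The sketch below illustrates the idea at 5 req/sec; it is not how the managed gateway works internally. The class name, the burst size, and the in-memory storage are assumptions.

```python
class TenantRateLimiter:
    """Token bucket per tenant: `rate_per_sec` refill, `burst` capacity."""

    def __init__(self, rate_per_sec: float = 5.0, burst: float = 5.0):
        self.rate = rate_per_sec
        self.burst = burst  # hypothetical: burst equal to one second of traffic
        # tenant_id -> (tokens remaining, time of last update)
        self._buckets: dict[str, tuple[float, float]] = {}

    def allow(self, tenant_id: str, now: float) -> bool:
        """Return True if the tenant may make a request at time `now` (seconds)."""
        tokens, last = self._buckets.get(tenant_id, (self.burst, now))
        # Refill for the elapsed time, capped at the bucket capacity.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[tenant_id] = (tokens, now)
        return allowed
```

Passing `now` explicitly keeps the limiter deterministic in tests; a caller would typically supply `time.monotonic()`.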

2. Kubernetes Components

Backend API Pods

  • Purpose: Handle synchronous API requests from Backstage UI
  • Endpoints:
    • GET /api/catalog/entities - List entities
    • GET /api/catalog/entities/{kind}/{namespace}/{name} - Get entity
    • POST /api/webhooks/terraform-cloud - Webhook ingestion
    • GET /api/workspaces/{id} - Get workspace details
  • Scaling: HPA based on CPU (70%) and request rate
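
For reference, the Kubernetes HPA derives its replica count from the ratio of the observed metric to the target. The sketch below applies that documented formula to the 70% CPU target; the replica bounds are hypothetical, and the 10% tolerance is the Kubernetes default rather than a value from this design.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 2,    # hypothetical lower bound
    max_replicas: int = 20,   # hypothetical upper bound
    tolerance: float = 0.1,   # Kubernetes default tolerance
) -> int:
    """desired = ceil(current_replicas * current_metric / target_metric)."""
    ratio = current_metric / target_metric
    # Inside the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

With 4 pods at 90% CPU against the 70% target, this yields 6 pods.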

Catalog Processor Workers

  • Purpose: Asynchronous processing of Terraform state
  • Tasks:
    1. Fetch state from Terraform Cloud
    2. Sanitize sensitive data
    3. Transform to Backstage entities
    4. Persist to database
  • Scaling: HPA based on Pub/Sub queue depth (1000 messages)
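
Steps 2 and 3 above could look roughly like the following. This is a sketch under assumptions: it presumes the Terraform state JSON layout with `outputs` and `resources`, and the key-name deny-list, the `Resource` mapping, the placeholder owner, and the annotation key are illustrative choices, not the plugin's actual rules.

```python
import copy

REDACTED = "[REDACTED]"
# Hypothetical deny-list of attribute-name fragments.
SENSITIVE_KEY_HINTS = ("password", "secret", "token", "private_key")


def sanitize_state(state: dict) -> dict:
    """Return a copy of the state with sensitive values redacted."""
    clean = copy.deepcopy(state)
    for output in clean.get("outputs", {}).values():
        if output.get("sensitive"):
            output["value"] = REDACTED
    for resource in clean.get("resources", []):
        for instance in resource.get("instances", []):
            attrs = instance.get("attributes", {})
            for key in attrs:
                if any(hint in key.lower() for hint in SENSITIVE_KEY_HINTS):
                    attrs[key] = REDACTED
    return clean


def to_entities(state: dict, tenant_id: str, workspace: str) -> list[dict]:
    """Map managed Terraform resources to Backstage Resource entities."""
    entities = []
    for resource in state.get("resources", []):
        if resource.get("mode") != "managed":
            continue  # skip data sources
        name = f"{workspace}-{resource['type']}-{resource['name']}".replace("_", "-")
        entities.append({
            "apiVersion": "backstage.io/v1alpha1",
            "kind": "Resource",
            "metadata": {
                "name": name,
                "namespace": tenant_id,  # assumption: one namespace per tenant
                "annotations": {"terraform.io/workspace": workspace},  # hypothetical key
            },
            "spec": {"type": resource["type"], "owner": "unknown"},  # placeholder owner
        })
    return entities
```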

Webhook Handlers

  • Purpose: Receive and validate Terraform Cloud webhooks
  • Tasks:
    1. Validate HMAC signature
    2. Check for replay attacks (timestamp + nonce)
    3. Enqueue to Pub/Sub
    4. Return 200 OK immediately
  • Scaling: HPA based on request rate (500 req/sec)
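
Steps 1 and 2 above can be sketched as follows. Several points are assumptions to verify against the Terraform Cloud notification documentation: the use of HMAC-SHA512 over the raw request body, where the timestamp and nonce come from, and the 300-second window. The in-memory nonce store is for illustration only; multiple handler pods would need shared storage.

```python
import hashlib
import hmac

MAX_SKEW_S = 300  # hypothetical acceptance window


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Compare the HMAC of the raw body with the signature from the request."""
    expected = hmac.new(secret, body, hashlib.sha512).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


class ReplayGuard:
    """Reject stale timestamps and nonces that were already seen."""

    def __init__(self, max_skew_s: float = MAX_SKEW_S):
        self.max_skew_s = max_skew_s
        self._seen: dict[str, float] = {}  # nonce -> timestamp

    def accept(self, nonce: str, timestamp: float, now: float) -> bool:
        # Drop nonces old enough that the timestamp check alone rejects them.
        self._seen = {
            n: t for n, t in self._seen.items() if now - t <= self.max_skew_s
        }
        if abs(now - timestamp) > self.max_skew_s or nonce in self._seen:
            return False
        self._seen[nonce] = timestamp
        return True
```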

3. Data Layer Components

Cloud SQL PostgreSQL (Primary)

  • Purpose: Source of truth for catalog entities and tenant config
  • Configuration:
    • Instance: db-custom-8-32768 (8 vCPUs, 32 GB RAM)
    • High Availability: Regional (failover < 60s)
    • Backups: Automated daily, 30-day retention
    • Encryption: CMEK with Cloud KMS
  • Connections: Connection pooling (PgBouncer), max 100 connections

Read Replicas

  • Purpose: Serve read queries (90% of traffic) so they do not load the primary
  • Configuration:
    • 2 replicas in different zones
    • Async replication (< 1s lag)
    • Read-only queries only
  • Load Balancing: Application-level (read replicas for GET requests)
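
Application-level routing of this kind can be as simple as choosing a connection string by HTTP method. The sketch below assumes round-robin across replicas and a fallback to the primary when no replica is configured; the class and method names are hypothetical.

```python
import itertools


class ConnectionRouter:
    """Send read-only requests to replicas and everything else to the primary."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        # Round-robin over replicas; None when no replica is configured.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def dsn_for(self, http_method: str) -> str:
        if http_method.upper() in {"GET", "HEAD"} and self._replicas is not None:
            return next(self._replicas)
        return self.primary
```

Because replication is asynchronous, a GET issued immediately after a write may return stale data from a replica; reads that must see their own writes would need to go to the primary.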

Cloud Memorystore Redis

  • Purpose: High-speed cache for frequent queries
  • Configuration:
    • Standard tier (HA)
    • 10 GB capacity
    • Redis version 6.x
  • Eviction Policy: LRU (Least Recently Used)
  • Cache Keys:
    • entity:{tenant_id}:{entity_ref} - Catalog entities
    • workspaces:{tenant_id}:list - Workspace lists
    • tenant:{tenant_id}:config - Tenant configuration
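
Building these keys through small helpers keeps the tenant prefix consistent. In the sketch below, the tenant ID format (lowercase letters, digits, hyphens) is an assumption; rejecting other characters prevents a tenant ID containing `:` from colliding with another tenant's keys.

```python
import re

# Hypothetical tenant ID format; it must not allow ":" (the key separator).
_TENANT_ID = re.compile(r"[a-z0-9-]+")


def _checked(tenant_id: str) -> str:
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return tenant_id


def entity_key(tenant_id: str, entity_ref: str) -> str:
    """entity_ref is a Backstage reference such as 'component:default/my-service'."""
    return f"entity:{_checked(tenant_id)}:{entity_ref}"


def workspace_list_key(tenant_id: str) -> str:
    return f"workspaces:{_checked(tenant_id)}:list"


def tenant_config_key(tenant_id: str) -> str:
    return f"tenant:{_checked(tenant_id)}:config"
```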

4. Message Queue Components

Cloud Pub/Sub

  • Purpose: Asynchronous task distribution and decoupling
  • Topics:
    • terraform-workspace-discovered - New workspace found
    • terraform-state-updated - State version changed
    • github-repository-discovered - New repo found
  • Subscriptions:
    • catalog-processor-subscription - Pull by worker pods
    • Ack deadline: 600s (10 minutes for processing)
    • Retry policy: Exponential backoff (10s to 600s)
  • Throughput: Up to 1000 messages/second
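
To give a feel for the 10s to 600s retry range, the sketch below computes a doubling schedule clamped to those bounds. Pub/Sub chooses the actual redelivery delay itself within the configured minimum and maximum, so the doubling factor here is an assumption used only for illustration.

```python
def backoff_delay_s(attempt: int, minimum: float = 10.0, maximum: float = 600.0) -> float:
    """Illustrative delay before retry number `attempt` (1-based)."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # Double the delay on each attempt, never exceeding the maximum.
    return min(maximum, minimum * 2 ** (attempt - 1))


# Delays for the first seven retries under the doubling assumption.
schedule = [backoff_delay_s(n) for n in range(1, 8)]
```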

Dead Letter Queue

  • Purpose: Store messages that fail after max retries
  • Configuration:
    • Max retries: 5
    • Retention: 30 days
    • Manual review workflow
  • Alerting: Alert on DLQ depth > 100 messages

5. Security Components

Secret Manager

  • Purpose: Securely store and rotate Terraform Cloud tokens
  • Secrets:
    • tfc-token-{tenant_id} - Per-tenant TFC organization token
    • github-token-{tenant_id} - Per-tenant GitHub PAT
    • webhook-secret-{tenant_id} - HMAC secret for webhooks
  • Access: IAM-based, workload identity (no service account keys)
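
The naming scheme above maps onto Secret Manager resource names of the form `projects/{project}/secrets/{secret}/versions/{version}`. The helper below is a sketch: the function name, the `kind` labels, and the tenant ID format are assumptions.

```python
import re

# Secret name prefixes from the list above, keyed by a hypothetical short label.
SECRET_PREFIXES = {
    "tfc": "tfc-token",
    "github": "github-token",
    "webhook": "webhook-secret",
}
_TENANT_ID = re.compile(r"[a-z0-9-]+")  # hypothetical tenant ID format


def secret_version_path(project_id: str, kind: str, tenant_id: str,
                        version: str = "latest") -> str:
    """Build the resource name of a per-tenant secret version."""
    if kind not in SECRET_PREFIXES:
        raise ValueError(f"unknown secret kind: {kind!r}")
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    secret_id = f"{SECRET_PREFIXES[kind]}-{tenant_id}"
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"
```

Validating the tenant ID before building the path keeps one tenant from addressing another tenant's secret through a crafted identifier.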

Cloud KMS

  • Purpose: Customer-managed encryption keys for data at rest
  • Keys:
    • backstage-db-key - Database encryption
    • backstage-backup-key - Backup encryption
  • Rotation: Automatic every 90 days

6. External Service Integrations

Terraform Cloud API

  • Endpoints Used:
    • /organizations/{org}/workspaces - List workspaces
    • /workspaces/{id}/current-state-version - Get latest state
    • /state-versions/{id}/download - Download state JSON
  • Authentication: Bearer token (organization token)
  • Rate Limiting: 30 req/sec (shared across all clients)
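
A client for these endpoints needs URL construction, the bearer header, and some pacing to stay under 30 req/sec. The sketch below assumes the public `https://app.terraform.io/api/v2` base URL and a simple evenly spaced pacer; the helper names are hypothetical, and no HTTP call is made.

```python
from urllib.parse import quote

TFC_BASE_URL = "https://app.terraform.io/api/v2"  # assumption: hosted Terraform Cloud


def workspaces_url(org: str) -> str:
    return f"{TFC_BASE_URL}/organizations/{quote(org, safe='')}/workspaces"


def current_state_version_url(workspace_id: str) -> str:
    return f"{TFC_BASE_URL}/workspaces/{quote(workspace_id, safe='')}/current-state-version"


def auth_headers(token: str) -> dict[str, str]:
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/vnd.api+json",
    }


class Pacer:
    """Space requests evenly so that at most `max_per_sec` are sent per second."""

    def __init__(self, max_per_sec: float = 30.0):
        self.interval = 1.0 / max_per_sec
        self._next_slot = 0.0

    def wait_time(self, now: float) -> float:
        """Seconds to sleep before the next request; reserves the slot."""
        wait = max(0.0, self._next_slot - now)
        self._next_slot = max(now, self._next_slot) + self.interval
        return wait
```

Since the limit is shared across all clients, a pacer local to one worker pod is only a partial safeguard; workers would also need to handle HTTP 429 responses.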

GitHub API

  • Endpoints Used:
    • /orgs/{org}/repos - List repositories
    • /repos/{owner}/{repo}/contents/{path} - Fetch catalog-info.yaml
  • Authentication: Personal Access Token (PAT) or GitHub App
  • Rate Limiting: 5000 req/hour (authenticated)
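
GitHub reports the remaining quota in the `x-ratelimit-remaining` and `x-ratelimit-reset` response headers, so a scanner can pause instead of failing once the hourly budget is spent. This is a minimal sketch that assumes header names have already been lower-cased by the HTTP client; the function name is hypothetical.

```python
def seconds_until_retry(headers: dict[str, str], now: float) -> float:
    """Return how long to wait before the next GitHub API call.

    `now` is the current time in UTC epoch seconds, the same unit GitHub
    uses for x-ratelimit-reset.
    """
    remaining = int(headers.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0
    # If the reset header is missing, assume the window resets now.
    reset_at = float(headers.get("x-ratelimit-reset", now))
    return max(0.0, reset_at - now)
```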

7. Observability Components

Managed Prometheus

  • Purpose: Metrics collection and alerting
  • Metrics:
    • API latency (histogram)
    • Request rate (counter)
    • Error rate (counter)
    • Queue depth (gauge)
    • Cache hit rate (gauge)
  • Retention: 30 days

Cloud Logging

  • Purpose: Centralized log aggregation
  • Log Types:
    • Application logs (structured JSON)
    • Audit logs (security events)
    • Error logs (exceptions)
  • Retention: 90 days (compliance)

Cloud Trace

  • Purpose: Distributed tracing for debugging
  • Sampling: 10% of requests (adjustable)
  • Trace Context: OpenTelemetry format
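
A 10% sampling rate is usually applied deterministically from the trace ID, so that every service handling the same request reaches the same decision. The sketch below is similar in spirit to ratio-based samplers in tracing SDKs but is not taken from any of them; the function name and the use of the low 64 bits are assumptions.

```python
def should_sample(trace_id_hex: str, ratio: float = 0.10) -> bool:
    """Deterministic decision: the same trace ID always gets the same answer."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    # Treat the low 64 bits of the 128-bit trace ID as a uniform random value.
    low_bits = int(trace_id_hex, 16) & 0xFFFFFFFFFFFFFFFF
    return low_bits < ratio * 2**64
```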

Data Flow Diagram

Flow 1: Real-Time Webhook Update

Flow 2: Scheduled GitHub Repository Scan

Flow 3: User Catalog Query (Read Path)


Deployment Topology

Multi-Region Setup (Future)


Scaling Characteristics

| Component         | Scale Dimension         | Trigger                | Time to Scale      |
|-------------------|-------------------------|------------------------|--------------------|
| Backend API       | Horizontal (pods)       | CPU > 70%              | 30 seconds         |
| Catalog Processor | Horizontal (pods)       | Queue depth > 1000     | 60 seconds         |
| Webhook Handler   | Horizontal (pods)       | Request rate > 500/sec | 15 seconds         |
| Cloud SQL         | Vertical (CPU/Memory)   | CPU > 80%              | Manual (5 minutes) |
| Redis             | Vertical (Memory)       | Memory > 80%           | Manual (5 minutes) |
| Pub/Sub           | Automatic (GCP-managed) | Message rate           | Instant            |

Document Version: 1.0
Last Updated: 2024-11-13