Component Diagram - Multi-Tenant Backstage Plugin

High-Level System Architecture

Component Responsibilities

1. Edge Layer Components

Cloud Load Balancer

Purpose: Global HTTP(S) load balancing with SSL termination
Features:
- Anycast IP for low-latency routing
- SSL/TLS 1.3 termination
- Backend health checks
- Auto-scaling backend groups
Scaling: Automatic based on traffic

Cloud CDN

Purpose: Cache static catalog responses at edge locations
Cache Strategy:
- Entity metadata: 5 minutes TTL
- Static UI assets: 1 hour TTL
- Cache invalidation via API
Hit Rate Target: > 70%
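
The TTLs above could be expressed as Cache-Control headers set by the origin. The following is a minimal sketch, not the plugin's actual code: the `/static/` prefix, the function name, and the choice of `no-store` for everything else are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# TTLs from the cache strategy above, in seconds.
ENTITY_METADATA_TTL = 5 * 60
STATIC_ASSET_TTL = 60 * 60

# Hypothetical path prefixes; the real routes may differ.
ENTITY_PREFIX = "/api/catalog/entities"
STATIC_PREFIX = "/static/"


def cache_control_for(url: str) -> str:
    """Return a Cache-Control header value for the given request URL."""
    path = urlsplit(url).path
    if path.startswith(STATIC_PREFIX):
        return f"public, max-age={STATIC_ASSET_TTL}"
    if path.startswith(ENTITY_PREFIX):
        # Assumes the response is safe to share between users of one tenant;
        # the edge cache key would have to include the tenant for that to hold.
        return f"public, max-age={ENTITY_METADATA_TTL}"
    # Anything else is not cached at the edge in this sketch.
    return "no-store"
```

For example, `cache_control_for("/api/catalog/entities?kind=Component")` yields a 300-second max-age, while an unlisted path such as `/api/workspaces/1` is never cached.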

Cloud Armor WAF

Purpose: DDoS protection and application-layer filtering
Rules:
- IP allowlist/blocklist
- Rate limiting (100 req/sec per IP)
- OWASP Top 10 protection
- Bot detection
Logging: All blocked requests

API Gateway

Purpose: Request routing, authentication, rate limiting
Features:
- Tenant identification (JWT, API key)
- Per-tenant rate limiting (5 req/sec)
- Request/response logging
- OpenAPI spec validation
Scaling: Managed by GCP
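
Per-tenant rate limiting is commonly implemented as a token bucket. The sketch below illustrates the idea with the 5 req/sec limit from the list above; it is not the gateway's actual mechanism, and the burst capacity, class names, and in-memory storage are assumptions.

```python
import time
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate: float        # tokens added per second
    capacity: float    # maximum burst size (assumed equal to the rate)
    tokens: float      # tokens currently available
    updated_at: float  # time of the last refill

    def allow(self, now: float) -> bool:
        """Refill based on elapsed time, then try to spend one token."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class TenantRateLimiter:
    """Keeps one bucket per tenant so tenants cannot starve each other."""

    def __init__(self, rate: float = 5.0, capacity: float = 5.0):
        self.rate = rate
        self.capacity = capacity
        self._buckets = {}

    def allow(self, tenant_id: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            # A new tenant starts with a full bucket.
            bucket = TokenBucket(self.rate, self.capacity, self.capacity, now)
            self._buckets[tenant_id] = bucket
        return bucket.allow(now)
```

A request for which `allow` returns `False` would typically be rejected with HTTP 429. A shared store would be needed if several gateway instances had to enforce the same limit.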

2. Kubernetes Components

Backend API Pods

Purpose: Handle synchronous API requests from Backstage UI
Endpoints:
- GET /api/catalog/entities - List entities
- GET /api/catalog/entities/{kind}/{namespace}/{name} - Get entity
- POST /api/webhooks/terraform-cloud - Webhook ingestion
- GET /api/workspaces/{id} - Get workspace details
Scaling: HPA based on CPU (70%) and request rate

Catalog Processor Workers

Purpose: Asynchronous processing of Terraform state
Tasks:
1. Fetch state from Terraform Cloud
2. Sanitize sensitive data
3. Transform to Backstage entities
4. Persist to database
Scaling: HPA based on Pub/Sub queue depth (1000 messages)
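
Step 2 (sanitizing sensitive data) might look roughly like the sketch below. It assumes the state JSON has top-level `outputs` and `resources` keys with per-instance `attributes`; the key-name deny-list and the `[REDACTED]` placeholder are illustrative assumptions, not the plugin's actual rules.

```python
import copy
import re

REDACTED = "[REDACTED]"
# Hypothetical deny-list of attribute names; a real deployment would tune this.
SENSITIVE_KEY = re.compile(r"(password|secret|token|private_key)", re.IGNORECASE)


def _scrub(value):
    """Recursively replace values whose key name looks sensitive."""
    if isinstance(value, dict):
        return {
            key: REDACTED if SENSITIVE_KEY.search(key) else _scrub(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [_scrub(item) for item in value]
    return value


def sanitize_state(state: dict) -> dict:
    """Return a redacted copy of a Terraform state document."""
    clean = copy.deepcopy(state)  # never mutate the caller's copy
    for output in clean.get("outputs", {}).values():
        if output.get("sensitive"):
            output["value"] = REDACTED
    for resource in clean.get("resources", []):
        for instance in resource.get("instances", []):
            instance["attributes"] = _scrub(instance.get("attributes", {}))
    return clean
```

Redacting before the transform step means that secrets never reach the Backstage entities or the database.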

Webhook Handlers

Purpose: Receive and validate Terraform Cloud webhooks
Tasks:
1. Validate HMAC signature
2. Check for replay attacks (timestamp + nonce)
3. Enqueue to Pub/Sub
4. Return 200 OK immediately
Scaling: HPA based on request rate (500 req/sec)
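
The four tasks above can be sketched as follows. This is a minimal illustration under stated assumptions: the SHA-512 digest, the 300-second tolerance window, the in-memory nonce store, and the 401/409 rejection codes are not specified by this document, and how the timestamp and nonce are extracted from the request is left out.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 300  # assumed tolerance window for timestamps


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute the HMAC of the raw body and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha512).hexdigest()
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


class ReplayGuard:
    """Rejects stale timestamps and nonces that were already seen."""

    def __init__(self, max_skew: float = MAX_SKEW_SECONDS):
        self.max_skew = max_skew
        self._seen = {}  # nonce -> timestamp

    def check(self, timestamp: float, nonce: str, now=None) -> bool:
        now = time.time() if now is None else now
        # Forget nonces old enough to fail the timestamp check anyway.
        self._seen = {n: t for n, t in self._seen.items()
                      if now - t <= self.max_skew}
        if abs(now - timestamp) > self.max_skew or nonce in self._seen:
            return False
        self._seen[nonce] = timestamp
        return True


def handle_webhook(secret, body, signature_hex, timestamp, nonce,
                   guard, enqueue, now=None) -> int:
    """Return the HTTP status for one webhook delivery."""
    if not verify_signature(secret, body, signature_hex):
        return 401  # assumed status for a bad signature
    if not guard.check(timestamp, nonce, now):
        return 409  # assumed status for a replayed or stale request
    enqueue(body)   # hand off to the queue; processing happens later
    return 200
```

In a multi-pod deployment the nonce store would need to be shared (for example in the cache layer) so that a replay sent to a different pod is still detected.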

3. Data Layer Components

Cloud SQL PostgreSQL (Primary)

Purpose: Source of truth for catalog entities and tenant config
Configuration:
- Instance: db-custom-8-32768 (8 vCPUs, 32 GB RAM)
- High Availability: Regional (failover < 60s)
- Backups: Automated daily, 30-day retention
- Encryption: CMEK with Cloud KMS
Connections: Connection pooling (PgBouncer), max 100 connections

Read Replicas

Purpose: Offload read queries (about 90% of traffic) from the primary
Configuration:
- 2 replicas in different zones
- Async replication (< 1s lag)
- Read-only queries only
Load Balancing: Application-level (GET requests are routed to the read replicas)
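
Application-level read/write splitting can be sketched as a small router that picks a database target per request. The round-robin choice between replicas, the class name, and the target labels are assumptions for illustration only.

```python
import itertools


class ConnectionRouter:
    """Sends GET requests to replicas and everything else to the primary."""

    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        # Round-robin over replicas; falls back to the primary if none exist.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, http_method: str) -> str:
        if http_method.upper() == "GET" and self._replicas is not None:
            return next(self._replicas)
        return self.primary
```

Because replication is asynchronous, a GET issued right after a write may briefly see stale data; a request that must read its own write would need to go to the primary.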

Cloud Memorystore Redis

Purpose: High-speed cache for frequent queries
Configuration:
- Standard tier (HA)
- 10 GB capacity
- Version: Redis 6.x
Eviction Policy: LRU (Least Recently Used)
Cache Keys:
- entity:{tenant_id}:{entity_ref} - Catalog entities
- workspaces:{tenant_id}:list - Workspace lists
- tenant:{tenant_id}:config - Tenant configuration
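
The key patterns above can be generated by small helper functions so that every key carries the tenant prefix. This is a sketch with hypothetical function names; the rule that tenant IDs must not contain a colon is an assumption that keeps the keys unambiguous, since entity references may themselves contain colons.

```python
def _check_tenant(tenant_id: str) -> str:
    """Reject tenant IDs that would make the key ambiguous."""
    if not tenant_id or ":" in tenant_id:
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return tenant_id


def entity_key(tenant_id: str, entity_ref: str) -> str:
    return f"entity:{_check_tenant(tenant_id)}:{entity_ref}"


def workspace_list_key(tenant_id: str) -> str:
    return f"workspaces:{_check_tenant(tenant_id)}:list"


def tenant_config_key(tenant_id: str) -> str:
    return f"tenant:{_check_tenant(tenant_id)}:config"


def tenant_of(key: str) -> str:
    """Recover the tenant ID, which is always the second segment."""
    parts = key.split(":", 2)
    if len(parts) < 3:
        raise ValueError(f"not a tenant-scoped key: {key!r}")
    return parts[1]
```

Building keys in one place makes it harder for a code path to read or write another tenant's cache entries by accident.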

4. Message Queue Components

Cloud Pub/Sub

Purpose: Asynchronous task distribution and decoupling
Topics:
- terraform-workspace-discovered - New workspace found
- terraform-state-updated - State version changed
- github-repository-discovered - New repo found
Subscriptions:
- catalog-processor-subscription - Pulled by worker pods
  - Ack deadline: 600s (10 minutes for processing)
  - Retry policy: Exponential backoff (10s to 600s)
Throughput: Up to 1000 messages/second
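
To give a feel for the retry policy, the sketch below computes a delay schedule between the 10s minimum and 600s maximum. The doubling factor is an assumption; the actual schedule is chosen by Pub/Sub and is not defined in this document.

```python
def backoff_delay(attempt: int, minimum: float = 10.0,
                  maximum: float = 600.0, factor: float = 2.0) -> float:
    """Delay in seconds before redelivery number `attempt` (1-based)."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # Grow geometrically from the minimum, capped at the maximum.
    return min(maximum, minimum * factor ** (attempt - 1))


schedule = [backoff_delay(n) for n in range(1, 9)]
# schedule == [10.0, 20.0, 40.0, 80.0, 160.0, 320.0, 600.0, 600.0]
```

Under this assumption the cap is reached on the seventh redelivery, so a message that keeps failing is retried at most once every ten minutes.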

Dead Letter Queue

Purpose: Store messages that fail after max retries
Configuration:
- Max retries: 5
- Retention: 30 days
- Manual review workflow
Alerting: Alert on DLQ depth > 100 messages
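
The dead-letter behaviour can be simulated in memory as shown below. This sketch interprets "max retries: 5" as five delivery attempts in total, which is an assumption, and the function names are hypothetical. In the real system the forwarding is done by Pub/Sub rather than by application code.

```python
MAX_ATTEMPTS = 5           # from "Max retries: 5" above
DLQ_ALERT_THRESHOLD = 100  # from the alerting rule above


def deliver(message, handler, dead_letters, max_attempts=MAX_ATTEMPTS):
    """Try the handler up to max_attempts times.

    Returns the attempt number that succeeded, or None if the message
    was moved to the dead-letter list.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            handler(message)
            return attempt
        except Exception:
            continue  # a real worker would log the failure here
    dead_letters.append(message)
    return None


def dlq_alert(depth: int, threshold: int = DLQ_ALERT_THRESHOLD) -> bool:
    """True when the DLQ holds more messages than the alert threshold."""
    return depth > threshold
```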

5. Security Components

Secret Manager

Purpose: Securely store and rotate per-tenant credentials (Terraform Cloud tokens, GitHub tokens, webhook secrets)
Secrets:
- tfc-token-{tenant_id} - Per-tenant TFC organization token
- github-token-{tenant_id} - Per-tenant GitHub PAT
- webhook-secret-{tenant_id} - HMAC secret for webhooks
Access: IAM-based, workload identity (no service account keys)

Cloud KMS

Purpose: Customer-managed encryption keys for data at rest
Keys:
- backstage-db-key - Database encryption
- backstage-backup-key - Backup encryption
Rotation: Automatic every 90 days

6. External Service Integrations

Terraform Cloud API

Endpoints Used:
- /organizations/{org}/workspaces - List workspaces
- /workspaces/{id}/current-state-version - Get latest state
- /state-versions/{id}/download - Download state JSON
Authentication: Bearer token (organization token)
Rate Limiting: 30 req/sec (shared across all clients)

GitHub API

Endpoints Used:
- /orgs/{org}/repos - List repositories
- /repos/{owner}/{repo}/contents/{path} - Fetch catalog-info.yaml
Authentication: Personal Access Token (PAT) or GitHub App
Rate Limiting: 5000 req/hour (authenticated)

7. Observability Components

Managed Prometheus

Purpose: Metrics collection and alerting
Metrics:
- API latency (histogram)
- Request rate (counter)
- Error rate (counter)
- Queue depth (gauge)
- Cache hit rate (gauge)
Retention: 30 days

Cloud Logging

Purpose: Centralized log aggregation
Log Types:
- Application logs (structured JSON)
- Audit logs (security events)
- Error logs (exceptions)
Retention: 90 days (compliance)

Cloud Trace

Purpose: Distributed tracing for debugging
Sampling: 10% of requests (adjustable)
Trace Context: OpenTelemetry format
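
A 10% sampling rate is often applied deterministically on the trace ID, so that every service makes the same decision for a given trace. The following sketch shows that idea; it is not the sampler this system uses, and the choice of the low 64 bits of a 32-character hex trace ID is an assumption.

```python
def should_sample(trace_id_hex: str, ratio: float = 0.10) -> bool:
    """Decide from the trace ID alone whether a trace is recorded."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    # Treat the low 64 bits of the ID as a uniform random number.
    low64 = int(trace_id_hex, 16) & ((1 << 64) - 1)
    return low64 < int(ratio * (1 << 64))
```

Raising `ratio` temporarily while debugging an incident changes which traces are kept without requiring any coordination between services.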

Data Flow Diagram

Flow 1: Real-Time Webhook Update

Flow 2: Scheduled GitHub Repository Scan

Flow 3: User Catalog Query (Read Path)

Deployment Topology

Multi-Region Setup (Future)

Scaling Characteristics

Component	Scale Dimension	Trigger	Time to Scale
Backend API	Horizontal (pods)	CPU > 70%	30 seconds
Catalog Processor	Horizontal (pods)	Queue depth > 1000	60 seconds
Webhook Handler	Horizontal (pods)	Request rate > 500/sec	15 seconds
Cloud SQL	Vertical (CPU/Memory)	CPU > 80%	Manual (5 minutes)
Redis	Vertical (Memory)	Memory > 80%	Manual (5 minutes)
Pub/Sub	Automatic (GCP-managed)	Message rate	Instant
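
For the pod-based rows, the Kubernetes Horizontal Pod Autoscaler derives the replica count from the ratio of the observed metric to its target. The sketch below reproduces that calculation; the minimum and maximum replica bounds are illustrative assumptions, and the 0.1 tolerance is the Kubernetes default.

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """desired = ceil(current_replicas * current_value / target_value)."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: do nothing
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, four Backend API pods averaging 90% CPU against the 70% target give `desired_replicas(4, 90, 70) == 6`, and two Catalog Processor pods facing a queue depth of 3000 against the 1000 target also scale to 6.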

Document Version: 1.0
Last Updated: 2024-11-13
Related Documents:

Enterprise SaaS Plugin Architecture
Security Architecture (Pending)
Data Flow Diagrams (Pending)
