Component Diagram - Multi-Tenant Backstage Plugin
High-Level System Architecture
Component Responsibilities
1. Edge Layer Components
Cloud Load Balancer
- Purpose: Global HTTP(S) load balancing with SSL termination
- Features:
- Anycast IP for low-latency routing
- SSL/TLS 1.3 termination
- Backend health checks
- Auto-scaling backend groups
- Scaling: Automatic based on traffic
Cloud CDN
- Purpose: Cache static catalog responses at edge locations
- Cache Strategy:
- Entity metadata: 5 minutes TTL
- Static UI assets: 1 hour TTL
- Cache invalidation via API
- Hit Rate Target: > 70%
Cloud Armor WAF
- Purpose: DDoS protection and application-layer filtering
- Rules:
- IP allowlist/blocklist
- Rate limiting (100 req/sec per IP)
- OWASP Top 10 protection
- Bot detection
- Logging: All blocked requests
API Gateway
- Purpose: Request routing, authentication, rate limiting
- Features:
- Tenant identification (JWT, API key)
- Per-tenant rate limiting (5 req/sec)
- Request/response logging
- OpenAPI spec validation
- Scaling: Managed by GCP
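The per-tenant limit above can be pictured as a token bucket keyed by tenant. Because the gateway is a managed service, this is only a sketch of the behavior, not its implementation; the class names and the burst size of 5 are assumptions.

```python
import time
from typing import Optional


class TokenBucket:
    """Refills at `rate` tokens per second, up to `capacity` tokens."""

    def __init__(self, rate: float, capacity: float, now: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a new tenant may burst
        self.updated = now

    def allow(self, now: float) -> bool:
        # Credit tokens for the time elapsed since the previous call.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class TenantRateLimiter:
    """One bucket per tenant; 5 req/sec matches the limit listed above."""

    def __init__(self, rate: float = 5.0, burst: float = 5.0):
        self.rate = rate
        self.burst = burst  # assumed burst size, not stated in this document
        self._buckets = {}

    def allow(self, tenant_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            bucket = self._buckets[tenant_id] = TokenBucket(self.rate, self.burst, now)
        return bucket.allow(now)
```

A request for which `allow` returns `False` would typically be answered with HTTP 429.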
2. Kubernetes Components
Backend API Pods
- Purpose: Handle synchronous API requests from Backstage UI
- Endpoints:
- GET /api/catalog/entities: List entities
- GET /api/catalog/entities/{kind}/{namespace}/{name}: Get entity
- POST /api/webhooks/terraform-cloud: Webhook ingestion
- GET /api/workspaces/{id}: Get workspace details
- Scaling: HPA based on CPU (70%) and request rate
Catalog Processor Workers
- Purpose: Asynchronous processing of Terraform state
- Tasks:
- Fetch state from Terraform Cloud
- Sanitize sensitive data
- Transform to Backstage entities
- Persist to database
- Scaling: HPA based on Pub/Sub queue depth (1000 messages)
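The "sanitize sensitive data" task could look roughly like the following sketch. It assumes the state JSON carries an `outputs` map whose entries have `value` and `sensitive` fields; the key-name deny-list and the `[REDACTED]` placeholder are hypothetical choices, not part of this design.

```python
import re

REDACTED = "[REDACTED]"
# Hypothetical deny-list of key names; a real deployment would tune this.
SENSITIVE_KEY = re.compile(r"(password|secret|token|private_key)", re.IGNORECASE)


def _scrub(value):
    """Return a copy of `value` with values under sensitive-looking keys redacted."""
    if isinstance(value, dict):
        return {
            key: REDACTED if SENSITIVE_KEY.search(key) else _scrub(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [_scrub(item) for item in value]
    return value


def sanitize_state(state: dict) -> dict:
    """Redact sensitive outputs and attributes without mutating the input."""
    clean = _scrub(state)
    outputs = clean.get("outputs")
    if isinstance(outputs, dict):
        for output in outputs.values():
            # Outputs flagged as sensitive lose their value entirely.
            if isinstance(output, dict) and output.get("sensitive"):
                output["value"] = REDACTED
    return clean
```

Running the sanitizer before the transform step keeps secrets out of both the entities and the database.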
Webhook Handlers
- Purpose: Receive and validate Terraform Cloud webhooks
- Tasks:
- Validate HMAC signature
- Reject replayed requests (timestamp + nonce check)
- Enqueue to Pub/Sub
- Return 200 OK immediately
- Scaling: HPA based on request rate (500 req/sec)
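The validation tasks above can be sketched as two small checks. The SHA-512 digest, the 300-second tolerance window, and the way the timestamp and nonce reach the handler are assumptions; the in-memory nonce store stands in for a shared store that multiple pods would need.

```python
import hashlib
import hmac

MAX_SKEW_SECONDS = 300  # assumed tolerance window, not specified in this document


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute the HMAC over the raw body and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha512).hexdigest()
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


class ReplayGuard:
    """Rejects stale timestamps and nonces already seen inside the window."""

    def __init__(self, max_skew: float = MAX_SKEW_SECONDS):
        self.max_skew = max_skew
        self._seen = {}  # nonce -> timestamp of the request that used it

    def check(self, timestamp: float, nonce: str, now: float) -> bool:
        # Evict nonces whose timestamp would be rejected as stale anyway.
        self._seen = {
            n: t for n, t in self._seen.items() if now - t <= self.max_skew
        }
        if abs(now - timestamp) > self.max_skew or nonce in self._seen:
            return False
        self._seen[nonce] = timestamp
        return True
```

Only after both checks pass would the handler enqueue the payload and return 200 OK.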
3. Data Layer Components
Cloud SQL PostgreSQL (Primary)
- Purpose: Source of truth for catalog entities and tenant config
- Configuration:
- Instance: custom machine type with 8 vCPUs and 32 GB RAM (db-custom-8-32768)
- High Availability: Regional (failover < 60s)
- Backups: Automated daily, 30-day retention
- Encryption: CMEK with Cloud KMS
- Connections: Connection pooling (PgBouncer), max 100 connections
Read Replicas
- Purpose: Offload read queries (about 90% of traffic) from the primary
- Configuration:
- 2 replicas in different zones
- Async replication (< 1s lag)
- Serve read-only queries
- Load Balancing: Application-level (GET requests are routed to read replicas)
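Application-level routing of this kind can be as simple as choosing a connection string by HTTP method. The sketch below assumes round-robin selection across replicas; the class name and the DSN placeholders are hypothetical.

```python
import itertools
from typing import List


class ConnectionRouter:
    """Sends GET requests to replicas in round-robin order, everything else to the primary."""

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        # With no replicas configured, all traffic falls back to the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def dsn_for(self, http_method: str) -> str:
        if http_method.upper() == "GET" and self._replicas is not None:
            return next(self._replicas)
        return self.primary
```

Because replication is asynchronous, a GET issued right after a write may observe slightly stale data; callers that need read-after-write consistency would have to target the primary explicitly.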
Cloud Memorystore Redis
- Purpose: High-speed cache for frequent queries
- Configuration:
- Standard tier (HA)
- 10 GB capacity
- Redis version 6.x
- Eviction Policy: LRU (Least Recently Used)
- Cache Keys:
- entity:{tenant_id}:{entity_ref}: Catalog entities
- workspaces:{tenant_id}:list: Workspace lists
- tenant:{tenant_id}:config: Tenant configuration
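The key patterns above keep every cached value scoped to one tenant. A minimal sketch of the key builders, plus a cache-aside lookup, follows; the function names are hypothetical and a plain dict stands in for Redis.

```python
from typing import Any, Callable, Dict


def entity_key(tenant_id: str, entity_ref: str) -> str:
    return f"entity:{tenant_id}:{entity_ref}"


def workspaces_key(tenant_id: str) -> str:
    return f"workspaces:{tenant_id}:list"


def tenant_config_key(tenant_id: str) -> str:
    return f"tenant:{tenant_id}:config"


def get_or_load(cache: Dict[str, Any], key: str, loader: Callable[[], Any]) -> Any:
    """Cache-aside: return the cached value, or load it once and store it."""
    if key in cache:
        return cache[key]
    value = loader()
    cache[key] = value
    return value
```

Putting the tenant ID in every key means one tenant's lookup can never return another tenant's cached entry.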
4. Message Queue Components
Cloud Pub/Sub
- Purpose: Asynchronous task distribution and decoupling
- Topics:
- terraform-workspace-discovered: New workspace found
- terraform-state-updated: State version changed
- github-repository-discovered: New repo found
- Subscriptions:
- catalog-processor-subscription: Pull by worker pods
- Ack deadline: 600s (10 minutes for processing)
- Retry policy: Exponential backoff (10s to 600s)
- Throughput: Up to 1000 messages/second
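To give a feel for the 10s to 600s retry window, the sketch below computes a capped exponential delay per delivery attempt. Pub/Sub calculates the real delay on the service side; the doubling factor here is an assumption used only for illustration.

```python
MIN_BACKOFF_SECONDS = 10
MAX_BACKOFF_SECONDS = 600


def backoff_delay(attempt: int) -> int:
    """Delay before redelivery after the given failed attempt (1-based)."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # Double the delay on each attempt, never exceeding the configured maximum.
    return min(MAX_BACKOFF_SECONDS, MIN_BACKOFF_SECONDS * 2 ** (attempt - 1))
```

Under this assumption the delay reaches the 600s cap on the seventh attempt.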
Dead Letter Queue
- Purpose: Store messages that fail after max retries
- Configuration:
- Max retries: 5
- Retention: 30 days
- Manual review workflow
- Alerting: Alert on DLQ depth > 100 messages
5. Security Components
Secret Manager
- Purpose: Securely store and rotate Terraform Cloud tokens
- Secrets:
- tfc-token-{tenant_id}: Per-tenant TFC organization token
- github-token-{tenant_id}: Per-tenant GitHub PAT
- webhook-secret-{tenant_id}: HMAC secret for webhooks
- Access: IAM-based, workload identity (no service account keys)
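The naming scheme above maps directly onto Secret Manager resource names of the form `projects/{project}/secrets/{secret}/versions/{version}`. The helper below is a sketch that only builds and validates the name; the function name, the project ID, and the tenant ID character set are assumptions.

```python
import re

SECRET_KINDS = {"tfc-token", "github-token", "webhook-secret"}
# Assumed tenant ID format; restricting it keeps the resource path well-formed.
TENANT_ID = re.compile(r"[A-Za-z0-9_-]+")


def secret_version_name(project_id: str, kind: str, tenant_id: str,
                        version: str = "latest") -> str:
    """Build the resource name for a per-tenant secret version."""
    if kind not in SECRET_KINDS:
        raise ValueError(f"unknown secret kind: {kind}")
    if not TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"projects/{project_id}/secrets/{kind}-{tenant_id}/versions/{version}"
```

Validating the tenant ID before interpolation prevents a crafted value from pointing at another tenant's secret.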
Cloud KMS
- Purpose: Customer-managed encryption keys for data at rest
- Keys:
- backstage-db-key: Database encryption
- backstage-backup-key: Backup encryption
- Rotation: Automatic every 90 days
6. External Service Integrations
Terraform Cloud API
- Endpoints Used:
- /organizations/{org}/workspaces: List workspaces
- /workspaces/{id}/current-state-version: Get latest state
- /state-versions/{id}/download: Download state JSON
- Authentication: Bearer token (organization token)
- Rate Limiting: 30 req/sec (shared across all clients)
GitHub API
- Endpoints Used:
- /orgs/{org}/repos: List repositories
- /repos/{owner}/{repo}/contents/{path}: Fetch catalog-info.yaml
- Authentication: Personal Access Token (PAT) or GitHub App
- Rate Limiting: 5000 req/hour (authenticated)
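The two upstream limits are stated in different units, so it helps to convert them into a minimum spacing between calls. The helper below is a small worked example; its name is hypothetical, and it assumes requests are spread evenly rather than sent in bursts.

```python
def min_interval_seconds(max_requests: int, window_seconds: float) -> float:
    """Smallest average gap between requests that stays within the limit."""
    if max_requests <= 0:
        raise ValueError("max_requests must be positive")
    return window_seconds / max_requests


# GitHub: 5000 requests per hour, roughly one request every 0.72 seconds.
GITHUB_INTERVAL = min_interval_seconds(5000, 3600)
# Terraform Cloud: 30 requests per second, roughly one every 33 milliseconds.
TFC_INTERVAL = min_interval_seconds(30, 1)
```

The GitHub budget is the tighter of the two by a wide margin (about 1.4 req/sec against 30 req/sec), so scheduled repository scans are the more likely place to hit a limit.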
7. Observability Components
Managed Prometheus
- Purpose: Metrics collection and alerting
- Metrics:
- API latency (histogram)
- Request rate (counter)
- Error rate (counter)
- Queue depth (gauge)
- Cache hit rate (gauge)
- Retention: 30 days
Cloud Logging
- Purpose: Centralized log aggregation
- Log Types:
- Application logs (structured JSON)
- Audit logs (security events)
- Error logs (exceptions)
- Retention: 90 days (compliance)
Cloud Trace
- Purpose: Distributed tracing for debugging
- Sampling: 10% of requests (adjustable)
- Trace Context: OpenTelemetry format
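A 10% sampling rate is usually applied deterministically per trace, so that every service makes the same decision for the same trace ID. The sketch below illustrates the idea with a ratio check on the low 64 bits of a 32-character hex trace ID; it is written in the spirit of ratio-based samplers and is not the sampler used by any particular SDK.

```python
def should_sample(trace_id_hex: str, ratio: float = 0.10) -> bool:
    """Keep a trace when the low 64 bits of its ID fall below ratio * 2**64."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    low64 = int(trace_id_hex, 16) & ((1 << 64) - 1)
    return low64 < int(ratio * (1 << 64))
```

Raising `ratio` during an incident increases coverage without changing which previously sampled traces are kept.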
Data Flow Diagram
Flow 1: Real-Time Webhook Update
Flow 2: Scheduled GitHub Repository Scan
Flow 3: User Catalog Query (Read Path)
Deployment Topology
Multi-Region Setup (Future)
Scaling Characteristics
| Component | Scale Dimension | Trigger | Time to Scale |
|---|---|---|---|
| Backend API | Horizontal (pods) | CPU > 70% | 30 seconds |
| Catalog Processor | Horizontal (pods) | Queue depth > 1000 | 60 seconds |
| Webhook Handler | Horizontal (pods) | Request rate > 500/sec | 15 seconds |
| Cloud SQL | Vertical (CPU/Memory) | CPU > 80% | Manual (5 minutes) |
| Redis | Vertical (Memory) | Memory > 80% | Manual (5 minutes) |
| Pub/Sub | Automatic (GCP-managed) | Message rate | Instant |
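For the horizontally scaled rows, the Kubernetes Horizontal Pod Autoscaler derives the replica count from the ratio of the observed metric to its target. The sketch below reproduces that core formula; the replica bounds are assumed values, and the HPA's tolerance band and stabilization windows are left out.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """desired = ceil(current_replicas * current_metric / target_metric), clamped."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 Backend API pods averaging 90% CPU against the 70% target yield `ceil(4 * 90 / 70) = 6` pods.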
Document Version: 1.0
Last Updated: 2024-11-13
Related Documents:
- Enterprise SaaS Plugin Architecture
- Security Architecture (Pending)
- Data Flow Diagrams (Pending)