Automated Business Unit Onboarding System
Executive Summary
This document describes the end-to-end automation for onboarding new business units (BUs) and projects into the Backstage catalog. The system automatically detects infrastructure creation, validates metadata, generates catalog entities, and sets up ongoing synchronization.
System Architecture
┌─────────────────────────────────────────────────────────────────┐
│                  Trigger Layer (Event Sources)                  │
├─────────────────────────────────────────────────────────────────┤
│ GitHub Webhooks │ TFC Webhooks  │ Polling Jobs  │ Manual        │
└──────────┬──────────────────┬────────────┬──────────────┬───────┘
           │                  │            │              │
           └──────────────────┴────────────┴──────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│                   Event Router & Orchestrator                   │
│     • Deduplication   • Tenant Isolation   • Rate Limiting      │
└──────────────────────────────┬──────────────────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│                   Onboarding Workflow Engine                    │
│  State Machine: Validation → Discovery → Generation → Registry  │
└──────────────────────────────┬──────────────────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│                        Integration Layer                        │
├──────────────┬──────────────┬──────────────┬────────────────────┤
│ GitHub API   │ TFC API      │ GCP API      │ Backstage Catalog  │
└──────────────┴──────────────┴──────────────┴────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Synchronization Layer                      │
│     • State Watchers   • Drift Detection   • Reconciliation     │
└─────────────────────────────────────────────────────────────────┘
Core Components
1. Trigger Mechanisms
- GitHub Webhooks: Real-time repository creation events
- Terraform Cloud Webhooks: Workspace lifecycle events
- Polling Discovery: Scheduled jobs for missed events
- Manual Trigger: UI/API for explicit onboarding
2. Workflow Engine
- State machine-based orchestration
- Idempotent operations
- Automatic retry with exponential backoff
- Rollback on failure
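The workflow-engine properties listed above (ordered stages, retry with exponential backoff, rollback on failure) can be sketched as follows. This is a minimal illustration, not the system's actual engine: the stage names come from the architecture diagram, while `run_workflow`, `TransientError`, the handler and rollback signatures, and the retry defaults are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, List

# Stage order taken from the architecture diagram; everything else is hypothetical.
STAGES = ["validation", "discovery", "generation", "registry"]


class TransientError(Exception):
    """Raised by a stage handler when a retry might succeed (assumed convention)."""


def backoff_delays(base: float, retries: int, factor: float = 2.0) -> List[float]:
    """Delay before each retry: base, base*factor, base*factor**2, ..."""
    return [base * factor ** i for i in range(retries)]


def run_workflow(
    handlers: Dict[str, Callable[[dict], None]],
    rollbacks: Dict[str, Callable[[dict], None]],
    context: dict,
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run every stage in order; retry transient failures, roll back otherwise."""
    completed: List[str] = []
    delays = backoff_delays(base_delay, max_retries)
    for stage in STAGES:
        for attempt in range(max_retries + 1):
            try:
                handlers[stage](context)
                completed.append(stage)
                break  # stage succeeded, move on to the next one
            except TransientError:
                if attempt == max_retries:
                    return _roll_back(completed, rollbacks, context)
                sleep(delays[attempt])  # wait longer before each retry
            except Exception:
                # Non-transient failure: do not retry, undo what was done.
                return _roll_back(completed, rollbacks, context)
    return "completed"


def _roll_back(completed: List[str], rollbacks: Dict[str, Callable[[dict], None]],
               context: dict) -> str:
    """Undo completed stages in reverse order; stages without a rollback are skipped."""
    for stage in reversed(completed):
        rollbacks.get(stage, lambda ctx: None)(context)
    return "rolled_back"
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. Handlers are expected to be idempotent, since a retried stage runs the same handler again with the same context.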
3. Detection & Validation
- Multi-strategy business unit identification
- Metadata extraction and validation
- Quality gates before entity creation
4. Entity Generation
- Dynamic Backstage entity creation
- Relationship mapping (Domain → System → Component)
- Template-based YAML generation
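As an illustration of template-based generation with the Domain → System → Component relationship, the sketch below renders three Backstage entity descriptors from one template. The mapping of business unit to Domain and project to System is an assumption, as are the function names, the `type` and `lifecycle` values, and the conservative name check; only the `backstage.io/v1alpha1` descriptor fields follow the Backstage catalog format.

```python
import re
from string import Template

# One YAML template shared by all entity kinds (hypothetical layout).
ENTITY_TEMPLATE = Template(
    "apiVersion: backstage.io/v1alpha1\n"
    "kind: $kind\n"
    "metadata:\n"
    "  name: $name\n"
    "  namespace: $namespace\n"
    "spec:\n"
    "$spec\n"
)

# Conservative check assumed for this sketch: alphanumerics joined by - _ or .
NAME_RE = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?$")


def render_entity(kind: str, name: str, namespace: str, spec: dict) -> str:
    """Render one entity document; reject names that are not plain identifiers."""
    for value in (name, namespace):
        if not NAME_RE.match(value) or len(value) > 63:
            raise ValueError(f"invalid entity name or namespace: {value!r}")
    spec_lines = "\n".join(f"  {key}: {val}" for key, val in spec.items())
    return ENTITY_TEMPLATE.substitute(
        kind=kind, name=name, namespace=namespace, spec=spec_lines
    )


def generate_catalog(bu: str, project: str, service: str, owner: str,
                     namespace: str = "default") -> str:
    """Return a multi-document YAML string: Domain, then System, then Component."""
    docs = [
        render_entity("Domain", bu, namespace, {"owner": owner}),
        # The System points at its Domain, the Component at its System.
        render_entity("System", project, namespace, {"owner": owner, "domain": bu}),
        render_entity("Component", service, namespace, {
            "type": "service",          # assumed default
            "lifecycle": "production",  # assumed default
            "owner": owner,             # passed through unvalidated in this sketch
            "system": project,
        }),
    ]
    return "---\n".join(docs)
```

For example, `generate_catalog("payments", "billing", "billing-api", "team-payments", namespace="acme")` yields three documents in which the System carries `domain: payments` and the Component carries `system: billing`. A production generator would more likely serialize with a YAML library so that arbitrary values are quoted correctly.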
5. Synchronization
- Post-onboarding webhook setup
- Scheduled reconciliation jobs
- Drift detection and correction
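Drift detection and correction can be pictured as a comparison between the entities the infrastructure implies and the entities the catalog currently holds. The sketch below assumes both sides are available as dictionaries keyed by entity reference; `DriftReport`, `detect_drift`, and `reconcile` are hypothetical names, and a real reconciler would call the catalog API rather than return a corrected copy.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DriftReport:
    missing: List[str] = field(default_factory=list)   # desired, but not in the catalog
    orphaned: List[str] = field(default_factory=list)  # in the catalog, no longer desired
    changed: List[str] = field(default_factory=list)   # present on both sides, content differs

    @property
    def in_sync(self) -> bool:
        return not (self.missing or self.orphaned or self.changed)


def detect_drift(desired: Dict[str, dict], actual: Dict[str, dict]) -> DriftReport:
    """Classify every entity ref seen on either side."""
    report = DriftReport()
    for ref in sorted(desired.keys() | actual.keys()):
        if ref not in actual:
            report.missing.append(ref)
        elif ref not in desired:
            report.orphaned.append(ref)
        elif desired[ref] != actual[ref]:
            report.changed.append(ref)
    return report


def reconcile(desired: Dict[str, dict], actual: Dict[str, dict]) -> Dict[str, dict]:
    """Return a corrected copy of `actual`; the input is left untouched."""
    report = detect_drift(desired, actual)
    corrected = dict(actual)
    for ref in report.missing + report.changed:
        corrected[ref] = desired[ref]   # create or overwrite with the desired state
    for ref in report.orphaned:
        del corrected[ref]              # remove entities with no backing infrastructure
    return corrected
```

Because `reconcile` always converges on `desired`, running it a second time on its own output reports no drift, which is the property a scheduled reconciliation job relies on.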
Key Features
Multi-Client Isolation
- Tenant identification from GitHub org or TFC org
- Namespace-based entity segregation
- Cross-tenant access prevention
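A minimal sketch of the isolation rules above: resolve the tenant namespace from the originating GitHub or TFC organization, then refuse any write whose entity reference falls outside that namespace. The organization names, the `TENANT_NAMESPACES` table, and both function names are hypothetical; the reference parsing assumes the Backstage `kind:namespace/name` form, with `default` used when the namespace is omitted.

```python
from typing import Dict, Tuple

# Hypothetical mapping from (event source, organization) to tenant namespace.
TENANT_NAMESPACES: Dict[Tuple[str, str], str] = {
    ("github", "acme-corp"): "acme",
    ("tfc", "acme-infra"): "acme",
    ("github", "globex"): "globex",
}


def resolve_namespace(source: str, org: str) -> str:
    """Identify the tenant from the GitHub org or TFC org that emitted the event."""
    try:
        return TENANT_NAMESPACES[(source.lower(), org.lower())]
    except KeyError:
        # Unknown orgs are rejected rather than mapped to a shared namespace.
        raise LookupError(f"no tenant registered for {source}:{org}") from None


def ref_namespace(entity_ref: str) -> str:
    """Extract the namespace from a 'kind:namespace/name' style reference."""
    _, _, rest = entity_ref.rpartition(":")   # drop the optional kind prefix
    namespace, sep, _ = rest.partition("/")
    return namespace.lower() if sep else "default"


def assert_same_tenant(tenant_namespace: str, entity_ref: str) -> None:
    """Raise PermissionError if the reference points outside the tenant's namespace."""
    target = ref_namespace(entity_ref)
    if target != tenant_namespace:
        raise PermissionError(
            f"tenant {tenant_namespace!r} may not touch entities in {target!r}"
        )
```

With this table, an event from the `acme-corp` GitHub org resolves to `acme`, so `assert_same_tenant("acme", "component:globex/billing-api")` raises, while a reference under `acme/` passes.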
Idempotency & Resilience
- Duplicate detection using fingerprinting
- Partial onboarding recovery
- Transactional entity creation
- Automatic cleanup on failure
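Fingerprint-based duplicate detection can be sketched as hashing the fields that identify an onboarding request and recording which fingerprints have already been claimed. The choice of identity fields (`tenant`, `source`, `resource_id`), the `OnboardingRegistry` class, and the in-memory store are assumptions for illustration; a real deployment would need a shared, durable store.

```python
import hashlib
import json
from typing import Dict

# Fields assumed to identify "the same onboarding request" (hypothetical).
IDENTITY_FIELDS = ("tenant", "source", "resource_id")


def fingerprint(event: dict) -> str:
    """Stable SHA-256 over the identity fields only, ignoring delivery metadata."""
    identity = {name: event[name] for name in IDENTITY_FIELDS}
    # sort_keys and fixed separators make the serialization deterministic.
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class OnboardingRegistry:
    """In-memory record of claimed fingerprints (stand-in for a durable store)."""

    def __init__(self) -> None:
        self._status: Dict[str, str] = {}

    def claim(self, event: dict) -> bool:
        """Return True if this event is new and now claimed, False if a duplicate."""
        key = fingerprint(event)
        if key in self._status:
            return False
        self._status[key] = "in_progress"
        return True

    def release(self, event: dict) -> None:
        """Forget a claim, e.g. after cleanup on failure, so a retry can proceed."""
        self._status.pop(fingerprint(event), None)
```

Two deliveries of the same repository-creation event, such as a webhook and a later polling job, differ in delivery metadata but share a fingerprint, so only the first `claim` succeeds.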
Observability
- Onboarding status tracking
- Audit trail for all operations
- Metrics and alerting
- Debug logs with correlation IDs
Design Principles
- Event-Driven: React to infrastructure changes in real time
- Resilient: Handle failures gracefully with automatic recovery
- Isolated: Prevent cross-tenant data leakage
- Observable: Track all operations with detailed logging
- Extensible: Plugin architecture for custom workflows
- Secure: Validate all inputs and sanitize all outputs
Success Criteria
- ✅ Zero manual intervention for standard BU onboarding
- ✅ Onboarding completes in under 5 minutes from Terraform apply
- ✅ Every onboarding operation is idempotent (safe to retry)
- ✅ Zero cross-tenant entity pollution
- ✅ Automatic rollback on validation failure
- ✅ Real-time status visibility
Document Structure
- Trigger Mechanisms (02-trigger-mechanisms.md)
- Workflow State Machine (03-workflow-state-machine.md)
- Detection Algorithms (04-detection-algorithms.md)
- Validation & Quality Gates (05-validation-quality-gates.md)
- Multi-Client Isolation (06-multi-client-isolation.md)
- Idempotency & Retry (07-idempotency-retry.md)
- Synchronization Setup (08-synchronization-setup.md)
- Implementation Guide (09-implementation-guide.md)