Multi-Tenant Database Architecture for Backstage Entities
Executive Summary
Recommendation: Row-Level Security (RLS) with PostgreSQL for 100-500 clients, transitioning to Horizontal Sharding at 500+ clients.
Key Metrics
- Target Performance: Entity lookup < 50ms (50% faster than requirement)
- Capacity: 100 clients × 10,000 entities = 1M entities in single DB
- Isolation: Complete tenant isolation via RLS + tenant_id partitioning
- Scalability: Horizontal sharding ready at 500+ clients
1. Multi-Tenancy Strategy Analysis
Option A: Schema-per-Tenant ❌
-- Separate schema for each client
CREATE SCHEMA client_acme;
CREATE SCHEMA client_widgets_inc;
CREATE TABLE client_acme.entities (
entity_id UUID PRIMARY KEY,
entity_ref TEXT NOT NULL,
entity_json JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Pros:
- Strong logical isolation
- Simple tenant-level backups (
pg_dump --schema=client_acme) - Independent schema evolution per tenant
- Easy to drop entire tenant
Cons:
- ❌ Connection pooling nightmare: Each query must
SET search_path = client_acme - ❌ 1000+ schemas = PostgreSQL bloat: System catalog overhead
- ❌ Cross-tenant analytics impossible: No global queries
- ❌ Schema migrations: Must run 100+ times for each client
- ❌ Limited to ~1000 schemas in production
Verdict: Only viable for < 50 clients with independent deployments.
Option B: Row-Level Security (RLS) ✅ RECOMMENDED
-- Single unified schema with tenant isolation
CREATE TABLE entities (
tenant_id UUID NOT NULL,
entity_id UUID NOT NULL,
entity_ref TEXT NOT NULL,
entity_json JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
PRIMARY KEY (tenant_id, entity_id)
) PARTITION BY LIST (tenant_id);
-- Enable Row-Level Security
ALTER TABLE entities ENABLE ROW LEVEL SECURITY;
-- Policy: Users can only see their tenant's data
CREATE POLICY tenant_isolation ON entities
FOR ALL
USING (tenant_id = current_setting('app.tenant_id')::UUID);
-- Bypass RLS for admin operations
GRANT BYPASS RLS ON entities TO admin_role;
Pros:
- ✅ Native PostgreSQL security: RLS enforced at kernel level
- ✅ Single schema migrations: Deploy once for all tenants
- ✅ Connection pooling friendly: Set
app.tenant_idper session - ✅ Cross-tenant analytics: Admin role can query all data
- ✅ Proven at scale: GitHub, Heroku use RLS for multi-tenancy
- ✅ Horizontal scaling: Partition by tenant_id for sharding
Cons:
- ⚠️ Must set
app.tenant_idfor every connection (mitigated by middleware) - ⚠️ RLS adds ~2-5ms overhead per query (negligible for our use case)
Verdict: Best balance of security, performance, and scalability for 100-500 clients.
Option C: Database-per-Tenant ❌
-- Separate database per client
CREATE DATABASE backstage_client_acme;
CREATE DATABASE backstage_client_widgets_inc;
\c backstage_client_acme
CREATE TABLE entities (...);
Pros:
- Ultimate isolation (separate PostgreSQL process)
- Tenant-level resource limits (CPU, memory, connections)
- Independent backups and disaster recovery
Cons:
- ❌ Connection pool explosion: 100 clients × 10 connections = 1000+ connections
- ❌ Operational nightmare: 100 databases to monitor, backup, upgrade
- ❌ Cross-tenant queries impossible: No global search or analytics
- ❌ Cost: 100 databases = 100× PostgreSQL instances in managed services
- ❌ Schema migrations: Must run 100× (with zero-downtime complexity)
Verdict: Only for SaaS with < 10 enterprise clients requiring dedicated infrastructure.
2. Backstage Catalog Schema Extension
Original Backstage Schema (Single-Tenant)
-- Backstage's default catalog schema
CREATE TABLE entities (
entity_id TEXT PRIMARY KEY,
final_entity JSONB,
hash TEXT NOT NULL,
stitch_ticket TEXT
);
CREATE TABLE relations (
originating_entity_id TEXT NOT NULL,
source_entity_ref TEXT NOT NULL,
target_entity_ref TEXT NOT NULL,
type TEXT NOT NULL
);
CREATE TABLE refresh_state (
entity_id TEXT PRIMARY KEY,
entity_ref TEXT NOT NULL,
unprocessed_entity JSONB,
processed_entity JSONB,
cache JSONB,
errors TEXT
);
Multi-Tenant Extension (RLS-Based)
-- Core entities table with tenant isolation
CREATE TABLE entities (
tenant_id UUID NOT NULL,
entity_id UUID NOT NULL,
entity_ref TEXT NOT NULL, -- e.g., "component:default/my-service"
entity_kind TEXT NOT NULL, -- component, api, system, resource
entity_namespace TEXT NOT NULL DEFAULT 'default',
entity_name TEXT NOT NULL,
-- Backstage fields
final_entity JSONB NOT NULL, -- Complete entity YAML as JSON
hash TEXT NOT NULL, -- Entity content hash for change detection
stitch_ticket TEXT, -- For entity stitching during refresh
-- Metadata
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
deleted_at TIMESTAMPTZ, -- Soft delete
PRIMARY KEY (tenant_id, entity_id),
UNIQUE (tenant_id, entity_ref)
) PARTITION BY LIST (tenant_id);
-- Entity relationships (multi-tenant)
CREATE TABLE relations (
tenant_id UUID NOT NULL,
relation_id UUID NOT NULL DEFAULT gen_random_uuid(),
originating_entity_id UUID NOT NULL,
source_entity_ref TEXT NOT NULL,
target_entity_ref TEXT NOT NULL,
type TEXT NOT NULL, -- ownedBy, dependsOn, apiProvidedBy, etc.
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
PRIMARY KEY (tenant_id, relation_id),
FOREIGN KEY (tenant_id, originating_entity_id)
REFERENCES entities(tenant_id, entity_id) ON DELETE CASCADE
) PARTITION BY LIST (tenant_id);
-- Refresh state for catalog ingestion
CREATE TABLE refresh_state (
tenant_id UUID NOT NULL,
entity_id UUID NOT NULL,
entity_ref TEXT NOT NULL,
unprocessed_entity JSONB, -- Raw YAML from source
processed_entity JSONB, -- After validation
cache JSONB, -- Location cache
errors TEXT, -- Ingestion errors
last_refresh TIMESTAMPTZ NOT NULL DEFAULT NOW(),
next_refresh TIMESTAMPTZ,
PRIMARY KEY (tenant_id, entity_id),
FOREIGN KEY (tenant_id, entity_id)
REFERENCES entities(tenant_id, entity_id) ON DELETE CASCADE
) PARTITION BY LIST (tenant_id);
-- Row-Level Security Policies
ALTER TABLE entities ENABLE ROW LEVEL SECURITY;
ALTER TABLE relations ENABLE ROW LEVEL SECURITY;
ALTER TABLE refresh_state ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation_entities ON entities
FOR ALL
USING (tenant_id = current_setting('app.tenant_id')::UUID);
CREATE POLICY tenant_isolation_relations ON relations
FOR ALL
USING (tenant_id = current_setting('app.tenant_id')::UUID);
CREATE POLICY tenant_isolation_refresh ON refresh_state
FOR ALL
USING (tenant_id = current_setting('app.tenant_id')::UUID);
3. Entity ID Strategy
Approach: UUID-based with Tenant Column + Backstage-Compatible Refs
-- Entity storage
INSERT INTO entities (
tenant_id,
entity_id,
entity_ref,
entity_kind,
entity_namespace,
entity_name,
final_entity,
hash
) VALUES (
'550e8400-e29b-41d4-a716-446655440000', -- tenant_id (UUID)
'c73b9df0-6c42-4f3e-9d1a-5b2e8f3a7d9b', -- entity_id (UUID)
'component:default/my-service', -- Backstage entity ref
'component',
'default',
'my-service',
'{"apiVersion": "backstage.io/v1alpha1", ...}'::jsonb,
'abc123hash'
);
Why This Strategy?
-
UUID entity_id:
- Globally unique, no collisions
- Efficient indexing (16 bytes vs TEXT)
- Native PostgreSQL support
-
Backstage-compatible entity_ref:
- Format:
{kind}:{namespace}/{name} - Example:
component:default/my-service - Preserves Backstage semantics
- No tenant prefix in refs (cleaner UI)
- Format:
-
Composite Primary Key (tenant_id, entity_id):
- Enforces uniqueness per tenant
- Enables partition pruning
- Prevents cross-tenant ID leaks
-
Namespace Collision Prevention:
- Tenant ID in partition key
- UNIQUE constraint on (tenant_id, entity_ref)
- Application-level validation
4. Performance Optimization
Index Strategy
-- Primary indexes (automatic from PRIMARY KEY)
CREATE INDEX idx_entities_pk ON entities (tenant_id, entity_id);
-- Entity lookups by ref (most common query)
CREATE INDEX idx_entities_ref ON entities (tenant_id, entity_ref)
WHERE deleted_at IS NULL;
-- Entity kind filtering (list all APIs, components, etc.)
CREATE INDEX idx_entities_kind ON entities (tenant_id, entity_kind, created_at DESC)
WHERE deleted_at IS NULL;
-- Full-text search on entity names
CREATE INDEX idx_entities_name_trgm ON entities
USING gin (entity_name gin_trgm_ops)
WHERE deleted_at IS NULL;
-- JSONB search (GIN index)
CREATE INDEX idx_entities_json ON entities
USING gin (final_entity jsonb_path_ops);
-- Relations lookup
CREATE INDEX idx_relations_source ON relations (tenant_id, source_entity_ref);
CREATE INDEX idx_relations_target ON relations (tenant_id, target_entity_ref);
CREATE INDEX idx_relations_type ON relations (tenant_id, type);
-- Refresh state queries
CREATE INDEX idx_refresh_next ON refresh_state (tenant_id, next_refresh)
WHERE next_refresh IS NOT NULL;
Partitioning Strategy
-- Create partitions for top 10 tenants (hot data)
CREATE TABLE entities_tenant_acme PARTITION OF entities
FOR VALUES IN ('550e8400-e29b-41d4-a716-446655440000');
CREATE TABLE entities_tenant_widgets PARTITION OF entities
FOR VALUES IN ('660e8400-e29b-41d4-a716-446655440001');
-- Default partition for new tenants
CREATE TABLE entities_default PARTITION OF entities DEFAULT;
-- Automatically create partitions on tenant creation
CREATE OR REPLACE FUNCTION create_tenant_partition()
RETURNS TRIGGER AS $$
BEGIN
EXECUTE format(
'CREATE TABLE IF NOT EXISTS entities_tenant_%s PARTITION OF entities FOR VALUES IN (%L)',
NEW.tenant_id,
NEW.tenant_id
);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER tenant_partition_trigger
AFTER INSERT ON tenants
FOR EACH ROW
EXECUTE FUNCTION create_tenant_partition();
Query Patterns & Performance
-- Query 1: Get entity by ref (most common) - Target: < 20ms
SET app.tenant_id = '550e8400-e29b-41d4-a716-446655440000';
SELECT final_entity
FROM entities
WHERE entity_ref = 'component:default/my-service'
AND deleted_at IS NULL;
-- Index used: idx_entities_ref
-- Partition pruning: YES
-- Expected: 5-15ms
-- Query 2: List all components - Target: < 50ms
SELECT entity_ref, entity_name, created_at
FROM entities
WHERE entity_kind = 'component'
AND deleted_at IS NULL
ORDER BY created_at DESC
LIMIT 100;
-- Index used: idx_entities_kind
-- Expected: 20-40ms
-- Query 3: Search entities by name - Target: < 100ms
SELECT entity_ref, entity_name
FROM entities
WHERE entity_name ILIKE '%auth%'
AND deleted_at IS NULL
LIMIT 50;
-- Index used: idx_entities_name_trgm
-- Expected: 50-100ms
-- Query 4: Get entity with all relations - Target: < 100ms
WITH entity_data AS (
SELECT * FROM entities WHERE entity_ref = 'component:default/my-service'
)
SELECT
e.*,
json_agg(r.*) AS relations
FROM entity_data e
LEFT JOIN relations r ON r.source_entity_ref = e.entity_ref
GROUP BY e.entity_id;
-- Indexes used: idx_entities_ref, idx_relations_source
-- Expected: 30-80ms
Caching Layer
-- Redis cache strategy
Key: tenant:{tenant_id}:entity:{entity_ref}
TTL: 300 seconds (5 minutes)
Invalidation: On entity UPDATE/DELETE
-- Application-level caching
class EntityCache {
async getEntity(tenantId, entityRef) {
const cacheKey = `tenant:${tenantId}:entity:${entityRef}`;
// Try Redis first
let entity = await redis.get(cacheKey);
if (entity) return JSON.parse(entity);
// Cache miss: Query database
entity = await db.query(
'SELECT final_entity FROM entities WHERE tenant_id = $1 AND entity_ref = $2',
[tenantId, entityRef]
);
// Cache for 5 minutes
await redis.setex(cacheKey, 300, JSON.stringify(entity));
return entity;
}
}
Connection Pooling
// Per-tenant connection pooling with tenant_id session variable
const { Pool } = require('pg');
class TenantPool {
constructor() {
this.pool = new Pool({
host: 'localhost',
database: 'backstage_multi_tenant',
max: 100, // Total connections
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
});
}
async withTenant(tenantId, callback) {
const client = await this.pool.connect();
try {
// Set tenant context for RLS
await client.query('SET app.tenant_id = $1', [tenantId]);
return await callback(client);
} finally {
// Reset context before returning to pool
await client.query('RESET app.tenant_id');
client.release();
}
}
}
// Usage
const pool = new TenantPool();
const entities = await pool.withTenant(tenantId, async (client) => {
return client.query('SELECT * FROM entities WHERE entity_kind = $1', ['component']);
});
5. Scalability Projections
Scenario 1: 10 Clients, 1K Entities Each (10K Total)
- Database: Single PostgreSQL instance
- Hardware: 4 vCPUs, 16GB RAM, 100GB SSD
- Performance:
- Entity lookup: 5-10ms
- List queries: 10-20ms
- Search: 30-50ms
- Cost: ~$100/month (AWS RDS db.m5.large)
Scenario 2: 100 Clients, 10K Entities Each (1M Total)
- Database: Single PostgreSQL with read replicas
- Hardware: 8 vCPUs, 32GB RAM, 500GB SSD
- Partitioning: 100 tenant partitions + default
- Performance:
- Entity lookup: 10-20ms (partition pruning helps)
- List queries: 30-60ms
- Search: 60-120ms
- Read Replicas: 2× replicas for read scaling
- Caching: Redis cluster (3 nodes, 8GB each)
- Cost: ~$800/month (Primary + 2 replicas + Redis)
Scenario 3: 1000 Clients, 100K Entities Each (100M Total)
- Database: Horizontal sharding required
- Sharding Strategy: Hash-based on tenant_id
- Shard 1: Tenants 1-250 (25M entities)
- Shard 2: Tenants 251-500 (25M entities)
- Shard 3: Tenants 501-750 (25M entities)
- Shard 4: Tenants 751-1000 (25M entities)
- Hardware per shard: 16 vCPUs, 64GB RAM, 1TB SSD
- Routing: Application-level shard router
- Performance:
- Entity lookup: 20-40ms (shard routing + query)
- Cross-shard queries: Not supported (by design)
- Alternative: CockroachDB (auto-sharding, no app changes)
- Cost: ~$4,000/month (4 shards + orchestration)
Storage Estimation
Average entity size: 5KB (JSON)
Relations per entity: 5 (avg)
Relation size: 200 bytes
Per entity storage:
- Entity: 5KB
- Relations: 5 × 200 bytes = 1KB
- Refresh state: 2KB
- Indexes: 2KB (estimated overhead)
Total: ~10KB per entity
Scalability:
- 10K entities: 100MB
- 100K entities: 1GB
- 1M entities: 10GB
- 10M entities: 100GB
- 100M entities: 1TB (with indexes)
Read/Write Patterns
Typical workload:
- 90% reads (entity lookups, searches)
- 10% writes (catalog refreshes)
Peak load (100 clients):
- 1000 reads/sec
- 100 writes/sec
- Burst capacity: 5000 reads/sec during catalog refresh
Database sizing:
- Single instance handles: 10K QPS (with caching)
- Read replicas needed: At 2K+ sustained QPS
- Sharding needed: At 10K+ sustained QPS
6. Data Lifecycle Management
Entity TTL (Time-to-Live)
-- Stale entity detection (not refreshed in 30 days)
CREATE INDEX idx_entities_stale ON entities (tenant_id, updated_at)
WHERE deleted_at IS NULL;
-- Automated cleanup job (daily)
CREATE OR REPLACE FUNCTION cleanup_stale_entities()
RETURNS void AS $$
BEGIN
-- Soft delete entities not updated in 90 days
UPDATE entities
SET deleted_at = NOW()
WHERE updated_at < NOW() - INTERVAL '90 days'
AND deleted_at IS NULL;
-- Hard delete entities soft-deleted for 365 days
DELETE FROM entities
WHERE deleted_at < NOW() - INTERVAL '365 days';
END;
$$ LANGUAGE plpgsql;
-- Schedule with pg_cron
SELECT cron.schedule('cleanup-stale-entities', '0 2 * * *', 'SELECT cleanup_stale_entities()');
Soft Deletes
-- Soft delete implementation
UPDATE entities
SET deleted_at = NOW()
WHERE tenant_id = '550e8400-e29b-41d4-a716-446655440000'
AND entity_ref = 'component:default/deprecated-service';
-- Restore soft-deleted entity
UPDATE entities
SET deleted_at = NULL
WHERE tenant_id = '550e8400-e29b-41d4-a716-446655440000'
AND entity_ref = 'component:default/deprecated-service';
-- All queries filter out soft-deleted entities
SELECT * FROM entities
WHERE tenant_id = current_setting('app.tenant_id')::UUID
AND deleted_at IS NULL;
Entity Change History
-- Audit log for entity changes
CREATE TABLE entity_history (
history_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
entity_id UUID NOT NULL,
entity_ref TEXT NOT NULL,
operation TEXT NOT NULL, -- INSERT, UPDATE, DELETE
old_entity JSONB,
new_entity JSONB,
changed_by TEXT, -- User or service account
changed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
FOREIGN KEY (tenant_id, entity_id)
REFERENCES entities(tenant_id, entity_id) ON DELETE CASCADE
) PARTITION BY RANGE (changed_at);
-- Partition by month for efficient querying
CREATE TABLE entity_history_2024_01 PARTITION OF entity_history
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
-- Trigger to capture changes
CREATE OR REPLACE FUNCTION log_entity_changes()
RETURNS TRIGGER AS $$
BEGIN
IF TG_OP = 'INSERT' THEN
INSERT INTO entity_history (tenant_id, entity_id, entity_ref, operation, new_entity, changed_by)
VALUES (NEW.tenant_id, NEW.entity_id, NEW.entity_ref, 'INSERT', NEW.final_entity, current_user);
ELSIF TG_OP = 'UPDATE' THEN
INSERT INTO entity_history (tenant_id, entity_id, entity_ref, operation, old_entity, new_entity, changed_by)
VALUES (NEW.tenant_id, NEW.entity_id, NEW.entity_ref, 'UPDATE', OLD.final_entity, NEW.final_entity, current_user);
ELSIF TG_OP = 'DELETE' THEN
INSERT INTO entity_history (tenant_id, entity_id, entity_ref, operation, old_entity, changed_by)
VALUES (OLD.tenant_id, OLD.entity_id, OLD.entity_ref, 'DELETE', OLD.final_entity, current_user);
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER entity_changes_trigger
AFTER INSERT OR UPDATE OR DELETE ON entities
FOR EACH ROW
EXECUTE FUNCTION log_entity_changes();
Backup and Restore
-- Per-tenant backup (logical backup)
pg_dump \
--host=localhost \
--dbname=backstage_multi_tenant \
--table=entities \
--where="tenant_id = '550e8400-e29b-41d4-a716-446655440000'" \
--format=custom \
--file=tenant_acme_backup_2024_01_15.dump
-- Per-tenant restore
pg_restore \
--host=localhost \
--dbname=backstage_multi_tenant \
--table=entities \
--data-only \
tenant_acme_backup_2024_01_15.dump
-- Global backup (all tenants)
pg_dump \
--host=localhost \
--dbname=backstage_multi_tenant \
--format=custom \
--file=full_backup_2024_01_15.dump
-- Continuous archiving (PITR - Point-in-Time Recovery)
# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'cp %p /mnt/backups/wal/%f'
# Daily base backup
pg_basebackup -D /mnt/backups/base/$(date +%Y%m%d) -Ft -z -Xs
# Restore to specific timestamp
pg_restore --target-time='2024-01-15 14:30:00' /mnt/backups/base/20240115
7. Technology Trade-offs Analysis
PostgreSQL (Recommended) ✅
Why PostgreSQL?
- Native Backstage support (default database)
- Row-Level Security (RLS) for tenant isolation
- JSONB for entity storage (fast queries)
- Mature replication (streaming, logical)
- Rich indexing (GIN, GiST, BRIN, B-tree)
- Proven at scale (Notion, Instagram, Reddit)
Limits:
- Single-node write bottleneck (mitigated by read replicas)
- Manual sharding required at 100M+ entities
- Connection limit (~500 default, tune to 1000)
Best For: 10-500 clients, 1M-50M entities
MySQL (Alternative)
Why MySQL?
- Slightly faster for simple key-value lookups
- Galera Cluster for multi-master writes
- InnoDB supports row-level locking
Why Not?
- ❌ No native RLS (must implement in app layer)
- ❌ Weaker JSON support (JSONB in Postgres is superior)
- ❌ No LISTEN/NOTIFY for real-time updates
- ❌ Backstage defaults to Postgres
Verdict: Only if team expertise strongly favors MySQL.
CockroachDB (Distributed, 1000+ Clients)
Why CockroachDB?
- Automatic horizontal sharding (no app changes)
- Built-in multi-region replication
- PostgreSQL-compatible (Backstage works)
- Survives node failures (no downtime)
Trade-offs:
- 2-3× higher latency than Postgres (cross-node coordination)
- Higher cost (~5× Postgres for same throughput)
- Learning curve (new consistency model)
Best For: 1000+ clients, 100M+ entities, multi-region
Google Cloud Spanner (Enterprise, Global Scale)
Why Spanner?
- Global consistency (TrueTime)
- 99.999% SLA (5 minutes downtime/year)
- Automatic sharding and replication
- Managed service (zero ops)
Trade-offs:
- 10-50ms latency (global coordination)
- Very high cost ($1,000+/month minimum)
- Not PostgreSQL-compatible (schema changes needed)
Best For: Global enterprises, regulated industries, unlimited budget
8. Migration Path from Single-Tenant Backstage
Phase 1: Add Tenant ID (Zero Downtime)
-- Step 1: Add tenant_id column (nullable initially)
ALTER TABLE entities ADD COLUMN tenant_id UUID;
ALTER TABLE relations ADD COLUMN tenant_id UUID;
ALTER TABLE refresh_state ADD COLUMN tenant_id UUID;
-- Step 2: Backfill tenant_id for existing data (single tenant)
UPDATE entities SET tenant_id = '00000000-0000-0000-0000-000000000001';
UPDATE relations SET tenant_id = '00000000-0000-0000-0000-000000000001';
UPDATE refresh_state SET tenant_id = '00000000-0000-0000-0000-000000000001';
-- Step 3: Make tenant_id NOT NULL
ALTER TABLE entities ALTER COLUMN tenant_id SET NOT NULL;
ALTER TABLE relations ALTER COLUMN tenant_id SET NOT NULL;
ALTER TABLE refresh_state ALTER COLUMN tenant_id SET NOT NULL;
Phase 2: Update Primary Keys
-- Step 4: Drop old primary key and create composite key
ALTER TABLE entities DROP CONSTRAINT entities_pkey;
ALTER TABLE entities ADD PRIMARY KEY (tenant_id, entity_id);
-- Step 5: Add indexes
CREATE INDEX idx_entities_ref ON entities (tenant_id, entity_ref);
CREATE INDEX idx_entities_kind ON entities (tenant_id, entity_kind);
Phase 3: Enable RLS
-- Step 6: Enable Row-Level Security
ALTER TABLE entities ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON entities
FOR ALL
USING (tenant_id = current_setting('app.tenant_id')::UUID);
-- Step 7: Grant bypass to admin
GRANT BYPASS RLS ON entities TO backstage_admin;
Phase 4: Partition by Tenant
-- Step 8: Create new partitioned table
CREATE TABLE entities_new (LIKE entities INCLUDING ALL)
PARTITION BY LIST (tenant_id);
-- Step 9: Migrate data
INSERT INTO entities_new SELECT * FROM entities;
-- Step 10: Swap tables (requires maintenance window)
BEGIN;
ALTER TABLE entities RENAME TO entities_old;
ALTER TABLE entities_new RENAME TO entities;
COMMIT;
9. Complete DDL Schema
See /docs/database/schema.sql for full implementation.
10. Recommendations Summary
For 100 Clients (Current Requirement)
- Use PostgreSQL with Row-Level Security
- Partition by tenant_id (100 partitions + default)
- Deploy read replicas (2×) for read scaling
- Add Redis caching (5-minute TTL)
- Index strategy: entity_ref, entity_kind, JSONB paths
- Monitor query performance: Target < 50ms for entity lookups
For 500+ Clients (Future Scaling)
- Horizontal sharding by tenant_id hash
- CockroachDB migration for auto-sharding
- Multi-region replication if global
Security Checklist
- ✅ RLS policies on all tables
- ✅ Tenant ID validation in middleware
- ✅ Audit logging of entity changes
- ✅ Encrypted connections (SSL/TLS)
- ✅ Backup encryption at rest
Performance Targets
- Entity lookup by ref: < 20ms (90th percentile)
- List entities by kind: < 50ms (90th percentile)
- Full-text search: < 100ms (90th percentile)
- Catalog refresh (1000 entities): < 30 seconds
Appendix: Query Performance Testing
-- Performance testing script
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Run 1000 entity lookups
DO $$
DECLARE
tenant UUID := '550e8400-e29b-41d4-a716-446655440000';
i INT;
BEGIN
SET app.tenant_id = tenant;
FOR i IN 1..1000 LOOP
PERFORM * FROM entities
WHERE entity_ref = 'component:default/service-' || i::TEXT
AND deleted_at IS NULL;
END LOOP;
END $$;
-- Check performance
SELECT
calls,
mean_exec_time,
max_exec_time,
query
FROM pg_stat_statements
WHERE query LIKE '%entities%'
ORDER BY mean_exec_time DESC;