Skip to main content

Multi-Tenant Database Architecture for Backstage Entities

Executive Summary

Recommendation: Row-Level Security (RLS) with PostgreSQL for 100-500 clients, transitioning to Horizontal Sharding at 500+ clients.

Key Metrics

  • Target Performance: Entity lookup < 50ms (50% faster than requirement)
  • Capacity: 100 clients × 10,000 entities = 1M entities in single DB
  • Isolation: Complete tenant isolation via RLS + tenant_id partitioning
  • Scalability: Horizontal sharding ready at 500+ clients

1. Multi-Tenancy Strategy Analysis

Option A: Schema-per-Tenant ❌

-- Separate schema for each client
CREATE SCHEMA client_acme;
CREATE SCHEMA client_widgets_inc;

CREATE TABLE client_acme.entities (
entity_id UUID PRIMARY KEY,
entity_ref TEXT NOT NULL,
entity_json JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Pros:

  • Strong logical isolation
  • Simple tenant-level backups (pg_dump --schema=client_acme)
  • Independent schema evolution per tenant
  • Easy to drop entire tenant

Cons:

  • Connection pooling nightmare: Each query must SET search_path = client_acme
  • 1000+ schemas = PostgreSQL bloat: System catalog overhead
  • Cross-tenant analytics impossible: No global queries
  • Schema migrations: Must run 100+ times for each client
  • Limited to ~1000 schemas in production

Verdict: Only viable for < 50 clients with independent deployments.


-- Single unified schema with tenant isolation
CREATE TABLE entities (
tenant_id UUID NOT NULL,
entity_id UUID NOT NULL,
entity_ref TEXT NOT NULL,
entity_json JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
PRIMARY KEY (tenant_id, entity_id)
) PARTITION BY LIST (tenant_id);

-- Enable Row-Level Security
ALTER TABLE entities ENABLE ROW LEVEL SECURITY;

-- Policy: Users can only see their tenant's data
CREATE POLICY tenant_isolation ON entities
FOR ALL
USING (tenant_id = current_setting('app.tenant_id')::UUID);

-- Bypass RLS for admin operations
GRANT BYPASS RLS ON entities TO admin_role;

Pros:

  • Native PostgreSQL security: RLS enforced at kernel level
  • Single schema migrations: Deploy once for all tenants
  • Connection pooling friendly: Set app.tenant_id per session
  • Cross-tenant analytics: Admin role can query all data
  • Proven at scale: GitHub, Heroku use RLS for multi-tenancy
  • Horizontal scaling: Partition by tenant_id for sharding

Cons:

  • ⚠️ Must set app.tenant_id for every connection (mitigated by middleware)
  • ⚠️ RLS adds ~2-5ms overhead per query (negligible for our use case)

Verdict: Best balance of security, performance, and scalability for 100-500 clients.


Option C: Database-per-Tenant ❌

-- Separate database per client
CREATE DATABASE backstage_client_acme;
CREATE DATABASE backstage_client_widgets_inc;

\c backstage_client_acme
CREATE TABLE entities (...);

Pros:

  • Ultimate isolation (separate PostgreSQL process)
  • Tenant-level resource limits (CPU, memory, connections)
  • Independent backups and disaster recovery

Cons:

  • Connection pool explosion: 100 clients × 10 connections = 1000+ connections
  • Operational nightmare: 100 databases to monitor, backup, upgrade
  • Cross-tenant queries impossible: No global search or analytics
  • Cost: 100 databases = 100× PostgreSQL instances in managed services
  • Schema migrations: Must run 100× (with zero-downtime complexity)

Verdict: Only for SaaS with < 10 enterprise clients requiring dedicated infrastructure.


2. Backstage Catalog Schema Extension

Original Backstage Schema (Single-Tenant)

-- Backstage's default catalog schema
CREATE TABLE entities (
entity_id TEXT PRIMARY KEY,
final_entity JSONB,
hash TEXT NOT NULL,
stitch_ticket TEXT
);

CREATE TABLE relations (
originating_entity_id TEXT NOT NULL,
source_entity_ref TEXT NOT NULL,
target_entity_ref TEXT NOT NULL,
type TEXT NOT NULL
);

CREATE TABLE refresh_state (
entity_id TEXT PRIMARY KEY,
entity_ref TEXT NOT NULL,
unprocessed_entity JSONB,
processed_entity JSONB,
cache JSONB,
errors TEXT
);

Multi-Tenant Extension (RLS-Based)

-- Core entities table with tenant isolation
CREATE TABLE entities (
tenant_id UUID NOT NULL,
entity_id UUID NOT NULL,
entity_ref TEXT NOT NULL, -- e.g., "component:default/my-service"
entity_kind TEXT NOT NULL, -- component, api, system, resource
entity_namespace TEXT NOT NULL DEFAULT 'default',
entity_name TEXT NOT NULL,

-- Backstage fields
final_entity JSONB NOT NULL, -- Complete entity YAML as JSON
hash TEXT NOT NULL, -- Entity content hash for change detection
stitch_ticket TEXT, -- For entity stitching during refresh

-- Metadata
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
deleted_at TIMESTAMPTZ, -- Soft delete

PRIMARY KEY (tenant_id, entity_id),
UNIQUE (tenant_id, entity_ref)
) PARTITION BY LIST (tenant_id);

-- Entity relationships (multi-tenant)
CREATE TABLE relations (
tenant_id UUID NOT NULL,
relation_id UUID NOT NULL DEFAULT gen_random_uuid(),

originating_entity_id UUID NOT NULL,
source_entity_ref TEXT NOT NULL,
target_entity_ref TEXT NOT NULL,
type TEXT NOT NULL, -- ownedBy, dependsOn, apiProvidedBy, etc.

created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),

PRIMARY KEY (tenant_id, relation_id),
FOREIGN KEY (tenant_id, originating_entity_id)
REFERENCES entities(tenant_id, entity_id) ON DELETE CASCADE
) PARTITION BY LIST (tenant_id);

-- Refresh state for catalog ingestion
CREATE TABLE refresh_state (
tenant_id UUID NOT NULL,
entity_id UUID NOT NULL,
entity_ref TEXT NOT NULL,

unprocessed_entity JSONB, -- Raw YAML from source
processed_entity JSONB, -- After validation
cache JSONB, -- Location cache
errors TEXT, -- Ingestion errors

last_refresh TIMESTAMPTZ NOT NULL DEFAULT NOW(),
next_refresh TIMESTAMPTZ,

PRIMARY KEY (tenant_id, entity_id),
FOREIGN KEY (tenant_id, entity_id)
REFERENCES entities(tenant_id, entity_id) ON DELETE CASCADE
) PARTITION BY LIST (tenant_id);

-- Row-Level Security Policies
ALTER TABLE entities ENABLE ROW LEVEL SECURITY;
ALTER TABLE relations ENABLE ROW LEVEL SECURITY;
ALTER TABLE refresh_state ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation_entities ON entities
FOR ALL
USING (tenant_id = current_setting('app.tenant_id')::UUID);

CREATE POLICY tenant_isolation_relations ON relations
FOR ALL
USING (tenant_id = current_setting('app.tenant_id')::UUID);

CREATE POLICY tenant_isolation_refresh ON refresh_state
FOR ALL
USING (tenant_id = current_setting('app.tenant_id')::UUID);

3. Entity ID Strategy

Approach: UUID-based with Tenant Column + Backstage-Compatible Refs

-- Entity storage
INSERT INTO entities (
tenant_id,
entity_id,
entity_ref,
entity_kind,
entity_namespace,
entity_name,
final_entity,
hash
) VALUES (
'550e8400-e29b-41d4-a716-446655440000', -- tenant_id (UUID)
'c73b9df0-6c42-4f3e-9d1a-5b2e8f3a7d9b', -- entity_id (UUID)
'component:default/my-service', -- Backstage entity ref
'component',
'default',
'my-service',
'{"apiVersion": "backstage.io/v1alpha1", ...}'::jsonb,
'abc123hash'
);

Why This Strategy?

  1. UUID entity_id:

    • Globally unique, no collisions
    • Efficient indexing (16 bytes vs TEXT)
    • Native PostgreSQL support
  2. Backstage-compatible entity_ref:

    • Format: {kind}:{namespace}/{name}
    • Example: component:default/my-service
    • Preserves Backstage semantics
    • No tenant prefix in refs (cleaner UI)
  3. Composite Primary Key (tenant_id, entity_id):

    • Enforces uniqueness per tenant
    • Enables partition pruning
    • Prevents cross-tenant ID leaks
  4. Namespace Collision Prevention:

    • Tenant ID in partition key
    • UNIQUE constraint on (tenant_id, entity_ref)
    • Application-level validation

4. Performance Optimization

Index Strategy

-- Primary indexes (automatic from PRIMARY KEY)
CREATE INDEX idx_entities_pk ON entities (tenant_id, entity_id);

-- Entity lookups by ref (most common query)
CREATE INDEX idx_entities_ref ON entities (tenant_id, entity_ref)
WHERE deleted_at IS NULL;

-- Entity kind filtering (list all APIs, components, etc.)
CREATE INDEX idx_entities_kind ON entities (tenant_id, entity_kind, created_at DESC)
WHERE deleted_at IS NULL;

-- Full-text search on entity names
CREATE INDEX idx_entities_name_trgm ON entities
USING gin (entity_name gin_trgm_ops)
WHERE deleted_at IS NULL;

-- JSONB search (GIN index)
CREATE INDEX idx_entities_json ON entities
USING gin (final_entity jsonb_path_ops);

-- Relations lookup
CREATE INDEX idx_relations_source ON relations (tenant_id, source_entity_ref);
CREATE INDEX idx_relations_target ON relations (tenant_id, target_entity_ref);
CREATE INDEX idx_relations_type ON relations (tenant_id, type);

-- Refresh state queries
CREATE INDEX idx_refresh_next ON refresh_state (tenant_id, next_refresh)
WHERE next_refresh IS NOT NULL;

Partitioning Strategy

-- Create partitions for top 10 tenants (hot data)
CREATE TABLE entities_tenant_acme PARTITION OF entities
FOR VALUES IN ('550e8400-e29b-41d4-a716-446655440000');

CREATE TABLE entities_tenant_widgets PARTITION OF entities
FOR VALUES IN ('660e8400-e29b-41d4-a716-446655440001');

-- Default partition for new tenants
CREATE TABLE entities_default PARTITION OF entities DEFAULT;

-- Automatically create partitions on tenant creation
CREATE OR REPLACE FUNCTION create_tenant_partition()
RETURNS TRIGGER AS $$
BEGIN
EXECUTE format(
'CREATE TABLE IF NOT EXISTS entities_tenant_%s PARTITION OF entities FOR VALUES IN (%L)',
NEW.tenant_id,
NEW.tenant_id
);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tenant_partition_trigger
AFTER INSERT ON tenants
FOR EACH ROW
EXECUTE FUNCTION create_tenant_partition();

Query Patterns & Performance

-- Query 1: Get entity by ref (most common) - Target: < 20ms
SET app.tenant_id = '550e8400-e29b-41d4-a716-446655440000';
SELECT final_entity
FROM entities
WHERE entity_ref = 'component:default/my-service'
AND deleted_at IS NULL;
-- Index used: idx_entities_ref
-- Partition pruning: YES
-- Expected: 5-15ms

-- Query 2: List all components - Target: < 50ms
SELECT entity_ref, entity_name, created_at
FROM entities
WHERE entity_kind = 'component'
AND deleted_at IS NULL
ORDER BY created_at DESC
LIMIT 100;
-- Index used: idx_entities_kind
-- Expected: 20-40ms

-- Query 3: Search entities by name - Target: < 100ms
SELECT entity_ref, entity_name
FROM entities
WHERE entity_name ILIKE '%auth%'
AND deleted_at IS NULL
LIMIT 50;
-- Index used: idx_entities_name_trgm
-- Expected: 50-100ms

-- Query 4: Get entity with all relations - Target: < 100ms
WITH entity_data AS (
SELECT * FROM entities WHERE entity_ref = 'component:default/my-service'
)
SELECT
e.*,
json_agg(r.*) AS relations
FROM entity_data e
LEFT JOIN relations r ON r.source_entity_ref = e.entity_ref
GROUP BY e.entity_id;
-- Indexes used: idx_entities_ref, idx_relations_source
-- Expected: 30-80ms

Caching Layer

-- Redis cache strategy
Key: tenant:{tenant_id}:entity:{entity_ref}
TTL: 300 seconds (5 minutes)
Invalidation: On entity UPDATE/DELETE

-- Application-level caching
class EntityCache {
async getEntity(tenantId, entityRef) {
const cacheKey = `tenant:${tenantId}:entity:${entityRef}`;

// Try Redis first
let entity = await redis.get(cacheKey);
if (entity) return JSON.parse(entity);

// Cache miss: Query database
entity = await db.query(
'SELECT final_entity FROM entities WHERE tenant_id = $1 AND entity_ref = $2',
[tenantId, entityRef]
);

// Cache for 5 minutes
await redis.setex(cacheKey, 300, JSON.stringify(entity));
return entity;
}
}

Connection Pooling

// Per-tenant connection pooling with tenant_id session variable
const { Pool } = require('pg');

class TenantPool {
constructor() {
this.pool = new Pool({
host: 'localhost',
database: 'backstage_multi_tenant',
max: 100, // Total connections
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
});
}

async withTenant(tenantId, callback) {
const client = await this.pool.connect();
try {
// Set tenant context for RLS
await client.query('SET app.tenant_id = $1', [tenantId]);
return await callback(client);
} finally {
// Reset context before returning to pool
await client.query('RESET app.tenant_id');
client.release();
}
}
}

// Usage
const pool = new TenantPool();
const entities = await pool.withTenant(tenantId, async (client) => {
return client.query('SELECT * FROM entities WHERE entity_kind = $1', ['component']);
});

5. Scalability Projections

Scenario 1: 10 Clients, 1K Entities Each (10K Total)

  • Database: Single PostgreSQL instance
  • Hardware: 4 vCPUs, 16GB RAM, 100GB SSD
  • Performance:
    • Entity lookup: 5-10ms
    • List queries: 10-20ms
    • Search: 30-50ms
  • Cost: ~$100/month (AWS RDS db.m5.large)

Scenario 2: 100 Clients, 10K Entities Each (1M Total)

  • Database: Single PostgreSQL with read replicas
  • Hardware: 8 vCPUs, 32GB RAM, 500GB SSD
  • Partitioning: 100 tenant partitions + default
  • Performance:
    • Entity lookup: 10-20ms (partition pruning helps)
    • List queries: 30-60ms
    • Search: 60-120ms
  • Read Replicas: 2× replicas for read scaling
  • Caching: Redis cluster (3 nodes, 8GB each)
  • Cost: ~$800/month (Primary + 2 replicas + Redis)

Scenario 3: 1000 Clients, 100K Entities Each (100M Total)

  • Database: Horizontal sharding required
  • Sharding Strategy: Hash-based on tenant_id
    • Shard 1: Tenants 1-250 (25M entities)
    • Shard 2: Tenants 251-500 (25M entities)
    • Shard 3: Tenants 501-750 (25M entities)
    • Shard 4: Tenants 751-1000 (25M entities)
  • Hardware per shard: 16 vCPUs, 64GB RAM, 1TB SSD
  • Routing: Application-level shard router
  • Performance:
    • Entity lookup: 20-40ms (shard routing + query)
    • Cross-shard queries: Not supported (by design)
  • Alternative: CockroachDB (auto-sharding, no app changes)
  • Cost: ~$4,000/month (4 shards + orchestration)

Storage Estimation

Average entity size: 5KB (JSON)
Relations per entity: 5 (avg)
Relation size: 200 bytes

Per entity storage:
- Entity: 5KB
- Relations: 5 × 200 bytes = 1KB
- Refresh state: 2KB
- Indexes: 2KB (estimated overhead)
Total: ~10KB per entity

Scalability:
- 10K entities: 100MB
- 100K entities: 1GB
- 1M entities: 10GB
- 10M entities: 100GB
- 100M entities: 1TB (with indexes)

Read/Write Patterns

Typical workload:
- 90% reads (entity lookups, searches)
- 10% writes (catalog refreshes)

Peak load (100 clients):
- 1000 reads/sec
- 100 writes/sec
- Burst capacity: 5000 reads/sec during catalog refresh

Database sizing:
- Single instance handles: 10K QPS (with caching)
- Read replicas needed: At 2K+ sustained QPS
- Sharding needed: At 10K+ sustained QPS

6. Data Lifecycle Management

Entity TTL (Time-to-Live)

-- Stale entity detection (not refreshed in 30 days)
CREATE INDEX idx_entities_stale ON entities (tenant_id, updated_at)
WHERE deleted_at IS NULL;

-- Automated cleanup job (daily)
CREATE OR REPLACE FUNCTION cleanup_stale_entities()
RETURNS void AS $$
BEGIN
-- Soft delete entities not updated in 90 days
UPDATE entities
SET deleted_at = NOW()
WHERE updated_at < NOW() - INTERVAL '90 days'
AND deleted_at IS NULL;

-- Hard delete entities soft-deleted for 365 days
DELETE FROM entities
WHERE deleted_at < NOW() - INTERVAL '365 days';
END;
$$ LANGUAGE plpgsql;

-- Schedule with pg_cron
SELECT cron.schedule('cleanup-stale-entities', '0 2 * * *', 'SELECT cleanup_stale_entities()');

Soft Deletes

-- Soft delete implementation
UPDATE entities
SET deleted_at = NOW()
WHERE tenant_id = '550e8400-e29b-41d4-a716-446655440000'
AND entity_ref = 'component:default/deprecated-service';

-- Restore soft-deleted entity
UPDATE entities
SET deleted_at = NULL
WHERE tenant_id = '550e8400-e29b-41d4-a716-446655440000'
AND entity_ref = 'component:default/deprecated-service';

-- All queries filter out soft-deleted entities
SELECT * FROM entities
WHERE tenant_id = current_setting('app.tenant_id')::UUID
AND deleted_at IS NULL;

Entity Change History

-- Audit log for entity changes
CREATE TABLE entity_history (
history_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
entity_id UUID NOT NULL,
entity_ref TEXT NOT NULL,

operation TEXT NOT NULL, -- INSERT, UPDATE, DELETE
old_entity JSONB,
new_entity JSONB,
changed_by TEXT, -- User or service account
changed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),

FOREIGN KEY (tenant_id, entity_id)
REFERENCES entities(tenant_id, entity_id) ON DELETE CASCADE
) PARTITION BY RANGE (changed_at);

-- Partition by month for efficient querying
CREATE TABLE entity_history_2024_01 PARTITION OF entity_history
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Trigger to capture changes
CREATE OR REPLACE FUNCTION log_entity_changes()
RETURNS TRIGGER AS $$
BEGIN
IF TG_OP = 'INSERT' THEN
INSERT INTO entity_history (tenant_id, entity_id, entity_ref, operation, new_entity, changed_by)
VALUES (NEW.tenant_id, NEW.entity_id, NEW.entity_ref, 'INSERT', NEW.final_entity, current_user);
ELSIF TG_OP = 'UPDATE' THEN
INSERT INTO entity_history (tenant_id, entity_id, entity_ref, operation, old_entity, new_entity, changed_by)
VALUES (NEW.tenant_id, NEW.entity_id, NEW.entity_ref, 'UPDATE', OLD.final_entity, NEW.final_entity, current_user);
ELSIF TG_OP = 'DELETE' THEN
INSERT INTO entity_history (tenant_id, entity_id, entity_ref, operation, old_entity, changed_by)
VALUES (OLD.tenant_id, OLD.entity_id, OLD.entity_ref, 'DELETE', OLD.final_entity, current_user);
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER entity_changes_trigger
AFTER INSERT OR UPDATE OR DELETE ON entities
FOR EACH ROW
EXECUTE FUNCTION log_entity_changes();

Backup and Restore

-- Per-tenant backup (logical backup)
pg_dump \
--host=localhost \
--dbname=backstage_multi_tenant \
--table=entities \
--where="tenant_id = '550e8400-e29b-41d4-a716-446655440000'" \
--format=custom \
--file=tenant_acme_backup_2024_01_15.dump

-- Per-tenant restore
pg_restore \
--host=localhost \
--dbname=backstage_multi_tenant \
--table=entities \
--data-only \
tenant_acme_backup_2024_01_15.dump

-- Global backup (all tenants)
pg_dump \
--host=localhost \
--dbname=backstage_multi_tenant \
--format=custom \
--file=full_backup_2024_01_15.dump

-- Continuous archiving (PITR - Point-in-Time Recovery)
# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'cp %p /mnt/backups/wal/%f'

# Daily base backup
pg_basebackup -D /mnt/backups/base/$(date +%Y%m%d) -Ft -z -Xs

# Restore to specific timestamp
pg_restore --target-time='2024-01-15 14:30:00' /mnt/backups/base/20240115

7. Technology Trade-offs Analysis

Why PostgreSQL?

  • Native Backstage support (default database)
  • Row-Level Security (RLS) for tenant isolation
  • JSONB for entity storage (fast queries)
  • Mature replication (streaming, logical)
  • Rich indexing (GIN, GiST, BRIN, B-tree)
  • Proven at scale (Notion, Instagram, Reddit)

Limits:

  • Single-node write bottleneck (mitigated by read replicas)
  • Manual sharding required at 100M+ entities
  • Connection limit (~500 default, tune to 1000)

Best For: 10-500 clients, 1M-50M entities


MySQL (Alternative)

Why MySQL?

  • Slightly faster for simple key-value lookups
  • Galera Cluster for multi-master writes
  • InnoDB supports row-level locking

Why Not?

  • ❌ No native RLS (must implement in app layer)
  • ❌ Weaker JSON support (JSONB in Postgres is superior)
  • ❌ No LISTEN/NOTIFY for real-time updates
  • ❌ Backstage defaults to Postgres

Verdict: Only if team expertise strongly favors MySQL.


CockroachDB (Distributed, 1000+ Clients)

Why CockroachDB?

  • Automatic horizontal sharding (no app changes)
  • Built-in multi-region replication
  • PostgreSQL-compatible (Backstage works)
  • Survives node failures (no downtime)

Trade-offs:

  • 2-3× higher latency than Postgres (cross-node coordination)
  • Higher cost (~5× Postgres for same throughput)
  • Learning curve (new consistency model)

Best For: 1000+ clients, 100M+ entities, multi-region


Google Cloud Spanner (Enterprise, Global Scale)

Why Spanner?

  • Global consistency (TrueTime)
  • 99.999% SLA (5 minutes downtime/year)
  • Automatic sharding and replication
  • Managed service (zero ops)

Trade-offs:

  • 10-50ms latency (global coordination)
  • Very high cost ($1,000+/month minimum)
  • Not PostgreSQL-compatible (schema changes needed)

Best For: Global enterprises, regulated industries, unlimited budget


8. Migration Path from Single-Tenant Backstage

Phase 1: Add Tenant ID (Zero Downtime)

-- Step 1: Add tenant_id column (nullable initially)
ALTER TABLE entities ADD COLUMN tenant_id UUID;
ALTER TABLE relations ADD COLUMN tenant_id UUID;
ALTER TABLE refresh_state ADD COLUMN tenant_id UUID;

-- Step 2: Backfill tenant_id for existing data (single tenant)
UPDATE entities SET tenant_id = '00000000-0000-0000-0000-000000000001';
UPDATE relations SET tenant_id = '00000000-0000-0000-0000-000000000001';
UPDATE refresh_state SET tenant_id = '00000000-0000-0000-0000-000000000001';

-- Step 3: Make tenant_id NOT NULL
ALTER TABLE entities ALTER COLUMN tenant_id SET NOT NULL;
ALTER TABLE relations ALTER COLUMN tenant_id SET NOT NULL;
ALTER TABLE refresh_state ALTER COLUMN tenant_id SET NOT NULL;

Phase 2: Update Primary Keys

-- Step 4: Drop old primary key and create composite key
ALTER TABLE entities DROP CONSTRAINT entities_pkey;
ALTER TABLE entities ADD PRIMARY KEY (tenant_id, entity_id);

-- Step 5: Add indexes
CREATE INDEX idx_entities_ref ON entities (tenant_id, entity_ref);
CREATE INDEX idx_entities_kind ON entities (tenant_id, entity_kind);

Phase 3: Enable RLS

-- Step 6: Enable Row-Level Security
ALTER TABLE entities ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON entities
FOR ALL
USING (tenant_id = current_setting('app.tenant_id')::UUID);

-- Step 7: Grant bypass to admin
GRANT BYPASS RLS ON entities TO backstage_admin;

Phase 4: Partition by Tenant

-- Step 8: Create new partitioned table
CREATE TABLE entities_new (LIKE entities INCLUDING ALL)
PARTITION BY LIST (tenant_id);

-- Step 9: Migrate data
INSERT INTO entities_new SELECT * FROM entities;

-- Step 10: Swap tables (requires maintenance window)
BEGIN;
ALTER TABLE entities RENAME TO entities_old;
ALTER TABLE entities_new RENAME TO entities;
COMMIT;

9. Complete DDL Schema

See /docs/database/schema.sql for full implementation.


10. Recommendations Summary

For 100 Clients (Current Requirement)

  1. Use PostgreSQL with Row-Level Security
  2. Partition by tenant_id (100 partitions + default)
  3. Deploy read replicas (2×) for read scaling
  4. Add Redis caching (5-minute TTL)
  5. Index strategy: entity_ref, entity_kind, JSONB paths
  6. Monitor query performance: Target < 50ms for entity lookups

For 500+ Clients (Future Scaling)

  1. Horizontal sharding by tenant_id hash
  2. CockroachDB migration for auto-sharding
  3. Multi-region replication if global

Security Checklist

  • ✅ RLS policies on all tables
  • ✅ Tenant ID validation in middleware
  • ✅ Audit logging of entity changes
  • ✅ Encrypted connections (SSL/TLS)
  • ✅ Backup encryption at rest

Performance Targets

  • Entity lookup by ref: < 20ms (90th percentile)
  • List entities by kind: < 50ms (90th percentile)
  • Full-text search: < 100ms (90th percentile)
  • Catalog refresh (1000 entities): < 30 seconds

Appendix: Query Performance Testing

-- Performance testing script
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Run 1000 entity lookups
DO $$
DECLARE
tenant UUID := '550e8400-e29b-41d4-a716-446655440000';
i INT;
BEGIN
SET app.tenant_id = tenant;

FOR i IN 1..1000 LOOP
PERFORM * FROM entities
WHERE entity_ref = 'component:default/service-' || i::TEXT
AND deleted_at IS NULL;
END LOOP;
END $$;

-- Check performance
SELECT
calls,
mean_exec_time,
max_exec_time,
query
FROM pg_stat_statements
WHERE query LIKE '%entities%'
ORDER BY mean_exec_time DESC;