Migration Guide: Single-Tenant to Multi-Tenant Backstage
Overview
This guide provides step-by-step instructions for migrating an existing single-tenant Backstage database to our multi-tenant architecture with minimal downtime.
Prerequisites
- PostgreSQL 14+ (with uuid-ossp, pg_trgm, pg_cron extensions)
- Backstage running on default catalog schema
- Database backup completed
- Maintenance window scheduled (optional, for final cutover)
Migration Strategy
Approach: Blue-Green Deployment with zero-downtime rollback capability.
- Phase 1: Add tenant_id column (backward compatible)
- Phase 2: Backfill tenant_id for existing data
- Phase 3: Update application code to set tenant context
- Phase 4: Enable Row-Level Security (RLS)
- Phase 5: Create partitions and optimize indexes
- Phase 6: Cutover and verify
Phase 1: Add Tenant ID Column (Backward Compatible)
Duration: 5-10 minutes Downtime: None (online schema change)
-- 1. Create tenants table
CREATE TABLE tenants (
tenant_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_slug TEXT UNIQUE NOT NULL,
tenant_name TEXT NOT NULL,
max_entities INT DEFAULT 50000,
entity_ttl_days INT DEFAULT 365,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
deleted_at TIMESTAMPTZ,
CHECK (tenant_slug ~ '^[a-z0-9-]+$')
);
CREATE INDEX idx_tenants_slug ON tenants (tenant_slug) WHERE deleted_at IS NULL;
-- 2. Add tenant_id column to entities (nullable initially)
ALTER TABLE entities ADD COLUMN tenant_id UUID;
-- 3. Add tenant_id to relations
ALTER TABLE relations ADD COLUMN tenant_id UUID;
-- 4. Add tenant_id to refresh_state
ALTER TABLE refresh_state ADD COLUMN tenant_id UUID;
-- 5. Create indexes (for backfill query performance)
CREATE INDEX idx_entities_tenant_temp ON entities (tenant_id);
CREATE INDEX idx_relations_tenant_temp ON relations (tenant_id);
CREATE INDEX idx_refresh_tenant_temp ON refresh_state (tenant_id);
Verification:
-- Check columns added successfully
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'entities' AND column_name = 'tenant_id';
Phase 2: Backfill Tenant ID
Duration: 10-30 minutes (depends on data size) Downtime: None (runs in background)
-- 1. Insert default tenant (represents existing single-tenant data)
INSERT INTO tenants (tenant_id, tenant_slug, tenant_name)
VALUES ('00000000-0000-0000-0000-000000000001', 'default', 'Default Tenant');
-- 2. Backfill tenant_id for existing entities
UPDATE entities
SET tenant_id = '00000000-0000-0000-0000-000000000001'
WHERE tenant_id IS NULL;
-- 3. Backfill tenant_id for relations
UPDATE relations
SET tenant_id = '00000000-0000-0000-0000-000000000001'
WHERE tenant_id IS NULL;
-- 4. Backfill tenant_id for refresh_state
UPDATE refresh_state
SET tenant_id = '00000000-0000-0000-0000-000000000001'
WHERE tenant_id IS NULL;
-- 5. Make tenant_id NOT NULL (after backfill complete)
ALTER TABLE entities ALTER COLUMN tenant_id SET NOT NULL;
ALTER TABLE relations ALTER COLUMN tenant_id SET NOT NULL;
ALTER TABLE refresh_state ALTER COLUMN tenant_id SET NOT NULL;
-- 6. Add foreign key constraints
ALTER TABLE entities
ADD CONSTRAINT fk_entities_tenant
FOREIGN KEY (tenant_id) REFERENCES tenants(tenant_id) ON DELETE CASCADE;
ALTER TABLE relations
ADD CONSTRAINT fk_relations_tenant
FOREIGN KEY (tenant_id) REFERENCES tenants(tenant_id) ON DELETE CASCADE;
ALTER TABLE refresh_state
ADD CONSTRAINT fk_refresh_tenant
FOREIGN KEY (tenant_id) REFERENCES tenants(tenant_id) ON DELETE CASCADE;
Verification:
-- Check all entities have tenant_id
SELECT COUNT(*) AS total_entities,
COUNT(tenant_id) AS entities_with_tenant
FROM entities;
-- Should show: total_entities = entities_with_tenant
Phase 3: Update Application Code
Duration: 1-2 hours (development + testing) Downtime: None (deploy with feature flag)
Middleware: Set Tenant Context
// src/middleware/tenant-context.ts
import { Request, Response, NextFunction } from 'express';
import { Pool } from 'pg';
export interface TenantRequest extends Request {
tenantId: string;
}
export function tenantContextMiddleware(pool: Pool) {
return async (req: TenantRequest, res: Response, next: NextFunction) => {
try {
// Extract tenant from JWT, subdomain, or header
const tenantSlug = extractTenantSlug(req);
// Lookup tenant_id from slug
const result = await pool.query(
'SELECT tenant_id FROM tenants WHERE tenant_slug = $1 AND deleted_at IS NULL',
[tenantSlug]
);
if (result.rows.length === 0) {
return res.status(403).json({ error: 'Invalid tenant' });
}
req.tenantId = result.rows[0].tenant_id;
next();
} catch (error) {
next(error);
}
};
}
function extractTenantSlug(req: TenantRequest): string {
// Option 1: From subdomain (tenant.backstage.example.com)
const subdomain = req.hostname.split('.')[0];
if (subdomain && subdomain !== 'www') {
return subdomain;
}
// Option 2: From JWT claim
const token = req.headers.authorization?.split(' ')[1];
if (token) {
const decoded = jwt.verify(token, process.env.JWT_SECRET);
return decoded.tenantSlug;
}
// Option 3: From header (for API clients)
const header = req.headers['x-tenant-slug'];
if (header) {
return header as string;
}
// Fallback: default tenant (for backward compatibility)
return 'default';
}
Database Connection Pool
// src/database/tenant-pool.ts
import { Pool, PoolClient } from 'pg';
export class TenantPool {
private pool: Pool;
constructor(config: any) {
this.pool = new Pool({
host: config.host,
database: config.database,
max: 100,
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
});
}
async withTenant<T>(tenantId: string, callback: (client: PoolClient) => Promise<T>): Promise<T> {
const client = await this.pool.connect();
try {
// Set tenant context for RLS
await client.query('SET app.tenant_id = $1', [tenantId]);
// Execute callback with tenant context
const result = await callback(client);
return result;
} finally {
// Reset context before returning to pool
await client.query('RESET app.tenant_id');
client.release();
}
}
}
Update Backstage Catalog API
// src/catalog/catalog-service.ts
import { TenantPool } from '../database/tenant-pool';
export class CatalogService {
constructor(private pool: TenantPool) {}
async getEntity(tenantId: string, entityRef: string) {
return this.pool.withTenant(tenantId, async (client) => {
const result = await client.query(
'SELECT final_entity FROM entities WHERE entity_ref = $1 AND deleted_at IS NULL',
[entityRef]
);
return result.rows[0]?.final_entity;
});
}
async listEntities(tenantId: string, kind: string, limit = 100) {
return this.pool.withTenant(tenantId, async (client) => {
const result = await client.query(
'SELECT entity_ref, entity_name, created_at FROM entities ' +
'WHERE entity_kind = $1 AND deleted_at IS NULL ' +
'ORDER BY created_at DESC LIMIT $2',
[kind, limit]
);
return result.rows;
});
}
}
Deploy with Feature Flag:
// src/config.ts
export const config = {
multiTenancyEnabled: process.env.MULTI_TENANCY_ENABLED === 'true',
defaultTenantId: '00000000-0000-0000-0000-000000000001',
};
// In middleware
if (config.multiTenancyEnabled) {
app.use(tenantContextMiddleware(pool));
} else {
// Set default tenant for backward compatibility
app.use((req: TenantRequest, res, next) => {
req.tenantId = config.defaultTenantId;
next();
});
}
Phase 4: Enable Row-Level Security
Duration: 5 minutes Downtime: None (backward compatible with updated app code)
-- 1. Enable RLS on entities
ALTER TABLE entities ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation_entities ON entities
FOR ALL
USING (tenant_id = current_setting('app.tenant_id', true)::UUID);
-- 2. Enable RLS on relations
ALTER TABLE relations ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation_relations ON relations
FOR ALL
USING (tenant_id = current_setting('app.tenant_id', true)::UUID);
-- 3. Enable RLS on refresh_state
ALTER TABLE refresh_state ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation_refresh ON refresh_state
FOR ALL
USING (tenant_id = current_setting('app.tenant_id', true)::UUID);
-- 4. Create admin role (bypasses RLS)
CREATE ROLE backstage_admin;
GRANT BYPASS RLS ON entities TO backstage_admin;
GRANT BYPASS RLS ON relations TO backstage_admin;
GRANT BYPASS RLS ON refresh_state TO backstage_admin;
-- 5. Grant admin role to application user (for migrations)
GRANT backstage_admin TO backstage_app;
Verification:
-- Test RLS is working
SET app.tenant_id = '00000000-0000-0000-0000-000000000001';
SELECT COUNT(*) FROM entities; -- Should return all entities
SET app.tenant_id = '00000000-0000-0000-0000-000000000002';
SELECT COUNT(*) FROM entities; -- Should return 0 (no entities for tenant 2)
RESET app.tenant_id;
Phase 5: Update Primary Keys & Create Partitions
Duration: 30-60 minutes Downtime: Required (5-10 minutes for primary key swap)
Step 1: Create New Partitioned Table
-- 1. Create partitioned table structure
CREATE TABLE entities_new (
tenant_id UUID NOT NULL,
entity_id UUID NOT NULL DEFAULT gen_random_uuid(),
entity_ref TEXT NOT NULL,
entity_kind TEXT NOT NULL,
entity_namespace TEXT NOT NULL DEFAULT 'default',
entity_name TEXT NOT NULL,
final_entity JSONB NOT NULL,
hash TEXT NOT NULL,
stitch_ticket TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
deleted_at TIMESTAMPTZ,
PRIMARY KEY (tenant_id, entity_id),
UNIQUE (tenant_id, entity_ref),
FOREIGN KEY (tenant_id) REFERENCES tenants(tenant_id) ON DELETE CASCADE,
CHECK (entity_ref = entity_kind || ':' || entity_namespace || '/' || entity_name)
) PARTITION BY LIST (tenant_id);
-- 2. Create default partition
CREATE TABLE entities_new_default PARTITION OF entities_new DEFAULT;
-- 3. Create partition for default tenant
CREATE TABLE entities_new_default_tenant PARTITION OF entities_new
FOR VALUES IN ('00000000-0000-0000-0000-000000000001');
-- 4. Enable RLS on new table
ALTER TABLE entities_new ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation_entities_new ON entities_new
FOR ALL
USING (tenant_id = current_setting('app.tenant_id', true)::UUID);
-- 5. Migrate data (this can take time for large datasets)
INSERT INTO entities_new SELECT * FROM entities;
-- 6. Create indexes on new table
CREATE INDEX idx_entities_new_ref ON entities_new (tenant_id, entity_ref)
WHERE deleted_at IS NULL;
CREATE INDEX idx_entities_new_kind ON entities_new (tenant_id, entity_kind, created_at DESC)
WHERE deleted_at IS NULL;
CREATE INDEX idx_entities_new_name_trgm ON entities_new
USING gin (entity_name gin_trgm_ops)
WHERE deleted_at IS NULL;
CREATE INDEX idx_entities_new_json ON entities_new
USING gin (final_entity jsonb_path_ops);
Step 2: Swap Tables (Maintenance Window)
Downtime Window: 5-10 minutes
-- MAINTENANCE WINDOW STARTS
BEGIN;
-- 1. Rename old table
ALTER TABLE entities RENAME TO entities_old;
-- 2. Rename new table to production name
ALTER TABLE entities_new RENAME TO entities;
-- 3. Update foreign key constraints in relations table
ALTER TABLE relations DROP CONSTRAINT IF EXISTS fk_relations_entity;
ALTER TABLE relations
ADD CONSTRAINT fk_relations_entity
FOREIGN KEY (tenant_id, originating_entity_id)
REFERENCES entities(tenant_id, entity_id) ON DELETE CASCADE;
-- 4. Update refresh_state foreign keys
ALTER TABLE refresh_state DROP CONSTRAINT IF EXISTS fk_refresh_entity;
ALTER TABLE refresh_state
ADD CONSTRAINT fk_refresh_entity
FOREIGN KEY (tenant_id, entity_id)
REFERENCES entities(tenant_id, entity_id) ON DELETE CASCADE;
-- 5. Commit transaction (POINT OF NO RETURN)
COMMIT;
-- MAINTENANCE WINDOW ENDS
Step 3: Clean Up Old Table
-- Wait 24-48 hours, then drop old table
-- DROP TABLE entities_old;
Repeat for relations and refresh_state tables.
Phase 6: Cutover and Verification
Duration: 1 hour Downtime: None (gradual rollout)
Step 1: Enable Multi-Tenancy
# Update environment variable
export MULTI_TENANCY_ENABLED=true
# Restart application
kubectl rollout restart deployment/backstage
Step 2: Smoke Tests
# Test 1: Query entities with tenant context
curl -H "X-Tenant-Slug: default" \
https://backstage.example.com/api/catalog/entities
# Test 2: Create new tenant
curl -X POST https://backstage.example.com/api/tenants \
-H "Content-Type: application/json" \
-d '{"tenant_slug": "acme", "tenant_name": "ACME Corp"}'
# Test 3: Query as new tenant (should return 0 entities)
curl -H "X-Tenant-Slug: acme" \
https://backstage.example.com/api/catalog/entities
Step 3: Performance Benchmarks
-- Run performance tests (see query-examples.sql)
EXPLAIN (ANALYZE, BUFFERS)
SELECT final_entity
FROM entities
WHERE entity_ref = 'component:default/test-service'
AND deleted_at IS NULL;
-- Expected: < 20ms with partition pruning
Step 4: Monitor Metrics
# Check database connections
SELECT count(*), usename, application_name
FROM pg_stat_activity
GROUP BY usename, application_name;
# Check slow queries
SELECT * FROM pg_stat_statements
WHERE mean_exec_time > 100
ORDER BY mean_exec_time DESC;
# Check RLS overhead
SELECT * FROM pg_stat_user_tables WHERE tablename = 'entities';
Rollback Plan
If Issues Found (Before Primary Key Swap)
-- Simply disable feature flag
export MULTI_TENANCY_ENABLED=false
-- RLS is transparent when app.tenant_id is not set
If Issues Found (After Primary Key Swap)
Requires Maintenance Window (10-15 minutes)
BEGIN;
-- 1. Rename current table
ALTER TABLE entities RENAME TO entities_partitioned;
-- 2. Restore old table
ALTER TABLE entities_old RENAME TO entities;
-- 3. Restore foreign keys
ALTER TABLE relations DROP CONSTRAINT fk_relations_entity;
ALTER TABLE relations ADD CONSTRAINT fk_relations_entity
FOREIGN KEY (tenant_id, originating_entity_id)
REFERENCES entities(tenant_id, entity_id) ON DELETE CASCADE;
-- 4. Disable multi-tenancy
-- export MULTI_TENANCY_ENABLED=false
COMMIT;
Post-Migration Checklist
- All entities have tenant_id populated
- RLS policies active and enforced
- Indexes created and used (check EXPLAIN ANALYZE)
- Partitions created for default tenant
- Application code sets app.tenant_id correctly
- Performance benchmarks meet targets (< 50ms entity lookups)
- Backup and restore tested
- Monitoring dashboards updated
- Documentation updated
- Old tables archived/dropped (after 48 hours)
Troubleshooting
Issue: Queries slow after migration
Cause: Missing indexes or RLS overhead
Solution:
-- Check index usage
SELECT * FROM pg_stat_user_indexes WHERE tablename = 'entities';
-- Recreate missing indexes
CREATE INDEX idx_entities_ref ON entities (tenant_id, entity_ref)
WHERE deleted_at IS NULL;
Issue: "permission denied for table entities"
Cause: RLS blocking queries without tenant context
Solution:
// Ensure app.tenant_id is set in middleware
await client.query('SET app.tenant_id = $1', [tenantId]);
Issue: Foreign key constraint violations
Cause: Composite primary key (tenant_id, entity_id) not matching foreign keys
Solution:
-- Update foreign key to include tenant_id
ALTER TABLE relations DROP CONSTRAINT fk_relations_entity;
ALTER TABLE relations ADD CONSTRAINT fk_relations_entity
FOREIGN KEY (tenant_id, originating_entity_id)
REFERENCES entities(tenant_id, entity_id) ON DELETE CASCADE;
Next Steps
- Add More Tenants: Use
INSERT INTO tenantsto onboard new customers - Monitor Performance: Set up Prometheus metrics for query latency
- Optimize Queries: Use
EXPLAIN ANALYZEto tune slow queries - Scale Horizontally: Add read replicas at 2000+ QPS
- Plan Sharding: Consider sharding at 500+ tenants
Support
For migration assistance:
- Open GitHub issue: https://github.com/your-org/foundations-workspace/issues
- Contact: platform-team@example.com