Skip to main content

Migration Guide: Single-Tenant to Multi-Tenant Backstage

Overview

This guide provides step-by-step instructions for migrating an existing single-tenant Backstage database to our multi-tenant architecture with minimal downtime.

Prerequisites

  • PostgreSQL 14+ (with uuid-ossp, pg_trgm, pg_cron extensions)
  • Backstage running on default catalog schema
  • Database backup completed
  • Maintenance window scheduled (optional, for final cutover)

Migration Strategy

Approach: Blue-Green Deployment with zero-downtime rollback capability.

  1. Phase 1: Add tenant_id column (backward compatible)
  2. Phase 2: Backfill tenant_id for existing data
  3. Phase 3: Update application code to set tenant context
  4. Phase 4: Enable Row-Level Security (RLS)
  5. Phase 5: Create partitions and optimize indexes
  6. Phase 6: Cutover and verify

Phase 1: Add Tenant ID Column (Backward Compatible)

Duration: 5-10 minutes Downtime: None (online schema change)

-- 1. Create tenants table
CREATE TABLE tenants (
tenant_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_slug TEXT UNIQUE NOT NULL,
tenant_name TEXT NOT NULL,
max_entities INT DEFAULT 50000,
entity_ttl_days INT DEFAULT 365,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
deleted_at TIMESTAMPTZ,
CHECK (tenant_slug ~ '^[a-z0-9-]+$')
);

CREATE INDEX idx_tenants_slug ON tenants (tenant_slug) WHERE deleted_at IS NULL;

-- 2. Add tenant_id column to entities (nullable initially)
ALTER TABLE entities ADD COLUMN tenant_id UUID;

-- 3. Add tenant_id to relations
ALTER TABLE relations ADD COLUMN tenant_id UUID;

-- 4. Add tenant_id to refresh_state
ALTER TABLE refresh_state ADD COLUMN tenant_id UUID;

-- 5. Create indexes (for backfill query performance)
CREATE INDEX idx_entities_tenant_temp ON entities (tenant_id);
CREATE INDEX idx_relations_tenant_temp ON relations (tenant_id);
CREATE INDEX idx_refresh_tenant_temp ON refresh_state (tenant_id);

Verification:

-- Check columns added successfully
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'entities' AND column_name = 'tenant_id';

Phase 2: Backfill Tenant ID

Duration: 10-30 minutes (depends on data size) Downtime: None (runs in background)

-- 1. Insert default tenant (represents existing single-tenant data)
INSERT INTO tenants (tenant_id, tenant_slug, tenant_name)
VALUES ('00000000-0000-0000-0000-000000000001', 'default', 'Default Tenant');

-- 2. Backfill tenant_id for existing entities
UPDATE entities
SET tenant_id = '00000000-0000-0000-0000-000000000001'
WHERE tenant_id IS NULL;

-- 3. Backfill tenant_id for relations
UPDATE relations
SET tenant_id = '00000000-0000-0000-0000-000000000001'
WHERE tenant_id IS NULL;

-- 4. Backfill tenant_id for refresh_state
UPDATE refresh_state
SET tenant_id = '00000000-0000-0000-0000-000000000001'
WHERE tenant_id IS NULL;

-- 5. Make tenant_id NOT NULL (after backfill complete)
ALTER TABLE entities ALTER COLUMN tenant_id SET NOT NULL;
ALTER TABLE relations ALTER COLUMN tenant_id SET NOT NULL;
ALTER TABLE refresh_state ALTER COLUMN tenant_id SET NOT NULL;

-- 6. Add foreign key constraints
ALTER TABLE entities
ADD CONSTRAINT fk_entities_tenant
FOREIGN KEY (tenant_id) REFERENCES tenants(tenant_id) ON DELETE CASCADE;

ALTER TABLE relations
ADD CONSTRAINT fk_relations_tenant
FOREIGN KEY (tenant_id) REFERENCES tenants(tenant_id) ON DELETE CASCADE;

ALTER TABLE refresh_state
ADD CONSTRAINT fk_refresh_tenant
FOREIGN KEY (tenant_id) REFERENCES tenants(tenant_id) ON DELETE CASCADE;

Verification:

-- Check all entities have tenant_id
SELECT COUNT(*) AS total_entities,
COUNT(tenant_id) AS entities_with_tenant
FROM entities;

-- Should show: total_entities = entities_with_tenant

Phase 3: Update Application Code

Duration: 1-2 hours (development + testing) Downtime: None (deploy with feature flag)

Middleware: Set Tenant Context

// src/middleware/tenant-context.ts
import { Request, Response, NextFunction } from 'express';
import { Pool } from 'pg';

export interface TenantRequest extends Request {
tenantId: string;
}

export function tenantContextMiddleware(pool: Pool) {
return async (req: TenantRequest, res: Response, next: NextFunction) => {
try {
// Extract tenant from JWT, subdomain, or header
const tenantSlug = extractTenantSlug(req);

// Lookup tenant_id from slug
const result = await pool.query(
'SELECT tenant_id FROM tenants WHERE tenant_slug = $1 AND deleted_at IS NULL',
[tenantSlug]
);

if (result.rows.length === 0) {
return res.status(403).json({ error: 'Invalid tenant' });
}

req.tenantId = result.rows[0].tenant_id;
next();
} catch (error) {
next(error);
}
};
}

function extractTenantSlug(req: TenantRequest): string {
// Option 1: From subdomain (tenant.backstage.example.com)
const subdomain = req.hostname.split('.')[0];
if (subdomain && subdomain !== 'www') {
return subdomain;
}

// Option 2: From JWT claim
const token = req.headers.authorization?.split(' ')[1];
if (token) {
const decoded = jwt.verify(token, process.env.JWT_SECRET);
return decoded.tenantSlug;
}

// Option 3: From header (for API clients)
const header = req.headers['x-tenant-slug'];
if (header) {
return header as string;
}

// Fallback: default tenant (for backward compatibility)
return 'default';
}

Database Connection Pool

// src/database/tenant-pool.ts
import { Pool, PoolClient } from 'pg';

export class TenantPool {
private pool: Pool;

constructor(config: any) {
this.pool = new Pool({
host: config.host,
database: config.database,
max: 100,
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
});
}

async withTenant<T>(tenantId: string, callback: (client: PoolClient) => Promise<T>): Promise<T> {
const client = await this.pool.connect();
try {
// Set tenant context for RLS
await client.query('SET app.tenant_id = $1', [tenantId]);

// Execute callback with tenant context
const result = await callback(client);

return result;
} finally {
// Reset context before returning to pool
await client.query('RESET app.tenant_id');
client.release();
}
}
}

Update Backstage Catalog API

// src/catalog/catalog-service.ts
import { TenantPool } from '../database/tenant-pool';

export class CatalogService {
constructor(private pool: TenantPool) {}

async getEntity(tenantId: string, entityRef: string) {
return this.pool.withTenant(tenantId, async (client) => {
const result = await client.query(
'SELECT final_entity FROM entities WHERE entity_ref = $1 AND deleted_at IS NULL',
[entityRef]
);
return result.rows[0]?.final_entity;
});
}

async listEntities(tenantId: string, kind: string, limit = 100) {
return this.pool.withTenant(tenantId, async (client) => {
const result = await client.query(
'SELECT entity_ref, entity_name, created_at FROM entities ' +
'WHERE entity_kind = $1 AND deleted_at IS NULL ' +
'ORDER BY created_at DESC LIMIT $2',
[kind, limit]
);
return result.rows;
});
}
}

Deploy with Feature Flag:

// src/config.ts
export const config = {
multiTenancyEnabled: process.env.MULTI_TENANCY_ENABLED === 'true',
defaultTenantId: '00000000-0000-0000-0000-000000000001',
};

// In middleware
if (config.multiTenancyEnabled) {
app.use(tenantContextMiddleware(pool));
} else {
// Set default tenant for backward compatibility
app.use((req: TenantRequest, res, next) => {
req.tenantId = config.defaultTenantId;
next();
});
}

Phase 4: Enable Row-Level Security

Duration: 5 minutes Downtime: None (backward compatible with updated app code)

-- 1. Enable RLS on entities
ALTER TABLE entities ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation_entities ON entities
FOR ALL
USING (tenant_id = current_setting('app.tenant_id', true)::UUID);

-- 2. Enable RLS on relations
ALTER TABLE relations ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation_relations ON relations
FOR ALL
USING (tenant_id = current_setting('app.tenant_id', true)::UUID);

-- 3. Enable RLS on refresh_state
ALTER TABLE refresh_state ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation_refresh ON refresh_state
FOR ALL
USING (tenant_id = current_setting('app.tenant_id', true)::UUID);

-- 4. Create admin role (bypasses RLS)
CREATE ROLE backstage_admin;
GRANT BYPASS RLS ON entities TO backstage_admin;
GRANT BYPASS RLS ON relations TO backstage_admin;
GRANT BYPASS RLS ON refresh_state TO backstage_admin;

-- 5. Grant admin role to application user (for migrations)
GRANT backstage_admin TO backstage_app;

Verification:

-- Test RLS is working
SET app.tenant_id = '00000000-0000-0000-0000-000000000001';
SELECT COUNT(*) FROM entities; -- Should return all entities

SET app.tenant_id = '00000000-0000-0000-0000-000000000002';
SELECT COUNT(*) FROM entities; -- Should return 0 (no entities for tenant 2)

RESET app.tenant_id;

Phase 5: Update Primary Keys & Create Partitions

Duration: 30-60 minutes Downtime: Required (5-10 minutes for primary key swap)

Step 1: Create New Partitioned Table

-- 1. Create partitioned table structure
CREATE TABLE entities_new (
tenant_id UUID NOT NULL,
entity_id UUID NOT NULL DEFAULT gen_random_uuid(),
entity_ref TEXT NOT NULL,
entity_kind TEXT NOT NULL,
entity_namespace TEXT NOT NULL DEFAULT 'default',
entity_name TEXT NOT NULL,
final_entity JSONB NOT NULL,
hash TEXT NOT NULL,
stitch_ticket TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
deleted_at TIMESTAMPTZ,
PRIMARY KEY (tenant_id, entity_id),
UNIQUE (tenant_id, entity_ref),
FOREIGN KEY (tenant_id) REFERENCES tenants(tenant_id) ON DELETE CASCADE,
CHECK (entity_ref = entity_kind || ':' || entity_namespace || '/' || entity_name)
) PARTITION BY LIST (tenant_id);

-- 2. Create default partition
CREATE TABLE entities_new_default PARTITION OF entities_new DEFAULT;

-- 3. Create partition for default tenant
CREATE TABLE entities_new_default_tenant PARTITION OF entities_new
FOR VALUES IN ('00000000-0000-0000-0000-000000000001');

-- 4. Enable RLS on new table
ALTER TABLE entities_new ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation_entities_new ON entities_new
FOR ALL
USING (tenant_id = current_setting('app.tenant_id', true)::UUID);

-- 5. Migrate data (this can take time for large datasets)
INSERT INTO entities_new SELECT * FROM entities;

-- 6. Create indexes on new table
CREATE INDEX idx_entities_new_ref ON entities_new (tenant_id, entity_ref)
WHERE deleted_at IS NULL;

CREATE INDEX idx_entities_new_kind ON entities_new (tenant_id, entity_kind, created_at DESC)
WHERE deleted_at IS NULL;

CREATE INDEX idx_entities_new_name_trgm ON entities_new
USING gin (entity_name gin_trgm_ops)
WHERE deleted_at IS NULL;

CREATE INDEX idx_entities_new_json ON entities_new
USING gin (final_entity jsonb_path_ops);

Step 2: Swap Tables (Maintenance Window)

Downtime Window: 5-10 minutes

-- MAINTENANCE WINDOW STARTS

BEGIN;

-- 1. Rename old table
ALTER TABLE entities RENAME TO entities_old;

-- 2. Rename new table to production name
ALTER TABLE entities_new RENAME TO entities;

-- 3. Update foreign key constraints in relations table
ALTER TABLE relations DROP CONSTRAINT IF EXISTS fk_relations_entity;
ALTER TABLE relations
ADD CONSTRAINT fk_relations_entity
FOREIGN KEY (tenant_id, originating_entity_id)
REFERENCES entities(tenant_id, entity_id) ON DELETE CASCADE;

-- 4. Update refresh_state foreign keys
ALTER TABLE refresh_state DROP CONSTRAINT IF EXISTS fk_refresh_entity;
ALTER TABLE refresh_state
ADD CONSTRAINT fk_refresh_entity
FOREIGN KEY (tenant_id, entity_id)
REFERENCES entities(tenant_id, entity_id) ON DELETE CASCADE;

-- 5. Commit transaction (POINT OF NO RETURN)
COMMIT;

-- MAINTENANCE WINDOW ENDS

Step 3: Clean Up Old Table

-- Wait 24-48 hours, then drop old table
-- DROP TABLE entities_old;

Repeat for relations and refresh_state tables.


Phase 6: Cutover and Verification

Duration: 1 hour Downtime: None (gradual rollout)

Step 1: Enable Multi-Tenancy

# Update environment variable
export MULTI_TENANCY_ENABLED=true

# Restart application
kubectl rollout restart deployment/backstage

Step 2: Smoke Tests

# Test 1: Query entities with tenant context
curl -H "X-Tenant-Slug: default" \
https://backstage.example.com/api/catalog/entities

# Test 2: Create new tenant
curl -X POST https://backstage.example.com/api/tenants \
-H "Content-Type: application/json" \
-d '{"tenant_slug": "acme", "tenant_name": "ACME Corp"}'

# Test 3: Query as new tenant (should return 0 entities)
curl -H "X-Tenant-Slug: acme" \
https://backstage.example.com/api/catalog/entities

Step 3: Performance Benchmarks

-- Run performance tests (see query-examples.sql)
EXPLAIN (ANALYZE, BUFFERS)
SELECT final_entity
FROM entities
WHERE entity_ref = 'component:default/test-service'
AND deleted_at IS NULL;

-- Expected: < 20ms with partition pruning

Step 4: Monitor Metrics

# Check database connections
SELECT count(*), usename, application_name
FROM pg_stat_activity
GROUP BY usename, application_name;

# Check slow queries
SELECT * FROM pg_stat_statements
WHERE mean_exec_time > 100
ORDER BY mean_exec_time DESC;

# Check RLS overhead
SELECT * FROM pg_stat_user_tables WHERE tablename = 'entities';

Rollback Plan

If Issues Found (Before Primary Key Swap)

-- Simply disable feature flag
export MULTI_TENANCY_ENABLED=false

-- RLS is transparent when app.tenant_id is not set

If Issues Found (After Primary Key Swap)

Requires Maintenance Window (10-15 minutes)

BEGIN;

-- 1. Rename current table
ALTER TABLE entities RENAME TO entities_partitioned;

-- 2. Restore old table
ALTER TABLE entities_old RENAME TO entities;

-- 3. Restore foreign keys
ALTER TABLE relations DROP CONSTRAINT fk_relations_entity;
ALTER TABLE relations ADD CONSTRAINT fk_relations_entity
FOREIGN KEY (tenant_id, originating_entity_id)
REFERENCES entities(tenant_id, entity_id) ON DELETE CASCADE;

-- 4. Disable multi-tenancy
-- export MULTI_TENANCY_ENABLED=false

COMMIT;

Post-Migration Checklist

  • All entities have tenant_id populated
  • RLS policies active and enforced
  • Indexes created and used (check EXPLAIN ANALYZE)
  • Partitions created for default tenant
  • Application code sets app.tenant_id correctly
  • Performance benchmarks meet targets (< 50ms entity lookups)
  • Backup and restore tested
  • Monitoring dashboards updated
  • Documentation updated
  • Old tables archived/dropped (after 48 hours)

Troubleshooting

Issue: Queries slow after migration

Cause: Missing indexes or RLS overhead

Solution:

-- Check index usage
SELECT * FROM pg_stat_user_indexes WHERE tablename = 'entities';

-- Recreate missing indexes
CREATE INDEX idx_entities_ref ON entities (tenant_id, entity_ref)
WHERE deleted_at IS NULL;

Issue: "permission denied for table entities"

Cause: RLS blocking queries without tenant context

Solution:

// Ensure app.tenant_id is set in middleware
await client.query('SET app.tenant_id = $1', [tenantId]);

Issue: Foreign key constraint violations

Cause: Composite primary key (tenant_id, entity_id) not matching foreign keys

Solution:

-- Update foreign key to include tenant_id
ALTER TABLE relations DROP CONSTRAINT fk_relations_entity;
ALTER TABLE relations ADD CONSTRAINT fk_relations_entity
FOREIGN KEY (tenant_id, originating_entity_id)
REFERENCES entities(tenant_id, entity_id) ON DELETE CASCADE;

Next Steps

  1. Add More Tenants: Use INSERT INTO tenants to onboard new customers
  2. Monitor Performance: Set up Prometheus metrics for query latency
  3. Optimize Queries: Use EXPLAIN ANALYZE to tune slow queries
  4. Scale Horizontally: Add read replicas at 2000+ QPS
  5. Plan Sharding: Consider sharding at 500+ tenants

Support

For migration assistance: