Research: Terraform State to Backstage Catalog Entity Mapping
Research Date: 2025-11-15
Researcher: Claude (Research Agent)
Project: Terraform State Backstage Plugin
Related Specs: /specs/001-terraform-state-plugin/spec.md
Executive Summary
This research investigates how to map Terraform state files (v4 JSON format) to Backstage catalog entities, focusing on GCP infrastructure resources. The analysis covers:
- Terraform State Schema Structure (v4)
- Resource Dependency Representation
- Backstage Entity Kinds Mapping
- Entity Relationship Modeling
- Sensitive Data Detection & Filtering
- Resource Addressing & Unique Identification
Key Findings
- Terraform state v4 uses a flat resources[] array with instance-level dependencies[] arrays
- GCP resources map primarily to Resource kind, with System for organizational units
- Dependencies are represented via dependencies[] (Terraform addresses) → dependsOn relations
- Sensitive data is marked at output level (sensitive: true) and via naming patterns
- Resource addressing follows: <type>.<name>[<index>] or module-qualified paths
1. Terraform State Schema (v4 JSON Format)
1.1 Top-Level Structure
{
"version": 4, // State format version (current: 4)
"terraform_version": "1.5.0", // Terraform CLI version
"serial": 42, // State serial number (increments on change)
"lineage": "uuid", // Unique state file lineage identifier
"outputs": { ... }, // Terraform outputs
"resources": [ ... ] // Flat array of all resources
}
Key Characteristics:
- Flat Structure: All resources in a single resources[] array (no nesting by module)
- Version: Always check version: 4 for schema compatibility
- Serial: Increments on each state update (use for change detection)
- Lineage: Unique UUID for state file identity (same across state versions)
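The top-level checks described above can be sketched as a small parsing function. This is a minimal illustration, not the plugin's actual code: the TerraformState interface and the parseState name are assumptions introduced here, and only the fields shown in the structure above are modelled.

```typescript
// Hypothetical type: mirrors only the top-level fields listed above.
interface TerraformState {
  version: number;
  terraform_version: string;
  serial: number;
  lineage: string;
  outputs: Record<string, unknown>;
  resources: unknown[];
}

// Parse raw JSON and reject anything that is not a v4 state file.
function parseState(json: string): TerraformState {
  const raw = JSON.parse(json) as Partial<TerraformState>;
  if (raw.version !== 4) {
    throw new Error(`Unsupported Terraform state version: ${String(raw.version)}`);
  }
  if (typeof raw.serial !== 'number' || typeof raw.lineage !== 'string') {
    throw new Error('State is missing serial or lineage');
  }
  return {
    version: raw.version,
    terraform_version: raw.terraform_version ?? 'unknown',
    serial: raw.serial,
    lineage: raw.lineage,
    // Empty states may omit these; default to empty containers.
    outputs: raw.outputs ?? {},
    resources: raw.resources ?? [],
  };
}
```

Failing fast on the version keeps later mapping code from silently misreading a differently shaped file.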
1.2 Resource Object Schema
Each entry in the resources[] array has the following shape:
{
"mode": "managed", // "managed" | "data" (data sources)
"type": "google_compute_network", // Resource type (provider-specific)
"name": "main_vpc", // Resource name from Terraform config
"provider": "provider[\"registry.terraform.io/hashicorp/google\"]",
"module": "module.networking", // (Optional) Module path if in module
"instances": [ // Array for count/for_each resources
{
"schema_version": 1, // Resource schema version
"index_key": 0, // (Optional) Index for count/for_each
"attributes": { // All resource attributes
"id": "projects/my-project/global/networks/main-vpc",
"name": "main-vpc",
"auto_create_subnetworks": false,
"self_link": "https://www.googleapis.com/compute/v1/..."
},
"sensitive_attributes": [], // List of sensitive attribute paths
"private": "base64data", // Provider-specific private state
"dependencies": [ // Terraform addresses this depends on
"google_project_service.compute_api"
]
}
]
}
Critical Fields for Backstage Mapping:
- type → Entity metadata annotation (terraform.io/resource-type)
- name → Part of entity name generation
- mode → Filter out "data" sources (read-only, not managed infrastructure)
- instances[].attributes → Entity spec properties
- instances[].dependencies[] → Backstage entity relations
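As a rough sketch of how these fields might be consumed, the function below drops data sources and flattens each resource into one record per instance. The interface names and flattenManaged are hypothetical; the address format follows the conventions described in section 6.1.

```typescript
// Hypothetical types covering only the fields used for mapping.
interface StateInstance {
  index_key?: number | string;
  attributes?: Record<string, unknown>;
  dependencies?: string[];
}

interface StateResource {
  mode: 'managed' | 'data';
  type: string;
  name: string;
  module?: string;
  instances: StateInstance[];
}

interface FlatInstance {
  address: string;
  type: string;
  attributes: Record<string, unknown>;
  dependencies: string[];
}

// One record per managed instance; data sources are skipped.
function flattenManaged(resources: StateResource[]): FlatInstance[] {
  const out: FlatInstance[] = [];
  for (const r of resources) {
    if (r.mode !== 'managed') continue;
    const base = `${r.module ? r.module + '.' : ''}${r.type}.${r.name}`;
    for (const inst of r.instances) {
      const key = inst.index_key;
      // count → [0], for_each → ["key"], neither → no suffix
      const suffix =
        key === undefined ? '' : typeof key === 'number' ? `[${key}]` : `["${key}"]`;
      out.push({
        address: base + suffix,
        type: r.type,
        attributes: inst.attributes ?? {},
        dependencies: inst.dependencies ?? [],
      });
    }
  }
  return out;
}
```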
1.3 Outputs Structure
{
"outputs": {
"vpc_id": {
"value": "vpc-12345",
"type": "string",
"sensitive": false // CRITICAL: marks if output is sensitive
},
"db_password": {
"value": "s3cr3t-example", // Stored in plaintext in state (only CLI output is redacted)
"type": "string",
"sensitive": true // Flag for filtering
}
}
}
Sensitive Data Handling:
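- Detection: sensitive: true flag at output level
- Action: Exclude from Backstage entity annotations or redact value
- Note: Terraform redacts sensitive values only in CLI output; the state file itself stores them in plaintext, so filtering must happen before any value reaches the catalog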
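A minimal sketch of the output filter, assuming the outputs shape shown above. The StateOutput type and publicOutputs name are illustrative only; this version omits sensitive outputs entirely rather than emitting a redaction marker.

```typescript
// Hypothetical type for one entry of the "outputs" map.
interface StateOutput {
  value: unknown;
  type?: unknown;
  sensitive?: boolean;
}

// Keep only outputs that are not flagged sensitive.
function publicOutputs(outputs: Record<string, StateOutput>): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const name of Object.keys(outputs)) {
    const output = outputs[name];
    if (output.sensitive === true) continue; // never copy the value
    result[name] = output.value;
  }
  return result;
}
```

Dropping the key avoids leaking even the existence of a value through a placeholder; if the name itself is useful for operators, substituting '[REDACTED]' is the alternative mentioned above.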
2. Resource Dependencies in Terraform State
2.1 Dependency Representation
Dependencies are stored as Terraform addresses in instances[].dependencies[]:
{
"type": "google_compute_subnetwork",
"name": "subnet",
"instances": [
{
"dependencies": [
"google_compute_network.main_vpc", // Direct resource reference
"module.common.google_project_service.compute_api" // Module-qualified
]
}
]
}
Terraform Address Format:
- Simple: <resource_type>.<resource_name>
- With count: <resource_type>.<resource_name>[<index>]
- Module: module.<module_name>.<resource_type>.<resource_name>
- Module with count: module.<module_name>[<index>].<resource_type>.<resource_name>
2.2 Dependency Types
| Type | Example | Meaning |
|---|---|---|
| Explicit | depends_on in Terraform config | Forced dependency |
| Implicit | Reference to .id or .name | Inferred from attribute usage |
| Data Source | data.google_project.current | Read-only dependency (filter out) |
Key Insight: State only stores final dependency graph (no distinction between explicit/implicit)
2.3 Example Dependency Chain
Terraform Config:
resource "google_compute_network" "vpc" {
name = "main-vpc"
}
resource "google_compute_subnetwork" "subnet" {
name = "subnet-1"
network = google_compute_network.vpc.id # Implicit dependency
}
resource "google_compute_instance" "vm" {
name = "vm-1"
subnetwork = google_compute_subnetwork.subnet.id
depends_on = [google_project_service.compute_api] # Explicit dependency
}
Resulting State Dependencies:
// VPC (no dependencies)
{
"type": "google_compute_network",
"name": "vpc",
"instances": [{ "dependencies": [] }]
}
// Subnet (depends on VPC)
{
"type": "google_compute_subnetwork",
"name": "subnet",
"instances": [{
"dependencies": ["google_compute_network.vpc"]
}]
}
// VM (depends on subnet AND API enablement)
{
"type": "google_compute_instance",
"name": "vm",
"instances": [{
"dependencies": [
"google_compute_subnetwork.subnet",
"google_project_service.compute_api"
]
}]
}
Hierarchical Relationship:
google_compute_network.vpc
└── google_compute_subnetwork.subnet
└── google_compute_instance.vm
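One way to turn the per-resource dependency lists into a processing order is a depth-first traversal that emits each address after everything it depends on. The sketch below assumes a plain address-to-dependencies map (a hypothetical DepMap type) and is not taken from the plugin.

```typescript
// address → addresses it depends on (hypothetical helper type)
type DepMap = Record<string, string[]>;

// Post-order DFS: dependencies are emitted before their dependents.
function topoOrder(deps: DepMap): string[] {
  const visited: Record<string, boolean> = {};
  const order: string[] = [];
  const visit = (addr: string): void => {
    if (visited[addr]) return; // also guards against unexpected cycles
    visited[addr] = true;
    for (const dep of deps[addr] ?? []) visit(dep);
    order.push(addr);
  };
  for (const addr of Object.keys(deps)) visit(addr);
  return order;
}

const order = topoOrder({
  'google_compute_instance.vm': [
    'google_compute_subnetwork.subnet',
    'google_project_service.compute_api',
  ],
  'google_compute_subnetwork.subnet': ['google_compute_network.vpc'],
  'google_compute_network.vpc': [],
});
```

Processing entities in this order means a target entity name is already known by the time a dependent resource needs to reference it.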
3. Backstage Entity Kinds for Infrastructure
3.1 Entity Kind Selection Matrix
| Terraform Resource Type | Backstage Kind | Rationale |
|---|---|---|
| Organizational Units | | |
| google_project | System | High-level boundary grouping resources |
| google_folder | System | Organizational hierarchy |
| Network Infrastructure | | |
| google_compute_network | Resource | Infrastructure component |
| google_compute_subnetwork | Resource | Network subdivision |
| google_compute_firewall | Resource | Security resource |
| google_compute_router | Resource | Network routing |
| Compute Resources | | |
| google_compute_instance | Resource | Virtual machine |
| google_compute_instance_group | Resource | VM grouping |
| google_container_cluster | Resource | GKE cluster (could be System if large) |
| Storage | | |
| google_storage_bucket | Resource | Object storage |
| google_sql_database_instance | Resource | Managed database |
| IAM & Security | | |
| google_service_account | Resource | Identity resource |
| google_kms_key_ring | Resource | Encryption keys |
| APIs & Services | | |
| google_project_service | Resource | Enabled API (consider filtering) |
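The matrix reduces to a small lookup: a handful of types become System, an optional skip list covers types such as google_project_service, and everything else defaults to Resource. The names kindFor and SYSTEM_TYPES below are assumptions for illustration.

```typescript
type BackstageKind = 'Resource' | 'System';

// Types treated as organizational units in the matrix above.
const SYSTEM_TYPES: string[] = ['google_project', 'google_folder'];

// Returns undefined for types the caller chose to filter out.
function kindFor(type: string, skipTypes: string[] = []): BackstageKind | undefined {
  if (skipTypes.indexOf(type) !== -1) return undefined;
  return SYSTEM_TYPES.indexOf(type) !== -1 ? 'System' : 'Resource';
}
```

Keeping the skip list as a parameter leaves the "consider filtering" decision to configuration rather than hard-coding it.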
3.2 Entity Kind Definitions
Resource (Primary Kind for Infrastructure)
apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: vpc-main-vpc-prod
  description: Main VPC network for production environment
  annotations:
    terraform.io/resource-type: "google_compute_network"
    terraform.io/resource-address: "google_compute_network.main_vpc"
    terraform.io/state-source: "gs://my-bucket/terraform.tfstate"
    terraform.io/environment: "production"
    cloud.google.com/project-id: "my-gcp-project"
  labels:
    environment: production
    managed-by: terraform
    cloud-provider: gcp
spec:
  type: network # Resource subtype
  owner: platform-team
  system: gcp-production-system # Link to System entity
  dependsOn:
    - resource:default/gcp-project-my-project
Spec Fields:
- type: Sub-categorize resources (network, compute, storage, database, iam)
- owner: Team/group owning the resource (from Terraform tags or config)
- system: Optional link to parent System entity
- dependsOn: Relations to other resources (from Terraform dependencies)
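The Resource shape above could be assembled along the following lines. This is a sketch under assumptions: ResourceEntityInput and toResourceEntity are hypothetical names, and only the two terraform.io annotations needed to trace an entity back to state are set.

```typescript
// Hypothetical input: values already derived from one state instance.
interface ResourceEntityInput {
  name: string;
  resourceType: string;
  address: string;
  specType: string;
  owner: string;
  system?: string;
  dependsOn?: string[];
}

interface ResourceEntity {
  apiVersion: 'backstage.io/v1alpha1';
  kind: 'Resource';
  metadata: { name: string; annotations: Record<string, string> };
  spec: { type: string; owner: string; system?: string; dependsOn?: string[] };
}

function toResourceEntity(input: ResourceEntityInput): ResourceEntity {
  const entity: ResourceEntity = {
    apiVersion: 'backstage.io/v1alpha1',
    kind: 'Resource',
    metadata: {
      name: input.name,
      annotations: {
        'terraform.io/resource-type': input.resourceType,
        'terraform.io/resource-address': input.address,
      },
    },
    spec: { type: input.specType, owner: input.owner },
  };
  // Optional fields are only set when present, so the output stays minimal.
  if (input.system) entity.spec.system = input.system;
  if (input.dependsOn && input.dependsOn.length > 0) entity.spec.dependsOn = input.dependsOn;
  return entity;
}
```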
System (For Organizational Units)
apiVersion: backstage.io/v1alpha1
kind: System
metadata:
  name: gcp-production-system
  description: Production GCP project infrastructure
  annotations:
    terraform.io/state-source: "gs://prod-tfstate/terraform.tfstate"
    cloud.google.com/project-id: "my-gcp-project"
spec:
  owner: platform-team
  domain: infrastructure # Optional domain grouping
Use Cases:
- GCP Projects → System (groups all resources in project)
- GCP Folders → System (organizational hierarchy)
- Large infrastructure units (e.g., entire VPC with subnets, firewall, NAT)
API (For Exposed Services)
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: cloud-sql-instance-connection
  annotations:
    terraform.io/resource-type: "google_sql_database_instance"
    cloud.google.com/connection-name: "my-project:region:instance"
spec:
  type: database-connection # API type
  lifecycle: production
  owner: data-team
  system: gcp-production-system
  definition:
    host: 10.1.0.5
    port: 5432
    database: backstage
Use Cases:
- Cloud SQL instances with connection endpoints
- Cloud Run services with public URLs
- Load balancers with external IPs
4. Entity Relationships from Terraform Dependencies
4.1 Backstage Relation Types
| Relation Type | Direction | Terraform Mapping |
|---|---|---|
| dependsOn | Forward | Resource A depends on Resource B (A → B) |
| dependencyOf | Reverse | Inverse of dependsOn (auto-generated) |
| partOf | Containment | Resource is part of System (e.g., Subnet part of VPC) |
| hasPart | Inverse | System has Resource (auto-generated) |
4.2 Mapping Terraform Dependencies to Relations
Terraform State:
{
"type": "google_compute_subnetwork",
"name": "subnet",
"instances": [{
"dependencies": ["google_compute_network.main_vpc"]
}]
}
Backstage Entity (Subnet):
apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: subnet-subnet-1
spec:
  type: subnetwork
# relations is a top-level field populated by the catalog, not authored under spec
relations:
  - type: dependsOn # Forward dependency
    targetRef: resource:default/vpc-main-vpc
  - type: partOf # Logical containment
    targetRef: resource:default/vpc-main-vpc
Backstage Entity (VPC) - Auto-Generated Relations:
apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: vpc-main-vpc
spec:
  type: network
# relations is a top-level field populated by the catalog, not authored under spec
relations:
  - type: dependencyOf # Reverse (auto-generated)
    targetRef: resource:default/subnet-subnet-1
  - type: hasPart # Reverse containment
    targetRef: resource:default/subnet-subnet-1
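If the plugin emits relations itself rather than relying on built-in processing of spec.dependsOn, the reverse edge can be derived mechanically from the forward one. The Relation shape and invert helper below are illustrative assumptions, not Backstage types.

```typescript
type RelationType = 'dependsOn' | 'dependencyOf' | 'partOf' | 'hasPart';

// Hypothetical edge record: source --type--> target
interface Relation {
  sourceRef: string;
  type: RelationType;
  targetRef: string;
}

// Each relation type paired with its inverse.
const INVERSE: Record<RelationType, RelationType> = {
  dependsOn: 'dependencyOf',
  dependencyOf: 'dependsOn',
  partOf: 'hasPart',
  hasPart: 'partOf',
};

// Swap the endpoints and flip the relation type.
function invert(rel: Relation): Relation {
  return { sourceRef: rel.targetRef, type: INVERSE[rel.type], targetRef: rel.sourceRef };
}

const reverse = invert({
  sourceRef: 'resource:default/subnet-subnet-1',
  type: 'dependsOn',
  targetRef: 'resource:default/vpc-main-vpc',
});
```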
4.3 Relation Creation Algorithm
interface TerraformResource {
type: string;
name: string;
instances: Array<{
dependencies: string[]; // Terraform addresses
}>;
}
interface BackstageRelation {
type: 'dependsOn' | 'partOf' | 'dependencyOf' | 'hasPart';
targetRef: string; // Format: "resource:default/<entity-name>"
}
function createRelations(resource: TerraformResource): BackstageRelation[] {
const relations: BackstageRelation[] = [];
for (const instance of resource.instances) {
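for (const dep of instance.dependencies) {
// Parse Terraform address: "google_compute_network.main_vpc"
// parseTerraformAddress (section 6.2) returns an object, so destructure by field name
const { type: depType, name: depName, mode } = parseTerraformAddress(dep);
// Skip data sources (not managed infrastructure), including module-qualified ones
if (mode === 'data') continue;
// Note: generateEntityName is called below with (type, name); the fuller signature in 6.3 needs adapting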
// Create dependsOn relation
relations.push({
type: 'dependsOn',
targetRef: `resource:default/${generateEntityName(depType, depName)}`
});
// Create partOf relation for logical hierarchies
if (isLogicalChild(resource.type, depType)) {
relations.push({
type: 'partOf',
targetRef: `resource:default/${generateEntityName(depType, depName)}`
});
}
}
}
return relations;
}
function isLogicalChild(childType: string, parentType: string): boolean {
const hierarchies: Record<string, string[]> = {
'google_compute_network': [
'google_compute_subnetwork',
'google_compute_firewall',
'google_compute_router'
],
'google_compute_subnetwork': [
'google_compute_instance',
'google_compute_address'
],
'google_project': [
'google_compute_network',
'google_storage_bucket',
'google_sql_database_instance'
]
};
return hierarchies[parentType]?.includes(childType) || false;
}
4.4 Handling Circular Dependencies
Problem: Terraform prevents true circular dependencies, but state may have complex graphs.
Solution: Backstage relations are directional and non-blocking:
- Create dependsOn relations in both directions if needed
- UI graph rendering handles cycles gracefully (shows as bidirectional edges)
- No validation errors on circular relations
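Even if cycles are tolerated, it can help to detect and log them during ingestion. The following is a generic depth-first cycle finder over an address-to-dependencies map; findCycle is a hypothetical helper, not part of Backstage or Terraform.

```typescript
// Returns one cycle as a path (first node repeated at the end), or undefined.
function findCycle(deps: Record<string, string[]>): string[] | undefined {
  const state: Record<string, 'visiting' | 'done'> = {};
  const stack: string[] = [];
  const visit = (node: string): string[] | undefined => {
    if (state[node] === 'done') return undefined;
    if (state[node] === 'visiting') {
      // Node is already on the current path: slice out the loop.
      return stack.slice(stack.indexOf(node)).concat(node);
    }
    state[node] = 'visiting';
    stack.push(node);
    for (const next of deps[node] ?? []) {
      const cycle = visit(next);
      if (cycle) return cycle;
    }
    stack.pop();
    state[node] = 'done';
    return undefined;
  };
  for (const node of Object.keys(deps)) {
    const cycle = visit(node);
    if (cycle) return cycle;
  }
  return undefined;
}
```

A non-empty result is a signal to log the path for review rather than a reason to reject the state.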
5. Sensitive Data Detection & Filtering
5.1 Detection Strategies
Strategy 1: Terraform Sensitive Flag
{
"outputs": {
"db_password": {
"value": "secret123",
"sensitive": true // ✅ Explicit marker
}
},
"resources": [{
"instances": [{
"sensitive_attributes": [ // ✅ List of sensitive paths
"password",
"private_key"
]
}]
}]
}
Filtering Logic:
// Filter outputs
if (output.sensitive === true) {
return '[REDACTED]'; // Don't expose in Backstage
}
// Filter resource attributes
for (const sensitivePath of instance.sensitive_attributes) {
delete instance.attributes[sensitivePath];
}
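One caveat worth checking against real state files: recent Terraform versions appear to record each sensitive_attributes entry as an array of path steps (for example [{"type": "get_attr", "value": "root_password"}]) rather than the plain strings used in the examples here. The sketch below accepts both forms and conservatively removes the whole top-level attribute; the type and function names are assumptions.

```typescript
// A path step as it may appear in state; only get_attr steps are inspected here.
type PathStep = { type: string; value: unknown };
type SensitiveEntry = string | PathStep[];

// Top-level attribute name for each entry, without duplicates.
function topLevelSensitiveKeys(entries: SensitiveEntry[]): string[] {
  const keys: string[] = [];
  for (const entry of entries) {
    let key: unknown;
    if (typeof entry === 'string') {
      key = entry.split('.')[0];
    } else if (entry.length > 0 && entry[0].type === 'get_attr') {
      key = entry[0].value;
    }
    if (typeof key === 'string' && keys.indexOf(key) === -1) keys.push(key);
  }
  return keys;
}

// Returns a copy of attributes with every sensitive top-level key removed.
function stripSensitive(
  attributes: Record<string, unknown>,
  entries: SensitiveEntry[]
): Record<string, unknown> {
  const copy: Record<string, unknown> = { ...attributes };
  for (const key of topLevelSensitiveKeys(entries)) delete copy[key];
  return copy;
}
```

Removing the entire top-level attribute over-redacts nested structures, which seems the safer default for a catalog that many users can read.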
Strategy 2: Attribute Name Pattern Matching
From existing codebase (/feature-terraform-state-plugin/security/sensitive-data-taxonomy.md):
const SENSITIVE_PATTERNS = [
// Credentials (CRITICAL)
/private_key/i,
/password/i,
/secret/i,
/api_key/i,
/access_token/i,
/auth_token/i,
/service_account_key/i,
// Network (HIGH - configurable)
/private_ip_address/i,
/internal_ip/i,
/connection_string/i,
// Crypto (CRITICAL)
/encryption_key/i,
/master_key/i,
/kms_key/i,
// Provider-specific (CRITICAL)
/AIza[0-9A-Za-z-_]{35}/, // Google API key
/AKIA[0-9A-Z]{16}/, // AWS access key
/-----BEGIN.*PRIVATE KEY-----/ // PEM private key
];
function isSensitiveAttribute(attrName: string, attrValue: any): boolean {
return SENSITIVE_PATTERNS.some(pattern =>
pattern.test(attrName) || pattern.test(String(attrValue))
);
}
Strategy 3: Resource Type Denylist
const SENSITIVE_RESOURCE_TYPES = [
'google_service_account_key', // Always contains private keys
'google_secret_manager_secret', // Secret storage
'tls_private_key', // Crypto resources
'random_password', // Generated secrets
];
function shouldSkipResource(type: string): boolean {
return SENSITIVE_RESOURCE_TYPES.includes(type);
}
5.2 Recommended Filtering Approach
Multi-Layer Defense:
interface SanitizationResult {
sanitized: Record<string, any>;
redactedFields: string[];
warningLevel: 'none' | 'info' | 'warning' | 'critical';
}
function sanitizeResourceAttributes(
resource: TerraformResource,
instance: ResourceInstance
): SanitizationResult {
const sanitized = { ...instance.attributes };
const redacted: string[] = [];
let warningLevel: 'none' | 'info' | 'warning' | 'critical' = 'none';
// Layer 1: Terraform-marked sensitive attributes
for (const sensitivePath of instance.sensitive_attributes || []) {
delete sanitized[sensitivePath];
redacted.push(sensitivePath);
warningLevel = 'critical';
}
// Layer 2: Pattern-based detection
for (const [key, value] of Object.entries(sanitized)) {
if (isSensitiveAttribute(key, value)) {
sanitized[key] = '[REDACTED]';
redacted.push(key);
warningLevel = warningLevel === 'critical' ? 'critical' : 'warning';
}
}
// Layer 3: Private IP masking (configurable)
for (const [key, value] of Object.entries(sanitized)) {
if (typeof value === 'string' && isPrivateIP(value)) {
sanitized[key] = maskPrivateIP(value); // "10.1.2.3" → "10.x.x.x"
redacted.push(key);
warningLevel = warningLevel === 'none' ? 'info' : warningLevel;
}
}
return { sanitized, redactedFields: redacted, warningLevel };
}
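Layer 3 relies on isPrivateIP and maskPrivateIP, which are not defined in this document. A possible implementation, assuming IPv4 only and the RFC 1918 private ranges, with an optional CIDR suffix preserved:

```typescript
// Parse "a.b.c.d" or "a.b.c.d/nn" into four octets; undefined if not IPv4.
function parseIPv4(value: string): number[] | undefined {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})(\/\d{1,2})?$/.exec(value);
  if (!m) return undefined;
  const octets = [m[1], m[2], m[3], m[4]].map(Number);
  return octets.every(o => o <= 255) ? octets : undefined;
}

// RFC 1918: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
function isPrivateIP(value: string): boolean {
  const o = parseIPv4(value);
  if (!o) return false;
  return (
    o[0] === 10 ||
    (o[0] === 172 && o[1] >= 16 && o[1] <= 31) ||
    (o[0] === 192 && o[1] === 168)
  );
}

// Keep the first octet and any CIDR suffix; mask the rest.
function maskPrivateIP(value: string): string {
  const o = parseIPv4(value);
  if (!o) return value;
  const slash = value.indexOf('/');
  const suffix = slash === -1 ? '' : value.slice(slash);
  return `${o[0]}.x.x.x${suffix}`;
}
```

Values embedded in longer strings, such as URLs, are not matched by this version and would need a separate scan.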
Audit Trail:
// Log all sanitization actions for security audit
logger.info('Resource sanitization', {
resourceType: resource.type,
resourceName: resource.name,
redactedFields: result.redactedFields,
warningLevel: result.warningLevel,
timestamp: new Date().toISOString()
});
6. Resource Addressing & Unique Identification
6.1 Terraform Resource Addressing
Standard Format:
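<type>.<name>[<index>] (managed resource)
data.<type>.<name>[<index>] (data source)
Either form may be prefixed with one or more module.<module_name> segments.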
Examples:
google_compute_network.main_vpc
google_compute_subnetwork.subnet[0]
module.common.google_project_service.compute_api
module.networking[0].google_compute_firewall.allow_ssh
data.google_project.current
Components:
- mode: Not written for managed resources; data sources carry a data. prefix (the state's mode field holds "managed" or "data")
- type: Provider-specific resource type (e.g., google_compute_network)
- name: User-defined name from Terraform config
- [index]: Optional index for count or key for for_each resources
- module.*: Module path prefix (can be nested)
6.2 Parsing Terraform Addresses
interface TerraformAddress {
modulePath: string[]; // ["common", "networking"] or []
mode: 'managed' | 'data'; // Inferred from prefix
type: string; // "google_compute_network"
name: string; // "main_vpc"
index?: number | string; // 0 or "key" (optional)
}
function parseTerraformAddress(address: string): TerraformAddress {
const parts = address.split('.');
const modulePath: string[] = [];
// Extract module path
while (parts[0] === 'module') {
parts.shift(); // Remove 'module'
const moduleName = parts.shift()!;
// Handle module index: module.networking[0]
const [name, index] = parseNameAndIndex(moduleName);
modulePath.push(name);
}
// Check if data source
const mode = parts[0] === 'data' ? 'data' : 'managed';
if (mode === 'data') parts.shift();
// Extract type and name
const type = parts[0];
const nameWithIndex = parts[1];
const [name, index] = parseNameAndIndex(nameWithIndex);
return { modulePath, mode, type, name, index };
}
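function parseNameAndIndex(str: string): [string, number | string | undefined] {
  const match = str.match(/^([^\[]+)(?:\[(.+)\])?$/);
  if (!match) throw new Error(`Invalid name format: ${str}`);
  const [, name, rawIndex] = match;
  if (rawIndex === undefined) return [name, undefined];
  // for_each keys are quoted in addresses (subnet["us-central1"]); strip the quotes
  const key = rawIndex.replace(/^"(.*)"$/, '$1');
  // Only unquoted digits are count indexes; quoted keys stay strings
  return [name, /^\d+$/.test(rawIndex) ? Number(rawIndex) : key];
}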
Example Usage:
const address = "module.networking.google_compute_firewall.allow_ssh[0]";
const parsed = parseTerraformAddress(address);
console.log(parsed);
// {
// modulePath: ["networking"],
// mode: "managed",
// type: "google_compute_firewall",
// name: "allow_ssh",
// index: 0
// }
6.3 Generating Backstage Entity Names
Naming Strategy:
interface EntityNamingConfig {
includeEnvironment: boolean; // Append -prod, -nonprod
includeModule: boolean; // Append module path
includeIndex: boolean; // Append -0, -1 for indexed resources
separator: string; // Default: "-"
}
function generateEntityName(
resource: TerraformResource,
instance: ResourceInstance,
environment: string,
config: EntityNamingConfig
): string {
const parts: string[] = [];
// 1. Resource type prefix (simplified)
const typePrefix = simplifyResourceType(resource.type);
parts.push(typePrefix);
// 2. Module path (optional)
if (config.includeModule && resource.module) {
// Drop every "module." marker, including nested ones (module.a.module.b → a.b)
const modulePath = resource.module.replace(/(^|\.)module\./g, '$1');
parts.push(modulePath.replace(/\./g, '-'));
}
// 3. Resource name
parts.push(resource.name);
// 4. Instance index (optional)
if (config.includeIndex && instance.index_key !== undefined) {
parts.push(String(instance.index_key));
}
// 5. Environment suffix (optional)
if (config.includeEnvironment) {
parts.push(environment); // "prod", "nonprod"
}
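// Underscores become hyphens so that main_vpc yields "vpc-main-vpc-prod" as in the examples below
return parts.join(config.separator).toLowerCase().replace(/_/g, '-');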
}
function simplifyResourceType(type: string): string {
// Remove provider prefix: "google_compute_network" → "compute_network"
const withoutProvider = type.replace(/^(google|aws|azurerm)_/, '');
// Simplify common types
const simplifications: Record<string, string> = {
'compute_network': 'vpc',
'compute_subnetwork': 'subnet',
'compute_instance': 'vm',
'compute_firewall': 'firewall',
'storage_bucket': 'bucket',
'sql_database_instance': 'sql-instance',
'container_cluster': 'gke-cluster',
};
return simplifications[withoutProvider] || withoutProvider;
}
Example Entity Names:
// Simple resource (no module, no index)
// Type: google_compute_network, Name: main_vpc, Env: prod
→ "vpc-main-vpc-prod"
// Module resource (with module path)
// Module: module.networking, Type: google_compute_firewall, Name: allow_ssh
→ "firewall-networking-allow-ssh-prod"
// Indexed resource (with for_each)
// Type: google_compute_subnetwork, Name: subnet, Index: "us-central1"
→ "subnet-subnet-us-central1-prod"
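Generated names also have to satisfy Backstage's entity name rules: at most 63 characters, made of alphanumeric runs separated by single -, _ or . characters. A conservative normalizer, sketched here under the assumption that hyphen is the only separator the plugin emits (toValidEntityName is a hypothetical helper):

```typescript
const MAX_NAME_LENGTH = 63;

// Lowercase, collapse anything non-alphanumeric into single hyphens, trim, truncate.
function toValidEntityName(raw: string): string {
  const cleaned = raw
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
  if (cleaned.length === 0) {
    throw new Error(`Cannot derive an entity name from "${raw}"`);
  }
  // Truncation may leave a trailing hyphen, so trim again.
  return cleaned.slice(0, MAX_NAME_LENGTH).replace(/-+$/, '');
}
```

Truncation can make two long names identical, which is one more reason to pair this step with the collision handling in section 6.4.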
6.4 Handling Name Collisions
Problem: Different resources might generate same entity name.
Solution 1: Hash-Based Disambiguation
import { createHash } from 'crypto';

function ensureUniqueName(
  baseName: string,
  fullTerraformAddress: string, // e.g. "module.networking.google_compute_firewall.allow_ssh"
  existingNames: Set<string>
): string {
  if (!existingNames.has(baseName)) {
    return baseName;
  }
  // Append short hash of full Terraform address
  const hash = createHash('sha256')
    .update(fullTerraformAddress)
    .digest('hex')
    .substring(0, 8);
  return `${baseName}-${hash}`;
}
Solution 2: Incremental Suffix
function ensureUniqueName(baseName: string, existingNames: Set<string>): string {
if (!existingNames.has(baseName)) {
return baseName;
}
let counter = 1;
while (existingNames.has(`${baseName}-${counter}`)) {
counter++;
}
return `${baseName}-${counter}`;
}
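Recommendation: Use the hash-based approach. The suffix is derived from the Terraform address, so names stay reproducible across state updates as long as resources are processed in a stable order (for example, sorted by address). The incremental counter, by contrast, can shift whenever a colliding resource is added or removed.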
7. Practical Mapping Examples
7.1 Example: GCP VPC Network
Terraform State:
{
"mode": "managed",
"type": "google_compute_network",
"name": "gcp-network",
"provider": "provider[\"registry.terraform.io/hashicorp/google\"]",
"module": "module.gcp-network",
"instances": [{
"schema_version": 1,
"attributes": {
"id": "projects/my-project/global/networks/private-network",
"name": "private-network",
"auto_create_subnetworks": false,
"routing_mode": "REGIONAL",
"self_link": "https://www.googleapis.com/compute/v1/projects/my-project/global/networks/private-network",
"project": "my-project"
},
"sensitive_attributes": [],
"dependencies": []
}]
}
Backstage Entity:
apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: vpc-private-network-prod
  description: Private VPC network (private-network) managed by Terraform
  annotations:
    terraform.io/resource-type: google_compute_network
    terraform.io/resource-address: module.gcp-network.google_compute_network.gcp-network
    terraform.io/state-source: gs://my-bucket/prod/terraform.tfstate
    terraform.io/module-path: gcp-network
    cloud.google.com/project-id: my-project
    cloud.google.com/resource-id: projects/my-project/global/networks/private-network
  labels:
    environment: production
    managed-by: terraform
    cloud-provider: gcp
    resource-category: networking
  tags:
    - vpc
    - network
    - gcp
spec:
  type: network
  lifecycle: production
  owner: platform-team
  system: gcp-production-system
  # Custom properties from Terraform attributes
  definition:
    name: private-network
    routing_mode: REGIONAL
    auto_create_subnetworks: false
    self_link: https://www.googleapis.com/compute/v1/projects/my-project/global/networks/private-network
7.2 Example: GCP Subnet with Dependency
Terraform State:
{
"mode": "managed",
"type": "google_compute_subnetwork",
"name": "subnet",
"module": "module.gcp-network",
"instances": [{
"attributes": {
"id": "projects/my-project/regions/us-central1/subnetworks/shared-subnet",
"name": "shared-subnet",
"network": "projects/my-project/global/networks/private-network",
"ip_cidr_range": "10.1.0.0/28",
"region": "us-central1",
"private_ip_google_access": true
},
"dependencies": [
"module.gcp-network.google_compute_network.gcp-network"
]
}]
}
Backstage Entity:
apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: subnet-shared-subnet-prod
  description: Shared subnet (10.1.0.0/28) in us-central1
  annotations:
    terraform.io/resource-type: google_compute_subnetwork
    terraform.io/resource-address: module.gcp-network.google_compute_subnetwork.subnet
    cloud.google.com/region: us-central1
    cloud.google.com/cidr: 10.x.x.x/28 # Masked private IP
spec:
  type: subnetwork
  lifecycle: production
  owner: platform-team
  system: gcp-production-system
  dependsOn:
    - resource:default/vpc-private-network-prod # From dependencies[]
  # partOf is not a built-in Resource spec field; emitting it would need a custom processor
  partOf:
    - resource:default/vpc-private-network-prod # Logical hierarchy
  definition:
    name: shared-subnet
    region: us-central1
    ip_cidr_range: "[REDACTED]" # Or "10.x.x.x/28"
    private_ip_google_access: true
7.3 Example: Cloud SQL Instance (with Sensitive Data)
Terraform State:
{
"mode": "managed",
"type": "google_sql_database_instance",
"name": "cloud_sql_instance_backstage",
"instances": [{
"attributes": {
"id": "cloud-sql-instance-backstage",
"name": "cloud-sql-instance-backstage",
"database_version": "POSTGRES_15",
"region": "northamerica-northeast1",
"connection_name": "my-project:northamerica-northeast1:cloud-sql-instance-backstage",
"ip_address": [
{
"ip_address": "10.1.0.5",
"type": "PRIVATE"
}
],
"root_password": "super-secret-password" // ⚠️ Sensitive!
},
"sensitive_attributes": ["root_password"],
"dependencies": [
"google_service_networking_connection.private_vpc_connection"
]
}]
}
Backstage Entity (Sanitized):
apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: sql-instance-backstage-prod
  description: PostgreSQL 15 database instance for Backstage
  annotations:
    terraform.io/resource-type: google_sql_database_instance
    cloud.google.com/connection-name: my-project:northamerica-northeast1:cloud-sql-instance-backstage
    backstage.io/sanitized-fields: root_password,ip_address # Audit trail
spec:
  type: database
  lifecycle: production
  owner: data-team
  dependsOn:
    - resource:default/vpc-connection-private
  definition:
    name: cloud-sql-instance-backstage
    database_version: POSTGRES_15
    region: northamerica-northeast1
    connection_name: my-project:northamerica-northeast1:cloud-sql-instance-backstage
    ip_address: "[PRIVATE_IP]" # ✅ Redacted
    # root_password field removed entirely # ✅ Filtered
8. Recommendations & Best Practices
8.1 State Parsing Recommendations
- Always Validate State Version
if (state.version !== 4) {
throw new Error(`Unsupported Terraform state version: ${state.version}`);
}
- Filter Out Data Sources
const managedResources = state.resources.filter(r => r.mode === 'managed');
- Handle Missing Fields Gracefully
const dependencies = instance.dependencies || [];
const attributes = instance.attributes || {};
- Use Serial for Change Detection
// Compare serial numbers to detect state changes
if (newState.serial > lastProcessedSerial) {
// Process incremental update
}
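Serial numbers are only comparable within one lineage, so a change detector probably needs to look at both fields. A small sketch, with hypothetical names (StateVersion, classifyChange) and an assumed four-way outcome:

```typescript
interface StateVersion {
  lineage: string;
  serial: number;
}

type ChangeKind = 'unchanged' | 'updated' | 'replaced' | 'stale';

// Compare the last processed version with the newly fetched one.
function classifyChange(last: StateVersion | undefined, next: StateVersion): ChangeKind {
  if (!last) return 'updated'; // first time this state is seen
  if (last.lineage !== next.lineage) return 'replaced'; // different state history: full resync
  if (next.serial > last.serial) return 'updated';
  return next.serial === last.serial ? 'unchanged' : 'stale';
}
```

A 'replaced' result suggests discarding previously generated entities for that source, while 'stale' indicates an older snapshot was fetched and can be ignored.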
8.2 Entity Creation Recommendations
- Standardize Entity Naming
  - Use consistent separator (hyphen -)
  - Always lowercase
  - Include environment suffix for multi-env setups
  - Hash-based disambiguation for collisions
- Rich Annotations
annotations:
  terraform.io/resource-type: google_compute_network
  terraform.io/resource-address: module.gcp-network.google_compute_network.gcp-network
  terraform.io/state-source: gs://my-bucket/prod/terraform.tfstate
  terraform.io/state-serial: "42"
  terraform.io/module-path: gcp-network
  terraform.io/last-updated: "2025-11-15T10:30:00Z"
  cloud.google.com/project-id: my-project
  cloud.google.com/resource-id: projects/my-project/global/networks/private-network
- Preserve Original Terraform Context
  - Store full Terraform address in annotations
  - Keep module path information
  - Link to state source (GCS bucket or TFC workspace)
8.3 Relationship Mapping Recommendations
- Create Both Logical and Dependency Relations
  - dependsOn: Technical dependency (Terraform requires)
  - partOf: Logical containment (Network contains Subnet)
- Auto-Generate Reverse Relations
  - Backstage supports bidirectional relations
  - Create dependencyOf and hasPart automatically
- Handle Multi-Level Hierarchies
System: GCP Project
├── Resource: VPC Network
│ ├── Resource: Subnet 1
│ │ └── Resource: VM Instance 1
│ └── Resource: Subnet 2
└── Resource: Cloud SQL Instance
8.4 Sensitive Data Filtering Recommendations
- Multi-Layer Defense
  - Layer 1: Terraform sensitive_attributes flag
  - Layer 2: Attribute name pattern matching
  - Layer 3: Value pattern matching (API keys, IPs)
- Configurable Sensitivity Levels
enum SensitivityLevel {
CRITICAL = 'critical', // Always redact (passwords, keys)
HIGH = 'high', // Redact by default (private IPs)
MEDIUM = 'medium', // Configurable (connection strings)
LOW = 'low' // Info only (public IPs)
}
- Audit All Sanitization
  - Log every redacted field
  - Track warning levels
  - Generate security reports
8.5 GCP Resource Type Mapping
| GCP Resource | Backstage Kind | Type | Example Name |
|---|---|---|---|
| google_project | System | project | project-my-gcp-project |
| google_compute_network | Resource | network | vpc-main-network-prod |
| google_compute_subnetwork | Resource | subnetwork | subnet-shared-prod |
| google_compute_instance | Resource | compute | vm-web-server-prod |
| google_compute_firewall | Resource | firewall | fw-allow-ssh-prod |
| google_storage_bucket | Resource | storage | bucket-tfstate-prod |
| google_sql_database_instance | Resource | database | sql-backstage-prod |
| google_container_cluster | Resource/System | container-cluster | gke-production |
| google_service_account | Resource | iam | sa-backstage-app |
| google_kms_key_ring | Resource | kms | kms-vault-keyring |
9. Implementation Roadmap
Phase 1: Core Parsing (MVP)
- ✅ Parse Terraform state v4 JSON
- ✅ Extract resources, dependencies, outputs
- ✅ Filter managed resources (exclude data sources)
- ✅ Generate unique entity names
Phase 2: Entity Generation
- ✅ Create Backstage Resource entities
- ✅ Map Terraform attributes to entity spec
- ✅ Add annotations (resource type, address, state source)
- ✅ Apply environment labels
Phase 3: Relationship Mapping
- ✅ Parse Terraform addresses from dependencies
- ✅ Create dependsOn relations
- ✅ Implement logical hierarchy (partOf for networks/subnets)
- ✅ Auto-generate reverse relations
Phase 4: Sensitive Data Filtering
- ✅ Implement multi-layer sanitization
- ✅ Pattern-based detection (passwords, keys, IPs)
- ✅ Terraform sensitive_attributes filtering
- ✅ Audit logging for all redactions
Phase 5: Advanced Features
- ⬜ GCS bucket state ingestion
- ⬜ Terraform Cloud API integration
- ⬜ Incremental updates (serial-based)
- ⬜ Multi-environment support (prod/nonprod)
10. Related Documentation
- Terraform State Spec: Terraform JSON State Format
- Backstage Catalog: Software Catalog Overview
- Entity Kinds: Backstage Entity Kinds
- Project Spec: /specs/001-terraform-state-plugin/spec.md
- Sensitive Data Taxonomy: /feature-terraform-state-plugin/security/sensitive-data-taxonomy.md
- Architecture ADRs: /feature-terraform-state-plugin/architecture/adr-summary.md
Appendix A: Terraform State v4 Complete Example
File: /Users/liam.helmer/repos/badal-io/repo-devex-backstage/terraform/common/main.tf (Current Infrastructure)
State Structure (Inferred from Terraform config):
{
"version": 4,
"terraform_version": "1.5.0",
"serial": 123,
"lineage": "abc123-def456",
"outputs": {
"vault_internal_url": {
"value": "https://10.1.0.10:8200",
"type": "string",
"sensitive": false
},
"backstage_db_password_secret_id": {
"value": "backstage-db-password",
"type": "string",
"sensitive": true
}
},
"resources": [
{
"mode": "managed",
"type": "google_compute_network",
"name": "gcp-network",
"module": "module.gcp-network",
"instances": [{
"attributes": {
"id": "projects/my-project/global/networks/private-network",
"name": "private-network"
},
"dependencies": []
}]
},
{
"mode": "managed",
"type": "google_compute_subnetwork",
"name": "subnet",
"module": "module.gcp-network",
"instances": [{
"attributes": {
"name": "shared-subnet",
"network": "projects/my-project/global/networks/private-network",
"ip_cidr_range": "10.1.0.0/28"
},
"dependencies": [
"module.gcp-network.google_compute_network.gcp-network"
]
}]
},
{
"mode": "managed",
"type": "google_sql_database_instance",
"name": "cloud_sql_instance_backstage",
"instances": [{
"attributes": {
"name": "cloud-sql-instance-backstage",
"connection_name": "my-project:region:instance"
},
"sensitive_attributes": ["root_password"],
"dependencies": [
"google_service_networking_connection.private_vpc_connection"
]
}]
}
]
}
End of Research Document