
Research: Terraform State to Backstage Catalog Entity Mapping

Research Date: 2025-11-15
Researcher: Claude (Research Agent)
Project: Terraform State Backstage Plugin
Related Specs: /specs/001-terraform-state-plugin/spec.md


Executive Summary

This research investigates how to map Terraform state files (v4 JSON format) to Backstage catalog entities, focusing on GCP infrastructure resources. The analysis covers:

  1. Terraform State Schema Structure (v4)
  2. Resource Dependency Representation
  3. Backstage Entity Kinds Mapping
  4. Entity Relationship Modeling
  5. Sensitive Data Detection & Filtering
  6. Resource Addressing & Unique Identification

Key Findings

  • Terraform state v4 uses a flat resources[] array with instance-level dependencies[] arrays
  • GCP resources map primarily to Resource kind, with System for organizational units
  • Dependencies are represented via dependencies[] (Terraform addresses) → dependsOn relations
  • Terraform flags sensitive data at the output level (sensitive: true) and per instance (sensitive_attributes); attribute-name and value pattern matching catch what those flags miss
  • Resource addressing follows: <type>.<name>[<index>] or module-qualified paths

1. Terraform State Schema (v4 JSON Format)

1.1 Top-Level Structure

{
"version": 4, // State format version (current: 4)
"terraform_version": "1.5.0", // Terraform CLI version
"serial": 42, // State serial number (increments on change)
"lineage": "uuid", // Unique state file lineage identifier
"outputs": { ... }, // Terraform outputs
"resources": [ ... ] // Flat array of all resources
}

Key Characteristics:

  • Flat Structure: All resources in single resources[] array (no nesting by module)
  • Version: Always check version: 4 for schema compatibility
  • Serial: Increments on each state update (use for change detection)
  • Lineage: Unique UUID for state file identity (same across state versions)
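A minimal TypeScript sketch of reading these top-level fields and rejecting anything other than version 4 before mapping begins. `TerraformStateV4` and `parseStateV4` are hypothetical names, and the typing covers only the fields listed above, not the full state format.

```typescript
// Hypothetical partial typing of the v4 top-level fields described above.
interface TerraformStateV4 {
  version: 4;
  terraform_version: string;
  serial: number;
  lineage: string;
  outputs: Record<string, { value: unknown; type?: unknown; sensitive?: boolean }>;
  resources: unknown[];
}

// Parse raw state JSON and check the fields that later mapping steps rely on.
function parseStateV4(raw: string): TerraformStateV4 {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== 'object' || parsed === null) {
    throw new Error('Terraform state is not a JSON object');
  }
  const { version, terraform_version, serial, lineage, outputs, resources } =
    parsed as Record<string, unknown>;

  if (version !== 4) {
    throw new Error(`Unsupported Terraform state version: ${String(version)}`);
  }
  if (typeof serial !== 'number' || typeof lineage !== 'string') {
    throw new Error('Terraform state is missing serial or lineage');
  }

  return {
    version: 4,
    terraform_version: String(terraform_version ?? ''),
    serial,
    lineage,
    // Treat absent outputs/resources as empty rather than failing
    outputs: (outputs ?? {}) as TerraformStateV4['outputs'],
    resources: Array.isArray(resources) ? resources : [],
  };
}
```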

1.2 Resource Object Schema

Each resource in resources[] array:

{
"mode": "managed", // "managed" | "data" (data sources)
"type": "google_compute_network", // Resource type (provider-specific)
"name": "main_vpc", // Resource name from Terraform config
"provider": "provider[\"registry.terraform.io/hashicorp/google\"]",
"module": "module.networking", // (Optional) Module path if in module
"instances": [ // Array for count/for_each resources
{
"schema_version": 1, // Resource schema version
"index_key": 0, // (Optional) Index for count/for_each
"attributes": { // All resource attributes
"id": "projects/my-project/global/networks/main-vpc",
"name": "main-vpc",
"auto_create_subnetworks": false,
"self_link": "https://www.googleapis.com/compute/v1/..."
},
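"sensitive_attributes": [], // Sensitive attribute paths; each path is an array of steps such as [{"type": "get_attr", "value": "password"}] (some later examples abbreviate paths to plain attribute names)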
"private": "base64data", // Provider-specific private state
"dependencies": [ // Terraform addresses this depends on
"google_project_service.compute_api"
]
}
]
}

Critical Fields for Backstage Mapping:

  • type → Entity metadata annotation (terraform.io/resource-type)
  • name → Part of entity name generation
  • mode → Filter out "data" sources (read-only, not managed infrastructure)
  • instances[].attributes → Entity spec properties
  • instances[].dependencies[] → Backstage entity relations
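The critical fields above can be pulled out of `resources[]` in one pass. This sketch (hypothetical `StateResource`, `FlatInstance` and `flattenManagedInstances`) drops data sources and builds one address per instance, rendering `index_key` as `[0]` for count and `["key"]` for for_each, following Terraform's address syntax.

```typescript
// Hypothetical minimal shapes for the resource fields listed above.
interface StateInstance {
  index_key?: number | string;
  attributes?: Record<string, unknown>;
  dependencies?: string[];
}

interface StateResource {
  mode: 'managed' | 'data';
  type: string;
  name: string;
  module?: string; // module instance path as stored in state, e.g. "module.networking"
  instances: StateInstance[];
}

interface FlatInstance {
  address: string; // instance address, e.g. google_compute_network.vpc[0]
  type: string;
  attributes: Record<string, unknown>;
  dependencies: string[];
}

// Render an index_key the way Terraform addresses do: [0] for count, ["key"] for for_each.
function formatIndexKey(key: number | string | undefined): string {
  if (key === undefined) return '';
  return typeof key === 'number' ? `[${key}]` : `[${JSON.stringify(key)}]`;
}

function flattenManagedInstances(resources: StateResource[]): FlatInstance[] {
  const result: FlatInstance[] = [];
  for (const r of resources) {
    if (r.mode !== 'managed') continue; // data sources are read-only, not managed infrastructure
    const prefix = r.module ? `${r.module}.` : '';
    for (const inst of r.instances) {
      result.push({
        address: `${prefix}${r.type}.${r.name}${formatIndexKey(inst.index_key)}`,
        type: r.type,
        attributes: inst.attributes ?? {},
        dependencies: inst.dependencies ?? [],
      });
    }
  }
  return result;
}
```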

1.3 Outputs Structure

{
"outputs": {
"vpc_id": {
"value": "vpc-12345",
"type": "string",
"sensitive": false // CRITICAL: marks if output is sensitive
},
"db_password": {
"value": "s3cr3t-password", // Stored in plaintext in the state file
"type": "string",
"sensitive": true // Flag for filtering
}
}
}

Sensitive Data Handling:

  • Detection: sensitive: true flag at output level
  • Action: Exclude from Backstage entity annotations or redact value
  • Note: Terraform writes sensitive values to the state file in plaintext; the flag only hides them in CLI output, so redaction is the plugin's job
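Because the state file holds sensitive output values in plaintext, redaction has to happen in the plugin. A minimal sketch, assuming outputs are passed on as a name → value map; `StateOutput` and `sanitizeOutputs` are hypothetical names.

```typescript
// Hypothetical shape of one entry under "outputs".
interface StateOutput {
  value: unknown;
  type?: unknown;
  sensitive?: boolean; // absent is treated as false
}

// Replace the value of every output flagged sensitive; pass the rest through unchanged.
function sanitizeOutputs(outputs: Record<string, StateOutput>): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const name of Object.keys(outputs)) {
    const output = outputs[name];
    result[name] = output.sensitive === true ? '[REDACTED]' : output.value;
  }
  return result;
}
```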

2. Resource Dependencies in Terraform State

2.1 Dependency Representation

Dependencies are stored as Terraform addresses in instances[].dependencies[]:

{
"type": "google_compute_subnetwork",
"name": "subnet",
"instances": [
{
"dependencies": [
"google_compute_network.main_vpc", // Direct resource reference
"module.common.google_project_service.compute_api" // Module-qualified
]
}
]
}

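Terraform Address Format:

  • Simple: <resource_type>.<resource_name>
  • With count: <resource_type>.<resource_name>[<index>]
  • With for_each: <resource_type>.<resource_name>["<key>"]
  • Module: module.<module_name>.<resource_type>.<resource_name>
  • Module with count: module.<module_name>[<index>].<resource_type>.<resource_name>
  • Data source: data.<resource_type>.<resource_name> (after any module prefix)

Entries in dependencies[] are resource-level addresses: they omit instance keys on both the resource and its module path, so a dependency on a counted resource refers to all of its instances.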
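To resolve a `dependencies[]` entry back to the resources it names, one option is to index every resource object by its resource-level address. The sketch assumes dependency entries are resource-level addresses without instance keys (worth confirming against your own state files), so it strips keys from the module path stored in state; several module instances can then share one address. All names here are hypothetical.

```typescript
// Hypothetical minimal resource shape.
interface AddressableResource {
  mode: 'managed' | 'data';
  type: string;
  name: string;
  module?: string; // module instance path as stored in state, e.g. "module.networking[0]"
}

// Resource-level address: module instance keys removed, "data." prefix for data sources.
function resourceLevelAddress(r: AddressableResource): string {
  const modulePath = r.module ? r.module.replace(/\[[^\]]*\]/g, '') + '.' : '';
  const modePrefix = r.mode === 'data' ? 'data.' : '';
  return `${modulePath}${modePrefix}${r.type}.${r.name}`;
}

// Group resources by resource-level address so a dependency entry can be looked up directly.
// A list is kept per address because each module instance is a separate resource object.
function indexByAddress<T extends AddressableResource>(resources: T[]): Map<string, T[]> {
  const index = new Map<string, T[]>();
  for (const r of resources) {
    const key = resourceLevelAddress(r);
    const bucket = index.get(key) ?? [];
    bucket.push(r);
    index.set(key, bucket);
  }
  return index;
}
```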

2.2 Dependency Types

| Type | Example | Meaning |
|---|---|---|
| Explicit | depends_on in Terraform config | Forced dependency |
| Implicit | Reference to .id or .name | Inferred from attribute usage |
| Data Source | data.google_project.current | Read-only dependency (filter out) |

Key Insight: State stores only the resulting dependency list for each instance; it does not record whether a dependency was declared explicitly or inferred from a reference.

2.3 Example Dependency Chain

Terraform Config:

# Required arguments that do not affect dependencies are omitted
resource "google_compute_network" "vpc" {
  name = "main-vpc"
}

resource "google_compute_subnetwork" "subnet" {
  name    = "subnet-1"
  network = google_compute_network.vpc.id # Implicit dependency
}

resource "google_compute_instance" "vm" {
  name = "vm-1"

  network_interface {
    subnetwork = google_compute_subnetwork.subnet.id # Implicit dependency
  }

  depends_on = [google_project_service.compute_api] # Explicit dependency
}

Resulting State Dependencies:

// VPC (no dependencies)
{
"type": "google_compute_network",
"name": "vpc",
"instances": [{ "dependencies": [] }]
}

// Subnet (depends on VPC)
{
"type": "google_compute_subnetwork",
"name": "subnet",
"instances": [{
"dependencies": ["google_compute_network.vpc"]
}]
}

// VM (depends on subnet AND API enablement)
{
"type": "google_compute_instance",
"name": "vm",
"instances": [{
"dependencies": [
"google_compute_subnetwork.subnet",
"google_project_service.compute_api"
]
}]
}

Hierarchical Relationship:

google_compute_network.vpc
└── google_compute_subnetwork.subnet
    └── google_compute_instance.vm
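The hierarchy above is the dependency graph read in order. A sketch that recovers that order from `dependencies[]` with Kahn's topological sort; `topologicalOrder` is a hypothetical helper, and dependencies pointing outside the input set (such as `google_project_service.compute_api` here, or data sources) are ignored rather than treated as errors.

```typescript
// Order addresses so every resource comes after the resources it depends on (Kahn's algorithm).
// deps maps each address to the addresses listed in its dependencies[].
function topologicalOrder(deps: Map<string, string[]>): string[] {
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const node of deps.keys()) {
    inDegree.set(node, 0);
    dependents.set(node, []);
  }
  for (const [node, targets] of deps) {
    for (const target of targets) {
      if (!deps.has(target)) continue; // outside this state: not part of the ordering
      inDegree.set(node, (inDegree.get(node) ?? 0) + 1);
      dependents.get(target)!.push(node);
    }
  }

  const queue = [...deps.keys()].filter((n) => inDegree.get(n) === 0);
  const order: string[] = [];
  while (queue.length > 0) {
    const node = queue.shift()!;
    order.push(node);
    for (const dependent of dependents.get(node) ?? []) {
      const remaining = (inDegree.get(dependent) ?? 0) - 1;
      inDegree.set(dependent, remaining);
      if (remaining === 0) queue.push(dependent);
    }
  }

  // Any node left unprocessed sits on a cycle
  if (order.length !== deps.size) throw new Error('Dependency cycle detected');
  return order;
}
```

For the three resources in Section 2.3 this yields the network, then the subnet, then the VM.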

3. Backstage Entity Kinds for Infrastructure

3.1 Entity Kind Selection Matrix

| Category | Terraform Resource Type | Backstage Kind | Rationale |
|---|---|---|---|
| Organizational Units | google_project | System | High-level boundary grouping resources |
| Organizational Units | google_folder | System | Organizational hierarchy |
| Network Infrastructure | google_compute_network | Resource | Infrastructure component |
| Network Infrastructure | google_compute_subnetwork | Resource | Network subdivision |
| Network Infrastructure | google_compute_firewall | Resource | Security resource |
| Network Infrastructure | google_compute_router | Resource | Network routing |
| Compute Resources | google_compute_instance | Resource | Virtual machine |
| Compute Resources | google_compute_instance_group | Resource | VM grouping |
| Compute Resources | google_container_cluster | Resource | GKE cluster (could be System if large) |
| Storage | google_storage_bucket | Resource | Object storage |
| Storage | google_sql_database_instance | Resource | Managed database |
| IAM & Security | google_service_account | Resource | Identity resource |
| IAM & Security | google_kms_key_ring | Resource | Encryption keys |
| APIs & Services | google_project_service | Resource | Enabled API (consider filtering) |
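The matrix above reduces to a small lookup. This sketch assumes the table's defaults: projects and folders become System, GKE clusters stay Resource, and every other type falls back to Resource. It skips google_project_service, but the table only says to consider filtering it, so treat that as a configurable choice. The function and set names are hypothetical.

```typescript
type CatalogKind = 'System' | 'Resource';

// Types the matrix maps to System; everything else defaults to Resource.
const SYSTEM_TYPES = new Set<string>(['google_project', 'google_folder']);

// Types treated as noise; undefined from kindForTerraformType means "emit no entity".
const SKIPPED_TYPES = new Set<string>(['google_project_service']);

function kindForTerraformType(type: string): CatalogKind | undefined {
  if (SKIPPED_TYPES.has(type)) return undefined;
  return SYSTEM_TYPES.has(type) ? 'System' : 'Resource';
}
```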

3.2 Entity Kind Definitions

Resource (Primary Kind for Infrastructure)

apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: vpc-main-vpc-prod
  description: Main VPC network for production environment
  annotations:
    terraform.io/resource-type: "google_compute_network"
    terraform.io/resource-address: "google_compute_network.main_vpc"
    terraform.io/state-source: "gs://my-bucket/terraform.tfstate"
    terraform.io/environment: "production"
    cloud.google.com/project-id: "my-gcp-project"
  labels:
    environment: production
    managed-by: terraform
    cloud-provider: gcp
spec:
  type: network # Resource subtype
  owner: platform-team
  system: gcp-production-system # Link to System entity
  dependsOn:
    - resource:default/gcp-project-my-project

Spec Fields:

  • type: Sub-categorize resources (network, compute, storage, database, iam)
  • owner: Team/group owning the resource (e.g. derived from GCP resource labels or plugin configuration)
  • system: Optional link to parent System entity
  • dependsOn: Relations to other resources (from Terraform dependencies)

System (For Organizational Units)

apiVersion: backstage.io/v1alpha1
kind: System
metadata:
  name: gcp-production-system
  description: Production GCP project infrastructure
  annotations:
    terraform.io/state-source: "gs://prod-tfstate/terraform.tfstate"
    cloud.google.com/project-id: "my-gcp-project"
spec:
  owner: platform-team
  domain: infrastructure # Optional domain grouping

Use Cases:

  • GCP Projects → System (groups all resources in project)
  • GCP Folders → System (organizational hierarchy)
  • Large infrastructure units (e.g., entire VPC with subnets, firewall, NAT)

API (For Exposed Services)

apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: cloud-sql-instance-connection
  annotations:
    terraform.io/resource-type: "google_sql_database_instance"
    cloud.google.com/connection-name: "my-project:region:instance"
spec:
  type: database-connection # API type (custom value)
  lifecycle: production
  owner: data-team
  system: gcp-production-system
  # spec.definition must be a string, so the connection details are a text block
  definition: |
    host: 10.1.0.5
    port: 5432
    database: backstage

Use Cases:

  • Cloud SQL instances with connection endpoints
  • Cloud Run services with public URLs
  • Load balancers with external IPs

4. Entity Relationships from Terraform Dependencies

4.1 Backstage Relation Types

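| Relation Type | Direction | Terraform Mapping |
|---|---|---|
| dependsOn | Forward | Resource A depends on Resource B (A → B) |
| dependencyOf | Reverse | Inverse of dependsOn (auto-generated) |
| partOf | Containment | Resource is part of a System (via spec.system); Resource-to-Resource containment such as Subnet part of VPC requires a custom processor |
| hasPart | Inverse | System has Resource (auto-generated) |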

4.2 Mapping Terraform Dependencies to Relations

Terraform State:

{
"type": "google_compute_subnetwork",
"name": "subnet",
"instances": [{
"dependencies": ["google_compute_network.main_vpc"]
}]
}

Backstage Entity (Subnet):

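apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: subnet-subnet-1
spec:
  type: subnetwork
  owner: platform-team
  dependsOn:
    - resource:default/vpc-main-vpc
# relations is a top-level field computed by the catalog during processing;
# it is shown as the catalog returns it, not as authored input.
relations:
  - type: dependsOn # Forward dependency (from spec.dependsOn)
    targetRef: resource:default/vpc-main-vpc
  - type: partOf # Logical containment (emitted by a custom processor)
    targetRef: resource:default/vpc-main-vpc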

Backstage Entity (VPC) - Auto-Generated Relations:

apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: vpc-main-vpc
spec:
  type: network
# Computed by the catalog (top-level relations field), not authored
relations:
  - type: dependencyOf # Reverse (auto-generated)
    targetRef: resource:default/subnet-subnet-1
  - type: hasPart # Reverse containment
    targetRef: resource:default/subnet-subnet-1
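The reverse relations on the VPC are mirror images of the subnet's forward ones. The catalog's built-in processor already emits both directions for relations it derives from spec fields such as `spec.dependsOn`; a custom processor emitting its own relations would have to pair them itself. A minimal sketch of that pairing, using a hypothetical `Relation` shape rather than Backstage's own types:

```typescript
type RelationType = 'dependsOn' | 'dependencyOf' | 'partOf' | 'hasPart';

// Hypothetical flattened relation: source and target are entity refs.
interface Relation {
  source: string;
  type: RelationType;
  target: string;
}

const INVERSE: Record<RelationType, RelationType> = {
  dependsOn: 'dependencyOf',
  dependencyOf: 'dependsOn',
  partOf: 'hasPart',
  hasPart: 'partOf',
};

// Return every relation plus its reverse edge, without duplicates.
function withInverses(relations: Relation[]): Relation[] {
  const seen = new Set<string>();
  const result: Relation[] = [];
  const add = (r: Relation): void => {
    const key = `${r.source}|${r.type}|${r.target}`;
    if (!seen.has(key)) {
      seen.add(key);
      result.push(r);
    }
  };
  for (const r of relations) {
    add(r);
    add({ source: r.target, type: INVERSE[r.type], target: r.source });
  }
  return result;
}
```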

4.3 Relation Creation Algorithm

interface TerraformResource {
  type: string;
  name: string;
  module?: string; // e.g. "module.networking"
  instances: Array<{
    dependencies?: string[]; // Terraform addresses (key may be absent when empty)
  }>;
}

interface BackstageRelation {
  type: 'dependsOn' | 'partOf' | 'dependencyOf' | 'hasPart';
  targetRef: string; // Format: "resource:default/<entity-name>"
}

// resolveEntityRefs is supplied by the caller: it maps a dependency address to the
// entity refs generated for that resource (one per instance for count/for_each).
function createRelations(
  resource: TerraformResource,
  resolveEntityRefs: (address: string) => string[]
): BackstageRelation[] {
  const relations: BackstageRelation[] = [];
  const seen = new Set<string>(); // instances often share dependencies; emit each relation once

  const add = (type: BackstageRelation['type'], targetRef: string): void => {
    const key = `${type}|${targetRef}`;
    if (!seen.has(key)) {
      seen.add(key);
      relations.push({ type, targetRef });
    }
  };

  for (const instance of resource.instances) {
    for (const dep of instance.dependencies ?? []) {
      // Parse Terraform address: "google_compute_network.main_vpc"
      const parsed = parseTerraformAddress(dep);

      // Skip data sources (not managed infrastructure)
      if (parsed.mode === 'data') continue;

      for (const targetRef of resolveEntityRefs(dep)) {
        // Create dependsOn relation
        add('dependsOn', targetRef);

        // Create partOf relation for logical hierarchies
        if (isLogicalChild(resource.type, parsed.type)) {
          add('partOf', targetRef);
        }
      }
    }
  }

  return relations;
}

function isLogicalChild(childType: string, parentType: string): boolean {
  const hierarchies: Record<string, string[]> = {
    'google_compute_network': [
      'google_compute_subnetwork',
      'google_compute_firewall',
      'google_compute_router'
    ],
    'google_compute_subnetwork': [
      'google_compute_instance',
      'google_compute_address'
    ],
    'google_project': [
      'google_compute_network',
      'google_storage_bucket',
      'google_sql_database_instance'
    ]
  };

  return hierarchies[parentType]?.includes(childType) ?? false;
}

4.4 Handling Circular Dependencies

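Problem: Terraform rejects dependency cycles at plan time, so dependencies[] within a single state forms an acyclic graph. Cycles can still appear once relations from several state files, or hand-written catalog entities, are combined.

Solution: Backstage relations are directional and non-blocking:

  • dependsOn relations can be created in both directions when a mutual dependency needs to be shown
  • UI graph rendering handles cycles gracefully (shows as bidirectional edges)
  • The catalog does not reject entities whose relations form a cycle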
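A mapper may still want to warn when its combined output forms a cycle, for example after merging several state files. A minimal depth-first-search sketch that returns one cycle as a path of entity refs, or `undefined`. `findCycle` is a hypothetical helper, and because it recurses, very deep graphs would need an iterative version.

```typescript
// Find one cycle in a directed graph of entity refs, or return undefined if there is none.
// Three-state DFS: 0 = unvisited, 1 = on the current path, 2 = fully explored.
function findCycle(edges: Map<string, string[]>): string[] | undefined {
  const state = new Map<string, 0 | 1 | 2>();
  const path: string[] = [];

  const visit = (node: string): string[] | undefined => {
    state.set(node, 1);
    path.push(node);
    for (const next of edges.get(node) ?? []) {
      const s = state.get(next) ?? 0;
      if (s === 1) {
        // Back edge: the cycle is the path from `next` to here, closed with `next`
        return [...path.slice(path.indexOf(next)), next];
      }
      if (s === 0) {
        const cycle = visit(next);
        if (cycle) return cycle;
      }
    }
    path.pop();
    state.set(node, 2);
    return undefined;
  };

  for (const node of edges.keys()) {
    if ((state.get(node) ?? 0) === 0) {
      const cycle = visit(node);
      if (cycle) return cycle;
    }
  }
  return undefined;
}
```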

5. Sensitive Data Detection & Filtering

5.1 Detection Strategies

Strategy 1: Terraform Sensitive Flag

{
"outputs": {
"db_password": {
"value": "secret123",
"sensitive": true // ✅ Explicit marker
}
},
"resources": [{
"instances": [{
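"sensitive_attributes": [ // ✅ List of sensitive paths (each path is a list of steps)
[{ "type": "get_attr", "value": "password" }],
[{ "type": "get_attr", "value": "private_key" }]
]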
}]
}]
}

Filtering Logic:

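// Filter outputs (the real value is in state, so replace it)
const outputValue = (o: { value: unknown; sensitive?: boolean }): unknown =>
  o.sensitive === true ? '[REDACTED]' : o.value; // Don't expose in Backstage

// Filter resource attributes. A path is a list of steps, e.g. [{ type: 'get_attr', value: 'password' }];
// deleting the top-level attribute named by the first step also removes nested values.
for (const path of instance.sensitive_attributes ?? []) {
  const top = typeof path === 'string' ? path : path[0]?.value; // string form: simplified examples
  if (typeof top === 'string') delete instance.attributes[top];
}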

Strategy 2: Attribute Name Pattern Matching

From existing codebase (/feature-terraform-state-plugin/security/sensitive-data-taxonomy.md):

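// Attribute-NAME patterns: matched against attribute keys only
const SENSITIVE_NAME_PATTERNS: RegExp[] = [
  // Credentials (CRITICAL)
  /private_key/i,
  /password/i,
  /secret/i,
  /api_key/i,
  /access_token/i,
  /auth_token/i,
  /service_account_key/i,

  // Network (HIGH - configurable)
  /private_ip_address/i,
  /internal_ip/i,
  /connection_string/i,

  // Crypto (CRITICAL)
  /encryption_key/i,
  /master_key/i,
  /kms_key/i,
];

// VALUE patterns: matched against attribute contents
const SENSITIVE_VALUE_PATTERNS: RegExp[] = [
  // Provider-specific (CRITICAL)
  /AIza[0-9A-Za-z_-]{35}/, // Google API key
  /AKIA[0-9A-Z]{16}/, // AWS access key
  /-----BEGIN.*PRIVATE KEY-----/, // PEM private key
];

function isSensitiveAttribute(attrName: string, attrValue: unknown): boolean {
  // Name patterns are not applied to values: /secret/i would otherwise redact
  // ordinary IDs such as "projects/p/secrets/foo". Nested values are serialized.
  const text = typeof attrValue === 'string' ? attrValue : JSON.stringify(attrValue) ?? '';
  return (
    SENSITIVE_NAME_PATTERNS.some((pattern) => pattern.test(attrName)) ||
    SENSITIVE_VALUE_PATTERNS.some((pattern) => pattern.test(text))
  );
}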

Strategy 3: Resource Type Denylist

const SENSITIVE_RESOURCE_TYPES = [
'google_service_account_key', // Always contains private keys
'google_secret_manager_secret', // Secret storage
'tls_private_key', // Crypto resources
'random_password', // Generated secrets
];

function shouldSkipResource(type: string): boolean {
return SENSITIVE_RESOURCE_TYPES.includes(type);
}

Multi-Layer Defense:

interface SanitizationResult {
  sanitized: Record<string, unknown>;
  redactedFields: string[];
  warningLevel: 'none' | 'info' | 'warning' | 'critical';
}

// Assumed instance shape (see Section 1.2); sensitive paths are step arrays or,
// in simplified examples, plain attribute names.
interface ResourceInstance {
  index_key?: number | string;
  attributes: Record<string, unknown>;
  sensitive_attributes?: Array<Array<{ type: string; value: unknown }> | string>;
}

// isPrivateIP / maskPrivateIP are helpers assumed to exist elsewhere
function sanitizeResourceAttributes(
  resource: TerraformResource,
  instance: ResourceInstance
): SanitizationResult {
  const sanitized: Record<string, unknown> = { ...instance.attributes };
  const redacted: string[] = [];
  let warningLevel: SanitizationResult['warningLevel'] = 'none';

  // Layer 1: Terraform-marked sensitive attributes (drop the top-level attribute)
  for (const path of instance.sensitive_attributes ?? []) {
    const key = typeof path === 'string' ? path : path[0]?.value;
    if (typeof key !== 'string' || !(key in sanitized)) continue;
    delete sanitized[key];
    redacted.push(key);
    warningLevel = 'critical';
  }

  // Layer 2: Pattern-based detection
  for (const [key, value] of Object.entries(sanitized)) {
    if (isSensitiveAttribute(key, value)) {
      sanitized[key] = '[REDACTED]';
      redacted.push(key);
      if (warningLevel !== 'critical') warningLevel = 'warning';
    }
  }

  // Layer 3: Private IP masking (configurable)
  for (const [key, value] of Object.entries(sanitized)) {
    if (typeof value === 'string' && isPrivateIP(value)) {
      sanitized[key] = maskPrivateIP(value); // "10.1.2.3" → "10.x.x.x"
      redacted.push(key);
      if (warningLevel === 'none') warningLevel = 'info';
    }
  }

  return { sanitized, redactedFields: redacted, warningLevel };
}
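`isPrivateIP` and `maskPrivateIP` are called above but not defined. One possible implementation, assuming only IPv4 addresses (optionally with a CIDR prefix) in the RFC 1918 ranges count as private. IPv6 and other reserved ranges are out of scope for this sketch.

```typescript
// IPv4 address with an optional /prefix, e.g. "10.1.0.0/28".
const IPV4_WITH_PREFIX = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})(\/\d{1,2})?$/;

// True for addresses in 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16.
function isPrivateIP(value: string): boolean {
  const m = IPV4_WITH_PREFIX.exec(value);
  if (!m) return false;
  const [a, b, c, d] = m.slice(1, 5).map(Number);
  if ([a, b, c, d].some((octet) => octet > 255)) return false;
  return (
    a === 10 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// Keep the first octet and any prefix length; masks whatever IPv4 it is given,
// so callers check isPrivateIP first. Non-IPv4 input is returned unchanged.
function maskPrivateIP(value: string): string {
  const m = IPV4_WITH_PREFIX.exec(value);
  if (!m) return value;
  return `${m[1]}.x.x.x${m[5] ?? ''}`;
}
```

This masks "10.1.0.0/28" to "10.x.x.x/28", the form used for the subnet CIDR in Section 7.2.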

Audit Trail:

// Log all sanitization actions for security audit (field names only, never values)
const result = sanitizeResourceAttributes(resource, instance);
logger.info('Resource sanitization', {
  resourceType: resource.type,
  resourceName: resource.name,
  redactedFields: result.redactedFields,
  warningLevel: result.warningLevel,
  timestamp: new Date().toISOString()
});

6. Resource Addressing & Unique Identification

6.1 Terraform Resource Addressing

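Standard Format:

module.<module_name>[<key>].data.<type>.<name>[<key>]

Only <type>.<name> is always present: the module segment appears for resources inside modules (repeated for nested modules), data. only for data sources, and [<key>] only for count or for_each instances.

Examples:

google_compute_network.main_vpc
google_compute_subnetwork.subnet[0]
google_compute_subnetwork.subnet["us-central1"]
module.common.google_project_service.compute_api
module.networking[0].google_compute_firewall.allow_ssh
data.google_project.current

Components:

  • mode: Managed resources carry no prefix; data sources are prefixed with data. (in dependencies[] as everywhere else)
  • type: Provider-specific resource type (e.g., google_compute_network)
  • name: User-defined name from Terraform config
  • [<key>]: Optional integer index for count, or quoted string key for for_each (e.g. ["us-central1"])
  • module.*: Module path prefix (can be nested; each module segment may carry its own [<key>])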
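The components can also be assembled in the opposite direction. A sketch of a formatter (hypothetical `AddressParts` and `formatTerraformAddress`) following the rules above: module segments come first, `data.` appears only for data sources, count indices render bare and for_each keys render quoted.

```typescript
// Hypothetical structured form of an address; modules are listed outermost first.
interface AddressParts {
  modules: Array<{ name: string; key?: number | string }>;
  mode: 'managed' | 'data';
  type: string;
  name: string;
  key?: number | string;
}

// count indices render bare ([0]); for_each keys render as quoted strings (["us-central1"]).
function keySuffix(key: number | string | undefined): string {
  if (key === undefined) return '';
  return typeof key === 'number' ? `[${key}]` : `[${JSON.stringify(key)}]`;
}

function formatTerraformAddress(a: AddressParts): string {
  const modulePart = a.modules.map((m) => `module.${m.name}${keySuffix(m.key)}.`).join('');
  const modePart = a.mode === 'data' ? 'data.' : ''; // managed resources carry no prefix
  return `${modulePart}${modePart}${a.type}.${a.name}${keySuffix(a.key)}`;
}
```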

6.2 Parsing Terraform Addresses

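interface TerraformAddress {
  modulePath: string[]; // ["common", "networking"] or []
  mode: 'managed' | 'data'; // Inferred from prefix
  type: string; // "google_compute_network"
  name: string; // "main_vpc"
  index?: number | string; // 0 (count) or "us-central1" (for_each, quotes removed)
}

// Split on '.' only outside [...], so for_each keys such as ["a.b"] stay intact
function splitAddress(address: string): string[] {
  const parts: string[] = [];
  let current = '';
  let depth = 0;
  for (const ch of address) {
    if (ch === '[') depth++;
    else if (ch === ']') depth--;
    if (ch === '.' && depth === 0) {
      parts.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  parts.push(current);
  return parts;
}

function parseTerraformAddress(address: string): TerraformAddress {
  const parts = splitAddress(address);
  const modulePath: string[] = [];

  // Extract module path (module instance keys such as [0] are dropped here)
  while (parts[0] === 'module') {
    parts.shift(); // Remove 'module'
    const [moduleName] = parseNameAndIndex(parts.shift() ?? '');
    modulePath.push(moduleName);
  }

  // Check if data source
  const mode = parts[0] === 'data' ? 'data' : 'managed';
  if (mode === 'data') parts.shift();

  // Extract type and name (exactly two segments must remain)
  if (parts.length !== 2) throw new Error(`Invalid Terraform address: ${address}`);
  const type = parts[0];
  const [name, index] = parseNameAndIndex(parts[1]);

  return { modulePath, mode, type, name, index };
}

function parseNameAndIndex(str: string): [string, number | string | undefined] {
  const match = str.match(/^([^\[]+)(?:\[(.+)\])?$/);
  if (!match) throw new Error(`Invalid name format: ${str}`);

  const [, name, rawIndex] = match;
  if (rawIndex === undefined) return [name, undefined];
  // for_each keys are JSON-style quoted strings; count indices are bare integers
  if (rawIndex.startsWith('"')) return [name, JSON.parse(rawIndex) as string];
  const numeric = Number(rawIndex);
  return [name, Number.isNaN(numeric) ? rawIndex : numeric];
}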

Example Usage:

const address = "module.networking.google_compute_firewall.allow_ssh[0]";
const parsed = parseTerraformAddress(address);

console.log(parsed);
// {
// modulePath: ["networking"],
// mode: "managed",
// type: "google_compute_firewall",
// name: "allow_ssh",
// index: 0
// }

6.3 Generating Backstage Entity Names

Naming Strategy:

interface EntityNamingConfig {
  includeEnvironment: boolean; // Append -prod, -nonprod
  includeModule: boolean; // Append module path
  includeIndex: boolean; // Append -0, -1 for indexed resources
  separator: string; // Default: "-" (must be -, _ or . to form a valid name)
}

function generateEntityName(
  resource: TerraformResource,
  instance: ResourceInstance,
  environment: string,
  config: EntityNamingConfig
): string {
  const parts: string[] = [];

  // 1. Resource type prefix (simplified)
  parts.push(simplifyResourceType(resource.type));

  // 2. Module path (optional)
  if (config.includeModule && resource.module) {
    parts.push(resource.module.replace(/^module\./, ''));
  }

  // 3. Resource name
  parts.push(resource.name);

  // 4. Instance index (optional)
  if (config.includeIndex && instance.index_key !== undefined) {
    parts.push(String(instance.index_key));
  }

  // 5. Environment suffix (optional)
  if (config.includeEnvironment) {
    parts.push(environment); // "prod", "nonprod"
  }

  // Backstage names are runs of [a-z0-9] joined by a separator, at most 63 characters:
  // split on every other character (underscores, dots, slashes, quotes) and rejoin.
  const name = parts
    .join(' ')
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((segment) => segment.length > 0)
    .join(config.separator);

  // Truncation can create collisions; see 6.4
  return name.slice(0, 63).replace(/[-_.]+$/, '');
}

function simplifyResourceType(type: string): string {
  // Remove provider prefix: "google_compute_network" → "compute_network"
  const withoutProvider = type.replace(/^(google|aws|azurerm)_/, '');

  // Simplify common types: "compute_network" → "vpc"
  const simplifications: Record<string, string> = {
    'compute_network': 'vpc',
    'compute_subnetwork': 'subnet',
    'compute_instance': 'vm',
    'compute_firewall': 'firewall',
    'storage_bucket': 'bucket',
    'sql_database_instance': 'sql-instance',
    'container_cluster': 'gke-cluster',
  };

  return simplifications[withoutProvider] || withoutProvider;
}

Example Entity Names:

// Simple resource (no module, no index)
// Type: google_compute_network, Name: main_vpc, Env: prod
"vpc-main-vpc-prod"

// Module resource (with module path)
// Module: module.networking, Type: google_compute_firewall, Name: allow_ssh
"firewall-networking-allow-ssh-prod"

// Indexed resource (with for_each)
// Type: google_compute_subnetwork, Name: subnet, Index: "us-central1"
"subnet-subnet-us-central1-prod"
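Generated names should be checked before they reach the catalog. This sketch follows the documented Backstage rule for `metadata.name` (1 to 63 characters; runs of `[a-zA-Z0-9]` separated by single `-`, `_` or `.`). The catalog's own validator remains the authority, and `isValidEntityName` is a hypothetical helper.

```typescript
// Runs of letters/digits separated by single '-', '_' or '.' characters.
const ENTITY_NAME = /^[A-Za-z0-9]+(?:[-_.][A-Za-z0-9]+)*$/;

function isValidEntityName(name: string): boolean {
  return name.length >= 1 && name.length <= 63 && ENTITY_NAME.test(name);
}
```

All three example names above pass this check.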

6.4 Handling Name Collisions

Problem: Different resources might generate same entity name.

Solution 1: Hash-Based Disambiguation

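import { createHash } from 'node:crypto';

function ensureUniqueName(
  baseName: string,
  fullTerraformAddress: string, // e.g. "module.networking.google_compute_firewall.allow_ssh"
  existingNames: Set<string>
): string {
  if (!existingNames.has(baseName)) {
    return baseName;
  }

  // Append short hash of full Terraform address
  const hash = createHash('sha256')
    .update(fullTerraformAddress)
    .digest('hex')
    .substring(0, 8);

  // Trim the base so "<base>-<8 hex chars>" stays within Backstage's 63-character limit
  return `${baseName.slice(0, 54).replace(/[-_.]+$/, '')}-${hash}`;
}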

Solution 2: Incremental Suffix

function ensureUniqueName(baseName: string, existingNames: Set<string>): string {
if (!existingNames.has(baseName)) {
return baseName;
}

let counter = 1;
while (existingNames.has(`${baseName}-${counter}`)) {
counter++;
}

return `${baseName}-${counter}`;
}

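Recommendation: Use the hash-based approach. The suffix is derived from the Terraform address, so it stays stable across state updates, whereas an incremental counter depends on the order in which resources are processed. The same ordering caveat applies to which resource keeps the unsuffixed name, so for fully reproducible names every member of a colliding group should receive a hash suffix.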
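A sketch of a collision pass whose output does not depend on processing order: every address whose base name collides receives a suffix derived from its Terraform address. It swaps SHA-256 for a 32-bit FNV-1a hash purely so the sketch has no dependencies, and `assignUniqueNames` is a hypothetical helper taking a map of Terraform address → base name.

```typescript
// FNV-1a over UTF-16 code units: small, deterministic, non-cryptographic.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ('00000000' + hash.toString(16)).slice(-8);
}

// baseNames: Terraform address → generated base entity name.
function assignUniqueNames(baseNames: Map<string, string>): Map<string, string> {
  // Count how many addresses share each base name
  const counts = new Map<string, number>();
  for (const base of baseNames.values()) {
    counts.set(base, (counts.get(base) ?? 0) + 1);
  }

  const result = new Map<string, string>();
  for (const [address, base] of baseNames) {
    if ((counts.get(base) ?? 0) > 1) {
      // Every colliding name gets a suffix; trim so the result stays within 63 characters
      const trimmed = base.slice(0, 54).replace(/[-_.]+$/, '');
      result.set(address, `${trimmed}-${fnv1a(address)}`);
    } else {
      result.set(address, base);
    }
  }
  return result;
}
```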


7. Practical Mapping Examples

7.1 Example: GCP VPC Network

Terraform State:

{
"mode": "managed",
"type": "google_compute_network",
"name": "gcp-network",
"provider": "provider[\"registry.terraform.io/hashicorp/google\"]",
"module": "module.gcp-network",
"instances": [{
"schema_version": 1,
"attributes": {
"id": "projects/my-project/global/networks/private-network",
"name": "private-network",
"auto_create_subnetworks": false,
"routing_mode": "REGIONAL",
"self_link": "https://www.googleapis.com/compute/v1/projects/my-project/global/networks/private-network",
"project": "my-project"
},
"sensitive_attributes": [],
"dependencies": []
}]
}

Backstage Entity:

apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: vpc-private-network-prod
  description: Private VPC network (private-network) managed by Terraform
  annotations:
    terraform.io/resource-type: google_compute_network
    terraform.io/resource-address: module.gcp-network.google_compute_network.gcp-network
    terraform.io/state-source: gs://my-bucket/prod/terraform.tfstate
    terraform.io/module-path: gcp-network
    cloud.google.com/project-id: my-project
    cloud.google.com/resource-id: projects/my-project/global/networks/private-network
  labels:
    environment: production
    managed-by: terraform
    cloud-provider: gcp
    resource-category: networking
  tags:
    - vpc
    - network
    - gcp
spec:
  type: network
  lifecycle: production
  owner: platform-team
  system: gcp-production-system

  # Custom properties from Terraform attributes
  definition:
    name: private-network
    routing_mode: REGIONAL
    auto_create_subnetworks: false
    self_link: https://www.googleapis.com/compute/v1/projects/my-project/global/networks/private-network

7.2 Example: GCP Subnet with Dependency

Terraform State:

{
"mode": "managed",
"type": "google_compute_subnetwork",
"name": "subnet",
"module": "module.gcp-network",
"instances": [{
"attributes": {
"id": "projects/my-project/regions/us-central1/subnetworks/shared-subnet",
"name": "shared-subnet",
"network": "projects/my-project/global/networks/private-network",
"ip_cidr_range": "10.1.0.0/28",
"region": "us-central1",
"private_ip_google_access": true
},
"dependencies": [
"module.gcp-network.google_compute_network.gcp-network"
]
}]
}

Backstage Entity:

apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: subnet-shared-subnet-prod
  description: Shared subnet (10.x.x.x/28) in us-central1
  annotations:
    terraform.io/resource-type: google_compute_subnetwork
    terraform.io/resource-address: module.gcp-network.google_compute_subnetwork.subnet
    cloud.google.com/region: us-central1
    cloud.google.com/cidr: 10.x.x.x/28 # Masked private IP
spec:
  type: subnetwork
  lifecycle: production
  owner: platform-team
  system: gcp-production-system

  dependsOn:
    - resource:default/vpc-private-network-prod # From dependencies[]

  # Not a standard Resource spec field: a custom processor must turn it into partOf relations
  partOf:
    - resource:default/vpc-private-network-prod # Logical hierarchy

  definition:
    name: shared-subnet
    region: us-central1
    ip_cidr_range: "[REDACTED]" # Or "10.x.x.x/28"
    private_ip_google_access: true

7.3 Example: Cloud SQL Instance (with Sensitive Data)

Terraform State:

{
"mode": "managed",
"type": "google_sql_database_instance",
"name": "cloud_sql_instance_backstage",
"instances": [{
"attributes": {
"id": "cloud-sql-instance-backstage",
"name": "cloud-sql-instance-backstage",
"database_version": "POSTGRES_15",
"region": "northamerica-northeast1",
"connection_name": "my-project:northamerica-northeast1:cloud-sql-instance-backstage",
"ip_address": [
{
"ip_address": "10.1.0.5",
"type": "PRIVATE"
}
],
"root_password": "super-secret-password" // ⚠️ Sensitive!
},
"sensitive_attributes": ["root_password"],
"dependencies": [
"google_service_networking_connection.private_vpc_connection"
]
}]
}

Backstage Entity (Sanitized):

apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: sql-instance-backstage-prod
  description: PostgreSQL 15 database instance for Backstage
  annotations:
    terraform.io/resource-type: google_sql_database_instance
    cloud.google.com/connection-name: my-project:northamerica-northeast1:cloud-sql-instance-backstage
    # Audit trail (the backstage.io/ prefix is reserved for Backstage core annotations)
    terraform.io/sanitized-fields: root_password,ip_address
spec:
  type: database
  lifecycle: production
  owner: data-team

  dependsOn:
    - resource:default/vpc-connection-private

  definition:
    name: cloud-sql-instance-backstage
    database_version: POSTGRES_15
    region: northamerica-northeast1
    connection_name: my-project:northamerica-northeast1:cloud-sql-instance-backstage
    ip_address: "[PRIVATE_IP]" # ✅ Redacted
    # root_password field removed entirely # ✅ Filtered

8. Recommendations & Best Practices

8.1 State Parsing Recommendations

  1. Always Validate State Version

    if (state.version !== 4) {
    throw new Error(`Unsupported Terraform state version: ${state.version}`);
    }
  2. Filter Out Data Sources

    const managedResources = state.resources.filter(r => r.mode === 'managed');
  3. Handle Missing Fields Gracefully

    const dependencies = instance.dependencies || [];
    const attributes = instance.attributes || {};
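  4. Use Serial and Lineage for Change Detection

    // Serial only increases within one lineage; a different lineage is a new state history
    if (newState.lineage !== lastProcessedLineage || newState.serial > lastProcessedSerial) {
      // Process update (full refresh when the lineage changed, otherwise incremental)
    }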

8.2 Entity Creation Recommendations

  1. Standardize Entity Naming

    • Use consistent separator (hyphen -)
    • Always lowercase
    • Include environment suffix for multi-env setups
    • Hash-based disambiguation for collisions
  2. Rich Annotations

    annotations:
      terraform.io/resource-type: google_compute_network
      terraform.io/resource-address: module.gcp-network.google_compute_network.gcp-network
      terraform.io/state-source: gs://my-bucket/prod/terraform.tfstate
      terraform.io/state-serial: "42"
      terraform.io/module-path: gcp-network
      terraform.io/last-updated: "2025-11-15T10:30:00Z"
      cloud.google.com/project-id: my-project
      cloud.google.com/resource-id: projects/my-project/global/networks/private-network
  3. Preserve Original Terraform Context

    • Store full Terraform address in annotations
    • Keep module path information
    • Link to state source (GCS bucket or TFC workspace)

8.3 Relationship Mapping Recommendations

  1. Create Both Logical and Dependency Relations

    • dependsOn: Technical dependency (Terraform requires)
    • partOf: Logical containment (Network contains Subnet)
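  2. Auto-Generate Reverse Relations

    • The catalog's built-in processor emits the reverse relation itself for relations declared through spec fields (spec.dependsOn yields dependencyOf, spec.system yields hasPart)
    • A custom processor that emits its own relations, such as Resource-to-Resource partOf, must also emit the inverse (hasPart for partOf, dependencyOf for dependsOn)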
  3. Handle Multi-Level Hierarchies

    System: GCP Project
    ├── Resource: VPC Network
    │ ├── Resource: Subnet 1
    │ │ └── Resource: VM Instance 1
    │ └── Resource: Subnet 2
    └── Resource: Cloud SQL Instance

8.4 Sensitive Data Filtering Recommendations

  1. Multi-Layer Defense

    • Layer 1: Terraform sensitive_attributes flag
    • Layer 2: Attribute name pattern matching
    • Layer 3: Value pattern matching (API keys, IPs)
  2. Configurable Sensitivity Levels

    enum SensitivityLevel {
    CRITICAL = 'critical', // Always redact (passwords, keys)
    HIGH = 'high', // Redact by default (private IPs)
    MEDIUM = 'medium', // Configurable (connection strings)
    LOW = 'low' // Info only (public IPs)
    }
  3. Audit All Sanitization

    • Log every redacted field
    • Track warning levels
    • Generate security reports
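The sensitivity levels can drive a single redaction decision. This sketch uses a string union mirroring the enum and a hypothetical `RedactionPolicy`; the default policy (HIGH on, MEDIUM off) is an assumption, not something the taxonomy specifies.

```typescript
// String union mirroring the SensitivityLevel enum, so the sketch stands alone.
type Sensitivity = 'critical' | 'high' | 'medium' | 'low';

// Hypothetical operator-facing switches for the configurable levels.
interface RedactionPolicy {
  redactHigh: boolean;
  redactMedium: boolean;
}

// Assumed defaults: HIGH is "redact by default"; MEDIUM stays off until configured.
const DEFAULT_POLICY: RedactionPolicy = { redactHigh: true, redactMedium: false };

function shouldRedact(level: Sensitivity, policy: RedactionPolicy = DEFAULT_POLICY): boolean {
  const decisions: Record<Sensitivity, boolean> = {
    critical: true, // always redacted, not configurable
    high: policy.redactHigh,
    medium: policy.redactMedium,
    low: false, // informational only
  };
  return decisions[level];
}
```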

8.5 GCP Resource Type Mapping

| GCP Resource | Backstage Kind | Type | Example Name |
|---|---|---|---|
| google_project | System | project | project-my-gcp-project |
| google_compute_network | Resource | network | vpc-main-network-prod |
| google_compute_subnetwork | Resource | subnetwork | subnet-shared-prod |
| google_compute_instance | Resource | compute | vm-web-server-prod |
| google_compute_firewall | Resource | firewall | fw-allow-ssh-prod |
| google_storage_bucket | Resource | storage | bucket-tfstate-prod |
| google_sql_database_instance | Resource | database | sql-backstage-prod |
| google_container_cluster | Resource/System | container-cluster | gke-production |
| google_service_account | Resource | iam | sa-backstage-app |
| google_kms_key_ring | Resource | kms | kms-vault-keyring |

9. Implementation Roadmap

Phase 1: Core Parsing (MVP)

  • ✅ Parse Terraform state v4 JSON
  • ✅ Extract resources, dependencies, outputs
  • ✅ Filter managed resources (exclude data sources)
  • ✅ Generate unique entity names

Phase 2: Entity Generation

  • ✅ Create Backstage Resource entities
  • ✅ Map Terraform attributes to entity spec
  • ✅ Add annotations (resource type, address, state source)
  • ✅ Apply environment labels

Phase 3: Relationship Mapping

  • ✅ Parse Terraform addresses from dependencies
  • ✅ Create dependsOn relations
  • ✅ Implement logical hierarchy (partOf for networks/subnets)
  • ✅ Auto-generate reverse relations

Phase 4: Sensitive Data Filtering

  • ✅ Implement multi-layer sanitization
  • ✅ Pattern-based detection (passwords, keys, IPs)
  • ✅ Terraform sensitive_attributes filtering
  • ✅ Audit logging for all redactions

Phase 5: Advanced Features

  • ⬜ GCS bucket state ingestion
  • ⬜ Terraform Cloud API integration
  • ⬜ Incremental updates (serial-based)
  • ⬜ Multi-environment support (prod/nonprod)


Appendix A: Terraform State v4 Complete Example

File: /Users/liam.helmer/repos/badal-io/repo-devex-backstage/terraform/common/main.tf (Current Infrastructure)

State Structure (Inferred from Terraform config):

{
"version": 4,
"terraform_version": "1.5.0",
"serial": 123,
"lineage": "abc123-def456",
"outputs": {
"vault_internal_url": {
"value": "https://10.1.0.10:8200",
"type": "string",
"sensitive": false
},
"backstage_db_password_secret_id": {
"value": "backstage-db-password",
"type": "string",
"sensitive": true
}
},
"resources": [
{
"mode": "managed",
"type": "google_compute_network",
"name": "gcp-network",
"module": "module.gcp-network",
"instances": [{
"attributes": {
"id": "projects/my-project/global/networks/private-network",
"name": "private-network"
},
"dependencies": []
}]
},
{
"mode": "managed",
"type": "google_compute_subnetwork",
"name": "subnet",
"module": "module.gcp-network",
"instances": [{
"attributes": {
"name": "shared-subnet",
"network": "projects/my-project/global/networks/private-network",
"ip_cidr_range": "10.1.0.0/28"
},
"dependencies": [
"module.gcp-network.google_compute_network.gcp-network"
]
}]
},
{
"mode": "managed",
"type": "google_sql_database_instance",
"name": "cloud_sql_instance_backstage",
"instances": [{
"attributes": {
"name": "cloud-sql-instance-backstage",
"connection_name": "my-project:region:instance"
},
"sensitive_attributes": ["root_password"],
"dependencies": [
"google_service_networking_connection.private_vpc_connection"
]
}]
}
]
}

End of Research Document