Sensitive Data Taxonomy for Terraform State Sanitization

Executive Summary

This document defines a taxonomy of sensitive data found in Terraform state files that must be detected, filtered, and sanitized before the state is loaded into the Backstage catalog. The taxonomy is organized by sensitivity level, resource type, and compliance requirement.

Sensitivity Classifications

CRITICAL (Must Always Filter)

Data that, if exposed, would immediately compromise security or violate compliance regulations.

HIGH (Filter Unless Explicitly Allowed)

Data that poses significant security risk but may be needed for specific operational use cases with proper controls.

MEDIUM (Configurable Based on Client Policy)

Data that should be evaluated based on client-specific security posture and risk tolerance.

LOW (Informational, May Be Allowed)

Data that is typically safe to expose but should be evaluated in context.


1. Authentication & Authorization Credentials

1.1 Private Keys (CRITICAL)

Terraform Attributes:

- private_key
- private_key_pem
- private_key_openssh
- tls_private_key
- service_account_key
- service_account_private_key
- certificate_private_key
- ssh_private_key
- rsa_private_key
- ecdsa_private_key
- ed25519_private_key

Resource Types Affected:

  • google_service_account_key
  • google_compute_instance (ssh keys in metadata)
  • tls_private_key
  • aws_iam_access_key
  • azurerm_key_vault_secret

Detection Patterns:

/-----BEGIN ((RSA|EC|DSA|OPENSSH|ENCRYPTED) )?PRIVATE KEY-----/ # Prefix is optional: PKCS#8 keys use "BEGIN PRIVATE KEY"
/private_key["\s]*[:=]["\s]*[A-Za-z0-9+/=]{64,}/
/\bAIza[0-9A-Za-z_-]{35}(?![0-9A-Za-z_-])/ # Google API keys

Sanitization Action:

  • REDACT: Replace entire value with [REDACTED:PRIVATE_KEY]
  • AUDIT: Log resource type, attribute path, key length, algorithm type
  • ALERT: Trigger security notification for manual review
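
A minimal sketch of the REDACT and AUDIT steps above, assuming the key is PEM-armored text inside a string value. The function name `redact_private_keys` and the audit field names are illustrative, not part of any existing tool. The pattern also accepts the unprefixed PKCS#8 header `-----BEGIN PRIVATE KEY-----`.

```python
import re

# Matches a whole PEM private-key block; the algorithm prefix is optional.
PEM_BLOCK = re.compile(
    r"-----BEGIN ((?:RSA|EC|DSA|OPENSSH|ENCRYPTED) )?PRIVATE KEY-----"
    r".*?"
    r"-----END (?:(?:RSA|EC|DSA|OPENSSH|ENCRYPTED) )?PRIVATE KEY-----",
    re.DOTALL,
)
PLACEHOLDER = "[REDACTED:PRIVATE_KEY]"


def redact_private_keys(attribute_path: str, value: str) -> tuple[str, list[dict]]:
    """Replace each PEM block and return audit records that hold no key material."""
    audit: list[dict] = []

    def _replace(match: re.Match) -> str:
        audit.append({
            "attribute_path": attribute_path,
            "key_length": len(match.group(0)),  # length of the PEM text, not key bits
            "algorithm": (match.group(1) or "PKCS8").strip(),
        })
        return PLACEHOLDER

    return PEM_BLOCK.sub(_replace, value), audit
```

The audit record deliberately stores only the path, length, and header type, so the log itself never becomes a second copy of the secret.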

1.2 Passwords & Secrets (CRITICAL)

Terraform Attributes:

- password
- admin_password
- root_password
- initial_password
- master_password
- db_password
- database_password
- user_password
- secret
- api_secret
- client_secret
- webhook_secret
- encryption_key
- master_key

Resource Types Affected:

  • google_sql_database_instance
  • google_secret_manager_secret
  • google_compute_instance (metadata)
  • google_cloudfunctions_function (environment variables)
  • google_cloud_run_service (environment variables)

Detection Patterns:

/(password|secret|key)["\s]*[:=]["\s]*["'][^"']{8,}["']/i
/\b[A-Za-z0-9_-]{32,}\b/ # Generic long random strings

Sanitization Action:

  • REDACT: Replace with [REDACTED:SECRET]
  • AUDIT: Log resource type, secret length, creation timestamp
  • HASH: Store SHA256 hash for change detection without exposing value
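
The REDACT and HASH actions above might be combined as in the following sketch. The function name `redact_secret` and the optional `hmac_key` parameter are assumptions added for illustration: a plain SHA-256 of a short password can be brute-forced offline, so a keyed hash may be preferable where the key can be kept outside the catalog.

```python
import hashlib
import hmac
from typing import Optional


def redact_secret(value: str, hmac_key: Optional[bytes] = None) -> dict:
    """Return a placeholder plus a digest usable for change detection."""
    data = value.encode("utf-8")
    if hmac_key is not None:
        # Keyed hash: the digest is useless to anyone who lacks hmac_key.
        digest = hmac.new(hmac_key, data, hashlib.sha256).hexdigest()
        label = "hmac-sha256"
    else:
        digest = hashlib.sha256(data).hexdigest()
        label = "sha256"
    return {
        "value": "[REDACTED:SECRET]",
        "length": len(value),
        "hash": f"{label}:{digest}",
    }
```

Two state snapshots can then be compared by their `hash` fields to tell whether a secret was rotated, without either snapshot containing the secret.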

1.3 Access Tokens & API Keys (CRITICAL)

Terraform Attributes:

- access_token
- refresh_token
- auth_token
- bearer_token
- api_key
- api_token
- oauth_token
- github_token
- gitlab_token
- slack_token

Provider-Specific Patterns:

Google Cloud:
- AIza[0-9A-Za-z_-]{35} # Google API Key
- ya29\.[0-9A-Za-z_-]{68,} # OAuth 2.0 Access Token

AWS:
- AKIA[0-9A-Z]{16} # AWS Access Key ID
- [0-9a-zA-Z/+=]{40} # AWS Secret Access Key

GitHub:
- ghp_[0-9a-zA-Z]{36} # GitHub Personal Access Token
- gho_[0-9a-zA-Z]{36} # GitHub OAuth Token

Slack:
- xoxb-[0-9]{11}-[0-9]{11}-[0-9a-zA-Z]{24} # Slack Bot Token

Sanitization Action:

  • REDACT: Replace with [REDACTED:TOKEN]
  • FINGERPRINT: Store last 4 characters for identification
  • ROTATE: Trigger automated token rotation workflow
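
As a rough illustration of the REDACT and FINGERPRINT actions, the sketch below checks a value against a subset of the provider patterns listed above and keeps only the last four characters. The names `TOKEN_PATTERNS` and `sanitize_token` are hypothetical, and the sketch assumes the attribute value is the whole token.

```python
import re
from typing import Optional

# Subset of the provider-specific patterns from this section.
TOKEN_PATTERNS = {
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_-]{35}"),
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "github_pat": re.compile(r"ghp_[0-9a-zA-Z]{36}"),
    "github_oauth": re.compile(r"gho_[0-9a-zA-Z]{36}"),
}


def sanitize_token(value: str) -> Optional[dict]:
    """Return a redacted record if value is a known token format, else None."""
    for provider, pattern in TOKEN_PATTERNS.items():
        if pattern.fullmatch(value):
            return {
                "value": "[REDACTED:TOKEN]",
                "provider": provider,
                "fingerprint": value[-4:],  # last 4 characters only
            }
    return None
```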

2. Network & Infrastructure Secrets

2.1 Private IP Addresses (HIGH)

Terraform Attributes:

- private_ip_address
- internal_ip
- private_cluster_config.private_endpoint
- ip_address (when in private range)
- network_interface.internal_ip

Detection Patterns:

/\b10\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/                   # 10.0.0.0/8
/\b172\.(1[6-9]|2[0-9]|3[0-1])\.\d{1,3}\.\d{1,3}\b/   # 172.16.0.0/12
/\b192\.168\.\d{1,3}\.\d{1,3}\b/                      # 192.168.0.0/16

Resource Types Affected:

  • google_compute_instance
  • google_container_cluster (private endpoint)
  • google_compute_address (address_type = "INTERNAL")
  • google_sql_database_instance

Sanitization Action:

  • MASK: Replace with 10.x.x.x or [PRIVATE_IP]
  • PRESERVE_SUBNET: Optionally keep subnet info: 10.1.x.x
  • CONFIGURABLE: Allow per-client policy (some clients may allow internal IPs)
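
One way to implement MASK and PRESERVE_SUBNET without relying on regular expressions is Python's standard `ipaddress` module, which avoids partial matches such as `110.1.2.3`. This is a sketch only; the function name `mask_private_ip` is an assumption.

```python
import ipaddress

# The three RFC 1918 ranges named in the detection patterns.
PRIVATE_NETS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def mask_private_ip(value: str, preserve_subnet: bool = False) -> str:
    """Mask an RFC 1918 IPv4 address; return anything else unchanged."""
    try:
        addr = ipaddress.ip_address(value)
    except ValueError:
        return value  # not an IP address at all
    if addr.version != 4 or not any(addr in net for net in PRIVATE_NETS):
        return value
    if preserve_subnet:
        first, second = value.split(".")[:2]
        return f"{first}.{second}.x.x"
    return "[PRIVATE_IP]"
```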

2.2 Connection Strings & URIs (CRITICAL)

Terraform Attributes:

- connection_string
- connection_url
- jdbc_url
- database_url
- redis_url
- mongodb_uri

Detection Patterns:

# PostgreSQL
/postgres(ql)?:\/\/[^:]+:[^@]+@[^\/]+\/\w+/

# MySQL
/mysql:\/\/[^:]+:[^@]+@[^\/]+\/\w+/

# MongoDB
/mongodb(\+srv)?:\/\/[^:]+:[^@]+@[^\/]+\/\w+/

# Redis
/rediss?:\/\/[^:@\/]*:[^@]+@[^\/]+/

Sanitization Action:

  • PARSE: Extract protocol, host, database name
  • PRESERVE: Keep protocol://[USER]:[REDACTED]@host:port/database
  • AUDIT: Log connection details (never the password) separately for troubleshooting
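
The PARSE and PRESERVE steps can be sketched with the standard `urllib.parse` module, assuming the connection string is a well-formed URL. The function name `redact_connection_url` is illustrative, and IPv6 literal hosts are not handled here.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_connection_url(url: str) -> str:
    """Replace only the password component of a URL-style connection string."""
    parts = urlsplit(url)
    if parts.password is None:
        return url  # nothing to redact
    user = parts.username or ""
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{user}:[REDACTED]@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

Parsing rather than pattern-substituting keeps the protocol, host, port, and database name intact for topology views while the credential itself never reaches the catalog.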

2.3 SSH Configuration (HIGH)

Terraform Attributes:

- ssh_keys (in metadata)
- ssh_authorized_keys
- ssh_config
- known_hosts

Resource Types Affected:

  • google_compute_instance
  • google_compute_instance_template
  • google_os_login_ssh_public_key

Sanitization Action:

  • REDACT_PRIVATE: Remove private keys entirely
  • HASH_PUBLIC: Replace public key content with fingerprint
  • PRESERVE_TYPE: Keep key type (ssh-rsa, ssh-ed25519, etc.)
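
HASH_PUBLIC and PRESERVE_TYPE could look like the sketch below, which assumes the input is a single OpenSSH public key line of the form `type base64-blob [comment]`. The output uses the same unpadded Base64 SHA-256 form that `ssh-keygen -l` prints. The function name `fingerprint_public_key` is hypothetical.

```python
import base64
import hashlib


def fingerprint_public_key(line: str) -> str:
    """Return 'key-type SHA256:fingerprint' for an OpenSSH public key line."""
    key_type, b64_blob = line.split()[:2]  # any trailing comment is dropped
    digest = hashlib.sha256(base64.b64decode(b64_blob)).digest()
    fingerprint = base64.b64encode(digest).decode("ascii").rstrip("=")
    return f"{key_type} SHA256:{fingerprint}"
```

Dropping the comment field also removes the `user@host` string that often trails a public key.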

3. GCP-Specific Sensitive Data

3.1 Service Account Keys (CRITICAL)

Resource Type:

google_service_account_key

Attributes:

- private_key
- private_key_type
- public_key
- valid_after
- valid_before

Sanitization Rules:

private_key:
  action: REDACT
  replacement: "[REDACTED:SA_KEY]"

private_key_type:
  action: PRESERVE
  reason: "Needed for key rotation tracking"

public_key:
  action: HASH
  method: SHA256
  reason: "Public keys can identify service account without exposing credentials"

3.2 Cloud SQL Admin Passwords (CRITICAL)

Resource Type:

google_sql_database_instance
google_sql_user

Attributes:

- root_password
- password
- settings.backup_configuration.binary_log_enabled
- settings.ip_configuration.private_network

Sanitization Rules:

root_password:
  action: REDACT
  replacement: "[REDACTED:SQL_PASSWORD]"

password:
  action: REDACT
  replacement: "[REDACTED:SQL_PASSWORD]"

settings.ip_configuration.private_network:
  action: PRESERVE
  reason: "Network reference needed for topology visualization"

settings.ip_configuration.authorized_networks[].value:
  action: MASK_LAST_OCTET
  reason: "Show network ranges without exposing exact IPs"
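
A possible reading of the MASK_LAST_OCTET action named in the rules above, sketched with the standard `ipaddress` module. It assumes each authorized network value is an IPv4 address with or without a prefix length; the function name is an assumption, and invalid input raises `ValueError` rather than passing through unmasked.

```python
import ipaddress


def mask_last_octet(value: str) -> str:
    """Mask the final octet of 'a.b.c.d' or 'a.b.c.d/nn'; IPv6 is returned as-is."""
    iface = ipaddress.ip_interface(value)  # raises ValueError on malformed input
    if iface.version != 4:
        return value
    octets = str(iface.ip).split(".")
    masked = ".".join(octets[:3] + ["x"])
    if "/" in value:
        return f"{masked}/{iface.network.prefixlen}"
    return masked
```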

3.3 KMS Crypto Keys (HIGH)

Resource Type:

google_kms_crypto_key
google_kms_secret_ciphertext

Attributes:

- ciphertext
- plaintext
- additional_authenticated_data

Sanitization Rules:

ciphertext:
  action: REDACT
  replacement: "[REDACTED:CIPHERTEXT]"
  preserve_length: true # For size analysis

plaintext:
  action: REDACT
  replacement: "[REDACTED:PLAINTEXT]"
  alert: CRITICAL

additional_authenticated_data:
  action: EVALUATE
  rules:
    - if contains sensitive patterns: REDACT
    - else: PRESERVE

3.4 Secret Manager Secrets (CRITICAL)

Resource Type:

google_secret_manager_secret
google_secret_manager_secret_version

Attributes:

- secret_data
- payload.data

Sanitization Rules:

secret_data:
  action: REDACT_COMPLETELY
  replacement: "[REDACTED:SECRET_MANAGER]"
  preserve_metadata:
    - name
    - version
    - create_time
    - labels

payload.data:
  action: REDACT_COMPLETELY

3.5 IAM Policy Bindings (MEDIUM)

Resource Type:

google_project_iam_*
google_organization_iam_*

Attributes:

- members[]
- role
- condition

Sanitization Rules:

members:
  action: EVALUATE_PER_MEMBER
  rules:
    - "user:*@external.com": REDACT_DOMAIN
    - "serviceAccount:*@*.iam.gserviceaccount.com": PRESERVE
    - "group:*": PRESERVE_IF_INTERNAL

role:
  action: PRESERVE
  reason: "Needed for RBAC visualization"

condition.expression:
  action: ANALYZE_FOR_SECRETS
  pattern: /password|secret|token/i
  if_match: REDACT_CLAUSE
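
The per-member rules above might be applied as in this sketch. Two points are assumptions, since the rules leave them open: internal users are preserved, and external groups get the same domain redaction as external users. The `internal_domains` parameter and the `[REDACTED:DOMAIN]` placeholder are hypothetical.

```python
def sanitize_member(member: str, internal_domains: set[str]) -> str:
    """Apply EVALUATE_PER_MEMBER to one IAM member string such as 'user:a@b.com'."""
    kind, _, identity = member.partition(":")
    if kind == "serviceAccount":
        return member  # PRESERVE
    if kind in ("user", "group") and "@" in identity:
        local, _, domain = identity.rpartition("@")
        if domain.lower() in internal_domains:
            return member  # internal identities are kept
        return f"{kind}:{local}@[REDACTED:DOMAIN]"
    return member  # e.g. allUsers, allAuthenticatedUsers
```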

4. Environment Variables (HIGH)

4.1 Cloud Run / Cloud Functions

Resource Types:

google_cloud_run_service
google_cloudfunctions_function

Attributes:

- spec.template.spec.containers[].env[]
- environment_variables

Sanitization Rules:

environment_variables:
  action: EVALUATE_PER_VAR
  rules:
    # Secrets
    - name matches /password|secret|key|token/i: REDACT_VALUE
    # Configuration (safe)
    - name matches /region|project|bucket|dataset/i: PRESERVE
    # Connection strings
    - value matches connection_string_pattern: REDACT_CREDENTIALS_ONLY

Common Secret Variable Names:

CRITICAL:
- DATABASE_PASSWORD
- API_KEY
- SECRET_KEY
- JWT_SECRET
- OAUTH_CLIENT_SECRET
- WEBHOOK_SECRET

HIGH:
- DATABASE_URL (contains credentials)
- REDIS_URL (contains password)
- SMTP_PASSWORD

SAFE:
- DATABASE_HOST
- DATABASE_NAME
- API_ENDPOINT
- LOG_LEVEL
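
A minimal sketch of per-variable evaluation, assuming environment variables arrive as a flat name-to-value mapping. It checks the name first, then looks for embedded URL credentials in the value before preserving anything, which is slightly stricter than the rule order listed above. The names `sanitize_env` and `URL_CREDENTIALS` are illustrative.

```python
import re

SECRET_NAME = re.compile(r"password|secret|key|token", re.IGNORECASE)
# scheme://user:password@ ; the prefix group keeps everything up to the password.
URL_CREDENTIALS = re.compile(
    r"(?P<prefix>[a-z][a-z0-9+.-]*://[^:/@\s]*:)[^@/\s]+@", re.IGNORECASE
)


def sanitize_env(env: dict[str, str]) -> dict[str, str]:
    """Redact secret-named variables and credentials embedded in URL values."""
    out: dict[str, str] = {}
    for name, value in env.items():
        if SECRET_NAME.search(name):
            out[name] = "[REDACTED:SECRET]"
        elif URL_CREDENTIALS.search(value):
            out[name] = URL_CREDENTIALS.sub(r"\g<prefix>[REDACTED]@", value)
        else:
            out[name] = value
    return out
```

Note that a name-based rule this broad also redacts harmless variables whose names merely contain "key", which is the safer failure mode for a catalog export.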

5. Metadata & Custom Fields

5.1 Compute Instance Metadata (HIGH)

Resource Type:

google_compute_instance
google_compute_instance_template

Attributes:

- metadata
- metadata_startup_script

Sanitization Rules:

metadata:
  action: DEEP_SCAN
  rules:
    - ssh-keys: REDACT_KEYS
    - startup-script: SCAN_FOR_SECRETS
    - user-data: SCAN_FOR_SECRETS
    - custom fields: PATTERN_MATCH

metadata_startup_script:
  action: SCAN_FOR_PATTERNS
  patterns:
    - /password\s*=\s*["'][^"']+["']/: REDACT
    - /api_key\s*=\s*["'][^"']+["']/: REDACT
    - /export\s+\w*SECRET\w*=/: REDACT_LINE
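
The three startup-script patterns above could be applied line by line as follows. This is a sketch: the function name and the `# [REDACTED:LINE]` marker are assumptions, and a trailing newline in the input is not preserved.

```python
import re

# password = "..." or api_key = '...': keep the assignment, replace the value.
ASSIGNMENT = re.compile(r"""((?:password|api_key)\s*=\s*)(["'])[^"']+\2""")
# export FOO_SECRET=...: drop the whole line.
SECRET_EXPORT = re.compile(r"export\s+\w*SECRET\w*=")


def sanitize_startup_script(script: str) -> str:
    """Redact quoted password/api_key values and whole SECRET export lines."""
    lines = []
    for line in script.splitlines():
        if SECRET_EXPORT.search(line):
            lines.append("# [REDACTED:LINE]")
        else:
            lines.append(ASSIGNMENT.sub(r"\1\2[REDACTED]\2", line))
    return "\n".join(lines)
```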

5.2 Labels & Tags (LOW)

All Resources:

- labels
- tags

Sanitization Rules:

labels:
  action: EVALUATE_PER_LABEL
  rules:
    # Safe organizational labels
    - environment: PRESERVE
    - team: PRESERVE
    - cost-center: PRESERVE

    # Potentially sensitive
    - owner: ANONYMIZE_IF_EXTERNAL
    - contact: ANONYMIZE_EMAIL
    - backup-key-id: REDACT

6. Networking Details

6.1 VPC & Subnets (MEDIUM)

Resource Types:

google_compute_network
google_compute_subnetwork

Attributes:

- ip_cidr_range
- secondary_ip_range[].ip_cidr_range

Sanitization Rules:

ip_cidr_range:
  action: MASK_IF_PRIVATE
  rules:
    - if private range: "10.x.0.0/16"
    - if public range: PRESERVE (infrastructure info)

secondary_ip_range:
  action: PRESERVE_COUNT_ONLY
  reason: "Show number of secondary ranges without exposing IPs"

6.2 Firewall Rules (MEDIUM)

Resource Type:

google_compute_firewall

Attributes:

- source_ranges[]
- source_tags[]
- allowed[].ports[]

Sanitization Rules:

source_ranges:
  action: EVALUATE_PER_RANGE
  rules:
    - "0.0.0.0/0": FLAG_AS_PUBLIC_EXPOSURE
    - private ranges: MASK_SPECIFIC_IPS

allowed:
  action: PRESERVE_STRUCTURE
  reason: "Firewall topology needed for security posture"
  sensitive_ports: [22, 3389, 5432, 3306]
  if_source_public_and_sensitive_port: ALERT
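
The `if_source_public_and_sensitive_port` check might be evaluated as sketched below. The input shape (a list of source ranges and a list of `{"protocol": ..., "ports": [...]}` dicts) is an assumption about how the state has been parsed. A TCP rule with no `ports` is treated as covering every port, which is how GCP firewall rules behave when ports are omitted.

```python
SENSITIVE_PORTS = {22, 3389, 5432, 3306}


def _port_in_spec(port: int, spec: str) -> bool:
    """spec is a single port ('22') or an inclusive range ('8000-9000')."""
    if "-" in spec:
        low, high = spec.split("-", 1)
        return int(low) <= port <= int(high)
    return int(spec) == port


def public_sensitive_ports(source_ranges: list[str], allowed: list[dict]) -> list[int]:
    """Return the sensitive ports reachable from 0.0.0.0/0, sorted ascending."""
    if "0.0.0.0/0" not in source_ranges:
        return []
    exposed = set()
    for rule in allowed:
        if rule.get("protocol") not in ("tcp", "all"):
            continue
        specs = rule.get("ports") or []
        for port in SENSITIVE_PORTS:
            if not specs or any(_port_in_spec(port, s) for s in specs):
                exposed.add(port)
    return sorted(exposed)
```

A non-empty result is the condition under which the ALERT action would fire.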

7. Database & Storage Credentials

7.1 BigQuery Dataset Access (HIGH)

Resource Type:

google_bigquery_dataset
google_bigquery_dataset_access

Attributes:

- access[].user_by_email
- access[].group_by_email
- default_encryption_configuration.kms_key_name

Sanitization Rules:

access:
  action: ANONYMIZE_EXTERNAL_USERS
  rules:
    - internal domain: PRESERVE
    - external domain: HASH_EMAIL
    - service accounts: PRESERVE

default_encryption_configuration:
  action: PRESERVE
  reason: "Encryption key reference needed for compliance audit"

7.2 Storage Bucket Policies (MEDIUM)

Resource Type:

google_storage_bucket
google_storage_bucket_iam_*

Attributes:

- iam_configuration
- lifecycle_rule

Sanitization Rules:

iam_configuration.public_access_prevention:
  action: PRESERVE
  reason: "Critical security control visibility"

lifecycle_rule:
  action: PRESERVE
  reason: "Non-sensitive retention policy"

8. Terraform-Specific Sensitive Data

8.1 Provider Configurations (CRITICAL)

Terraform Blocks:

provider "google" {
  credentials  = file("account.json")
  access_token = "..."
}

Sanitization Rules:

provider_config.credentials:
  action: REDACT_COMPLETELY

provider_config.access_token:
  action: REDACT_COMPLETELY

provider_config.project:
  action: PRESERVE
  reason: "Project ID needed for resource relationships"

8.2 Remote State Backends (HIGH)

Backend Configuration:

terraform {
  backend "gcs" {
    bucket      = "tf-state-bucket"
    prefix      = "terraform/state"
    credentials = "..."
  }
}
}

Sanitization Rules:

backend.credentials:
  action: REDACT

backend.bucket:
  action: PRESERVE
  reason: "State location needed for change tracking"

backend.encryption_key:
  action: REDACT

9. Multi-Cloud Sensitive Data

9.1 AWS Credentials

Resource Types:

aws_iam_access_key
aws_secretsmanager_secret

Patterns:

/AKIA[0-9A-Z]{16}/        # AWS Access Key ID
/[A-Za-z0-9/+=]{40}/ # AWS Secret Access Key

9.2 Azure Credentials

Resource Types:

azurerm_key_vault_secret
azurerm_storage_account

Patterns:

/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/  # Azure Client ID

10. Detection Strategy Summary

Pattern Matching Priority

  1. Exact Attribute Match (Highest Priority)

    • Check known sensitive attribute names
    • Example: private_key, password, secret
  2. Regex Pattern Match

    • Scan all string values for credential patterns
    • Example: Private key headers, JWT tokens
  3. Semantic Analysis

    • Context-aware detection (e.g., "key" in "key_name" vs. "private_key")
    • Relationship analysis (is this a reference or actual secret?)
  4. Entropy Analysis

    • High-entropy strings (random characters) are likely secrets
    • Base64-encoded data examination
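
The entropy step can be sketched as below, using the 4.5 bits-per-character threshold that appears in the detector skeleton later in this document. The `min_length` parameter is an assumption: a string of n characters can reach at most log2(n) bits per character, so anything shorter than 23 characters can never exceed 4.5, and the default of 32 follows the generic long-string pattern in section 1.2.

```python
import math
from collections import Counter


def shannon_entropy(value: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not value:
        return 0.0
    total = len(value)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(value).values()
    )


def looks_random(value: str, threshold: float = 4.5, min_length: int = 32) -> bool:
    """Flag long strings whose per-character entropy exceeds the threshold."""
    return len(value) >= min_length and shannon_entropy(value) > threshold
```

Entropy alone produces false positives on hashes, UUID-like identifiers, and resource self-links, which is why it sits last in the priority order.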

11. Sanitization Action Types

Primary Actions

REDACT:
  description: "Replace entire value with placeholder"
  example: "[REDACTED:PRIVATE_KEY]"
  use_case: "Critical secrets that must never be exposed"

MASK:
  description: "Partially hide value while preserving structure"
  example: "10.1.x.x" or "***@example.com"
  use_case: "IPs, emails where structure is useful"

HASH:
  description: "Replace with cryptographic hash"
  example: "sha256:abc123def..."
  use_case: "Change detection without exposure"

PRESERVE:
  description: "Keep original value"
  example: "us-central1"
  use_case: "Non-sensitive operational data"

ANONYMIZE:
  description: "Replace with generic placeholder"
  example: "user-1", "team-a"
  use_case: "Personal information requiring privacy"

12. Per-Client Configuration

Client Risk Profiles

High Security (FinTech, Healthcare):
- Redact all private IPs
- Redact all email addresses
- Mask all connection strings
- No environment variables exposed

Medium Security (General Enterprise):
- Preserve internal IPs (10.x.x.x shown as subnet)
- Preserve internal email domains
- Redact database passwords only
- Show sanitized environment variable names

Low Security (Development, Non-Prod):
- Preserve most infrastructure details
- Redact only critical secrets
- Show configuration structure
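
The three risk profiles could be encoded as data so that sanitization code looks settings up rather than branching on client names. This is a sketch: the field names are hypothetical, only a few of the listed settings are modeled, and treating "Preserve most infrastructure details" as leaving private IPs untouched for the low profile is an interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientPolicy:
    private_ip_action: str       # "redact", "subnet", or "preserve"
    redact_all_emails: bool
    show_env_var_names: bool


PROFILES = {
    "high_security": ClientPolicy("redact", True, False),
    "medium_security": ClientPolicy("subnet", False, True),
    "low_security": ClientPolicy("preserve", False, True),
}


def policy_for(profile: str) -> ClientPolicy:
    """Look up a profile; unknown names fall back to the strictest policy."""
    return PROFILES.get(profile, PROFILES["high_security"])
```

Falling back to the strictest profile means a typo in a client's configuration fails closed rather than open.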

13. Audit Trail Requirements

What to Log for Each Sanitization

log_entry:
  timestamp: "2025-01-13T10:30:00Z"
  workspace_id: "ws-abc123"
  resource_type: "google_sql_database_instance"
  resource_name: "prod-db"
  attribute_path: "settings.ip_configuration.authorized_networks[0].value"
  original_value_hash: "sha256:..."
  sanitization_action: "MASK"
  new_value: "10.1.x.x"
  rule_matched: "private_ip_masking_v2"
  sensitivity_level: "HIGH"
  client_policy: "medium_security"
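
A log entry with the fields above might be assembled as in this sketch, which hashes the original value so the audit trail never stores it. The function name and keyword parameters are illustrative; `now` is assumed to be timezone-aware when supplied.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def build_log_entry(*, workspace_id: str, resource_type: str, resource_name: str,
                    attribute_path: str, original_value: str, new_value: str,
                    action: str, rule: str, level: str, client_policy: str,
                    now: Optional[datetime] = None) -> dict:
    """Build one audit record; the original value is stored only as a hash."""
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    digest = hashlib.sha256(original_value.encode("utf-8")).hexdigest()
    return {
        "timestamp": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "workspace_id": workspace_id,
        "resource_type": resource_type,
        "resource_name": resource_name,
        "attribute_path": attribute_path,
        "original_value_hash": f"sha256:{digest}",
        "sanitization_action": action,
        "new_value": new_value,
        "rule_matched": rule,
        "sensitivity_level": level,
        "client_policy": client_policy,
    }
```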

14. Compliance Mappings

SOC2 Requirements

  • CC6.1: Redact all credentials and secrets
  • CC6.6: Log all access to sensitive data
  • CC6.7: Encrypt audit logs

GDPR Requirements

  • Article 25: Anonymize personal data (emails, names)
  • Article 32: Pseudonymization of user identifiers

HIPAA Requirements

  • §164.514: De-identification of PHI (if in infrastructure metadata)

15. Testing & Validation

Test Cases for Sensitivity Detection

Test Suite:
- Private key in google_service_account_key: MUST DETECT
- Password in SQL user resource: MUST DETECT
- Private IP in compute instance: MUST DETECT
- Environment variable "DATABASE_URL" with credentials: MUST DETECT
- Public IP in compute address: MUST NOT TRIGGER
- Label "environment: production": MUST NOT TRIGGER
- KMS key name reference: MUST NOT TRIGGER (reference, not key itself)

False Positive Prevention

Safe Patterns:
- KMS key names: "projects/.../keyRings/.../cryptoKeys/..."
- Secret Manager references: "projects/.../secrets/.../versions/latest"
- Network references: "projects/.../global/networks/..."
- Public IPs: "34.x.x.x" (safe to expose)
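
The reference-style safe patterns above could be checked before any redaction rule runs, as in this sketch. Each elided "..." segment is treated as a wildcard rather than filled in, and the names `SAFE_REFERENCES` and `is_safe_reference` are assumptions.

```python
import re

# Resource references, not secret values; "..." segments become \S+ wildcards.
SAFE_REFERENCES = [
    re.compile(r"projects/\S+/keyRings/\S+/cryptoKeys/\S+"),
    re.compile(r"projects/\S+/secrets/\S+/versions/\S+"),
    re.compile(r"projects/\S+/global/networks/\S+"),
]


def is_safe_reference(value: str) -> bool:
    """True if the whole value is a KMS, Secret Manager, or network reference."""
    return any(pattern.fullmatch(value) for pattern in SAFE_REFERENCES)
```

Using `fullmatch` keeps a secret that merely contains a reference-like substring from being whitelisted.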

16. Integration with External Tools

  1. terraform-compliance: Policy-as-code testing
  2. tfsec: Static analysis for security issues
  3. Checkov: Policy-based scanning
  4. TruffleHog: Secret detection
  5. Gitleaks: Credential scanning

Custom Detection Engine

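from typing import Any


class SensitiveDataDetector:
    """Skeleton only: ClientConfig, load_rules, and SensitivityLevel are not defined in this document."""

    def __init__(self, config: "ClientConfig"):
        self.rules = load_rules(config)
        self.entropy_threshold = 4.5  # Shannon entropy, in bits per character

    def detect(self, attribute_path: str, value: Any) -> "SensitivityLevel":
        # Steps follow the priority order in section 10.
        # 1. Exact attribute name match
        # 2. Regex pattern match
        # 3. Semantic context analysis
        # 4. Entropy analysis
        raise NotImplementedError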

Version History

  • v1.0 (2025-01-13): Initial comprehensive taxonomy
  • v1.1 (TBD): Add Kubernetes secrets detection
  • v1.2 (TBD): Add multi-region compliance rules
