Why CloudFormation Rules Matter
CloudFormation is AWS's native infrastructure-as-code service: it manages the lifecycle of your AWS resources through declarative JSON or YAML templates. AI assistants generate templates that create resources correctly but routinely skip production safety patterns: no DeletionPolicy on stateful resources, IAM policies with wildcard permissions, hardcoded values instead of parameters, and no stack update protection.
The stakes are high: a CloudFormation stack update that accidentally replaces an RDS instance can destroy your production database. An IAM role that allows Action: '*' on Resource: '*' gives any compromised workload that assumes it full access to the AWS account. And unlike application bugs that can be hotfixed, infrastructure destruction often means data loss and hours of recovery.
These rules complement the Terraform rules — many teams use both (CloudFormation for AWS-native resources, Terraform for multi-cloud). The principles overlap but the syntax and patterns are CloudFormation-specific.
Rule 1: DeletionPolicy and UpdateReplacePolicy
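The rule: 'Every stateful resource (RDS, DynamoDB, S3, EFS, ElastiCache) must have DeletionPolicy: Retain or DeletionPolicy: Snapshot. Every stateful resource must also have UpdateReplacePolicy: Retain so that the old physical resource and its data survive when a stack update replaces it. UpdateReplacePolicy does not prevent the replacement itself; it only controls what happens to the resource being replaced. Never rely on the default DeletionPolicy (Delete for most resource types) for any resource that holds data.'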
For snapshots: 'RDS instances use DeletionPolicy: Snapshot — CloudFormation takes a final snapshot before deletion. S3 buckets use DeletionPolicy: Retain — buckets must be emptied manually before deletion. DynamoDB tables use DeletionPolicy: Retain with point-in-time recovery enabled.'
This is the most critical CloudFormation rule. Under the default policy of Delete, deleting the stack, removing the resource from the template, or triggering a replacement during an update destroys the resource and the data it holds. The AI rarely adds DeletionPolicy because it's an optional attribute, but it's the difference between a safe operation and permanent data loss.
Without DeletionPolicy, deleting a stack deletes your database with zero recovery. DeletionPolicy: Retain on every stateful resource is the single most important CloudFormation rule.
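As an illustration of the rule above, here is a minimal sketch of the two attributes on three stateful resources. The logical IDs, instance class, username, and key schema are hypothetical placeholders, not values from any real stack.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Sketch of deletion safeguards on stateful resources (all names are hypothetical).

Resources:
  # RDS: take a final snapshot on deletion; keep the old instance if an update replaces it.
  AppDatabase:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Snapshot
    UpdateReplacePolicy: Retain
    Properties:
      Engine: postgres
      DBInstanceClass: db.t3.medium
      AllocatedStorage: "20"
      MasterUsername: appadmin
      ManageMasterUserPassword: true

  # S3: the bucket and its objects are left in place when the stack is deleted.
  UploadsBucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain
    UpdateReplacePolicy: Retain

  # DynamoDB: retain the table and enable point-in-time recovery.
  OrdersTable:
    Type: AWS::DynamoDB::Table
    DeletionPolicy: Retain
    UpdateReplacePolicy: Retain
    Properties:
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: orderId
          AttributeType: S
      KeySchema:
        - AttributeName: orderId
          KeyType: HASH
      PointInTimeRecoverySpecification:
        PointInTimeRecoveryEnabled: true
```

Note that both attributes sit at the resource level, next to Type, not inside Properties. Retained resources keep running and keep costing money after the stack is gone, so they need manual cleanup.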
Rule 2: IAM Least Privilege
The rule: 'Never use Action: "*" or Resource: "*" in IAM policies. Specify exact actions needed: s3:GetObject, s3:PutObject — not s3:*. Specify exact resource ARNs: arn:aws:s3:::my-bucket/* — not *. Use condition keys to further restrict: aws:SourceVpc, aws:PrincipalOrgID. Every IAM role should have a clear purpose documented in its Description property.'
For service roles: 'Lambda execution roles need only the permissions the function uses: logs:CreateLogGroup, logs:CreateLogStream, logs:PutLogEvents, and specific service access. ECS task roles need only the services the container calls. Never attach AdministratorAccess or PowerUserAccess to service roles.'
AI assistants generate permissive IAM because it makes the template work immediately — no permission denied errors to debug. But Action: '*' in a production IAM role is a critical security vulnerability. Your rules enforce precision from the start.
- Never Action: '*' or Resource: '*' — specify exact actions and ARNs
- Condition keys: aws:SourceVpc, aws:PrincipalOrgID for additional restriction
- Service roles: minimum permissions per function/container
- No AdministratorAccess or PowerUserAccess on service roles — ever
- Description on every role explaining its purpose
Action: '*' in an IAM policy allows every action in every AWS service on the resources the statement covers. AI uses it because it eliminates permission errors. Your rule: exact actions, exact ARNs, always.
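A least-privilege Lambda execution role might look like the sketch below. The function name upload-handler and the role's logical ID are hypothetical, and the bucket name is taken from the rule text. It also assumes the function's log group is created elsewhere in the stack, which is why logs:CreateLogGroup is left out. Condition keys are omitted to keep the example short.

```yaml
Resources:
  UploadHandlerRole:
    Type: AWS::IAM::Role
    Properties:
      Description: Execution role for the upload-handler Lambda; reads and writes objects in one bucket.
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          # Only the Lambda service may assume this role.
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: upload-handler-access
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              # Write to this function's own log group only (assumed to exist already).
              - Sid: WriteOwnLogs
                Effect: Allow
                Action:
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: !Sub "arn:${AWS::Partition}:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/upload-handler:*"
              # Object-level access to a single bucket; no bucket-level or account-wide actions.
              - Sid: ReadWriteUploads
                Effect: Allow
                Action:
                  - s3:GetObject
                  - s3:PutObject
                Resource: arn:aws:s3:::my-bucket/*
```

Each statement names the actions and the ARN it applies to, so a reviewer can tell at a glance what the function can touch.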
Rule 3: Parameters Over Hardcoding
The rule: 'Never hardcode values that vary between environments: instance types, CIDR blocks, domain names, account IDs, AMI IDs. Use Parameters with AllowedValues for constrained choices, Default for common values, and Description for documentation. Use Mappings for environment-specific values: Mappings: { EnvConfig: { prod: { InstanceType: m5.large }, dev: { InstanceType: t3.small } } }.'
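The parameter and mapping pattern described above can be sketched as follows. The parameter names Environment and AmiId and the logical ID AppInstance are assumptions for illustration; the mapping values come from the rule text.

```yaml
Parameters:
  Environment:
    Type: String
    Description: Deployment environment; selects values from the EnvConfig mapping.
    AllowedValues: [dev, prod]
    Default: dev
  AmiId:
    # The AWS-specific parameter type makes CloudFormation validate the ID before creating anything.
    Type: AWS::EC2::Image::Id
    Description: AMI to launch; supplied per region and environment instead of being hardcoded.

Mappings:
  EnvConfig:
    prod:
      InstanceType: m5.large
    dev:
      InstanceType: t3.small

Resources:
  AppInstance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !Ref AmiId
      # Look up the instance type for the chosen environment.
      InstanceType: !FindInMap [EnvConfig, !Ref Environment, InstanceType]
```

AllowedValues rejects a mistyped environment at stack creation time rather than failing later inside FindInMap.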
For secrets: 'Never put secrets in CloudFormation parameters: plain parameter values are visible in the console and API, and NoEcho only masks them in those views. Use AWS Secrets Manager or SSM Parameter Store with dynamic references: {{resolve:secretsmanager:my-secret:SecretString:password}}. For database passwords, use ManageMasterUserPassword: true (RDS manages the secret automatically).'
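Both approaches to secrets are sketched below. The logical IDs, engine, and username are hypothetical; the secret name my-secret and its password key are the ones used in the rule text and are assumed to exist already in Secrets Manager.

```yaml
Resources:
  # Preferred for RDS: the service creates and stores the master password in Secrets Manager.
  ManagedPasswordDatabase:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Snapshot
    UpdateReplacePolicy: Retain
    Properties:
      Engine: postgres
      DBInstanceClass: db.t3.medium
      AllocatedStorage: "20"
      MasterUsername: appadmin
      ManageMasterUserPassword: true

  # Alternative: resolve an existing secret at deploy time with a dynamic reference.
  ReferencedPasswordDatabase:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Snapshot
    UpdateReplacePolicy: Retain
    Properties:
      Engine: postgres
      DBInstanceClass: db.t3.medium
      AllocatedStorage: "20"
      MasterUsername: appadmin
      # Quoted because a YAML scalar starting with "{" would otherwise parse as a flow mapping.
      MasterUserPassword: "{{resolve:secretsmanager:my-secret:SecretString:password}}"
```

In neither case does the password appear in the template, in a parameter, or in version control.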
For reusability: 'Use Conditions for optional resources: create a NAT Gateway only in production, skip it in dev. Use Fn::If in resource properties for environment-specific values. This makes one template work across all environments — not separate templates per environment.'
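A minimal sketch of the condition pattern, assuming a hypothetical Environment parameter and an existing public subnet passed in as PublicSubnetId. The log group and its retention values are illustrative only.

```yaml
Parameters:
  Environment:
    Type: String
    AllowedValues: [dev, prod]
    Default: dev
  PublicSubnetId:
    Type: AWS::EC2::Subnet::Id
    Description: Public subnet that hosts the NAT gateway in production.

Conditions:
  IsProd: !Equals [!Ref Environment, prod]

Resources:
  # Created only when IsProd is true; skipped entirely in dev.
  NatEip:
    Type: AWS::EC2::EIP
    Condition: IsProd
    Properties:
      Domain: vpc

  NatGateway:
    Type: AWS::EC2::NatGateway
    Condition: IsProd
    Properties:
      AllocationId: !GetAtt NatEip.AllocationId
      SubnetId: !Ref PublicSubnetId

  # Always created, but one property differs by environment via Fn::If.
  AppLogGroup:
    Type: AWS::Logs::LogGroup
    DeletionPolicy: Retain
    UpdateReplacePolicy: Retain
    Properties:
      RetentionInDays: !If [IsProd, 365, 14]
```

A resource-level Condition decides whether a resource exists at all, while Fn::If varies a single property on a resource that always exists.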
Rule 4: Nested Stacks and Organization
The rule: 'Break large templates into nested stacks by domain: networking (VPC, subnets, NAT), compute (ECS, Lambda, EC2), database (RDS, DynamoDB), and monitoring (CloudWatch, alarms). Each nested stack has its own template with clear inputs (Parameters) and outputs (Outputs). The parent stack composes nested stacks and passes values between them.'
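A parent stack that composes two domain stacks could be sketched like this. The child template file names and their parameter and output names (VpcCidr, VpcId, PrivateSubnetIds, SubnetIds) are hypothetical and must match whatever the child templates actually declare.

```yaml
Parameters:
  TemplateBaseUrl:
    Type: String
    Description: HTTPS URL of the S3 prefix that holds the child templates.

Resources:
  NetworkStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: !Sub "${TemplateBaseUrl}/network.yaml"
      Parameters:
        VpcCidr: 10.0.0.0/16

  DatabaseStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: !Sub "${TemplateBaseUrl}/database.yaml"
      Parameters:
        # Outputs of one nested stack feed the parameters of another.
        VpcId: !GetAtt NetworkStack.Outputs.VpcId
        SubnetIds: !GetAtt NetworkStack.Outputs.PrivateSubnetIds
```

The GetAtt references also give CloudFormation the dependency order: the network stack is created before the database stack without an explicit DependsOn.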
For cross-stack references: 'Use Outputs with Export for values shared across stacks. Use Fn::ImportValue in consuming stacks. Name exports with a {StackName}-{ResourceName} pattern for clarity. Document all exported values — they create coupling between stacks.'
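The export and import pair might look like the two templates below, shown as two YAML documents in one listing. The file names, logical IDs, and the NetworkStackName parameter are assumptions for illustration.

```yaml
# network.yaml (producer stack)
Resources:
  Vpc:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16

Outputs:
  VpcId:
    Description: ID of the shared VPC, consumed by application stacks.
    Value: !Ref Vpc
    Export:
      # Follows the {StackName}-{ResourceName} naming pattern.
      Name: !Sub "${AWS::StackName}-VpcId"
---
# app.yaml (consumer stack)
Parameters:
  NetworkStackName:
    Type: String
    Description: Name of the stack that exports the VPC ID.

Resources:
  AppSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for the application tier.
      VpcId:
        # Long-form Fn::ImportValue so the short-form !Sub can be nested inside it.
        Fn::ImportValue: !Sub "${NetworkStackName}-VpcId"
```

While any stack imports an export, CloudFormation blocks deleting or changing that export, which is the coupling the rule asks you to document.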
For template size: 'CloudFormation has a 500-resource limit per stack. Nested stacks each get their own 500-resource limit. Plan your stack boundaries around domain boundaries, not the resource limit — well-organized stacks rarely hit the limit.'
Rule 5: Safe Stack Updates
The rule: 'Enable termination protection on production stacks: aws cloudformation update-termination-protection --stack-name <stack-name> --enable-termination-protection. Use change sets for all production updates; never run update-stack directly. Review the change set for unexpected replacements (Replacement: True means the resource is destroyed and recreated). Attach a stack policy to prevent updates to critical resources.'
For rollback: 'Keep automatic rollback on failure enabled (the default). Set RollbackConfiguration with monitoring alarms: if a CloudWatch alarm fires during the update, CloudFormation rolls back automatically. Use --disable-rollback only in development for debugging.'
For drift detection: 'Run drift detection monthly on production stacks. Resources modified outside CloudFormation (manual console changes) create drift that causes unexpected behavior on the next update. Detect and reconcile drift before making stack changes.'
- Termination protection on all production stacks
- Change sets for all production updates — never direct update-stack
- Review for Replacement: True; replacement destroys and recreates resources
- Stack policy to prevent updates to critical resources (databases, encryption keys)
- RollbackConfiguration with CloudWatch alarm monitoring
- Monthly drift detection — reconcile before updating
Never run update-stack directly on production. Use change sets to preview every modification. Review for 'Replacement: True', which means the resource is destroyed and recreated.
Complete CloudFormation Rules Template
Consolidated rules for AWS CloudFormation templates.
- DeletionPolicy: Retain/Snapshot on all stateful resources — never default Delete
- UpdateReplacePolicy: Retain on databases and storage
- IAM least privilege: exact actions, exact ARNs, condition keys — never wildcards
- Parameters for all variable values — Secrets Manager for credentials
- Nested stacks by domain: networking, compute, database, monitoring
- Change sets for production — review for unexpected replacements
- Termination protection + stack policies on production stacks
- cfn-lint + cfn-nag in CI — security and best practice scanning