Why Terraform Rules Are the Most Critical AI Rules You'll Write
Terraform is where AI coding mistakes cost real money — immediately. An AI-generated EC2 instance with the wrong type runs up a $10,000/month bill. An S3 bucket without access controls exposes customer data. A security group with 0.0.0.0/0 ingress opens your infrastructure to the internet. And unlike application code, where mistakes surface as bugs, Terraform mistakes surface as infrastructure incidents.
AI assistants generate Terraform that 'works' in the same way they generate application code that 'works' — it creates the resources you asked for without considering security, cost, or operational implications. The AI creates a public subnet because you said 'create a subnet,' not because you wanted it public.
These rules are the safety net between an AI suggestion and your production infrastructure. They're arguably more important than any application-level AI rules because the blast radius is larger and the consequences are immediate.
Rule 1: Security Defaults
The rule: 'All resources are private by default. S3 buckets: block_public_access = true, no public ACLs. Security groups: no ingress from 0.0.0.0/0 except port 443 for load balancers. RDS instances: publicly_accessible = false, encrypted = true. IAM policies: least privilege — never use "*" for actions or resources. KMS encryption on all data at rest.'
For network security: 'VPC with private subnets for compute, public subnets only for load balancers. NAT gateway for outbound internet from private subnets. No direct SSH access — use SSM Session Manager. All inter-service communication over private networking.'
The AI's default is the cloud provider's default — which is often permissive. Your rules override these defaults with secure baselines. Every resource the AI creates should be locked down unless explicitly opened.
- S3: block_public_access, encryption, versioning enabled by default
- Security groups: deny all ingress, allow only specific ports from specific sources
- RDS: private, encrypted, multi-AZ, automated backups
- IAM: least privilege, no wildcard actions/resources, MFA for humans
- Network: private subnets for compute, public only for ALB, no direct SSH
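A minimal sketch of what these defaults look like in HCL. The bucket and security group names are illustrative, and `aws_security_group.api` / `aws_security_group.alb` are assumed to exist elsewhere in the configuration:

```hcl
# Private, encrypted, versioned S3 bucket — the secure baseline from the rule above.
resource "aws_s3_bucket" "data" {
  bucket = "myapp-prod-data"
}

resource "aws_s3_bucket_public_access_block" "data" {
  bucket                  = aws_s3_bucket.data.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_server_side_encryption_configuration" "data" {
  bucket = aws_s3_bucket.data.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_s3_bucket_versioning" "data" {
  bucket = aws_s3_bucket.data.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Security group ingress: HTTPS only, and only from the ALB's security group —
# never 0.0.0.0/0.
resource "aws_security_group_rule" "api_https" {
  type                     = "ingress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  security_group_id        = aws_security_group.api.id
  source_security_group_id = aws_security_group.alb.id
}
```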
Rule 2: Module Design and Reuse
The rule: 'Use modules for all reusable infrastructure patterns. Never inline resources that follow a common pattern — extract to a module. Modules live in modules/ directory with: main.tf (resources), variables.tf (inputs), outputs.tf (outputs), versions.tf (provider constraints). Pin module versions in the calling code.'
For module interfaces: 'Variables have descriptions, types, and validation blocks. Use defaults for optional parameters. Outputs expose only what consumers need — not internal resource attributes. Use object types for complex inputs: variable "config" { type = object({ name = string, size = number }) }.'
For sourcing: 'Use a private Terraform registry or Git-based module sources for team modules. Pin to specific tags (ref=v1.2.3), not branches. For public modules, pin exact versions in the Terraform registry.'
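Putting the interface and sourcing rules together, a module variable and its pinned call site might look like this (the module path, Git URL, and values are hypothetical):

```hcl
# variables.tf inside a hypothetical modules/database module:
# typed object input with a description and a validation block.
variable "config" {
  description = "Database instance configuration."
  type = object({
    name = string
    size = number
  })

  validation {
    condition     = var.config.size >= 20
    error_message = "Allocated storage (size) must be at least 20 GB."
  }
}

# Calling code: Git source pinned to a specific tag, never a branch.
module "database" {
  source = "git::https://example.com/terraform-modules.git//database?ref=v1.2.3"

  config = {
    name = "myapp-prod-db"
    size = 100
  }
}
```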
Rule 3: State Management and Backend
The rule: 'Never use local state in shared infrastructure. Use a remote backend: S3 + DynamoDB for AWS, GCS for GCP, Azure Blob for Azure. Enable state locking to prevent concurrent modifications. Enable state encryption at rest. Separate state files per environment (dev/staging/prod) and per domain (networking/compute/database).'
For state safety: 'Never manually edit terraform.tfstate — use terraform state commands. Use terraform plan before every apply — review the plan output for unexpected destroys or replacements. Use lifecycle { prevent_destroy = true } on critical resources (databases, storage). Use moved blocks for refactoring — never delete and recreate stateful resources.'
AI assistants generate Terraform without backend configuration — defaulting to local state. One developer running apply with local state while another has different local state creates state conflicts that can destroy resources.
Add lifecycle { prevent_destroy = true } on databases, storage buckets, and any stateful resource. This stops `terraform destroy` from accidentally deleting production data.
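A sketch of the backend and lifecycle rules for AWS — bucket, table, and resource names are illustrative, and the database resource's other arguments are elided:

```hcl
# backend.tf — remote state in S3 with DynamoDB locking, one key per
# environment/domain, encrypted at rest.
terraform {
  backend "s3" {
    bucket         = "myapp-terraform-state"
    key            = "prod/database/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

# Guard stateful resources against accidental destroys.
resource "aws_db_instance" "primary" {
  # ... instance configuration ...

  lifecycle {
    prevent_destroy = true
  }
}

# Refactoring: record the rename in state instead of destroy-and-recreate.
moved {
  from = aws_db_instance.main
  to   = aws_db_instance.primary
}
```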
Rule 4: Naming Conventions and Tagging
The rule: 'All resources follow the naming pattern: {project}-{environment}-{component}-{resource}. Examples: myapp-prod-api-alb, myapp-staging-db-primary. All resources must have tags: Name, Environment, Project, Team, ManagedBy=terraform. Use a common tags local and merge with resource-specific tags.'
For tagging strategy: 'Tags are required, not optional. They drive cost allocation, access control, and operational visibility. Define a minimum tag set in a locals block and apply to every resource with default_tags in the provider configuration. CI should fail if resources are missing required tags.'
Consistent naming and tagging are where Terraform rules provide the most operational value. Without them, you can't answer basic questions: 'How much does the staging API cost?' 'Which team owns this security group?' 'Is this resource managed by Terraform or created manually?'
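One way to express this — a common tags local applied provider-wide via `default_tags`, with `merge` for resource-specific additions (all values are illustrative):

```hcl
# Minimum tag set, defined once.
locals {
  common_tags = {
    Environment = "prod"
    Project     = "myapp"
    Team        = "platform"
    ManagedBy   = "terraform"
  }
}

# default_tags applies the common set to every taggable resource
# this provider creates.
provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = local.common_tags
  }
}

# Resource-specific tags are merged on top, following the
# {project}-{environment}-{component}-{resource} naming pattern.
resource "aws_lb" "api" {
  name = "myapp-prod-api-alb"
  tags = merge(local.common_tags, { Name = "myapp-prod-api-alb" })
}
```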
Rule 5: Cost Control and Right-Sizing
The rule: 'Never use instance types larger than needed for the environment. Dev/staging environments use smaller instances than production. Use spot instances or preemptible VMs for non-critical workloads. Enable auto-scaling with appropriate min/max bounds. Set budget alerts on every AWS account or GCP project. Run infracost in CI to estimate cost changes before apply.'
For resource lifecycle: 'Dev environments should have auto-shutdown schedules (evenings, weekends). Use terraform destroy for temporary environments. Set TTL tags on ephemeral resources. Review and right-size monthly — cloud costs drift upward without active management.'
AI assistants default to generous instance sizes because they optimize for functionality, not cost. A rule like 'Dev uses t3.small, staging uses t3.medium, prod uses t3.large for API servers' prevents $500/month surprises from an AI that chose m5.4xlarge 'to be safe.'
- Right-size per environment: dev < staging < prod instance types
- Spot/preemptible for non-critical workloads — significant cost savings
- Auto-scaling with min/max bounds — not fixed instance counts
- infracost in CI — estimate cost changes before terraform apply
- Dev auto-shutdown schedules — TTL tags on ephemeral resources
- Monthly right-sizing review — cloud costs drift upward without attention
Run infracost in CI to estimate cost changes before terraform apply. A PR that adds an m5.4xlarge 'to be safe' shows up as +$500/month in the cost comment — visible before anyone approves.
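The per-environment sizing rule can be encoded as a lookup map so neither the AI nor a human can quietly pick an oversized type. The instance types follow the example rule above; the AMI variable is a placeholder:

```hcl
variable "environment" {
  type = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

variable "api_ami_id" {
  type = string # assumed input — supplied by the caller
}

# dev < staging < prod, per the rule — anything else fails the lookup.
locals {
  api_instance_type = {
    dev     = "t3.small"
    staging = "t3.medium"
    prod    = "t3.large"
  }[var.environment]
}

resource "aws_instance" "api" {
  ami           = var.api_ami_id
  instance_type = local.api_instance_type
}
```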
Complete Terraform Rules Template
Consolidated rules for Terraform. These apply to any cloud provider — adjust resource names for AWS, GCP, or Azure.
- All resources private by default — explicit rules required to make anything public
- Modules for reusable patterns — pinned versions, typed variables, minimal outputs
- Remote state with locking — separate state per environment and domain
- lifecycle { prevent_destroy } on databases and storage — moved blocks for refactoring
- Naming: {project}-{env}-{component}-{resource} — required tags on every resource
- Right-sized per environment — infracost in CI — budget alerts on accounts
- terraform plan before every apply — no auto-approve except in fully automated pipelines
- tflint + checkov in CI — security and best practice scanning on every PR