Terraform in Enterprise Environments: The Patterns That Scale

Terraform is easy to start with and hard to do well. In small teams with a monorepo it’s straightforward. Once 10+ teams are provisioning infrastructure, problems emerge that you didn’t see coming.

Here are the patterns I’ve learned in enterprise projects.

The State Problem

Terraform state is at the core of most problems in larger organizations. Anyone who’s run a terraform apply while a colleague was touching the same infrastructure knows the result: at best a state lock error, at worst changes that silently overwrite each other.

The solution is state isolation — but how granular?

Too coarse: one state file for everything. Every plan takes 10 minutes, and a single lock blocks half the team.

Too fine: one state file per resource. The overhead becomes unmanageable, and dependencies between states become unclear.

My approach is to split state files by ownership:

states/
├── networking/        # VPC, Subnets, DNS — rarely changed
├── platform/          # Kubernetes clusters, databases — moderately changed
└── workloads/
    ├── team-a/        # Each team manages its own state
    ├── team-b/
    └── team-c/
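Why ownership-based splitting helps with locking can be sketched with a toy model. The Python below assumes a remote backend that locks each state object separately; the state_key naming scheme and the LockTable class are hypothetical illustrations, not Terraform APIs.

```python
from dataclasses import dataclass, field


def state_key(root: str) -> str:
    """Hypothetical naming scheme: one remote state object per ownership directory."""
    return root.strip("/").removeprefix("states/") + "/terraform.tfstate"


@dataclass
class LockTable:
    """Toy model of backend state locking: at most one holder per state key."""

    held: dict[str, str] = field(default_factory=dict)

    def acquire(self, key: str, who: str) -> bool:
        # Refuse the lock if someone else already holds this state.
        holder = self.held.get(key)
        if holder is not None and holder != who:
            return False
        self.held[key] = who
        return True

    def release(self, key: str, who: str) -> None:
        # Only the current holder can release its lock.
        if self.held.get(key) == who:
            del self.held[key]


# Ownership-based layout: team-a and team-b lock different states, so both proceed.
split = LockTable()
print(split.acquire(state_key("states/workloads/team-a"), "alice"))  # True
print(split.acquire(state_key("states/workloads/team-b"), "bob"))    # True

# One shared state: the second apply has to wait for the first.
mono = LockTable()
print(mono.acquire("everything/terraform.tfstate", "alice"))  # True
print(mono.acquire("everything/terraform.tfstate", "bob"))    # False
```

The lock itself is no finer in the split layout; what changes is that unrelated teams never compete for the same one.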

Modules: When and How

The classic pattern is to wrap everything in modules. This sounds good in theory. In practice you end up with modules nobody dares to touch, because nobody can predict what a change will break for the teams that use them.

Modules make sense for:

Reused patterns (e.g. a standard EKS cluster with predefined defaults)
Compliance requirements that must be enforced centrally
Abstracting differences between cloud providers

Modules are wrong for:

One-off infrastructure
Anything that changes frequently
“Because it looks neater”

A good module has clear inputs, sensible defaults, and hides nothing important:

module "eks_cluster" {
  source  = "git::https://github.com/company/terraform-modules.git//eks?ref=v2.3.0"

  cluster_name = "production"
  node_groups = {
    general = {
      instance_types = ["m5.xlarge"]
      min_size       = 3
      max_size       = 10
    }
  }

  # Compliance defaults set in the module:
  # - encryption at rest: true
  # - private endpoint: true
  # - audit logging: true
}
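The split between overridable defaults and centrally enforced compliance settings can be sketched outside Terraform as well. In this Python sketch, resolve_node_group and every default value are hypothetical; in a real module, enforced settings would typically not be exposed as variables at all.

```python
from typing import Any

# Hypothetical defaults a caller may override (the module's "sensible defaults").
DEFAULTS: dict[str, Any] = {
    "instance_types": ["m5.large"],
    "min_size": 2,
    "max_size": 5,
}

# Hypothetical compliance settings the module enforces; callers cannot change them.
ENFORCED: dict[str, Any] = {
    "encryption_at_rest": True,
    "private_endpoint": True,
    "audit_logging": True,
}


def resolve_node_group(inputs: dict[str, Any]) -> dict[str, Any]:
    """Merge caller inputs over defaults, then apply the enforced compliance settings."""
    locked = sorted(set(inputs) & set(ENFORCED))
    if locked:
        raise ValueError(f"compliance settings cannot be overridden: {locked}")
    unknown = sorted(set(inputs) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"unknown inputs: {unknown}")
    if inputs.get("min_size", DEFAULTS["min_size"]) > inputs.get("max_size", DEFAULTS["max_size"]):
        raise ValueError("min_size must not exceed max_size")
    # Enforced values are merged last, so the result always carries them.
    return {**DEFAULTS, **inputs, **ENFORCED}


cfg = resolve_node_group({"instance_types": ["m5.xlarge"], "min_size": 3, "max_size": 10})
print(cfg["min_size"], cfg["encryption_at_rest"])  # 3 True
```

Rejecting unknown inputs loudly is the "clear inputs" part: a typo in a variable name fails immediately instead of silently falling back to a default.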

Atlantis: What It Changes

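The biggest quality leap in team workflows doesn’t come from better modules; it comes from Atlantis. The principle: Terraform plans and applies no longer run on laptops but on the Atlantis server, triggered by pull requests. Atlantis runs terraform plan when a PR is opened or updated and posts the output as a comment; the apply only happens when someone comments atlantis apply.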

What this changes:

No “works on my machine” — everyone sees the same plan
Review before apply — a second pair of eyes on every infrastructure change
Audit trail — every apply is linked to a PR and a person
No local credentials — engineers no longer need direct AWS/GCP/Azure access; only the Atlantis server holds cloud credentials

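# atlantis.yaml (repo-level)
# The server-side repo config must permit these settings, e.g.
# allowed_overrides: [apply_requirements, workflow] and allow_custom_workflows: true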
version: 3
projects:
  - name: platform
    dir: infrastructure/platform
    workspace: production
    apply_requirements:
      - approved
      - mergeable
    workflow: production

workflows:
  production:
    plan:
      steps:
        - init
        - plan:
            extra_args: ["-var-file=production.tfvars"]
    apply:
      steps:
        - apply
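The apply_requirements gate can be modeled in a few lines. This is a simplified sketch, not Atlantis’s implementation: PullRequest, unmet_requirements and try_apply are hypothetical names, and what “approved” and “mergeable” mean in practice is defined by Atlantis and your VCS host’s branch protection rules.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    number: int
    author: str
    approvers: set[str] = field(default_factory=set)
    mergeable: bool = False  # as reported by the VCS host (checks, conflicts, branch rules)


def unmet_requirements(pr: PullRequest, requirements: list[str]) -> list[str]:
    """Return the requirements this PR does not yet satisfy."""
    checks = {
        # Assumption: "approved" means at least one approval from someone other than the author.
        "approved": bool(pr.approvers - {pr.author}),
        "mergeable": pr.mergeable,
    }
    # Unknown requirement names count as unmet (fail closed).
    return [r for r in requirements if not checks.get(r, False)]


def try_apply(pr: PullRequest, requested_by: str, audit_log: list[str]) -> bool:
    if unmet_requirements(pr, ["approved", "mergeable"]):
        return False
    # Every apply is recorded against a PR and a person.
    audit_log.append(f"PR #{pr.number}: apply by {requested_by}")
    return True


log: list[str] = []
pr = PullRequest(number=42, author="alice")
print(try_apply(pr, "alice", log))  # False: no approval, not mergeable
pr.approvers.add("bob")
pr.mergeable = True
print(try_apply(pr, "alice", log))  # True
print(log)  # ['PR #42: apply by alice']
```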

OPA for Compliance-as-Code

In regulated environments, conventions aren’t enough. Nobody voluntarily follows tagging policies when under time pressure.

OPA (Open Policy Agent) with Conftest turns policies into tests:

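# policies/tagging.rego
# Conftest only evaluates the "main" package unless --namespace is passed.
package main

# Rego v1 syntax (OPA 0.59+ and 1.x)
import rego.v1

deny contains msg if {
  resource := input.resource_changes[_]
  resource.type == "aws_instance"
  # Skip pure deletions: their planned "after" state is null.
  resource.change.actions != ["delete"]
  not resource.change.after.tags.environment

  msg := sprintf(
    "Resource '%s' is missing the 'environment' tag",
    [resource.address]
  )
}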

# In the CI pipeline
terraform plan -out=plan.tfplan
terraform show -json plan.tfplan > plan.json
conftest test plan.json --policy policies/

Policy violations break the build. No exceptions, no “I’ll do it next time.”
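For readers new to Rego, the same tagging check can be sketched in Python against the JSON that terraform show -json produces. The resource_changes, type, address, change.actions and change.after fields come from Terraform’s JSON plan format; tagging_violations and check_plan_file are hypothetical helpers, and Conftest remains the tool that actually gates the build.

```python
import json
from typing import Any


def tagging_violations(plan: dict[str, Any]) -> list[str]:
    """Python analogue of the tagging rule: aws_instance changes without an 'environment' tag."""
    messages = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_instance":
            continue
        change = rc.get("change", {})
        # Pure deletions have after == null and carry no tags to check.
        if change.get("actions") == ["delete"]:
            continue
        tags = (change.get("after") or {}).get("tags") or {}
        if "environment" not in tags:
            messages.append(f"Resource '{rc['address']}' is missing the 'environment' tag")
    return messages


def check_plan_file(path: str) -> int:
    """Return a CI-style exit code: non-zero if any violation is found."""
    with open(path, encoding="utf-8") as f:
        violations = tagging_violations(json.load(f))
    for msg in violations:
        print(msg)
    return 1 if violations else 0


sample_plan = {
    "resource_changes": [
        {"address": "aws_instance.web", "type": "aws_instance",
         "change": {"actions": ["create"], "after": {"tags": {"environment": "prod"}}}},
        {"address": "aws_instance.batch", "type": "aws_instance",
         "change": {"actions": ["create"], "after": {"tags": None}}},
        {"address": "aws_instance.old", "type": "aws_instance",
         "change": {"actions": ["delete"], "after": None}},
    ]
}
print(tagging_violations(sample_plan))
# ["Resource 'aws_instance.batch' is missing the 'environment' tag"]
```

The non-zero return value is what breaks the build: a CI step that exits with it fails exactly when a violation is found.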

Conclusion

Terraform in the enterprise doesn’t scale through more modules or better directory structures. It scales through:

Clear state isolation by ownership
Atlantis for traceable, reviewed applies
Policies as code, not documentation

Everything else is optimization.