GitOps, Terraform, and ArgoCD all share the same promise: declare your desired state in code, and the tooling makes it real. Your code is the source of truth.

Except it is not. Not really.

Your HPA scaled replicas at 2am because traffic spiked. Your security scanner patched a vulnerable image. Your cost optimizer right-sized an instance. Your on-call engineer hotfixed a config during an incident. None of them opened a PR. None of them asked your Terraform code for permission.

We have managed this tension for years. Drift detection, terraform import, cultural discipline (“do not touch the console”). It worked well enough when the actors were mostly human and changes happened at human speed.

That is changing. Here is what breaks when it does.

The Mental Model We All Share

Whether you are using Terraform, Pulumi, CloudFormation, ArgoCD, or Flux, the underlying model is the same:

  • Code is authoritative. If there is disagreement between your config and what exists in the cloud, the config is right.
  • State converges to code. The reconciler’s job is to make reality match the declaration.
  • Drift is aberrant. Out-of-band changes are mistakes to be corrected.
  • Changes flow one direction. Code to cloud. The PR is the entry point.

This model is elegant. Desired state vs actual state. Idempotent operations. Version control, code review, audit trails. Reproducibility.

But it assumes a single writer—or at least a small number of coordinated writers who all go through the same PR process.
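
Strip away the tooling differences and the loop underneath looks roughly like this. A deliberately crude sketch in Python, with made-up function names; no real tool is this naive, but the assumption is the same:

```python
# A deliberately crude sketch of the reconciliation loop that Terraform,
# ArgoCD, Flux, and friends all embody in some form. The function names
# are made up; no real tool is this naive, but the assumption is the same.

import time

def reconcile(desired_state: dict, observe, apply) -> None:
    """Make reality match the declaration, forever."""
    while True:
        actual_state = observe()              # read the cloud or the cluster
        if actual_state != desired_state:     # any difference is "drift"...
            apply(desired_state)              # ...and drift gets overwritten
        time.sleep(30)
```

The loop has no notion of why actual and desired diverged. Every difference is drift, and drift gets overwritten.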

The Cracks Were Always There

Drift has always existed. We just did not talk about it much.

Console edits during incidents. Autoscalers doing their job. Operators mutating resources—cert-manager rotating certs, external-dns updating records. Cloud provider defaults that silently change. Security patches applied through a different pipeline.

That terraform import you ran last month? That was a quiet admission that state escaped your code.

We built workarounds:

  • Terraform: State locking, terraform refresh, drift detection in Terraform Cloud
  • GitOps: Continuous reconciliation—just overwrite whatever changed
  • Policy enforcement: OPA/Gatekeeper to reject “bad” configurations at admission time
  • Culture: “Do not touch the console. Always go through the PR.”

These workarounds held because the conditions allowed it. Actors were few. Velocity was human-paced. When things got tangled, someone could untangle them manually.

Here is the uncomfortable truth: the cloud provider’s API was always the real source of truth. Your IaC is a projection of intent. When terraform plan shows changes, it does not mean your code is right. It means reality diverged.

Kubernetes acknowledged this with Server-Side Apply. It introduced explicit field ownership through managedFields—tracking which actor last touched which field. When two actors try to modify the same field, you get a conflict. It is an admission that multi-actor is real, not aberrant.
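
To make the ownership idea concrete, here is a toy model of it. This is not the real managedFields machinery, just the shape of the idea: every field remembers the manager that last applied it, and a second manager writing the same field must either fail or force.

```python
# Toy model of Server-Side-Apply-style field ownership. This is not the real
# Kubernetes managedFields machinery, just the shape of the idea.

class FieldConflict(Exception):
    pass

class OwnedObject:
    def __init__(self):
        self.fields = {}    # field path -> current value
        self.managers = {}  # field path -> manager that last applied it

    def apply(self, manager: str, updates: dict, force: bool = False) -> None:
        for path, value in updates.items():
            owner = self.managers.get(path)
            if owner not in (None, manager) and not force:
                raise FieldConflict(f"{path} is owned by {owner}, not {manager}")
            self.fields[path] = value
            self.managers[path] = manager

deploy = OwnedObject()
deploy.apply("argocd", {"spec.replicas": 3, "spec.template.image": "app:v1"})
deploy.apply("hpa-controller", {"spec.replicas": 7}, force=True)  # conflicts unless forced
```

Note what this gives you and what it does not: you learn who owns a field and that two actors collided. The only resolutions on offer are fail or force.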

What Is Changing

It is not just autoscalers anymore.

Security scanners automatically patch vulnerable images. Cost optimization platforms right-size resources. Compliance tools enforce encryption and tagging. Self-healing systems remediate issues without human intervention. Platform teams build internal abstractions that generate and apply IaC programmatically.

And increasingly, AI-assisted workflows—autonomous remediation, LLM-driven ops—add more actors to the mix.

Each of these has opinions about what the infrastructure should look like. Each makes changes.

The coordination problem scales non-linearly:

  • 2 actors: Manageable with conventions. “HPA owns replicas, you own everything else.”
  • 5 actors: Need explicit ownership boundaries and documentation.
  • 20 actors: Humans cannot track the interactions. Need automated coordination.
  • N actors at machine speed: The model breaks.

Think about what breaks:

  • PR review assumes human-speed changes. Cannot review fifty automated modifications per hour.
  • “Do not touch the console” assumes humans are the problem. Irrelevant when changes come from authorized automated systems.
  • Continuous reconciliation assumes Git should always win. Fails when overwriting undoes a valid security patch.
  • State locking assumes sequential operations. Does not help with concurrent actors.

This forces us to confront two distinct problems we have been conflating.

Problem 1: Reconciliation

The first problem is mechanical: how do we detect drift and sync changes back to IaC?

This is about observation and translation:

  • Where does authoritative state live? (The cloud API.)
  • How do we observe all changes, regardless of source?
  • How do we update our code to reflect reality without breaking its structure?

There is real progress here. A recent paper called NSync takes an interesting approach. The key insight: all infrastructure changes—console, CLI, SDK, Terraform—become cloud API calls. AWS CloudTrail, Azure Activity Logs, GCP Audit Logs see everything.

NSync uses these audit logs to detect drift, then uses LLMs to infer high-level intent from noisy API traces. A flurry of API calls becomes “someone added an S3 bucket with versioning enabled.” It then synthesizes IaC patches that preserve the existing code’s structure. The results look promising—0.97 accuracy at pass@3.
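
The enabling observation is easy to check yourself. Whatever made the change, console click, CLI, SDK, or Terraform, it lands in the audit log. A minimal sketch with boto3 (the bucket name and time window are placeholders; this is the raw material such tools start from, not NSync's implementation):

```python
# List recent write operations against one resource from CloudTrail: who
# changed it, when, with which API call. Bucket name and window are
# placeholders; adapt to your own account.

import json
from datetime import datetime, timedelta, timezone

import boto3

cloudtrail = boto3.client("cloudtrail")

response = cloudtrail.lookup_events(
    LookupAttributes=[
        {"AttributeName": "ResourceName", "AttributeValue": "my-app-bucket"}
    ],
    StartTime=datetime.now(timezone.utc) - timedelta(days=1),
    EndTime=datetime.now(timezone.utc),
)

for event in response["Events"]:
    detail = json.loads(event["CloudTrailEvent"])
    if not detail.get("readOnly", False):  # skip Describe*/List*/Get* noise
        print(
            event["EventTime"],
            event["EventName"],
            detail.get("userIdentity", {}).get("arn", "unknown"),
        )
```

Turning that trace into “someone enabled versioning on this bucket,” and then into a patch that respects your existing code’s structure, is the part NSync hands to an LLM.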

Other tools work in this space too. Driftctl detects drift between Terraform state and cloud reality. Terraform Cloud has drift detection. GitOps tools reconcile continuously.

The mechanical problem—“what changed?”—is being solved.

But reconciliation answers “what changed.” It does not answer “should we accept it?”

Problem 2: Intent Arbitration

The second problem is harder: when multiple actors make conflicting changes, whose intent wins?

This is not mechanical. It is semantic. It is about policy.

Some examples:

  • HPA vs cost optimizer: HPA scales replicas up for availability. Cost optimizer scales them down for budget. Both are correct within their own logic. Who wins?
  • Security patch vs app stability: Security scanner patches a vulnerable image. The patch breaks compatibility. Whose priority?
  • Hotfix vs declared state: On-call engineer hotfixes a config at 3am. Morning’s terraform apply reverts it. Was the hotfix wrong, or the code?
  • Compliance vs developer intent: Compliance tool enforces encryption. Developer explicitly disabled it for a test environment with synthetic data. Which intent wins?

Look at what we have today:

  • Kubernetes SSA: Tracks who touched each field, not why. Conflict resolution is “force” or “fail.”
  • OPA/Gatekeeper: Binary allow/deny. Cannot arbitrate between two valid intents.
  • NSync: Assumes out-of-band changes should be accepted. Does not question validity.
  • Terraform/Pulumi: Your code is always correct. apply overwrites.
  • GitOps: Git always wins.

What is missing:

  • Intent metadata for automated changes. For human-driven changes, Git already captures intent—commit messages, PR descriptions, linked tickets. But automated changes bypass Git entirely. And even when intent is captured, it is prose for humans, not structured data a system can reason about. We need machine-readable intent that can be compared and arbitrated programmatically.
  • Arbitration logic. Priority hierarchies. Contextual rules. Negotiation protocols. Something beyond “last writer wins” or “code always wins.” A rough sketch of what this could look like follows this list.
  • Feedback loops. Learning which intents matter in which contexts. Incident response might trump cost optimization at 3am but not at 3pm.
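
None of this exists off the shelf, but it is not hard to sketch what the primitives could look like. Everything below is hypothetical, the names included; the point is that intent becomes data a system can compare, and arbitration becomes explicit policy rather than an accident of write ordering.

```python
# Hypothetical sketch: machine-readable intent plus an explicit arbitration
# rule. None of these names correspond to an existing tool or API.

from dataclasses import dataclass

@dataclass
class Intent:
    actor: str      # who wants the change
    field: str      # what they want to change
    value: object   # what they want it to be
    reason: str     # machine-readable category, not prose
    priority: int   # baseline importance in the hierarchy

def arbitrate(a: Intent, b: Intent, context: dict) -> Intent:
    """Pick a winner between two conflicting intents. Policy, not mechanism."""
    # Contextual rule: during an active incident, availability outranks cost.
    if context.get("incident_active"):
        for intent in (a, b):
            if intent.reason == "availability":
                return intent
    # Otherwise fall back to the declared priority hierarchy.
    return a if a.priority >= b.priority else b

hpa = Intent("hpa", "spec.replicas", 10, "availability", priority=70)
optimizer = Intent("cost-optimizer", "spec.replicas", 3, "cost", priority=50)

winner = arbitrate(hpa, optimizer, context={"incident_active": True})
print(f"{winner.actor} wins: {winner.field} = {winner.value}")
```

Even a sketch this crude makes the 3am example expressible: incident response beats cost optimization because the context says so, not because it happened to write last.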

We can record who changed what. We can infer what the change was. We cannot decide whose intent should win.

What We Already Solve

It is worth acknowledging that we have solved related problems.

Resource contention is well understood. OS scheduling uses priority levels, nice values, and cgroups to decide which process gets CPU time. Kubernetes has PriorityClasses and QoS tiers to determine which pods get resources—and which get evicted first under pressure. When there is not enough CPU or memory for everyone, we have clear mechanisms to arbitrate.

Task allocation is well studied too. Multi-agent robotics research has spent decades on how autonomous agents divide work, avoid collisions, and coordinate movement. Distributed systems literature covers job scheduling, work queues, and load balancing.

Concurrent access to shared data has mature solutions. Locks, transactions, isolation levels. When two actors want to modify the same row, we have strategies—optimistic locking, pessimistic locking, conflict detection.
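
For contrast, the classic optimistic-concurrency pattern fits in a few lines (a generic sketch, not any particular database's API): remember the version you read, and refuse the write if someone else got there first.

```python
# Optimistic locking in miniature: a version check turns a silent overwrite
# into a detectable conflict. Generic sketch, not any specific database API.

class StaleWrite(Exception):
    pass

class Record:
    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def write(self, new_value, expected_version: int) -> None:
        if expected_version != self.version:  # someone else wrote since we read
            raise StaleWrite("re-read the current value and decide again")
        self.value = new_value
        self.version += 1

row = Record({"replicas": 3})
value, version = row.read()
row.write({"replicas": 10}, expected_version=version)      # succeeds, bumps version

try:
    row.write({"replicas": 5}, expected_version=version)   # stale: version moved on
except StaleWrite as conflict:
    print("conflict detected:", conflict)
```

It tells you that a conflict happened. It says nothing about which of the two writes deserved to win.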

But none of these solve the intent arbitration problem.

Resource contention answers “who gets the CPU.” Task allocation answers “who does which job.” Concurrent access answers “how do we avoid corrupting shared data.”

None of them answer: “HPA wants replicas=10 for availability, cost optimizer wants replicas=3 for budget—which intent should the system adopt?”

That is not a scheduling problem. Both actors have valid authority. Both changes are well-formed. The conflict is semantic: two legitimate intents about what the desired state should be.

The Gap

The one domain that explicitly tackles this is Intent-Based Networking. The networking world assumed multi-actor from the start—routing protocols, SDN controllers, multiple tenants with overlapping policies. RFC 9315 defines a formal intent lifecycle, and there is active research on detecting and resolving conflicts between intents.

Infrastructure has not caught up. Our tools assume single-writer. When that assumption breaks, we do not have a fallback—just “last writer wins” or “code always wins.”

Where This Leaves Us

The point here is not to propose solutions. It is to name the problem clearly.

We have been conflating two distinct challenges:

  1. Reconciliation—the mechanical problem of detecting and syncing drift. This is being solved. Real progress.
  2. Intent arbitration—the semantic problem of deciding whose intent wins when actors conflict. This is largely unsolved for infrastructure.

We have mature solutions for resource contention, task allocation, and concurrent access. We do not have solutions for competing intents over shared desired state. The tools we have—Terraform, Pulumi, ArgoCD, SSA—were built for a world of human operators making deliberate changes through controlled processes. They do not have a theory of competing intents.

Agents are going to force the issue. When you have autonomous systems making changes at machine speed, “just overwrite” and “Git always wins” stop being viable strategies.

The source of truth is not a file. It is a negotiation.