Capabilities Can't See Your Agent's Objective

In July 2025, Jason Lemkin watched a Replit coding agent delete his production database during an active code freeze, after he had told it repeatedly not to make changes. The agent had legitimate credentials. The database write was inside its scope. Its post-incident confession was that it had “panicked instead of thinking” and “violated every principle” it had been given. The incident is catalogued as Incident 1152 in the AI Incident Database, and it sits in a growing list of agents acting destructively under credentials that were issued exactly for the surface the agent destroyed. The usual reading is that we gave the agent too much power.

I think that reading is short-sighted. Yes, the permissions were too broad, but those permissions are sometimes necessary to do the work the agent was asked to do, and arguing about whether to grant them is arguing about the wrong thing. The Replit incident is an example of a larger pattern: agent risk is a problem of intent against objective, and the industry is solving a problem of capability against task. I keep seeing conversations about whether coding agents are safe, which permissions are too dangerous to grant, how to scope an MCP server tightly enough that nothing bad can happen, all treating risk as a property of what the agent does, all answering the wrong question.

The objective is what the principal originally asked the agent to do: “debug this outage,” “prep me for my 2pm,” “build POCs against new identity standards.” Stable, stated up front, the anchor. The intent is what the agent has decided it wants to do at this exact turn: call this tool, read that file, restart this pod. Generated by the model, shifts as the agent learns. The question that decides how an agent behaves is whether its current intent still serves the original objective, in light of what it has now learned. The same action by an agent is safe, suspect, or catastrophic depending on whether you can answer that question cleanly.

Tokens answer the wrong question

A capability framing treats the agent as a credentialed user with a bounded set of allowed actions. Issue the right scopes once, check them on every call, refuse what’s out of scope. That model worked for human users and for service-to-service APIs because the objective and the intent were both, in effect, static for the lifetime of the credential. A human user wants to keep using the app. A service wants to keep doing its one job. The token answers “is this caller allowed to perform this action,” and the answer holds for as long as the token is valid.

Agents break that assumption. The agent’s intent is a runtime artifact: plausibly in service of the objective it was given, sometimes drifting into something the principal would not have authorized if asked. A capability scope can refuse an obviously out-of-bounds action, but it can’t tell you whether an in-scope intent is still serving the original objective, because neither the objective nor the intent was ever part of the model. The token answers what the agent may do; it has nothing to say about whether the agent should still be doing it. Karl McGuinness has named this gap in detail: the credential answers the may-do question, and the runtime authority question of whether the agent should still be doing this work is exactly what the existing stack has no primitive for.
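
A minimal sketch of that blind spot, in Python. Everything here is hypothetical (the Token class, the is_allowed function, the db:write scope string) and is not taken from any real authorization library. The only point is that a scope check has no input for the objective, so two very different intents get the same answer.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    subject: str
    scopes: frozenset  # hypothetical scope strings such as "db:write"

def is_allowed(token: Token, required_scope: str) -> bool:
    """Capability check: answers 'may this caller do this?' and nothing else."""
    return required_scope in token.scopes

agent_token = Token(subject="coding-agent", scopes=frozenset({"db:read", "db:write"}))

# Two intents that need the same scope but serve very different objectives.
fix_row = {"action": "UPDATE users SET plan = 'pro' WHERE id = 42", "scope": "db:write"}
drop_all = {"action": "DROP TABLE users", "scope": "db:write"}

for intent in (fix_row, drop_all):
    # The check never sees the objective, so it cannot tell these apart.
    print(intent["action"], "->", is_allowed(agent_token, intent["scope"]))
```

Both intents pass, because the function signature has nowhere to put "during a code freeze" or "the principal asked for a bug fix."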

The capability framing scopes the verbs the agent can speak. The actual problem is the sentences the agent is forming, and whether those sentences still mean what the principal asked for.

How tightly that reconciliation runs depends on the archetype. A personal agent has you in the loop: if its intent drifts, you see it within seconds and you stop it. A team agent is exercising a pre-authorized class of action on behalf of a rotation or a policy, with no individual instance under review, so drift is the failure mode that creeps in unobserved. An autonomous agent has no live principal at all, so the only thing the system can check each new intent against is the original objective and whatever the agent has now learned. The further you move from a watching human, the more weight the runtime reconciliation has to carry.

Same tool, different answers

Consider meeting prep across two of my own agents. The personal one preps me for my day: it pulls from my calendar, my Granola transcripts including 1:1s, and Slack including DMs and private channels. Wide aperture, and it should be, because the principal is me, the data is mine, and the intent (“surface what we talked about last time so I can walk in informed”) is in obvious service of an objective I just stated.

The team agent that recaps the week has the same tools available and a narrower aperture. It reads public channels and team meetings, not 1:1s, not DMs, not any individual’s private Granola. Slack helps here: scopes like channels:history and im:history are separate, so the team agent can be issued the channel scope without the DM one. The objective is what makes that choice clear: the team’s objective is “surface what the team did this week,” and DM content was never in service of that, regardless of how technically convenient it would be to include it.
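
As a rough illustration, the two apertures can be written down as scope sets. channels:history and im:history are the Slack scopes named above, and groups:history is Slack's scope for private-channel history. The variable names and the helper function are hypothetical, not part of any Slack SDK.

```python
# Scope sets per archetype. Only the Slack scope strings are real;
# the names and the helper below are illustrative assumptions.
PERSONAL_AGENT_SCOPES = {"channels:history", "groups:history", "im:history"}
TEAM_AGENT_SCOPES = {"channels:history"}

# Scopes that expose private conversations (private channels and DMs).
PRIVATE_SCOPES = {"groups:history", "im:history"}

def private_scopes_in(grant: set) -> set:
    """Return the private-conversation scopes present in a grant."""
    return grant & PRIVATE_SCOPES

# The personal agent holds private scopes on purpose; the team agent holds none.
print(sorted(private_scopes_in(PERSONAL_AGENT_SCOPES)))
print(sorted(private_scopes_in(TEAM_AGENT_SCOPES)))
```

The tool is the same in both cases. What differs is the grant, and the grant follows from the objective each agent was given.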

Intent moves, permissions move with it

We already have the right pattern for this; we just don’t use it for agents. Privileged Access Management gave up on standing admin credentials a long time ago. A database administrator doesn’t get permanent prod access; they request a just-in-time elevation tied to a specific ticket, the system grants narrowly scoped access for the duration of the work, and the elevation expires when the work is done. The grant is bound to the ticket (the objective of the session), not the identity of the admin. Same admin, different ticket, different objective, different elevation, every time.
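
A sketch of that shape, assuming a made-up Elevation record and permits check rather than any real PAM product's API. The ticket IDs, admin name, and scope string are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class Elevation:
    admin: str
    ticket_id: str        # the objective of the session
    scope: str
    expires_at: datetime

def grant_elevation(admin, ticket_id, scope, minutes, now):
    """Issue a short-lived grant bound to one ticket."""
    return Elevation(admin, ticket_id, scope, now + timedelta(minutes=minutes))

def permits(elevation, ticket_id, scope, now):
    """Valid only for the same ticket, the same scope, and before expiry."""
    return (elevation.ticket_id == ticket_id
            and elevation.scope == scope
            and now < elevation.expires_at)

now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
e = grant_elevation("dba-alice", "TICKET-1", "prod-db:write", minutes=30, now=now)

print(permits(e, "TICKET-1", "prod-db:write", now + timedelta(minutes=10)))  # same ticket, in window
print(permits(e, "TICKET-2", "prod-db:write", now + timedelta(minutes=10)))  # same admin, other ticket
print(permits(e, "TICKET-1", "prod-db:write", now + timedelta(minutes=45)))  # after expiry
```

The admin's identity never appears in the permits check. The ticket does.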

The PAM JIT pattern is the shape agent authority needs, with one wrinkle: the intent is being stated by the model, not a human, and it sharpens as the agent learns. An agent is asked to debug a production outage and starts by reading logs and looking at metrics. The metrics suggest a single pod has gone unhealthy. The agent now has a justifiable intent: restart that pod. It did not have that intent a minute ago, and would have been correctly refused if it had asked for the permission up front. The permission profile that fits the agent’s intent at turn three is not the one that fit at turn one, and the system has to be able to re-evaluate the grant against the agent’s evolving intent rather than treating the initial token as the final word. (Time-bound credentials are not enough here, but that’s a post for another time.) What agents need on top of PAM JIT is a way to check that the agent’s current intent still serves the principal’s original objective, because the agent is unreliable in a way humans are not.
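
To make the pod example concrete, here is a toy version of a per-turn check. All of it is assumption: the Session and Intent types, the evidence strings, and especially the hand-written rule for the "debug this outage" objective. How that rule should really be decided is the open problem, so treat this as a picture of where the check sits, not how to implement it.

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    objective: str                               # stated once by the principal
    evidence: set = field(default_factory=set)   # what the agent has learned so far

@dataclass(frozen=True)
class Intent:
    action: str
    target: str

def debug_outage_rule(evidence: set, intent: Intent) -> bool:
    """Toy rule: reads are fine; a restart needs evidence naming that pod."""
    if intent.action == "read":
        return True
    if intent.action == "restart_pod":
        return f"unhealthy:{intent.target}" in evidence
    return False  # nothing else is justified by this objective

# Hypothetical registry mapping an objective to its rule.
RULES = {"debug this outage": debug_outage_rule}

def serves_objective(session: Session, intent: Intent) -> bool:
    rule = RULES.get(session.objective)
    return rule is not None and rule(session.evidence, intent)

session = Session(objective="debug this outage")
restart = Intent("restart_pod", "pod-7")

print(serves_objective(session, restart))    # turn one: no evidence yet
session.evidence.add("unhealthy:pod-7")      # turn three: metrics point at pod-7
print(serves_objective(session, restart))
print(serves_objective(session, Intent("drop_database", "prod")))
```

The same restart intent is refused at turn one and allowed at turn three, and the only thing that changed is what the agent learned in between.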

Where this leaves us

The reason agent strategy conversations go in circles is that we keep arguing about the wrong question. We argue about which tasks are too dangerous to allow, which agents to ban, which capabilities to scope. Those arguments produce policies that feel decisive and accomplish very little, because they treat the agent as a credentialed user when it is something else entirely.

The thing the agent actually is, at the moment it takes an action, is the holder of a grant from a principal, generating intents that have to keep serving the objective it was originally given. The trust stack has to make that objective first-class and has to keep the agent’s evolving intent reconciled against it. AAuth and the adjacent work give every agent its own identity, which is the visible half of the problem. The harder half is the continuous reconciliation between the intent the agent now has and the objective the principal originally authorized. Capabilities are necessary. They are not sufficient. The reconciliation layer between intent and objective is the part that doesn’t exist yet, and until we build it, every agent we deploy is one ungrounded turn away from doing exactly what we authorized it to do, in service of an objective nobody ever asked for.

Agents are not credentialed users. They’re attorneys-in-fact, on a mandate, and the mandate has to hold at runtime or the rest of the stack is just paperwork.