The agent-human security gap: what production ships, what papers study

A May 23, 2026 UCLA paper audits 59 academic studies, 21 production agent systems and 26 security plugins — and finds that the defenses researchers favor have zero production deployment.

2026-05-29 // 6 min // affects: llm-agents, mcp-clients, ai-coding-assistants, rag-pipelines, browser-agents

What is this?

On May 23, 2026, three UCLA researchers (Peiran Wang, Ying Li and Yuan Tian) posted “Reframing LLM Agent Security as an Agent–Human Interaction Problem” (arXiv:2605.24309). The paper is neither a new attack nor a new defense. It is a systematic audit of how the field actually defends agents in 2026, mapping 59 academic papers, 21 production agent systems and 26 security plugins as of April 2026. The result is one of the cleanest snapshots we have seen this year of the gap between agent-security research and shipped agent systems.

How it works

Wang et al. start from the observation that nearly every production agent — Claude Code, Cursor, Copilot, Gemini CLI, ChatGPT Agent, Microsoft 365 Copilot, MCP-based assistants — places a human somewhere in the loop. The paper classifies these Agent-Human Interaction (AHI) mechanisms into five categories:

  • Policy specification: the user writes upfront rules (“never push to main”, “no network egress”). Adopted by at least 14 of 21 production systems surveyed.
  • Runtime approval: the agent asks “may I run this command / send this email / call this tool?” before each sensitive action. Also adopted by at least 14 of 21 systems.
  • Scope configuration: the user picks allow-lists of files, tools, hosts or domains the agent is allowed to touch. Likewise among the dominant mechanisms in production.
  • Intent anchoring: the system attempts to bind every action back to a verifiable user intent before execution. Heavily studied in academia, zero production deployments in the audit.
  • Trust labeling: information-flow style trust lattices or provenance labels on every token entering the context. Also heavily studied, also zero production deployments.
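
To make the three shipped mechanisms concrete, here is a minimal Python sketch of how they might compose into a single gate in front of a tool call. It is an illustration, not something taken from the paper or from any audited product: the names ToolCall, Scope, never_push_to_main and authorize, the tool names, and the check order (scope, then policy, then approval) are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Callable


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical record of one action the agent wants to take."""
    tool: str          # e.g. "read_file", "git_push"
    target: str = ""   # file path, host or branch, depending on the tool


@dataclass
class Scope:
    """Scope configuration: per-tool allow-lists of target glob patterns."""
    allowed: dict[str, list[str]] = field(default_factory=dict)

    def permits(self, call: ToolCall) -> bool:
        # A tool that is missing from the allow-list is out of scope.
        patterns = self.allowed.get(call.tool, [])
        return any(fnmatchcase(call.target, p) for p in patterns)


# Policy specification: upfront rules, each a predicate that returns True
# when a call must be refused.
Policy = Callable[[ToolCall], bool]


def never_push_to_main(call: ToolCall) -> bool:
    return call.tool == "git_push" and call.target == "main"


def authorize(
    call: ToolCall,
    scope: Scope,
    policies: list[Policy],
    needs_approval: set[str],
    ask_user: Callable[[ToolCall], bool],
) -> str:
    """Return "allow" or a "deny:<reason>" string for one tool call."""
    if not scope.permits(call):
        return "deny:scope"
    if any(rule(call) for rule in policies):
        return "deny:policy"
    # Runtime approval: prompt the human only for tools flagged as sensitive.
    if call.tool in needs_approval and not ask_user(call):
        return "deny:user"
    return "allow"
```

Because scope and policy are evaluated first, the human is prompted only for calls that have already passed both static checks, which keeps the number of approval dialogs down.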

The split is brutal: the three categories practitioners actually ship receive minimal research attention, while the two categories researchers prefer have not crossed the threshold into a single shipped product. The paper attributes this to cognitive load. Trust labeling in particular requires users to reason about data provenance at a granularity that does not match their mental models — every token tagged, every flow tracked. Policy specification and scope configuration, while coarser, fit how operators already think.

The authors then formalise the failure mode of the dominant approach. Runtime approval, scaled to long agent sessions, produces approval fatigue: a 2026 coding agent can fire dozens of tool calls per task, and users either rubber-stamp every prompt or disable the dialog entirely. They cite this as the root cause behind several 2025–2026 indirect injection incidents, where the agent dutifully asked for confirmation and the human dutifully clicked “yes” on a request whose context had already been poisoned.

Why it matters

The reframing has two practical consequences for anyone shipping an agent.

First, it relocates the design problem. The question is no longer “can the LLM be trusted to decide?” but “where in the human’s intent-alignment workflow can the LLM contribute the most leverage at the lowest risk?” That is a UX question with security teeth, and it lines up with what Meta’s Agents Rule of Two and Simon Willison’s lethal trifecta already implied: defense is architectural, not behavioural.

Second, it explains why so many paper-clean defenses fail in audits. Intent anchoring assumes users will articulate intent in a structured form. Trust labeling assumes users will reason about labels. Neither assumption survives a real coding agent run. A December 2025 SoK on Trust-Authorization Mismatch in LLM Agent Interactions (arXiv:2512.06914) reaches a similar conclusion from a different angle: the authorization model the user thinks they have and the one the agent actually enforces routinely diverge.

Defenses

The paper is descriptive, not prescriptive, but the audit suggests a concrete shortlist for teams shipping agents in mid-2026:

  • Default to scope configuration, not runtime approval. A correctly scoped agent raises fewer approval prompts, and cutting prompt volume is the most direct way to fight fatigue.
  • Treat policy specification as a first-class artifact. Version-control it, code-review it, ship it with the agent — the same way you would treat an IAM policy.
  • Reserve runtime approval for irreversible actions: database writes, money movement, code merges, external sends. Everything else should be policy-decidable in advance.
  • Do not rely on intent anchoring or trust labeling alone. They are useful research directions but, per the audit, have not been productized. Layer them on top of the three dominant mechanisms, not in place of them.
  • Measure approval fatigue. Log per-session approval counts and click-through rates. A 95% rubber-stamp rate is a louder security signal than any classifier output.
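
The last item on this list is the easiest to prototype. The sketch below, in Python, aggregates approval prompts per session into a count and an approval rate. The ApprovalEvent record and the fatigue_report function are hypothetical names for this example. The decision-latency field and the 2-second "fast approval" threshold are additional assumptions, used here as a rough proxy for rubber-stamping; the paper does not prescribe them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApprovalEvent:
    """Hypothetical log record for one approval prompt shown to the user."""
    session_id: str
    approved: bool
    seconds_to_decide: float  # assumed field: time between prompt and click


def fatigue_report(events, fast_threshold_s=2.0):
    """Summarise approval prompts per session.

    Returns {session_id: {"prompts", "approved", "fast_approved",
    "approval_rate", "fast_approval_rate"}}.
    """
    sessions = {}
    for e in events:
        stats = sessions.setdefault(
            e.session_id, {"prompts": 0, "approved": 0, "fast_approved": 0}
        )
        stats["prompts"] += 1
        if e.approved:
            stats["approved"] += 1
            # Approvals faster than the threshold are counted separately as
            # likely rubber stamps (an assumption, not a validated metric).
            if e.seconds_to_decide < fast_threshold_s:
                stats["fast_approved"] += 1
    for stats in sessions.values():
        stats["approval_rate"] = stats["approved"] / stats["prompts"]
        stats["fast_approval_rate"] = stats["fast_approved"] / stats["prompts"]
    return sessions
```

A session with many prompts and an approval rate near 1.0 is a candidate for tighter scope or policy rules, since the prompts are evidently not filtering anything.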

Status

Item | Date | Status
Paper posted (arXiv:2605.24309) | May 23, 2026 | Public preprint
Production systems audited | April 2026 | 21 systems, 26 plugins
Academic corpus | 2022–2026 | 59 papers
Related SoK (Trust-Authorization Mismatch) | Dec 2025 | arXiv:2512.06914
Industry uptake of AHI framing | Pending | Discussion stage

The paper is a preprint and has not been peer-reviewed at the time of writing. Its empirical contribution — the audit of 21 production systems — is the part most directly useful to defenders today, and the part least likely to change in a revision.
