AGENTS MEDIUM NEW

The agent harness is your real privilege boundary — and most teams draw it in the wrong place

A May 26, 2026 Pillar Security write-up argues the harness — Claude Code, Cursor, Codex — holds the secrets, tools and hooks an agent never sees. Recent harness bugs and CVE-2026-22708 make the case concrete.

2026-05-28 // 7 min affects: claude-code, cursor, codex, aider, cline, ai-coding-agents, agent-runtimes

What is this?

On May 26, 2026, Pillar Security’s Dor Sarig published Your Agent Harness Has More Privilege Than Your Agent. The argument is short and load-bearing: in a modern coding agent — Claude Code, Cursor, Codex, Aider, Cline — the model is the engine, but the harness is the car. The harness holds the API keys, mediates the tool calls, owns the file-system permissions, writes the session log, and decides what the model is even shown each turn. If your security model treats the agent (the LLM in the loop) as the unit of risk, the boundary is drawn in the wrong place.

Two recent, independent data points make the case concrete. Pillar’s own January 14, 2026 Cursor disclosure (CVE-2026-22708) showed how shell built-ins outside the harness’s allowlist let an attacker poison environment variables and turn a “safe” git branch into arbitrary code execution. And Anthropic’s April 23 postmortem on Claude Code quality reports traced two months of user-visible degradation to three changes in the harness — not the model — including a caching bug that silently dropped reasoning history mid-session. The harness is where the leverage sits. It is also where the bugs sit.

How it works

A harness is the fixed scaffolding that turns a one-shot LLM into something that can act. It owns several things the agent — the model in the loop — never directly touches:

Component                 Held by harness?      Held by model?
------------------------  --------------------  ---------------------
API keys / secrets        Yes                   No
Tool implementations      Yes                   No (only descriptions)
Permission classifier     Yes                   No
File-system access        Yes                   Routed via tools
Session event log         Yes                   No
Context compaction        Yes                   No
System prompt assembly    Yes                   No
Hooks (pre / post tool)   Yes                   No
Subagent spawn / policy   Yes                   No

Sarig walks through the attack surfaces this creates, drawing on Pillar’s own Cursor research and on patterns now visible across the major coding harnesses.

Tool descriptions are a prompt-injection surface. The model is steered by descriptions the harness loads on every turn. A poisoned description — via a supply-chain compromise, a malicious MCP server, or a registry update — silently redirects which tool gets picked. There is usually no log entry that reads “the description changed last Tuesday”.

System-prompt assembly walks the file tree. Modern harnesses scan parent directories for files like CLAUDE.md, AGENTS.md, .cursor/rules and inject what they find. A malicious file dropped anywhere up the tree ends up in the system prompt. This is by design — and it is a real attack path the moment an agent runs against an untrusted repository.

Hooks are the most powerful extension point — and the most dangerous. Pre-tool hooks can allow, deny, or rewrite a tool call. A compromised hook is a silent man-in-the-middle for every tool. Post-tool hooks see every result. Enterprise harness adoption increasingly runs through hooks, which means enterprise compromise can too.

Context compaction is selective memory loss. When the context window fills, the harness summarises or drops older content. Whatever is dropped — including a malicious instruction seen at turn 12 — may still be influencing the agent at turn 47 without being available for an auditor to inspect. Compaction strategies are usually heuristic and rarely tested against adversarial input.

Permission classifiers parse strings. Bash-style allowlists are typically decided by parsing the command at dispatch time. rm goes to full-approval. ls stays read-only. What about find . -delete? What about an alias? What about export PAGER="open -a Calculator" followed by an allowlisted git branch? That last sequence is the heart of CVE-2026-22708: shell built-ins bypassed Cursor’s allowlist entirely, letting prompt injection poison the environment so an approved command became an exploit.

Subagents can escape parent policy. Subagents get their own tool lists and permissions. If the parent harness does not enforce policy consistently across the spawn boundary, an attacker who can influence subagent creation can use the child to do what the parent agent is not allowed to do.

The session log is a local secret store. Append-only event logs are the durability story of every modern harness. They are also a complete transcript of every secret that has passed through context, sitting on the developer’s disk, usually unencrypted.

Why it matters

Two shifts are worth tracking.

The first is where the bugs land. The Anthropic postmortem is unusually transparent on this point: the user-visible “Claude got worse” reports were not model regressions; they were a default-effort change, a cache-eviction bug that compounded across turns, and a verbosity instruction in the system prompt. None of those touched the API or inference layer. All three lived in the harness. Simon Willison’s reaction is worth quoting: “the kinds of bugs that affect harnesses are deeply complicated, even if you put aside the inherent non-deterministic nature of the models themselves.” For builders, that is the lesson. The bug class you have to defend against now includes harness state.

The second is where the attacks land. CVE-2026-22708 is the textbook case. The model did not do anything novel; the harness’s allowlist was the trust anchor, and it was bypassable via shell built-ins that the parser never classified. The fix in Cursor 2.3 closed that specific bypass, but Pillar’s analysis is explicit that this is a class problem: a harness that protects via allowlists rather than execution isolation will keep losing to creative input as long as the agent can be talked into typing it. Adjacent disclosures — agent-on-agent context poisoning, malicious AGENTS.md files, MCP tool-description drift — share the same shape.

Defenses

Treat the harness as the privilege boundary it actually is. Concretely:

Inventory the harness, not just the model. For every coding agent in production, write down which keys, files, sockets and tools the harness can reach. That set — not the model’s nominal capabilities — is your blast radius. Most threat models stop at “the LLM was tricked”; yours should continue to “and the harness then did X”.
Treat shell built-ins and parameter expansion as security-sensitive. The lesson of CVE-2026-22708 is generalisable: any allowlist that classifies “external command vs. built-in” leaks privilege the moment built-ins can modify the environment those external commands depend on. Audit your own permission classifier for the same class of bypass.
Move from allowlists toward execution isolation. Pillar’s own recommendation, echoed by Cursor’s post-fix guidance, is that allowlists are best-effort. The robust answer is sandboxed execution — containers, VMs, restricted process trees — so that “the agent ran a command” never means “the agent had ambient access to the developer’s home directory”.
Audit dynamic system-prompt assembly. If your harness walks parent directories for files like CLAUDE.md, AGENTS.md, .cursor/rules or .windsurfrules, treat every one of those files as untrusted input when the agent is operating in an attacker-influenced workspace. Log what was injected. Make the injection visible to the user before the first turn.
Audit tool descriptions and MCP registries. Descriptions are prompt-injection surfaces. Pin versions. Diff descriptions on update. Reject silent registry mutations. The same supply-chain hygiene you apply to dependencies applies here.
Add an independent verify step. Agents routinely report success when they have failed — and the same failure mode covers attacker-driven behaviour. A verify step that reads the trace independently of the agent (and that is itself out of the agent’s control) is the cheapest defence against “the agent says it did X” diverging from “the harness’s tool log says it did Y”.
Treat the session log as a secret. Encrypt at rest. Redact known-secret patterns at write time. Set a retention policy. Anything that lives in ~/.claude, ~/.cursor, ~/.codex or the equivalent on a shared workstation should be in your sensitive-files list.
Re-baseline incident response. When something goes wrong with a coding agent, the first question is now “which harness version, with which hooks, against which workspace” — not “which model”. Build the corresponding fields into your incident schema.

Status

Item	Reference	Date	Notes
”Your Agent Harness Has More Privilege Than Your Agent”	Pillar Security (Dor Sarig)	2026-05-26	Conceptual write-up — harness is the privilege boundary
”The Agent Security Paradox” / CVE-2026-22708	Pillar Security (Dan Lisichkin)	2026-01-14	Cursor allowlist bypass via shell built-ins
Cursor security advisory GHSA-82wg-qcm4-fp2w	GitHub	2026-01-14	Fixed in Cursor 2.3; affected versions ≤ 2.2
CVE-2026-22708 (NVD)	NIST	2026-01-14	High severity; CWE-15 / CWE-20 / CWE-74 / CWE-77 / CWE-78 / CWE-94 / CWE-269
Claude Code postmortem	Anthropic Engineering	2026-04-23	Three harness-layer bugs traced; all fixed by April 20 (v2.1.116)
Simon Willison commentary	simonwillison.net	2026-04-24	Independent reading of the postmortem

The framing to carry away is not that any one harness is broken. It is that the harness — Claude Code, Cursor, Codex, Aider, Cline and the rest — has quietly become the most privileged component in the agent stack, and the security work has to follow. The interesting questions are no longer “is the model safe?” or “are the prompts safe?” They are: what does the harness have access to, who controls it, and how do you know what it actually did?