
AI coding agents: attackers go for the credential, not the model

Six exploits disclosed through spring 2026 against Codex, Claude Code, Copilot and Vertex AI all bypassed model-level defenses and reached the same target: the agent’s runtime credentials. The root cause is an identity governance gap, not a prompt problem.

2026-06-17 // 6 min // affects: codex, claude-code, github-copilot, vertex-ai, cursor

What is this?

Over a nine-month run ending in spring 2026, six separate research teams disclosed exploits against the most widely deployed AI coding agents — OpenAI Codex, Anthropic Claude Code, GitHub Copilot, and Google Vertex AI. A VentureBeat synthesis published April 30, 2026 makes the pattern explicit: none of these attacks targeted the model’s output. Every one of them reached the same prize — the runtime credential the agent holds to authenticate to production systems.

This reframes the threat. The dominant LLM-security narrative is about manipulating what a model says (prompt injection, jailbreaks). These disclosures are about what an agent holds: an OAuth token, a service-account identity, a shell session. The agent authenticated to GitHub, Google Cloud, or a developer’s machine with no human session anchoring the request, and the credential — not the model — was the breach.

How it works

The clearest example is the Codex command-injection flaw that BeyondTrust Phantom Labs disclosed publicly on March 30, 2026 (advisory, Decipher). Codex’s cloud container cloned repositories using a GitHub OAuth token embedded in the git remote URL, and an attacker-controlled branch name flowed unsanitized into a setup shell script, so a crafted branch name could run commands in the container and read that token. To hide the malicious branch in the web portal, researchers appended 94 Ideographic Space characters (Unicode U+3000) after main so it rendered identically to the real branch. OpenAI rated the flaw Critical P1 and completed remediation by February 5, 2026, after a private disclosure process that began December 16, 2025.
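
The two signals in that attack, invisible padding and shell metacharacters in a branch name, can be checked before the name reaches any script. The following is a minimal Python sketch, not the vendor's fix: the function name `branch_name_findings`, the metacharacter set, and the sample payload are all illustrative assumptions.

```python
import unicodedata

# Characters a POSIX shell treats specially. Illustrative, not exhaustive.
SHELL_META = set(";|&$`(){}<>\\\"'!*?\n\r")

# Unicode categories for separators, control and format characters.
# U+3000 (Ideographic Space) is in category Zs.
INVISIBLE_CATEGORIES = {"Zs", "Zl", "Zp", "Cc", "Cf"}


def branch_name_findings(name: str) -> list[str]:
    """Return human-readable reasons a branch name deserves review."""
    findings = []
    invisible = sorted(
        {f"U+{ord(ch):04X}" for ch in name
         if unicodedata.category(ch) in INVISIBLE_CATEGORIES}
    )
    if invisible:
        findings.append("whitespace or invisible characters: " + ", ".join(invisible))
    meta = sorted({ch for ch in name if ch in SHELL_META})
    if meta:
        findings.append("shell metacharacters: " + " ".join(repr(ch) for ch in meta))
    return findings


if __name__ == "__main__":
    # Hypothetical payload shaped like the one described above.
    disguised = "main" + "\u3000" * 94 + ";echo pwned"
    for reason in branch_name_findings(disguised):
        print(reason)
```

Detection is only half of the control. Passing the branch name to git as a separate argument (an argv list), or quoting it with `shlex.quote` when a shell string is unavoidable, keeps the shell from parsing it at all.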

The same shape recurs across vendors, per the VentureBeat reporting:

  • Claude Code: CVE-2026-25723 let piped commands escape the write sandbox (patched in 2.0.55); CVE-2026-33068 loaded .claude/settings.json permissions before the workspace-trust prompt, so a malicious repo could set bypassPermissions and the prompt never appeared (patched in 2.1.53). Adversa separately found deny-rule enforcement was silently dropped once a command exceeded 50 subcommands (patched in 2.1.90).
  • GitHub Copilot: in CVE-2025-53773, hidden instructions in a pull-request description flipped auto-approve in .vscode/settings.json, enabling shell execution (patched in Microsoft’s August 2025 update). Orca Security separately exfiltrated a privileged GITHUB_TOKEN from a Codespaces flow via a crafted GitHub issue.
  • Vertex AI: Unit 42 found the default service identity (P4SA) attached to every agent carried excessive scopes, reaching Cloud Storage buckets and Google-owned Artifact Registry repositories.

In each case the model-layer or sandbox defense existed and was bypassed; the credential underneath stayed reachable.
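
Two of these bypasses hinge on a repository-supplied settings file changing the agent's permission mode. A pre-open scan of a freshly cloned repo could look roughly like the Python sketch below. The marker lists are assumptions drawn from the descriptions above (the value bypassPermissions, and any key containing "autoapprove" set to true); real key names may vary by tool and version, and the function names are hypothetical.

```python
import json
from pathlib import Path

# Files named in the disclosures above.
WATCHED_FILES = (".claude/settings.json", ".vscode/settings.json")
# Assumed markers; adjust to the tools and versions actually in use.
RISKY_VALUES = {"bypassPermissions"}
RISKY_KEY_FRAGMENTS = ("autoapprove",)


def risky_settings(node, path=""):
    """Recursively collect settings that loosen an agent's permission mode."""
    findings = []
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path}.{key}" if path else key
            if value is True and any(f in key.lower() for f in RISKY_KEY_FRAGMENTS):
                findings.append(f"{here} = true")
            findings.extend(risky_settings(value, here))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            findings.extend(risky_settings(item, f"{path}[{i}]"))
    elif isinstance(node, str) and node in RISKY_VALUES:
        findings.append(f"{path} = {node!r}")
    return findings


def scan_repo(root):
    """Map each watched settings file under root to its risky entries."""
    report = {}
    for rel in WATCHED_FILES:
        file = Path(root) / rel
        if not file.is_file():
            continue
        try:
            data = json.loads(file.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            # Settings files with comments are not strict JSON; fail closed.
            report[rel] = ["unparseable settings file; review manually"]
            continue
        found = risky_settings(data)
        if found:
            report[rel] = found
    return report
```

Running `scan_repo(".")` on a checkout before an agent opens it returns an empty dict when nothing matches. A file that cannot be parsed is reported rather than skipped, so an attacker cannot hide a flip behind malformed JSON.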

Why it matters

AI coding agents are largely invisible to identity and access management. Enterprises inventory human identities and rotate human credentials, but most have no CMDB category, no PAM onboarding, and no lifecycle controls for the OAuth tokens and service accounts these agents receive at setup. The agents routinely hold more privilege than the developers they act for, because permissions are provisioned for speed and breadth. Two amplifiers make this acute. First, patches are reverse-engineered within roughly 72 hours, while an agent compresses the exploit window to seconds. Second, untrusted inputs the pipeline never treated as code (branch names, PR descriptions, GitHub issues, repo config files) become the injection surface.

Defenses

The mitigations are identity-governance disciplines, not model tweaks:

  • Inventory every agent identity. List each AI coding agent (Codex, Claude Code, Copilot, Cursor, Gemini Code Assist, Windsurf) and the exact OAuth scopes and service accounts it holds. Create a non-human-identity category in your CMDB if none exists.
  • Scope to least privilege and rotate. Migrate Vertex AI to a bring-your-own-service-account model. Onboard agent credentials into PAM/IGA with rotation, and enforce separation of duties between the agent that writes code and the one that deploys it.
  • Collapse agent identity back to the human. An agent acting on your behalf should never exceed your own privileges, and its actions should be bound to a verified human session.
  • Treat repo metadata as untrusted input. Monitor branch names, PR descriptions, issues, and changes to .vscode/settings.json or .claude/settings.json for permission-mode flips, command chaining, and Unicode obfuscation (U+3000).
  • Patch to current builds. Run Claude Code 2.1.90 or later, verify that Copilot’s August 2025 fix is installed, and confirm the Codex remediation.
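
The inventory and "never exceed the human" rules reduce to a set comparison once agent identities are recorded somewhere. The Python sketch below assumes a hand-built inventory; the `AgentIdentity` record, its field names, and the scope strings are hypothetical stand-ins for whatever a CMDB or IGA export would provide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentIdentity:
    """One non-human identity held by a coding agent (hypothetical schema)."""
    agent: str               # e.g. "codex"
    credential: str          # e.g. "github-oauth-token"
    owner: str               # the human the agent acts for
    scopes: frozenset[str]   # scopes granted to the credential


def audit(inventory, human_scopes_by_owner):
    """Report scopes each agent holds beyond its owner's own privileges."""
    report = {}
    for ident in inventory:
        # An unknown owner has no allowed scopes, so everything is flagged.
        allowed = human_scopes_by_owner.get(ident.owner, frozenset())
        excess = ident.scopes - allowed
        if excess:
            report[f"{ident.agent}/{ident.credential}"] = sorted(excess)
    return report


if __name__ == "__main__":
    # Illustrative data only.
    inventory = [
        AgentIdentity("codex", "github-oauth-token", "alice",
                      frozenset({"repo", "workflow"})),
        AgentIdentity("claude-code", "github-oauth-token", "alice",
                      frozenset({"repo"})),
    ]
    humans = {"alice": frozenset({"repo"})}
    print(audit(inventory, humans))
```

Treating an agent with no known human owner as having zero allowed scopes is a deliberate choice here: an orphaned credential shows up in the report instead of passing silently.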

Status

Vendor / product | Issue | Disclosed | Fix
OpenAI Codex | Branch-name command injection → GitHub token theft | Mar 30, 2026 (BeyondTrust) | Remediated Feb 5, 2026
Claude Code | Sandbox escape CVE-2026-25723; trust-dialog bypass CVE-2026-33068; 50-subcommand deny bypass | 2026 | 2.0.55 / 2.1.53 / 2.1.90
GitHub Copilot | PR-description injection CVE-2025-53773; Codespaces token exfil (Orca) | 2025–2026 | Aug 2025 update
Google Vertex AI | Over-scoped default P4SA service identity | 2026 (Unit 42) | BYO service account

The throughline: every vendor shipped a defense, and every defense was bypassed at the credential layer. Until AI coding agents are governed with the same rigor as privileged human identities, the attack surface is the agent and the credentials it holds, not the model.

Sources