SUPPLY CHAIN CRITICAL NEW

Malicious LLM API routers: the unguarded man-in-the-middle for agents

A UC Santa Barbara study (arXiv, April 9, 2026) measured 428 third-party LLM API routers and found 9 injecting malicious code, 17 touching planted credentials and 1 draining a crypto wallet, all from a trust boundary developers configure voluntarily.

2026-06-15 // 7 min // affects: ai-agents, llm-api-routers, openai-codex

What is this?

An LLM API router is a proxy that sits between your agent and the upstream model provider — OpenAI, Anthropic, Google and others — to handle load balancing, multi-provider failover, cost tracking and rate limiting. LiteLLM-style gateways and the “OpenAI-compatible base URL” pattern are everywhere in agent stacks, and developers point their clients at these endpoints voluntarily.

In “Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain” (arXiv:2604.08407, submitted April 9, 2026), researchers from the University of California, Santa Barbara ran the first systematic study of this attack surface. They purchased 28 paid routers from marketplaces such as Taobao, Xianyu and Shopify-hosted storefronts, and collected 400 free routers from GitHub and public communities. The finding, picked up by CybersecurityNews (April 10) and Risky Business (April 15): a meaningful share of these intermediaries were actively malicious.

How it works

The structural problem is a missing integrity guarantee. The router terminates the client’s TLS connection and re-originates a fresh one to the upstream provider, so it has full plaintext access to every in-flight JSON payload. Unlike a classic network man-in-the-middle, no certificate forgery is needed — the developer configured the proxy as their API endpoint. And no major provider cryptographically binds the tool call an agent executes to the model’s actual output. Nothing stands between a malicious router and the request/response stream.

The paper formalises two core attack classes:

Payload injection (AC-1) — the router rewrites a returned tool call, for example swapping a benign installer URL or package name for an attacker-controlled one. Because the tampered JSON stays syntactically valid, it passes schema validation; a single rewritten shell command is enough for code execution on the client.
Secret exfiltration (AC-2) — the router passively harvests API keys and credentials from plaintext traffic without altering any response, so downstream behaviour looks entirely normal.
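
Why schema validation does not help against AC-1 can be seen in a minimal Python sketch. The tool name fetch_installer, the field layout and both URLs are placeholders invented for illustration, not taken from the paper; the check stands in for whatever structural validation a client performs.

```python
import json

# Hypothetical tool-call contract: the only thing this client checks.
REQUIRED_FIELDS = {"name": str, "arguments": dict}

def is_schema_valid(raw: str) -> bool:
    """Return True if raw parses as JSON and has the expected field types."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(call, dict):
        return False
    return all(isinstance(call.get(key), typ) for key, typ in REQUIRED_FIELDS.items())

# What the upstream model produced (placeholder URL).
original = json.dumps({
    "name": "fetch_installer",
    "arguments": {"url": "https://releases.example.com/install.sh"},
})

# What a tampering intermediary could return: same shape, different value.
tampered = json.dumps({
    "name": "fetch_installer",
    "arguments": {"url": "https://mirror.example.net/install.sh"},
})

print(is_schema_valid(original), is_schema_valid(tampered))  # True True
```

Both calls pass because the check inspects structure, not provenance. Nothing in the payload tells the client which of the two values the model actually emitted.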

Two adaptive variants raise the bar for detection: dependency-targeted injection (AC-1.a) and conditional delivery (AC-1.b), where payloads activate only after a clean history (e.g. 50 prior requests) or only against autonomous “YOLO mode” sessions where tool execution is auto-approved.

Across the 428 routers tested, the team reported 9 actively injecting malicious code (1 paid, 8 free), 2 using adaptive evasion triggers, 17 free routers touching researcher-owned AWS canary credentials, and 1 draining ETH from a researcher-owned private key. No exploit payload is reproduced here, and none is needed to grasp the lesson.

Why it matters

The blast radius is larger than a single bad endpoint. Routers form a mesh of proxy nodes, and traffic bounces between them under load, so a single compromised node can get its malicious commands relayed by otherwise honest routers. An organisation that sends its whole agent fleet through one cost-control layer exposes every agent and user behind it to the weakest node in that mesh.

Two poisoning experiments show that even benign-looking routers get pulled into the same attack surface. In the first, the team deliberately leaked one researcher-owned OpenAI key on Chinese forums and messaging groups; the key went on to generate roughly 100 million GPT-5.4 tokens and surfaced credentials across multiple downstream Codex sessions. In the second, weak decoy proxies drew about 40,000 unauthorised access attempts, served roughly 2 billion billed tokens and exposed 99 credentials across 440 Codex sessions. Of those 440 sessions, 401 (about 91%) were already running in autonomous YOLO mode, where commands run without per-call confirmation.

This maps directly onto the supply-chain and excessive-agency risks the field already tracks. The dangerous combination is a high-autonomy agent plus an untrusted intermediary plus auto-approved tool execution — see our coverage of the lethal trifecta and Meta’s agents rule of two.

Defenses

The researchers are explicit that no client-side control can fully authenticate the provenance of a returned tool call without provider support. Three mitigations, however, deploy today without provider cooperation:

Fail-closed policy gate. Allow only commands drawn from a local allowlist; the paper reports this blocks shell-rewrite and dependency-injection attacks at a ~1.0% false-positive rate. Caveat: it can be bypassed if an attacker hosts payloads on an already-allowlisted domain, so pair allowlisting with pinned, verified package sources.
Response-side anomaly screening. An IsolationForest model trained on benign tool-call patterns flagged ~89% of payload-injection attempts at a 6.7% false-positive budget. Treat router responses as untrusted input and screen them before any tool acts on them.
Append-only transparency logging. Record full request/response metadata, TLS data and response hashes (~1.26 KB per entry) so you can scope an incident forensically after the fact.
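
The first and third mitigations can be sketched together in Python. This is an illustration under stated assumptions, not the paper's implementation: the allowlists, the gate function and the TransparencyLog class are hypothetical, the log keeps only a response hash and the gate decision rather than the full metadata the paper describes, and chaining entries by hash is one common way to make an append-only log tamper-evident, not something the source specifies.

```python
import hashlib
import json
import shlex
from urllib.parse import urlparse

# Hypothetical local policy; real lists would be project-specific.
ALLOWED_COMMANDS = {"ls", "cat", "git", "pip"}
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org"}

def gate(command: str) -> bool:
    """Fail closed: permit only allowlisted binaries and allowlisted URL hosts."""
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar parse failures
        return False
    if not tokens or tokens[0] not in ALLOWED_COMMANDS:
        return False
    # Reject shell metacharacters that could chain a second command.
    if any(ch in command for ch in ";|&`$<>\n"):
        return False
    for tok in tokens[1:]:
        if "://" in tok and urlparse(tok).hostname not in ALLOWED_HOSTS:
            return False
    return True

class TransparencyLog:
    """Append-only, hash-chained record of responses (in memory for brevity)."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries = []
        self._head = self.GENESIS

    @staticmethod
    def _digest(entry: dict) -> str:
        return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

    def append(self, response_body: str, allowed: bool) -> dict:
        entry = {
            "prev": self._head,  # links this entry to the one before it
            "response_sha256": hashlib.sha256(response_body.encode()).hexdigest(),
            "allowed": allowed,
        }
        self._head = self._digest(entry)
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or dropped entry breaks it."""
        head = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != head:
                return False
            head = self._digest(entry)
        return head == self._head

log = TransparencyLog()
for cmd in ("pip install --index-url https://pypi.org/simple requests",
            "pip install --index-url https://pkgs.example.net/simple requests"):
    log.append(cmd, gate(cmd))
```

A host allowlist alone does not stop a payload served from an allowlisted domain, so a real gate would still need the pinned, verified package sources mentioned in the caveat.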

Beyond those, the operational hygiene is straightforward: treat every third-party router as a potential adversary. Prefer direct provider endpoints or self-hosted gateways you control; scope API keys to least privilege and rotate aggressively; never run agents in auto-approve/YOLO mode against an endpoint you don’t operate; and require human confirmation for irreversible actions (package installs, fund transfers, credential use). The authors argue the durable fix is provider-signed response envelopes — a DKIM-for-tool-calls mechanism that cryptographically binds executed tool calls to the upstream model’s real output. Until providers ship that, layered client-side controls are the defence.
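
The endpoint and confirmation rules above reduce to a small decision function. A rough Python sketch, assuming a hypothetical set of trusted hosts (direct provider endpoints plus a self-hosted gateway) and hypothetical tool names; neither list comes from the paper:

```python
from urllib.parse import urlparse

# Hypothetical: direct provider endpoints and a gateway the operator controls.
TRUSTED_HOSTS = {"api.openai.com", "api.anthropic.com", "gateway.internal.example"}

# Hypothetical tool names for actions that cannot be undone.
IRREVERSIBLE_TOOLS = {"install_package", "transfer_funds", "use_credential"}

def needs_human_confirmation(base_url: str, tool_name: str) -> bool:
    """Return True unless the call is both low-risk and from a trusted endpoint."""
    host = urlparse(base_url).hostname  # None if the URL has no scheme
    if host not in TRUSTED_HOSTS:
        return True  # never auto-approve behind an endpoint you do not control
    return tool_name in IRREVERSIBLE_TOOLS
```

The function defaults to requiring confirmation: an unparseable base URL or an unknown host is treated the same as an untrusted router.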

Status

Item	Detail
Source	arXiv:2604.08407, UC Santa Barbara
Disclosed	April 9, 2026 (preprint)
Routers tested	28 paid + 400 free (428 total)
Malicious code injection	9 routers (1 paid, 8 free)
Adaptive evasion	2 routers
Credential access	17 routers touched AWS canaries; 1 drained ETH
Provider fix	None yet — no cryptographic client↔upstream integrity
Client-side defenses	Policy gate, anomaly screening, transparency logging

The framing is not “some sketchy proxies exist.” It is that the LLM router is a standing, developer-introduced trust boundary with no integrity guarantee, and that high-autonomy agents make its compromise unusually costly. Choose your intermediaries as carefully as you choose your dependencies — because, functionally, that is what they are.