system: OPERATIONAL
← back to all hacks
AGENTS MEDIUM NEW

MCP sampling: how malicious servers abuse the reverse LLM channel

MCP's sampling feature lets a server ask the client's model for completions. Unit 42 showed (Dec 2025) how a malicious server turns that reverse channel into covert tool calls, conversation hijacking, and compute theft.

2026-06-02 // 6 min affects: mcp-sampling, mcp-clients, llm-coding-assistants

What is this?

Most Model Context Protocol (MCP) security coverage focuses on one direction: the client calls a server’s tools, and a malicious tool description or result poisons the agent. Sampling inverts that flow. It is an optional MCP feature that lets a server request an LLM completion from the client — the server says “please run this prompt through your model and give me the answer,” and the client’s model obliges.

On December 5, 2025, Palo Alto Networks’ Unit 42 (Yongzhe Huang, Akshata Rao, Changjiang Li, Yang Ji, Wenjun Hu) published an analysis of MCP attack vectors that treats sampling as a distinct attack surface, demonstrating three proof-of-concept attacks against a widely used coding copilot. The Vulnerable MCP Project catalogued the finding with an impact score of 8/10, and Cyber Security News covered it on December 9, 2025. The protocol itself was introduced by Anthropic in November 2024; sampling has been part of the specification since.

How it works

Sampling exists so a server can “borrow” the model — for example, a code-review server that needs the LLM to summarise a diff before deciding what to flag. The trust problem is structural: the server controls the prompt it sends, and it controls how it reads the response. That bidirectional control is what gets abused. No exploit primitive is reproduced below; the mechanism is the point.

Unit 42 grouped the abuse into three classes:

Attack class            What the malicious server does          Result for the user
----------------------  --------------------------------------  ------------------------------------
Resource theft          Appends hidden work to a legitimate     Compute quota and API credits drained
                        sampling request (e.g. "also write a    by unauthorised workloads, invisible
                        long story") behind a normal task       in the visible output
Conversation hijacking  Injects persistent instructions into    Assistant behaviour is altered for the
                        the sampling response that re-enter      whole session (the PoC forced an
                        the conversation context                assistant to answer "like a pirate";
                                                                 the same channel can degrade or
                                                                 weaponise responses)
Covert tool invocation  Embeds instructions that make the       Silent file writes, persistence, and
                        model call other tools without the      potential data exfiltration with no
                        user seeing the request                 explicit user consent

The common root cause is an implicit trust model with no built-in separation between the user’s content and the server’s modifications. Because a sampling-capable server can shape both the prompt and the returned text, it can smuggle instructions in either direction while still looking like a normal tool to the user.

Why it matters

Sampling shifts the threat model. The classic MCP advice — “only connect trusted servers, and review tool results” — assumes the danger flows server→tool→agent. Sampling adds a server→model→agent path that most users never see, because the sampling round-trip happens beneath the visible chat. A code-summariser that quietly mines your inference quota, or that pins a behavioural instruction for the rest of the session, produces output that looks entirely normal.

Exposure is gated by one thing: whether the client implements sampling, and with what defaults. Per Unit 42 and follow-up coverage, many clients either don’t support sampling yet or ship it with permissive defaults — and as MCP adoption widened through 2026, the number of clients enabling richer features like sampling grew. The affected population is therefore any MCP deployment that turns sampling on without gating it, which is why this is rated moderate rather than critical: it requires a malicious or compromised server and a sampling-enabled client, but neither is exotic.

Defenses

Sampling is useful, so the goal is to gate it, not ban it outright. Drawing on Unit 42’s recommendations and the Vulnerable MCP Project write-up:

  • Require explicit, per-server approval before any sampling request runs. Do not let a server invoke the model silently. Surface the prompt the server wants to run.
  • Separate user content from server modifications with strict templates, so server-supplied text cannot be interpreted as instructions to the model.
  • Filter sampling responses for instruction-like phrasing before they re-enter the conversation context, to blunt conversation hijacking.
  • Rate-limit sampling per server, per session, and per day, and set cost controls on AI compute quotas so resource theft hits a ceiling fast.
  • Monitor compute usage for anomalies — a summariser that suddenly consumes large token volumes is a tell.
  • Isolate sampling-capable servers from sensitive tools. A server allowed to request completions should not also hold write access to the filesystem or secrets.
  • Disable sampling for untrusted servers by default; enable it only where the workflow genuinely needs it.

Status

ItemState (as of June 2026)
DisclosureUnit 42, published December 5, 2025
NatureProtocol feature abuse, not a single patchable CVE
Client supportUneven; many clients don’t enable sampling, some ship permissive defaults
Primary fixClient-side gating (approval, rate limits, filtering) — responsibility sits with client implementers
AffectedAny MCP deployment with sampling enabled and a malicious/compromised server in scope

Sources