M3Att: query-agnostic knowledge poisoning of medical multimodal RAG
A May 2026 paper shows how to poison medical image-text RAG without knowing user queries in advance. Imperceptible image perturbations hijack retrieval, ambiguity-guided text evades the model's self-correction, and pre-filter defenses barely dent the attack.
What is this?
On May 11, 2026, researchers from Tsinghua University, Beijing University of Posts and Telecommunications, Northwestern Polytechnical University and ETH Zurich posted M3Att (arXiv:2605.10253), a knowledge-poisoning framework aimed at medical multimodal Retrieval-Augmented Generation — the pipelines that pair medical images (X-ray, CT, MRI) with text and feed retrieved evidence to a large vision-language model (LVLM) for report generation or clinical question answering.
The contribution that matters for defenders is the threat model, not a new payload. Earlier medical-RAG poisoning work assumed the attacker already knew the victim’s future queries and could optimise poisoned entries against them — an assumption that rarely holds in production. M3Att drops it. It assumes only limited knowledge of the knowledge base’s distribution, which the authors note can be estimated through ordinary black-box interaction with the RAG system. This makes the attack a realistic red-teaming benchmark rather than a lab-only curiosity.
How it works
M3Att splits the problem across the two stages of a RAG pipeline — retrieval and generation — and is described here at a conceptual level only; no operational parameters or payloads are reproduced.
| Stage | Benign RAG | What M3Att targets |
|---|---|---|
| Retrieval | Embed query image+text, pull top-k nearest entries | Make a poisoned entry get retrieved for queries the attacker has never seen |
| Generation | LVLM reads retrieved evidence, writes diagnosis | Make the poisoned text survive the model's own medical knowledge |
The first mechanism, distribution-guided retrieval hijacking, exploits a property of medical imaging: scans of the same body region cluster very tightly in embedding space. The attack models that distribution, picks proxy targets, and applies imperceptible perturbations to the poisoned entry’s image so it acts as a query-agnostic trigger — surfacing in the retrieved set for a wide range of unseen queries without changing the image’s clinical appearance.
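The clustering property can be illustrated with a toy example. The sketch below is not the paper's method and applies no perturbation. It only shows, on made-up 3-D vectors, why an entry sitting near the centre of a tight cluster ends up in the top-k for queries it was never matched against. The entry names, vector values, and the choice of cosine similarity with k=3 are all assumptions for illustration.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, corpus, k):
    """Return the ids of the k corpus entries most similar to the query."""
    ranked = sorted(corpus, key=lambda eid: cosine(query, corpus[eid]), reverse=True)
    return ranked[:k]

# Hypothetical embeddings for one body region: all vectors point roughly
# along the first axis, mimicking a tight cluster.
corpus = {
    "scan_a": (1.0, 0.2, 0.0),
    "scan_b": (1.0, -0.2, 0.0),
    "scan_c": (1.0, 0.0, 0.2),
    "scan_d": (1.0, 0.0, -0.2),
    "central": (1.0, 0.0, 0.0),  # sits at the centre of the cluster
}

# Four queries scattered around the same cluster.
queries = [
    (1.0, 0.1, 0.1),
    (1.0, -0.1, 0.1),
    (1.0, 0.1, -0.1),
    (1.0, -0.1, -0.1),
]

# Count how many queries retrieve each entry.
hits = Counter(eid for q in queries for eid in top_k(q, corpus, k=3))
print(hits["central"], len(queries))  # 4 4
```

Each ordinary entry is retrieved for only two of the four queries, while the central entry is retrieved for all of them. That asymmetry is what makes a well-placed entry behave like a query-agnostic trigger, and it is also the signal a defender can look for in retrieval logs.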
The second mechanism, clinical ambiguity-guided poisoning, addresses a defence that practitioners often assume protects them: a well-trained medical LVLM will simply correct obvious falsehoods. M3Att sidesteps this by injecting misinformation into the low-confidence, genuinely ambiguous regions of clinical reasoning — for instance hedged “cannot rule out malignancy”-style phrasing that pushes the model toward a false-positive posture. Because the injected claim is plausible rather than flatly wrong, the model does not self-correct, and the output is “clinically plausible yet incorrect.”
Across five LVLMs and five datasets, retrieval hijacking success approaches 100% at a poison rate of about 0.08, with meaningful gains even at lower poisoning budgets.
Why it matters
This is a data-integrity attack on the knowledge base, not a prompt-injection trick, so the usual input/output guardrails do not see it. The poisoned content is already inside the trusted corpus by the time a query arrives.
The healthcare framing makes the impact concrete: a contaminated RAG store can steer a diagnosis or a treatment suggestion toward a wrong-but-believable conclusion, and the “ambiguity-guided” design specifically defeats the intuition that the model’s own training will catch bad evidence. In MITRE ATT&CK terms this sits closest to a supply-chain/data-staging concern — the corruption happens upstream of the agent’s reasoning, where most monitoring is weakest. Any organisation that ingests external or community-contributed medical knowledge into a retrieval store should treat that store as an attack surface in its own right.
Defenses
The paper’s most useful finding for blue teams is which defences failed: three pre-retrieval corpus filters — image clustering, text clustering, and image-text cross-modal consistency — left the retrieval success rate “largely unchanged,” and stronger retrieval-time checks (perplexity filtering, anomaly detection, score-based pruning on CLIP retrievers) did not reliably stop it either. Simple distributional heuristics are not enough. Practical hardening:
- Govern the corpus like code. Restrict write access to the knowledge base, require provenance and signing for every image-text pair, and review external or community contributions before ingestion rather than after.
- Corroborate before trusting. For high-stakes outputs, require agreement across multiple independent retrieved sources, and flag diagnoses that rest on a single retrieved entry.
- Keep a human in the loop for clinical decisions. Treat RAG output as decision support, never as authority; ensure a qualified clinician reviews the evidence chain, especially where the model hedges toward a serious finding.
- Watch for distribution drift and over-representation. Monitor which entries are retrieved disproportionately often across unrelated queries — a query-agnostic trigger shows up as an entry that is “always relevant.”
- Red-team your own store. Use frameworks like M3Att (code is public) against a staging copy to measure your real exposure before an adversary does.
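The over-representation check in the list above can be sketched in a few lines. This is a minimal illustration and not something the paper evaluates. It assumes a retrieval log of `(topic, retrieved_ids)` pairs, where the `topic` label, the function name `flag_over_retrieved`, and both thresholds are hypothetical and would need tuning against real traffic.

```python
from collections import defaultdict

def flag_over_retrieved(log, min_share=0.5, min_topics=3):
    """Return entry ids retrieved in a large share of queries across many topics.

    log: iterable of (topic, retrieved_ids) pairs, one pair per query.
    min_share: minimum fraction of all queries that retrieved the entry.
    min_topics: minimum number of distinct topics the entry appeared under.
    """
    total_queries = 0
    query_count = defaultdict(int)   # entry id -> number of queries retrieving it
    topics_seen = defaultdict(set)   # entry id -> distinct topics it appeared under
    for topic, retrieved_ids in log:
        total_queries += 1
        for eid in set(retrieved_ids):  # count each entry once per query
            query_count[eid] += 1
            topics_seen[eid].add(topic)
    if total_queries == 0:
        return []
    return sorted(
        eid
        for eid, n in query_count.items()
        if n / total_queries >= min_share and len(topics_seen[eid]) >= min_topics
    )

# Hypothetical log: four queries on unrelated findings.
log = [
    ("pneumonia", ["e1", "e7", "e9"]),
    ("fracture", ["e2", "e9", "e4"]),
    ("nodule", ["e9", "e3", "e1"]),
    ("effusion", ["e5", "e6", "e9"]),
]
print(flag_over_retrieved(log))  # ['e9']
```

Here `e1` reaches the share threshold but appears under only two topics, so it is not flagged. `e9` is retrieved for every query regardless of topic and is. A flag is a prompt for manual review of the entry and its provenance, not proof of poisoning, since legitimately general reference entries can also score high.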
Status
| Item | Reference | Date | Notes |
|---|---|---|---|
| M3Att paper | arXiv:2605.10253v1 [cs.CR] | 2026-05-11 | Tsinghua, BUPT, NWPU, ETH Zurich |
| Code | github.com/ypr17/M3Att | 2026-05 | Public, for red-teaming |
| Scope | 5 LVLMs × 5 datasets, 4 medical tasks | — | Report generation, medical QA |
| Defenses tested | Pre-retrieval + retrieval-time filters | — | Retrieval ASR “largely unchanged” |
This is a research red-teaming result, not a disclosed product vulnerability — there is no patch to apply. The takeaway is architectural: in medical (and other high-stakes) RAG, the integrity of the retrieval corpus is a first-class security property, and the model’s own domain knowledge is not a dependable backstop against plausible misinformation.
Note: this article covers AI security research on a sensitive (healthcare) topic for defensive purposes. It is not medical advice.