ADVERSARIAL MEDIUM NEW

M3Att: query-agnostic knowledge poisoning of medical multimodal RAG

A May 2026 paper poisons medical image-text RAG without knowing user queries in advance. Imperceptible image perturbations hijack retrieval, ambiguity-guided text evades the model's self-correction, and corpus and retrieval-time filters barely dent it.

2026-06-17 // 6 min // affects: medical-multimodal-rag, lvlm-rag, clinical-decision-support, vision-language-models

What is this?

On May 11, 2026, researchers from Tsinghua University, Beijing University of Posts and Telecommunications, Northwestern Polytechnical University and ETH Zurich posted M3Att (arXiv:2605.10253), a knowledge-poisoning framework aimed at medical multimodal Retrieval-Augmented Generation — the pipelines that pair medical images (X-ray, CT, MRI) with text and feed retrieved evidence to a large vision-language model (LVLM) for report generation or clinical question answering.

The contribution that matters for defenders is the threat model, not a new payload. Earlier medical-RAG poisoning work assumed the attacker already knew the victim’s future queries and could optimise poisoned entries against them — an assumption that rarely holds in production. M3Att drops it. It assumes only limited knowledge of the knowledge base’s distribution, which the authors note can be estimated through ordinary black-box interaction with the RAG system. This makes the attack a realistic red-teaming benchmark rather than a lab-only curiosity.

How it works

M3Att splits the problem across the two stages of a RAG pipeline — retrieval and generation — and is described here at a conceptual level only; no operational parameters or payloads are reproduced.

Stage         Benign RAG                     What M3Att targets
------------  -----------------------------  ----------------------------------------
Retrieval     Embed query image+text,        Make a poisoned entry get retrieved for
              pull top-k nearest entries     queries the attacker has never seen
Generation    LVLM reads retrieved           Make the poisoned text survive the
              evidence, writes diagnosis     model's own medical knowledge

The first mechanism, distribution-guided retrieval hijacking, exploits a property of medical imaging: scans of the same body region cluster very tightly in embedding space. The attack models that distribution, picks proxy targets, and applies imperceptible perturbations to the poisoned entry’s image so it acts as a query-agnostic trigger — surfacing in the retrieved set for a wide range of unseen queries without changing the image’s clinical appearance.
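
The geometric intuition behind that clustering property can be shown with synthetic vectors. The sketch below is not the paper's method and contains no perturbation step. It assumes unit-length 2-D vectors standing in for embeddings, cosine-similarity top-k retrieval, and made-up entry names and angles (`unit`, `top_k` and `hit_rate` are hypothetical helpers). It only illustrates that, in a tight cluster, an entry near the centre is returned for every query from that region while an entry at the edge is not.

```python
import math

def unit(angle_deg):
    """2-D unit vector at the given angle; a stand-in for a normalised embedding."""
    a = math.radians(angle_deg)
    return (math.cos(a), math.sin(a))

def cosine(u, v):
    # Both inputs are unit vectors, so the dot product equals cosine similarity.
    return sum(a * b for a, b in zip(u, v))

def top_k(query, corpus, k):
    """Return the ids of the k corpus entries most similar to the query."""
    ranked = sorted(corpus, key=lambda eid: cosine(query, corpus[eid]), reverse=True)
    return ranked[:k]

def hit_rate(entry_id, queries, corpus, k):
    """Fraction of queries whose top-k result set contains entry_id."""
    hits = sum(entry_id in top_k(q, corpus, k) for q in queries)
    return hits / len(queries)

# Synthetic corpus: five "chest" entries packed into a 40-degree arc,
# two "knee" entries far away. All names and angles are invented.
corpus = {
    "chest_a": unit(-20),
    "chest_b": unit(-10),
    "chest_centre": unit(0),
    "chest_c": unit(10),
    "chest_d": unit(20),
    "knee_a": unit(90),
    "knee_b": unit(100),
}

# Queries spread across the chest arc; none coincides with a corpus entry.
queries = [unit(a) for a in (-18, -13, -7, -2, 3, 8, 12, 17)]

print(hit_rate("chest_centre", queries, corpus, k=3))  # 1.0
print(hit_rate("chest_d", queries, corpus, k=3))       # 0.375
```

In this toy corpus the central entry is returned for all eight queries without having been fitted to any of them, which is the property a query-agnostic trigger relies on. Real embedding spaces are high-dimensional, so the numbers here carry no empirical weight.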

The second mechanism, clinical ambiguity-guided poisoning, addresses a defence that practitioners often assume protects them: a well-trained medical LVLM will simply correct obvious falsehoods. M3Att sidesteps this by injecting misinformation into the low-confidence, genuinely ambiguous regions of clinical reasoning — for instance hedged “cannot rule out malignancy”-style phrasing that pushes the model toward a false-positive posture. Because the injected claim is plausible rather than flatly wrong, the model does not self-correct, and the output is “clinically plausible yet incorrect.”

Across five LVLMs and five datasets, retrieval-hijacking success approaches 100% at a poison rate of around 0.08, with meaningful gains even at low poisoning budgets.

Why it matters

This is a data-integrity attack on the knowledge base, not a prompt-injection trick, so the usual input/output guardrails do not see it. The poisoned content is already inside the trusted corpus by the time a query arrives.

The healthcare framing makes the impact concrete: a contaminated RAG store can steer a diagnosis or a treatment suggestion toward a wrong-but-believable conclusion, and the “ambiguity-guided” design specifically defeats the intuition that the model’s own training will catch bad evidence. In MITRE ATT&CK terms this sits closest to a supply-chain/data-staging concern — the corruption happens upstream of the agent’s reasoning, where most monitoring is weakest. Any organisation that ingests external or community-contributed medical knowledge into a retrieval store should treat that store as an attack surface in its own right.
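
Treating the store as an attack surface means placing a control on the write path. The following is a minimal sketch of an ingestion gate, not something taken from the paper: it assumes a shared-secret HMAC-SHA256 tag per image-text pair as a stand-in for whatever signing scheme a deployment actually uses, and the function names (`pair_digest`, `sign_pair`, `admit`) and the demo key are hypothetical.

```python
import hashlib
import hmac

def pair_digest(image_bytes: bytes, text: str) -> bytes:
    """Hash image and text together so neither can be swapped independently."""
    h = hashlib.sha256()
    # Length prefix keeps the image/text boundary unambiguous.
    h.update(len(image_bytes).to_bytes(8, "big"))
    h.update(image_bytes)
    h.update(text.encode("utf-8"))
    return h.digest()

def sign_pair(key: bytes, image_bytes: bytes, text: str) -> str:
    """Tag produced by a trusted contributor who holds the key (assumption)."""
    return hmac.new(key, pair_digest(image_bytes, text), hashlib.sha256).hexdigest()

def admit(key: bytes, image_bytes: bytes, text: str, tag: str) -> bool:
    """Admit a pair into the knowledge base only if its tag verifies."""
    expected = sign_pair(key, image_bytes, text)
    return hmac.compare_digest(expected, tag)

# Illustrative usage with placeholder data.
key = b"demo-key-not-for-production"
image = b"\x00\x01\x02 placeholder image bytes"
report = "No acute cardiopulmonary abnormality."
tag = sign_pair(key, image, report)

print(admit(key, image, report, tag))            # True
print(admit(key, image + b"\x01", report, tag))  # False: image bytes changed
```

Any change to the image bytes after signing, however imperceptible, fails verification. The limit of this control is equally important: a tag shows who submitted a pair and that it has not changed since, not that its content is clean, so it complements review of contributions rather than replacing it.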

Defenses

The paper’s most useful finding for blue teams is which defences failed: three pre-retrieval corpus filters — image clustering, text clustering, and image-text cross-modal consistency — left the retrieval success rate “largely unchanged,” and stronger retrieval-time checks (perplexity filtering, anomaly detection, score-based pruning on CLIP retrievers) did not reliably stop it either. Simple distributional heuristics are not enough. Practical hardening:

Govern the corpus like code. Restrict write access to the knowledge base, require provenance and signing for every image-text pair, and review external or community contributions before ingestion rather than after.
Corroborate before trusting. For high-stakes outputs, require agreement across multiple independent retrieved sources, and flag diagnoses that rest on a single retrieved entry.
Keep a human in the loop for clinical decisions. Treat RAG output as decision support, never as authority; ensure a qualified clinician reviews the evidence chain, especially where the model hedges toward a serious finding.
Watch for distribution drift and over-representation. Monitor which entries are retrieved disproportionately often across unrelated queries — a query-agnostic trigger shows up as an entry that is “always relevant.”
Red-team your own store. Use frameworks like M3Att (code is public) against a staging copy to measure your real exposure before an adversary does.
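
The over-representation check in the fourth item can be sketched as a counter over retrieval logs. This is a minimal sketch under assumptions that are not in the paper: a hypothetical log of `(query_topic, retrieved_ids)` pairs, where `query_topic` is a coarse label the system already records (for example body region), and illustrative thresholds `min_topics` and `min_share` that would need tuning per deployment.

```python
from collections import defaultdict

def flag_over_retrieved(log, min_topics=3, min_share=0.5):
    """Return ids of entries retrieved across many unrelated query topics.

    log: iterable of (query_topic, retrieved_ids) pairs (hypothetical format).
    min_topics: minimum number of distinct topics an entry must appear under.
    min_share: minimum fraction of all queries that retrieved the entry.
    """
    topics_per_entry = defaultdict(set)
    hits_per_entry = defaultdict(int)
    total = 0
    for topic, retrieved_ids in log:
        total += 1
        for entry_id in set(retrieved_ids):  # count each entry once per query
            topics_per_entry[entry_id].add(topic)
            hits_per_entry[entry_id] += 1
    if total == 0:
        return []
    flagged = [
        entry_id
        for entry_id, topics in topics_per_entry.items()
        if len(topics) >= min_topics
        and hits_per_entry[entry_id] / total >= min_share
    ]
    return sorted(flagged)

# Invented retrieval log: entry "e7" keeps appearing regardless of topic.
log = [
    ("chest", ["e1", "e7", "e9"]),
    ("chest", ["e2", "e7", "e1"]),
    ("knee", ["e4", "e7", "e5"]),
    ("brain", ["e6", "e7", "e8"]),
    ("knee", ["e4", "e5", "e3"]),
    ("abdomen", ["e7", "e10", "e11"]),
]

print(flag_over_retrieved(log))  # ['e7']
```

A flagged entry is a candidate for manual review, not proof of poisoning, since legitimately general reference material can also be retrieved often. An entry that dominates a single topic without crossing topics would not trip this check and would need a per-topic share threshold instead.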

Status

Item	Reference	Date	Notes
M3Att paper	arXiv:2605.10253v1 [cs.CR]	2026-05-11	Tsinghua, BUPT, NWPU, ETH Zurich
Code	github.com/ypr17/M3Att	2026-05	Public, for red-teaming
Scope	5 LVLMs × 5 datasets, 4 medical tasks	—	Report generation, medical QA
Defenses tested	Pre-retrieval + retrieval-time filters	—	Retrieval ASR “largely unchanged”

This is a research red-teaming result, not a disclosed product vulnerability — there is no patch to apply. The takeaway is architectural: in medical (and other high-stakes) RAG, the integrity of the retrieval corpus is a first-class security property, and the model’s own domain knowledge is not a dependable backstop against plausible misinformation.

Note: this article covers AI security research on a sensitive (healthcare) topic for defensive purposes. It is not medical advice.