ADVERSARIAL MEDIUM NEW

SilentRetrieval: fluent RAG corpus poisoning that slips past perplexity filters

A May 27, 2026 arXiv preprint introduces a two-stage attack that hides goal-hijacking triggers inside fluent documents, reaching 57.5% and 54.8% LLM attack success on Natural Questions and MS MARCO respectively, with one poisoned record per query.

2026-05-29 // 6 min // affects: rag-pipelines, dense-retrievers, natural-questions-rag, ms-marco-rag, vector-databases

What is this?

A preprint posted on May 27, 2026, SilentRetrieval: Hijacking Retrieval-Augmented Generation via Semantically-Preserving Adversarial Data Poisoning (arXiv 2605.28074), proposes a corpus-poisoning attack built to survive the two filters production RAG teams usually rely on: retrieval scoring and perplexity-based anomaly detection. The poisoned documents read like ordinary text, sit naturally in the corpus, and steer the model toward an attacker-chosen answer only once they are retrieved.

The paper extends a line of work that started with PoisonedRAG in 2024 and continued with practical and black-box variants through 2025. The new ingredient is fluency: prior attacks tended to leave perplexity fingerprints that a defender could filter on. SilentRetrieval explicitly co-optimises retrievability and language-model likelihood, which is what makes it more than yet another poisoning result.

How it works

The attack is split into two stages.

Stage 1 — Coordinated Beam Search (CBS). Instead of mutating a host document one token at a time against a retrieval-similarity objective, CBS jointly searches multi-token edits with a combined objective that rewards both semantic similarity to the target query and low perplexity under a reference language model. The output is a host document that is still retrievable for the target query and still reads naturally.
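One way to picture the combined objective is as a single score per candidate edit. This is a sketch only: the paper's exact formulation is not reproduced here, and the encoder E, the weight λ, and the use of log-perplexity are all assumptions made for illustration.

```latex
% Assumed form, not taken from the paper:
%   q    = target query
%   d'   = candidate edited host document
%   E    = the retriever's embedding function
%   PPL  = perplexity under the reference language model
%   \lambda > 0 trades retrievability against fluency
\[
  \mathrm{score}(d') \;=\; \cos\!\bigl(E(q),\, E(d')\bigr)
  \;-\; \lambda \,\log \mathrm{PPL}_{\mathrm{ref}}(d')
\]
```

Under this reading, the beam keeps the highest-scoring multi-token edits at each step, so a candidate that gains similarity by becoming unnatural is penalised rather than rewarded.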

Stage 2 — Context-Adaptive Trigger Generation (CATG). A frozen LLM is then used to fuse a small “trigger” — the manipulation the attacker wants the answering model to follow — into the fluent host content. CATG adapts the wording of the trigger to the surrounding context, so the merged document does not show the abrupt instruction-style breaks that simple injections do.

A useful way to read the design:

# Two filters defenders usually trust — what SilentRetrieval does to each

  Retrieval filter
    "Drop documents whose similarity to recent queries is suspicious."
    → CBS keeps the poisoned doc inside the top-k for the target query
      without optimising it into an obvious outlier.

  Perplexity filter
    "Drop documents whose surface text is statistically weird."
    → CBS is constrained to keep perplexity close to the corpus baseline;
      CATG fuses the trigger so the seam is not visible.
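To make the perplexity row concrete, here is a minimal sketch of the kind of filter it describes. It uses a toy unigram model with add-one smoothing as a stand-in for a real reference language model; the corpus, the `tolerance` factor, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def train_unigram(corpus):
    """Count lowercase whitespace tokens across the benign corpus."""
    counts = Counter(tok for doc in corpus for tok in doc.lower().split())
    return counts, sum(counts.values())

def perplexity(text, counts, total):
    """Unigram perplexity with add-one smoothing (one extra slot for unseen tokens)."""
    toks = text.lower().split()
    if not toks:
        return float("inf")
    vocab = len(counts) + 1
    log_p = sum(math.log((counts[t] + 1) / (total + vocab)) for t in toks)
    return math.exp(-log_p / len(toks))

def passes_filter(text, counts, total, baseline, tolerance=1.5):
    """Keep a document if its perplexity is within tolerance x the corpus baseline."""
    return perplexity(text, counts, total) <= tolerance * baseline

# Toy benign corpus (hypothetical).
corpus = [
    "the capital of france is paris",
    "paris is the largest city in france",
    "the eiffel tower is in paris",
]
counts, total = train_unigram(corpus)
baseline = sum(perplexity(d, counts, total) for d in corpus) / len(corpus)

noisy = "zx!! describing ;; tower }} vbq paris"       # token-level garbage
fluent = "the largest city in france is the capital"  # fluent, in-distribution

print(passes_filter(noisy, counts, total, baseline))   # False: flagged
print(passes_filter(fluent, counts, total, baseline))  # True: passes
```

The filter catches surface-level noise, which is what earlier attacks tended to produce. A document whose wording stays close to the corpus distribution passes regardless of what it asserts, which is the gap the attack is built around.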

Reported numbers from the paper, in a one-poisoned-document-per-query setting: hit rate at 10 (HR@10) of 84.6% on Natural Questions and 81.3% on MS MARCO, and LLM attack-success rate (ASR) of 57.5% and 54.8% respectively, while keeping perplexity near the benign baseline. The attacker only needs to land a single fluent document per targeted query.
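A quick way to read the two metrics together is to ask how often the answering model follows the poisoned document once it has actually been retrieved. The arithmetic below assumes the attack-success rate is measured over all targeted queries and that success requires a top-10 hit; the article does not confirm either point, so treat the result as a rough derived figure, not a number from the paper.

```python
# Reported figures from the article, expressed as fractions.
reported = {
    "Natural Questions": {"hr_at_10": 0.846, "asr": 0.575},
    "MS MARCO": {"hr_at_10": 0.813, "asr": 0.548},
}

def asr_given_retrieved(hr_at_10: float, asr: float) -> float:
    """Attack success conditioned on retrieval.

    ASSUMPTION: the attack can only succeed when the poisoned document
    is in the top 10, so P(success | retrieved) = ASR / HR@10.
    """
    return asr / hr_at_10

for name, metrics in reported.items():
    print(f"{name}: {asr_given_retrieved(**metrics):.1%}")
# Natural Questions: 68.0%
# MS MARCO: 67.4%
```

If the assumption holds, roughly two in three retrieved poisoned documents steer the answer, which suggests that controls acting after retrieval still have a sizeable share of attacks left to catch.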

Why it matters

Three properties separate this from the usual corpus-poisoning headline.

The first is the threat model. SilentRetrieval does not assume access to the retriever weights or the answering model. The attacker only needs to be able to write into the corpus — which, in real deployments, includes any source the RAG ingests automatically: wikis, ticketing systems, public crawls, third-party documentation, customer-uploaded files, vendor knowledge bases. Each of those write paths now carries a non-trivial integrity risk.

The second is the defensive blind spot. Many production RAG stacks rely on a mix of (a) source allow-lists, (b) similarity-based retrieval scoring, and (c) perplexity or “looks-like-prompt-injection” classifiers on the retrieved chunk. SilentRetrieval is constructed to survive (b) and (c) by design. Allow-lists, control (a), only help if every ingestion path is curated, which is rarely true once the system touches user uploads or web data.

The third is the economic angle. One poisoned document per targeted query is enough to lift attack-success rates above 50% on standard benchmarks. That is a small, repeatable, low-noise write — the kind that fits comfortably inside ordinary content contributions.

This places the attack squarely inside OWASP LLM04:2025 — Data and Model Poisoning, and overlaps with LLM08:2025 (Vector and Embedding Weaknesses). It is a research result, not a 0-day against a named product, but it sharpens the question of what a RAG corpus owner can actually trust about its own index.

Defenses

No single control retires this class. The shortlist that holds up as of May 2026:

  1. Treat the RAG corpus as a write boundary, not a read boundary. Authenticate and log every ingestion path. Tag entries with source, ingested_by, ingested_at. The most common production failure is that “the corpus” is in practice a join of a dozen write paths nobody owns.
  2. Score retrieval against provenance, not just similarity. A high-similarity hit from a low-trust source should be down-weighted or held for review before it reaches the generator’s context window.
  3. Defend at the answer step, not just the retrieve step. Approaches like Traceback of Poisoning Attacks to RAG (April 2025, updated through 2026) attribute a generated answer back to specific retrieved documents, so a suspicious answer can be traced to the documents that produced it. That is useful for incident response and continuous corpus cleaning.
  4. Reduce single-document leverage. Require corroboration across at least two retrieved documents from independent provenance buckets before the generator treats a fact as ground truth. SilentRetrieval’s reported numbers assume one poisoned doc per query; raising the bar to two independently sourced docs means the attacker has to compromise two separate write paths per targeted query instead of one.
  5. Watch for query-conditioned anomalies. A document that appears in the top-k for an unusually specific or sensitive query, especially one it would not have been a natural answer to a week earlier, is worth flagging — even when its surface text is clean.
  6. Cap downstream blast radius. Generated answers that drive tool calls or user-visible actions should not inherit full trust from the corpus. The same per-tool ACLs and human-in-the-loop confirmations that limit agent abuse limit RAG abuse.
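Items 2 and 4 above can be sketched together as a post-retrieval step. This is a minimal illustration under stated assumptions, not a reference implementation: the `TRUST` weights, the `Hit` fields, and the idea that each retrieved document has already been reduced to a normalized `claim` are all hypothetical, and a real pipeline would need its own claim-extraction step and trust policy.

```python
from dataclasses import dataclass

# Hypothetical trust weights per provenance bucket; real values are a policy decision.
TRUST = {"curated_docs": 1.0, "internal_wiki": 0.8, "user_upload": 0.4, "web_crawl": 0.3}

@dataclass(frozen=True)
class Hit:
    doc_id: str
    source: str        # provenance bucket recorded at ingestion
    similarity: float  # retriever score in [0, 1]
    claim: str         # normalized answer the document supports (assumed given)

def rerank(hits, default_trust=0.2):
    """Down-weight similarity by source trust and sort best-first."""
    return sorted(
        hits,
        key=lambda h: h.similarity * TRUST.get(h.source, default_trust),
        reverse=True,
    )

def corroborated_claims(hits, min_sources=2):
    """Return claims backed by at least min_sources distinct provenance buckets."""
    sources_by_claim = {}
    for h in hits:
        sources_by_claim.setdefault(h.claim, set()).add(h.source)
    return {claim for claim, srcs in sources_by_claim.items() if len(srcs) >= min_sources}

hits = [
    Hit("d1", "web_crawl", 0.95, "lyon"),      # high similarity, low-trust source
    Hit("d2", "curated_docs", 0.80, "paris"),
    Hit("d3", "internal_wiki", 0.78, "paris"),
]

print([h.doc_id for h in rerank(hits)])  # ['d2', 'd3', 'd1']
print(corroborated_claims(hits))         # {'paris'}
```

Counting distinct provenance buckets, not distinct documents, is the design choice that matters here: several documents planted through the same write path still count as one source.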

Status

| Item | Reference | Date | Notes |
|---|---|---|---|
| SilentRetrieval paper | arXiv 2605.28074 | 2026-05-27 | 84.6% / 81.3% HR@10, 57.5% / 54.8% ASR-LLM on NQ / MS MARCO |
| PoisonedRAG (predecessor) | arXiv 2402.07867 | 2024-02 | first widely-cited RAG corpus poisoning |
| Traceback defense | arXiv 2504.21668 | 2025-04 | attributes generated answers back to retrieved docs |
| Category | OWASP LLM Top 10 (2025) | 2025 | LLM04 Data and Model Poisoning + LLM08 Vector and Embedding Weaknesses |

The paper is a research result, not a disclosed exploit against a named vendor. Its operational reading does not depend on any one stack: any RAG pipeline that accepts content from a path it does not own has just added an integrity surface that perplexity filtering alone will not catch.

Sources