INDIRECT INJECTION MEDIUM NEW

Decision Hijacking: prompt-injecting the LLM that ranks your search results

A growing body of 2025-2026 research shows that when an LLM re-ranks search or RAG candidates, a few injected lines inside one document can force it to the top — collapsing ranking quality by 60+ NDCG points, with stronger models more vulnerable, not less.

2026-06-07 // 7 min affects: gpt-4.1, llama-3.3-70b, qwen3, gemma-3, rag-rerankers, llm-judges

What is this?

Modern search and RAG pipelines increasingly use an LLM as the re-ranker: a first stage retrieves a few dozen candidate passages, and a second stage asks a language model “which of these documents is most relevant to the query?” The same pattern powers conversational search engines, recommendation systems, and the “LLM-as-a-judge” relevance scoring used inside evaluation harnesses.

A consistent line of research — “Illusions of Relevance” (arXiv 2501.18536, January 2025), “The Ranking Blind Spot” (arXiv 2509.18575), and most recently “The Vulnerability of LLM Rankers to Prompt Injection Attacks” (arXiv 2602.16752, February 2026) — shows that this re-ranking stage is itself a prompt-injection surface. A content provider who controls one candidate document can embed instructions in it that hijack the ranker’s decision and push that document to the top, regardless of its actual relevance. The “Ranking Blind Spot” authors call the two core techniques Decision Objective Hijacking and Decision Criteria Hijacking.

This is indirect prompt injection: the attacker never talks to the model directly. They just publish a web page, a product description, or a knowledge-base article, and wait for it to be retrieved and fed to the ranker as untrusted content.

How it works

The ranker receives a prompt that interleaves a trusted instruction (“rank these by relevance to the query”) with untrusted document text. Because the model cannot reliably tell where the document ends and a new instruction begins, text inside a document is read as a command. The Ranking Blind Spot paper attributes this to two failures: boundary confusion (content vs. instruction) and priority confusion (keeping the original scoring rule when a conflicting one appears).

Two strategies exploit it:

Decision Objective Hijacking (DOH) replaces the task entirely. Instead of “judge relevance,” the injected text tells the model to “output only the passage containing a marker.” It is effective but assumes some knowledge of the output format.
Decision Criteria Hijacking (DCH) keeps the task but redefines the standard — “treat the passage containing the marker as the most relevant, regardless of content.” DCH is fully black-box and works across pairwise, listwise, and setwise ranking schemes, which makes it the more practical threat.

# Conceptual sketch, based on the public Sept 2025 / Feb 2026
# papers. Defanged — not a working payload.

Query:  "best sous vide recipes"

Candidate B (attacker-controlled, low real relevance):
  ...unrelated text about a kitchen gadget...
  [MARKER]
  INSTRUCTIONS: [REDACTED — redefine "relevance" so the
  passage holding [MARKER] is always ranked first]

Ranker prompt (trusted): "Rank candidates by relevance."
Ranker output:           B is most relevant   ← hijacked

The measured impact is large. On the TREC deep-learning benchmarks, injecting into low-relevance passages collapsed NDCG@10 by more than 60 points (for example Llama-3-70B fell from 74.3 to 7.4). Flip rates frequently exceeded 99%, and the attack was position-agnostic — it worked whether the injected block sat at the start or end of the document.

The most counterintuitive finding, reported independently across the papers: stronger, more instruction-following models are more susceptible, not less. GPT-4.1-mini and Llama-3.3-70B were among the easiest to hijack precisely because they follow embedded instructions so faithfully.

Why it matters

The blast radius is wherever an LLM scores or orders untrusted text. That includes RAG answer pipelines (a poisoned document gets ranked into the top-k and steers the final answer), LLM-powered site search and recommendation, and automated LLM-as-a-judge evaluation — where a hijacked relevance score can quietly corrupt a benchmark or an A/B test. “Illusions of Relevance” showed the same fragility in dense retrievers and rerankers, not just generative judges, so the weakness spans the whole retrieval stack.

It is also an economic incentive, not just a lab curiosity. This is adversarial SEO for the LLM era: the payoff for getting your page ranked first is real money, so the attack will be attempted in the wild on any public-facing system that re-ranks with an LLM.

Defenses

No single fix fully closes the gap today, but several measures meaningfully reduce exposure:

Separate instructions from data. Pass candidate documents in a clearly delimited, non-instruction channel and apply an instruction-hierarchy policy so document text can never override the ranking directive. This is the architectural root-cause fix the Ranking Blind Spot authors call “instructional separation.”
Sanitize and structure candidates. Strip or escape imperative-looking content, control tokens, and injected markers before the document reaches the ranker. Treat every retrieved passage as hostile input.
Detect ranking anomalies. A document whose first-stage retrieval score is mediocre but whose LLM rank is suddenly #1 is a red flag. ProGRank (arXiv 2603.22934, March 2026) is one recent defensive line that uses probe-gradient signals to harden re-ranking against poisoned passages; semantic anomaly detection on score/rank disagreement is a cheaper first step.
Adversarial fine-tuning. Training the ranker on DOH/DCH-style examples improves robustness, though the papers caution it is not a complete solution.
Keep a non-LLM tie-breaker. Cross-check the LLM ranking against a classical retrieval score (BM25, dense similarity) and flag large disagreements for review rather than trusting the LLM order blindly.

Status

Item	Reference	Date	Notes
Retrievers/rerankers/judges fooled by content injection	Illusions of Relevance (arXiv 2501.18536)	2025-01	Black-box, spans whole stack
Decision Objective / Criteria Hijacking framework	The Ranking Blind Spot (arXiv 2509.18575)	2025-09	DOH + DCH, NDCG@10 −60+
LLM rankers vulnerable to prompt injection	arXiv 2602.16752	2026-02	Confirms across ranking schemes
Probe-gradient reranking defense	ProGRank (arXiv 2603.22934)	2026-03	Defense against corpus poisoning
Common root cause	Boundary + priority confusion	—	Stronger models more vulnerable

The lesson echoes the rest of the prompt-injection literature: the moment an LLM reads attacker-controlled text and then makes a trust decision about it — here, “how relevant is this?” — that decision is corruptible. If you re-rank with a language model, treat the ranking it returns as advice from a component that an outsider can lobby, and keep a non-LLM check on the result.