ARGUS: a provenance-graph defense for context-aware prompt injection
Published May 5, 2026, the ARGUS paper introduces influence-provenance auditing for LLM agents — dropping attack success from 28.8% to 3.8% on a new context-aware injection benchmark.
What is this?
On May 5, 2026, Shihao Weng and colleagues posted ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection on arXiv (2605.03378). The paper makes two contributions worth a closer read: a benchmark, AgentLure, that captures injection attacks tailored to runtime context, and a defense, ARGUS, built around an influence-provenance graph over the agent’s state. On AgentLure, ARGUS brings attack success rate from a 28.8% baseline down to 3.8%, while preserving 87.5% of clean task utility at 1.24× token overhead. Against an adaptive white-box adversary that knows ARGUS’s architecture and prompts, success rises only to 5.9%.
A complementary survey published the next day — A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents (arXiv:2604.23338v2, May 6, 2026) — independently confirms the framing: the upper layers of the agentic stack, including memory, tool execution and multi-agent coordination, remain sharply under-defended.
How it works
Naive prompt injection is mostly a template — “Ignore previous instructions and …” — bolted onto otherwise normal content. AgentLure’s contribution is to specify a stricter threat model. A context-aware payload is “tightly coupled with the runtime content the agent must consume” and is “written to be semantically indistinguishable from the legitimate data around it”. Concrete example from the paper: an agent is asked to pay my electricity bill. The retrieved invoice document carries valid payee and amount fields, but also an “invoice note” describing an additional processing fee — a single anomalous span inside an otherwise authentic carrier. A defense like Tool Filter authorizes the payment tool from the user’s prompt and never sees that the extra transfer is grounded only in the anomalous span.
AgentLure spans four agentic domains — Banking, Travel, Workspace, Slack — with eight attack vectors: Capability Routing Hijacking, Argument Tampering, Conditional Flow Hijacking, Reasoning Hijacking, Persistent Context Poisoning, Inter-Agent Contagion, Skill Injection, and Workflow Hijacking. Each is instantiated across six surfaces: tool documentation, tool returns, retrieved documents, memory entries, installed skills, and inter-agent messages.
ARGUS audits at the span level, not the observation level. Conceptually:
# Influence-provenance graph (sketch — illustrative, not exploit code)
nodes = system_prompt, user_query, tool_docs, tool_returns, rag_docs,
memory, skills, agent_msgs
edges = u → v when content of u influenced node v
trust = base_trust(node.type) × max(eta, fraction_benign_chars)
# base: system/user=1.0, skill=0.6, tool_doc/return=0.5,
# rag/memory=0.4, agent_msg=0.3 ; eta=0.1
audit(tool_call):
if read_only(tool_call):
execute(); append_returns_to_graph(); return
ContentSegmenter.label_spans(unsegmented_nodes)
grounding = ArgumentGrounder.map(args → spans) # copy/normalize/derive/resolve/ungrounded
if not InvariantChecker.holds(tool_call, user_query): block
if not EntailmentVerifier.entailed_by_benign(tool_call):
if EntailmentVerifier.shaped_by_anomalous(tool_call): block
return execute()
Four cooperating tools do the work: a ContentSegmenter that labels spans benign or anomalous; an ArgumentGrounder that maps each tool-call argument back to the spans that justify it (copy / normalize / derive / resolve / ungrounded); an InvariantChecker that validates the action against two or three task-level invariants extracted from the user query at session start; and an EntailmentVerifier that asks both whether the action is entailed by benign evidence and whether anomalous content could plausibly have shaped it. A block triggers a Hints module that returns benign candidate values to the agent so it can retry. The retry is re-audited.
Why it matters
Most current production defenses sit at the wrong granularity. Tool Filter operates on user prompts. Drift and Melon look at observation-level trust. Instruction-hierarchy training pushes the model to prefer system prompts over tool returns. None of them ask: which exact characters in this retrieved document justify this exact argument of this exact tool call? The AgentLure results suggest that without that question, context-aware injections — the realistic kind — slip through.
The ARGUS numbers also matter as a baseline. Token overhead of 1.24× is in striking distance of production budgets, and 87.5% utility means the defense does not cripple agents on legitimate work. The ablation in §5 (not fully captured in the HTML preprint) reports that all four sub-tools are individually necessary. That is interesting because it implies there is no single magic check — the defense is structural.
Defenses
The paper is itself a defense. The practical takeaways for teams shipping LLM agents today, even before ARGUS-style tooling lands in libraries:
- Track provenance per argument, not per observation. When an argument is
derived, log which spans contributed and what type of grounding (copy, normalize, derive, resolve, ungrounded). Ungrounded arguments to state-changing tools are a high-signal anomaly. - Extract task invariants at session start and re-check them before any irreversible action. Many context-aware attacks survive observation-level filters but violate an invariant such as “the payee must be the one in the user query”.
- Default-deny on state-changing tools. Read-only calls feed the graph cheaply. The cost of auditing should fall on the actions that touch the world.
- Treat memory and inter-agent messages as low-trust by construction. ARGUS scores them 0.4 and 0.3 respectively. The survey of 116 papers (
2604.23338) confirms long-horizon memory poisoning and inter-agent contagion are the most under-defended classes. - Test against context-aware benchmarks, not just template injections. If your evaluation suite still relies on “Ignore previous instructions”, it is measuring 2023 attacks.
Status
| Item | Reference | Date | Notes |
|---|---|---|---|
| ARGUS paper (v1) | arXiv:2605.03378 | 2026-05-05 | 19 pages, CC-BY-4.0 |
| AgentLure benchmark | §3 of the paper | 2026-05-05 | 4 domains × 10 tasks × 8 vectors = 320 samples |
| LASM survey (v2) | arXiv:2604.23338 | 2026-05-06 | 7-layer × 4-timescale framework over 116 papers |
| Public implementation | Not yet released | — | Authors release per-paper coding; ARGUS code TBD |
The deeper takeaway: prompt injection defenses are finally moving from “filter the input” to “audit the decision”. That is the right level of abstraction — and it is the level where the agent actually commits to acting on the world.