PI-Hunter: auditing agents to expose and localize hidden prompt injections
A June 2026 paper from Google researchers reframes prompt-injection red-teaming as auditing — PI-Hunter evolves source-aware test cases to surface where latent injections enter and propagate through an agent, not just whether an attack lands.
What is this?
PI-Hunter is an automated agentic auditing framework for indirect prompt injection in LLM agents, published on arXiv (2606.12737) on June 10, 2026 by Pengfei He, Lesly Miculicich, Vishesh Sharma, Ash Fox, George Lee, Jiliang Tang, Tomas Pfister and Long T. Le. Its purpose is defensive: it helps developers find where an agent is exposed before an attacker does.
The framing is the contribution. As LLMs become agentic — reading documents, calling tools, browsing — untrusted external content becomes an injection channel. The authors argue that existing defenses mostly block malicious content at inference time, and that existing red-teaming mostly optimizes a single number, attack success rate. Neither tells a developer how a latent instruction entered the agent or where it propagated. PI-Hunter is built to answer those two questions.
How it works
PI-Hunter treats auditing as an iterative, agent-driven search rather than a fixed payload list. According to the paper, it constructs realistic, source-aware test cases — injections placed in the kinds of external sources an agent actually consumes (retrieved documents, tool outputs, web content) — and then evolves them through feedback-driven exploration, refining cases based on how the target agent responds.
The goal of that loop is to induce the agent to retrieve and reveal latent malicious instructions hidden in its environment, exposing the failure even when a naive single-shot attack would not trigger it. Crucially, PI-Hunter aims to localize the vulnerability — identifying the point where the injection emerges and the path along which it propagates through the agent’s reasoning and tool calls — rather than only reporting a pass/fail.
This auditing posture connects PI-Hunter to two adjacent lines of work. PromptLocate (arXiv:2510.12252) focused on localizing which retrieved segment carried an injection; PISmith (arXiv:2603.13026) showed that adaptive red-teaming keeps defeating static defenses. PI-Hunter combines the spirit of both — adaptive, evolving test generation aimed at producing actionable localization for defenders.
No exploit payload is reproduced here, and none is needed to understand the method: it is an auditing recipe, not a specific attack string.
Why it matters
The reported result, across multiple benchmarks, agent architectures, attacks and defenses, is that PI-Hunter substantially improves vulnerability exposure compared with prior red-teaming. For a defender, “exposure” is the useful currency: a test that only says “the agent was injected” leaves you guessing, while one that points to the entry source and propagation path tells you what to fix.
This matters because indirect prompt injection remains the dominant unsolved risk for agents — OWASP’s 2026 agentic guidance maps it to a majority of its top categories, and there is still no reliable model-side fix. In that environment, the practical defense is not a single guardrail but continuous, adaptive auditing wired into pre-deployment evaluation. PI-Hunter is an argument that red-teaming should be measured by what it reveals and locates, not just by how often it wins.
The realistic caveat: an auditing tool finds exposure, it does not patch it. Localization is only valuable if teams act on it — segmenting tool outputs, constraining agent actions, and re-auditing after every change.
Defenses
PI-Hunter is itself a defensive tool, but auditing only pays off when paired with structural mitigations. Concretely:
- Audit continuously and adaptively. Treat injection testing as a recurring pre-deployment and post-change gate, using source-aware, evolving test cases rather than a frozen payload list. A static benchmark a vendor “passed” says little about adaptive robustness.
- Localize, then fix the source. When an audit surfaces an injection, trace it to the entry channel (a specific retrieved document, tool response, or memory entry) and harden that boundary — sanitize, quarantine, or strip instructions from untrusted content.
- Constrain the blast radius. Apply least-privilege to tools, require confirmation for high-impact actions, and break the “lethal trifecta” (untrusted input + private data + exfiltration channel) so a successful injection cannot act freely.
- Treat tool and retrieval output as untrusted data, never instructions. Keep a strict separation between control and content in the agent’s context.
- Monitor propagation, not just inputs. Watch what the agent writes to memory and how injected instructions move across reasoning steps and tool calls — the propagation path PI-Hunter is designed to reveal.
Status
| Item | Detail |
|---|---|
| Paper | PI-Hunter, arXiv:2606.12737 |
| Published | June 10, 2026 |
| Type | Defensive auditing / red-teaming framework |
| Target | Indirect prompt injection in LLM agents |
| Reported result | Substantially improved vulnerability exposure across benchmarks, architectures, attacks and defenses |
| Status of root cause | Indirect prompt injection has no reliable model-side fix as of mid-2026 |
FAQ
What is PI-Hunter?
PI-Hunter is an automated auditing framework, described in arXiv paper 2606.12737 (June 10, 2026), that probes LLM agents for indirect prompt injection vulnerabilities. Instead of only measuring attack success, it builds realistic source-aware test cases and evolves them to expose and localize where injections enter and propagate through an agent.
How is PI-Hunter different from a normal prompt-injection attack?
A normal attack tries to make one payload succeed. PI-Hunter is defensive: it iteratively generates and refines test cases to reveal latent vulnerabilities and pinpoint the source and propagation path, giving developers actionable information about what to fix rather than a single success/failure score.
Does PI-Hunter fix prompt injection?
No. PI-Hunter exposes and localizes vulnerabilities; it does not patch them. As of mid-2026 there is no reliable model-side fix for indirect prompt injection, so teams must pair auditing with structural mitigations such as least-privilege tools, untrusted-content sanitization, and breaking the lethal trifecta.
What is indirect prompt injection?
Indirect prompt injection is an attack where malicious instructions are hidden inside content an agent consumes from an external source — a retrieved document, a tool response, a web page — rather than typed directly by the user. When the agent reads that content, the hidden instructions can hijack its behavior.
Who created PI-Hunter?
The paper lists Pengfei He, Lesly Miculicich, Vishesh Sharma, Ash Fox, George Lee, Jiliang Tang, Tomas Pfister and Long T. Le as authors, posted to arXiv on June 10, 2026.