Termination poisoning: trapping LLM agents in unbounded loops
A May 2026 arXiv paper shows that injected prompts can distort an agent's own 'am I done?' judgment, forcing unbounded computation. The LoopTrap framework reports up to 25x step amplification.
What is this?
On May 7, 2026, the paper LoopTrap: Termination Poisoning Attacks on LLM Agents (arXiv:2605.05846, cs.CR) introduced a class of attack that targets a part of the agent loop most defenses ignore: the moment an agent decides it is finished.
Modern agents run in an iterative loop — reason, act, observe, then self-evaluate whether the task is complete. The paper shows that by injecting text into the agent’s context, an attacker can corrupt that self-evaluation so the agent concludes the task is not done and keeps working. The result is not stolen data or remote code execution; it is unbounded computation — wasted tokens, inflated bills, and tied-up agent capacity. The authors call this Termination Poisoning and report that their automated framework, LoopTrap, drives an average 3.57x step amplification across 8 mainstream agents, with a peak of 25x.
How it works
A typical agent’s stop condition is itself an LLM judgment: after each step the model reads the trajectory so far and answers, in effect, “is the goal satisfied?” Because that judgment consumes the same context window as untrusted tool outputs, retrieved documents, and web pages, it inherits the core weakness behind all prompt injection — there is no boundary between data and control.
Termination poisoning exploits exactly this. Adversarial content placed where the agent will read it — a returned API payload, a file, a web result — is crafted to push the “are we done?” decision toward no. The paper characterizes 10 representative strategies and finds that different agents have distinct behavioral signatures: a trap that loops one agent may have no effect on another. LoopTrap operationalizes this as red-teaming: it first builds a behavioral profile of a target along four vulnerability dimensions through lightweight probing, then synthesizes a target-specific trap, scores candidates, and refines failures.
No payload is reproduced here. The mechanism is the point: the agent’s own completion check is an attack surface, and it can be steered the same way indirect prompt injection steers tool selection.
| Normal loop | Poisoned loop |
|---|---|
| reason | reason |
| act | act |
| observe | observe <- injected "you are not done yet" |
| done? -> YES (stop) | done? -> NO (keep going... and going) |
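The two loops can be imitated with a toy simulation. This is only a sketch: `toy_judge` is a hypothetical stand-in for the model's completion check, `INJECTED_MARKER` is a harmless placeholder rather than a working payload, and the tool functions are invented for illustration. None of it comes from the paper.

```python
from typing import Callable, List

# Hypothetical placeholder for injected text; deliberately not a real payload.
INJECTED_MARKER = "[toy-injected: task not complete]"


def toy_judge(context: List[str], goal_token: str) -> bool:
    """Stand-in for the LLM's 'is the goal satisfied?' call.

    It reads the whole context, trusted and untrusted alike, which is the
    weakness being illustrated.
    """
    if any(INJECTED_MARKER in item for item in context):
        return False  # injected text overrides the honest answer
    return any(goal_token in item for item in context)


def run_loop(tool: Callable[[int], str], goal_token: str, safety_cap: int) -> int:
    """Run act/observe/done? and return the number of steps taken."""
    context: List[str] = []
    for step in range(1, safety_cap + 1):
        observation = tool(step)            # act + observe
        context.append(observation)         # untrusted text enters the context
        if toy_judge(context, goal_token):  # done?
            return step
    return safety_cap  # only the external cap stopped the loop


def clean_tool(step: int) -> str:
    # The task is genuinely finished from step 2 onward.
    return "RESULT_READY" if step >= 2 else "working"


def poisoned_tool(step: int) -> str:
    # Same real result, plus the placeholder injection on every observation.
    return clean_tool(step) + " " + INJECTED_MARKER


normal_steps = run_loop(clean_tool, "RESULT_READY", safety_cap=50)
poisoned_steps = run_loop(poisoned_tool, "RESULT_READY", safety_cap=50)
print(normal_steps, poisoned_steps)  # 2 50
```

The poisoned run has the correct result in its context from step 2 and still never stops on its own. Its step count is set entirely by `safety_cap`, so the ratio between the two runs says nothing about real agents.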
Termination poisoning inverts the framing of better-known resource attacks. Where tool-chain token drain and guardrail reasoning DoS inflate the cost of a single decision, termination poisoning inflates the number of iterations. The idea is not new in spirit: Johann Rehberger flagged LLM cost-and-DoS loops back in 2023.
Why it matters
Availability and cost are real security properties for autonomous systems. An agent stuck in a poisoned loop burns metered API spend, holds a worker slot that legitimate jobs need, and can quietly exhaust budgets before anyone notices. At the reported 25x peak, a single crafted document makes one task take as many steps as twenty-five, and the token bill can grow faster still when each step re-reads a longer trajectory. For fleets of agents processing untrusted inputs (email triage, ticket handling, web research), this is a denial-of-wallet and denial-of-service vector that classic input filters, tuned to catch data exfiltration or command injection, are not looking for.
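A rough back-of-the-envelope sketch shows why step amplification can understate the bill. It assumes each step re-sends the initial prompt plus everything earlier steps appended, which is common but not universal. The token sizes and the 4-step baseline are illustrative assumptions, not figures from the paper.

```python
def total_input_tokens(steps: int, base_prompt: int, per_step_growth: int) -> int:
    """Sum input tokens over a run where step i re-reads the prompt plus
    everything appended by the i-1 earlier steps."""
    return sum(base_prompt + (i - 1) * per_step_growth for i in range(1, steps + 1))


BASE_PROMPT = 1_000     # assumed tokens in the initial prompt
PER_STEP_GROWTH = 500   # assumed tokens each step adds to the context

normal = total_input_tokens(4, BASE_PROMPT, PER_STEP_GROWTH)       # 4-step task
poisoned = total_input_tokens(100, BASE_PROMPT, PER_STEP_GROWTH)   # 25x the steps

print(normal, poisoned, round(poisoned / normal, 1))  # 7000 2575000 367.9
```

Under these assumptions a 25x increase in steps becomes roughly a 368x increase in input tokens, because the per-step cost grows along with the trajectory. Agents that truncate or summarize their context would see a smaller multiplier.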
It also undermines a common safety assumption: that an agent left running will eventually stop. If termination can be externally manipulated, “it will finish on its own” is not a control.
Defenses
The fix is architectural — never let untrusted content arbitrate when the agent halts.
- Enforce hard external limits. Cap iterations, wall-clock time, total tokens and tool calls per task in the orchestrator, outside the model’s judgment. This is the single most effective mitigation; the loop must be bounded by code, not by the LLM’s opinion.
- Make termination a deterministic check where possible. For structured tasks, verify completion against explicit, machine-checkable success criteria rather than asking the model “are you done?”
- Budget and meter per task. Set per-task token/cost ceilings with alerts, and fail closed when a task exceeds its expected envelope — an outlier step count is a strong anomaly signal.
- Separate untrusted data from control reasoning. Keep tool outputs and retrieved text in clearly delimited, lower-trust regions, and avoid feeding raw untrusted content into the completion-evaluation step. This is the same containment logic discussed in machine-speed injection containment.
- Monitor step distributions. Track iterations-per-task across the fleet; sudden amplification on specific inputs flags poisoning attempts for review.
- Red-team the stop condition. Test agents adversarially for loop behavior, not just for data leakage — the lethal trifecta framing should include “ability to consume unbounded resources.”
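The first two items can be combined in a small orchestrator sketch. Everything here is hypothetical scaffolding: `Budget`, `StepResult`, `run_bounded`, and the demo step functions are names invented for illustration, and `step_fn` stands in for whatever performs one reason/act/observe iteration in a real system.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Budget:
    max_steps: int
    max_seconds: float
    max_tokens: int
    max_tool_calls: int


@dataclass
class StepResult:
    tokens_used: int
    tool_calls: int
    state: dict  # task state after the step (hypothetical shape)


class BudgetExceeded(RuntimeError):
    """Raised when a task leaves its envelope; the caller should fail closed."""


def run_bounded(
    step_fn: Callable[[dict], StepResult],
    is_complete: Callable[[dict], bool],
    budget: Budget,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Drive an agent loop whose halting is decided by code.

    is_complete is a deterministic, machine-checkable success test; the
    model is never asked whether it is done.
    """
    state: dict = {}
    started = clock()
    steps = tokens = tool_calls = 0
    while not is_complete(state):
        if steps >= budget.max_steps:
            raise BudgetExceeded(f"step limit {budget.max_steps} reached")
        if clock() - started >= budget.max_seconds:
            raise BudgetExceeded("wall-clock limit reached")
        result = step_fn(state)
        steps += 1
        tokens += result.tokens_used
        tool_calls += result.tool_calls
        state = result.state
        if tokens > budget.max_tokens:
            raise BudgetExceeded(f"token limit {budget.max_tokens} exceeded")
        if tool_calls > budget.max_tool_calls:
            raise BudgetExceeded(f"tool-call limit {budget.max_tool_calls} exceeded")
    return state


def counting_step(state: dict) -> StepResult:
    # Demo step that makes real progress on each call.
    return StepResult(tokens_used=100, tool_calls=1, state={"n": state.get("n", 0) + 1})


def stuck_step(state: dict) -> StepResult:
    # Demo step imitating a poisoned agent that never makes progress.
    return StepResult(tokens_used=100, tool_calls=1, state=state)
```

With `Budget(max_steps=10, max_seconds=60.0, max_tokens=10_000, max_tool_calls=10)`, the `counting_step` task returns once its state satisfies the success test, while `stuck_step` raises `BudgetExceeded` at the step limit regardless of what is in the context. Limits in this sketch are checked between steps, so a single step that hangs would still need its own timeout.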
Status
| Item | Reference | Date | Notes |
|---|---|---|---|
| Paper published | arXiv:2605.05846 (v1) | 2026-05-07 | Defines Termination Poisoning; 10 strategies |
| Empirical scope | LoopTrap evaluation | 2026-05-07 | 8 mainstream agents, 60 tasks |
| Reported impact | LoopTrap | 2026-05-07 | 3.57x avg step amplification, 25x peak |
| Prior art (LLM cost/DoS loops) | Embrace The Red | 2023-09-16 | Early warning on infinite-loop cost abuse |
The takeaway is simple and old: an autonomous loop you do not bound is a loop an attacker can bound for you. Termination is a control decision, so treat it like one — enforce it in code, not in a prompt.