system: OPERATIONAL
← back to all hacks
AGENTS MEDIUM

Poison once, exploit forever: persistent memory poisoning of LLM agents (OWASP ASI06)

An April 2026 arXiv paper on cross-site memory poisoning and a May 13, 2026 OWASP post on the Cisco MemoryTrap finding against Claude Code converge on the same lesson: agent memory is a trust boundary.

2026-05-26 // 7 min affects: claude-code, openclaw, chatgpt-atlas, perplexity-comet, gpt-5-mini, gpt-5.2, gpt-oss-120b, langchain-agents, llamaindex-agents, crewai

What is this?

Three publications in April and May 2026 have, between them, turned agent memory poisoning from a theoretical concern into a documented attack class with a CVE-equivalent label. The OWASP Gen AI Security Project formalised it as ASI06: Memory & Context Poisoning in its 2026 Top 10 for Agentic Applications. On May 13, 2026 Idan Habler, the ASI06 entry co-lead at Cisco, published Memory Is a Feature. It Is Also an Attack Surface on the OWASP blog, framing the category around a concrete Cisco finding — MemoryTrap — against Claude Code, disclosed on April 1, 2026 and patched by Anthropic in Claude Code v2.1.50. Two days later, on April 3, 2026 (revised April 7), Wei Zou and colleagues at Amazon AWS posted Poison Once, Exploit Forever: Environment-Injected Memory Poisoning Attacks on Web Agents on arXiv (2604.02623, cs.CR), demonstrating the same class against agentic browsers — OpenClaw, ChatGPT Atlas, Perplexity Comet — without any direct memory access.

The unifying claim across the three sources is simple: an agent that retains context is an agent whose past has become part of its current control plane. Once an attacker writes into that past, every future session inherits the compromise. The 2026 research wave shows this is not a corner case.

How it works

Memory in current LLM agents lives in several places that all look benign from a developer’s seat — a memory.json file, a vector store of past trajectories, a CLAUDE.md or SKILL.md, a global hooks configuration, a summarised conversation cache. The attacker’s goal is to land attacker-controlled tokens into any of those surfaces in a way the runtime will treat as trusted on a later turn.

Three recent attack patterns illustrate the surface.

MemoryTrap (Cisco, April 1, 2026). A user clones a repository and asks Claude Code to set it up. Claude proactively offers to install required npm packages; the malicious postinstall script writes a payload into the user’s persistent memory file, into the global hooks configuration, and — pre-patch — into a layer the agent loaded directly into its system prompt. The next session, in a different project, the agent still trusts that text as if Anthropic itself had shipped it. Anthropic’s fix in Claude Code v2.1.50 removed user memories from the system prompt, closing the highest-trust path; the broader pattern remains.

eTAMP / Poison Once, Exploit Forever (Amazon AWS, April 3, 2026). Stronger threat model: the attacker has no memory access at all. They modify a single observation in the environment — a product page, a forum thread, a fake error message — that the web agent merely views. The agent stores that trajectory as a useful memory of “how this kind of task goes” and, on a different website in a future session, retrieves it. Attack success rates reach 32.5% on GPT-5-mini, 23.4% on GPT-5.2, 19.5% on GPT-OSS-120B on (Visual)WebArena. A secondary finding — Frustration Exploitation — multiplies ASR up to 8x when the agent is already struggling with dropped clicks or garbled UI. More capable models are not safer.

MINJA (arXiv 2503.03704, March 2025). The earlier seminal result, included here because it remains the cleanest demonstration: a regular user, through queries alone, drives an agent to write a bridging memory that connects benign future queries to attacker-chosen reasoning steps. Reported 98.2% injection success, 76.8% downstream attack success under the paper’s threat model.

Layer                Where it lives                  Trust assumed by runtime
-------------------  ------------------------------  ----------------------------
Persistent memory    memory.json, vector store       Treated as past first-party
System-prompt        CLAUDE.md / user memories       Loaded high-authority
Hooks / config       global hooks, shell profiles    Executed silently
Retrieved context    RAG store, summaries            Mixed into next prompt

Why it matters

Three properties make this class harder than classical prompt injection.

First, persistence. A standard injection dies with the session. A memory injection lives until someone manually cleans the store — and there is rarely a UI for that. The eTAMP paper shows the activation can happen on a different website days later, which is exactly the kind of cross-context leakage permission-based defences were never designed to catch.

Second, trust laundering. The runtime cannot easily tell attacker-written memory from user-written memory. Both arrive through the same write path; both look like first-party context on read. This is the structural complaint behind ASI06: agentic stacks already separate developer prompts from user input, but they have no equivalent separation for memory writes.

Third, the attack surface scales with capability. Memory is shipping as a feature in every major agent product — Claude Code, ChatGPT memories, the new “browser agents” (OpenClaw, ChatGPT Atlas, Perplexity Comet) studied in the eTAMP paper. Every new memory surface is a new line in the trust boundary that defenders may not know exists.

Defenses

ASI06 is recent enough that no single fix retires it. The shortest defensible list, drawn from the OWASP entry and the three papers above:

  1. Treat memory writes as untrusted by default. The trust decision should not happen at write time; it should happen at read time, with the runtime able to mark the provenance of each memory entry (user-issued, tool-output, environment-observed) and a policy that decides what can graduate to high-authority context.
  2. Strip user memories out of the system prompt. This is the specific fix Anthropic shipped in Claude Code v2.1.50. Memory may still inform the model, but it should not sit in the same layer that defines the agent’s role and rules.
  3. Quarantine environment-derived observations. Distinguish what the agent saw from what the agent decided was worth remembering. Tag observations with their source URL/domain, never let an observation from example-shop.com shape behaviour on example-bank.com.
  4. Make memory writes auditable. A diffable memory log — visible to the user, signable by the runtime — turns a silent persistence channel into a reviewable one. The OWASP Agent Memory Guard project is the reference implementation track for this control.
  5. Rate-limit and review hook writes. The MemoryTrap path went through global hooks and npm postinstall. Hooks should require an explicit human confirmation whenever an agent proposes writing one, and the system prompt should never blindly load hook files written during the same session.
  6. Test for cross-session and cross-site leakage. Standard prompt-injection regression suites end at the session boundary. ASI06-aware test suites must run an attack in session N and check for activation in session N+k on a different surface.
  7. Cap blast radius with capability scoping. Even when a poisoned memory wins the read, capability-bound execution — per-skill ACLs, no ambient credentials, egress whitelists — limits what the attacker can do with it.

Status

ItemReferenceDateNotes
OWASP framing postOWASP Gen AI Security Project2026-05-13ASI06 entry co-lead Idan Habler (Cisco)
eTAMP paperarXiv:2604.02623 v22026-04-07up to 32.5% ASR on GPT-5-mini, cross-site
MemoryTrap disclosureCisco AI Blog2026-04-01patched in Claude Code v2.1.50
MINJA paperarXiv:2503.037042025-0398.2% inject / 76.8% attack success
Habler interviewHelp Net Security2026-04-14Agentic AI memory attacks spread across sessions and users
CategoryOWASP Top 10 for Agentic Apps2025-12 → 2026ASI06: Memory & Context Poisoning

Memory is the feature that makes agents personal. It is also the feature that makes a single bad observation outlive the session in which it landed. The April and May 2026 publications above do not invent a novel exploit — they make the cost of ignoring an old one explicit.

Sources