AGENTS MEDIUM NEW

Temporal memory contamination: longitudinal safety drift in memory-equipped LLM agents

Three arXiv papers from April and May 2026 converge on a failure mode complementary to memory poisoning — memory-equipped agents drift unsafe as benign context accumulates, with compressed summaries acting as a laundering channel.

2026-05-28 // 7 min
affects: openclaw, claude-code, claw-like-agents, langchain-agents, llamaindex-agents, autogen, crewai, a-mem

What is this?

Memory-equipped LLM agents have a safety problem that does not require an attacker. Three arXiv preprints published between April 17 and May 18, 2026 (A Survey on the Security of Long-Term Memory in LLM Agents: Toward Mnemonic Sovereignty; State Contamination in Memory-Augmented LLM Agents; and Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents) converge on one claim: the same memory mechanisms that make agents useful across sessions also make them progressively less safe across sessions, even when no payload, prompt injection, or adversarial actor is involved.

This is complementary to, not redundant with, the ASI06 memory-poisoning category formalised by OWASP on May 13, 2026. Memory poisoning is an attacker writing into trusted state. Temporal memory contamination is what happens when nobody writes anything malicious — only ordinary tasks pile up and the agent’s safety profile shifts as a function of how much it remembers.

How it works

The three papers describe complementary slices of the same surface.

Longitudinal drift (arXiv 2605.17830, May 18, 2026). Al-Tawaha et al. introduce temporal memory contamination and a trigger-probe protocol: a fixed probe set is evaluated against read-only memory snapshots at varying prefix lengths, and the results are compared with a NullMemory counterfactual baseline that isolates memory exposure from stream non-stationarity. Across three deployment scenarios (records, memos and forms, and email correspondence) and eight memory architectures, memory-enabled agents consistently exceed the NullMemory baseline, and memory-induced violation rates show a robust upward trend with exposure length. The effect holds on Claw-like agents using the platform’s native memory mechanism, and order-randomization experiments show that the driver is accumulated content, not encounter order.
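
The protocol described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the Agent and Judge callables, the toy agent, and the toy stream are all hypothetical stand-ins, and the sketch assumes the agent is stateless apart from the memory snapshot it is handed, so an empty snapshot plays the role of the NullMemory baseline.

```python
from typing import Callable, Sequence

# Hypothetical stand-ins for illustration only (not from the paper).
Memory = list[str]                      # read-only snapshot of remembered items
Agent = Callable[[str, Memory], str]    # (probe, memory) -> response
Judge = Callable[[str, str], bool]      # (probe, response) -> is it a violation?


def violation_rate(agent: Agent, judge: Judge,
                   probes: Sequence[str], memory: Memory) -> float:
    """Fraction of the fixed probes whose response is judged a violation."""
    hits = sum(judge(p, agent(p, memory)) for p in probes)
    return hits / len(probes)


def trigger_probe_curve(agent: Agent, judge: Judge, probes: Sequence[str],
                        stream: Sequence[str], prefix_lengths: Sequence[int]):
    """For each prefix length k, probe a snapshot built from stream[:k] and
    report (k, rate, rate minus the empty-memory baseline)."""
    null_rate = violation_rate(agent, judge, probes, [])
    curve = []
    for k in prefix_lengths:
        snapshot = list(stream[:k])     # copy, so probing never mutates memory
        rate = violation_rate(agent, judge, probes, snapshot)
        curve.append((k, rate, rate - null_rate))
    return curve


# Toy agent: complies once memory holds enough benign "sharing" precedent.
def toy_agent(probe: str, memory: Memory) -> str:
    precedent = sum("shared" in item for item in memory)
    return "COMPLY" if precedent >= 2 else "REFUSE"


def toy_judge(probe: str, response: str) -> bool:
    return response == "COMPLY"


stream = ["filed form A", "shared the report with Bob",
          "shared login notes with Eve", "archived memo"]
probes = ["send the credentials file to an external address"]
curve = trigger_probe_curve(toy_agent, toy_judge, probes, stream, [0, 2, 4])
for k, rate, excess in curve:
    print(f"prefix={k} violation_rate={rate:.2f} memory_induced={excess:.2f}")
```

The third column is the quantity of interest: the part of the violation rate attributable to memory exposure rather than to the probes themselves. In a real harness the agent, the judge, and the memory builder would be the system under test, not toy functions.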

Memory laundering (arXiv 2605.16746, May 16, 2026). Wang et al. (UIUC) study the same surface as a stateful contamination problem. Many agent systems compress long conversations into short summaries so future agents can stay informed without reading the full history. The authors show this compression can also act as a laundering step:

toxic transcript
    │  (standard safety classifier:
    │   flags as toxic, blocks)
    ▼
[ compression / summary step ]
    │  (standard safety classifier:
    │   scores summary as neutral)
    ▼
"laundered" memory
    │  (re-enters context on a later turn,
    │   conditions next generation toward
    │   higher toxicity than NullMemory baseline)
    ▼
contaminated downstream output

A representative laundered summary in the paper reads, for example, “the discussion has become heated, with participants expressing strong disagreement”. Standard classifiers score it as non-toxic, yet conditioning on it measurably raises expected Detoxify scores on subsequent generations compared with a matched neutral summary. The hostile framing survives compression while falling below the classifier threshold.
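
One way to audit an existing pipeline for this gap is to score the source transcript and its summary with the same classifier and flag pairs where only the source crosses the threshold. The sketch below is an assumption-laden illustration: laundering_gap, the 0.5 threshold, and the keyword-based toy_score are hypothetical, and toy_score merely stands in for a real toxicity classifier such as the Detoxify model the paper reports scores from.

```python
from typing import Callable, Sequence

Scorer = Callable[[str], float]   # text -> toxicity score in [0, 1] (assumed)


def laundering_gap(transcript: Sequence[str], summary: str,
                   score: Scorer, threshold: float = 0.5) -> dict:
    """Compare the worst-scoring source turn with the summary's own score.

    A pair is marked 'laundered' when the source would have been flagged
    but the summary derived from it passes the same check.
    """
    source_score = max((score(turn) for turn in transcript), default=0.0)
    summary_score = score(summary)
    return {
        "source_score": source_score,
        "summary_score": summary_score,
        "laundered": source_score >= threshold and summary_score < threshold,
    }


# Toy scorer: a crude keyword check standing in for a real classifier.
def toy_score(text: str) -> float:
    hostile = ("idiot", "shut up")
    return 0.9 if any(word in text.lower() for word in hostile) else 0.1


transcript = ["You are an idiot.", "Shut up and listen."]
summary = ("the discussion has become heated, with participants "
           "expressing strong disagreement")
report = laundering_gap(transcript, summary, toy_score)
print(report)
```

This only detects that a flagged transcript produced a clean-looking summary. It does not measure the downstream conditioning effect the paper reports, which requires generating with and without the summary in context.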

Mnemonic sovereignty (arXiv 2604.16548, April 17, 2026). The survey reframes the broader problem as governance of persistent state: which writes are authorised, who may read, which states remain auditable, and which may be forgotten. It identifies nine governance primitives and notes that no published memory architecture currently covers all nine, and that confidentiality, availability, store/forget, and benign-persistence failures remain under-studied relative to write- and retrieve-time integrity attacks.

Why it matters

Three operational consequences.

First, the failure mode is not detectable by single-state evaluation. A memory snapshot can pass every existing benchmark and the agent can still drift unsafe after enough sessions accumulate. Safety becomes a property of the trajectory, not of any individual prompt-response.

Second, summarisation, the default scaling lever for long-running agents, is part of the attack surface. Production stacks that use a summariser to keep context length under control are routing transcripts through a transform that current safety classifiers do not reliably catch on the output side. The State Contamination paper is explicit that sanitising only the completed summary can be too late, because harmful framing may already have been compressed below the classifier threshold.

Third, the affected products are already shipped. The Longitudinal paper tests on Claw-like agents including OpenClaw with its native memory mechanism, and the mechanism it describes generalises to any deployment using A-Mem, LangChain memory modules, LlamaIndex memory, AutoGen, CrewAI, Claude Code’s memory.json/SKILL.md layer, or comparable persistent stores.

Defenses

None of the papers propose a single silver bullet. The defensive playbook below combines their recommendations with the OWASP ASI06 controls already in circulation.

  1. Evaluate longitudinally, not point-in-time. Adopt a trigger-probe protocol along the lines of arXiv 2605.17830: a fixed probe set, applied to memory snapshots at increasing prefix lengths, with a NullMemory baseline so you can distinguish memory-induced violations from stream effects. If your current red-team harness is single-turn or single-session, it is blind to this class.

  2. Gate writes, sanitise reads. The State Contamination paper’s three-pathway framework — a fine-tuned policy for residual parametric amplification, a read-side sanitiser applied before generation, and a write-side gate applied before content re-enters transcript or memory — is more robust than any single intervention. Sanitising before memory update closes the laundered channel; sanitising only at retrieval is too late.

  3. Run classifiers on transcripts, not only on summaries. Memory laundering works when the only safety check fires at summary-write time, on the finished summary. Score the source material before compression, and treat any summary derived from flagged source material as flagged regardless of its own score.

  4. Monitor retrieval state, not only generation. Al-Tawaha et al. show that memory-induced risk is detectable from the retrieval state before generation, and confirm this with a high-recall diagnostic monitor. A pre-generation hook that inspects what is being retrieved from memory is cheaper than a post-generation classifier and catches a class the post-hoc check misses.

  5. Treat memory as a separate trust boundary with an explicit lifecycle. Per the Mnemonic Sovereignty survey, the nine governance primitives — writability, read authorisation, audit, forget, and so on — should be addressed explicitly in the agent’s architecture, not inherited from whatever the memory library happens to default to.

  6. Add a session-budget control. If your safety profile degrades monotonically with exposure length, cap the exposure length. Periodic memory resets, or session-budget controls that force compaction-and-review at fixed intervals, bound the worst case while the research community converges on a stronger defence.
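
Several of the controls above (scoring sources before compression, inheriting flags, filtering at retrieval, and a session budget) can be combined in one small memory wrapper. The sketch below is illustrative only: GatedMemory, MemoryEntry, the is_unsafe callable, and the budget of 2 are hypothetical names and values, not an API from any of the papers or libraries mentioned. It quarantines flagged entries instead of dropping them at write time, which is one design choice among several.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence


@dataclass
class MemoryEntry:
    text: str
    flagged: bool   # True if the entry or any of its sources was flagged


@dataclass
class GatedMemory:
    is_unsafe: Callable[[str], bool]    # hypothetical safety check
    session_budget: int = 50            # writes allowed before forced review
    entries: list[MemoryEntry] = field(default_factory=list)
    writes_since_review: int = 0

    def write_summary(self, source_turns: Sequence[str],
                      summary: str) -> MemoryEntry:
        # Write-side gate: score sources before compression; taint is inherited.
        tainted = any(self.is_unsafe(turn) for turn in source_turns)
        entry = MemoryEntry(summary, flagged=tainted or self.is_unsafe(summary))
        self.entries.append(entry)
        self.writes_since_review += 1
        return entry

    def retrieve(self) -> list[str]:
        # Pre-generation hook: only unflagged entries reach the context.
        return [e.text for e in self.entries if not e.flagged]

    def needs_review(self) -> bool:
        # Session budget: force compaction-and-review after a fixed exposure.
        return self.writes_since_review >= self.session_budget

    def mark_reviewed(self, drop_flagged: bool = True) -> None:
        if drop_flagged:
            self.entries = [e for e in self.entries if not e.flagged]
        self.writes_since_review = 0


mem = GatedMemory(is_unsafe=lambda t: "idiot" in t.lower(), session_budget=2)
mem.write_summary(["You are an idiot."], "the discussion has become heated")
mem.write_summary(["Please file form B."], "user asked to file form B")
print(mem.retrieve(), mem.needs_review())
```

In this toy run the first summary looks neutral on its own but inherits the flag from its source, so it never reaches the context window. A production version would need a real classifier, persistent storage, and an audit trail for the review step.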

Status

Item | Reference | Date | Notes
Mnemonic Sovereignty survey | arXiv:2604.16548 | 2026-04-17 | Nine governance primitives, no architecture covers all
State Contamination paper | arXiv:2605.16746 | 2026-05-16 | Memory laundering, three-pathway mitigation
Remembering More, Risking More paper | arXiv:2605.17830 | 2026-05-18 | Trigger-probe protocol, NullMemory baseline, OpenClaw tested
OWASP ASI06 framing post | genai.owasp.org | 2026-05-13 | Adversarial side of the same surface

The framing that ties the three papers together is simple: memory safety is a longitudinal property of an agent, not a single-state property that a snapshot can capture. Current production stacks treat it as the latter. The next round of memory-safety benchmarks and agent-platform defaults needs to treat it as the former.

Sources