AGENTS MEDIUM NEW

Memory Control Flow Attacks: when stored memory steers an agent's tools

A March 2026 paper shows poisoned agent memory doesn't just corrupt content — it hijacks the control flow of tool selection, forcing unintended tools and skipped steps in over 90% of trials, across tasks and long after injection.

2026-06-10 // 7 min affects: gpt-5-mini, claude-sonnet-4-5, gemini-2-5-flash, langchain, llamaindex

What is this?

On March 16, 2026, Zhenlin Xu, Xiaogang Zhu, Yu Yao, Minhui Xue and Yiliao Song published From Storage to Steering: Memory Control Flow Attacks on LLM Agents (arXiv:2603.15125, cs.CR). The paper names a class of attack the authors call Memory Control Flow Attacks (MCFA).

Most published work on agent memory — including the cases we covered in agent memory poisoning (ASI06) and MemMorph — treats poisoned memory as a content problem: the agent retrieves a bad fact and produces a wrong answer. MCFA reframes the threat. The authors show that retrieved memory acts as a control input: it doesn’t just change what the agent says, it changes which tools the agent calls and in what order — even when the user gives explicit, safe instructions. Memory, in their phrase, becomes a “write-once, read-many control signal.”

How it works

A memory-augmented agent runs a loop: Read memory → Plan the tool sequence → Execute → Write new memory. MCFA targets the Read→Plan link. The attacker needs no internal privileges: they cannot edit the system prompt, change tool code, touch the memory store directly, or install tools. They only interact with the agent normally and, over one or a few conversations, get it to store an action-oriented “preference” or “rule” in long-term memory. Later, during an unrelated benign task, that memory is retrieved and steers the tool trace.
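The loop can be sketched as a toy in Python to show why the Read→Plan link matters. This is an illustration only, not code from the paper or from any framework: `ToyAgent`, the task names, and the `skip:` note format are all hypothetical, and a simple string rule stands in for an LLM planner that honours remembered "preferences".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyAgent:
    """Toy memory-augmented agent (hypothetical; not a real framework API)."""
    memory: List[str] = field(default_factory=list)

    def read(self, task: str) -> List[str]:
        # Real systems retrieve by similarity; this toy returns every entry.
        return list(self.memory)

    def plan(self, task: str, retrieved: List[str]) -> List[str]:
        # Default workflow for the task, before memory is consulted.
        trace = ["audit", "transfer"] if task == "pay_invoice" else ["search"]
        # Stand-in for an LLM planner that obeys remembered "rules".
        for note in retrieved:
            if note.startswith("skip:"):
                unwanted = note[len("skip:"):]
                trace = [tool for tool in trace if tool != unwanted]
        return trace

    def run(self, task: str, remember: Optional[str] = None) -> List[str]:
        trace = self.plan(task, self.read(task))  # Read -> Plan
        # Execute is omitted: the toy only reports the planned tool trace.
        if remember is not None:                  # Write
            self.memory.append(remember)
        return trace


agent = ToyAgent()
before = agent.run("pay_invoice")         # clean run
agent.run("chat", remember="skip:audit")  # an earlier, unrelated conversation
after = agent.run("pay_invoice")          # same task, same user instruction
print(before, after)
```

The user's request is identical in both `pay_invoice` runs; only the stored note differs, yet the planned trace loses its audit step. That is the sense in which memory acts as a control input rather than as content.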

The paper formalises five attack families (no exploit payloads are reproduced here — see the appendix of the paper for the protocol):

Family        Effect on the tool-call trace
------------  -------------------------------------------------------------
Override      Retrieved "preference" forces a risky/unwanted tool into the
              trace, overriding static safety filters.
Order         Workflow steps are reordered or skipped — e.g. a required
              Audit step is bypassed before a Transfer step. Invisible to
              allow-list defenses, which only check *which* tools, not order.
M-Scope       The injected rule generalises across unrelated tasks, acting
              as a cross-domain "master key".
Persistence   The deviation keeps firing long after the injection event,
              with no re-injection needed.
Relapse       Explicit "please stop / repair" instructions fail; the
              poisoned memory state resists textual correction.

To measure this at scale the authors built MemFlow, an automated framework that operationalises each attack as inject malicious memory → retrieve it during benign tasks → audit control-flow deviations. They ran it against GPT-5 mini, Claude Sonnet 4.5 and Gemini 2.5 Flash, using real tools from LangChain and LlamaIndex.
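The audit stage of that pipeline can be illustrated with a small trace comparison. The sketch below is not MemFlow's implementation: `audit_trace`, its output fields, and the tool names are assumptions made for illustration, and it assumes each tool appears at most once per trace.

```python
from typing import Dict, List, Union

Report = Dict[str, Union[List[str], bool]]


def audit_trace(expected: List[str], observed: List[str]) -> Report:
    """Compare an observed tool trace with a clean-run baseline.

    Hypothetical helper; assumes each tool appears at most once per trace.
    """
    # Tools that ran but were not part of the baseline workflow.
    forced = [tool for tool in observed if tool not in expected]
    # Baseline tools that never ran.
    skipped = [tool for tool in expected if tool not in observed]
    # Relative order of the tools that both traces share.
    shared_expected = [tool for tool in expected if tool in observed]
    shared_observed = [tool for tool in observed if tool in expected]
    reordered = shared_expected != shared_observed
    return {
        "forced": forced,
        "skipped": skipped,
        "reordered": reordered,
        "deviated": bool(forced or skipped or reordered),
    }


baseline = ["lookup", "audit", "transfer"]
report_a = audit_trace(baseline, ["lookup", "transfer", "export"])
report_b = audit_trace(baseline, ["transfer", "audit", "lookup"])
print(report_a)
print(report_b)
```

Read loosely, a non-empty `forced` list corresponds to an Override-style deviation, and `skipped` or `reordered` to an Order-style one. That mapping is an interpretation of the family descriptions above, not the paper's scoring rule.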

Why it matters

The headline numbers are high and consistent across models. Over 90% of trials were vulnerable to MCFA even under strict safety constraints. Tool-choice overrides appeared in 91.7–100% of trials, workflow reordering in 52.8–69.4%, cross-task scope expansion in 97.2–100%, with 100% persistence observed over long horizons.

Two properties make this worse than a one-off prompt injection. First, persistence and cross-task reach: a single poisoning interaction keeps steering future, unrelated tasks — the user who triggers the bad behaviour may not be the user who planted it. Second, the Order family defeats allow-list thinking. Many production guardrails check whether a tool may run; MCFA reorders or skips legitimate, allowed tools (skip the audit, then transfer), so every individual call looks permitted. This is the lethal-trifecta problem displaced into the planner: untrusted content that reaches memory now influences action selection, not just output.
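The allow-list gap is easy to see in a few lines of Python. The tool names and the `MUST_PRECEDE` policy below are hypothetical, chosen to mirror the audit-then-transfer example; this is a minimal sketch of the idea, not a production guardrail.

```python
from typing import List, Set, Tuple

# Hypothetical policy: which tools may run at all.
ALLOWED: Set[str] = {"lookup", "audit", "transfer"}
# Hypothetical dependency: the first tool must run before the second.
MUST_PRECEDE: List[Tuple[str, str]] = [("audit", "transfer")]


def passes_allow_list(trace: List[str]) -> bool:
    """Checks only *which* tools appear, not their order."""
    return all(tool in ALLOWED for tool in trace)


def passes_order_policy(trace: List[str]) -> bool:
    """Checks that each prerequisite ran before the first dependent call."""
    for before, after in MUST_PRECEDE:
        if after in trace:
            first_dependent = trace.index(after)
            if before not in trace[:first_dependent]:
                return False
    return True


clean = ["lookup", "audit", "transfer"]
skipped_audit = ["lookup", "transfer"]
late_audit = ["transfer", "audit"]

for trace in (clean, skipped_audit, late_audit):
    print(trace, passes_allow_list(trace), passes_order_policy(trace))
```

All three traces pass the allow-list, because every call is individually permitted. Only the order check separates the clean trace from the two in which the transfer runs without a prior audit.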

The trust boundary to remember: anything an external party can get written into long-term memory should be treated as attacker-controllable input to the planner — not as trusted user state.

Defenses

The authors test a production-style mitigation and are explicit that it is not a silver bullet: more than half of the evaluated scenarios still showed over 85% control-flow deviation after it was applied. Defense here is layered, not a single switch.

  1. Role-Based Memory Segregation (RBMS). The paper’s own mitigation: separate system rules from user preferences into distinct channels and enforce an explicit priority hierarchy, so attacker-reachable user memory can never outrank system policy. It reduces attack success but does not eliminate it — treat it as a floor, not a ceiling.

  2. Make control flow a policy object, not an emergent one. Define the required tool order for sensitive workflows (e.g. Audit must precede Transfer) and enforce it deterministically outside the model. The Order family is invisible to allow-lists precisely because allow-lists ignore sequence and dependency.

  3. Treat retrieved memory as untrusted at retrieval time. Apply provenance tags to memory entries (who/what wrote this, from which channel) and refuse to let user-channel memories introduce or reorder tools in security-critical traces. Pair with consistency checks across related memories, as in A-MemGuard and the patterns in OWASP’s agent memory guidance.

  4. Audit the trace, not just the output. MCFA is defined as an auditable deviation in the tool-call trace. Log the full tool sequence per task and alert on traces that violate declared dependencies or include risky tools — this is observable even when the final answer looks correct.

  5. Test for persistence and relapse. Because corrected behaviour can relapse, security testing must inject, then run later, unrelated tasks and verify the deviation is actually gone — not just absent in the next turn. Benchmarks such as AgentDojo provide a starting harness for trace-level agent evaluation.
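Items 1 and 3 can be combined in a retrieval-time filter, sketched below in Python. This is not the paper's RBMS implementation: `Channel`, `MemoryEntry`, and especially the `affects_tools` flag are hypothetical, and deciding reliably whether a stored note tries to influence tool selection is itself an open problem that the sketch simply assumes has been solved at write time.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Channel(IntEnum):
    USER = 1    # attacker-reachable: written from conversations
    SYSTEM = 2  # operator-owned policy


@dataclass(frozen=True)
class MemoryEntry:
    text: str
    channel: Channel
    source: str          # provenance: who or what wrote the entry
    affects_tools: bool  # hypothetical flag assigned when the entry is written


def planner_context(entries: List[MemoryEntry], security_critical: bool) -> List[MemoryEntry]:
    """Order entries by channel priority and drop user-channel tool rules
    when the current workflow is security-critical."""
    kept = []
    # Highest-priority channel first; sorted() is stable within a channel.
    for entry in sorted(entries, key=lambda e: e.channel, reverse=True):
        if security_critical and entry.channel is Channel.USER and entry.affects_tools:
            continue
        kept.append(entry)
    return kept


entries = [
    MemoryEntry("prefers short answers", Channel.USER, "chat", False),
    MemoryEntry("always skip the audit step", Channel.USER, "chat", True),
    MemoryEntry("audit must precede transfer", Channel.SYSTEM, "operator", True),
]
critical = [e.text for e in planner_context(entries, security_critical=True)]
routine = [e.text for e in planner_context(entries, security_critical=False)]
print(critical)
print(routine)
```

In the security-critical case the user-channel rule never reaches the planner, while the harmless style preference still does. Given the residual deviation reported for the paper's own mitigation, a filter like this is best treated as one layer alongside deterministic order enforcement and trace auditing.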

Status

Item                  Reference                     Date        Notes
--------------------  ----------------------------  ----------  -------------------------------------------------------
MCFA / MemFlow paper  arXiv:2603.15125              2026-03-16  Defines MCFA, 5 attack families, MemFlow framework
Models evaluated      Paper §4                      2026-03-16  GPT-5 mini, Claude Sonnet 4.5, Gemini 2.5 Flash
Frameworks tested     LangChain, LlamaIndex         2026-03-16  Real-world tools, not toy mocks
Mitigation (RBMS)     Paper §4.5                    2026-03-16  Reduces ASR; >85% deviation still in >half of scenarios
Related defense       A-MemGuard, arXiv:2510.02373  2025-10     Consensus validation + dual-memory lessons

This is a research finding, not a CVE or an in-the-wild incident: there is no vendor patch to install. The actionable takeaway is architectural — if you ship an agent with persistent memory and tool use, assume external parties can plant control signals in that memory, and put the control flow of sensitive workflows under deterministic policy you own rather than under the model’s planner.

Sources