Treating AI agents like operating systems: a CISPA blueprint for isolation and privilege
A May 14, 2026 CISPA paper applies decades of OS security thinking to LLM agents. In tests on four OpenClaw-like systems, two weakness classes, cross-user exfiltration and unauthorized network egress, affected every single one.
What is this?
On May 14, 2026, Lukas Pirch and six co-authors from CISPA Helmholtz Center and TU Berlin — including Thorsten Holz and Konrad Rieck — posted Toward Securing AI Agents Like Operating Systems on arXiv (2605.14932, cs.CR, CC-BY 4.0). The paper does not announce a new exploit class. It does something more useful for builders: it argues that LLM agents have re-invented the security problems that operating systems solved in the 1970s, and that the same toolkit — process isolation, privilege separation, mediated communication — is the realistic path out.
The authors back the analogy with a hands-on case study. They built a unified architecture covering the dominant open-source agent stacks, mapped attack surfaces onto it, and ran the same threat model against four widely deployed OpenClaw-like agents. The headline result is sobering: two weakness classes — cross-user data exfiltration and unauthorized network egress — break every single agent tested, under modest attacker capabilities.
How it works
The paper frames an agent as four pieces wired together: a planning core (the LLM), a tool layer (skills, MCP servers, browsers, shells), a memory layer (short-term context plus long-term stores), and a session boundary (per-user state). Each piece maps onto an OS concept the literature already has language for.
| Operating system | LLM agent (approximate analogue) |
|---|---|
| Process | Session |
| Process isolation | Per-user state separation |
| User vs. kernel | Trusted plan vs. untrusted tool output |
| Capabilities / syscalls | Tool-call ACLs |
| File system permissions | Memory + RAG read/write policies |
| Network namespace | Egress policy for the agent process |
| IPC mediation | Inter-agent / inter-skill communication |
Against this map the paper enumerates two weakness families that survived every system tested:
- PI-1 — Cross-user data exfiltration. Agents that share a backend memory store, tool cache, or skill index across sessions allow one user’s content (documents, conversation history, secrets pasted earlier) to be recovered by another user’s session, sometimes from a single carefully shaped query. The OS analogue is the absence of user-level process isolation: every session reads from the same address space.
- NF-1 — Unauthorized message sending. Even agents marketed as “reply only” routinely reach out: HTTP fetches that the developer thought were sandboxed, MCP servers that proxy to upstream services, skills that quietly e-mail or post. There is no egress firewall on the agent process, so the moment a tool can issue any outbound request, exfiltration paths multiply.
The paper documents both with reproducible setups but, in line with responsible-disclosure norms, without dropping working payloads. The point is structural, not anecdotal: even “modest attacker capabilities” (a single user account, a single uploaded document, a single skill installation) are enough to cross both boundaries.
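To make the PI-1 shape concrete, here is a minimal Python sketch contrasting a store shared across sessions with one keyed by session. The class names, method names, and sample data are hypothetical and do not come from the paper or from any evaluated agent. In-process scoping like this only illustrates the logic; it is not a substitute for OS-level isolation.

```python
from collections import defaultdict


class SharedMemory:
    """Anti-pattern: one store serves every session (the PI-1 shape)."""

    def __init__(self):
        self._items = []

    def remember(self, session_id, text):
        # The session id is accepted but never used to partition data.
        self._items.append(text)

    def recall(self, session_id, query):
        return [t for t in self._items if query.lower() in t.lower()]


class SessionScopedMemory:
    """Each session can only read back what it wrote itself."""

    def __init__(self):
        self._items = defaultdict(list)

    def remember(self, session_id, text):
        self._items[session_id].append(text)

    def recall(self, session_id, query):
        own = self._items.get(session_id, [])
        return [t for t in own if query.lower() in t.lower()]


shared = SharedMemory()
shared.remember("alice", "API key: sk-alice-123")
leaked = shared.recall("bob", "api key")   # Bob's query returns Alice's secret

scoped = SessionScopedMemory()
scoped.remember("alice", "API key: sk-alice-123")
safe = scoped.recall("bob", "api key")     # empty list: nothing crosses sessions
```

The two classes expose the same interface, which is the uncomfortable part: nothing at the call site reveals whether the backend partitions by session.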
The work converges with adjacent results from this season. Microsoft Security’s May 7, 2026 advisory on RCE in AI-agent frameworks showed that prompts can end up executed as shell commands when the runtime conflates plan and execution privileges. The OWASP GenAI Exploit Round-up Q1 2026, published April 14, 2026, reported the same pattern at incident scale: failures are no longer about model outputs alone but about identities, orchestration layers, and supply chains. The CISPA paper is the systems-security framing those incident reports were missing.
Why it matters
Three points generalise beyond this paper.
First, the failure mode is architectural, not behavioural. PI-1 and NF-1 are not solved by a better safety classifier, a tighter system prompt, or a finer-grained jailbreak filter. A model that perfectly follows its instructions still leaks across sessions if the sessions share a backend, and still phones home if its tool layer can resolve external hostnames. Defences that focus exclusively on model output are aimed at the wrong layer.
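As an illustration of moving the control out of the model and into the tool layer, the following Python sketch applies a default-deny check to every outbound URL before a tool may fetch it. The host names, the `check_egress` function, and the `EgressDenied` exception are assumptions made for this example. An application-level check like this would complement a network-level egress policy, not replace it.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; real entries depend on the deployment.
ALLOWED_HOSTS = frozenset({
    "model-gateway.internal.example",
    "tools.internal.example",
})


class EgressDenied(Exception):
    """Raised when a tool tries to reach a host outside the allowlist."""


def check_egress(url, allowed_hosts=ALLOWED_HOSTS):
    """Return the host if the request is allowed, else raise EgressDenied."""
    parts = urlsplit(url)
    # urlsplit lowercases the hostname and strips any userinfo and port.
    host = parts.hostname or ""
    if parts.scheme != "https" or host not in allowed_hosts:
        raise EgressDenied(f"blocked outbound request to {host or url!r}")
    return host
```

Matching on the exact parsed hostname, rather than on a substring of the URL, is what rejects look-alikes such as `https://tools.internal.example.evil.test/` or `https://tools.internal.example@evil.test/`.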
Second, the OS literature is unusually generous here. Process isolation, capabilities (Capsicum, seL4), mandatory access control (SELinux, AppArmor), network namespaces, mediated IPC — these are not research artefacts. They are 30 years of shipped, audited, deployable engineering. The paper’s recommendations don’t ask agent builders to invent primitives; they ask agent builders to use the ones that already exist.
Third, MCP-style ecosystems amplify the blast radius. A single shared skill registry, a single multi-tenant memory store, a single broadly-permissioned MCP server: the value of these architectures is precisely that they share state, and that sharing becomes the attack surface. The paper aligns with the broader trend captured in CISA’s Careful Adoption of Agentic AI Services — namely, that agent procurement and design choices are now first-class security decisions.
Defenses
The paper’s recommendations map directly onto controls a team can apply now.
- Run each session in its own process or container. No shared filesystem write paths, no shared memory store, no shared cache that mixes user content. The OS-level guarantee is what stops PI-1; everything above it is best-effort.
- Default-deny network egress for the agent process. Whitelist the small set of hosts the agent legitimately needs to reach (your model gateway, your tool backends). Treat every other DNS lookup or HTTP request as a policy violation, log it, and break the flow before the response returns to the model. NF-1 disappears with a real egress firewall.
- Treat tool output as untrusted input, every time. Apply the same kind of taint-tracking you’d use for user-supplied form data: tool returns can carry instructions, links, or encoded payloads, and the planning LLM should not act on them without an out-of-band confirmation for any state-changing action.
- Capability-bound every tool call. A “list files” tool should not also be able to read arbitrary paths. A “fetch URL” tool should not also be able to issue POSTs. Per-tool ACLs, per-session token scopes, and a minimum-privilege default for every skill close most of the lateral movement the paper documents.
- Mediate inter-agent / inter-skill communication. Treat A → B agent calls and skill → skill chains as IPC: schema-validated, rate-limited, logged, and revocable. The paper points to this as the agentic equivalent of OS IPC mediation, and it is also the surface the Microsoft RCE write-up singled out as the lowest-hanging escalation path.
- Audit the four-layer map for your own agent. The unified architecture in §3 of the paper is a useful template: walk the planning, tool, memory, and session-boundary layers, and check that each one has a named owner, a written policy, and a control that enforces it. Anything left implicit is the next post-mortem.
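The capability-bounding and confirmation controls in the list above can be sketched as a small dispatcher that sits between the planning LLM and the tools. This is a minimal sketch under stated assumptions: the `Tool`, `Session`, and `dispatch` names, the capability strings, and the audit-log format are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable


class CapabilityError(Exception):
    """Raised when a tool call exceeds what the session was granted."""


@dataclass(frozen=True)
class Tool:
    name: str
    required: frozenset       # capabilities this call needs
    state_changing: bool      # True if the call writes, sends, or deletes
    run: Callable


@dataclass
class Session:
    user: str
    granted: frozenset        # capabilities granted to this session only
    audit_log: list = field(default_factory=list)


def dispatch(session, tool, args, confirmed=False):
    """Run a tool only if the session holds every capability it requires."""
    missing = tool.required - session.granted
    if missing:
        session.audit_log.append(("denied", tool.name))
        raise CapabilityError(f"{tool.name} needs {sorted(missing)}")
    if tool.state_changing and not confirmed:
        # The confirmation must come from outside the model's own output.
        session.audit_log.append(("needs_confirmation", tool.name))
        raise CapabilityError(f"{tool.name} requires out-of-band confirmation")
    session.audit_log.append(("allowed", tool.name))
    return tool.run(**args)


# Stub tools for illustration; real tools would do actual work.
list_files = Tool("list_files", frozenset({"fs:list"}), False,
                  lambda path: ["a.txt"])
send_mail = Tool("send_mail", frozenset({"net:smtp"}), True,
                 lambda to, body: "sent")

session = Session(user="alice", granted=frozenset({"fs:list"}))
files = dispatch(session, list_files, {"path": "/workspace"})  # allowed
```

Calling `dispatch(session, send_mail, ...)` with this session raises `CapabilityError`, because the session was never granted `net:smtp`. Even a session that holds the capability must pass `confirmed=True` before a state-changing tool runs.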
Status
| Item | Reference | Date | Notes |
|---|---|---|---|
| Paper posted | arXiv:2605.14932v1 | 2026-05-14 | cs.CR, 17 pages, CC-BY 4.0 |
| Institutions | CISPA Helmholtz Center, TU Berlin | 2026-05-14 | Holz, Rieck and team |
| Systems evaluated | 4 OpenClaw-like agents | 2026-05-14 | Vendor-anonymised in the paper |
| Universal failures | PI-1 (cross-user exfiltration), NF-1 (unauthorized egress) | 2026-05-14 | 100% of systems tested |
| Adjacent advisory | Microsoft “Prompts become shells” | 2026-05-07 | RCE in AI-agent frameworks |
| Adjacent incident corpus | OWASP GenAI Exploit Round-up Q1 2026 | 2026-04-14 | Identities, orchestration, supply chains |
| Adjacent policy guidance | CISA Careful Adoption of Agentic AI Services | 2026 | Procurement-side controls |
No single fix retires PI-1 or NF-1. The CISPA paper’s contribution is to name a category of failure — “we are running multi-user systems without the isolation primitives multi-user systems require” — and to point at the shelf of well-aged tools that already solve it. A 2026 agent deployment whose threat model stops at prompt-injection scanners and output filters is, in the paper’s framing, an operating system without process isolation: not yet wrong about model behaviour, but already wrong about systems design.
Sources
- https://arxiv.org/abs/2605.14932
- https://arxiv.org/html/2605.14932v1
- https://www.microsoft.com/en-us/security/blog/2026/05/07/prompts-become-shells-rce-vulnerabilities-ai-agent-frameworks/
- https://genai.owasp.org/2026/04/14/owasp-genai-exploit-round-up-report-q1-2026/
- https://www.cisa.gov/resources-tools/resources/careful-adoption-agentic-ai-services