DEFENSE MEDIUM

Confidential Computing for Agentic AI: what enclaves can't protect

A May 2026 survey maps confidential computing onto the agentic stack — hardware enclaves can shield agent memory and KV caches from a malicious cloud operator, but they cannot stop prompt injection.

2026-06-15 // 6 min affects: llm-agents, mcp, rag-systems, gpu-tee

What is this?

On May 4, 2026, Javad Forough, Marios Kogias and Hamed Haddadi published When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI (arXiv:2605.03213). It is the first systematic attempt to map Confidential Computing (CC) — hardware-rooted trusted execution environments (TEEs) — onto the security needs of LLM agents that plan, call tools, keep persistent memory, and delegate to peer agents over protocols such as MCP and A2A.

The survey’s framing is useful precisely because it is honest about the limits. CC is not a fix for prompt injection. Its headline conclusion: while several hardware trust primitives look mature enough for targeted deployments, “no broadly established end-to-end framework yet binds them into a coherent security substrate for production agentic AI.” This is a defensive-architecture piece, not a new attack.

How it works

Today’s agent defenses — input filters, output classifiers, allowlists — “operate entirely within the software stack and can be silently bypassed by a sufficiently privileged adversary such as a compromised cloud operator.” CC moves the trust boundary into hardware: code and data run inside an attested enclave that even the host OS, hypervisor, or infrastructure operator cannot read or tamper with.

The authors decompose an agent into five layers — perception, planning, memory, action, coordination — and rank adversaries by strength: external attacker, compromised co-tenant, malicious infrastructure operator (the case CC is built to defeat), and compromised agent. They then identify the high-value assets that a TEE would wrap:

perception   -> user prompts, retrieved docs, tool inputs
planning     -> model weights, system prompts, fine-tuned LoRA adapters
memory       -> KV cache, conversation history, vector store, credentials
action       -> tool calls, parameters, tool outputs
coordination -> inter-agent messages, delegation claims, attestation evidence

Memory is called out as a top target: long-term vector stores accumulate months of proprietary context, and the KV cache can leak verbatim conversation. The survey cites the real-world LeftoverLocals finding, where KV-cache residue in shared GPU memory enabled cross-tenant conversation reconstruction — exactly the class of leak a GPU TEE is meant to close.

The hardware exists. The survey covers six platforms — Intel SGX, Intel TDX, AMD SEV-SNP, ARM TrustZone, ARM CCA, and NVIDIA H100 Confidential Computing — the first GPU TEE, anchored in an on-die root of trust with signed attestation reports. NVIDIA reports under ~7% LLM-inference overhead in CC mode, and independent measurements confirm GPU-TEE overhead is now small enough for production — unlike fully homomorphic encryption or MPC, which the authors note still impose two-to-four orders of magnitude of overhead.

Why it matters

Agentic deployments concentrate secrets: provider API keys, retrieved enterprise documents, and accumulated memory all sit in one runtime. In a multi-tenant cloud, the operator — or anyone who compromises it — is inside that trust boundary. CC is the only control that addresses that adversary directly, and an IDC survey cited by the authors found 75% of organizations adopting CC (18% in production, 57% piloting).

But the sharpest point is what CC does not solve. In an agent, “the LLM is the control plane,” and the attack surface “is the meaning of data, not its origin or format.” A TEE can prove which code is running and keep memory confidential; it cannot prove the intent of an input. So an enclave will faithfully execute a prompt-injected instruction. The 2025 EchoLeak exploit (CVE-2025-32711), a zero-click injection that exfiltrated Microsoft 365 Copilot data from a single email, would have run identically inside a perfectly attested enclave.

Defenses

Use CC for the operator threat, not the injection threat. Put model weights, vector stores, KV cache, and credentials inside a CPU+GPU TEE to defeat a malicious or compromised infrastructure operator and co-tenant memory leaks (LeftoverLocals-style).
Keep semantic defenses in place. CC is complementary to — not a replacement for — input/output filtering, least-privilege tool scoping, and the lethal-trifecta discipline. Prompt injection remains an architectural problem.
Demand composite attestation. Attest the CPU TEE and the GPU TEE together; GPU-only attestation leaves gaps. Treat attestation as “what code is running,” never as “this input is trustworthy.”
Mind the residual leaks. TEEs do not stop side channels — timing, cache/bus contention, controlled-channel and page-fault attacks, GPU residual state. Pair CC with reproducible builds, model cards, and fine-tuning provenance.
Watch the open problems. The authors flag six unresolved areas, including compound attestation for multi-hop agent chains, TEE-backed RAG isolation, and side-channel leakage in autoregressive inference.

Status

Item	Detail
Source	Survey, arXiv:2605.03213v1 (cs.CR), CC BY 4.0
Published	May 4, 2026
Threat addressed	Malicious infrastructure operator; co-tenant memory leakage
Not addressed	Prompt injection, unsafe goals, model supply-chain compromise, side channels
GPU TEE maturity	NVIDIA H100 CC, ~7% inference overhead (NVIDIA-reported)
Maturity verdict	Primitives usable; no end-to-end agentic CC substrate yet