AGENTS CRITICAL

When prompts become shells: prompt injection escalates to RCE in agent frameworks

Two CVEs in Microsoft Semantic Kernel and four in CrewAI, all disclosed in the first half of 2026, turn a single injected prompt into remote code execution on the host. The pattern is structural, not incidental.

2026-05-27 // 7 min affects: semantic-kernel, crewai, langchain

What is this?

Throughout the first half of 2026, multiple agent frameworks have shipped vulnerabilities in which a single crafted prompt, delivered directly or through indirect prompt injection, becomes a working RCE on the host running the agent. On May 7, 2026, the Microsoft Defender Security Research Team published a deep-dive on two such bugs in Microsoft Semantic Kernel (CVE-2026-25592 and CVE-2026-26030). A little over five weeks earlier, on March 30, 2026, CERT/CC published VU#221883 describing four CVEs in CrewAI (CVE-2026-2275, CVE-2026-2285, CVE-2026-2286 and CVE-2026-2287) that chain a prompt injection into arbitrary file read, SSRF and remote code execution.

The frameworks differ, but the underlying class of bug is the same. This article is about that class.

How it works

The pattern is consistent across disclosures. The LLM is doing exactly what it was trained to do: parse a natural-language instruction into a structured tool call. The vulnerability lives one layer down, in the glue code between the model’s output and the runtime that executes the tool.

In Semantic Kernel’s In-Memory Vector Store (CVE-2026-26030), the search filter was a Python lambda assembled by string interpolation and then handed to eval(). A blocklist tried to catch obvious calls (eval, exec, open, __import__) but did not cover attribute access through the Python type hierarchy. A model-controlled argument could close the string, traverse ().__class__.__bases__[0].__subclasses__(), locate BuiltinImporter, and execute os.system("calc.exe").

# Vulnerable pattern, simplified
new_filter = f"lambda x: x.city == '{kwargs[param.name]}'"
# kwargs[param.name] is model-controlled and only filtered by a
# Python AST blocklist that does not cover attribute traversal.
eval(new_filter, {"__builtins__": {}})  # still reachable via tuple().__class__...
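To see why emptying __builtins__ does not contain this pattern, the harmless sketch below reproduces its shape. The build_filter helper, the sample record and the injected string are illustrative assumptions, not Semantic Kernel code, and the payload only lists classes rather than executing anything.

```python
from types import SimpleNamespace


def build_filter(value: str):
    # Same shape as the simplified vulnerable pattern: interpolate, then eval.
    # build_filter is a hypothetical stand-in, not the framework's function.
    source = f"lambda x: x.city == '{value}'"
    return eval(source, {"__builtins__": {}})


record = SimpleNamespace(city="Paris")

# A normal value behaves as the author intended.
benign = build_filter("Paris")
print(benign(record))

# A hostile value closes the quote and appends its own expression.
# No builtin names are needed: a tuple literal already leads to `object`.
injected = "nowhere' or ().__class__.__bases__[0].__subclasses__() or '"
leaked = build_filter(injected)(record)

class_names = [cls.__name__ for cls in leaked]
print(len(class_names), "classes reachable from inside the 'sandboxed' eval")
print("BuiltinImporter" in class_names)
```

The injected value closes the quoted string and appends an expression of its own, so the predicate returns whatever that expression evaluates to. A real attacker would continue from the leaked class list to an importer and then to os.system, as described above.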

The second Semantic Kernel bug (CVE-2026-25592) was simpler: DownloadFileAsync in the .NET sandbox plugin was accidentally annotated [KernelFunction], exposing an unrestricted File.WriteAllBytes primitive to the model. The model could be prompted to write a payload into the Windows Startup folder, so the sandbox was sidestepped through an exposed function rather than broken.
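One way to bound a file-write tool is to resolve the destination and refuse anything outside a dedicated directory. The sketch below is in Python for consistency with the earlier snippet rather than in .NET; the name confined_write and the directory layout are hypothetical, and this illustrates the idea rather than the fix Microsoft shipped.

```python
from pathlib import Path


def confined_write(base_dir: Path, relative_name: str, data: bytes) -> Path:
    """Write data under base_dir, rejecting any path that escapes it."""
    base = base_dir.resolve()
    # Joining with an absolute path discards `base`, and ".." segments can
    # climb out of it, so resolve first and check containment afterwards.
    target = (base / relative_name).resolve()
    if not target.is_relative_to(base):  # Path.is_relative_to needs Python 3.9+
        raise PermissionError(
            f"refusing to write outside {base}: {relative_name!r}"
        )
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

With this check, a model-supplied name such as "../escape.bin" or an absolute path to a startup folder raises PermissionError instead of reaching the filesystem. Containment alone is not sufficient if the allowed directory is itself somewhere sensitive.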

In CrewAI, the Code Interpreter Tool is expected to run untrusted Python inside a Docker container. When Docker is unreachable, the tool silently falls back to an in-process Python sandbox, where the ctypes module is not on the blocklist — so arbitrary C calls re-open RCE on the host (CVE-2026-2275, CVE-2026-2287). Two other CVEs in the same disclosure cover SSRF in the RAG search tool (CVE-2026-2286) and arbitrary local file read in the JSON loader (CVE-2026-2285), giving an attacker enough primitives to chain into credential theft.
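The safer behaviour when the container runtime is missing is to fail closed. The sketch below shows that shape under stated assumptions: run_untrusted, SandboxUnavailable and the runner callable are hypothetical names, not CrewAI APIs, and the probe simply shells out to the docker CLI.

```python
import shutil
import subprocess


class SandboxUnavailable(RuntimeError):
    """Raised instead of silently downgrading to in-process execution."""


def docker_is_reachable(timeout: float = 5.0) -> bool:
    # `docker info` exits non-zero when the daemon cannot be contacted.
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def run_untrusted(code: str, runner, probe=docker_is_reachable):
    # `runner` is a hypothetical callable that executes code in a container.
    # There is deliberately no fallback branch: no container, no execution.
    if not probe():
        raise SandboxUnavailable(
            "Docker is not reachable; refusing to execute code in-process"
        )
    return runner(code)
```

Passing the probe as a parameter keeps the decision testable without a Docker daemon, and makes the absence of a fallback path explicit in the code.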

In MITRE ATLAS terms, the technique is AML.T0051 (LLM Prompt Injection) cascading into AML.T0016 (Obtain Capabilities).

Why it matters

The model is no longer just generating text — it is choosing tools, supplying their arguments, and triggering their side-effects on the host. Any parameter the model can influence has to be treated as attacker-controlled, exactly the way a query string parameter is treated in classical web security.
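The web-security analogy can be made concrete with a bound SQL parameter. The following is a minimal sketch using Python's standard sqlite3 module; the table, the rows and the find_by_city helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotels (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO hotels VALUES (?, ?)",
    [("Hotel A", "Paris"), ("Hotel B", "Berlin")],
)


def find_by_city(city: str) -> list[str]:
    # The value travels as a bound parameter, so it is only ever compared
    # as data and never parsed as part of the SQL statement.
    rows = conn.execute(
        "SELECT name FROM hotels WHERE city = ? ORDER BY name", (city,)
    )
    return [name for (name,) in rows]


# Treat anything the model supplies like a hostile query-string value.
model_supplied = "Paris' OR '1'='1"

print(find_by_city("Paris"))
print(find_by_city(model_supplied))
```

The classic injection string matches no rows here because it is compared literally against the city column. Tool arguments chosen by a model deserve the same handling as this query-string value.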

These are not exotic 0-days. Semantic Kernel has 27,000+ GitHub stars and ships in Microsoft 365 Copilot integrations. CrewAI is widely used in multi-agent demos and production pilots. The vulnerable code paths are reachable in default configurations, and indirect injection (poisoned RAG documents, hostile web pages browsed by the agent, malicious tool responses) is enough to trigger them. No privileged access to the user’s chat is required.

Defenses

The Microsoft team frames the central lesson clearly: your LLM is not a security boundary. The tools you expose define how far an attacker can reach. Concretely:

Patch first. Upgrade semantic-kernel to ≥ 1.39.4 (Python) and ≥ 1.71.0 (.NET). For CrewAI, do not set allow_code_execution=True, remove the Code Interpreter Tool where possible, and verify that Docker is actually running; do not allow a silent fallback to the in-process sandbox.
Treat tool parameters like untrusted input. Never pass model-supplied strings to eval, exec, dynamic SQL, shell, File.WriteAllBytes or any equivalent sink. Use parameterized APIs or strict allowlists; blocklists in dynamic languages fail by construction.
Audit the tool registry. Every function exposed to the model is part of the attack surface. Remove decorators ([KernelFunction], @tool) from anything that does not need to be model-callable.
Defense in depth at the host. Run endpoint detection on the agent process and alert on shell spawns, on writes to startup folders, and on outbound connections from the model runtime. The Microsoft post publishes useful KQL hunting queries for Defender XDR.
Investigate the vulnerable window. If you ran an affected version, the patch closes the bug but does not answer whether you were already exploited. Define the window from deployment until upgrade and hunt host-level post-exploitation telemetry inside it.
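As an illustration of the "parameterized APIs or strict allowlists" advice, a search filter can be built from validated parts instead of evaluated source text. This is a sketch under assumptions: the field names, the operator set and build_predicate are hypothetical and do not reflect the patched Semantic Kernel implementation.

```python
import operator
from types import SimpleNamespace

# Everything the model may choose from is enumerated up front.
ALLOWED_FIELDS = frozenset({"city", "country"})
ALLOWED_OPERATORS = {"eq": operator.eq, "ne": operator.ne}
MAX_VALUE_LENGTH = 200


def build_predicate(field: str, op: str, value: str):
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"field not allowed: {field!r}")
    if op not in ALLOWED_OPERATORS:
        raise ValueError(f"operator not allowed: {op!r}")
    if not isinstance(value, str) or len(value) > MAX_VALUE_LENGTH:
        raise ValueError("value must be a short string")
    compare = ALLOWED_OPERATORS[op]
    # The value is only ever used as data in a comparison; no source code
    # is assembled and nothing is passed to eval or exec.
    return lambda record: compare(getattr(record, field), value)


records = [
    SimpleNamespace(city="Paris", country="FR"),
    SimpleNamespace(city="Berlin", country="DE"),
]

matches = [r.city for r in records if build_predicate("city", "eq", "Paris")(r)]

# The traversal payload is now just a string that matches no city.
hostile = "x' or ().__class__.__bases__[0].__subclasses__() or '"
no_matches = [r.city for r in records if build_predicate("city", "eq", hostile)(r)]

print(matches, no_matches)
```

The allowlist on field names matters as much as the one on operators: without it, a model-chosen field such as __class__ would still reach attributes the tool author never meant to expose.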

Status

Framework / component	CVE	Disclosed	Patched in / status
Semantic Kernel (Python)	CVE-2026-26030	2026-05-07	1.39.4
Semantic Kernel (.NET)	CVE-2026-25592	2026-05-07	1.71.0
CrewAI Code Interpreter	CVE-2026-2275	2026-03-30	partial — `ctypes` added to blocklist in upcoming release
CrewAI Docker fallback	CVE-2026-2287	2026-03-30	partial — fail-closed option being evaluated
CrewAI JSON loader	CVE-2026-2285	2026-03-30	no full patch at time of writing
CrewAI RAG search SSRF	CVE-2026-2286	2026-03-30	no full patch at time of writing
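To check a Python environment against the patched version listed above, something like the following could be used. It is a rough sketch: the helper names are hypothetical, the version parsing handles only plain dotted numbers, and only the semantic-kernel Python package is covered because the table gives no fixed CrewAI release.

```python
import re
from importlib import metadata

# Minimum patched version taken from the status table above.
MINIMUM_PATCHED = {"semantic-kernel": "1.39.4"}


def parse_version(text: str) -> tuple[int, ...]:
    # Simplified parser: keeps the leading digits of each dotted part, so a
    # pre-release such as "1.39.4rc1" is treated like "1.39.4".
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_patched(installed: str, minimum: str) -> bool:
    have, need = parse_version(installed), parse_version(minimum)
    width = max(len(have), len(need))
    # Pad with zeros so that "1.40" compares as "1.40.0".
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_installed(package: str) -> str:
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package}: not installed"
    patched = is_patched(installed, MINIMUM_PATCHED[package])
    return f"{package} {installed}: {'patched' if patched else 'VULNERABLE'}"


print(check_installed("semantic-kernel"))
```

Comparing versions as integer tuples avoids the string-comparison trap where "1.100.0" would sort below "1.39.4". A production check would use a full version parser and also cover the .NET package.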

The Microsoft team has announced a continuing research series; expect similar findings in other frameworks in the coming weeks. CrewAI vulnerabilities were reported by Yarden Porat (Cyata); Semantic Kernel vulnerabilities were reported by Uri Oren, Amit Eliahu and Dor Edry (Microsoft Defender Security Research Team).