When prompts become shells: prompt injection escalates to RCE in agent frameworks
Two CVEs in Microsoft Semantic Kernel and four in CrewAI — all disclosed in early 2026 — turn a single injected prompt into remote code execution on the host. The pattern is structural, not incidental.
What is this?
Throughout the first half of 2026, multiple agent frameworks have shipped vulnerabilities where a single crafted prompt — delivered directly or through indirect prompt injection — becomes a working RCE on the host running the agent. On May 7, 2026, the Microsoft Defender Security Research Team published a deep-dive on two such bugs in Microsoft Semantic Kernel (CVE-2026-25592 and CVE-2026-26030). Six weeks earlier, on March 30, 2026, CERT/CC published VU#221883 describing four CVEs in CrewAI (CVE-2026-2275, 2285, 2286, 2287) that chain a prompt injection into arbitrary file read, SSRF and remote code execution.
The frameworks differ, but the underlying class of bug is the same. This article is about that class.
How it works
The pattern is consistent across disclosures. The LLM is doing exactly what it was trained to do: parse a natural-language instruction into a structured tool call. The vulnerability lives one layer down, in the glue code between the model’s output and the runtime that executes the tool.
In Semantic Kernel’s In-Memory Vector Store (CVE-2026-26030), the search filter was a Python lambda assembled by string interpolation and then handed to eval(). A blocklist tried to catch obvious calls (eval, exec, open, __import__) but did not cover attribute access through the Python type hierarchy. A model-controlled argument could close the string, traverse ().__class__.__bases__[0].__subclasses__(), locate BuiltinImporter, and execute os.system("calc.exe").
# Vulnerable pattern, simplified
new_filter = f"lambda x: x.city == '{kwargs[param.name]}'"
# kwargs[param.name] is model-controlled and only filtered by a
# Python AST blocklist that does not cover attribute traversal.
eval(new_filter, {"__builtins__": {}}) # still reachable via tuple().__class__...
The second Semantic Kernel bug (CVE-2026-25592) was simpler: DownloadFileAsync in the .NET sandbox plugin was accidentally annotated [KernelFunction], exposing an unbounded File.WriteAllBytes primitive to the model. The model could be prompted to write a payload into the Windows Startup folder, bypassing the sandbox by design.
In CrewAI, the Code Interpreter Tool is expected to run untrusted Python inside a Docker container. When Docker is unreachable, the tool silently falls back to an in-process Python sandbox, where the ctypes module is not on the blocklist — so arbitrary C calls re-open RCE on the host (CVE-2026-2275, CVE-2026-2287). Two other CVEs in the same disclosure cover SSRF in the RAG search tool (CVE-2026-2286) and arbitrary local file read in the JSON loader (CVE-2026-2285), giving an attacker enough primitives to chain into credential theft.
In MITRE ATLAS terms, the technique is AML.T0051 (LLM Prompt Injection) cascading into AML.T0016 (Obtain Capabilities).
Why it matters
The model is no longer just generating text — it is choosing tools, supplying their arguments, and triggering their side-effects on the host. Any parameter the model can influence has to be treated as attacker-controlled, exactly the way a query string parameter is treated in classical web security.
These are not exotic 0-days. Semantic Kernel has 27,000+ GitHub stars and ships in Microsoft 365 Copilot integrations. CrewAI is widely used in multi-agent demos and production pilots. The vulnerable code paths are the default configurations, and indirect injection (poisoned RAG documents, hostile web pages browsed by the agent, malicious tool responses) is enough to trigger them — no privileged access to the user’s chat required.
Defenses
The Microsoft team frames the central lesson clearly: your LLM is not a security boundary. The tools you expose define your attacker’s affected scope. Concretely:
- Patch first. Upgrade
semantic-kernelto ≥ 1.39.4 (Python) and ≥ 1.71.0 (.NET). For CrewAI, disableallow_code_execution=True, remove the Code Interpreter Tool where possible, and enforce that Docker is actually running — do not allow silent fallback to the in-process sandbox. - Treat tool parameters like untrusted input. Never pass model-supplied strings to
eval,exec, dynamic SQL, shell,File.WriteAllBytesor any equivalent sink. Use parameterized APIs or strict allowlists; blocklists in dynamic languages fail by construction. - Audit the tool registry. Every function exposed to the model is part of the attack surface. Remove decorators (
[KernelFunction],@tool) from anything that does not need to be model-callable. - Defense in depth at the host. Endpoint detection for the agent process: alert on shell spawn, on writes to startup folders, on outbound connections from the model runtime. The Microsoft post publishes useful KQL hunting queries for Defender XDR.
- Investigate the vulnerable window. If you ran an affected version, the patch closes the bug but does not answer whether you were already exploited. Define the window from deployment until upgrade and hunt host-level post-exploitation telemetry inside it.
Status
| Framework | CVE | Disclosed | Patched in |
|---|---|---|---|
| Semantic Kernel (Python) | CVE-2026-26030 | 2026-05-07 | 1.39.4 |
| Semantic Kernel (.NET) | CVE-2026-25592 | 2026-05-07 | 1.71.0 |
| CrewAI Code Interpreter | CVE-2026-2275 | 2026-03-30 | partial — ctypes added to blocklist in upcoming release |
| CrewAI Docker fallback | CVE-2026-2287 | 2026-03-30 | partial — fail-closed option being evaluated |
| CrewAI JSON loader | CVE-2026-2285 | 2026-03-30 | no full patch at time of writing |
| CrewAI RAG search SSRF | CVE-2026-2286 | 2026-03-30 | no full patch at time of writing |
The Microsoft team has announced a continuing research series; expect similar findings in other frameworks in the coming weeks. CrewAI vulnerabilities were reported by Yarden Porat (Cyata); Semantic Kernel vulnerabilities were reported by Uri Oren, Amit Eliahu and Dor Edry (Microsoft Defender Security Research Team).