<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>LLM-Hacking</title><description>Open database of LLM attacks, jailbreaks, and defenses.</description><link>https://www.llm-hacking.com/</link><item><title>SymJack: one approved file copy becomes RCE in six AI coding agents</title><link>https://www.llm-hacking.com/hacks/symjack-symlink-approval-rce-coding-agents.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/symjack-symlink-approval-rce-coding-agents.md</guid><description>Adversa AI disclosed on May 26, 2026 a symlink-hijack pattern that turns a single benign-looking shell copy into a config overwrite and host RCE across Claude Code, Cursor, Gemini, Antigravity, Copilot, Grok Build and Codex CLIs.</description><pubDate>Sat, 30 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>Slopsquatting in 2026: 127 package names that all five frontier LLMs hallucinate</title><link>https://www.llm-hacking.com/hacks/slopsquatting-frontier-models-2026.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/slopsquatting-frontier-models-2026.md</guid><description>A May 16, 2026 arXiv replication of the USENIX Security &apos;25 slopsquatting study finds hallucination rates are down across frontier models — but identifies 127 phantom packages that every tested model invents identically, a model-agnostic supply-chain attack surface.</description><pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate><category>SUPPLY CHAIN</category></item><item><title>Blindfold: action-level jailbreaks bypass semantic defenses on embodied LLMs</title><link>https://www.llm-hacking.com/hacks/blindfold-embodied-llm-action-jailbreak.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/blindfold-embodied-llm-action-jailbreak.md</guid><description>A SenSys &apos;26 paper (May 11–14, 2026) introduces Blindfold, an automated framework that jailbreaks embodied LLMs by decomposing harmful goals into individually benign 
actions — up to 53% higher attack success than semantic-level baselines on a real 6DoF robotic arm.</description><pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>MCPwn (CVE-2026-33032): nginx-ui MCP endpoint hands over the web server</title><link>https://www.llm-hacking.com/hacks/mcpwn-nginx-ui-cve-2026-33032.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/mcpwn-nginx-ui-cve-2026-33032.md</guid><description>An unauthenticated MCP endpoint in nginx-ui ≤ 2.3.3 lets any network attacker rewrite nginx configs and restart the service. CVSS 9.8, publicly disclosed on April 15, 2026, exploited in the wild within hours of the patch.</description><pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate><category>INFRASTRUCTURE</category></item><item><title>Measuring LLM exploit capability: ExploitBench, ExploitGym and the SCONE-bench refresh</title><link>https://www.llm-hacking.com/hacks/exploit-evals-capability-ladder.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/exploit-evals-capability-ladder.md</guid><description>On May 22, 2026 Anthropic published Mythos Preview results on three new exploitation benchmarks. 
The numbers — and the way the benchmarks decompose the exploit chain — change how defenders should think about frontier offensive capability.</description><pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate><category>RESEARCH</category></item><item><title>Proprietary Problems: Cisco&apos;s 15-model paired-regime study shows single-turn safety scores miss most multi-turn risk</title><link>https://www.llm-hacking.com/hacks/cisco-proprietary-problems-multi-turn-frontier-eval.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/cisco-proprietary-problems-multi-turn-frontier-eval.md</guid><description>A May 27, 2026 Cisco study of 15 flagship closed models from OpenAI, Anthropic, Google, Amazon and xAI records multi-turn attack success rates of 7.89% to 88.30% — and cross-regime gaps up to 55 percentage points over single-turn baselines.</description><pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate><category>RESEARCH</category></item><item><title>One million exposed AI services: what the Intruder scan actually found</title><link>https://www.llm-hacking.com/hacks/one-million-exposed-ai-services-intruder-scan.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/one-million-exposed-ai-services-intruder-scan.md</guid><description>On May 5, 2026, Intruder published the results of an internet-wide scan that mapped 1 million exposed AI services across 2 million hosts. 
The recurring failure is not exotic — it is permissive defaults.</description><pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate><category>DEFENSE</category></item><item><title>The agent-human security gap: what production ships, what papers study</title><link>https://www.llm-hacking.com/hacks/agent-human-interaction-security-gap.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/agent-human-interaction-security-gap.md</guid><description>A May 23, 2026 UCLA paper audits 59 academic studies, 21 production agent systems and 26 security plugins — and finds that the defenses researchers favor have zero production deployment.</description><pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate><category>RESEARCH</category></item><item><title>The Autonomy Tax: how defense training breaks LLM agents</title><link>https://www.llm-hacking.com/hacks/autonomy-tax-defense-training-breaks-agents.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/autonomy-tax-defense-training-breaks-agents.md</guid><description>A March 19, 2026 USC paper measures the cost of prompt-injection-defense training on agent competence — defended models time out on 99% of tasks, vs 13% for undefended baselines.</description><pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate><category>RESEARCH</category></item><item><title>MCP needs a trust handshake: attested tool-server admission</title><link>https://www.llm-hacking.com/hacks/mcp-attested-tool-server-admission.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/mcp-attested-tool-server-admission.md</guid><description>A May 22, 2026 arXiv paper proposes mcp-attested — a backward-compatible MCP extension that gates tool dispatch on signed clearance, deny-by-default allowlists, and tamper-evident audit logs.</description><pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate><category>DEFENSE</category></item><item><title>WARD: a co-evolved guard model that holds up against adaptive prompt injection on web 
agents</title><link>https://www.llm-hacking.com/hacks/ward-web-agent-guard-prompt-injection.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/ward-web-agent-guard-prompt-injection.md</guid><description>A May 14, 2026 NUS paper proposes WARD — a guard model trained against a memory-driven adversarial attacker — and reports near-perfect out-of-distribution recall on web-agent prompt injection.</description><pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate><category>DEFENSE</category></item><item><title>MemMorph: hijacking tool selection in LLM agents through fluent memory poisoning</title><link>https://www.llm-hacking.com/hacks/memmorph-tool-hijacking-via-memory-poisoning.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/memmorph-tool-hijacking-via-memory-poisoning.md</guid><description>A May 24, 2026 arXiv paper from NTU Singapore shows three plausible-looking memory entries can steer an agent toward an attacker-chosen tool with 85.9% success — and survive three off-the-shelf defenses.</description><pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>SilentRetrieval: fluent RAG corpus poisoning that slips past perplexity filters</title><link>https://www.llm-hacking.com/hacks/silentretrieval-rag-corpus-poisoning.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/silentretrieval-rag-corpus-poisoning.md</guid><description>A May 27, 2026 arXiv preprint introduces a two-stage attack that hides goal-hijacking triggers inside fluent documents, reaching 57% LLM-attack success on Natural Questions and MS MARCO with one poisoned record per query.</description><pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate><category>ADVERSARIAL</category></item><item><title>CISA + Five Eyes publish the first joint guidance on agentic-AI adoption</title><link>https://www.llm-hacking.com/hacks/cisa-five-eyes-careful-adoption-agentic-ai.md</link><guid 
isPermaLink="true">https://www.llm-hacking.com/hacks/cisa-five-eyes-careful-adoption-agentic-ai.md</guid><description>On May 1, 2026, CISA, NSA and the Five Eyes cyber agencies released &apos;Careful Adoption of Agentic AI Services&apos; — a 5-risk taxonomy and a deployment playbook that critical-infrastructure operators are now expected to fold into their existing cybersecurity frameworks.</description><pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate><category>GOVERNANCE</category></item><item><title>Microsoft Copilot Cowork: poisoned skills exfiltrate M365 files with no approval</title><link>https://www.llm-hacking.com/hacks/copilot-cowork-skill-exfiltration.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/copilot-cowork-skill-exfiltration.md</guid><description>PromptArmor&apos;s May 26, 2026 disclosure shows that a five-line prompt injection inside a Copilot Cowork skill file can leak SharePoint and OneDrive documents through auto-approved Teams messages — no patch closes the design.</description><pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>CrossMPI: image-only prompt injection steers what VLMs read and see</title><link>https://www.llm-hacking.com/hacks/crossmpi-image-only-prompt-injection.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/crossmpi-image-only-prompt-injection.md</guid><description>A May 15, 2026 Xidian University arXiv paper introduces CrossMPI: imperceptible image perturbations that change how vision-language models interpret both the image and the user&apos;s text prompt, with 66% average success across five LVLMs.</description><pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate><category>MULTIMODAL</category></item><item><title>IterInject: when an LLM optimiser writes its own indirect prompt injections</title><link>https://www.llm-hacking.com/hacks/iterinject-feedback-guided-indirect-injection.md</link><guid 
isPermaLink="true">https://www.llm-hacking.com/hacks/iterinject-feedback-guided-indirect-injection.md</guid><description>A May 23, 2026 paper closes the loop between payload, diagnoser and LLM optimiser — lifting indirect-injection ASR from near-zero to 33–90% on InjecAgent and compromising 5 of 9 Claude Code targets.</description><pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate><category>INDIRECT INJECTION</category></item><item><title>NSA AISC publishes MCP security design guidance for production AI</title><link>https://www.llm-hacking.com/hacks/nsa-aisc-mcp-security-design-considerations.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/nsa-aisc-mcp-security-design-considerations.md</guid><description>On May 20, 2026, NSA&apos;s Artificial Intelligence Security Center released a 15-page Cybersecurity Information Sheet on Model Context Protocol — eight classes of weakness, five real-world incidents, nine defensive recommendations.</description><pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate><category>GOVERNANCE</category></item><item><title>Poisoning the Watchtower: when SOC copilots read attacker-controlled logs</title><link>https://www.llm-hacking.com/hacks/poisoning-watchtower-soc-log-injection.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/poisoning-watchtower-soc-log-injection.md</guid><description>A May 23, 2026 paper formalises log-substrate prompt injection — adversarial content in log fields steering LLM-based SOC assistants. 
Best defense leaves 11.8% average injection success.</description><pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate><category>RESEARCH</category></item><item><title>pgAdmin 4 ships an LLM panel and a classic LFI+SSRF arrives with it (CVE-2026-7817)</title><link>https://www.llm-hacking.com/hacks/pgadmin-4-llm-api-cve-2026-7817.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/pgadmin-4-llm-api-cve-2026-7817.md</guid><description>pgAdmin 4 9.15 patches an authenticated LFI and SSRF in its new LLM API configuration endpoints. The bug class is decades old; the surface is brand new.</description><pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate><category>SUPPLY CHAIN</category></item><item><title>Temporal memory contamination: longitudinal safety drift in memory-equipped LLM agents</title><link>https://www.llm-hacking.com/hacks/temporal-memory-contamination.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/temporal-memory-contamination.md</guid><description>Three arXiv papers from April and May 2026 converge on a failure mode complementary to memory poisoning — memory-equipped agents drift unsafe as benign context accumulates, with compressed summaries acting as a laundering channel.</description><pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>The pressure: open-source security teams under the AI-assisted vulnerability flood</title><link>https://www.llm-hacking.com/hacks/the-pressure-open-source-ai-vuln-flood.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/the-pressure-open-source-ai-vuln-flood.md</guid><description>On May 26, 2026, curl&apos;s Daniel Stenberg published &apos;The pressure&apos; — more than one credible security report per day, twelve confirmed CVEs in half a release cycle, and a pattern other maintainers are now reporting in parallel.</description><pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate><category>GOVERNANCE</category></item><item><title>The agent 
harness is your real privilege boundary — and most teams draw it in the wrong place</title><link>https://www.llm-hacking.com/hacks/agent-harness-privilege-boundary.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/agent-harness-privilege-boundary.md</guid><description>A May 26, 2026 Pillar Security write-up argues the harness — Claude Code, Cursor, Codex — holds the secrets, tools and hooks an agent never sees. Recent harness bugs and CVE-2026-22708 make the case concrete.</description><pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>Sockpuppeting: a one-line prefill that jailbreaks 11 production LLMs</title><link>https://www.llm-hacking.com/hacks/sockpuppeting-assistant-prefill-jailbreak.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/sockpuppeting-assistant-prefill-jailbreak.md</guid><description>A line of code injected as the last assistant message coaxes 7 of 10 major models into harmful completions. 
The fix is not at the model — it is API-side message-order validation.</description><pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate><category>JAILBREAK</category></item><item><title>GrafanaGhost: indirect prompt injection chained with a URL-parse bug to exfiltrate dashboard data</title><link>https://www.llm-hacking.com/hacks/grafanaghost-indirect-prompt-injection-exfiltration.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/grafanaghost-indirect-prompt-injection-exfiltration.md</guid><description>Noma Security&apos;s April 7, 2026 disclosure shows how three modest defects — a stored injection point, a startsWith(&apos;/&apos;) URL check, and a one-word guardrail bypass — combine into a silent exfiltration path through Grafana&apos;s AI assistant.</description><pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate><category>INDIRECT INJECTION</category></item><item><title>Networks of agents break in new ways: Microsoft&apos;s red-team, plus RAMPART and Clarity</title><link>https://www.llm-hacking.com/hacks/agent-networks-emergent-attacks-rampart.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/agent-networks-emergent-attacks-rampart.md</guid><description>Microsoft Research red-teamed an internal platform of 100+ always-on agents. Four attack patterns — propagation, amplification, trust capture, proxy chains — show up only at the network level. 
RAMPART and Clarity, open-sourced May 20, 2026, are the response.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>Antigravity find_by_name: when a native tool call jumps over Secure Mode</title><link>https://www.llm-hacking.com/hacks/antigravity-find-by-name-tool-injection.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/antigravity-find-by-name-tool-injection.md</guid><description>On April 20, 2026, Pillar Security disclosed that a single unsanitised parameter in Google Antigravity&apos;s find_by_name tool turned file search into arbitrary code execution — and bypassed the IDE&apos;s strictest sandbox.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>Apple&apos;s May 2026 bulletin formally credits Claude on two macOS CVEs</title><link>https://www.llm-hacking.com/hacks/apple-may-2026-claude-credited-cves.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/apple-may-2026-claude-credited-cves.md</guid><description>On May 11, 2026, Apple&apos;s macOS Tahoe 26.5 advisory named Claude alongside its researchers on two CVEs — a kernel integer overflow and a WebKit use-after-free. AI-assisted vulnerability research is now in the official changelog.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><category>OFFENSIVE AI</category></item><item><title>BadHost (CVE-2026-48710): one Host-header character bypasses auth in Starlette, vLLM and FastMCP</title><link>https://www.llm-hacking.com/hacks/badhost-starlette-fastmcp-cve-2026-48710.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/badhost-starlette-fastmcp-cve-2026-48710.md</guid><description>X41 D-Sec disclosed on May 22, 2026 a critical auth bypass in Starlette &lt; 1.0.1. A single / ? 
or # in the HTTP Host header desynchronises the routed path from the path the middleware sees, breaking path-based authorization in vLLM, LiteLLM, FastMCP and thousands of FastAPI-based AI agents.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><category>INFRASTRUCTURE</category></item><item><title>Bleeding Llama: a GGUF parsing flaw leaks Ollama process memory to unauthenticated attackers</title><link>https://www.llm-hacking.com/hacks/bleeding-llama-ollama-cve-2026-7482.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/bleeding-llama-ollama-cve-2026-7482.md</guid><description>CVE-2026-7482, publicly disclosed in May 2026 and codenamed Bleeding Llama by Cyera, lets a remote attacker pull arbitrary chunks of an Ollama server&apos;s heap — API keys, system prompts, other users&apos; conversations — with three unauthenticated API calls. The silent patch shipped 2.5 months before the CVE was assigned.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><category>DATA LEAK</category></item><item><title>ClaudeBleed: when a browser agent trusts the wrong extension</title><link>https://www.llm-hacking.com/hacks/claudebleed-extension-trust-boundary.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/claudebleed-extension-trust-boundary.md</guid><description>LayerX disclosed ClaudeBleed on May 6, 2026: a trust-boundary flaw let any Chrome extension drive Claude in Chrome and exfiltrate Gmail, Drive and GitHub data. 
The first patch was bypassed within hours.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>Encoded prompt injection: when guardrails fail because the LLM decodes the payload</title><link>https://www.llm-hacking.com/hacks/encoded-prompt-injection-action-layer.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/encoded-prompt-injection-action-layer.md</guid><description>On May 4, 2026 a tweet written in Morse code drained around $175K from a Grok-controlled crypto wallet. The incident is the most expensive demonstration to date of an old defensive blind spot — string-matching guardrails can&apos;t see through encodings that the model itself happily decodes.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><category>PROMPT INJECTION</category></item><item><title>The first CVE wave: AI-assisted discovery is reshaping disclosure volumes</title><link>https://www.llm-hacking.com/hacks/first-cve-wave-ai-assisted-disclosure.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/first-cve-wave-ai-assisted-disclosure.md</guid><description>VulnCheck&apos;s May 14, 2026 analysis shows year-to-date CVE issuance up +563% on Chrome, +476% on GitHub, +180% on VMware, +170% on Apache. The systemic shift behind the Apple, Mozilla and ActiveMQ headlines is now visible in the numbers.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><category>OFFENSIVE AI</category></item><item><title>Font-mapping prompt injection: when peer review becomes an LLM attack surface</title><link>https://www.llm-hacking.com/hacks/llm-reviewer-font-mapping-injection.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/llm-reviewer-font-mapping-injection.md</guid><description>A May 25, 2026 arXiv benchmark shows hidden font-mapping payloads can flip LLM peer reviews from reject to accept. 
ICML 2026 already used the same trick in reverse to desk-reject 497 papers.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><category>PROMPT INJECTION</category></item><item><title>MCP STDIO transport: the design choice that became 11 CVEs and 200,000 exposed agents</title><link>https://www.llm-hacking.com/hacks/mcp-stdio-transport-by-design-rce.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/mcp-stdio-transport-by-design-rce.md</guid><description>On April 16, 2026, OX Security disclosed that Anthropic&apos;s MCP STDIO transport executes any OS command it is handed. Anthropic called it &apos;by design&apos;. The cascade has produced eleven downstream CVEs in six weeks.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>MultiBreak: 10,389 multi-turn prompts expose how conversational jailbreaks slip past LLM safety</title><link>https://www.llm-hacking.com/hacks/multibreak-multi-turn-jailbreak-benchmark.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/multibreak-multi-turn-jailbreak-benchmark.md</guid><description>A May 3, 2026 ICML paper releases the largest, most diverse multi-turn jailbreak benchmark to date. 
It records attack-success-rate gaps of up to 54 points over the previous state of the art on DeepSeek-R1-7B and 34.6 on GPT-4.1-mini — and quantifies how alignment that holds in single turns collapses across follow-ups.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><category>RESEARCH</category></item><item><title>When prompts become shells: prompt injection escalates to RCE in agent frameworks</title><link>https://www.llm-hacking.com/hacks/prompt-injection-to-rce-agent-frameworks.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/prompt-injection-to-rce-agent-frameworks.md</guid><description>Two CVEs in Microsoft Semantic Kernel and four in CrewAI — all disclosed in early 2026 — turn a single injected prompt into remote code execution on the host. The pattern is structural, not incidental.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>Teaching Claude Why: how Anthropic drove agentic misalignment to zero</title><link>https://www.llm-hacking.com/hacks/teaching-claude-why-alignment-generalization.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/teaching-claude-why-alignment-generalization.md</guid><description>On May 8, 2026, Anthropic&apos;s Alignment Science team published a case study showing that teaching Claude to explain its ethical reasoning — not just demonstrate it — cut agentic misalignment from 96% to under 1%.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><category>RESEARCH</category></item><item><title>Poison once, exploit forever: persistent memory poisoning of LLM agents (OWASP ASI06)</title><link>https://www.llm-hacking.com/hacks/agent-memory-poisoning-asi06.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/agent-memory-poisoning-asi06.md</guid><description>An April 2026 arXiv paper on cross-site memory poisoning and a May 13, 2026 OWASP post on the Cisco MemoryTrap finding against Claude Code converge on the 
same lesson: agent memory is a trust boundary.</description><pubDate>Tue, 26 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>Treating AI agents like operating systems: a CISPA blueprint for isolation and privilege</title><link>https://www.llm-hacking.com/hacks/agents-as-operating-systems.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/agents-as-operating-systems.md</guid><description>A May 14, 2026 CISPA paper applies decades of OS security thinking to LLM agents. Tested on four OpenClaw-like systems, two weakness classes — cross-user exfiltration and unauthorized network egress — fail in every single one.</description><pubDate>Tue, 26 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>AI-assisted ICS attack: lessons from the Monterrey water utility intrusion</title><link>https://www.llm-hacking.com/hacks/ai-assisted-ics-attack-monterrey-water.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/ai-assisted-ics-attack-monterrey-water.md</guid><description>Dragos&apos; May 2026 report on Servicios de Agua y Drenaje de Monterrey documents the first publicly analysed campaign in which a commercial LLM — Claude — was the primary technical operator of an attempted OT intrusion.</description><pubDate>Tue, 26 May 2026 00:00:00 GMT</pubDate><category>OFFENSIVE AI</category></item><item><title>AudioHijack: imperceptible audio hijacks voice agents (IEEE S&amp;P 2026)</title><link>https://www.llm-hacking.com/hacks/audiohijack-auditory-prompt-injection.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/audiohijack-auditory-prompt-injection.md</guid><description>An April 16, 2026 IEEE S&amp;P paper introduces auditory prompt injection: adversarial reverb hidden in audio drives 13 large audio-language models and commercial voice agents (Mistral AI, Microsoft Azure) into unauthorized actions with 79-96% success.</description><pubDate>Tue, 26 May 2026 00:00:00 
GMT</pubDate><category>MULTIMODAL</category></item><item><title>Discourse AI XSS (CVE-2026-27740): when LLM output is trusted as HTML</title><link>https://www.llm-hacking.com/hacks/discourse-ai-llm-xss-cve-2026-27740.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/discourse-ai-llm-xss-cve-2026-27740.md</guid><description>A flagged post, an AI moderator, an htmlSafe call. The Discourse AI plugin treated LLM output as trusted markup, turning indirect prompt injection into Staff-side XSS. Published March 19, 2026.</description><pubDate>Tue, 26 May 2026 00:00:00 GMT</pubDate><category>INDIRECT INJECTION</category></item><item><title>The Lethal Trifecta: when an agent reads private data, untrusted content, and can phone home</title><link>https://www.llm-hacking.com/hacks/lethal-trifecta.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/lethal-trifecta.md</guid><description>Simon Willison&apos;s framework for the single architectural mistake that turned 2026&apos;s wave of AI-agent data exfiltration vulnerabilities into a class, not a coincidence.</description><pubDate>Tue, 26 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>MCP Back-End Vulnerabilities: classic flaws resurface across AI database bridges</title><link>https://www.llm-hacking.com/hacks/mcp-backend-vulnerabilities-pattern.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/mcp-backend-vulnerabilities-pattern.md</guid><description>Akamai&apos;s May 12, 2026 research found SQL injection (CVE-2025-66335), missing authentication, and unsanitised inputs across three MCP servers — Apache Doris, Apache Pinot, and Alibaba RDS. 
The pattern, not the bugs, is the story.</description><pubDate>Tue, 26 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>OpenAI Daybreak and GPT-5.5-Cyber: a permissive security model behind a verified-identity gate</title><link>https://www.llm-hacking.com/hacks/openai-daybreak-gpt-5-5-cyber.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/openai-daybreak-gpt-5-5-cyber.md</guid><description>Between May 7 and 12, 2026, OpenAI launched Daybreak — a cybersecurity platform built on GPT-5.5, Codex Security and a &apos;cyber-permissive&apos; sibling, GPT-5.5-Cyber. UK AISI&apos;s prior evaluation found a universal jailbreak in six hours.</description><pubDate>Tue, 26 May 2026 00:00:00 GMT</pubDate><category>OFFENSIVE AI</category></item><item><title>Project Glasswing: 10,000+ critical bugs found by Claude Mythos in a month</title><link>https://www.llm-hacking.com/hacks/project-glasswing-claude-mythos.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/project-glasswing-claude-mythos.md</guid><description>Anthropic&apos;s May 26, 2026 update on Project Glasswing reports that ~50 partners have used Claude Mythos Preview to find more than 10,000 high/critical-severity vulnerabilities, including 271 latent bugs patched in Firefox 150 — and lays out a controlled-access model for a frontier offensive capability.</description><pubDate>Tue, 26 May 2026 00:00:00 GMT</pubDate><category>DEFENSE</category></item><item><title>Semantic Kernel: when a prompt becomes a shell (CVE-2026-25592, CVE-2026-26030)</title><link>https://www.llm-hacking.com/hacks/semantic-kernel-prompt-to-rce.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/semantic-kernel-prompt-to-rce.md</guid><description>Microsoft disclosed two critical vulnerabilities in Semantic Kernel on May 7, 2026 that turn a single injected prompt into host-level code execution. 
The root cause is architectural: tool registries and eval() treated as features, not security boundaries.</description><pubDate>Tue, 26 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>Hidden triggers in SKILL.md: semantic supply-chain attacks on agent skill registries</title><link>https://www.llm-hacking.com/hacks/skill-md-registry-supply-chain.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/skill-md-registry-supply-chain.md</guid><description>A May 12, 2026 University of Maryland paper shows that 20-token additions to a SKILL.md file can make an agent discover and select an adversarial skill in 77–86% of trials, and bypass registry-side scans up to 100% of the time.</description><pubDate>Tue, 26 May 2026 00:00:00 GMT</pubDate><category>SUPPLY CHAIN</category></item><item><title>Trust No Tool: cognitive poisoning of LLM agents through tool feedback</title><link>https://www.llm-hacking.com/hacks/trust-no-tool-cognitive-poisoning.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/trust-no-tool-cognitive-poisoning.md</guid><description>A May 17, 2026 arXiv paper introduces &apos;cognitive poisoning&apos; — a malicious tool that wins the agent&apos;s trust over many benign-looking turns and only weaponises the final action. The defence target shifts from prompts to trajectory.</description><pubDate>Tue, 26 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>Usability as a Weapon: how feature requests turn coding LLMs insecure</title><link>https://www.llm-hacking.com/hacks/usability-as-a-weapon-coding-llm.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/usability-as-a-weapon-coding-llm.md</guid><description>A May 11, 2026 arXiv paper shows that asking a coding LLM for a faster, simpler or feature-richer version of secure code reliably drops the security constraints. 
UPAttack reaches a 98.1% attack success rate on GPT-5.2-chat and Gemini-3.</description><pubDate>Tue, 26 May 2026 00:00:00 GMT</pubDate><category>ADVERSARIAL</category></item><item><title>Agents Rule of Two: Meta&apos;s pragmatic answer to unsolved prompt injection</title><link>https://www.llm-hacking.com/hacks/agents-rule-of-two.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/agents-rule-of-two.md</guid><description>Published Oct 31, 2025 by Meta and re-adopted in Databricks&apos; May 2026 guide, the Agents Rule of Two limits any agent session to two of three risky properties — the most actionable framework while prompt injection remains unsolved.</description><pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate><category>DEFENSE</category></item><item><title>Azure SRE Agent: a multi-tenant token check that let strangers watch your incidents (CVE-2026-32173)</title><link>https://www.llm-hacking.com/hacks/azure-sre-agent-multitenant-eavesdrop.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/azure-sre-agent-multitenant-eavesdrop.md</guid><description>Disclosed April 20, 2026, an Entra ID app-registration misconfiguration on Azure SRE Agent&apos;s /agentHub WebSocket let any tenant connect and listen to every prompt, reasoning step, CLI command and credential — silently.</description><pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>CVE-2026-35435: Azure AI Foundry&apos;s M365 published agents trusted callers they shouldn&apos;t have</title><link>https://www.llm-hacking.com/hacks/azure-ai-foundry-m365-agents-eop.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/azure-ai-foundry-m365-agents-eop.md</guid><description>Disclosed May 7, 2026 (CVSS 8.6), an improper access-control flaw in Azure AI Foundry let unauthorized attackers elevate privilege through M365 published agents. 
Microsoft reports active exploitation; mitigations are available ahead of a patch.</description><pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>Claw Chain: four OpenClaw CVEs that turn an AI agent into the attacker&apos;s hands</title><link>https://www.llm-hacking.com/hacks/claw-chain-openclaw-agent-takeover.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/claw-chain-openclaw-agent-takeover.md</guid><description>Disclosed May 15, 2026, Cyera Research&apos;s Claw Chain chains four patched OpenClaw flaws — sandbox escape, env-var disclosure, MCP loopback EoP, symlink read escape — into full host takeover via the agent itself.</description><pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>Comment and Control: one prompt injection pattern, three vendors leaking GitHub Actions secrets</title><link>https://www.llm-hacking.com/hacks/comment-and-control-github-agents.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/comment-and-control-github-agents.md</guid><description>Disclosed April 15, 2026, Comment and Control turns ordinary PR titles, issue bodies and HTML comments into credential-exfiltration channels in Claude Code, Gemini CLI and GitHub Copilot Agent.</description><pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>Contextual integrity: why prompt-injection defenses keep failing</title><link>https://www.llm-hacking.com/hacks/contextual-integrity-prompt-injection.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/contextual-integrity-prompt-injection.md</guid><description>A May 2026 paper by Abdelnabi and Bagdasarian recasts prompt injection through Contextual Integrity and shows that data-instruction separation is a category mistake.</description><pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate><category>RESEARCH</category></item><item><title>Copirate 365: chaining prompt injection, delayed tool invocation and memory hijack in M365 Copilot (CVE-2026-24299)</title><link>https://www.llm-hacking.com/hacks/copirate-365-copilot-chain.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/copirate-365-copilot-chain.md</guid><description>Johann Rehberger&apos;s DEF CON writeup, published May 2026, walks through a five-stage indirect prompt-injection chain that turns one booby-trapped email into a persistent backdoor inside Microsoft 365 Copilot. Patched, but the patterns are generic.</description><pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate><category>PROMPT INJECTION</category></item><item><title>Indirect prompt injection in the wild: three April 2026 studies converge</title><link>https://www.llm-hacking.com/hacks/indirect-prompt-injection-in-the-wild.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/indirect-prompt-injection-in-the-wild.md</guid><description>Google, Forcepoint and CISPA independently measured indirect prompt injection across the open web in April 2026. 
The picture: 15K+ validated payloads, 32% growth, organized templates.</description><pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate><category>INDIRECT INJECTION</category></item><item><title>LiteLLM CVE-2026-42208: a pre-auth SQL injection in the AI gateway</title><link>https://www.llm-hacking.com/hacks/litellm-pre-auth-sqli.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/litellm-pre-auth-sqli.md</guid><description>Disclosed April 20, 2026 and exploited 36 hours after the global advisory dropped, CVE-2026-42208 turns LiteLLM&apos;s Authorization header into a direct read on every provider key the proxy fronts.</description><pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate><category>INFRASTRUCTURE</category></item><item><title>Mathematical encoding jailbreaks: when set theory bypasses LLM safety</title><link>https://www.llm-hacking.com/hacks/mathematical-encoding-jailbreaks.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/mathematical-encoding-jailbreaks.md</guid><description>An arXiv paper posted on May 5, 2026 shows that re-expressing a harmful prompt as a set-theory or formal-logic problem bypasses safety training on 46–56% of attempts across eight frontier models — but only when a helper LLM does the reformulation, not when mathematical syntax is bolted on top.</description><pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate><category>JAILBREAK</category></item><item><title>When the attacker is another LLM: large reasoning models as autonomous jailbreakers</title><link>https://www.llm-hacking.com/hacks/lrm-autonomous-jailbreak-agents.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/lrm-autonomous-jailbreak-agents.md</guid><description>A Nature Communications paper formalised in May 2026 shows four reasoning models — DeepSeek-R1, Gemini 2.5 Flash, Grok 3 Mini and Qwen3 235B — jailbreaking nine target LLMs with a 97.14% overall success rate, armed with nothing but a single system prompt.</description><pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate><category>RESEARCH</category></item><item><title>PraisonAI CVE-2026-44338: an unauthenticated agent server, exploited in 3h44</title><link>https://www.llm-hacking.com/hacks/praisonai-auth-bypass.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/praisonai-auth-bypass.md</guid><description>Disclosed May 11, 2026, CVE-2026-44338 covers PraisonAI shipping with authentication hard-disabled in its legacy API server. A CVE-Detector scanner hit the endpoint less than four hours later.</description><pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>ShareLeak (CVE-2026-21520): the first CVE Microsoft assigned to a Copilot prompt injection</title><link>https://www.llm-hacking.com/hacks/shareleak-copilot-studio-cve-2026-21520.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/shareleak-copilot-studio-cve-2026-21520.md</guid><description>Disclosed April 15, 2026, Capsule Security&apos;s ShareLeak write-up details an indirect prompt injection in Microsoft Copilot Studio. 
Microsoft assigned CVE-2026-21520 (CVSS 7.5) — an unusual industry first that reframes prompt injection as a tracked vulnerability class.</description><pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate><category>INDIRECT INJECTION</category></item><item><title>ARGUS: a provenance-graph defense for context-aware prompt injection</title><link>https://www.llm-hacking.com/hacks/argus-provenance-graph-defense.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/argus-provenance-graph-defense.md</guid><description>Published May 5, 2026, the ARGUS paper introduces influence-provenance auditing for LLM agents — dropping attack success from 28.8% to 3.8% on a new context-aware injection benchmark.</description><pubDate>Fri, 22 May 2026 00:00:00 GMT</pubDate><category>DEFENSE</category></item><item><title>The Instruction Hierarchy: training LLMs to rank privileged instructions</title><link>https://www.llm-hacking.com/hacks/instruction-hierarchy.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/instruction-hierarchy.md</guid><description>OpenAI&apos;s 2024 paper proposes a structural defense against prompt injection: teach models that system &gt; user &gt; tool output. The idea is now central to GPT-4o-mini and o-series safety training.</description><pubDate>Fri, 22 May 2026 00:00:00 GMT</pubDate><category>DEFENSE</category></item><item><title>LMDeploy SSRF: when an image loader turns into an AI-infrastructure hijack</title><link>https://www.llm-hacking.com/hacks/lmdeploy-ssrf-ai-inference-hijack.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/lmdeploy-ssrf-ai-inference-hijack.md</guid><description>CVE-2026-33626 turned LMDeploy&apos;s load_image() into a generic SSRF primitive. 
Honeypots saw the first weaponised exploit 12 hours and 31 minutes after the advisory went live.</description><pubDate>Fri, 22 May 2026 00:00:00 GMT</pubDate><category>INFRASTRUCTURE</category></item><item><title>Localhost agent hijack: cross-origin WebSocket attacks on AI coding agents</title><link>https://www.llm-hacking.com/hacks/localhost-agent-hijack.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/localhost-agent-hijack.md</guid><description>CVE-2026-44211 (CVSS 9.7), disclosed May 7, 2026, shows how a single visit to a malicious page can hijack an AI coding agent running on a developer&apos;s laptop. The attack class is generic — and architectural.</description><pubDate>Fri, 22 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>Mini Shai-Hulud: the supply-chain worm that came for the AI tooling stack</title><link>https://www.llm-hacking.com/hacks/mini-shai-hulud-ai-supply-chain.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/mini-shai-hulud-ai-supply-chain.md</guid><description>Disclosed May 11–18, 2026, the Mini Shai-Hulud worm trojanised 170+ npm and PyPI packages — including Mistral AI, Guardrails AI and TanStack — and persists inside Claude Code and VS Code.</description><pubDate>Fri, 22 May 2026 00:00:00 GMT</pubDate><category>SUPPLY CHAIN</category></item><item><title>Output filtering beats model self-defense: 20,000 adaptive attacks, one survivor</title><link>https://www.llm-hacking.com/hacks/output-filtering-prompt-injection-defenses.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/output-filtering-prompt-injection-defenses.md</guid><description>Posted April 26 and revised May 12, 2026, a Swept AI / Michigan paper pitted nine prompt-injection defenses against an adaptive attacker. Every model-side defense eventually broke. 
Application-side output filtering held — zero leaks across 15,000 attacks.</description><pubDate>Fri, 22 May 2026 00:00:00 GMT</pubDate><category>DEFENSE</category></item><item><title>Prompts as shells: when prompt injection becomes RCE in agent frameworks</title><link>https://www.llm-hacking.com/hacks/prompts-as-shells.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/prompts-as-shells.md</guid><description>Two CVEs disclosed in Microsoft Semantic Kernel on May 7, 2026 (CVE-2026-25592, CVE-2026-26030) show how a single injected prompt can pivot from text to remote code execution on the agent&apos;s host.</description><pubDate>Fri, 22 May 2026 00:00:00 GMT</pubDate><category>AGENTS</category></item><item><title>ASCII Smuggling: Hidden commands via Unicode Tag characters</title><link>https://www.llm-hacking.com/hacks/ascii-smuggling.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/ascii-smuggling.md</guid><description>Unicode Tag characters (U+E0000–U+E007F) are invisible to humans but interpreted by LLMs. Attackers embed them in emails, web pages, and PDFs to inject silent commands that hijack agent behavior.</description><pubDate>Tue, 19 May 2026 00:00:00 GMT</pubDate><category>PROMPT INJECTION</category></item><item><title>Many-shot jailbreaking: 256 examples to bypass any alignment</title><link>https://www.llm-hacking.com/hacks/many-shot-jailbreaking.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/many-shot-jailbreaking.md</guid><description>Anthropic researchers showed that stuffing the context window with 256 fake Q&amp;A examples reliably bypasses safety training. 
Bigger context = bigger attack surface.</description><pubDate>Fri, 15 May 2026 00:00:00 GMT</pubDate><category>JAILBREAK</category></item><item><title>System prompt extraction via repetition attacks</title><link>https://www.llm-hacking.com/hacks/system-prompt-extraction.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/system-prompt-extraction.md</guid><description>Asking the model to &apos;repeat the word poem forever&apos; causes it to eventually dump training data and system prompts. Documented across Claude 3, GPT-4, and Gemini.</description><pubDate>Sun, 10 May 2026 00:00:00 GMT</pubDate><category>DATA LEAK</category></item><item><title>Sleeper agents: hidden backdoors that survive safety training</title><link>https://www.llm-hacking.com/hacks/sleeper-agents.md</link><guid isPermaLink="true">https://www.llm-hacking.com/hacks/sleeper-agents.md</guid><description>Anthropic demonstrated that models trained with hidden trigger phrases retain backdoor behavior even after standard RLHF safety training. The implications for open-weight LLMs are significant.</description><pubDate>Sun, 03 May 2026 00:00:00 GMT</pubDate><category>RESEARCH</category></item></channel></rss>