DEFENSE
(8)8 hack(s).
One million exposed AI services: what the Intruder scan actually found
On May 5, 2026, Intruder published the results of an internet-wide scan that mapped 1 million exposed AI services across 2 million hosts. The recurring failure is not exotic — it is permissive defaults.
MCP needs a trust handshake: attested tool-server admission
A May 22, 2026 arXiv paper proposes mcp-attested — a backward-compatible MCP extension that gates tool dispatch on signed clearance, deny-by-default allowlists, and tamper-evident audit logs.
WARD: a co-evolved guard model that holds up against adaptive prompt injection on web agents
A May 14, 2026 NUS paper proposes WARD — a guard model trained against a memory-driven adversarial attacker — and reports near-perfect out-of-distribution recall on web-agent prompt injection.
Project Glasswing: 10,000+ critical bugs found by Claude Mythos in a month
Anthropic's May 26, 2026 update on Project Glasswing reports that ~50 partners have used Claude Mythos Preview to find more than 10,000 high/critical-severity vulnerabilities, including 271 latent bugs patched in Firefox 150 — and lays out a controlled-access model for a frontier offensive capability.
Agents Rule of Two: Meta's pragmatic answer to unsolved prompt injection
Published Oct 31, 2025 by Meta and re-adopted in Databricks' May 2026 guide, the Agents Rule of Two limits any agent session to two of three risky properties — the most actionable framework while prompt injection remains unsolved.
ARGUS: a provenance-graph defense for context-aware prompt injection
Published May 5, 2026, the ARGUS paper introduces influence-provenance auditing for LLM agents — dropping attack success from 28.8% to 3.8% on a new context-aware injection benchmark.
The Instruction Hierarchy: training LLMs to rank privileged instructions
OpenAI's 2024 paper proposes a structural defense against prompt injection: teach models that system > user > tool output. The idea is now central to GPT-4o-mini and o-series safety training.
Output filtering beats model self-defense: 20,000 adaptive attacks, one survivor
Posted April 26 and revised May 12, 2026, a Swept AI / Michigan paper pitted nine prompt-injection defenses against an adaptive attacker. Every model-side defense eventually broke. Application-side output filtering held — zero leaks across 15,000 attacks.