system: OPERATIONAL
← back to categories

DEFENSE

(8)

8 hack(s).

DEFENSE MEDIUM NEW

One million exposed AI services: what the Intruder scan actually found

On May 5, 2026, Intruder published the results of an internet-wide scan that mapped 1 million exposed AI services across 2 million hosts. The recurring failure is not exotic — it is permissive defaults.

2026-05-29//7 min
DEFENSE MEDIUM NEW

MCP needs a trust handshake: attested tool-server admission

A May 22, 2026 arXiv paper proposes mcp-attested — a backward-compatible MCP extension that gates tool dispatch on signed clearance, deny-by-default allowlists, and tamper-evident audit logs.

2026-05-29//6 min
DEFENSE MEDIUM NEW

WARD: a co-evolved guard model that holds up against adaptive prompt injection on web agents

A May 14, 2026 NUS paper proposes WARD — a guard model trained against a memory-driven adversarial attacker — and reports near-perfect out-of-distribution recall on web-agent prompt injection.

2026-05-29//7 min
DEFENSE MEDIUM

Project Glasswing: 10,000+ critical bugs found by Claude Mythos in a month

Anthropic's May 26, 2026 update on Project Glasswing reports that ~50 partners have used Claude Mythos Preview to find more than 10,000 high/critical-severity vulnerabilities, including 271 latent bugs patched in Firefox 150 — and lays out a controlled-access model for a frontier offensive capability.

2026-05-26//7 min
DEFENSE MEDIUM

Agents Rule of Two: Meta's pragmatic answer to unsolved prompt injection

Published Oct 31, 2025 by Meta and re-adopted in Databricks' May 2026 guide, the Agents Rule of Two limits any agent session to two of three risky properties — the most actionable framework while prompt injection remains unsolved.

2026-05-25//6 min
DEFENSE MEDIUM

ARGUS: a provenance-graph defense for context-aware prompt injection

Published May 5, 2026, the ARGUS paper introduces influence-provenance auditing for LLM agents — dropping attack success from 28.8% to 3.8% on a new context-aware injection benchmark.

2026-05-22//7 min
DEFENSE MEDIUM

The Instruction Hierarchy: training LLMs to rank privileged instructions

OpenAI's 2024 paper proposes a structural defense against prompt injection: teach models that system > user > tool output. The idea is now central to GPT-4o-mini and o-series safety training.

2026-05-22//7 min
DEFENSE MEDIUM

Output filtering beats model self-defense: 20,000 adaptive attacks, one survivor

Posted April 26 and revised May 12, 2026, a Swept AI / Michigan paper pitted nine prompt-injection defenses against an adaptive attacker. Every model-side defense eventually broke. Application-side output filtering held — zero leaks across 15,000 attacks.

2026-05-22//6 min