JAILBREAK
3 hacks.
JAILBREAK MEDIUM NEW
Sockpuppeting: a one-line prefill that jailbreaks 11 production LLMs
A single line of code injected as the final assistant message coaxes 7 of 10 major models into harmful completions. The fix lies not in the model but in API-side message-order validation.
2026-05-28//7 min
JAILBREAK MEDIUM
Mathematical encoding jailbreaks: when set theory bypasses LLM safety
An arXiv paper posted on May 5, 2026 shows that re-expressing a harmful prompt as a set-theory or formal-logic problem bypasses safety training on 46–56% of attempts across eight frontier models — but only when a helper LLM does the reformulation, not when mathematical syntax is bolted on top.
2026-05-25//7 min
JAILBREAK CRITICAL
Many-shot jailbreaking: 256 examples to bypass any alignment
Anthropic researchers showed that stuffing the context window with 256 fake Q&A examples reliably bypasses safety training. Bigger context = bigger attack surface.
2026-05-15//6 min