> welcome to the underbelly

Every known way to break a Large Language Model.

Open database of 618 documented LLM attacks. Jailbreaks, prompt injections, data extraction, adversarial inputs. Updated daily, sourced from arXiv and the wild.

Featured hack

INFRASTRUCTURE CRITICAL NEW

Unauthenticated RCE in llama.cpp's distributed-inference RPC backend

A missing bounds check in llama.cpp's RPC backend lets any client with TCP access to the server port read and write process memory and reach remote code execution. Fixed in b8492.

2026-07-10 // 6 min

# example code (illustrative, defensive)

# llama.cpp RPC graph-compute RCE, simplified.
# parse_tensor, validate and result are hypothetical names, not llama.cpp's own.
# The tensor parser only bounds-checks data when buffer != 0:
def parse_tensor(tensor, result, validate):
    if tensor.buffer:              # attacker sets buffer = 0 to skip this
        validate(tensor.data)      # the bounds check lives only here
    result.data = tensor.data      # payload taken from the wire unconditionally
    return result
# Why it is critical: the RPC protocol has no auth, so reachability == compromise.
# Defense: upgrade to build b8492+, bind ggml-rpc-server to 127.0.0.1,
# never publish port 50052, and tunnel nodes over mTLS/WireGuard.
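
For contrast with the flawed pattern above, here is a minimal sketch of a parser that validates on every path. It is not the actual llama.cpp patch: `WireTensor`, `checked_offset`, and the `buffer_sizes` table are hypothetical names invented for illustration, and the sketch assumes that every tensor must name a known buffer.

```python
from dataclasses import dataclass


@dataclass
class WireTensor:
    """Hypothetical stand-in for a tensor descriptor received over RPC."""
    buffer: int  # buffer handle chosen by the client
    data: int    # byte offset into that buffer, attacker-controlled
    size: int    # number of bytes the tensor claims to span


def checked_offset(tensor: WireTensor, buffer_sizes: dict) -> int:
    """Return tensor.data only if [data, data + size) fits a known buffer."""
    if tensor.buffer not in buffer_sizes:
        # Unknown handles (including 0) are rejected instead of skipping the check.
        raise ValueError("unknown buffer handle")
    limit = buffer_sizes[tensor.buffer]
    if tensor.data < 0 or tensor.size < 0 or tensor.data + tensor.size > limit:
        raise ValueError("tensor range falls outside its buffer")
    return tensor.data


# Usage: one 64-byte buffer registered under the hypothetical handle 1.
sizes = {1: 64}
offset = checked_offset(WireTensor(buffer=1, data=16, size=16), sizes)
```

The design point is that the bounds check has no bypass branch: a descriptor with `buffer = 0` fails validation rather than skipping it.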

Recent

RESEARCH MEDIUM NEW

When one agent red-teams another: a vulnerability concept graph for coding agents

A paper dated 13 July 2026 shows one research agent probing production coding agents, then storing what it learns as reusable, falsifiable concepts: a durable artifact for safety teams rather than another one-off exploit.

2026-07-17 // 6 min

DEFENSE LOW NEW

DT-Guard: a guardrail that reasons in training, stays fast at inference

A July 2026 paper trains a content-safety guardrail on reasoning traces but drops them at inference — emitting only structured labels, keeping latency low while reaching F1 near 0.88.

2026-07-17 // 6 min

AGENTS MEDIUM NEW

How account-synced preferences can hijack Claude Desktop's local tools

Pentera showed an attacker with account access can hide instructions in Claude Desktop's synced Personal Preferences to drive its local tools into running attacker commands.

2026-07-17 // 6 min

PROMPT INJECTION CRITICAL NEW

Drive-by prompt injection: a website could silently command Copilot on mobile

Microsoft patched a critical flaw on 14 July 2026 in which a malicious webpage could make Edge for Android fire hidden prompts at the Copilot app — no confirmation, no origin check.

2026-07-17 // 6 min

DEFENSE CRITICAL NEW

When hosted-model guardrails lock out the defenders: lessons from an agentic intrusion

Hugging Face disclosed on 16 July 2026 that an autonomous AI agent breached its infrastructure — and that commercial model guardrails blocked its own responders from analysing the attack.

2026-07-17 // 6 min

RESEARCH MEDIUM NEW

Why one refusal switch can't tell a pentester from an attacker

A July 2026 paper shows LLM safety refusal isn't a single switch but a subspace spread across layers — domain-blind, prone to blocking legitimate security work, and separable in open weights.

2026-07-17 // 6 min
