system: OPERATIONAL
← back to all hacks
SUPPLY CHAIN CRITICAL NEW

Chat templates are code: Jinja2 SSTI in LLM inference servers

CERT/CC's VU#915947 (April 20, 2026) documents CVE-2026-5760, a CVSS 9.8 RCE in SGLang: a malicious GGUF model file carries a Jinja2 chat template that runs Python on the server. It is the same class as Llama Drama and a vLLM flaw before it.

2026-06-19 // 6 min affects: sglang, llama-cpp-python, vllm, gguf, ai-inference-infrastructure

What is this?

On April 20, 2026 the CERT Coordination Center published VU#915947, describing CVE-2026-5760 — a CVSS 9.8 remote code execution flaw in SGLang, a widely deployed open-source serving framework for large language models (the project has been starred over 26,000 times and forked more than 5,500 times). The trigger is not a network packet or a malformed request: it is a model file. A crafted GGUF model carries a chat template that, once the model is loaded and a request reaches the reranking endpoint, executes attacker-supplied Python on the inference server.

The reason this matters beyond one project is that CVE-2026-5760 is not new in kind. It is the same vulnerability class as CVE-2024-34359 — “Llama Drama,” a CVSS 9.7 RCE in llama-cpp-python disclosed in 2024 and since patched — and the vLLM advisory CVE-2025-61620 that hardened the same attack surface late in 2025. Three different frameworks, one recurring mistake: treating a model-supplied chat template as inert data when it is, in fact, executable code.

How it works

Modern LLM tokenizers ship a chat_template: a Jinja2 template that formats raw messages into the exact token sequence a model expects. That template travels inside the model artifact — in a GGUF file it is stored as the tokenizer.chat_template metadata field. When a server loads the model, it reads that string and renders it with user/message data at request time.

The flaw is rendering it with a full-power Jinja2 engine. Per CERT/CC and the reporting researcher (Stuart Beck), SGLang used jinja2.Environment() rather than Jinja2’s ImmutableSandboxedEnvironment to render the model-supplied template in entrypoints/openai/serving_rerank.py. A non-sandboxed Jinja2 environment exposes Python object internals, which is the textbook condition for server-side template injection (SSTI) — and SSTI in a server process is arbitrary code execution.

The end-to-end sequence is straightforward to describe without giving anyone a payload:

# Vulnerable pattern: a model-supplied template rendered with full Jinja2 power
from jinja2 import Environment
template = Environment().from_string(tokenizer.chat_template)  # untrusted input!
rendered = template.render(messages=...)                       # [REDACTED] runs here

An attacker publishes a GGUF model whose chat_template contains an SSTI expression plus a trigger phrase (in the disclosed case, a Qwen3-reranker phrase) that steers execution into the vulnerable rerank path. A victim downloads and serves that model — typically from a public hub such as Hugging Face — and the first request to /v1/rerank renders the template and runs the embedded code. No credentials are required.

Why it matters

Inference servers are increasingly internet-adjacent and often run with broad privileges (GPU access, cloud metadata, internal network reach). A single poisoned model on a popular hub can therefore convert “I downloaded a model” into “an attacker has a shell on my GPU box.” Because the malicious instructions live in the artifact, every downstream user who serves that model inherits the compromise — this is a supply-chain failure, not a one-off bug.

The recurring nature is the real lesson. The same root cause has now surfaced in llama-cpp-python, vLLM, and SGLang because chat-template handling is copied and re-implemented across the ecosystem. Anywhere a framework renders a model-carried template without sandboxing, the bug is latent. Per CERT/CC, no vendor patch was obtained during the coordination process for CVE-2026-5760, so as of disclosure operators had to mitigate it themselves.

Defenses

  • Sandbox every model-supplied template. Render chat templates with Jinja2’s ImmutableSandboxedEnvironment, never a bare Environment(). This is CERT/CC’s primary recommendation and the fix that closed the prior cases.
# Mitigation: render untrusted templates in a sandboxed environment
from jinja2.sandbox import ImmutableSandboxedEnvironment
template = ImmutableSandboxedEnvironment().from_string(tokenizer.chat_template)
  • Treat model files as untrusted code, not data. Only load models from sources you trust, pin to specific revisions/commit hashes, and verify checksums or signatures. A model hub is a software supply chain.
  • Inspect tokenizer.chat_template before serving. Flag templates that reference attribute access, builtins, or anything beyond simple message formatting as part of model intake.
  • Contain the blast radius. Run inference servers as low-privilege, network-isolated workloads; block egress; strip cloud metadata access. If SSTI fires, least privilege decides whether it is a nuisance or a breach.
  • Patch and track the class, not just the CVE. Keep llama-cpp-python, vLLM, SGLang and similar serving stacks current, and audit any in-house template rendering for the same Environment() pattern.

Status

ItemValue
IdentifierCVE-2026-5760 / CERT VU#915947
CVSS9.8 (critical)
AffectedSGLang (/v1/rerank, GGUF chat-template rendering)
Root causejinja2.Environment() instead of ImmutableSandboxedEnvironment
DisclosedApril 20, 2026 (CERT/CC)
PatchNone obtained during coordination (mitigate via sandboxing)
Same classCVE-2024-34359 “Llama Drama” (patched); vLLM CVE-2025-61620 (fixed)

Key dates: 2024 — Llama Drama (CVE-2024-34359). Late 2025 — vLLM fix (CVE-2025-61620). April 20, 2026 — SGLang disclosure (CVE-2026-5760).

This article covers a publicly disclosed vulnerability class for defensive purposes and intentionally omits any working exploit code.

Sources