LightLLM CVE-2026-26220: pickle on a WebSocket the server forces onto the network

CVE-2026-26220 (disclosed Feb 15, 2026) puts pickle.loads() on two unauthenticated WebSocket endpoints in LightLLM's prefill-decode mode — and the server refuses to bind to localhost, so the surface is always remote.

2026-06-02 // 6 min // affects: lightllm, vllm, gpu-inference-nodes, ml-serving-infrastructure

What is this?

On February 15, 2026, security researcher Valentin Lobstein (Chocapikk) publicly disclosed an unauthenticated remote code execution flaw in LightLLM, a Python LLM inference engine (~3,900 GitHub stars). Tracked as CVE-2026-26220 and assigned by VulnCheck on Feb 16, the bug is rated CVSS 4.0 9.3 (Critical) and classified CWE-502 — Deserialization of Untrusted Data. It affects LightLLM ≤ 1.1.0.

The flaw is a textbook one with a modern twist. LightLLM’s prefill-decode (PD) disaggregation mode exposes two WebSocket endpoints that call pickle.loads() on raw binary frames with no authentication. The server also explicitly refuses to bind to localhost, so in PD mode the attack surface is network-reachable by design. It belongs to the same recurring class as vLLM’s CVE-2025-32444 (CVSS up to 10.0), where untrusted pickle over a node-to-node channel likewise led to remote code execution.

How it works

Python’s pickle module is a serializer that, by design, can reconstruct arbitrary objects — including objects whose reconstruction runs code. Feeding attacker-controlled bytes to pickle.loads() is equivalent to handing the process a script to execute. This is not a LightLLM-specific subtlety; it is a documented property of pickle and the reason CWE-502 exists.
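One way to see this property without executing anything is to list the opcodes in a pickle stream. The sketch below uses the standard-library pickletools module on two harmless values; the helper name imports_and_calls and the opcode groupings are ours for illustration, not anything from LightLLM or the disclosure. A plain dict serializes to pure data opcodes, while even a benign datetime.date carries an instruction to import a callable and invoke it, which is the same machinery an attacker-controlled stream abuses.

```python
import datetime
import pickle
import pickletools

# Opcodes that resolve a global name (an import) and opcodes that call the result.
IMPORT_OPS = {"GLOBAL", "STACK_GLOBAL", "INST"}
CALL_OPS = {"REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}


def imports_and_calls(blob: bytes) -> bool:
    """Return True if the stream both imports a name and calls something.

    pickletools.genops only parses the byte stream; it never executes it.
    """
    names = {opcode.name for opcode, _arg, _pos in pickletools.genops(blob)}
    return bool(names & IMPORT_OPS) and bool(names & CALL_OPS)


# Pure data: ints and strings inside a dict.
plain = pickle.dumps({"node_id": 1, "mode": "prefill"}, protocol=4)
# A harmless object whose reconstruction still means "import datetime.date, call it".
dated = pickle.dumps(datetime.date(2026, 2, 15), protocol=4)

print(imports_and_calls(plain))  # False
print(imports_and_calls(dated))  # True
```

This is an illustration of the format, not a filter: scanning opcodes is no substitute for refusing to unpickle untrusted bytes.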

LightLLM’s PD mode splits inference across nodes: a PD master orchestrates worker registration and KV-cache transfers between separate prefill and decode GPU nodes. Workers connect to the master over WebSocket to register and report status. Per the disclosure and the CVE tracking issue, the deserialization sits in lightllm/server/api_http.py:

# /pd_register  (api_http.py ~line 310)
data = await websocket.receive_bytes()
obj = pickle.loads(data)          # untrusted network bytes -> object graph

# /kv_move_status  (api_http.py ~line 331) — same pattern
upkv_status = pickle.loads(data)

Two further pickle.loads() calls live in the worker-side PD loop. Reaching them requires no credentials: /pd_register asks for a JSON registration frame first, but the node_id is an unvalidated integer and mode is just a string check — neither is authentication. /kv_move_status takes pickle on the very first frame.

The aggravating detail is the deployment constraint baked into startup:

assert manager.args.host not in ["127.0.0.1", "localhost"]

The master must bind to a routable interface, because remote workers need to reach it. There is no “safe” loopback-only configuration for PD mode. A working proof-of-concept exists in the researcher’s writeup. The mechanism is the standard pickle reduce gadget, a crafted object whose deserialization invokes a system command, and we do not reproduce a runnable payload here. The structural point is sufficient: any host that can open a socket to the PD master gets code execution as the serving process, before any model inference happens.
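The quoted assert compares the host against two literal strings, so it says nothing about whether the chosen interface is a sensible one. A pre-launch deployment check could classify the configured host first. The sketch below is hypothetical tooling built on the standard-library ipaddress module; classify_bind_host is not part of LightLLM, and which categories are acceptable is a policy decision for the operator.

```python
import ipaddress


def classify_bind_host(host: str) -> str:
    """Classify a configured bind host without resolving DNS."""
    if host.lower() == "localhost":
        return "loopback"
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return "hostname"    # not an IP literal; resolve and re-check separately
    if addr.is_unspecified:  # 0.0.0.0 or ::, meaning every interface
        return "wildcard"
    if addr.is_loopback:
        return "loopback"
    if addr.is_private:
        return "private"
    return "public"


for host in ("127.0.0.2", "0.0.0.0", "10.20.0.5", "8.8.8.8"):
    print(host, "->", classify_bind_host(host))
```

A launcher wrapper might refuse to start a PD master unless the result is "private", leaving wildcard and public binds as explicit, reviewed exceptions.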

Why it matters

This is not a clever new attack — it is a high-impact instance of a class the ML-serving ecosystem keeps re-shipping. The same root cause (pickle on a trusted-by-assumption internal channel) produced vLLM’s CVE-2025-32444. The pattern recurs because disaggregated serving multiplies node-to-node channels, and each one that speaks pickle is a latent RCE.

The blast radius is the inference tier itself: GPU nodes that typically hold model weights, API keys, cloud credentials, and a privileged position inside the network. An attacker who lands code on a PD master is already past the perimeter that matters. And because PD master endpoints are by-design network-exposed, an instance reachable from an untrusted segment (a flat internal network, an over-broad security group, an exposed cluster port) is exploitable without a single credential.

There is also a process lesson. The disclosure notes that the project had received earlier deserialization reports — #784 (ZMQ recv_pyobj, March 2025) and a private report in #1102 (November 2025) — that went unresolved, which is why this one was escalated to a CVE rather than handled quietly. Treat fast-moving ML infrastructure as security-immature by default: the code velocity is high, the security review cadence often is not.

Defenses

You do not need a vendor fix to neutralize this. The mitigations are standard infrastructure hygiene and apply to any pickle-on-the-wire situation, not just LightLLM:

  1. Do not expose PD endpoints to untrusted networks. Since the master cannot bind to localhost, confine it with the layer below: bind to a dedicated private interface, restrict the port to known worker IPs via firewall / security-group rules, and put PD traffic on an isolated VLAN or mesh. Never let /pd_register or /kv_move_status be reachable from a general-purpose subnet.

  2. Authenticate node-to-node channels. Worker registration and KV-status are inter-node IPC, not public API. Front them with mutual TLS (client certificates) or a shared-secret/token check so an unregistered peer cannot even open the socket.

  3. Replace pickle for network IPC. The data exchanged here is simple structured state (ints, strings, dicts). JSON, MessagePack, or protobuf carry it without granting code execution. This is the durable fix the maintainers were asked to make in #1213.

  4. If pickle is unavoidable, constrain it. Use a RestrictedUnpickler with an explicit allowlist of safe classes and reject everything else, and/or wrap frames in HMAC signatures so the server only deserializes messages it can verify.

  5. Detect the deployment, then the abuse. Inventory any LightLLM running with --run_mode pd_master | prefill | decode. Alert on inbound connections to PD ports from outside the worker allowlist, and on inference processes spawning shells (sh -c, bash, os.system-style children) — the high-signal runtime trace of a deserialization gadget firing.

  6. Generalize the audit. This is a fleet-wide class, not one CVE. Grep your serving stack for pickle.loads, recv_pyobj, torch.load(...) on untrusted input, and joblib loads on any socket, queue, or HTTP boundary. Every one is a candidate CWE-502.
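To make the "authenticate the peers" and "replace pickle" items concrete, here is a minimal sketch of a worker frame that is JSON on the wire and carries an HMAC-SHA256 tag computed with a shared secret. The frame layout (32-byte tag followed by the JSON body), the function names, and the accepted mode values are assumptions for illustration; only the node_id and mode field names come from the disclosure, and none of this is LightLLM's actual protocol.

```python
import hashlib
import hmac
import json

MAC_LEN = hashlib.sha256().digest_size  # 32 bytes
ALLOWED_MODES = {"prefill", "decode"}   # assumed worker roles


def encode_frame(secret: bytes, payload: dict) -> bytes:
    """Serialize payload as JSON and prepend an HMAC-SHA256 tag."""
    body = json.dumps(payload, separators=(",", ":"), sort_keys=True).encode("utf-8")
    tag = hmac.new(secret, body, hashlib.sha256).digest()
    return tag + body


def decode_frame(secret: bytes, frame: bytes) -> dict:
    """Verify the tag first, then parse and validate the JSON body."""
    if len(frame) <= MAC_LEN:
        raise ValueError("frame too short")
    tag, body = frame[:MAC_LEN], frame[MAC_LEN:]
    expected = hmac.new(secret, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("bad signature")
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    node_id = payload.get("node_id")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(node_id, bool) or not isinstance(node_id, int) or node_id < 0:
        raise ValueError("node_id must be a non-negative integer")
    if payload.get("mode") not in ALLOWED_MODES:
        raise ValueError("unknown mode")
    return payload


secret = b"example-shared-secret"  # placeholder; load from a secret store in practice
frame = encode_frame(secret, {"node_id": 7, "mode": "prefill"})
print(decode_frame(secret, frame))
```

Nothing in this path can construct arbitrary objects, and an unauthenticated peer is rejected before any parsing happens. The sketch does not address replay or key distribution, which mutual TLS covers more completely.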
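For the case where pickle cannot be removed yet, the allowlisting approach follows the "restricting globals" pattern from the Python documentation: subclass pickle.Unpickler and override find_class. RestrictedUnpickler is a name you define yourself, not a standard-library class, and the allowlist below (a single OrderedDict entry) is only a placeholder assumption. Allowlisting narrows the gadget surface but does not make pickle a safe wire format, so it is best treated as a stopgap behind the network and authentication controls.

```python
import collections
import datetime
import io
import pickle

# Explicit allowlist of (module, name) pairs; everything else is rejected.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream asks to import a global.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(blob: bytes):
    """Drop-in replacement for pickle.loads that enforces the allowlist."""
    return RestrictedUnpickler(io.BytesIO(blob)).load()


# Plain containers never trigger find_class; allowlisted classes pass.
print(restricted_loads(pickle.dumps({"node_id": 7})))
print(restricted_loads(pickle.dumps(collections.OrderedDict(node_id=7))))

# Anything off the list is refused, even a harmless datetime.date.
try:
    restricted_loads(pickle.dumps(datetime.date(2026, 2, 15)))
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```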
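The fleet-wide audit can start as a plain text search, but a syntax-aware pass produces fewer false positives from comments and strings. The following is a rough sketch using the standard-library ast module; the function name and the list of flagged calls are our own choices, and it deliberately misses aliased imports such as `from pickle import loads`, so treat its output as a starting list for manual review rather than a complete inventory.

```python
import ast

# (module alias, function) pairs worth reviewing at any trust boundary.
RISKY_CALLS = {
    ("pickle", "loads"), ("pickle", "load"),
    ("torch", "load"), ("joblib", "load"),
}
# Method names flagged regardless of the receiver (e.g. a ZMQ socket).
RISKY_METHODS = {"recv_pyobj"}


def find_risky_calls(source: str, filename: str = "<string>"):
    """Return sorted (line, call) pairs for deserialization calls in source."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Attribute):
            continue
        attr = node.func.attr
        base = node.func.value
        base_name = base.id if isinstance(base, ast.Name) else None
        if (base_name, attr) in RISKY_CALLS:
            hits.append((node.lineno, f"{base_name}.{attr}"))
        elif attr in RISKY_METHODS:
            hits.append((node.lineno, attr))
    return sorted(hits)


sample = '''\
import pickle
data = sock.recv_pyobj()
obj = pickle.loads(frame)
cfg = json.loads(text)
'''
print(find_risky_calls(sample))  # [(2, 'recv_pyobj'), (3, 'pickle.loads')]
```

To sweep a repository, the same function could be applied to the text of each file yielded by pathlib.Path(root).rglob("*.py"), with every hit then checked for whether its input crosses a socket, queue, or HTTP boundary.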

Status

| Item | Reference | Date | Notes |
| --- | --- | --- | --- |
| Public disclosure + PoC | Chocapikk (V. Lobstein) | 2026-02-15 | WebSocket pickle.loads() in PD mode |
| CVE assigned | VulnCheck advisory | 2026-02-16 | CVSS 4.0 9.3, CWE-502, affects ≤ 1.1.0 |
| Tracking + proposed fix | GitHub issue #1213 | 2026-02 | Maintainer fix requested; project has a history of slow security response |
| Same class, peer project | CVE-2025-32444 (vLLM) | 2025 | Pickle over node IPC, CVSS up to 10.0 |

The honest framing: CVE-2026-26220 is not novel — it is the ML-serving world re-learning that pickle on an unauthenticated network socket is remote code execution. Until inter-node transports drop pickle, the defense is yours to own: isolate the PD plane, authenticate the peers, and assume any serializer that can rebuild arbitrary objects will, eventually, rebuild an attacker’s.

Sources