INFRASTRUCTURE CRITICAL

LMDeploy SSRF: when an image loader turns into an AI-infrastructure hijack

CVE-2026-33626 turned LMDeploy's load_image() into a generic SSRF primitive. Honeypots saw the first weaponised exploit 12 hours and 31 minutes after the advisory went live.

2026-05-22 // 6 min
affects: lmdeploy, internlm-xcomposer, vision-language-models, gpu-inference-nodes

What happened

On April 21, 2026, the InternLM team published advisory GHSA-6w67-hwm5-92mq for CVE-2026-33626, a Server-Side Request Forgery in LMDeploy, a popular open-source toolkit for serving vision-language and large language models. The bug sits in load_image() (lmdeploy/vl/utils.py): the function fetches any URL supplied in a vision request without checking whether the destination is link-local, loopback, or in an RFC1918 range. NIST rates the issue CVSS 7.5 (High). All versions up to and including 0.12.2 with vision-language support are vulnerable; 0.12.3 ships the fix.

The story is not the bug itself — SSRF is textbook — but the timing. Sysdig’s Threat Research Team observed the first weaponised exploit attempt 12 hours and 31 minutes after the GitHub advisory went live, from IP 103.116.72[.]119. In a single eight-minute session, the attacker turned the image loader into a generic HTTP SSRF primitive and probed the GPU node’s internal surface.

How it works

LMDeploy exposes vision-language endpoints that accept a JSON body referencing an image by URL. The library passes that URL to load_image(), which calls requests.get with no allow-list, deny-list, or DNS pinning. Anything the inference container can reach, the attacker can reach through the model API.

# Conceptual sketch from the public advisory and Sysdig write-up.
# DO NOT use against any system you do not own.
POST /v1/chat/completions  HTTP/1.1
Host: model.example.com
Content-Type: application/json

{
  "model": "internvl-2-8b",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text",      "text": "describe this image"},
      {"type": "image_url", "image_url": {
         "url": "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
      }}
    ]
  }]
}

The reproduced exploit chain from Sysdig’s honeypot traffic touched five targets in sequence: AWS Instance Metadata Service (IMDS) at 169.254.169.254, an internal Redis port, MySQL on the same VPC subnet, a secondary HTTP administrative interface, and an out-of-band DNS exfiltration endpoint to leak whatever the loader’s error messages echoed back. Because vision-LLM nodes typically run on GPU instances with broad IAM roles — S3 model artefacts, training datasets, sometimes cross-account assume-role — one successful IMDS fetch is sufficient to pivot to the cloud account.
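
For contrast, here is a minimal sketch of the kind of destination check the loader lacked. It is not the patch shipped in 0.12.3; the function names (is_internal_address, resolve_host, check_image_url) are hypothetical and exist only for illustration. It assumes the caller connects to one of the returned addresses rather than resolving the hostname a second time.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_internal_address(address: str) -> bool:
    """Return True for loopback, link-local, private and other non-public IPs."""
    ip = ipaddress.ip_address(address)
    return (
        ip.is_loopback
        or ip.is_link_local
        or ip.is_private
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def resolve_host(host: str) -> list[str]:
    """Resolve a hostname to every address it maps to (IPv4 and IPv6)."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def check_image_url(url: str, resolver=resolve_host) -> list[str]:
    """Raise ValueError if the URL points at a disallowed destination.

    Returns the vetted addresses so the caller can pin the connection to
    one of them instead of trusting a second DNS lookup.
    """
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"scheme not allowed: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("URL has no host")
    addresses = resolver(parts.hostname)
    if not addresses:
        raise ValueError("host did not resolve")
    for address in addresses:
        # Reject the URL if any resolved address is internal.
        if is_internal_address(address):
            raise ValueError(f"destination not allowed: {address}")
    return addresses
```

With a check like this in front of the fetch, the IMDS URL in the request above would be rejected as link-local before any connection is made. Redirects need the same treatment: requests follows them by default, so each hop would have to be re-validated or redirects disabled.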

The published proof-of-concept is the simplest possible HTTP request; there is no novel cryptographic trick or model-side exploit. The point of writing about it is that an entire category of AI-serving infrastructure has been quietly running a classic 2010-era webapp bug behind a 2026 GPU front-end.

Why this matters

Three properties make this incident worth a separate writeup.

The primitive is AI-specific but the impact is cloud-wide. A vision endpoint that accepts a URL is, structurally, an HTTP proxy with credentials. Anyone hosting open-source inference behind a public route inherits whatever the GPU node can reach — and that surface is large by design (model storage buckets, telemetry sinks, observability endpoints, internal tool APIs).

The time-to-exploit was effectively zero in operational terms. At 12 hours and 31 minutes, it was shorter than the cycle time of most patch-management workflows. Sysdig framed this as a watershed: attackers are now polling GitHub advisories for AI-infrastructure repos and weaponising disclosures in less than a working day. Treating GPU-serving stacks like research code rather than production web services is no longer viable.

Vision-LLM nodes are credential warehouses. The same instance role that lets the inference server stream a 70B model from S3 is handed out by IMDS as temporary credentials to anything on the node that can issue the request. There is no “least privilege” default in the most-deployed Helm charts and Docker recipes; teams ship the AWS-managed broad role because it works.

Defenses

There is no model-side mitigation. The fixes are infrastructural and operational.

Upgrade LMDeploy to 0.12.3 or later for the URL-validation patch. Re-check every fork and internal vendored copy — the affected function is lmdeploy/vl/utils.py:load_image. The advisory is GHSA-6w67-hwm5-92mq.
Enforce IMDSv2 with hop-limit 1 on every GPU/inference instance. IMDSv2 requires a session token that is obtained with a PUT request and then sent in a request header, a round-trip that an SSRF primitive limited to header-less GET requests cannot perform.
Allow-list egress traffic from GPU and inference nodes. Outbound HTTP from a model server should reach a small, named list of endpoints: object storage, log sinks, model telemetry. Block everything else, including link-local and RFC1918 by default.
Scope IAM roles per workload. A read role for model artefacts in S3 is sufficient for inference. Do not attach the same role used for training, fine-tuning, or pipeline orchestration to the public serving node.
Treat vision endpoints as untrusted file fetchers. Any inference service that resolves URLs on the server side (vision, RAG, scraping tools, document QA) should run its fetch in an isolated subnet with no metadata, no internal-network reachability, and no shared IAM. The same advice applies to vLLM, Triton, Text Generation Inference, and any tool plug-in that takes a URL.
Subscribe to GitHub Security Advisories for AI-infra repos (InternLM/lmdeploy, vllm-project/vllm, huggingface/text-generation-inference, triton-inference-server/server, BerriAI/litellm, langchain-ai/langchain, microsoft/semantic-kernel). Sysdig’s measurement is the working assumption now: hours, not days, between advisory and exploit.
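
To make the IMDSv2 item concrete, the sketch below builds (but never sends) the two requests a legitimate client needs and, for comparison, the single request a URL-only SSRF primitive can produce. The helper names are hypothetical; the endpoint paths and header names are the ones AWS documents for IMDSv2.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254"
TOKEN_TTL_SECONDS = 21600  # six hours, the maximum TTL IMDSv2 accepts


def build_token_request(ttl_seconds: int = TOKEN_TTL_SECONDS) -> urllib.request.Request:
    """Step 1: a PUT carrying a TTL header asks IMDSv2 for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(path: str, token: str) -> urllib.request.Request:
    """Step 2: every metadata GET must present the token in a header."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/meta-data/{path.lstrip('/')}",
        method="GET",
        headers={"X-aws-ec2-metadata-token": token},
    )


def ssrf_style_request(path: str) -> urllib.request.Request:
    """What a URL-only primitive can send: a bare GET with no custom headers."""
    return urllib.request.Request(f"{IMDS_BASE}/latest/meta-data/{path.lstrip('/')}")


if __name__ == "__main__":
    for req in (
        build_token_request(),
        build_metadata_request("iam/security-credentials/", "<token>"),
        ssrf_style_request("iam/security-credentials/"),
    ):
        print(req.get_method(), req.full_url, dict(req.header_items()))
```

On an instance where tokens are required, the third request is refused because it carries no token, and an image loader that only accepts a URL has no way to issue the PUT or attach the header. The hop limit of 1 additionally keeps the token response from surviving an extra network hop, such as a container bridge.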

Status

Item	Date	Status
GitHub advisory GHSA-6w67-hwm5-92mq published	2026-04-21	Public
First in-the-wild exploit observed (Sysdig honeypots)	2026-04-22	Confirmed
Patch released — LMDeploy 0.12.3	2026-04-21	Available
CVSS rating	—	7.5 High
Affected versions	≤ 0.12.2 with `lmdeploy[vl]`	Vulnerable
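
The affected-versions row can be checked mechanically across environments. The sketch below is one way to do it with the standard library only; the helper names are hypothetical, pre-release and local suffixes are ignored (an assumption that would misjudge a 0.12.3 release candidate), and it looks only at the version number, not at whether vision-language support is installed.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

FIXED_VERSION = (0, 12, 3)  # first release with the fix, per the advisory


def parse_release(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric segment, e.g. '0.12.2.post1' -> (0, 12, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_vulnerable(version: str) -> bool:
    """True if the numeric release is older than the fixed version."""
    release = parse_release(version)
    # Pad short versions such as '0.12' with zeros before comparing.
    padded = release + (0,) * (len(FIXED_VERSION) - len(release))
    return padded < FIXED_VERSION


def installed_lmdeploy_version() -> Optional[str]:
    """Return the installed lmdeploy version, or None if it is not installed."""
    try:
        return metadata.version("lmdeploy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_lmdeploy_version()
    if found is None:
        print("lmdeploy is not installed in this environment")
    else:
        state = "vulnerable" if is_vulnerable(found) else "patched"
        print(f"lmdeploy {found}: {state}")
```

Vendored or forked copies do not show up in package metadata, so they still need the manual review of lmdeploy/vl/utils.py described under Defenses.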

The wider lesson, beyond LMDeploy specifically: AI inference stacks are now treated by attackers as ordinary high-value web services. The defence playbook is also ordinary — egress controls, IMDSv2, scoped IAM, prompt patching — but it has to actually be applied to the boxes running the models, not just to the application tier in front of them.