SUPPLY CHAIN MEDIUM NEW

Slopsquatting in 2026: 127 package names that all five frontier LLMs hallucinate

A May 16, 2026 arXiv replication of the USENIX Security '25 slopsquatting study finds hallucination rates are down across frontier models — but identifies 127 phantom packages that every tested model invents identically, a model-agnostic supply-chain attack surface.

2026-05-29 // 6 min affects: claude-sonnet-4-6, claude-haiku-4-5, gpt-5.4-mini, gemini-2.5-pro, deepseek-v3.2, pypi, npm

What is this?

On May 16, 2026, an independent researcher, Churilov, published The Range Shrinks, the Threat Remains: Re-evaluating LLM Package Hallucinations on the 2026 Frontier-Model Cohort on arXiv (2605.17062). The paper replicates the methodology of Spracklen et al. (USENIX Security ’25, arXiv 2406.10279) on five frontier code-capable models released between October 2025 and March 2026: Claude Sonnet 4.6, Claude Haiku 4.5, GPT-5.4-mini, Gemini 2.5 Pro, and DeepSeek V3.2. The headline is twofold. The spread in package-hallucination rates between models has narrowed by roughly an order of magnitude compared to the 2024 cohort, and every tested model now sits below 7%. But the slopsquatting attack surface has not gone away, and it has acquired a new, model-agnostic shape.

How it works

Slopsquatting is the registration, on a public registry such as PyPI or npm, of a package name that a code-generating LLM is known to hallucinate. The term was coined in April 2025 by Seth Larson (PSF Developer-in-Residence) and popularised by Andrew Nesbitt; it is a portmanteau of “AI slop” and “typosquatting.” When a developer copies a generated snippet that begins with pip install <hallucinated-name> or npm install <hallucinated-name>, the attacker-registered package is installed and its install-time code (an npm postinstall script, or setup.py for a Python source distribution) runs in the developer’s environment.

The 2026 replication used 199,845 Python and JavaScript prompts, queried each model, and validated every package name the generated code imported against the live PyPI and npm master lists. Three findings stand out.

First, the spread between models has collapsed. Spracklen measured 5.2% hallucination on commercial LLMs and 21.7% on open-source ones in 2024. The 2026 cohort sits between 4.62% (Claude Haiku 4.5) and 6.10% (GPT-5.4-mini) — roughly a tenfold compression of the inter-model spread, but still well above zero.
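The "roughly tenfold" figure can be checked from the four rates quoted above. This is a minimal sketch that assumes the spread is simply the difference between the highest and lowest quoted rate; note that the 2024 figures are group averages (commercial vs open-source), while the 2026 figures are individual models.

```python
# Rates quoted in the paragraph above, in percent.
spread_2024 = 21.7 - 5.2    # open-source average minus commercial average (Spracklen et al.)
spread_2026 = 6.10 - 4.62   # GPT-5.4-mini minus Claude Haiku 4.5

# How many times narrower the 2026 spread is than the 2024 spread.
compression = spread_2024 / spread_2026

print(f"2024 spread: {spread_2024:.2f} percentage points")
print(f"2026 spread: {spread_2026:.2f} percentage points")
print(f"compression: {compression:.1f}x")
```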

Second, 127 package names — 109 on PyPI, 18 on npm — were invented identically by all five models. A single-model audit cannot find these: only a cross-model study reveals the intersection. For an attacker, that intersection is the highest-value target list, because a malicious package registered under one of those names is reachable from any of the five assistants without per-model tuning.
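The cross-model intersection described above is an ordinary set intersection over the per-model lists of hallucinated names. The sketch below uses placeholder model and package names, not entries from the paper's dataset, and the function name is hypothetical.

```python
def cross_model_phantoms(hallucinated_by_model):
    """Return the names that every model hallucinated (the set intersection)."""
    sets = [set(names) for names in hallucinated_by_model.values()]
    if not sets:
        return set()
    return set.intersection(*sets)

# Toy data: placeholder names only.
toy = {
    "model-a": {"phantom-one", "phantom-two", "only-a"},
    "model-b": {"phantom-one", "phantom-two", "only-b"},
    "model-c": {"phantom-one", "only-c"},
}
shared = cross_model_phantoms(toy)
print(sorted(shared))
```

Auditing any single entry of `toy` would surface that model's names but give no way to tell which of them the other models also invent, which is why the intersection only appears in a cross-model study.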

Third, the 2026 paper reports three further patterns: a Python-over-JavaScript hallucination asymmetry that inverts the 2024 ordering; a Haiku-below-Sonnet inversion inside the Anthropic family (the smaller model hallucinates less); and a peak Jaccard similarity of J = 0.343 between DeepSeek V3.2 and GPT-5.4-mini, which is suggestive of shared training-data origins.
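For reference, the Jaccard similarity of two sets is the size of their intersection divided by the size of their union, so J = 0.343 means about a third of the names hallucinated by either model are hallucinated by both. A minimal sketch with placeholder names (the paper's exact preprocessing is not reproduced here):

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of names."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)

# Toy data: two of the four distinct names are shared.
model_x = {"phantom-1", "phantom-2", "phantom-3"}
model_y = {"phantom-2", "phantom-3", "phantom-4"}
similarity = jaccard(model_x, model_y)
print(similarity)
```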

The original Spracklen work already documented the precondition that makes slopsquatting practical: 58% of hallucinated package names re-appear across repeated generations, and 43% appear in every one of ten reruns. Hallucinations are not noise; they are repeatable artefacts that an attacker can enumerate cheaply.
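Repeatability of this kind can be measured by rerunning the same prompt and counting how often each hallucinated name recurs. The sketch below is one plausible way to compute the two proportions; it is an assumption about the bookkeeping, not the paper's exact metric definition, and the names are placeholders.

```python
from collections import Counter

def repeat_stats(reruns):
    """Given one set of hallucinated names per rerun, return
    (share seen in more than one rerun, share seen in every rerun)."""
    counts = Counter(name for run in reruns for name in set(run))
    total = len(counts)
    if total == 0:
        return 0.0, 0.0
    repeated = sum(1 for c in counts.values() if c > 1)
    in_every = sum(1 for c in counts.values() if c == len(reruns))
    return repeated / total, in_every / total

# Toy data: three reruns of the same prompt.
reruns = [{"ghost-a", "ghost-b"}, {"ghost-a", "ghost-c"}, {"ghost-a", "ghost-b"}]
repeated_share, always_share = repeat_stats(reruns)
print(repeated_share, always_share)
```

Names with a high `always_share` are the ones an attacker can enumerate with a handful of queries.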

Why it matters

The empirical case for slopsquatting was first demonstrated in 2023, when Bar Lanyado (Lasso Security) registered the empty package huggingface-cli, a name LLMs hallucinated in place of the correct huggingface_hub[cli]. The empty package received over 30,000 downloads in three months and even appeared in an Alibaba research repository’s README.

The 2026 result adds a structural concern. Even as frontier models converge toward sub-7% hallucination rates, they appear to converge on the same invented names. A small, stable, cross-model namespace of phantom packages is precisely the kind of attack surface that scales with adoption: every new agentic coding tool, every new “vibe coding” workflow that auto-installs dependencies without human review, inherits the same 127 names.

Defenses

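Treat AI-generated import/require statements as untrusted input. Verify each new package name against the registry before the first install in a new project, then pin every dependency in a lockfile (requirements.txt with hashes, package-lock.json, pnpm-lock.yaml, uv.lock). Once a reviewed lockfile is in place, hash pinning blunts a freshly registered slopsquat: an artifact that is not in the lockfile, or whose hash does not match the pinned one, fails the install when hash checking is enforced (for example with pip’s --require-hashes mode).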
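A quick pre-install check is to confirm that every entry in requirements.txt actually carries a hash pin. The sketch below is a simplified parser, not pip's own: it joins backslash-continued lines and ignores comments and option lines. The function name, version numbers, and hash value are placeholders.

```python
def unpinned_requirements(text):
    """Return requirement lines that carry no --hash option."""
    # Join backslash-continued physical lines into logical lines.
    logical, buf = [], ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            buf += line[:-1] + " "
            continue
        logical.append(buf + line)
        buf = ""
    if buf:
        logical.append(buf)

    missing = []
    for line in logical:
        line = line.strip()
        # Skip blanks, comments, and option lines such as -r or --index-url.
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        if "--hash=" not in line:
            missing.append(line)
    return missing

# Placeholder content: the hash below is not a real digest.
sample = (
    "requests==2.32.3 \\\n"
    "    --hash=sha256:0000placeholder\n"
    "# a comment\n"
    "flask==3.0.0\n"
)
missing = unpinned_requirements(sample)
print(missing)
```

A non-empty result in CI is a reasonable signal that a dependency was added without going through the reviewed lockfile.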

Reject install actions issued by an agent without explicit allowlisting. Coding agents that can run pip install or npm install should be configured to require a human-reviewed allowlist of package names, or to install only from an internal mirror that holds pre-vetted packages.
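One way to picture the allowlist gate is a small check that an agent's proposed shell command must pass before it runs. This is a sketch under stated assumptions: the allowlist contents and function names are hypothetical, only the literal forms `pip install ...` and `npm install ...` are recognised, and anything else is rejected (fail closed).

```python
import re
import shlex

# Hypothetical allowlist; in practice this would come from a reviewed file.
ALLOWLIST = {"requests", "numpy", "@scope/pkg"}

def bare_name(token):
    """Strip version specifiers and extras from a pip or npm package token."""
    if token.startswith("@"):
        # npm scoped package, e.g. @scope/pkg@1.2.3
        return "@" + token[1:].split("@", 1)[0]
    return re.split(r"[=<>!~@\[;]", token, maxsplit=1)[0]

def install_allowed(command, allowlist=ALLOWLIST):
    """Return True only if every requested package is on the allowlist."""
    tokens = shlex.split(command)
    if len(tokens) < 3 or tokens[0] not in {"pip", "npm"} or tokens[1] != "install":
        return False  # unrecognised command shape: fail closed
    # Flags are skipped; values passed to flags (e.g. a requirements file)
    # are treated as names and will fail the allowlist check.
    names = [bare_name(t).lower() for t in tokens[2:] if not t.startswith("-")]
    return bool(names) and all(n in allowlist for n in names)

print(install_allowed("pip install requests==2.32.3"))
print(install_allowed("pip install requests huggingface-cli"))
```

Failing closed means legitimate variants such as `python -m pip install` are also rejected until the gate is taught to parse them, which is usually the safer default for an unattended agent.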

Use a registry proxy with a quarantine window. Internal PyPI/npm proxies (Sonatype Nexus, JFrog Artifactory, Artifact Registry, internal pypi-mirror) can be configured to delay the visibility of newly published packages by 7–30 days, which filters out most slopsquatting attempts before they reach developers.
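The quarantine rule itself is a simple age check. The sketch below shows only that logic; real proxies expose it through their own configuration, and the function name, the `first_published` field, and the 14-day default are assumptions chosen for illustration within the 7–30 day range above.

```python
from datetime import datetime, timedelta, timezone

def is_visible(first_published, now, quarantine_days=14):
    """A package becomes visible once it is at least quarantine_days old."""
    return now - first_published >= timedelta(days=quarantine_days)

now = datetime(2026, 5, 29, tzinfo=timezone.utc)
fresh = datetime(2026, 5, 25, tzinfo=timezone.utc)        # 4 days old
established = datetime(2026, 1, 1, tzinfo=timezone.utc)   # months old

print(is_visible(fresh, now))
print(is_visible(established, now))
```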

Run a supply-chain scanner on every dependency tree. Tools like Socket, Snyk, Phylum, and OSV-Scanner flag packages with install scripts, obfuscated code, recent registration, low download counts, or maintainer anomalies — the operational signature of a freshly registered slopsquat.

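Verify package names by hand for any new dependency proposed by an LLM. A 5-second check on pypi.org/project/<name> or npmjs.com/package/<name> catches a name that does not exist; for a name that does exist, also look at the registration date, download count, and maintainer, since a slopsquat is by definition already registered. The dataset released with the Churilov paper (Zenodo 10.5281/zenodo.19859120) lists the 127 cross-model phantoms; treat them as a denylist.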
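When matching names against a denylist, it helps to normalise them first, because PyPI treats names that differ only in case or in runs of "-", "_" and "." as the same project (PEP 503). A minimal sketch with placeholder names; the denylist entries are not taken from the Zenodo dataset, and the normalisation applies to PyPI names only, not npm.

```python
import re

def normalize(name):
    """PEP 503 normalisation: lowercase, runs of -, _ and . become one hyphen."""
    return re.sub(r"[-_.]+", "-", name).lower()

def on_denylist(name, denylist):
    """True if the normalised name matches any normalised denylist entry."""
    return normalize(name) in {normalize(entry) for entry in denylist}

# Placeholder denylist entries.
denylist = {"phantom_pkg", "ghost.utils"}

print(normalize("Phantom_Pkg.Utils"))
print(on_denylist("Phantom.Pkg", denylist))
print(on_denylist("requests", denylist))
```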

Do not rely on the model to self-detect. Spracklen et al. found that GPT-4 Turbo and DeepSeek could flag their own hallucinated names with ~75% accuracy when asked, but that still lets roughly a quarter through, and adversarial users do not ask. Application-layer checks are the correct trust boundary.

Status

Item	Reference	Date	Notes
Original study	Spracklen et al., arXiv 2406.10279 / USENIX Security ’25	2024-06 (preprint) / 2025-08 (proceedings)	16 LLMs, 576,000 samples, 19.7% mean hallucination
Term coined	Seth Larson (PSF), Andrew Nesbitt	2025-04	“Slopsquatting” portmanteau
2026 replication	Churilov, arXiv 2605.17062	2026-05-16	5 frontier models, 199,845 prompts, 4.62–6.10% range
Cross-model phantom set	Churilov dataset (Zenodo)	2026-05-16	127 names (109 PyPI, 18 npm) invented by all five models
Observed in-the-wild exploit	Bar Lanyado / `huggingface-cli` PoC	2023-06	30,000+ downloads of an empty stand-in package
Mapped frameworks	OWASP LLM05 (Improper Output Handling), OWASP LLM03 (Supply Chain), MITRE ATLAS AML.T0010	2026	Insecure output → package installation

The compression of hallucination rates across frontier models is a real win for safety teams at Anthropic, OpenAI, Google and DeepSeek. The 127-package shared intersection is a reminder that aligned models converge in their errors as well as their answers, and that the supply-chain layer — registries, lockfiles, scanners, proxies — is where this class of attack has to be stopped.