system: OPERATIONAL
← back to all hacks
SUPPLY CHAIN MEDIUM

Hidden triggers in SKILL.md: semantic supply-chain attacks on agent skill registries

A May 12, 2026 University of Maryland paper shows that 20-token additions to a SKILL.md file can make an agent discover and select an adversarial skill in 77–86% of trials, and bypass registry-side scans up to 100% of the time.

2026-05-26 // 7 min affects: claude-code, openclaw, clawhub, agent-skill-registries, skill-marketplaces

What is this?

On May 12, 2026, Shoumik Saha, Kazem Faghih and Soheil Feizi (University of Maryland) posted Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry on arXiv (2605.11418, cs.CR). The paper studies a class of attack that does not need a single byte of executable code: short, plausible-looking natural-language additions to the SKILL.md metadata file are enough to bias which skill an agent discovers, which one it selects among alternatives, and whether the registry’s safety scan accepts it.

The work matters because SKILL.md is now an operational artifact across the agent ecosystem — Claude Code, OpenClaw, Microsoft Agent Framework, Codex skills — and not passive documentation. Agents read it at discovery time (to match a query against available skills), at selection time (to choose between functionally similar skills), and at install time (when a registry runs its safety checks). Each of those reads is a place where a textual modification can change behaviour. The Register covered the paper on May 22, 2026 under the headline Minor edits to AI skills can make agents go rogue.

How it works

The authors evaluate three stages of the registry lifecycle against real ClawHub skills and a realistic ranking function that combines lexical relevance, vector similarity (OpenAI embeddings) and download popularity. They modify only the SKILL.md text — never the underlying script, dependencies or runtime — and measure how often a malicious twin of an existing skill wins.

Stage              SKILL.md surface used        Adversarial goal
-----------------  --------------------------   --------------------------------
Discovery          name, description, tags      Be retrieved for a user query
Selection          when-to-use, examples        Win over the legitimate twin
Vetting            full body, license, "safety" Pass the registry's scanner

Three findings drive the paper.

  • Discovery. A ~20-token trigger inserted in the description is enough to lift an adversarial skill above its unmodified twin in 86% of pairwise comparisons, and to place it in the registry’s top 10 in 80% of queries. The trigger is not a keyword stuffing trick; it exploits embedding-space proximity to the kinds of phrasing users actually type.
  • Selection. Once both skills are surfaced to the agent, description-only framing steers the agent toward the adversarial variant in 77.6% of paired trials on average, with no change to the underlying functional behaviour. The agent reads the “when to use” block and the inline examples, and that text decides the call.
  • Vetting evasion. The same trigger families slip past registry-side scans between 36.5% and 100% of the time, depending on the scanner. Because no payload is shipped, signature- and pattern-based defences have nothing to fire on.

Critically, the authors do not publish working malicious triggers; they release a measurement harness (github.com/ShoumikSaha/agent-skill-security) and aggregate statistics. The contribution is a category — “the metadata is the attack” — not a recipe.

Why it matters

Three implications generalise beyond this paper.

First, agent skill registries inherit a problem that package registries thought they had solved. PyPI and npm have spent a decade hardening against typosquatting, dependency confusion and account takeover, all of which assume the attacker ships code. Here the attacker ships prose, and the prose travels through the registry’s own embedding pipeline. Anti-malware scanners and SBOMs are aimed at the wrong layer.

Second, the failure surfaces in the components nobody owns. The discovery ranker is a vendor decision (ClawHub, OpenAI’s skill search, Microsoft’s Agent Framework). The selection prompt is the agent runtime’s responsibility. The vetting scan is the registry’s. The attacker only needs one of those three to mis-rank a benign-looking skill description by twenty tokens. This is the same diffusion of responsibility that made early app stores leaky.

Third, the paper joins a cluster of May 2026 work pointing the same direction. Exploiting LLM Agent Supply Chains via Payload-less Skills (arXiv:2605.14460, Liu et al., Zhejiang University, May 14, 2026) introduces Semantic Compliance Hijacking and reports up to 77.67% confidentiality breaches and 67.33% RCE from payload-free skills. Snyk’s From SKILL.md to Shell Access in Three Lines of Markdown frames the same phenomenon for builders. Across these sources the message is consistent: the SKILL.md ecosystem is a supply chain, it is unhardened, and exploitation does not require novel cryptography or runtime tricks.

Defenses

The paper proposes registry-side and agent-side controls. The shortest actionable list:

  1. Treat SKILL.md text as code under review. It does not need to be executable to be dangerous. Diff it on every update, require a human-readable changelog, and refuse silent rewrites of the description, when-to-use and examples blocks.
  2. Hash and pin skills the way you pin dependencies. A skill@1.4.2#sha256:… lockfile entry prevents the published artifact from drifting under your agent, exactly as package-lock.json does for npm. The paper’s attack scenarios all assume the skill text is mutable post-install.
  3. Decorrelate discovery from selection. The trigger that lifts a skill in retrieval is not always the one that biases selection — but using a single text blob for both gives the attacker two targets for the price of one. Compute the retrieval embedding from a sanitised view (canonical name, tags, signed manifest) and reserve the free-form text for selection-time guidance only.
  4. Run semantic scanners against the metadata, not just the code. The authors release a measurement harness on GitHub that gives defenders a starting point. Treat skill manifests like email: a content classifier that flags imperative directives (“MUST”, “always”, “before running, run X first”) inside fields meant to be descriptive.
  5. Constrain the registry’s ranking function. Lexical relevance, vector similarity, popularity and signed-publisher trust should not be a single scalar score. A small adversarial perturbation in the description should not be able to outweigh signed-publisher provenance.
  6. Require a publisher signature on every skill, and surface it at selection time. If the agent’s selection prompt shows the signing identity alongside the description, a twin of a popular skill from an unknown publisher stops winning by default.
  7. Default-deny network and filesystem capabilities at the skill layer. Even when a malicious skill is selected, capability-bound execution (per-skill ACLs, no ambient credentials, egress whitelist) limits what “winning the selection” actually buys the attacker. This is the same defence-in-depth recommended by the CISPA agents-as-OS paper on this site.

Status

ItemReferenceDateNotes
Paper postedarXiv:2605.11418v12026-05-12cs.CR, Univ. of Maryland
Public coverageThe Register2026-05-22Minor edits to AI skills can make agents go rogue
Discovery win rate86% pairwise, 80% Top-102026-05-1220-token trigger in description
Selection bias77.6% paired trials2026-05-12description-only framing
Vetting evasion36.5%–100%2026-05-12depending on scanner
Code & datagithub.com/ShoumikSaha/agent-skill-security2026-05-12measurement harness, no live payloads
Adjacent workarXiv:2605.14460 (Liu et al.)2026-05-14Semantic Compliance Hijacking, 77.67% conf. / 67.33% RCE

No single fix retires the class. The paper’s contribution is to make explicit what agent operators were quietly assuming away: skill registries are part of the trust boundary, the metadata file is part of the executable surface, and twenty tokens of well-placed prose can move the agent’s decision more reliably than a CVE.

Sources