DEFENSE LOW NEW

SkillGuard: a permission framework that governs what an agent skill can do at runtime

A June 2026 paper targets the gap between what a skill injects into an agent's context and what it makes the agent do, using manifests, deny-by-default access control, and runtime monitoring.

2026-06-17 // 6 min // affects: llm-agents, agent-skills, tool-calling-agents

What is this?

SkillGuard: A Permission Framework for Agent Skills (arXiv:2606.03024, posted June 2026) is a defensive proposal for one of the fastest-growing attack surfaces in agentic AI: skills. A skill is a packaged bundle of instructions, tool definitions, and sometimes code that an agent loads to extend what it can do. The problem the paper attacks is that today’s skill ecosystems mostly rely on trust-based loading and static inspection: you read the file, decide it looks fine, and install it. That leaves a gap between what a skill can inject into the agent’s context and what it can cause the agent to do at runtime.

This is a defensive, systems-side contribution. There are no exploit payloads in it. The question it answers is how to constrain a skill once it is running, not how to abuse one.

How it works

SkillGuard reframes a skill as a permission-bearing executable artifact rather than a trusted text file, and applies a dual-plane governance model that regulates two distinct things at once:

  • Context influence — what the skill is allowed to put into, or change about, the agent’s reasoning context.
  • Action side effects — what the skill is allowed to make the agent actually do: which tools, files, network destinations, and protected objects it can touch.
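
One way to picture the two planes is as two independent permission sets attached to the same skill. The sketch below is illustrative only: the names `ContextGrant`, `ActionGrant`, and `SkillPermissions`, and the specific grants listed, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from enum import Enum


class ContextGrant(Enum):
    """Hypothetical context-plane permissions: what a skill may inject."""
    ADD_INSTRUCTIONS = "add_instructions"
    ADD_TOOL_DEFINITIONS = "add_tool_definitions"
    MODIFY_SYSTEM_PROMPT = "modify_system_prompt"


class ActionGrant(Enum):
    """Hypothetical action-plane permissions: what a skill may cause."""
    READ_FILES = "read_files"
    WRITE_FILES = "write_files"
    NETWORK_SEND = "network_send"


@dataclass(frozen=True)
class SkillPermissions:
    """Two separate permission sets, one per plane; both default to empty."""
    context: frozenset = field(default_factory=frozenset)
    actions: frozenset = field(default_factory=frozenset)

    def allows_context(self, grant: ContextGrant) -> bool:
        return grant in self.context

    def allows_action(self, grant: ActionGrant) -> bool:
        return grant in self.actions


# A skill that may add instructions and read files, and nothing else.
summarizer = SkillPermissions(
    context=frozenset({ContextGrant.ADD_INSTRUCTIONS}),
    actions=frozenset({ActionGrant.READ_FILES}),
)
```

Keeping the two sets separate means a skill that is allowed to shape reasoning gains no ability to act by default, and the reverse also holds.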

Concretely, the framework combines several mechanisms drawn from classic access control and adapted to agents:

  • Skill manifests — a declared statement of intent and required capabilities, so a skill’s permissions are explicit and auditable rather than implied.
  • Deny-by-default enforcement — anything not declared is refused, the opposite of the “load and trust” status quo.
  • Runtime access control — permissions are checked while the skill acts, not only when its files are inspected at install time.
  • User-mediated authorization — high-impact capabilities require a human decision rather than being granted silently.
  • Capability inference and behavior monitoring — the system infers what a skill actually needs and watches for divergence between declared intent and observed runtime behavior.
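
To make the mechanisms above concrete, here is a minimal sketch of a manifest checked at call time with deny-by-default, a user approval hook, and a log of undeclared attempts. It is not SkillGuard's implementation: `Manifest`, `Guard`, the field names, and the example skill are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import Callable, Optional


@dataclass
class Manifest:
    """Hypothetical manifest: everything a skill declares it needs."""
    skill: str
    tools: set = field(default_factory=set)
    path_globs: list = field(default_factory=list)
    hosts: set = field(default_factory=set)
    needs_approval: set = field(default_factory=set)  # high-impact tools


@dataclass
class Guard:
    """Checks each request against the manifest at the moment of the call."""
    manifest: Manifest
    approve: Callable[[str, str], bool]  # user-mediated authorization hook
    violations: list = field(default_factory=list)  # undeclared attempts

    def check(self, tool: str, path: Optional[str] = None,
              host: Optional[str] = None) -> bool:
        m = self.manifest
        # Deny by default: anything not declared is refused and recorded.
        if tool not in m.tools:
            return self._deny(f"undeclared tool: {tool}")
        # Note: fnmatch's '*' also matches '/', and paths are not normalized
        # here; a real enforcement layer would need to handle both.
        if path is not None and not any(fnmatch(path, g) for g in m.path_globs):
            return self._deny(f"undeclared path: {path}")
        if host is not None and host not in m.hosts:
            return self._deny(f"undeclared host: {host}")
        # Declared but high-impact: defer to a human decision.
        if tool in m.needs_approval:
            return self.approve(m.skill, tool)
        return True

    def _deny(self, reason: str) -> bool:
        self.violations.append(reason)
        return False


manifest = Manifest(
    skill="report-writer",
    tools={"read_file", "send_email"},
    path_globs=["reports/*"],
    hosts={"mail.example.com"},
    needs_approval={"send_email"},
)
# In this example the user declines every approval request.
guard = Guard(manifest, approve=lambda skill, tool: False)
```

With this setup, `guard.check("read_file", path="reports/q2.txt")` is allowed, while a read outside `reports/`, a call to an undeclared tool, or an email the user declines are all refused. The `violations` list is the raw material for the declared-versus-observed monitoring described above.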

The reported numbers give a sense of coverage and effectiveness. SkillGuard’s permission taxonomy covers 99.76% of observed protected objects, and automated manifest generation reaches 91.0% F1, meaning the framework can largely propose a skill’s permission manifest without a human writing it by hand. In adversarial evaluation, it reduces attack success from 32.37% to 23.02% for contextual injections and from 25.56% to 16.67% for more obvious injections, while preserving utility on benign tasks. Those are partial reductions, not elimination: a meaningful share of attacks still succeeds with the framework in place.
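
For a sense of scale, the attack-success figures can be turned into absolute and relative reductions. The helper below is just arithmetic on the numbers quoted above; the function name `reduction` is made up for this example.

```python
def reduction(before: float, after: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = before - after
    relative = absolute / before * 100
    return round(absolute, 2), round(relative, 1)


contextual = reduction(32.37, 23.02)  # (9.35, 28.9)
obvious = reduction(25.56, 16.67)     # (8.89, 34.8)
```

In relative terms, attack success falls by roughly 29% for contextual injections and roughly 35% for the more obvious ones.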

Why it matters

Skills inherit every weakness of prompt injection and tool misuse, and add a packaging-and-distribution problem on top. The broader literature has already mapped this surface: a survey of agent skills covers their architecture, acquisition and security risks (arXiv:2602.12430), and evaluation work such as SkillVetBench (arXiv:2606.15899) scores open-source skills for security risk before installation. The recurring theme is that a skill is untrusted third-party content that gets unusual privileges — it can rewrite the agent’s instructions and hand it new tools — yet it is usually governed by little more than a glance at the file.

SkillGuard matters because it moves enforcement to where the risk actually lives: runtime, with least privilege. Static inspection can catch known-bad files, but it cannot see what a skill does once the agent is reasoning and acting on live, possibly attacker-influenced, data. Tying a skill to a declared manifest and refusing anything outside it turns “I read the README” into an enforceable boundary. The partial nature of the reported reductions also carries a lesson: a permission layer lowers the blast radius, but it does not make a malicious or hijacked skill safe.

Defenses

For teams shipping or installing agent skills, the practical takeaways generalize beyond this one framework:

  • Treat skills as untrusted, privileged code. A skill that can edit context and add tools is a higher-privilege object than a normal document. Govern it accordingly, not on trust.
  • Adopt deny-by-default for capabilities. Grant a skill only the tools, paths, and network destinations it declares it needs; refuse everything else. Don’t let install-time trust become runtime authority.
  • Separate context influence from action side effects. Knowing a skill can shape reasoning is different from knowing it can send data out. Track and gate both planes.
  • Require human authorization for high-impact actions. Irreversible or sensitive operations (deletes, transfers, external sends, credential access) should need an explicit human approval, not a silent grant.
  • Monitor declared intent versus runtime behavior. A manifest is only useful if you detect divergence from it. Log and alert when a skill reaches for capabilities it never declared.
  • Don’t treat a permission layer as a guarantee. SkillGuard reduces injection success but does not zero it out. Pair it with input/output filtering, sandboxing, and the usual lethal-trifecta hygiene: avoid combining private-data access, exposure to untrusted content, and external communication in the same loop.
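
The lethal-trifecta point in the last item can be checked mechanically before a skill is installed. The sketch below is one possible shape for that check, not something the paper specifies; `LoopCapabilities` and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopCapabilities:
    """Hypothetical summary of what one agent loop (or one skill) can reach."""
    reads_private_data: bool
    ingests_untrusted_content: bool
    can_communicate_externally: bool


def trifecta_present(caps: LoopCapabilities) -> bool:
    """True when all three risk factors coexist in the same loop."""
    return (
        caps.reads_private_data
        and caps.ingests_untrusted_content
        and caps.can_communicate_externally
    )


def completes_trifecta(loop: LoopCapabilities, skill: LoopCapabilities) -> bool:
    """True when installing `skill` would add the missing risk factor(s)."""
    combined = LoopCapabilities(
        loop.reads_private_data or skill.reads_private_data,
        loop.ingests_untrusted_content or skill.ingests_untrusted_content,
        loop.can_communicate_externally or skill.can_communicate_externally,
    )
    return not trifecta_present(loop) and trifecta_present(combined)


# An agent that reads private files and browses the web, but cannot send data out.
agent_loop = LoopCapabilities(True, True, False)
# A skill whose only relevant capability is outbound communication.
webhook_skill = LoopCapabilities(False, False, True)
```

Here `completes_trifecta(agent_loop, webhook_skill)` is true, which is a reasonable trigger for requiring explicit human approval before the skill is loaded.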

Status

Item              Detail
Paper             “SkillGuard: A Permission Framework for Agent Skills”
arXiv ID          2606.03024
Posted            June 2026
Type              Defensive permission framework (no exploit payloads)
Model             Dual-plane governance: context influence + action side effects
Mechanisms        Manifests, deny-by-default, runtime access control, user-mediated authorization, capability inference, behavior monitoring
Reported results  Taxonomy covers 99.76% of protected objects; manifest generation 91.0% F1; injection success 32.37%→23.02% (contextual) and 25.56%→16.67% (obvious); utility preserved

Sources