
The real sandbox boundary is the agent definition, not the skill

allowed-tools in skill frontmatter is parsed but NOT enforced; tools: in an agent definition is enforced. A tool left out of tools: is absent, not filtered.

The tools: field in a custom agent definition is enforced at the tool-availability layer: a tool not listed there is simply absent from the agent's toolset, not filtered at call time. The allowed-tools field in a skill's YAML frontmatter is parsed but, in the configurations tested, not enforced.

What the test found

Testing was performed on Claude Code 2.1.0 (2026-01-07). Two skill frontmatter formats were exercised against a Bash call that should have been blocked:

# YAML list format -- did not restrict
allowed-tools:
  - Read
  - Grep
  - Glob

# Comma format -- did not restrict either
allowed-tools: Read,Grep,Glob

In both cases, Bash remained callable inside the skill's execution context. The field parsed without error; it produced no enforcement. Status recorded: INCONCLUSIVE.
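Both tested shapes express the same three-tool list. A minimal sketch of how a test harness might read either shape and compare the declaration with what the skill could actually call; read_allowed_tools and undeclared_tools are hypothetical helpers, not Claude Code's own parser, and they handle only the two shapes above rather than general YAML.

```python
def read_allowed_tools(frontmatter: str) -> list[str]:
    """Return the tools declared by allowed-tools in a frontmatter snippet.

    Hypothetical helper: handles only a YAML block list or a comma-separated
    scalar, the two shapes tested here. Not a general YAML parser.
    """
    lines = frontmatter.splitlines()
    for i, line in enumerate(lines):
        key, sep, value = line.partition(":")
        if not sep or key.strip() != "allowed-tools":
            continue
        value = value.strip()
        if value:
            # Comma format: allowed-tools: Read,Grep,Glob
            return [tool.strip() for tool in value.split(",") if tool.strip()]
        # List format: collect the following "  - Name" lines
        tools = []
        for item in lines[i + 1:]:
            stripped = item.strip()
            if not stripped.startswith("- "):
                break
            tools.append(stripped[2:].strip())
        return tools
    return []


def undeclared_tools(declared: list[str], callable_tools: set[str]) -> set[str]:
    """Tools the context could call even though the frontmatter did not list them."""
    return callable_tools - set(declared)


LIST_FORMAT = """allowed-tools:
  - Read
  - Grep
  - Glob
"""
COMMA_FORMAT = "allowed-tools: Read,Grep,Glob"

# Both shapes declare the same list; parsing is not where they differ.
print(read_allowed_tools(LIST_FORMAT) == read_allowed_tools(COMMA_FORMAT))  # True
```

Passing an illustrative callable set that includes Bash, such as {"Read", "Grep", "Glob", "Bash"}, to undeclared_tools returns {"Bash"}: the gap the test recorded.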

The same restriction applied via a custom agent definition passed immediately:

# .claude/agents/cheap-researcher.md
---
name: cheap-researcher
tools: Read, Grep, Glob
model: haiku
---

When a skill routed to cheap-researcher via agent: cheap-researcher, Bash was not available in the forked context. The test notes specifically: "Restriction happens at tool availability level, not execution." The tool is not there to call -- it is not intercepted after the fact.
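The quoted note separates two designs that look alike from outside. The toy model below (hypothetical classes; it does not reproduce Claude Code's internals) contrasts them: ProvisionedContext only ever registers the listed tools, so a Bash call has nothing to resolve to, while FilteredContext registers every tool and refuses disallowed calls at execution time.

```python
class ToolUnavailable(LookupError):
    """The context was never given this tool (availability-level restriction)."""


class ToolBlocked(PermissionError):
    """The tool exists but a filter refused the call (execution-level restriction)."""


def run_tool(name: str) -> str:
    # Stand-in for a real tool implementation.
    return f"{name} ran"


class ProvisionedContext:
    """Only the listed tools are registered, like an agent's tools: field."""

    def __init__(self, tools: list[str]):
        self.registry = {name: run_tool for name in tools}

    def call(self, name: str) -> str:
        if name not in self.registry:
            raise ToolUnavailable(f"no tool named {name!r} in this context")
        return self.registry[name](name)


class FilteredContext:
    """Every tool is registered; each call is checked against an allow list."""

    def __init__(self, all_tools: list[str], allowed: list[str]):
        self.registry = {name: run_tool for name in all_tools}
        self.allowed = set(allowed)

    def call(self, name: str) -> str:
        if name not in self.allowed:
            raise ToolBlocked(f"{name!r} is registered but not allowed")
        return self.registry[name](name)


researcher = ProvisionedContext(["Read", "Grep", "Glob"])
print("Bash" in researcher.registry)  # False: Bash was never provisioned
```

The difference is where safety comes from: the provisioned context holds no Bash entry to intercept, while the filtered one still holds Bash and depends on the check running on every call.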

How the boundary works

Custom agents are loaded at session start (they do not hot-reload; a session restart is required). The tools: field in the agent definition sets the toolset the agent receives. When a skill forks into that agent context, the fork inherits the agent's toolset. There is no runtime filter checking calls against a skill-level list; the capability is simply not provisioned.

Skill frontmatter allowed-tools, by contrast, appears to be advisory at most -- possibly influencing permission prompts or serving as documentation, not a hard gate.
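The lifecycle described in this section can be sketched as a session that snapshots agent definitions once and hands each fork its agent's toolset. Session, AgentDef, and Skill are hypothetical names; the sketch mirrors only the behavior reported here (no hot reload, toolset inherited from the agent, skill-level allowed-tools parsed but never consulted).

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass(frozen=True)
class AgentDef:
    name: str
    tools: frozenset[str]
    model: str


@dataclass
class Skill:
    name: str
    agent: str
    # Parsed from frontmatter, but fork() never reads it.
    allowed_tools: list[str] = field(default_factory=list)


class Session:
    def __init__(self, agent_defs: list[AgentDef]):
        # Copied once at session start; definitions added or edited later are
        # not seen until a new Session is created (no hot reload).
        self.agents = MappingProxyType({a.name: a for a in agent_defs})

    def fork(self, skill: Skill) -> frozenset[str]:
        """Return the toolset a forked skill context receives."""
        # KeyError on an unknown name keeps the sketch simple; the real
        # runtime is reported to fall back silently to default tools instead.
        agent = self.agents[skill.agent]
        return agent.tools  # inherited from the agent; skill.allowed_tools ignored


defs = [AgentDef("cheap-researcher", frozenset({"Read", "Grep", "Glob"}), "haiku")]
session = Session(defs)
skill = Skill("research", agent="cheap-researcher", allowed_tools=["Read", "Bash"])
print(sorted(session.fork(skill)))  # ['Glob', 'Grep', 'Read']
```

Note that the skill's own list even names Bash here, and the fork still lacks it: only the agent definition decides what is provisioned.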

Why it matters

Treating allowed-tools as a security boundary is a false assumption. A skill author who writes allowed-tools: Read,Grep,Glob expecting it to prevent Bash calls is relying on a restriction the runtime did not apply in either tested format.

The practical split:

| Surface | Field | Enforced |
|---|---|---|
| Agent definition (.claude/agents/*.md) | tools: | Yes |
| Skill frontmatter | allowed-tools: | No |

For sandboxed third-party skill execution, model-tiered pipelines, or any context where tool restriction is a correctness or safety requirement, the constraint must live in an agent definition.
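Because the enforced constraint lives in the agent definition, one practical check is to flag skills whose only restriction is allowed-tools. A minimal lint sketch, assuming skills live at .claude/skills/<name>/SKILL.md with a leading --- frontmatter block, and that a non-empty agent: field is what routes a skill into an agent context, as in the test above. frontmatter_keys, restriction_unenforced, and lint are hypothetical helpers, and frontmatter_keys reads only top-level key: value lines, not full YAML.

```python
from pathlib import Path


def frontmatter_keys(text: str) -> dict[str, str]:
    """Top-level `key: value` pairs from a leading --- block (not a YAML parser)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    keys: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        if line[:1] in (" ", "\t", "-", "#") or ":" not in line:
            continue  # skip list items, indented lines, and comments
        key, _, value = line.partition(":")
        keys[key.strip()] = value.strip()
    return keys


def restriction_unenforced(text: str) -> bool:
    """True if a skill declares allowed-tools but routes to no agent."""
    keys = frontmatter_keys(text)
    # Assumption: a non-empty agent: field moves the skill into an agent
    # context whose tools: field is the enforced boundary.
    return "allowed-tools" in keys and not keys.get("agent")


def lint(root: Path = Path(".claude/skills")) -> list[Path]:
    """SKILL.md files whose only tool restriction is allowed-tools."""
    return sorted(
        path
        for path in root.glob("*/SKILL.md")
        if restriction_unenforced(path.read_text(encoding="utf-8"))
    )
```

Running lint() from the project root lists the skills whose restriction should move into an agent definition.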

Caveats

The allowed-tools result is marked INCONCLUSIVE rather than FAIL -- additional configurations were not ruled out. The enforcement gap may be narrower than complete non-enforcement; what is certain is that the two tested formats did not restrict Bash. Until a passing configuration is demonstrated, the safe operating assumption is: agent definitions enforce; skill frontmatter does not.

Invalid agent names fail silently with no surfaced error, falling back to the default model and tools. This is a separate hazard when relying on agent-based sandboxing: a mistyped agent: value quietly drops the restriction instead of failing.
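Since a mistyped agent: value produces no error at runtime, a pre-flight check that every referenced name matches a defined agent surfaces the mismatch early. A sketch under stated assumptions: agent names are read from the name: line of each .claude/agents/*.md file (project-level directory only), skill contents are supplied as a mapping of skill name to SKILL.md text, and defined_agents and dangling_agent_refs are hypothetical helpers.

```python
import re
from pathlib import Path

# Assumes a single-token value on a line of its own, e.g. "agent: cheap-researcher".
NAME_LINE = re.compile(r"^name:\s*(\S+)\s*$", re.MULTILINE)
AGENT_LINE = re.compile(r"^agent:\s*(\S+)\s*$", re.MULTILINE)


def defined_agents(agents_dir: Path = Path(".claude/agents")) -> set[str]:
    """Names declared by the name: line of each project agent file."""
    names = set()
    for path in agents_dir.glob("*.md"):
        match = NAME_LINE.search(path.read_text(encoding="utf-8"))
        if match:
            names.add(match.group(1))
    return names


def dangling_agent_refs(skills: dict[str, str], known: set[str]) -> dict[str, str]:
    """Map each skill to its agent: value when that value names no known agent."""
    dangling = {}
    for skill, text in skills.items():
        match = AGENT_LINE.search(text)
        if match and match.group(1) not in known:
            dangling[skill] = match.group(1)
    return dangling
```

A non-empty result from dangling_agent_refs(skills, defined_agents()) marks a skill that would otherwise run with the default toolset.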

Evidence & receipt
ed25519 receipt
id: finding_893ba66d516b332f034043bc
alg: ed25519
pubkey: 9b87705613b1e2fd064d57fa75a6b679d2856ceafad6b1daa8f982493871b6dd
sig: 3bf4ad95f825aea6360ba0b13aa90678e99dd35d667f5dd2aacf14c1bb5c700a9fbab4a5ecb4b84e24b3182f8c1e4eb987c241985da0bfc0be40b2caf3392301

Signed with an ed25519 key held outside the repo. Anyone can verify the signature against the published public key; nobody without the secret key can forge it. The signature proves integrity and authorship of this exact content, not a third-party timestamp or that the underlying claim is objectively true. signedAt is when the @f3/attest pipeline ran, not when the work happened; the evidence refs carry the source dates.
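Verification needs three inputs: the public key, the signature, and the exact bytes that were signed. A readable sketch of Ed25519 verification using only the standard library, following the structure of the Python illustration in RFC 8032, Section 6; it is not a production implementation, and a maintained cryptography library is the practical choice. The page does not reproduce the canonical payload the @f3/attest pipeline signed, so the usage lines check RFC 8032's first test vector (empty message) rather than this receipt.

```python
import hashlib

P = 2**255 - 19  # field prime
L = 2**252 + 27742317777372353535851937790883648493  # group order


def inv(x: int) -> int:
    return pow(x, P - 2, P)


D = -121665 * inv(121666) % P  # curve constant d
SQRT_M1 = pow(2, (P - 1) // 4, P)  # a square root of -1 mod P


def point_add(p1, p2):
    # Extended coordinates (X, Y, Z, T) with x = X/Z, y = Y/Z, xy = T/Z.
    a = (p1[1] - p1[0]) * (p2[1] - p2[0]) % P
    b = (p1[1] + p1[0]) * (p2[1] + p2[0]) % P
    c = 2 * p1[3] * p2[3] * D % P
    d = 2 * p1[2] * p2[2] % P
    e, f, g, h = b - a, d - c, d + c, b + a
    return (e * f % P, g * h % P, f * g % P, e * h % P)


def point_mul(s: int, pt):
    # Double-and-add; variable time is acceptable because inputs are public.
    acc = (0, 1, 1, 0)  # neutral element
    while s > 0:
        if s & 1:
            acc = point_add(acc, pt)
        pt = point_add(pt, pt)
        s >>= 1
    return acc


def point_equal(p1, p2) -> bool:
    return ((p1[0] * p2[2] - p2[0] * p1[2]) % P == 0
            and (p1[1] * p2[2] - p2[1] * p1[2]) % P == 0)


def recover_x(y: int, sign: int):
    """x-coordinate for y whose low bit matches sign, or None if invalid."""
    if y >= P:
        return None
    x2 = (y * y - 1) * inv(D * y * y + 1) % P
    if x2 == 0:
        return None if sign else 0
    x = pow(x2, (P + 3) // 8, P)
    if (x * x - x2) % P != 0:
        x = x * SQRT_M1 % P
    if (x * x - x2) % P != 0:
        return None
    if (x & 1) != sign:
        x = P - x
    return x


def decompress(data: bytes):
    y = int.from_bytes(data, "little")
    sign = y >> 255
    y &= (1 << 255) - 1
    x = recover_x(y, sign)
    return None if x is None else (x, y, 1, x * y % P)


G_Y = 4 * inv(5) % P
G_X = recover_x(G_Y, 0)
G = (G_X, G_Y, 1, G_X * G_Y % P)  # base point B


def verify(public: bytes, msg: bytes, sig: bytes) -> bool:
    """Check [S]B == R + [SHA-512(R || A || M) mod L]A."""
    if len(public) != 32 or len(sig) != 64:
        return False
    a_pt, r_pt = decompress(public), decompress(sig[:32])
    if a_pt is None or r_pt is None:
        return False
    s = int.from_bytes(sig[32:], "little")
    if s >= L:
        return False
    h = int.from_bytes(hashlib.sha512(sig[:32] + public + msg).digest(), "little") % L
    return point_equal(point_mul(s, G), point_add(r_pt, point_mul(h, a_pt)))


RFC8032_TEST1_PUB = bytes.fromhex(
    "d75a980182b10ab7d54bfed3c964073a0ee172f3daa62325af021a68f707511a")
RFC8032_TEST1_SIG = bytes.fromhex(
    "e5564300c360ac729086e2cc806e828a84877f1eb8e5d974d873e065224901555fb8821590a33bacc61e39701cf9b46bd25bf5f0595bbe24655141438e7a100b")
print(verify(RFC8032_TEST1_PUB, b"", RFC8032_TEST1_SIG))  # True
```

To apply this to the receipt, the pubkey and sig values would go through bytes.fromhex; the message argument must be the exact signed payload, which this page does not show.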
