Sandbox / Capability
A controlled allowlist of tools per agent scope, enforced at availability level. Forced by trust delegation: a delegate must not receive more authority than it needs.
A sandbox, in the agent-runtime sense, is a controlled allowlist of tools that defines the maximum permitted surface for a given agent scope. Trust delegation, not convention, is what makes that limit binding.
Mechanics
Each agent scope carries a capability set: the explicit collection of tools it may invoke. Anything outside that set is inaccessible, not merely discouraged. The constraint is additive by allowlist, not subtractive by blocklist, which means the default posture is denial. An agent receives only what has been explicitly granted to it by the delegating authority above it in the trust chain.
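The deny-by-default posture described above can be sketched in a few lines. This is an illustrative model only, not the runtime's implementation: `AgentScope`, `ToolNotAvailable`, and `registry` are hypothetical names, and the sketch assumes enforcement happens at lookup time, so an ungranted tool cannot be resolved at all.

```python
from dataclasses import dataclass


class ToolNotAvailable(Exception):
    """Raised when a scope asks for a tool outside its capability set."""


@dataclass(frozen=True)
class AgentScope:
    # Hypothetical model of an agent scope; not an actual runtime type.
    name: str
    capabilities: frozenset  # explicit grants; an empty set reaches nothing

    def resolve(self, tool, registry):
        # Default posture is denial: only explicitly granted tools resolve.
        if tool not in self.capabilities:
            raise ToolNotAvailable(f"{self.name}: {tool!r} is not granted")
        return registry[tool]


# Hypothetical tool registry with two stand-in tools.
registry = {
    "Read": lambda path: f"read {path}",
    "Write": lambda path: f"write {path}",
}
reviewer = AgentScope("reviewer", frozenset({"Read"}))
```

Here `reviewer.resolve("Read", registry)` returns the tool, while `reviewer.resolve("Write", registry)` raises `ToolNotAvailable`. No blocklist entry for `Write` is needed, because absence from the set is the denial.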
Concrete members of this class include:
- Custom agent tool restrictions -- when a custom agent is defined, its permitted tools are enumerated at configuration time, forming an instance of this primitive.
- Bash wildcards -- patterns such as Bash(git *) that grant access to a subset of a tool's invocation space are themselves capability instances, scoping a single tool rather than a whole toolset.
Both are instances of the same primitive class: a named, bounded allowlist whose membership is determined before the agent runs.
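To make the single-tool scoping concrete, the sketch below checks an invocation against rules shaped like Bash(git *). The real wildcard grammar is not documented in the source material, so this assumes plain shell-style glob matching via Python's `fnmatch`. The `permits` and `allowed` helpers and the `RULE` pattern are hypothetical.

```python
import re
from fnmatch import fnmatchcase

# Assumed rule shape: ToolName or ToolName(pattern). The real grammar may differ.
RULE = re.compile(r"^(?P<tool>\w+)(?:\((?P<pattern>.*)\))?$")


def permits(rule, tool, invocation=""):
    """Return True if `rule` covers this invocation (assumed glob semantics)."""
    m = RULE.match(rule)
    if m is None or m.group("tool") != tool:
        return False
    pattern = m.group("pattern")
    # A bare tool name grants the whole tool; a pattern scopes its invocations.
    return pattern is None or fnmatchcase(invocation, pattern)


def allowed(rules, tool, invocation=""):
    # Allowlist semantics: permitted only if at least one rule matches.
    return any(permits(rule, tool, invocation) for rule in rules)
```

With `rules = ["Bash(git *)"]`, `allowed(rules, "Bash", "git status")` is true and `allowed(rules, "Bash", "rm -rf build")` is false. Under this naive glob, a chained command such as `git status && rm -rf build` would also match. That shows why the undocumented edge cases of the real grammar matter.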
Forced-by Constraint
The allowlist is not advisory. It is forced by trust delegation: the granting scope cannot give more authority than it holds, and the receiving scope cannot exceed what it was given. This makes capability sets composable downward and non-escapable upward. A subagent spawned by a restricted agent inherits at most the parent's capability set; it cannot bootstrap broader permissions from within the session.
This structural property is what distinguishes a sandbox from a guideline or a soft policy.
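The downward-composable, non-escapable property can be modelled as set intersection: whatever a child requests, it ends up with at most what its parent holds. This is a minimal sketch of that working model, not a confirmed description of how the runtime computes grants, and `delegate` is a hypothetical helper.

```python
def delegate(parent_caps, requested):
    """Grant a child scope only what the parent itself holds."""
    # Intersection enforces attenuation: requests beyond the parent's set are dropped.
    return frozenset(parent_caps) & frozenset(requested)


root = frozenset({"Read", "Write", "Bash"})
restricted = delegate(root, {"Read", "Bash"})
# The subagent asks for Write, which its parent does not hold.
subagent = delegate(restricted, {"Read", "Write"})
```

After these calls `subagent` holds only `Read`. Because each step can only shrink the set, no chain of delegations starting from `restricted` can reintroduce `Write`.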
Caveats
The above is derived from changelog evidence and has not been verified against a live runtime. The precise enforcement boundary -- whether capability checks occur at the harness layer, the model layer, or both -- is not confirmed by the available sources. Claims here should be read as a working model, not a proved invariant.
The wildcard syntax for tool scoping (e.g. Bash(git *)) is attested as a capability primitive, but the full grammar of wildcard patterns and their edge cases is not documented in the source material.
Why It Matters
Without a capability primitive, trust delegation collapses to an honour system. The sandbox class is what makes agent decomposition legible: you can read a custom agent definition and know, structurally, what it can and cannot do. That inspectability is the precondition for safe delegation at scale.
- Custom agents: tested
- Bash wildcards: tested
- YAML allowed-tools: inconclusive
- Task(AgentName) disable: tested
- Large Tool Outputs to Disk: tested
- Unreachable Permission Rules Detection: inconclusive
- MCP Tool Search Auto Mode: inconclusive
- File: AGENTIC-ESCALATION-ARC.md
primitive-class_4cac8d7bf19dc375f98bd707ed255199b87705613b1e2fd064d57fa75a6b679d2856ceafad6b1daa8f982493871b6dd0116354ae23f4e8f742fcc3cab4ce95bb7469f264e8b5e859a5f56e16b9428b9fb9c934cf325eacabae9a89b9e22cc21e31026044239da86b58b350baf524908
Signed with an ed25519 key held off the repo. Anyone can verify against the published public key; nobody without the secret key can forge it. The signature proves integrity and authorship of this exact content, not a third-party timestamp or that the underlying claim is objectively true. signedAt is when the @f3/attest pipeline ran, not when the work happened; the evidence refs carry the source dates.