Sandbox / Capability — Runtime Atlas

In the agent-runtime sense, a sandbox is an allowlist of tools that defines the maximum permitted surface for a given agent scope. It is enforced through trust delegation rather than by convention.

Mechanics

Each agent scope carries a capability set: the explicit collection of tools it may invoke. Anything outside that set is inaccessible, not merely discouraged. The set is built additively, as an allowlist, rather than subtractively, as a blocklist, so the default posture is denial. An agent receives only what has been explicitly granted to it by the delegating authority above it in the trust chain.

Concrete members of this class include:

Custom agent tool restrictions -- when a custom agent is defined, its permitted tools are enumerated at configuration time, forming an instance of this primitive.
Bash wildcards -- patterns such as Bash(git *) that grant access to a subset of a tool's invocation space are themselves capability instances, scoping a single tool rather than a whole toolset.

Both are instances of the same primitive class: a named, bounded allowlist whose membership is determined before the agent runs.
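
As a rough illustration of the default-deny allowlist described above, the following Python sketch models a capability set as a list of grants. Everything in it is hypothetical: the names Grant and is_allowed are invented for this example, and the glob-style matching done with fnmatchcase is an assumption, since the actual wildcard grammar is not documented in the source material.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional, Sequence


@dataclass(frozen=True)
class Grant:
    """Hypothetical allowlist entry: a tool name plus an optional argument pattern."""
    tool: str
    pattern: Optional[str] = None  # None means the whole tool is granted

    @classmethod
    def parse(cls, spec: str) -> "Grant":
        # "Bash(git *)" -> Grant("Bash", "git *"); "Read" -> Grant("Read")
        if spec.endswith(")") and "(" in spec:
            tool, _, rest = spec.partition("(")
            return cls(tool, rest[:-1])
        return cls(spec)

    def permits(self, tool: str, invocation: str = "") -> bool:
        if tool != self.tool:
            return False
        # ASSUMPTION: glob semantics. The real wildcard grammar is undocumented.
        return self.pattern is None or fnmatchcase(invocation, self.pattern)


def is_allowed(grants: Sequence[Grant], tool: str, invocation: str = "") -> bool:
    # Default deny: a call passes only if some explicit grant permits it.
    return any(g.permits(tool, invocation) for g in grants)


grants = [Grant.parse("Read"), Grant.parse("Bash(git *)")]

print(is_allowed(grants, "Bash", "git status"))  # True
print(is_allowed(grants, "Bash", "rm -rf /"))    # False: no matching grant
print(is_allowed(grants, "Write", "notes.txt"))  # False: tool never granted
```

There is no blocklist anywhere in the sketch: a call is rejected simply because nothing grants it. The naive glob would also accept a chained command such as "git status; rm -rf /", so it illustrates the shape of the primitive and is not an enforcement mechanism.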

Forced-by Constraint

The allowlist is not advisory. It is forced by trust delegation: the granting scope cannot give more authority than it holds, and the receiving scope cannot exceed what it was given. This makes capability sets composable downward and non-escapable upward. A subagent spawned by a restricted agent inherits at most the parent's capability set; it cannot bootstrap broader permissions from within the session.

This structural property is what distinguishes a sandbox from a guideline or a soft policy.
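
The delegation rule above can be sketched as a subset check at the moment a child scope is created. This is a minimal model under stated assumptions, not the runtime's implementation: Scope and delegate are hypothetical names, and grants are treated as opaque strings, so the sketch does not recognise that a narrower pattern might fall within a broader one.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Scope:
    """Hypothetical agent scope holding an immutable capability set."""
    name: str
    capabilities: frozenset

    def delegate(self, name: str, requested: Iterable[str]) -> "Scope":
        wanted = frozenset(requested)
        # The granting scope cannot give more authority than it holds.
        excess = wanted - self.capabilities
        if excess:
            raise PermissionError(f"{self.name} cannot grant {sorted(excess)}")
        return Scope(name, wanted)


root = Scope("root", frozenset({"Read", "Write", "Bash(git *)"}))
reviewer = root.delegate("reviewer", {"Read", "Bash(git *)"})
helper = reviewer.delegate("helper", {"Read"})

# A restricted scope cannot hand out a capability it was never given.
try:
    reviewer.delegate("escalated", {"Write"})
    escalated = True
except PermissionError:
    escalated = False

print(helper.capabilities <= reviewer.capabilities <= root.capabilities)  # True
print(escalated)  # False
```

Because each scope is frozen and a child can only be created through delegate, capability sets in this model can shrink down the chain but never grow, which is the composable-downward, non-escapable-upward property in miniature.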

Caveats

The above is derived from changelog evidence and has not been verified against a live runtime. The precise enforcement boundary -- whether capability checks occur at the harness layer, the model layer, or both -- is not confirmed by the available sources. Claims here should be read as a working model, not a proven invariant.

The wildcard syntax for tool scoping (e.g. Bash(git *)) is attested as a capability primitive, but the full grammar of wildcard patterns and their edge cases is not documented in the source material.

Why It Matters

Without a capability primitive, trust delegation collapses to an honour system. The sandbox class is what makes agent decomposition legible: you can read a custom agent definition and know, structurally, what it can and cannot do. That inspectability is the precondition for safe delegation at scale.

Kind	Class
Members	8 across 5 versions
Evidence	1 ref
Receipt	ed25519 · verifiable