Sandbox / Capability

A controlled allowlist of tools per agent scope, enforced through tool availability rather than instruction: a tool outside the list is not offered to the agent. Forced by trust delegation: a delegate must not receive more authority than it needs.

A sandbox, in the agent runtime sense, is a controlled allowlist of tools that defines the maximum permitted surface for a given agent scope, enforced through trust delegation rather than convention.

Mechanics

Each agent scope carries a capability set: the explicit collection of tools it may invoke. Anything outside that set is inaccessible, not merely discouraged. The constraint is additive by allowlist, not subtractive by blocklist, which means the default posture is denial. An agent receives only what has been explicitly granted to it by the delegating authority above it in the trust chain.
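The default-deny posture described above can be sketched in a few lines. This is an illustration only, not the runtime's implementation: the CapabilitySet class, its permits method, and the tool names are hypothetical and chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilitySet:
    """Hypothetical model of one agent scope's allowlist (not a real runtime API)."""

    tools: frozenset[str] = frozenset()

    def permits(self, tool: str) -> bool:
        # Allowlist semantics: a tool passes only if it was explicitly granted.
        # There is no blocklist to consult, so anything unlisted is denied.
        return tool in self.tools


# A scope granted two tools (names are illustrative).
reviewer = CapabilitySet(frozenset({"Read", "Grep"}))

print(reviewer.permits("Read"))         # True
print(reviewer.permits("Bash"))         # False
print(CapabilitySet().permits("Read"))  # False: an empty grant denies everything
```

The empty default is the point of the design: a scope that is granted nothing can do nothing, and every permitted tool is traceable to an explicit grant.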

Concrete members of this class include:

  • Custom agent tool restrictions -- when a custom agent is defined, its permitted tools are enumerated at configuration time, forming an instance of this primitive.
  • Bash wildcards -- patterns such as Bash(git *) that grant access to a subset of a tool's invocation space are themselves capability instances, scoping a single tool rather than a whole toolset.

Both are instances of the same primitive class: a named, bounded allowlist whose membership is determined before the agent runs.
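To make the wildcard case concrete, here is a rough sketch of how a rule such as Bash(git *) could be checked against a tool call. The matching semantics are an assumption: the source does not document the pattern grammar, so this example substitutes shell-style globbing from Python's standard fnmatch module. The rule_permits function and the rule-parsing regex are hypothetical.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical rule shape: ToolName or ToolName(pattern).
_RULE = re.compile(r"^(?P<tool>\w+)(?:\((?P<pattern>.*)\))?$")


def rule_permits(rule: str, tool: str, invocation: str = "") -> bool:
    """Check one allowlist rule against a tool call.

    Assumption: patterns are shell-style globs. The real grammar is not
    documented in the source material and may differ.
    """
    match = _RULE.match(rule)
    if match is None or match.group("tool") != tool:
        return False  # malformed rule or different tool: deny
    pattern = match.group("pattern")
    if pattern is None:
        return True  # bare tool name grants the whole tool
    return fnmatchcase(invocation, pattern)


print(rule_permits("Bash(git *)", "Bash", "git status"))  # True
print(rule_permits("Bash(git *)", "Bash", "rm -rf /"))    # False
print(rule_permits("Bash(git *)", "Read", "git status"))  # False
print(rule_permits("Read", "Read"))                       # True
```

A naive glob like this also accepts "git status && rm -rf /", because the trailing * matches the rest of the string. Whether and how the actual runtime guards against command chaining is one of the undocumented edge cases, so the sketch should not be read as a safe matcher.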

Forced-by Constraint

The allowlist is not advisory. It is forced by trust delegation: the granting scope cannot give more authority than it holds, and the receiving scope cannot exceed what it was given. This makes capability sets composable downward and non-escapable upward. A subagent spawned by a restricted agent inherits at most the parent's capability set; it cannot bootstrap broader permissions from within the session.
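The downward-composable property can be modelled as set intersection: whatever a child requests is clipped to what its parent holds. This is a sketch under that assumption, with a hypothetical delegate function and illustrative tool names. The source does not say whether a runtime silently clips an excessive request or rejects it outright; clipping is used here only because it is the shortest way to show the invariant.

```python
def delegate(parent: frozenset[str], requested: frozenset[str]) -> frozenset[str]:
    """Return the child's capability set: the request clipped to the parent's set."""
    # Intersection guarantees the result is a subset of the parent's set,
    # so a child can never hold a tool its parent lacks.
    return parent & requested


root = frozenset({"Read", "Grep", "Bash", "Write"})
agent = delegate(root, frozenset({"Read", "Grep", "Bash"}))

# The subagent asks for Write, which its parent (agent) does not hold.
subagent = delegate(agent, frozenset({"Read", "Write"}))

print(sorted(subagent))  # ['Read']

# Each level is a subset of the one above it.
assert subagent <= agent <= root
```

Because each grant is computed from the parent's set alone, no sequence of delegations can recover a tool that was dropped higher in the chain.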

This structural property is what distinguishes a sandbox from a guideline or a soft policy.

Caveats

The above is derived from changelog evidence and has not been verified against a live runtime. The precise enforcement boundary -- whether capability checks occur at the harness layer, the model layer, or both -- is not confirmed by the available sources. Claims here should be read as a working model, not a proved invariant.

The wildcard syntax for tool scoping (e.g. Bash(git *)) is attested as a capability primitive, but the full grammar of wildcard patterns and their edge cases is not documented in the source material.

Why It Matters

Without a capability primitive, trust delegation collapses to an honour system. The sandbox class is what makes agent decomposition legible: you can read a custom agent definition and know, structurally, what it can and cannot do. That inspectability is the precondition for safe delegation at scale.

Members: 8 across 5 versions
Evidence & receipt
  • file: AGENTIC-ESCALATION-ARC.md
◇ ed25519 receipt
id: primitive-class_4cac8d7bf19dc375f98bd707
alg: ed25519
pubkey: 9b87705613b1e2fd064d57fa75a6b679d2856ceafad6b1daa8f982493871b6dd
sig: 0116354ae23f4e8f742fcc3cab4ce95bb7469f264e8b5e859a5f56e16b9428b9fb9c934cf325eacabae9a89b9e22cc21e31026044239da86b58b350baf524908

Signed with an ed25519 key held off the repo. Anyone can verify the signature against the published public key; nobody without the secret key can forge it. The signature proves the integrity and authorship of this exact content. It does not provide a third-party timestamp, and it does not show that the underlying claim is objectively true. signedAt is when the @f3/attest pipeline ran, not when the work happened; the evidence refs carry the source dates.