Experiment 01

Claude Code feature intelligence.

Pointing Claude Code's own dynamic-workflow primitive at Claude Code's evolution. 137 agents across 3 model-tiered runs, every claim traced to source, the whole thing captured as typed workflow definitions.

3runs
137agents
3.22Mtokens
63versions
4WGL defs

The three runs

Test 1 — single version
sequential extract -> write -> verify on 2.1.154
3 agents 99k tokens 3.8 min
opus
Test 2 — multi version
6-version pipeline + synthesis barrier
19 agents 612k tokens 8.2 min
opus
Test 3 — full flow
56 versions, tiered, cross-version reconcile
115 agents 2.51M tokens 18.1 min
haiku/sonnet workers, opus barrier

Per-agent token count barely moved across runs. Per-agent cost collapsed: the third run did far more work with the premium model touching only the reconciliation barrier. Fan-out cost is a model-tier decision, not an agent-count decision.

The agentic escalation arc

How Claude Code crossed from one agent per session to fleet-scale orchestration, one capability layer at a time.

2.1.50 isolation
Declarative `isolation: worktree` in agent definitions (plus paired WorktreeCreate/WorktreeRemove hooks) lets subagents run in isolated git worktrees automatically — the key new fan-out orchestration primitive in 2.1.50. The rest is predominantly memory-leak hardening.
2.1.71 autonomy
Cron scheduling tools for recurring prompts within a session — the first changelog-stated primitive that lets an agent schedule recurring work itself, which directly extends multi-agent/long-running orchestration surface (paired with the new `/loop` command for the user-facing recurrence form).
2.1.111 fan-out
`/ultrareview` ships cloud-based parallel multi-agent code review — the only new primitive in this release that is itself an agent-orchestration mechanism, directly mirroring the fan-out pattern this test instruments.
2.1.139 autonomy
Agent view (`claude agents`, Research Preview) is the marquee primitive: a single list of every Claude Code session — running, blocked on you, or done — which is exactly the cross-session fan-out monitoring surface a multi-agent orchestration test needs. Real feature release (agent view, `/goal`, two new hook fields, plugin details, subagent identity headers), but the bulk of the changelog (~40 of ~50 bullets) is bugfix hardening.
2.1.154 fan-out keystone
Dynamic workflows (`/workflows`): you can now ask Claude to create a workflow and it orchestrates work across tens to hundreds of background agents, turning single-session prompting into fan-out, multi-agent orchestration of large/complex tasks — this is the primitive the whole 2.1.154 test is instrumenting.
2.1.160 fan-out
Almost entirely hardening: 2.1.160 is a bugfix/safety release. The closest thing to a new primitive is the dynamic-workflow trigger keyword being renamed from `workflow` to `ultracode` — relevant to orchestration because it changes how the fan-out/dynamic-workflow feature is invoked and de-conflicts the common word "workflow" from triggering a run. This is a rename of an existing trigger, not a net-new capability. No genuinely new tools, hooks, or settings were added; the new behaviors are safety prompts and friction reductions on existing primitives.
2.1.169 control-plane
Hardening-heavy release: ~25 fixes/improvements vs. 3 genuine new primitives (plus 2 new surfaces bundled into fixes). The standout primitive for agent orchestration is `disableBundledSkills` / `CLAUDE_CODE_DISABLE_BUNDLED_SKILLS`, which lets an orchestrator strip the model's default skill/workflow/command surface down to a controlled set — a targeted control-plane lever for deterministic multi-agent fan-out. Output is changelog-derived, not runtime-tested.

Reconciliation

A fan-out of isolated agents is blind to one error class: a feature called new in one version that actually shipped earlier. A final cross-version pass caught both, and both were corrected.

/effort xhigh shipped 2.1.111, re-claimed 2.1.154 corrected
workspace.git_worktree shipped 2.1.97, re-claimed 2.1.98 corrected

Every version documented (63)

Changelog-derived primitive extraction for every substantive version. Untested by design: the complement to runtime battle-tests, not a substitute.

versionprimitivesheadline
2.1.33 5 **Multi-agent workflow primitives:** New hook events (`TeammateIdle`, `TaskCompleted`) + agent-spawning restrictions (`Task(agent_type)`) + persistent memory scopes (`memory` frontmatter) unlock structured multi-agent orchestration where teammates coordinate across sessions with durable state and controlled delegation. 2.1.41 6 **`claude auth` CLI subcommands (login/status/logout)** — enables programmatic and interactive authentication workflows without manual config file editing. This matters for automation, CI/CD, and credential rotation loops where auth state must be explicitly managed. 2.1.42 1 **One-time Opus 4.6 effort callout** — introduces a new system-level user communication for eligible users, improving model discoverability and driving adoption of the latest capability tier without ongoing permission friction. 2.1.45 4 **Claude Sonnet 4.6 support + SDK rate-limit observability.** Model expansion (new flagship) + SDK types for throttle-aware consumer code. Why it matters: agents and tools that respect rate limits can now signal backpressure programmatically. 2.1.46 1 claude.ai MCP connector support in Claude Code — agents can now use Model Context Protocol integrations from the claude.ai web interface, bridging web tools into the CLI environment and expanding available capabilities for agent workflows. 2.1.47 4 **`last_assistant_message` hook field**: Stop and SubagentStop hooks can now access the final assistant response text directly without parsing transcript files, enabling cleaner automation of agent result capture and persistence. 2.1.49 8 Worktree isolation for Claude and subagents enables reproducible, containerized agent workflows without branch conflicts or stale state bleeding across sessions. 2.1.50 6 Declarative `isolation: worktree` in agent definitions (plus paired WorktreeCreate/WorktreeRemove hooks) lets subagents run in isolated git worktrees automatically — the key new fan-out orchestration primitive in 2.1.50. The rest is predominantly memory-leak hardening. 2.1.51 8 **`claude remote-control` subcommand** — enables external build systems and CI/CD pipelines to orchestrate Claude Code locally, expanding integration surface beyond direct user interaction to automated environments and distributed workflows. 2.1.59 3 **Auto-memory + /copy:** Claude Code now saves useful context automatically and provides interactive code block selection, enabling faster context reuse and precise code copying in multi-turn workflows. 2.1.63 7 **Bundled slash commands + HTTP hooks**: slash commands now ship with `/simplify` and `/batch` out-of-the-box; HTTP hooks enable external system integration without shell commands. Together, these unlock workflow distribution and third-party integrations in Claude Code without custom scripting. 2.1.69 14 Hook observability expansion: `InstructionsLoaded` event + `agent_id`/`agent_type` fields + `worktree` status-line field. For the first time, agents can observe exactly when and from where their instruction context was loaded, which sub-agent ID is executing, and whether execution is inside a worktree — enabling routing, audit, and multi-worktree orchestration patterns that were previously guesswork. 2.1.71 4 Cron scheduling tools for recurring prompts within a session — the first changelog-stated primitive that lets an agent schedule recurring work itself, which directly extends multi-agent/long-running orchestration surface (paired with the new `/loop` command for the user-facing recurrence form). 2.1.72 9 `ExitWorktree` tool — completes the symmetric pair for worktree isolation. `EnterWorktree` already existed; without `ExitWorktree` agents had no clean path back to the parent tree. This closes the loop and makes programmatic worktree workflows (branch-per-task, parallel agents, sparse-checkout isolation) fully composable without leaking session state. 2.1.73 1 **`modelOverrides` setting** — allows mapping model picker entries to custom provider model IDs (Bedrock inference profile ARNs, etc.). This enables teams behind corporate proxies or using non-standard LLM deployments to use Claude Code with pinned or alternative model configurations without forking the CLI. 2.1.74 3 Enhanced `/context` command with actionable optimization suggestions — identifies context-heavy tools, memory bloat, and capacity warnings with specific remediation tips. This makes memory/capacity management observable and prescriptive rather than opaque. 2.1.75 5 **1M context window for Opus 4.6** — Max, Team, and Enterprise plans now have Opus 4.6 with 1M context by default, removing the friction of opt-in usage tier selection and enabling agents to reason over larger corpora without breaking the working-memory budget. 2.1.76 8 MCP elicitation support — MCP servers can now interrupt mid-task to request structured input from the user via interactive dialogs (form fields or browser URL). Paired with the `Elicitation` and `ElicitationResult` hooks, this closes the last major gap in MCP's ability to drive complex multi-step agent interactions that require human confirmation or credential input at runtime. 2.1.77 4 **Output token ceiling uplift (Opus 4.6 64k→128k)** — Model capacity expansion enables longer response payloads in a single turn, reducing need for continuation prompts in code/analysis-heavy workflows. Agentic relevance: agents generating large documents, multi-file refactors, or comprehensive reports now fit complete outputs without truncation. 2.1.78 6 **`StopFailure` hook event** — agents can now react to API errors that halt execution (rate limit, auth failure) instead of silent termination, enabling recovery workflows and graceful degradation. 2.1.79 5 **`/remote-control` command** — bridge CLI session to claude.ai/code for browser/phone continuation, enabling workflow mobility and device-agnostic handoff during extended work. 2.1.80 5 **`effort` frontmatter for skills and slash commands** — agents can now override the model effort level at the point of skill/command definition, enabling context-aware defaulting without manual per-invocation effort tweaks. 2.1.81 3 `--bare` flag — scriptable automation in deterministic environments (CI, headless servers, backend schedulers). Skips OAuth, hooks, LSP, and plugins to run repeatable `-p` invocations with just `ANTHROPIC_API_KEY`. Agents can now run fast, stateless sub-tasks without interactive approval paths. 2.1.83 16 `managed-settings.d/` drop-in directory for policy fragments. This matters because it decouples multi-team policy authoring from a single `managed-settings.json` file — teams can now deploy, update, or remove their own fragments independently without merge conflicts, which is the enabling primitive for org-scale Claude Code governance. 2.1.84 8 `TaskCreated` hook — fires the moment a task is created via `TaskCreate`, giving orchestrators a synchronous interception point to record, route, or gate new tasks before they run. This is the most agentic primitive in the release; every other addition is either env-var tuning or hardening. 2.1.85 6 **Headless hook integrations via PreToolUse `updatedInput`** — agents and headless systems can now satisfy user questions without interactive prompts, enabling self-contained tool-use flows with delegated permission handling. 2.1.86 2 **`X-Claude-Code-Session-Id` header + VCS exclusion list expansions** — Enables proxy-layer session aggregation and improves tooling compatibility (Jujutsu, Sapling) without user configuration, reducing friction for multi-VCS monorepos and enterprise deployments. 2.1.89 6 **Headless resumption with deferred permissions** — `"defer"` permission decision in `PreToolUse` hooks + `-p --resume` now enables agents and headless scripts to pause at permission gates and resume after hook re-evaluation, unblocking long-running workflows and interactive permission resolution without operator presence. 2.1.90 3 **/powerup command** — interactive lessons with animated demos for teaching Claude Code features. This unlocks guided onboarding and self-directed skill building without leaving the CLI. 2.1.91 4 MCP tool result persistence override via `_meta["anthropic/maxResultSizeChars"]` — agents can now transmit large results (up to 500K) like DB schemas without truncation, unlocking deeper integration with data discovery tasks. 2.1.92 6 **Interactive Bedrock setup wizard** — new 3rd-party LLM platform integration flow with guided AWS auth, region config, and model pinning. Enables agents to ingest non-Claude models into the Claude Code runtime without manual credential/region assembly. 2.1.94 6 **Amazon Bedrock-powered inference via Mantle + default effort bump** — agents can now explicitly opt into Bedrock inference (Amazon's managed model endpoint) and will see higher default reasoning effort on provisioned compute, unlocking cost-aware and regulated-environment multi-provider inference. 2.1.97 6 Focus view toggle in NO_FLICKER mode (`Ctrl+O`) — compact display of prompt, tool summaries with edit diffs, and final response. Reduces context switching for agents monitoring active work without full session visibility. 2.1.98 9 Monitor tool — a first-class tool for streaming events from background scripts. Fills a long-standing gap: agents running background processes had no native way to observe stdout line-by-line; Monitor closes that loop and enables reactive agentic pipelines without polling hacks. 2.1.101 6 **`/team-onboarding` command** — generates a teammate ramp-up guide from your local Claude Code usage patterns. Enables knowledge transfer and onboarding automation without manual documentation. 2.1.105 4 **`EnterWorktree` path parameter** enables worktree navigation without re-parsing repository state—a quiet efficiency gain for multi-worktree session continuity and agent branch-switching workflows. 2.1.108 6 **Prompt caching control and session recap** unlock fine-grained performance tuning and context continuity. Fine TTL control (5m/1h) enables agents to trade cache freshness for latency on data-update-sensitive tasks; recap surfaces automatic context recovery for interrupted sessions. 2.1.110 10 TUI fullscreen rendering + Push notifications: interactive experience upgrades bringing flicker-free rendering to focused work and async notification capability to remote/mobile clients. Enables longer-form agent conversations without terminal flicker and lets Claude proactively notify the user when remote. 2.1.111 6 `/ultrareview` ships cloud-based parallel multi-agent code review — the only new primitive in this release that is itself an agent-orchestration mechanism, directly mirroring the fan-out pattern this test instruments. 2.1.113 10 **Native binary spawn + sandbox network filtering** — Claude Code now runs as a native platform binary instead of bundled JavaScript, and administrators/users can surgically block specific domains even under broad `allowedDomains` wildcards. This shifts CLI performance and security policy control to fine-grained network isolation on each platform. 2.1.118 7 **Hooks invoking MCP tools directly** (`type: "mcp_tool"`) — hooks can now call MCP servers as first-class invocations, enabling agent verifiers, auto-mode rule checkers, and session orchestration logic to reach external services without wrapping them in bash or shimming through tool-call syntax. This unlocks tighter MCP integration into the Claude Code control plane. 2.1.119 10 Settings persistence with override precedence — `/config` settings now persist to `~/.claude/settings.json` and participate in project/local/policy override precedence, enabling stable per-context configuration without manual CLI flags. 2.1.120 4 **`claude ultrareview [target]` command** — enables non-interactive code review automation in CI/CD pipelines and scripts, with structured JSON output for integration into automation tooling. This unlocks review-as-a-gate and audit-trail capture for agent-driven codebases. 2.1.121 7 **MCP server eager-load control** (`alwaysLoad` config option): Skip deferred tool resolution for trusted MCP servers, making all tools immediately available without tool-search cost. This reduces latency for high-trust integrations and enables agents to invoke tools in non-interactive sessions reliably. 2.1.122 2 **PR URL search in `/resume` and Bedrock service tier selection**: the `/resume` search now accepts GitHub PR URLs to jump directly to the session that created them, and `ANTHROPIC_BEDROCK_SERVICE_TIER` env var lets users tune Bedrock API tier (default/flex/priority) — both reduce friction in session discovery and cloud infra config. 2.1.129 7 **Plugin URL fetching + env-var sync-output control** — agents and power users can now inline ephemeral plugins from remote URLs and force deterministic terminal output, unlocking faster iteration on plugin-driven workflows and more reliable CI/automation on exotic terminals. 2.1.132 2 **Session identity in Bash subprocesses:** Bash tool commands can now read `CLAUDE_CODE_SESSION_ID` to correlate their execution with the parent Claude Code session and hook invocations. This enables agents to build session-aware audit logs, cross-reference hook events, and reason about causality within long-running sessions. 2.1.133 6 **`worktree.baseRef` setting** — agents can now choose whether new worktrees branch from origin (fresh isolation) or local HEAD (preserving unpushed work). This is critical for multi-agent workflows where one agent's uncommitted code should flow to isolation worktrees, and for developer flows that rely on local draft branches. 2.1.136 2 **Enterprise telemetry integration and auto-mode hardening.** 2.1.136 adds environment-variable control for OpenTelemetry survey feedback and a new `hard_deny` classifier rule type for absolute permission blocking — expanding enterprise audit/compliance options and tightening auto-mode decision boundaries. 2.1.139 10 Agent view (`claude agents`, Research Preview) is the marquee primitive: a single list of every Claude Code session — running, blocked on you, or done — which is exactly the cross-session fan-out monitoring surface a multi-agent orchestration test needs. Real feature release (agent view, `/goal`, two new hook fields, plugin details, subagent identity headers), but the bulk of the changelog (~40 of ~50 bullets) is bugfix hardening. 2.1.141 7 Desktop notification capability via hook JSON `terminalSequence` field unlocks notification, window-title, and bell signaling without requiring a controlling terminal — critical for daemonized/backgrounded agent workflows that run detached from the shell. 2.1.142 12 **`claude agents` configuration flags** — Eight new flags (`--add-dir`, `--settings`, `--mcp-config`, `--plugin-dir`, `--permission-mode`, `--model`, `--effort`, `--dangerously-skip-permissions`) unlock dynamic configuration of dispatched background sessions, enabling agents to customize runtime behavior without hardcoded defaults. 2.1.143 15 Plugin dependency management on disable/enable with transitive chain resolution and force-enable. This addresses a common footgun where uninstalling a plugin that others depend on silently breaks dependent plugins, and now provides copy-pasteable hints for safe uninstall chains. 2.1.144 7 **`/resume` for background sessions** — unified control plane for resuming background work. Sessions started via `claude --bg` or agent view now appear alongside interactive ones in a single picker, marked with `bg` badge. Critical for multi-session orchestration: agents can spawn background work and return to resume it later without context loss. 2.1.145 9 **`claude agents --json`** unlocks JSON-based session introspection for scripting, enabling tmux-resurrect integration, custom status bars, and programmatic session pickers. This is the first machine-readable gate to live Claude agent state. 2.1.152 13 **`/code-review --fix` and hook-driven skill lifecycle management unlock tighter agent self-repair loops**: code review findings can now be automatically applied to the working tree, and hooks can inject skills mid-session and set message transformations—together enabling agents to review their own diffs, apply hardening suggestions, and suppress noisy output. 2.1.153 5 **`skipLfs` option for git/github sources + improved `claude agents` autocomplete**: These two features combine UX (smarter command dispatch) with infrastructure (faster cloning) to make agent project setup faster and more discoverable. 2.1.154 7 Dynamic workflows (`/workflows`): you can now ask Claude to create a workflow and it orchestrates work across tens to hundreds of background agents, turning single-session prompting into fan-out, multi-agent orchestration of large/complex tasks — this is the primitive the whole 2.1.154 test is instrumenting. 2.1.157 8 **Local plugin auto-load from `.claude/skills`** — decentralizes plugin distribution, enables repo-local skill scaffolding, and pairs with new `EnterWorktree` worktree-switching and `agent` field routing to unlock mid-session context+tool pivots without marketplace dependency. 2.1.160 2 Almost entirely hardening: 2.1.160 is a bugfix/safety release. The closest thing to a new primitive is the dynamic-workflow trigger keyword being renamed from `workflow` to `ultracode` — relevant to orchestration because it changes how the fan-out/dynamic-workflow feature is invoked and de-conflicts the common word "workflow" from triggering a run. This is a rename of an existing trigger, not a net-new capability. No genuinely new tools, hooks, or settings were added; the new behaviors are safety prompts and friction reductions on existing primitives. 2.1.163 6 Version-gating for environments: managed settings `requiredMinimumVersion` and `requiredMaximumVersion` enforce Claude Code version constraints organization-wide, preventing version skew in managed teams and reducing friction from incompatible client/server state. 2.1.166 8 Fallback model routing and fine-grained thinking control unlock graceful degradation (load-shedding to 3 alternate models) and explicit thinking suppression (including on Claude 3.7 and future default-thinking models). Together with cross-session messaging hardening, this strengthens agent resilience and sandboxing in multi-session orchestration. 2.1.169 5 Hardening-heavy release: ~25 fixes/improvements vs. 3 genuine new primitives (plus 2 new surfaces bundled into fixes). The standout primitive for agent orchestration is `disableBundledSkills` / `CLAUDE_CODE_DISABLE_BUNDLED_SKILLS`, which lets an orchestrator strip the model's default skill/workflow/command surface down to a controlled set — a targeted control-plane lever for deterministic multi-agent fan-out. Output is changelog-derived, not runtime-tested.

Corpus and WGL definitions live in foundation3 (docs/experiments/claude-features, packages/exo/ontology/src/wgl-ultracode.ts). Generated 2026-06-09.