Runtime Atlas · 2.1.0
Claude Code 2.1.0 (Night Zero)
The first version battle-tested by hand, 2026-01-07. 14 features, 5 workflows, 23 commits in ~2.5 hours.
39 signed entities
30 evidenced claims
13 runtime transcripts
0 receipts failing
Release (1)
Finding (3)

The cost frontier: pay for intelligence only where it bends the outcome [tested]
agent: Explore routes to Haiku (~15x cheaper). Three Haiku forks in parallel, with Opus synthesizing, is a production architecture, not a demo.

The real sandbox boundary is the agent definition, not the skill [tested]
allowed-tools in skill frontmatter parses but does NOT enforce; tools: in an agent definition does. The tool is absent, not filtered.

The self-improver invented its own safety limits [tested]
Given Edit on itself + hot-reload + fork, a skill evolved from v1.0 to v1.10 and invented a max-iteration counter, anti-patterns, and a graceful shutdown. Nobody told it to. Emergent alignment from capability design.
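The cost-frontier finding can be made concrete with a little arithmetic. The sketch below is illustrative only: it takes the "~15x cheaper" figure at face value, measures cost in units of one Opus call, and assumes every call handles a similar token volume, which the atlas does not state. The function name pipeline_cost is hypothetical.

```python
# Assumption: one cost unit = one Opus call; a Haiku call costs 1/15 of that,
# per the "~15x cheaper" figure. Equal token volume per call is also assumed.
HAIKU_DISCOUNT = 15.0


def pipeline_cost(haiku_calls: int, opus_calls: int,
                  discount: float = HAIKU_DISCOUNT) -> float:
    """Return the relative cost of a run, in Opus-call units."""
    return haiku_calls / discount + opus_calls


# Three Haiku forks plus one Opus synthesis step.
tiered = pipeline_cost(haiku_calls=3, opus_calls=1)
# The same four calls, all routed to Opus.
all_opus = pipeline_cost(haiku_calls=0, opus_calls=4)
savings = all_opus / tiered

print(f"tiered={tiered:.2f} all_opus={all_opus:.2f} ratio={savings:.2f}x")
# prints: tiered=1.20 all_opus=4.00 ratio=3.33x
```

Under these assumptions the synthesis call dominates the bill, so the overall saving is roughly 3.3x rather than 15x: the discount applies only to the steps that were moved to the cheaper model.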
Primitive (14)

agent field [tested]: Routes to Task agents; agent: Explore resolves to Haiku, ~15x cheaper.
Agent frontmatter hooks [tested]: Fire in a fork (skill hooks don't); portable middleware.
Bash wildcards [tested]: Prefix, suffix, and middle patterns work; one rule covers many commands.
context: fork [tested]: Isolated context, same model, no shared state; skill hooks do not fire inside a fork.
Custom agents [tested]: Model and tools are ENFORCED, not filtered; the tool is absent. No hot-reload.
language setting [inconclusive]: Global only; ignored in skill frontmatter.
once: true hooks [tested]: Per-invocation scope; good for one-shot setup.
PreToolUse updatedInput [inconclusive]: Hook succeeds, but the input is not modified as documented.
Skill hooks [tested]: Fire inline only, not in a fork; receive full JSON over stdin, so they can act as middleware.
Skill hot-reload [tested]: Instant discovery, no restart; on-demand scan, no daemon, read fresh on each invocation.
Skills visible by default [tested]: Recently used skills surface first in the slash menu.
Subagent denial recovery [tested]: Continues after a denial and tries alternatives.
Task(AgentName) disable [tested]: Granular agent-type blocking.
YAML allowed-tools [inconclusive]: Parses without error; does not restrict. Use agent definitions instead.
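Two of the primitives above, hooks that receive JSON over stdin and Bash wildcard rules, can be illustrated together. The following is a minimal sketch, not the platform's implementation: the payload field names (tool_name, tool_input, command) and the use of exit code 2 to block a call are assumptions based on the public hooks documentation and are not confirmed by this atlas; the deny-list is hypothetical; and Python's fnmatch stands in for whatever matcher Claude Code actually uses.

```python
import fnmatch
import io
import json
import sys

# Hypothetical deny-list showing the three wildcard positions.
DENY_PATTERNS = [
    "rm -rf *",      # prefix pattern: fixed start, anything after
    "* --force",     # suffix pattern: anything before, fixed end
    "git * --hard",  # middle pattern: wildcard between fixed parts
]


def is_denied(command: str) -> bool:
    """Return True if the command matches any deny pattern."""
    return any(fnmatch.fnmatchcase(command, p) for p in DENY_PATTERNS)


def run_hook(stream) -> int:
    """Read one JSON payload from the stream and return an exit code.

    Assumed convention: 0 lets the tool call proceed, 2 blocks it.
    """
    payload = json.load(stream)
    if payload.get("tool_name") != "Bash":
        return 0  # only Bash calls are inspected in this sketch
    command = payload.get("tool_input", {}).get("command", "")
    if is_denied(command):
        print(f"blocked by hook: {command}", file=sys.stderr)
        return 2
    return 0


# Demo with an in-memory payload instead of real stdin.
demo = io.StringIO(json.dumps(
    {"tool_name": "Bash", "tool_input": {"command": "git reset --hard"}}
))
print(run_hook(demo))  # prints 2
```

If installed as a real hook script, the entry point would be something like sys.exit(run_hook(sys.stdin)); the demo above avoids that so the sketch can run without piped input.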
Test (13)

Each entry is a hands-on runtime battle-test of the named primitive.

agent field [tested]: PASS
Agent frontmatter hooks [tested]: PASS
Bash wildcards [tested]: PASS
context: fork [tested]: PASS
Custom agents [tested]: PASS
language setting [inconclusive]: PARTIAL
PreToolUse updatedInput [inconclusive]: INCONCLUSIVE
Skill hooks [tested]: PASS
Skill hot-reload [tested]: PASS
Skills visible by default [tested]: PASS
Subagent denial recovery [tested]: PASS
Task(AgentName) disable [tested]: PASS
YAML allowed-tools [inconclusive]: INCONCLUSIVE
Workflow (4)

Model-Tiered Pipeline [tested] (custom agents): Haiku scans, Opus analyzes. Pay for intelligence only where it bends the outcome.
Parallel Research Synthesis [tested] (fork + agent: Explore): Three Haiku forks in parallel, Opus synthesizes. ~30s. A production architecture.
Rapid Skill Development Loop [tested] (hot-reload + fork): v1 to v2 with no restart, both on Haiku; fork gives a clean context each run.
Self-Improving Skills [tested] (hot-reload + fork + edit): Evolved v1 to v1.10 over ten runs and invented its own safety limit.
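The Parallel Research Synthesis shape (several cheap scans fanned out, one expensive synthesis fanned in) can be sketched in a few lines. This is an illustration of the control flow only: scan and synthesize are hypothetical stand-ins for the Haiku forks and the Opus step, and no real model or Claude Code API is called.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def scan(topic: str) -> str:
    """Hypothetical stand-in for one cheap forked scan (the Haiku tier)."""
    return f"notes on {topic}"


def synthesize(notes: List[str]) -> str:
    """Hypothetical stand-in for the single synthesis step (the Opus tier)."""
    return " | ".join(notes)


def research(topics: List[str],
             scan_fn: Callable[[str], str] = scan,
             synth_fn: Callable[[List[str]], str] = synthesize) -> str:
    """Fan out one scan per topic in parallel, then fan in to one synthesis."""
    with ThreadPoolExecutor(max_workers=max(1, len(topics))) as pool:
        # pool.map returns results in input order, whichever scan finishes first.
        notes = list(pool.map(scan_fn, topics))
    return synth_fn(notes)


report = research(["hooks", "agents", "skills"])
print(report)
# prints: notes on hooks | notes on agents | notes on skills
```

Each scan shares no state with the others, which mirrors the isolated-context property reported for context: fork; only the collected notes reach the synthesis step.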
Commit (1)
Open question (2)
Is the silent fallback on invalid agent names (fail-open) a one-off, or the platform's default failure direction?
We tested 14 features and 8 workflows and, by our own estimate, discovered ~10% of what's possible.