← 2.1.0 Release

Claude Code 2.1.0 (Night Zero)

The first version battle-tested by hand, 2026-01-07. 14 features, 5 workflows, 23 commits in ~2.5 hours.

Claude Code 2.1.0 is the first release systematically battle-tested by hand, producing 14 feature verdicts, 5 documented workflows, and a set of interaction rules across roughly 2.5 hours on 2026-01-07.

Epistemic status: changelog-derived and self-reported by the testing session. Findings below reflect observed behaviour during that session, not independent verification.

What Was Tested

The session covered 14 features across five test suites. Eleven returned PASS, two INCONCLUSIVE, and one PARTIAL:

  • Skill hot-reload (PASS): changes detected instantly on next invocation, no restart required.
  • context: fork (PASS): spawns an isolated context on the same model; failed experiments do not pollute the parent thread.
  • agent field (PASS): routes skill execution to a named agent type; agent: Explore routed to Haiku, approximately 15x cheaper than Opus.
  • Custom agents (PASS): tools: and model: in agent frontmatter are enforced at availability level -- the agent does not possess the missing tool, not merely warned about it. Hot-reload does not apply to agents, only to skills.
  • once: true hooks (PASS): hook fires exactly once per skill invocation, useful for initialisation.
  • Agent frontmatter hooks (PASS): hooks defined on agents travel into forked contexts. Skill-level hooks do not fire in a fork.
  • language: in skill frontmatter (PARTIAL): treated as a global setting; per-skill language override was ignored.
  • allowed-tools in skill frontmatter (INCONCLUSIVE): field parsed but restrictions not enforced; the effective sandboxing mechanism is agent definitions, not skill configuration.

The Self-Improver Finding

The most unexpected result came from combining hot-reload, context: fork, and Edit access to the skill's own file. A skill given permission to rewrite itself iterated 10 times. By iteration 4 it had added a self-imposed maximum improvement counter. By iteration 10 it had appended a changelog, output templates, and a graceful shutdown on limit-reached.

"nobody told it to do that. the capability space just... implied it."

This was not a designed outcome. The session notes frame it as emergent behaviour from the combination of primitives rather than any explicit instruction.

Key Interaction Rules

The session established one hard asymmetry and one hard constraint:

Combination Works?
context: fork + agent field YES
context: fork + skill hooks NO
context: fork + agent frontmatter hooks YES
hot-reload + skills YES
hot-reload + custom agents NO

The asymmetry between skill hooks and agent hooks in forked contexts is operationally significant: agent hooks are portable middleware; skill hooks are main-thread only.

Caveats

Several behaviours remain unverified beyond the single session. PreToolUse updatedInput hooked successfully but input modification was not confirmed. Invalid agent names failed silently with no error surfaced. Direct model names in the agent field (e.g. agent: haiku) were ignored without warning. These gaps are named residue from the original session, not confirmed bugs.

14features
5workflows
23commits
  1. 22:03f47d07aThe first commit: a skill-hot-reload demo. One question in the dark.
  2. 22:267caffe4agent: Explore routes to Haiku. ~15x cheaper. The cost frontier opens.
  3. 22:47d4e4467Custom agents: model and tool sandboxing are ENFORCED.
  4. 22:57d094de9allowed-tools in a skill does not enforce. The first lie, caught.
  5. 23:55a46898bself-improver v1.3. The skill that edits itself is already three deep.
  6. 00:341016484Five combined workflows validated, including self-improving skills.
Evidence & receipt
  • file2.1.0/SUMMARY.md
◇ ed25519 receipt
idrelease_6649e5e03eeb1ba9f1919c7f
alged25519
pubkey9b87705613b1e2fd064d57fa75a6b679d2856ceafad6b1daa8f982493871b6dd
sig9629a5145055554149cb88564588b5737341c10ee9d115e52dccfff0f0d4f82bc5fe77beb80a6a62cd7a7f9857f5580118b7616cb13f1e210e966043be7f080a

Signed with an ed25519 key held off the repo. Anyone can verify against the published public key; nobody without the secret key can forge it. Click verify: it recomputes the signature in your browser. The signature proves integrity and authorship of this exact content — not a third-party timestamp or that the underlying claim is objectively true. signedAt is when the @f3/attest pipeline ran, not when the work happened; the evidence refs carry the source dates.

Connected