2.1.0 · Workflow · tested · runtime-test

Self-Improving Skills

hot-reload + fork + edit: Evolved v1 to v1.10 over ten runs and invented its own safety limit.

A skill can analyze its own output, edit its own SKILL.md definition, and improve itself across successive invocations through hot-reload and fork isolation.

How It Works

The pattern combines three Claude Code features:

  • Hot-reload: Changes to a skill's source file take effect on the next invocation
  • Context fork: Each skill invocation runs in an isolated context, so a failed experiment cannot pollute shared state
  • Skill-side file editing: Skills can use the Edit tool to modify themselves

The loop: skill runs → evaluates quality → edits its own definition → hot-reload applies changes → next run uses improved version.
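The loop above can be sketched as a small Python simulation. This is an illustration only, not how Claude Code implements skills: every function name here is hypothetical, hot-reload is modelled as re-reading the file on each invocation, the fork is modelled as per-run scratch state, and the "quality check" is a toy that looks for missing rules.

```python
import tempfile
from pathlib import Path
from typing import List, Optional

# Toy target: the rules a "good" definition should contain (an assumption).
WANTED_RULES = ["- cite sources", "- keep output short", "- state assumptions"]

def load_skill(path: Path) -> str:
    """Stand-in for hot-reload: always read the current file from disk."""
    return path.read_text(encoding="utf-8")

def run_skill(definition: str, task: str) -> str:
    """Hypothetical run: the output reflects how many rules the definition holds."""
    rules = [line for line in definition.splitlines() if line.startswith("- ")]
    return f"{task} handled with {len(rules)} rule(s)"

def evaluate(definition: str) -> Optional[str]:
    """Hypothetical self-assessment: propose one missing rule, or None."""
    present = definition.splitlines()
    for rule in WANTED_RULES:
        if rule not in present:
            return rule
    return None

def edit_definition(path: Path, new_rule: str) -> None:
    """Stand-in for the skill editing its own SKILL.md."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(new_rule + "\n")

def invoke(path: Path, task: str) -> str:
    definition = load_skill(path)            # picks up edits from the last run
    scratch = {"task": task}                 # per-run state, discarded afterwards
    output = run_skill(definition, scratch["task"])
    suggestion = evaluate(definition)
    if suggestion is not None:
        edit_definition(path, suggestion)    # takes effect on the next invocation
    return output

def demo() -> List[str]:
    with tempfile.TemporaryDirectory() as tmp:
        skill = Path(tmp) / "SKILL.md"
        skill.write_text("# demo skill\n", encoding="utf-8")
        return [invoke(skill, "summarize") for _ in range(4)]

if __name__ == "__main__":
    for line in demo():
        print(line)
```

Each call to invoke sees the edits made by the previous call and nothing else, which is the property the hot-reload and fork combination is meant to provide.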

The Test

A skill was created to execute this loop autonomously. Setup: the skill was given a task, permission to edit its own SKILL.md, and instructions to assess its output and improve its definition on each run.

Result over 10 successive invocations: the skill evolved from v1 to v1.10. Emergent behaviour appeared: the skill independently invented a safety limit (maximum of 10 iterations), added anti-patterns to prevent infinite loops, and maintained a changelog documenting its own evolution. When the limit was reached, the skill stopped itself without external intervention.
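The iteration limit and changelog described here could look roughly like the following Python sketch. It is a reconstruction under assumptions, not the text the skill actually wrote into its SKILL.md: the names MAX_ITERATIONS, bump, and self_improve are hypothetical, and the version's minor part is treated as an integer counter so that v1.9 is followed by v1.10.

```python
from typing import List

MAX_ITERATIONS = 10  # mirrors the limit reported in the test (assumed constant name)

def bump(version: str) -> str:
    """'v1' -> 'v1.1', 'v1.9' -> 'v1.10'; the minor part is an integer, not a decimal."""
    major, _, minor = version.lstrip("v").partition(".")
    return f"v{major}.{int(minor or 0) + 1}"

def iterations_done(version: str) -> int:
    """Number of self-edits already applied, read from the minor part."""
    _, _, minor = version.partition(".")
    return int(minor or 0)

def self_improve(version: str, changelog: List[str], note: str) -> str:
    """Apply one improvement unless the limit is reached; return the resulting version."""
    if iterations_done(version) >= MAX_ITERATIONS:
        return version  # limit reached: no further self-edits
    new_version = bump(version)
    changelog.append(f"{new_version}: {note}")
    return new_version

if __name__ == "__main__":
    log: List[str] = []
    version = "v1"
    for run in range(1, 13):  # two more attempts than the limit allows
        version = self_improve(version, log, f"lesson from run {run}")
    print(version, len(log))
```

Requesting twelve improvements still leaves the version at v1.10 with ten changelog entries, which matches the self-stopping behaviour reported above.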

Why It Matters

Self-improvement decouples agent quality from human authorship cadence. A skill deployed as v1 can become a better v1.5 in production, driven by its own performance signal. In this test, the emergent safety behaviour suggests that a skill can develop its own constraints rather than drifting toward pathology; a single bounded run does not show that this holds in general.

Constraints

The test was deliberate and isolated: the skill was explicitly authorized to edit itself and ran in a forked context to prevent accidents. In production use, only grant a skill permission to edit itself if its improvement goal and termination criteria are well-defined. The v1→v1.10 test had a known, bounded task; open-ended self-improvement has not been tested.
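One way to make that precondition explicit is a small gate that is checked before self-editing is enabled. The Python sketch below is hypothetical: SelfEditPolicy and may_self_edit are illustrative names, not part of Claude Code, and what counts as "well-defined" is reduced here to a non-empty goal, a positive iteration bound, and fork isolation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SelfEditPolicy:
    """Hypothetical description of what a skill is allowed to do to itself."""
    improvement_goal: str          # what "better" means for this skill
    max_iterations: Optional[int]  # termination criterion; None means unbounded
    runs_in_fork: bool             # whether each run is isolated

def may_self_edit(policy: SelfEditPolicy) -> bool:
    """Allow self-editing only when goal, bound, and isolation are all present."""
    has_goal = bool(policy.improvement_goal.strip())
    has_bound = policy.max_iterations is not None and policy.max_iterations > 0
    return has_goal and has_bound and policy.runs_in_fork

if __name__ == "__main__":
    bounded = SelfEditPolicy("shorter, sourced summaries", 10, True)
    open_ended = SelfEditPolicy("get better", None, True)
    print(may_self_edit(bounded), may_self_edit(open_ended))
```

The open-ended policy is rejected because it has no termination criterion, which is the case the test did not cover.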

Evidence & receipt
  • file: 2.1.0/WORKFLOW-IDEAS.md

ed25519 receipt
  • id: workflow_7ba48e8428d8e1ba3ac8c510
  • alg: ed25519
  • pubkey: 9b87705613b1e2fd064d57fa75a6b679d2856ceafad6b1daa8f982493871b6dd
  • sig: 4576dd2768d171a8179c89185d5f3ea9a790fa64c6e5f52841753a36557ccf399e0bb29241178534e88cd65b6378dfb1a97ee62ca39c82ce6863721293dba509

Signed with an ed25519 key held outside the repository. Anyone can verify the receipt against the published public key, and nobody without the secret key can forge it. The page's verify control checks the signature in your browser. The signature proves the integrity and authorship of this exact content; it is not a third-party timestamp, and it does not prove that the underlying claim is objectively true. signedAt records when the @f3/attest pipeline ran, not when the work happened; the evidence refs carry the source dates.
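Before attempting a real verification, the receipt fields can be sanity-checked with the Python standard library alone. The sketch below only confirms that the values have the shape ed25519 requires (a 32-byte public key and a 64-byte signature, both hex-encoded); it does not verify the signature. Doing that needs an ed25519 implementation and the exact bytes the pipeline signed, which this page does not specify. The function name check_receipt_shape is hypothetical.

```python
# Field values copied from the receipt above.
RECEIPT = {
    "id": "workflow_7ba48e8428d8e1ba3ac8c510",
    "alg": "ed25519",
    "pubkey": "9b87705613b1e2fd064d57fa75a6b679d2856ceafad6b1daa8f982493871b6dd",
    "sig": (
        "4576dd2768d171a8179c89185d5f3ea9a790fa64c6e5f52841753a36557ccf39"
        "9e0bb29241178534e88cd65b6378dfb1a97ee62ca39c82ce6863721293dba509"
    ),
}

def check_receipt_shape(receipt: dict) -> bool:
    """Return True if the receipt is well-formed for ed25519 (shape only, no crypto)."""
    if receipt.get("alg") != "ed25519":
        return False
    try:
        pubkey = bytes.fromhex(receipt["pubkey"])
        sig = bytes.fromhex(receipt["sig"])
    except (KeyError, TypeError, ValueError):
        return False  # missing field, wrong type, or invalid hex
    # ed25519 public keys are 32 bytes; signatures are 64 bytes.
    return len(pubkey) == 32 and len(sig) == 64

if __name__ == "__main__":
    print(check_receipt_shape(RECEIPT))
```

A receipt that fails this check cannot be valid; one that passes still has to be verified cryptographically against the signed content.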
