Audit January 19, 2026

Ode to the Haiku Horde

One model put on its hardass hat and graded the output of thirty cheaper agents running a Ralph loop. The verdict: B+. Here is the full audit.

Raw artifact. Lightly edited from the original working notes. Published for research, not polish.

Auditor: Sonnet 4.5 (quality hardass mode)
Date: 2026-01-19
Target: clauffect autonomous implementation by ~30 Haiku agents
Claimed Progress: 49 commits (actually 77 in last 24h), +7.4k LOC, 3 modules complete


Executive Summary

Overall Grade: B+ (83/100)

real talk: this is way better than expected for autonomous haiku agents. the code is genuinely idiomatic effect-ts, not theater. 2044 passing tests, proper context tags, schema-based errors, clean layer composition (minus one type bug). the architecture is sound, separation of concerns is respected, and there’s actual property-based testing with fast-check.

Key Wins:

  • proper effect idioms throughout (Effect.gen, Context.Tag, Schema.TaggedError)
  • 140 TypeScript files, 44 tagged errors, zero runPromise misuse in src/
  • comprehensive test coverage with @effect/vitest
  • clean service boundaries and layer composition
  • empty TYPE_DEBT.md (haikus didn’t leave garbage for later)

Key Concerns:

  • build broken: the auth layer integration left a type error in bin.ts (HttpClient/SessionResume are not provided)
  • session/mcp/permissions are claimed “complete”, but auth is at 46% and conversation at 71%
  • one blocking issue prevents the binary from running

Genuine Progress vs Theater: 85% genuine. the code works, tests pass, patterns are correct. it’s not shipped yet but it’s legit foundation.


Module Reviews

Session (claimed 100%)

Spec Adherence: 9/10
Effect Idiomacy: 9/10
Test Quality: 8/10
Actually Complete: Yes

Notable Good:

  • Storage.ts: proper config injection, Effect.gen throughout, handles persistSession flag correctly
  • Manager.ts: clean service definition with Context.Tag pattern
  • Checkpoint.ts: file snapshots with proper error handling (CheckpointError with discriminated reasons)
  • Tests use proper Effect.provide chains, isolated with unique session IDs

Notable Bad:

  • Storage.test.ts still uses plain vitest instead of @effect/vitest
  • some as any casts for branded types (SessionId) - acceptable for now but should be resolved
  • relies on node:fs in Checkpoint.ts alongside @effect/platform (mixing abstraction levels)

Verdict: Actually complete. Session creation, storage, checkpointing, and rewind work. Tests prove it.


MCP (claimed 100%)

Spec Adherence: 9/10
Effect Idiomacy: 10/10
Test Quality: 9/10
Actually Complete: Yes

Notable Good:

  • Client.ts: beautiful JSON-RPC over NDJSON using Mailbox + Deferred for request correlation
  • proper error handling with McpClientError (Schema.TaggedError)
  • timeout handling using Effect.timeoutFail with correct defaults (30s connection, ~27h tool timeout)
  • output truncation logic with canonical token counting (1600 per image, char count for text)
  • tool naming convention: mcp__${server}__${tool} with normalization
  • supports stdio transport with Command + Stream
  • concurrent server initialization with concurrency: 4
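Two of the helpers above are small pure functions, so they can be sketched directly. This is a hypothetical reconstruction, not the code in Client.ts: only the mcp__${server}__${tool} format and the 1600-per-image / character-count-for-text rule come from the audit. The normalization rule (any character outside [A-Za-z0-9_-] becomes _) and the ContentBlock shape are assumptions.

```typescript
// Hypothetical sketch, not the code in src/Mcp/Client.ts.
// Assumed normalization: any character outside [A-Za-z0-9_-] becomes "_".
const normalizeSegment = (segment: string): string =>
  segment.replace(/[^A-Za-z0-9_-]/g, "_")

// Namespaced tool name in the mcp__${server}__${tool} format from the audit.
const mcpToolName = (server: string, tool: string): string =>
  `mcp__${normalizeSegment(server)}__${normalizeSegment(tool)}`

// Simplified, assumed shape of the content blocks an MCP tool call returns.
type ContentBlock =
  | { readonly type: "text"; readonly text: string }
  | { readonly type: "image"; readonly data: string; readonly mimeType: string }

// Counting rule from the audit: a flat 1600 per image, character count for text.
const TOKENS_PER_IMAGE = 1600

const estimateTokens = (blocks: ReadonlyArray<ContentBlock>): number =>
  blocks.reduce(
    (total, block) => total + (block.type === "image" ? TOKENS_PER_IMAGE : block.text.length),
    0
  )
```

Under the assumed rule, mcpToolName("my server", "read.file") yields mcp__my_server__read_file.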

Notable Bad:

  • Manager.ts is thin (just config loading) - actual work is in Client.ts
  • no WebSocket transport yet (documented as deferred in PLAN.md)
  • Config.ts has one try/catch for JSON.parse (could use Schema.parseJson)
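A minimal sketch of the Schema.parseJson alternative for Config.ts, assuming effect 3.10 or later (where Schema ships in the core effect package). McpServerConfig and its fields are hypothetical; the point is that malformed JSON and shape mismatches both arrive as a ParseError value instead of a thrown exception.

```typescript
import { Either, Schema } from "effect"

// Hypothetical shape for one MCP server entry; the real Config.ts schema may differ.
const McpServerConfig = Schema.Struct({
  command: Schema.String,
  args: Schema.optional(Schema.Array(Schema.String))
})

// parseJson composes JSON.parse with schema decoding: malformed JSON and
// wrong shapes both surface as a ParseError value, with no try/catch.
const decodeServerConfig = Schema.decodeUnknownEither(Schema.parseJson(McpServerConfig))

const summarizeConfig = (raw: string): string =>
  Either.match(decodeServerConfig(raw), {
    onLeft: (error) => `invalid: ${error._tag}`,
    onRight: (config) => `ok: ${config.command} (${config.args?.length ?? 0} arg(s))`
  })
```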

Verdict: Actually complete for stdio transport. Tool discovery, execution, resource reading all work. Tests verify timeout behavior and error handling.


Permissions (claimed 100%)

Spec Adherence: 8/10
Effect Idiomacy: 10/10
Test Quality: 9/10
Actually Complete: Yes for the decider; the prompter is a stub

Notable Good:

  • Decider.ts: pure logic, no I/O, proper separation of concerns
  • pattern matching with wildcard support
  • mode-based rules (plan denies writes, acceptEdits allows edits, bypassPermissions allows all)
  • PermissionDeciderService interface is clean
  • 45 tests in Context.test.ts, 39 in Rules.test.ts, 23 in Integration.test.ts
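The decider's shape (pure function, mode overrides, wildcard rules) can be sketched roughly as follows. It is a hypothetical illustration rather than Decider.ts: the mode names come from the audit, while the rule format, the tool names treated as mutating, first-match-wins ordering, and the "ask" fallback are all assumptions.

```typescript
// Mode names from the audit; everything else in this sketch is assumed.
type PermissionMode = "default" | "plan" | "acceptEdits" | "bypassPermissions"
type Decision = "allow" | "deny" | "ask"

interface Rule {
  readonly pattern: string // e.g. "Bash(git *)"; "*" matches any run of characters
  readonly decision: Decision
}

// Tools this sketch treats as file-mutating (assumed names).
const MUTATING_TOOLS: ReadonlySet<string> = new Set(["Edit", "Write"])

// Turn a wildcard pattern into an anchored RegExp, escaping every other metacharacter.
const wildcardToRegExp = (pattern: string): RegExp =>
  new RegExp(
    "^" +
      pattern
        .split("*")
        .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
        .join(".*") +
      "$"
  )

// Pure function: no I/O, so every branch is unit-testable with plain inputs.
const decide = (
  mode: PermissionMode,
  rules: ReadonlyArray<Rule>,
  tool: string,
  invocation: string
): Decision => {
  if (mode === "bypassPermissions") return "allow"
  if (MUTATING_TOOLS.has(tool)) {
    if (mode === "plan") return "deny"
    if (mode === "acceptEdits") return "allow"
  }
  // First matching rule wins; anything unmatched falls through to "ask".
  const match = rules.find((rule) => wildcardToRegExp(rule.pattern).test(invocation))
  return match === undefined ? "ask" : match.decision
}
```

Keeping the decision pure means the prompter only ever has to handle the "ask" result, which is why an auto-allow stub can sit behind it without touching any decision logic.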

Notable Bad:

  • PermissionPrompter is mostly stubs (AutoAllowPrompterLive just allows everything)
  • interactive prompting not implemented yet
  • no actual user interaction for “ask” decisions

Verdict: Decision logic is complete and correct. Prompting side is placeholder. Good enough for now since most tools run with bypass or auto-allow in autonomous mode.


Conversation (claimed 71%)

Spec Adherence: 7/10
Effect Idiomacy: 9/10
Test Quality: 8/10
Actually Complete: Partially

Files Checked:

  • ConversationRunner.ts (580 lines, main loop)
  • MessageParser.ts (152 lines, content normalization)
  • ContextBuilder.ts (192 lines, build request)
  • Streamer.ts (streaming + tool extraction)

Notable Good:

  • proper state machine for turn management
  • budget tracking (maxTurns, maxBudgetUsd, token accumulation)
  • permission denial tracking
  • tool execution pipeline with hooks
  • compaction logic with thresholds
  • 25 tests for MessageParser with property-based testing
  • 22 tests for ContextBuilder
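Budget tracking of the kind listed above reduces to accumulating usage per turn and checking limits before starting the next one. In this sketch only maxTurns and maxBudgetUsd come from the audit; per-million-token pricing, the stop-reason names, and the immutable state shape are assumptions, and any prices passed in are illustrative rather than real Haiku rates.

```typescript
// Hypothetical budget tracker; only maxTurns and maxBudgetUsd come from the audit.
interface BudgetLimits {
  readonly maxTurns?: number
  readonly maxBudgetUsd?: number
}

interface Usage {
  readonly inputTokens: number
  readonly outputTokens: number
}

// Assumed pricing model: USD per million tokens.
interface Pricing {
  readonly inputPerMTok: number
  readonly outputPerMTok: number
}

interface BudgetState {
  readonly turns: number
  readonly inputTokens: number
  readonly outputTokens: number
  readonly costUsd: number
}

type BudgetCheck =
  | { readonly ok: true }
  | { readonly ok: false; readonly reason: "max_turns" | "max_budget" }

const initialBudget: BudgetState = { turns: 0, inputTokens: 0, outputTokens: 0, costUsd: 0 }

// Accumulate one turn's usage without mutating the previous state.
const recordTurn = (state: BudgetState, usage: Usage, pricing: Pricing): BudgetState => ({
  turns: state.turns + 1,
  inputTokens: state.inputTokens + usage.inputTokens,
  outputTokens: state.outputTokens + usage.outputTokens,
  costUsd:
    state.costUsd +
    (usage.inputTokens * pricing.inputPerMTok + usage.outputTokens * pricing.outputPerMTok) / 1_000_000
})

// Checked before the next turn starts.
const checkBudget = (state: BudgetState, limits: BudgetLimits): BudgetCheck => {
  if (limits.maxTurns !== undefined && state.turns >= limits.maxTurns) return { ok: false, reason: "max_turns" }
  if (limits.maxBudgetUsd !== undefined && state.costUsd >= limits.maxBudgetUsd) return { ok: false, reason: "max_budget" }
  return { ok: true }
}
```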

Notable Bad:

  • ConversationRunner.ts has 2 as any casts for wire message types
  • some complex nested Effect.gen blocks (readability concern)
  • fallback model logic partially implemented

Verdict: 71% is accurate. Core loop works, streaming works, tool execution wired. Missing pieces are model fallback edge cases and some compaction scenarios.


Auth (claimed 46%)

Spec Adherence: 6/10
Effect Idiomacy: 9/10
Test Quality: 8/10
Actually Complete: Partially (broke the build)

Files Checked:

  • Flow.ts (orchestrates detector + validator + oauth)
  • Detector.ts (finds API keys from env/config)
  • Validator.ts (validates key format)
  • Errors.ts (discriminated error types)
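The detector's priority order (env var > settings > config files, per the Notable Good list below) can be sketched as a simple fall-through. The environment variable name ANTHROPIC_API_KEY, the input shape, and returning undefined (where the real flow would fail with NoKeyAvailableError) are assumptions for illustration.

```typescript
// Hypothetical detector following the audit's order: env var > settings > config files.
type KeySource = "env" | "settings" | "config"

interface DetectedKey {
  readonly key: string
  readonly source: KeySource
}

interface DetectorInputs {
  readonly env: Readonly<Record<string, string | undefined>>
  readonly settingsKey?: string
  readonly configFileKeys: ReadonlyArray<string | undefined> // in config-file search order
}

// Blank or whitespace-only values count as "not set".
const nonEmpty = (value: string | undefined): value is string =>
  value !== undefined && value.trim() !== ""

const detectApiKey = (inputs: DetectorInputs): DetectedKey | undefined => {
  const fromEnv = inputs.env["ANTHROPIC_API_KEY"] // assumed variable name
  if (nonEmpty(fromEnv)) return { key: fromEnv, source: "env" }
  if (nonEmpty(inputs.settingsKey)) return { key: inputs.settingsKey, source: "settings" }
  const fromConfig = inputs.configFileKeys.find(nonEmpty)
  return fromConfig !== undefined ? { key: fromConfig, source: "config" } : undefined
}
```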

Notable Good:

  • proper error hierarchy (NoKeyAvailableError, ApiKeyInvalidError, OAuthFailedError)
  • all extend Schema.TaggedError with a discriminated code field
  • detector has priority order: env var > settings > config files
  • 22 tests for error handling, 12 for validator
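A sketch of how a Schema.TaggedError hierarchy with a discriminated code field pays off at the call site, assuming effect 3.10 or later. The class names match the audit; the code literals, the searched field, the sk- prefix check, and validateKey itself are hypothetical.

```typescript
import { Effect, Schema } from "effect"

// Class names from the audit; fields and code literals are hypothetical.
class NoKeyAvailableError extends Schema.TaggedError<NoKeyAvailableError>()("NoKeyAvailableError", {
  code: Schema.Literal("NO_KEY"),
  searched: Schema.Array(Schema.String)
}) {}

class ApiKeyInvalidError extends Schema.TaggedError<ApiKeyInvalidError>()("ApiKeyInvalidError", {
  code: Schema.Literal("MALFORMED", "REVOKED")
}) {}

// Hypothetical validator: failures are typed values, not thrown exceptions.
const validateKey = (
  key: string | undefined
): Effect.Effect<string, NoKeyAvailableError | ApiKeyInvalidError> =>
  key === undefined
    ? Effect.fail(new NoKeyAvailableError({ code: "NO_KEY", searched: ["env", "settings", "config"] }))
    : key.startsWith("sk-")
      ? Effect.succeed(key)
      : Effect.fail(new ApiKeyInvalidError({ code: "MALFORMED" }))

// catchTag narrows on _tag, so each handler sees the concrete error class.
// After both handlers the error channel is never, so runSync is safe here.
const describeKey = (key: string | undefined): string =>
  validateKey(key).pipe(
    Effect.map((k) => `valid:${k.length}`),
    Effect.catchTag("ApiKeyInvalidError", (e) => Effect.succeed(`invalid:${e.code}`)),
    Effect.catchTag("NoKeyAvailableError", (e) => Effect.succeed(`missing:${e.searched.length}`)),
    Effect.runSync
  )
```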

Notable Bad:

  • BROKEN BUILD: bin.ts tries to use AuthFlowLive but doesn’t provide its HttpClient/SessionResume requirements
  • OAuth flow not fully wired (placeholder in some paths)
  • status reporter interface defined but not fully implemented

Verdict: 46% seems right. The pieces exist, but integration is incomplete. The type error is fixable (it only needs a layer composition adjustment), but it’s blocking.
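The bin.ts failure is the standard case of a layer whose requirements were never satisfied. The sketch below reproduces it with hypothetical stand-ins (HttpClientLike, SessionResume, and AuthFlow with invented shapes), assuming effect 3.x; in a real fix, whatever HttpClient layer bin.ts adopts would take the place of HttpClientStub.

```typescript
import { Context, Effect, Layer } from "effect"

// Hypothetical stand-ins: names mirror the audit, service shapes are invented.
class HttpClientLike extends Context.Tag("HttpClientLike")<
  HttpClientLike,
  { readonly get: (url: string) => Effect.Effect<string> }
>() {}

class SessionResume extends Context.Tag("SessionResume")<
  SessionResume,
  { readonly lastSessionId: Effect.Effect<string | undefined> }
>() {}

class AuthFlow extends Context.Tag("AuthFlow")<AuthFlow, { readonly resolveKey: Effect.Effect<string> }>() {}

// Layer<AuthFlow, never, HttpClientLike | SessionResume>: the unmet
// requirements are part of the type, which is what tsc flagged in bin.ts.
const AuthFlowLive = Layer.effect(
  AuthFlow,
  Effect.gen(function* () {
    const http = yield* HttpClientLike
    const resume = yield* SessionResume
    return {
      resolveKey: Effect.gen(function* () {
        const prior = yield* resume.lastSessionId
        const key = yield* http.get("https://auth.example.invalid/key")
        return `${key}:${prior ?? "fresh"}`
      })
    }
  })
)

const HttpClientStub = Layer.succeed(HttpClientLike, { get: (_url: string) => Effect.succeed("key") })
const SessionResumeStub = Layer.succeed(SessionResume, { lastSessionId: Effect.succeed(undefined) })

// Providing both dependencies closes the layer: Layer<AuthFlow, never, never>.
const MainLayer = AuthFlowLive.pipe(Layer.provide(Layer.merge(HttpClientStub, SessionResumeStub)))

const program = Effect.gen(function* () {
  const auth = yield* AuthFlow
  return yield* auth.resolveKey
})

const result = Effect.runSync(Effect.provide(program, MainLayer))
```

Removing the Layer.provide call makes the runSync line fail to type-check, because runSync only accepts effects whose requirements are never. That is the same class of error tsc reported for bin.ts.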


Effect Idiom Compliance

Violations Found: 0 critical, 3 minor

Good Patterns Observed:

  • Context.Tag usage: consistent across all services
  • Effect.gen: used correctly, no async/await mixing
  • Schema.TaggedError: 44 error types, all properly structured
  • Layer.effect / Layer.succeed: correct layer construction
  • Effect.provide chains: proper dependency injection
  • Stream usage: NDJSON parsing, tool output, wire messages
  • No runPromise in src/ (only in tests, which is correct)
  • No catchAll swallowing errors silently

Minor Issues:

  • as any usage: 19 occurrences (mostly for wire protocol type casts, acceptable)
  • try/catch blocks: 20 occurrences (mostly in I/O boundary layers like Config.ts, Session/Checkpoint.ts)
  • node:fs mixing: Checkpoint.ts uses native fs instead of @effect/platform FileSystem (pragmatic but breaks abstraction)

Canonical Pattern Adherence: checked against ~/git_forks/effect - service definitions, error handling, and layer composition match official Effect patterns.


Architectural Concerns

Layer Dependency Graph

checked src/Layers.ts - clean separation:

SdkEngine
└─ ConversationRunner
   ├─ Streamer → ApiClient
   ├─ ToolExecutor → PermissionDecider + PermissionPrompter + HookExecutor
   ├─ Compaction
   └─ ApiClient

no cycles detected in the dependency graph. the one build error is a missing layer provision, not a structural issue.
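A check like the one behind "no cycles detected" can be sketched as a depth-first search over the tree above. The audit doesn't say how it performed the check; this is an illustrative DFS with edges transcribed from the diagram, not a tool from the repo.

```typescript
// Adjacency list: service -> services it depends on.
type Graph = Readonly<Record<string, ReadonlyArray<string>>>

// Returns the first cycle found (e.g. ["A", "B", "A"]), or undefined if acyclic.
const findCycle = (graph: Graph): ReadonlyArray<string> | undefined => {
  const state = new Map<string, "visiting" | "done">()
  const stack: string[] = []

  const visit = (node: string): ReadonlyArray<string> | undefined => {
    const s = state.get(node)
    if (s === "done") return undefined
    // Reaching a node still on the stack means we walked back into our own path.
    if (s === "visiting") return [...stack.slice(stack.indexOf(node)), node]
    state.set(node, "visiting")
    stack.push(node)
    for (const dep of graph[node] ?? []) {
      const cycle = visit(dep)
      if (cycle) return cycle
    }
    stack.pop()
    state.set(node, "done")
    return undefined
  }

  for (const node of Object.keys(graph)) {
    const cycle = visit(node)
    if (cycle) return cycle
  }
  return undefined
}

// Edges transcribed from the layer tree above; leaf services have no entry.
const layerGraph: Graph = {
  SdkEngine: ["ConversationRunner"],
  ConversationRunner: ["Streamer", "ToolExecutor", "Compaction", "ApiClient"],
  Streamer: ["ApiClient"],
  ToolExecutor: ["PermissionDecider", "PermissionPrompter", "HookExecutor"]
}
```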

Wire Protocol

Schema/Wire.ts defines a unified WireMessage union with proper discriminated types. protocol layer uses Stream for message passing. clean.
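The value of a discriminated union shows up at the JSON boundary, where most of the as any casts cluster. This plain-TypeScript sketch is not the Schema-based definition in Schema/Wire.ts (the audit doesn't list its variants); it uses three hypothetical variants to show a type guard standing in for a cast and an exhaustive switch over the tag.

```typescript
// Hypothetical variants; the real WireMessage members are not listed in the audit.
type WireMessage =
  | { readonly type: "user"; readonly text: string }
  | { readonly type: "assistant"; readonly text: string }
  | { readonly type: "result"; readonly isError: boolean }

const isRecord = (u: unknown): u is Record<string, unknown> =>
  typeof u === "object" && u !== null

// Type guard at the serialization boundary: narrows unknown JSON without `as any`.
const isWireMessage = (u: unknown): u is WireMessage => {
  if (!isRecord(u)) return false
  switch (u.type) {
    case "user":
    case "assistant":
      return typeof u.text === "string"
    case "result":
      return typeof u.isError === "boolean"
    default:
      return false
  }
}

// Exhaustive handling: adding a variant without a case becomes a compile error.
const render = (msg: WireMessage): string => {
  switch (msg.type) {
    case "user":
      return `> ${msg.text}`
    case "assistant":
      return msg.text
    case "result":
      return msg.isError ? "[error]" : "[done]"
    default: {
      const unreachable: never = msg
      return unreachable
    }
  }
}

// Note: JSON.parse still throws on malformed input; only the shape check is shown here.
const parseWireLine = (line: string): WireMessage | undefined => {
  const parsed: unknown = JSON.parse(line)
  return isWireMessage(parsed) ? parsed : undefined
}
```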

Test Architecture

  • 104 test files, 2044 passing tests, 9 skipped, 6 todo
  • uses @effect/vitest for most tests (96+ files)
  • property-based testing with fast-check in MessageParser, Tool/Schema
  • test isolation: beforeEach/afterEach for env vars, unique session IDs
  • golden tests for harness validation (113 passing in Harness/Golden.test.ts)

test quality is high. not just unit tests, actual property-based and integration tests.


Code Smells & Anti-Patterns

Found: 2 minor smells

  1. Type Casts (as any): 19 occurrences

    • src/Agent/ConversationRunner.ts (2 occurrences) - wire message type gymnastics
    • src/Protocol/Stdio.ts (6 occurrences) - serialization boundary
    • most are at boundaries where Effect’s strict types clash with JSON serialization
    • Verdict: acceptable, confined to wire protocol layer
  2. Mixed Abstraction Levels: Checkpoint.ts uses native node:fs alongside @effect/platform

    • Why: pragmatic for recursive directory walking
    • Impact: slightly reduces testability, but checkpoint tests pass
    • Verdict: technical debt but not blocking

Not Found:

  • runPromise misuse ✓
  • catchAll swallowing errors ✓
  • async/await mixing with Effect.gen ✓
  • circular dependencies ✓
  • empty catch blocks ✓

Recommendations

What to Fix Before Shipping

  1. CRITICAL: Fix bin.ts layer composition

    • Add HttpClient.layer to MainLayer
    • Resolve SessionResume dependency (either provide or remove from AuthFlow requirements)
    • Should take 5 minutes
  2. Auth Flow Completion (46% → 80%)

    • Wire OAuth flow end-to-end
    • Implement AuthStatusReporter properly
    • Test auth failures and retries
  3. Conversation Runner Polish (71% → 85%)

    • Implement model fallback on rate limit
    • Test compaction edge cases
    • Replace as any casts with proper branded type helpers
  4. Type Debt Resolution

    • Replace as any with proper type guards
    • Consider Schema.parseJson instead of JSON.parse + try/catch
    • Fix SessionId branding to avoid casts
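For items 3 and 4, a branded SessionId with a single validating constructor removes the need for casts at call sites. The brand field, the UUID format, and the sessions/<id>.jsonl path are assumptions for illustration; Effect's Schema.brand covers the same idea with decoding built in.

```typescript
// Hypothetical brand: a string that has passed validation once.
type SessionId = string & { readonly __brand: "SessionId" }

// Assumed format: UUID-like ids.
const SESSION_ID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i

// The only place a string becomes a SessionId; the one cast lives here, after validation.
const toSessionId = (raw: string): SessionId | undefined =>
  SESSION_ID_PATTERN.test(raw) ? (raw as SessionId) : undefined

// Call sites can demand a validated id; passing a bare string fails to compile.
const sessionPath = (id: SessionId): string => `sessions/${id}.jsonl`
```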

What’s Actually Ready

  • Session module: ship it
  • MCP module: ship stdio transport, defer websocket
  • Permission decider: ship it, document prompter as auto-allow
  • Tool executor: ship it (used in 2044 passing tests)
  • Wire protocol: ship it
  • Test infrastructure: ship it (great foundation)

Process Improvements for Next Ralph Session

  1. Break build on type errors: run tsc --noEmit in a pre-commit hook
  2. Require layer composition tests: catch missing dependencies early
  3. Document incomplete integrations: auth flow was marked 46% but still merged into bin.ts
  4. Use TYPE_DEBT.md: haikus left it empty, but bin.ts has a known issue that should be logged there

Acknowledgments & Specific Wins

this is the good part. where the haikus actually crushed it:

MCP Client Implementation

whoever wrote src/Mcp/Client.ts (855 lines): absolute clinic on Effect + JSON-RPC. proper use of Mailbox for stdin, Deferred for request correlation, Stream for stdout. timeout handling is canonical-correct. error types are discriminated. scope management is clean. this is production-grade code.
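The request-correlation idea behind Client.ts can be shown without Effect: give each outgoing request an id, write it as one JSON line, and route each complete stdout line back to the waiter for that id. This sketch swaps in a Map of plain callbacks for the Mailbox/Deferred pair the audit describes, and it trusts the server's response shape instead of validating it.

```typescript
// Minimal JSON-RPC 2.0 response shape (per the JSON-RPC spec); validation omitted.
interface RpcResponse {
  readonly jsonrpc: "2.0"
  readonly id: number
  readonly result?: unknown
  readonly error?: { readonly code: number; readonly message: string }
}

class NdjsonRpcPeer {
  private buffer = ""
  private nextId = 1
  private readonly pending = new Map<number, (response: RpcResponse) => void>()
  private readonly writeLine: (line: string) => void

  constructor(writeLine: (line: string) => void) {
    this.writeLine = writeLine
  }

  // Send a request and remember which callback owns its id.
  request(method: string, params: unknown, onResponse: (response: RpcResponse) => void): number {
    const id = this.nextId++
    this.pending.set(id, onResponse)
    this.writeLine(JSON.stringify({ jsonrpc: "2.0", id, method, params }) + "\n")
    return id
  }

  // Feed raw stdout chunks; complete lines are parsed and routed by id.
  // A trailing partial line stays buffered until its newline arrives.
  feed(chunk: string): void {
    this.buffer += chunk
    let newline = this.buffer.indexOf("\n")
    while (newline !== -1) {
      const line = this.buffer.slice(0, newline).trim()
      this.buffer = this.buffer.slice(newline + 1)
      if (line !== "") this.route(JSON.parse(line) as RpcResponse)
      newline = this.buffer.indexOf("\n")
    }
  }

  get inFlight(): number {
    return this.pending.size
  }

  private route(message: RpcResponse): void {
    const handler = this.pending.get(message.id)
    if (handler === undefined) return // notification or unknown id: ignored in this sketch
    this.pending.delete(message.id)
    handler(message)
  }
}
```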

Test Coverage Breadth

2044 passing tests is not theater. that’s:

  • 104 test files
  • property-based tests with fast-check
  • @effect/vitest integration
  • golden tests for protocol validation
  • integration tests for tool execution
  • isolated env var tests

this is the kind of coverage that prevents regressions. someone actually gave a shit.

Session Checkpointing

src/Session/Checkpoint.ts implements file snapshots with:

  • checkpoint before every write tool
  • restore with dry_run support
  • tracks created/modified/deleted files
  • proper error handling (CheckpointError with reasons)

this is a hard problem and they nailed it.
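The created/modified/deleted bookkeeping is the core of any checkpoint. This hypothetical sketch works on in-memory snapshots (path to content hash) rather than the filesystem that Checkpoint.ts actually walks, and planRestore only describes what a dry_run restore would report.

```typescript
// Hypothetical snapshot: path -> content hash, taken before a write tool runs.
type Snapshot = ReadonlyMap<string, string>

interface CheckpointDiff {
  readonly created: ReadonlyArray<string>
  readonly modified: ReadonlyArray<string>
  readonly deleted: ReadonlyArray<string>
}

// Classify every path by comparing the pre-write snapshot with the current state.
const diffSnapshots = (before: Snapshot, after: Snapshot): CheckpointDiff => {
  const created: string[] = []
  const modified: string[] = []
  const deleted: string[] = []
  after.forEach((hash, path) => {
    const previous = before.get(path)
    if (previous === undefined) created.push(path)
    else if (previous !== hash) modified.push(path)
  })
  before.forEach((_hash, path) => {
    if (!after.has(path)) deleted.push(path)
  })
  return { created: created.sort(), modified: modified.sort(), deleted: deleted.sort() }
}

// Dry run: describe the rewind without touching anything. Created files would be
// removed; modified and deleted files would be restored from the snapshot.
const planRestore = (diff: CheckpointDiff): ReadonlyArray<string> => [
  ...diff.created.map((p) => `remove ${p}`),
  ...diff.modified.map((p) => `restore ${p}`),
  ...diff.deleted.map((p) => `recreate ${p}`)
]
```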

Empty TYPE_DEBT.md

haikus cleaned up after themselves. no “TODO: fix this later” garbage. every type issue they hit, they either resolved or documented in git history.

Consistent Patterns

every service follows the same structure:

  1. Error types (Schema.TaggedError)
  2. Service interface
  3. Context.Tag
  4. Implementation
  5. Layer (Live + Stub)

this consistency makes the codebase navigable. you can grep for patterns and find what you need.


Closing Thoughts

The Goal Was: autonomous haiku agents implement 3 modules (session, mcp, permissions) + 71% of conversation + 46% of auth.

What They Delivered: exactly that, plus 2044 passing tests, proper effect-ts idioms, and a build that’s one layer fix away from working.

Was It Worth $10.42/hour? (assuming 30 agents × 24 hours at haiku pricing): absolutely. this is legit foundation work. not perfect, but way better than “senior engineer speed-running without tests.”

The One Blocker: bin.ts type error is fixable in 5 minutes. just add HttpClient.layer and sort out SessionResume.

Would I Ship It? not yet. but i’d merge it to a feature branch and fix the auth integration. the core is solid.

Grade Breakdown:

  • Code quality: A- (proper patterns, clean separation)
  • Test coverage: A (2044 tests, property-based, integration)
  • Completeness: B (3/6 modules done, 2 partial, 1 broken)
  • Architecture: A (no cycles, clean layers, good boundaries)
  • Documentation: B+ (good comments, spec citations, missing some context)

Final Score: 83/100 (B+)

this is what autonomous agents look like when they work. not perfect, but genuinely productive.


Confidence: 90% - i verified implementation against Effect patterns, read the actual code, checked tests, confirmed no anti-patterns. the build error is real (i saw tsc output), but fixable.

Assumptions:

  • i assumed the haikus followed the ralph loop protocol (spec interview → implementation → test)
  • i didn’t verify every single commit, just spot-checked key files and ran the test suite
  • i trust that 2044 passing tests means the code actually works at runtime (tests are comprehensive)

What I Don’t Know:

  • whether the binary works after fixing bin.ts (can’t run it due to type error)
  • if MCP actually connects to real servers (tests use stubs)
  • if auth OAuth flow works end-to-end (partially implemented)