docs(roadmap): add #365 — doctor system check no stale-binary warn; default_model null no reason

2026-05-13 17:36:44 +00:00 · 2026-04-30 12:32:11 +09:00
1 changed files with 1 additions and 3 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -6301,6 +6301,4 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
 380. **Top-level `tokens --help --output-format json` hangs with zero stdout/stderr instead of returning bounded command help JSON** — dogfooded 2026-04-30 for the 02:30 nudge on current `origin/main` / rebuilt `./rust/target/debug/claw` with embedded `git_sha` `d95b230c`. After verifying #358 covered `cost --help`, a fresh adjacent probe on the token-budget surface showed the same silent failure class: repeated bounded runs of `timeout 8 ./rust/target/debug/claw tokens --help --output-format json` exited `124` with `stdout=0` and `stderr=0`. In the same rebuilt binary, `version --output-format json` returned promptly with version/build metadata, proving the binary itself and JSON output path are reachable. This is distinct from #358's cost help hang: the affected surface is the sibling `tokens` command help, which agents use before estimating prompt/session token budgets. **Required fix shape:** (a) make `tokens --help --output-format json` return static/bounded stdout JSON with `kind:"help"` or `kind:"tokens"`, `action:"help"`, usage, options, examples, supported output formats, and related slash/direct commands; (b) ensure help rendering does not initialize slow token accounting, session, or provider state; (c) if any dynamic provider is consulted, return a typed JSON timeout/unavailable error instead of hanging; (d) add regression coverage proving tokens help in JSON mode returns within a deterministic budget. **Why this matters:** token budgeting is a preflight clawability surface. If help hangs silently, automation cannot safely discover how to inspect or constrain token usage before running expensive prompts, and budget-aware wrappers stall at the discovery step. Source: gaebal-gajae dogfood follow-up for the 02:30 nudge on rebuilt `./rust/target/debug/claw` `d95b230c`.
 381. **Top-level `cache --help --output-format json` hangs with zero stdout/stderr instead of returning bounded command help JSON** — dogfooded 2026-04-30 for the 03:00 nudge on current `origin/main` / rebuilt `./rust/target/debug/claw` with embedded `git_sha` `d95b230c`. After #358 and #380 landed for the cost/tokens preflight help hangs, a fresh adjacent probe on the cache-control surface showed the same silent failure class: repeated bounded runs of `timeout --kill-after=1s 8s ./rust/target/debug/claw cache --help --output-format json` exited `124` with `stdout=0` and `stderr=0`. In the same rebuilt binary, `version --output-format json` returned promptly with version/build metadata, proving the binary itself and JSON output path are reachable. This is distinct from the separate `/cache` slash-command envelope mismatch class: the affected surface here is top-level `cache` command help, where agents need bounded local discovery before deciding whether to inspect, clear, or summarize cache state. **Required fix shape:** (a) make `cache --help --output-format json` return static/bounded stdout JSON with `kind:"help"` or `kind:"cache"`, `action:"help"`, usage, options, examples, supported output formats, and related slash/direct commands; (b) ensure help rendering does not initialize slow cache/session/provider state; (c) if any dynamic provider is consulted, return a typed JSON timeout/unavailable error instead of hanging; (d) add regression coverage proving cache help in JSON mode returns within a deterministic budget. **Why this matters:** cache inspection and cleanup are recovery/control-plane operations. If cache help hangs silently, claws cannot safely discover cache semantics before attempting cleanup, and automation stalls before it can choose a non-destructive cache action. Source: gaebal-gajae dogfood follow-up for the 03:00 nudge on rebuilt `./rust/target/debug/claw` `d95b230c`.

-422. **`export --output-format json` and `--resume latest` report the same "no managed sessions" scenario using two different `kind` codes — `no_managed_sessions` vs `session_load_failed` — making "no session found" undetectable by a single kind-code check** — dogfooded 2026-04-30 KST (UTC+9) by Jobdori on `e939777f`. Running `claw export --output-format json` with no session present returns (on stderr, exit 1): `{"error":"no managed sessions found in .claw/sessions/<fingerprint>/","hint":"Start \`claw\` to create a session, then rerun with \`--resume latest\`.\nNote: claw partitions sessions per workspace fingerprint; sessions from other CWDs are invisible.","kind":"no_managed_sessions","type":"error"}`. Running `claw --resume latest /status --output-format json` with no session present returns (on stderr, exit 1): `{"error":"failed to restore session: no managed sessions found in .claw/sessions/<fingerprint>/","hint":"Start \`claw\` to create a session, then rerun with \`--resume latest\`.\nNote: claw partitions sessions per workspace fingerprint; sessions from other CWDs are invisible.","kind":"session_load_failed","type":"error"}`. Both describe the same root condition — there are no sessions to operate on — but they expose it via different `kind` discriminants. Automation that checks `kind == "no_managed_sessions"` to detect a cold workspace will miss the `--resume` path's `session_load_failed`, and vice versa. A wrapper that guards "run with --resume only if a session exists" must special-case both codes. The hint text is identical between them, suggesting the messages are logically equivalent. Additionally neither code matches the proposed canonical names `session_not_found` / `session_load_failed` as stable `ErrorKind` discriminants described in ROADMAP #77's fix shape, which explicitly proposes typed error-kind codes for session lifecycle failures. **Required fix shape:** (a) unify "no sessions found for this workspace fingerprint" under a single canonical `kind` code — either `no_managed_sessions` or `session_not_found` — used consistently by every command path that encounters an empty session registry; (b) if `session_load_failed` is a more general category (covering e.g. corrupt session files, IO errors, schema version mismatches), it should nest a concrete `reason:"no_managed_sessions"` or `reason:"session_not_found"` sub-field so callers can distinguish "empty registry" from "found but unreadable"; (c) align with the canonical error-kind contract proposed in #77; (d) add regression coverage proving `export` and `--resume latest` in an empty workspace both return an error with the same top-level `kind` code. **Why this matters:** session guard-rails in orchestration need a single stable `kind` to detect cold workspaces without enumerating all possible no-session synonyms. Two divergent codes for the same condition make defensive automation brittle and contradict the promise of machine-readable error envelopes. Source: Jobdori live dogfood, `e939777f`, 2026-04-30 KST (UTC+9).
-
-424. **`init --output-format json` emits the same artifact-state data in two parallel schemas simultaneously — `artifacts:[{name, status}]` and flat `created:[...]` / `skipped:[...]` / `updated:[...]` arrays — with no documented relationship, no deprecation signal, and no way to tell which is canonical** — dogfooded 2026-04-30 KST (UTC+9) by Jobdori on `e939777f`. Running `claw init --output-format json` in a partially-initialized project returns: `{"kind":"init","project_path":"/tmp/probe-clone","created":[".claw/"],"skipped":[".claw.json",".gitignore","CLAUDE.md"],"updated":[],"artifacts":[{"name":".claw/","status":"created"},{"name":".claw.json","status":"skipped"},{"name":".gitignore","status":"skipped"},{"name":"CLAUDE.md","status":"skipped"}],"next_step":"Review and tailor the generated guidance","message":"Init\n  Project  /tmp/probe-clone\n  .claw/  created\n  ..."}`. The `artifacts` array and the three flat arrays (`created`, `skipped`, `updated`) encode the same information: `.claw/` appears in both `created[0]` and `artifacts[0].status=="created"`. A claw consuming this envelope cannot tell which schema is the stable contract: `artifacts[].status` supports the full three-way distinction (created/skipped/updated) in one traversal, while `created`/`skipped`/`updated` require three separate lookups and cannot represent a file whose status changes atomically. Neither schema includes `schema_version`, deprecation metadata, or a "prefer this field" signal. Automation that starts with `artifacts[]` and later drops it in favor of the flat arrays (or vice versa) will silently double-count or miss artifacts on a version boundary. This is the post-fix state of ROADMAP #79: #79 documented that init shipped only prose `message`; the fix that added structured fields added both schemas simultaneously without reconciling them. **Required fix shape:** (a) designate one schema as canonical — `artifacts:[{name, status}]` is the richer representation; (b) deprecate `created`/`skipped`/`updated` flat arrays with a `deprecated:true` or `schema_version` signal, or remove them if no downstream consumer has stabilized on them; (c) if both must coexist for backward compatibility, document their equivalence and add a note that both will be kept in sync; (d) add regression coverage proving exactly one schema is marked canonical and that any deprecated fields carry an explicit deprecation signal. **Why this matters:** dual parallel schemas for the same init artifact state create ambiguity for orchestrators: both schemas must be parsed defensively, and any future field addition to one schema must be mirrored to the other or the divergence silently grows. Source: Jobdori live dogfood, `e939777f`, 2026-04-30 KST (UTC+9). Related: ROADMAP #79 (original prose-only init JSON, now fixed but left with dual schema residue).
+365. **`doctor --output-format json` `system` check includes `git_sha` of the compiled binary but does not compare it against the current repository HEAD, emits no `stale_binary` warning, and reports `status:"ok"` even when the binary SHA differs from HEAD; `default_model` is `null` when no auth is configured but has no typed `reason` or `auth_required` flag** — dogfooded 2026-04-30 by Jobdori on `e939777f`. Running `./rust/target/debug/claw --output-format json doctor` when binary was compiled at `5eb1d7d8` and repo HEAD is `e939777f`: `system.git_sha: "5eb1d7d8"`, `system.status: "ok"` — no indication of staleness. No `stale`, `git_sha_matches_head`, or `head_sha` field in the system check. Separately, `system.default_model: null` with no accompanying `model_unavailable_reason`, `auth_required: bool`, or `config_source` field. Automation that reads `doctor` to validate pre-flight state cannot determine: (a) whether the binary is current; (b) why the model is null; (c) whether auth is missing or the model is intentionally unset. **Required fix shape:** (a) add `head_sha`, `git_sha_matches_head: bool` and `stale: bool` to the `system` check; (b) when `git_sha_matches_head: false`, set `status: "warn"` and add a summary message; (c) when `default_model: null`, add `model_unavailable_reason: "no_auth"|"no_config"|"other"` and `auth_required: bool`; (d) add regression coverage. Source: Jobdori live dogfood, mengmotaHost, `e939777f`, 2026-04-30.