docs(roadmap): add subagent lane event conformance gap

docs(roadmap): add worker replay receipt integrity gap
docs(roadmap): add mcp tool bridge registry drift gap
2026-05-24 14:36:44 +00:00 · 2026-05-22 21:31:15 +00:00 · 2026-05-22 21:01:16 +00:00 · 2026-05-22 20:31:24 +00:00 · 2026-05-22 20:01:13 +00:00 · 2026-05-22 19:31:27 +00:00
1 changed files with 323 additions and 0 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -6428,3 +6428,326 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)


 450. **`prompt` emits `kind:"missing_credentials"` JSON on STDERR (not stdout), leaving stdout at 0 bytes — automation pattern `output=$(claw prompt hello --output-format json)` captures nothing on auth-absent failure; `doctor` correctly surfaces `auth.status:"warn"` with `api_key_present:false` but exposes no `prompt_ready:false` field that automation can check before invoking `prompt`** — dogfooded 2026-05-16 by Jobdori on `a35ee9a0` in response to Clawhip pinpoint nudge at `1505208225321062521`. Exact reproduction (isolated env, no creds, fresh git repo, HEAD `a35ee9a0`): `timeout 5 env -i HOME=$ISOLATED_HOME PATH=$PATH CLAW_CONFIG_HOME=$PROBE/.claw-cfg claw prompt hello --output-format json > stdout.txt 2> stderr.txt` → stdout = **0 bytes**, stderr = 195 bytes containing `{"error":"missing Anthropic credentials…","exit_code":1,"hint":null,"kind":"missing_credentials","type":"error"}`, exit code 1. Confirms Gaebal's `1505208553793781792` pinpoint that the prior behavior was a `prompt` timeout with zero bytes of output — HEAD `a35ee9a0` now correctly exits 1 with `kind:"missing_credentials"` **but the envelope is still routed to stderr** (issue #447 class, same class as prior entries #422, #435). **Contrast with `doctor`:** `claw doctor --output-format json 2>/dev/null` succeeds to stdout with `checks[auth].status:"warn"`, `api_key_present:false`, `auth_token_present:false` — but the auth check has no `prompt_ready:false` field. Automation that gates on `doctor` before invoking `prompt` must re-derive readiness from `api_key_present || auth_token_present` (plus legacy credential fields) — there is no single canonical boolean. **Three compound problems:** (a) **stdout-empty on `--output-format json` failure**: same class as #447; `prompt`'s error envelope goes to stderr, not stdout. The canonical automation idiom `if ! result=$(claw prompt "q" --output-format json); then echo "$result" | jq .kind; fi` sees `$result=""` on failure — the jq call gets nothing. 
All `--output-format json` error paths must route JSON to stdout per the #447 contract; (b) **`doctor` missing `prompt_ready` field**: `doctor --output-format json` already knows auth is absent (`api_key_present:false`) but surfaces no derived `prompt_ready:bool` or `prompt_blocked_reason:string` field. Automation must infer readiness from `api_key_present || auth_token_present || legacy_*_present` — a 5-field OR across current and legacy credential fields that is fragile as auth mechanisms evolve. A single `prompt_ready:false` (with `prompt_blocked_reason:"auth_missing"`) inside the `auth` check would give downstream a stable contract; (c) **`claw prompt` with no auth does no preflight and fires straight at the API**: the preflight check that `doctor` runs (auth discovery) is not reused by `prompt` to emit a fast typed error before attempting the network call. Both Gaebal's pinpoint (prompt hanging silently on older HEAD) and the current behavior (prompt hitting the auth gate after a brief API attempt) stem from the same root: prompt does not short-circuit at the point where `doctor` already knows auth is absent. If `doctor` can emit `kind:"doctor"` with `auth.status:"warn"` in ~20ms without a network call, `prompt` should emit `kind:"missing_credentials"` in the same window and output it to stdout. 
**Required fix shape:** (a) `prompt --output-format json` must write the `kind:"missing_credentials"` JSON envelope to **stdout**, not stderr — same fix as #447 for all error envelopes; (b) add `prompt_ready:bool` and `prompt_blocked_reason:string|null` to the `auth` check in `doctor --output-format json`; derive it as `api_key_present || auth_token_present || legacy_saved_oauth_present`; (c) `prompt` must run the credential preflight check (same codepath as doctor's auth check) before attempting any API call and emit `{"kind":"missing_credentials","prompt_blocked_reason":"auth_missing"}` on **stdout** with exit 1 if the check fails; (d) `--output-format json` stdout routing fix must cover: `prompt`, `session list` (cross-ref #449), `skills uninstall` (cross-ref #431), `resume` (cross-ref #435), `acp serve` (cross-ref #443) — the full `kind:"missing_credentials"` class; (e) regression test: `claw prompt hello --output-format json` with no creds writes JSON to stdout (0 bytes stderr), exits 1, `kind:"missing_credentials"`, in under 200ms (no network attempt). **Why this matters:** `prompt` is the primary consumer entry point. Auth-absent failure routing to stderr breaks every automation wrapper that captures `$(claw prompt ... --output-format json)`. The `doctor` preflight metadata gap means auth-readiness checks require parsing 5 legacy fields instead of reading one boolean. Cross-references #447 (all JSON error envelopes on stderr), #449 (session list hits auth gate), #431 (skills uninstall hits auth gate), #357 (auth gate on local ops cluster), #422 (exit-code parity). Source: Jobdori live dogfood, `a35ee9a0`, 2026-05-16.
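A minimal Python sketch of fixes (a) and (b), assuming the field names quoted above; `derive_prompt_readiness`, `route_error`, and `emit_error` are hypothetical helpers, not claw's actual code. Readiness is an OR over credential sources, and in JSON mode the error envelope goes to stdout.

```python
import json
import sys

# Hypothetical sketch of the proposed contract; field names come from this roadmap entry.
# Any single credential source is enough to attempt a prompt.
CREDENTIAL_FIELDS = ("api_key_present", "auth_token_present", "legacy_saved_oauth_present")


def derive_prompt_readiness(auth_check: dict) -> dict:
    """Proposed prompt_ready / prompt_blocked_reason pair for doctor's auth check."""
    ready = any(bool(auth_check.get(field)) for field in CREDENTIAL_FIELDS)
    return {
        "prompt_ready": ready,
        "prompt_blocked_reason": None if ready else "auth_missing",
    }


def route_error(envelope: dict, output_format: str) -> tuple:
    """Pick the stream for an error envelope: stdout in JSON mode (per #447), stderr in text mode."""
    if output_format == "json":
        return "stdout", json.dumps(envelope)
    return "stderr", f"error: {envelope.get('error', envelope['kind'])}"


def emit_error(envelope: dict, output_format: str) -> int:
    """Write the envelope to the routed stream and return the process exit code."""
    stream, text = route_error(envelope, output_format)
    print(text, file=sys.stdout if stream == "stdout" else sys.stderr)
    return int(envelope.get("exit_code", 1))
```

With this routing, `result=$(claw prompt "q" --output-format json)` would capture the envelope on failure, so `jq .kind` sees `missing_credentials` instead of an empty string.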
+
+451. **Dogfood automation can silently run probes in the wrong repository/worktree when adjacent checkouts share similarly named binaries and stale build artifacts, so reports may mix evidence from Clawdbot/OMX with claw-code** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 12:00/12:30 UTC nudge cycle. While following a tmux-hook JSON lifecycle probe, the shell reported `/home/bellman/clawd` as the top-level worktree and executed `node dist/cli/omx.js ...` from Clawdbot/OMX artifacts instead of a claw-code checkout; a later correction found the actual claw-code repos under `/home/bellman/Workspace/claw-code-*`, including one `main` checkout hundreds of commits behind `origin/main`. The transcript therefore briefly contained plausible-looking CLI evidence from the wrong product tree before git provenance checks caught it. **Required fix shape:** (a) before dogfood probes, emit a mandatory machine-readable provenance preflight with repo root, remote URL, branch, HEAD, upstream HEAD, ahead/behind counts, binary path, and embedded build SHA when available; (b) make report templates include this provenance block before any command evidence; (c) warn or block when the requested product name does not match the remote/package/binary identity, or when the checkout is behind the target upstream by a configured threshold; (d) add regression coverage around multi-worktree/multi-product environments proving dogfood harnesses cannot silently attribute evidence from a neighboring repo or stale artifact. **Why this matters:** stale-branch confusion is not just a git annoyance; it corrupts the evidence chain. Claws can land or report fixes against the wrong codebase if the harness does not prove repo and binary identity before probing. Source: gaebal-gajae dogfood response to Clawhip messages `1506265193863446711` and `1506272743895728249` on 2026-05-19.
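A rough Python sketch of the provenance preflight in (a) and the identity/staleness gate in (c). It shells out to standard git plumbing (`git rev-parse`, `git remote get-url`, `git rev-list --left-right --count`); the record keys, the `max_behind` default, and matching the product name against the remote URL are illustrative assumptions, and binary path / embedded build SHA are omitted for brevity.

```python
import subprocess
from typing import Optional


def _git(repo: str, *args: str) -> Optional[str]:
    """Run `git -C repo ...`; return stripped stdout, or None when git reports an error."""
    proc = subprocess.run(["git", "-C", repo, *args], capture_output=True, text=True)
    return proc.stdout.strip() if proc.returncode == 0 else None


def parse_ahead_behind(counts: Optional[str]) -> tuple:
    """Parse `git rev-list --left-right --count HEAD...@{upstream}` output: ahead, then behind."""
    if not counts:
        return (None, None)
    ahead, behind = counts.split()
    return (int(ahead), int(behind))


def collect_provenance(repo: str) -> dict:
    """Gather the provenance block that should precede any command evidence in a report."""
    ahead, behind = parse_ahead_behind(
        _git(repo, "rev-list", "--left-right", "--count", "HEAD...@{upstream}")
    )
    return {
        "repo_root": _git(repo, "rev-parse", "--show-toplevel"),
        "remote_url": _git(repo, "remote", "get-url", "origin"),
        "branch": _git(repo, "rev-parse", "--abbrev-ref", "HEAD"),
        "head": _git(repo, "rev-parse", "--short", "HEAD"),
        "upstream_head": _git(repo, "rev-parse", "--short", "@{upstream}"),
        "ahead": ahead,
        "behind": behind,
    }


def provenance_problems(record: dict, expected_product: str, max_behind: int = 50) -> list:
    """List reasons evidence from this checkout should not be attributed to expected_product.

    Assumption: product identity is checked by substring match on the remote URL.
    """
    problems = []
    remote = record.get("remote_url") or ""
    if expected_product not in remote:
        problems.append(f"remote {remote!r} does not identify {expected_product}")
    behind = record.get("behind")
    if behind is None:
        problems.append("no upstream tracking branch; staleness unknown")
    elif behind > max_behind:
        problems.append(f"checkout is {behind} commits behind upstream (limit {max_behind})")
    return problems
```

Substring matching on the remote URL is deliberately crude (a fork named `claw-code-fork` would pass); a stricter harness would compare against an allowlist of canonical remotes.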
+
+
+452. **Validated claw-code checkouts can have no runnable local debug binary, and the failure is a raw shell `No such file or directory` instead of a typed build/provenance preflight result** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 13:00 UTC nudge after applying the provenance guard from #451. The corrected claw-code checkout was `/home/bellman/Workspace/claw-code-pr2967` with remote `https://github.com/ultraworkers/claw-code.git`, branch `docs/roadmap-workdir-provenance`, HEAD `6183d95`, upstream `origin/main` at `f8e1bb7`, and ahead/behind `1/0`. The first real probe then failed before reaching claw-code logic: `timeout --kill-after=1s 8s ./rust/target/debug/claw plugins list --output-format json` exited `127` with stderr `timeout: failed to run command './rust/target/debug/claw': No such file or directory` and empty stdout. This is distinct from stale-binary mismatch: here the selected checkout is identifiable, but there is no built binary and no canonical instruction/result telling automation whether to build, locate an installed `claw`, or stop. **Required fix shape:** (a) provide a canonical `claw dogfood preflight --output-format json` or equivalent script that checks expected binary paths, installed binary fallback, embedded build SHA, workspace HEAD, and build freshness before any product probe; (b) when the expected local binary is absent, return a typed result such as `kind:"dogfood_preflight"`, `binary_status:"missing"`, `expected_path`, `recommended_build_command`, and `can_use_installed_binary:false|true`; (c) integrate the preflight into dogfood report templates so a missing build artifact is reported as startup friction, not a raw shell 127; (d) add regression/fixture coverage for missing binary, stale binary, matching debug binary, and installed-binary fallback cases. **Why this matters:** after #451 proves the repo is right, claws still need to prove the executable exists and corresponds to that repo. 
A raw shell missing-file error wastes a nudge cycle and tempts operators to run whatever stale binary happens to be nearby. Source: gaebal-gajae dogfood response to Clawhip message `1506280293831544997` on 2026-05-19.
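A sketch of the typed result from (b), in Python. Filesystem checks and the SHA reported by the binary's `version --output-format json` are passed in by the caller so the classification stays pure; `RECOMMENDED_BUILD` is an assumed command, since this entry does not name one.

```python
from typing import Optional

# Assumption: the roadmap does not name a build command; this one is illustrative.
RECOMMENDED_BUILD = "cd rust && cargo build"


def sha_matches(build_sha: Optional[str], workspace_head: str) -> bool:
    """Short and full SHAs match when one is a prefix of the other."""
    if not build_sha or not workspace_head:
        return False
    return workspace_head.startswith(build_sha) or build_sha.startswith(workspace_head)


def classify_binary(expected_path: str, expected_exists: bool, embedded_sha: Optional[str],
                    workspace_head: str, installed_sha: Optional[str]) -> dict:
    """Hypothetical kind:"dogfood_preflight" result, returned instead of a raw shell 127."""
    if not expected_exists:
        status = "missing"
    elif sha_matches(embedded_sha, workspace_head):
        status = "match"
    else:
        status = "stale"
    return {
        "kind": "dogfood_preflight",
        "binary_status": status,
        "expected_path": expected_path,
        "embedded_sha": embedded_sha if expected_exists else None,
        "workspace_head": workspace_head,
        "recommended_build_command": None if status == "match" else RECOMMENDED_BUILD,
        # Fallback is only treated as safe when an installed binary reports the same source SHA.
        "can_use_installed_binary": sha_matches(installed_sha, workspace_head),
    }
```

Comparing against workspace HEAD reports `stale` whenever docs-only commits land after the build (the situation #454 describes); a stricter harness might compare against the last commit touching `rust/` (`git log -1 --format=%H -- rust`).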
+
+
+453. **Plugin list JSON can report bundled plugin `source` paths from a stale user registry in a different checkout, with no stale-source warning or current-bundled-root distinction** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 13:30 UTC nudge after a successful local build in `/home/bellman/Workspace/claw-code-pr2967` (branch `docs/roadmap-workdir-provenance`, HEAD `25d663d`, binary `./rust/target/debug/claw` reporting `git_sha:"25d663d"`). Running `./rust/target/debug/claw plugins list --output-format json` returned structured `plugins[]`, but both bundled plugin entries reported `source` under `/home/bellman/Workspace/claw-code-parity-worktrees/clawcode-ux-enhance/...` instead of the current checkout. Cleaning and rebuilding the `plugins` crate did not change the output; the stale paths came from `~/.claw/plugins/installed.json`, where bundled plugin records persisted old `source.path` values. The JSON payload gave no `source_stale`, `source_exists`, `current_bundled_root`, `registry_path`, or `source_origin:"registry"` cue, so automation would treat another worktree's bundled plugin path as current truth. **Required fix shape:** (a) for bundled plugins, derive/display source from the current binary/workspace bundled root rather than a persistent user registry path when possible; (b) if registry source is retained, expose `registry_path`, `source_origin`, `source_exists`, `source_matches_current_bundle_root`, and `current_bundled_root` fields; (c) warn in text mode and JSON diagnostics when bundled plugin registry records point outside the current binary/workspace provenance; (d) add regression coverage where `installed.json` contains stale bundled paths from another checkout and `plugins list --output-format json` either self-heals or marks the source stale. **Why this matters:** plugin lifecycle actions rely on source provenance. 
If a fresh build from checkout A reports bundled plugin sources from checkout B, claws can inspect, enable, update, or debug the wrong plugin tree and misattribute lifecycle failures to current code. Source: gaebal-gajae dogfood response to Clawhip message `1506287843021160500` on 2026-05-19.
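For (b) and (d), a minimal Python sketch of annotating a bundled-plugin record read from `installed.json`. The function name and the `exists` hook are hypothetical; `PurePosixPath.is_relative_to` needs Python 3.9+.

```python
import os
from pathlib import PurePosixPath
from typing import Callable


def annotate_bundled_source(source_path: str, current_bundled_root: str, registry_path: str,
                            exists: Callable[[str], bool] = os.path.exists) -> dict:
    """Add hypothetical provenance fields to a bundled plugin record loaded from the user registry."""
    # is_relative_to requires Python 3.9+.
    matches_root = PurePosixPath(source_path).is_relative_to(current_bundled_root)
    source_exists = exists(source_path)
    return {
        "source": source_path,
        "source_origin": "registry",
        "registry_path": registry_path,
        "current_bundled_root": current_bundled_root,
        "source_exists": source_exists,
        "source_matches_current_bundle_root": matches_root,
        # Stale if the recorded path is outside this build's bundle root or no longer on disk.
        "source_stale": not (matches_root and source_exists),
    }
```

A self-healing variant per (a) would recompute `source` from `current_bundled_root` whenever `source_stale` is true, rather than trusting the registry record.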
+
+
+454. **`plugins help --output-format json` returns a success-shaped plugin inventory plus `Unknown /plugins action 'help'` prose instead of structured plugin command help or supported lifecycle actions** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 14:00 UTC nudge on a freshly built binary from `/home/bellman/Workspace/claw-code-pr2967` (`./rust/target/debug/claw version --output-format json` reported `git_sha:"25d663d"`; the worktree had roadmap-only commits ahead of that source build). Running `./rust/target/debug/claw plugins help --output-format json` exited `0` and returned JSON with `kind:"plugin"`, `action:"help"`, `status:"ok"`, a full `plugins[]` inventory, and message `Unknown /plugins action 'help'. Use list, install, enable, disable, uninstall, or update.` It did not return `supported_actions[]`, usage, action metadata, destructive-action markers, target requirements, or a typed `unsupported_action` / `help_unavailable` status. The same probe also confirmed `plugins show example-bundled --output-format json` still returns `status:"ok"` plus an unknown-action message and no selected `plugin` object, matching the existing unsupported-show class, but the new pinpoint is the absence of a plugin lifecycle discovery/help contract. **Required fix shape:** (a) implement `plugins help --output-format json` as a real help/discovery payload with `supported_actions[]`, per-action `requires_target`, `destructive`, `resume_safe`/automation notes, usage, and examples; (b) if `help` is intentionally unsupported, return a non-ok typed JSON envelope with `code:"unsupported_plugin_action"` and structured `supported_actions[]`, not `status:"ok"`; (c) avoid attaching full plugin inventory to unsupported/help responses unless requested, or mark it as incidental; (d) add regression coverage proving plugin lifecycle help is machine-readable and does not require scraping `message` for available actions. 
**Why this matters:** plugin lifecycle commands include install/enable/disable/update/uninstall; claws need a safe discovery surface before attempting mutations. A success-shaped unknown-help response with only prose action names keeps lifecycle automation brittle and encourages trial-and-error against mutating commands. Source: gaebal-gajae dogfood response to Clawhip message `1506295397285494905` on 2026-05-19.
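A sketch of the discovery contract in (a) and (b), in Python. Action names come from the observed error message; the `requires_target` and `destructive` values are assumptions for illustration, not confirmed claw semantics.

```python
# Action names are taken from the observed error message; the per-action flags are assumptions.
PLUGIN_ACTIONS = {
    "list": {"requires_target": False, "destructive": False},
    "install": {"requires_target": True, "destructive": False},
    "enable": {"requires_target": True, "destructive": False},
    "disable": {"requires_target": True, "destructive": False},
    "uninstall": {"requires_target": True, "destructive": True},
    "update": {"requires_target": True, "destructive": True},
}


def plugins_help() -> dict:
    """Structured discovery payload; deliberately carries no plugin inventory."""
    return {
        "kind": "plugin",
        "action": "help",
        "status": "ok",
        "supported_actions": [{"name": name, **flags} for name, flags in PLUGIN_ACTIONS.items()],
    }


def dispatch_plugin_action(action: str) -> dict:
    """Unknown actions get a non-ok typed envelope instead of status:"ok" plus prose."""
    if action == "help":
        return plugins_help()
    if action not in PLUGIN_ACTIONS:
        return {"kind": "plugin", "action": action, "status": "error",
                "code": "unsupported_plugin_action", "supported_actions": sorted(PLUGIN_ACTIONS)}
    # A real implementation would run the lifecycle handler here.
    return {"kind": "plugin", "action": action, "status": "ok"}
```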
+
+
+455. **Plugin help entrypoints hang before producing any help bytes, and with normal user config they emit only a deprecation warning before timeout** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 14:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. Bounded probes of `plugins --help --output-format json`, `plugins help --output-format json`, and `plugins list --help --output-format json` each timed out after 8s with `stdout=0`; under the normal user config each had `stderr=121` containing only the repeated config deprecation warning, and under an isolated clean `HOME`/`CLAW_CONFIG_HOME` even `plugins --help` and JSON help forms timed out with both stdout and stderr empty. This is distinct from #454's success-shaped unknown-help JSON observed on the action-dispatch path: the flag-style/local help path can hang before returning any help or typed JSON at all. **Required fix shape:** (a) route `plugins --help`, `plugins help`, and subcommand help forms through static help rendering before config/plugin registry loading; (b) make JSON help return bounded stdout JSON with `kind:"plugin"`, `action:"help"`, `supported_actions[]`, and usage metadata; (c) if dynamic plugin state is intentionally consulted, enforce a short internal timeout and return a typed `plugin_help_unavailable` JSON error instead of zero-byte hangs; (d) add regression coverage with clean home and deprecated-config home proving plugin help emits bytes promptly and does not initialize slow lifecycle/registry paths. **Why this matters:** help must be the safe escape hatch when plugin lifecycle is broken. If every plugin help spelling can hang before bytes, claws cannot discover valid plugin actions, recover from registry issues, or explain how to fix lifecycle state without external docs. Source: gaebal-gajae dogfood response to Clawhip message `1506302942754639892` on 2026-05-19.
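For (a) and (c), a Python sketch in which static help is built first and any dynamic registry lookup runs under a deadline that degrades to a typed `plugin_help_unavailable` error. The envelope's `type`/`kind` fields mirror the error envelope shown in #450; the rest of the shape is hypothetical.

```python
import threading
from typing import Callable, Optional

# Static part of plugin help (hypothetical shape); needs no config, auth, or registry access.
STATIC_PLUGIN_HELP = {
    "kind": "plugin",
    "action": "help",
    "supported_actions": ["list", "install", "enable", "disable", "uninstall", "update"],
}


def call_with_deadline(fn: Callable[[], dict], timeout_s: float) -> Optional[dict]:
    """Run fn on a daemon thread; return its result, or None if it raises or misses the deadline."""
    box: dict = {}

    def runner() -> None:
        try:
            box["value"] = fn()
        except Exception as exc:  # a failing lookup degrades help instead of crashing it
            box["error"] = exc

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive() or "error" in box:
        return None
    return box.get("value")


def plugin_help(dynamic_state: Optional[Callable[[], dict]] = None, timeout_s: float = 0.5) -> dict:
    """Static help always; registry state only if it arrives within timeout_s."""
    payload = dict(STATIC_PLUGIN_HELP)
    if dynamic_state is None:
        return payload
    state = call_with_deadline(dynamic_state, timeout_s)
    if state is None:
        return {"type": "error", "kind": "plugin_help_unavailable",
                "reason": "registry state unavailable within deadline", "static_help": payload}
    payload["registry_state"] = state
    return payload
```

A daemon thread cannot be cancelled, so a hung lookup keeps running until the process exits; that is acceptable for a one-shot CLI that prints help and exits, less so for a long-lived REPL.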
+
+
+456. **Static `--help --output-format json` hangs across multiple local lifecycle namespaces under a clean home, so help discovery is not a reliable no-side-effect escape hatch** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 15:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. After pushing the accumulated roadmap branch, clean-environment probes using isolated `HOME` and `CLAW_CONFIG_HOME` showed that `mcp --help --output-format json`, `agents --help --output-format json`, `skills --help --output-format json`, `memory --help --output-format json`, and `session --help --output-format json` each timed out after 8s with `stdout=0` and `stderr=0`. This extends the plugin-specific #455 into a shared parser/help-layer gap: even command namespaces whose help should be static and local can enter a zero-byte hang before any JSON or text help is emitted. **Required fix shape:** (a) centralize `--help`/`help` handling before config, auth, registry, session, MCP, memory, or plugin initialization for all local lifecycle namespaces; (b) make `--output-format json` help return a bounded stdout payload with `kind:"help"`, namespace, usage, supported actions/sections, output formats, and side-effect/auth requirements; (c) add a global deterministic help timeout guard that returns typed JSON such as `kind:"help_unavailable"` instead of allowing zero-byte hangs; (d) add clean-home regression coverage for `mcp`, `agents`, `skills`, `memory`, `session`, and `plugins` help forms proving they emit bytes promptly and do not touch slow lifecycle providers. **Why this matters:** help is the only safe discovery path when lifecycle state is broken. If help itself can hang with no bytes, claws cannot learn how to inspect or recover MCP, agents, skills, memory, sessions, or plugins without external docs or guesswork. 
Source: gaebal-gajae dogfood response to Clawhip message `1506310493080518767` on 2026-05-19.
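The interception in (a) can be sketched in Python as a check that runs before any initializer. The namespace list is from this entry; the payload carries only output-format metadata here, and `run`/`initializers` are hypothetical stand-ins for claw's config/auth/registry setup.

```python
from typing import Callable, Dict, List, Optional

# Namespaces from this entry. Real metadata (usage, actions, auth/side effects) would live here too.
HELP_REGISTRY: Dict[str, dict] = {
    ns: {"supported_output_formats": ["text", "json"]}
    for ns in ("mcp", "agents", "skills", "memory", "session", "plugins")
}


def intercept_help(argv: List[str], output_format: str) -> Optional[dict]:
    """Static help for `<ns> --help`, `<ns> help`, or `<ns> <action> --help`; None otherwise."""
    if not argv or argv[0] not in HELP_REGISTRY:
        return None
    if "--help" not in argv[1:] and argv[1:2] != ["help"]:
        return None
    return {"kind": "help", "namespace": argv[0], "output_format": output_format,
            **HELP_REGISTRY[argv[0]]}


def run(argv: List[str], output_format: str, initializers: List[Callable[[], None]]) -> dict:
    """Answer help before any config/auth/registry initializer is invoked."""
    payload = intercept_help(argv, output_format)
    if payload is not None:
        return payload
    for init in initializers:
        init()
    return {"kind": argv[0], "status": "ok"}  # placeholder for the real namespace handler
```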
+
+
+457. **Root help is bounded in text mode, but `help --output-format json` and command `--help --output-format json` convert help into a zero-byte hang** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 15:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. In an isolated clean `HOME`/`CLAW_CONFIG_HOME`, `./rust/target/debug/claw --help` and `./rust/target/debug/claw help` both exited 0 and printed 7403 bytes of root text help. But `help --output-format json`, `version --help --output-format json`, `doctor --help --output-format json`, and `status --help --output-format json` each timed out after 8s with `stdout=0` and `stderr=0`. This narrows #456: the help renderer itself is capable of returning promptly, but the parser/order path that combines help with JSON output appears to route into a different slow/non-returning path. **Required fix shape:** (a) parse `--help`/`help` before selecting command execution paths, auth, provider, session, or lifecycle initialization, while still preserving requested output format; (b) make root and command help JSON static/bounded with `kind:"help"`, `scope`, `command`, `usage`, `options`, `examples`, and `supported_output_formats`; (c) add regression coverage proving `claw --help`, `claw help`, `claw help --output-format json`, and representative command help forms all return within a small deterministic budget under clean home; (d) ensure JSON help does not silently fall back to text or zero-byte timeout. **Why this matters:** this is a parser-order failure in the safest command surface. Operators can get text help, but the moment automation asks for JSON discovery, help becomes a hang, forcing claws back to prose scraping or external docs. Source: gaebal-gajae dogfood response to Clawhip message `1506318038167847045` on 2026-05-19.
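Fix (d) needs a regression check that tells JSON help apart from a text fallback or a zero-byte timeout. A minimal Python classifier for a single attempt; the category names are invented for illustration, and only `json_help` should count as a pass.

```python
import json


def classify_help_output(stdout: bytes, exit_code, timed_out: bool) -> str:
    """Label one `... --help --output-format json` attempt (hypothetical categories)."""
    if timed_out:
        return "zero_byte_timeout" if not stdout else "partial_timeout"
    if not stdout:
        return "empty_stdout"
    try:
        payload = json.loads(stdout)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return "text_fallback"
    if isinstance(payload, dict) and payload.get("kind") == "help" and exit_code == 0:
        return "json_help"
    return "unexpected_json"
```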
+
+
+458. **Global output-format flag ordering is parser-hostile, and even `version --output-format json` can hang under clean env despite normal-env success** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 16:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with normal-env `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. Clean-environment flag-order probes showed `--output-format json --help`, `--output-format json help`, `--output-format json version`, `--version --output-format json`, and `--output-format json --version` each failed immediately with text-mode stderr only, e.g. `[error-kind: cli_parse] error: unknown option: --output-format json --help`, and `stdout=0`. The parser appears to treat the whole trailing string as one unknown option rather than recognizing a global output-format flag before the command. Separately, in the same clean `HOME`/`CLAW_CONFIG_HOME`, canonical `version --output-format json` timed out after 8s with `stdout=0`/`stderr=0`, even though the same command succeeds in the normal environment. **Required fix shape:** (a) make global flags such as `--output-format json` accepted before or after the subcommand, or return structured JSON `cli_parse` errors on stdout when JSON format is requested anywhere in argv; (b) parse flag values as separate tokens in error reporting instead of echoing combined strings like `--output-format json --help` as one option; (c) ensure `version` is fully local/static and cannot hang under clean env or missing config/auth; (d) add clean-env regression coverage for `version --output-format json`, `--output-format json version`, `--version --output-format json`, and JSON parse errors with stdout envelopes. **Why this matters:** claws often put global flags first for CLI uniformity and run in sanitized envs. 
If global JSON selection is order-sensitive and local version can hang only under clean env, startup probes become unreliable exactly in CI/sandbox contexts. Source: gaebal-gajae dogfood response to Clawhip message `1506325598245748828` on 2026-05-19.
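A Python sketch of (a) and (b): detect a JSON request anywhere in argv and report a single offending token, routing the envelope to stdout when JSON was requested. Only the space-separated `--output-format json` form from this entry is handled; the `token`/`argv_index` fields are hypothetical.

```python
import json
from typing import List, Tuple


def requested_output_format(argv: List[str]) -> str:
    """Scan the whole argv (not only tokens after the subcommand) for `--output-format FORMAT`."""
    fmt = "text"
    for i, tok in enumerate(argv[:-1]):
        if tok == "--output-format":
            fmt = argv[i + 1]
    return fmt


def parse_error(argv: List[str], bad_index: int) -> Tuple[str, str]:
    """Report exactly one offending token; the stream depends on the requested format."""
    token = argv[bad_index]
    if requested_output_format(argv) == "json":
        envelope = {"type": "error", "kind": "cli_parse", "error": f"unknown option: {token}",
                    "token": token, "argv_index": bad_index}
        return "stdout", json.dumps(envelope)
    return "stderr", f"[error-kind: cli_parse] error: unknown option: {token}"
```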
+
+
+459. **Dogfood timeout claims lack a required retry/evidence contract, so transient hangs can be recorded as durable product gaps without immediate reproducibility metadata** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 16:30 UTC nudge while narrowing #458 on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. The previous 16:00 pass saw clean-env `version --output-format json` time out with zero bytes, but a focused 16:30 retry matrix using isolated `HOME`/`CLAW_CONFIG_HOME` plus minimal env variants (`TERM`, `USER`/`LOGNAME`, `SHELL`, `LANG`/`LC_ALL`, and all combined) returned valid version JSON every time. That means the earlier timeout may have been transient harness load, process scheduling, or invocation interference, while the report format had no mandatory retry count, timing, command log artifact, or “reproduced N/M” field to distinguish flaky evidence from stable behavior. **Required fix shape:** (a) dogfood timeout reports must include retry count, per-attempt exit code/stdout/stderr byte counts, elapsed duration, env summary, binary provenance, and whether the failure reproduced after process isolation; (b) add a standard `timeout_evidence` block to report templates and ROADMAP entries before filing zero-byte hang claims; (c) classify un-reproduced hangs as `flaky_unconfirmed` with follow-up probes instead of stable product bugs; (d) provide a small harness command that runs bounded retries and emits machine-readable evidence JSON. **Why this matters:** zero-byte timeouts are high-severity but easy to misattribute. Without a retry/evidence contract, claws can pollute the backlog with transient scheduler artifacts or miss real nondeterministic hangs because the evidence shape is too thin. Source: gaebal-gajae dogfood response to Clawhip message `1506333141571211314` on 2026-05-19.
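A small harness along the lines of (d), sketched in Python with `subprocess.run(..., timeout=...)`. The `timeout_evidence` field names and the `not_reproduced` label are assumptions; `flaky_unconfirmed` follows (c).

```python
import json
import subprocess
import sys
import time
from typing import List, Optional


def run_attempt(cmd: List[str], timeout_s: float, env: Optional[dict] = None) -> dict:
    """One bounded attempt: exit code, stdout/stderr byte counts, elapsed time."""
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s, env=env)
        code, out, err, timed_out = proc.returncode, proc.stdout, proc.stderr, False
    except subprocess.TimeoutExpired as exc:
        # run() kills the child on timeout; exc.stdout/stderr hold bytes read so far, or None.
        code, out, err, timed_out = None, exc.stdout or b"", exc.stderr or b"", True
    return {
        "exit_code": code,
        "timed_out": timed_out,
        "stdout_bytes": len(out),
        "stderr_bytes": len(err),
        "elapsed_s": round(time.monotonic() - start, 3),
    }


def timeout_evidence(cmd: List[str], attempts: int = 3, timeout_s: float = 8.0,
                     env: Optional[dict] = None) -> dict:
    """Hypothetical timeout_evidence block: retry N times and classify the hang."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    runs = [run_attempt(cmd, timeout_s, env) for _ in range(attempts)]
    hangs = sum(1 for r in runs if r["timed_out"])
    if hangs == attempts:
        classification = "reproduced"
    elif hangs == 0:
        classification = "not_reproduced"
    else:
        classification = "flaky_unconfirmed"
    return {
        "kind": "timeout_evidence",
        "command": cmd,
        "env_keys": sorted(env) if env is not None else "inherited",
        "reproduced": f"{hangs}/{attempts}",
        "classification": classification,
        "attempts": runs,
    }


def main(argv: Optional[List[str]] = None) -> None:
    """Print evidence JSON for the command in argv (defaults to sys.argv[1:])."""
    cmd = sys.argv[1:] if argv is None else argv
    if not cmd:
        print("usage: timeout_evidence.py COMMAND [ARGS...]", file=sys.stderr)
        return
    print(json.dumps(timeout_evidence(cmd), indent=2))


if __name__ == "__main__":
    main()
```

Passing `env={"HOME": ..., "CLAW_CONFIG_HOME": ..., "PATH": ...}` reproduces the isolated clean-home setup, and `env_keys` records which variables were present without leaking their values into the report.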
+
+
+460. **Root `help --output-format json` is reproducibly bounded but only wraps 7.4KB of prose in `{kind,message}`, with no structured command or slash-command schema** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 17:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. Applying the retry/evidence discipline from #459, three clean-home attempts of `./rust/target/debug/claw help --output-format json` all exited 0 with `stdout=7563`, `stderr=0`, valid JSON keys exactly `kind,message`, `kind:"help"`, and a `message` string of length 7401. There were no `commands[]`, `options[]`, `slash_commands[]`, resume-safety flags, output-format support metadata, side-effect/auth requirements, or examples as structured fields. This supersedes the flaky zero-byte hang framing from #457 for root help: the stable reproducible gap is schema opacity, not timeout. **Required fix shape:** (a) keep `message` for human rendering but add a versioned structured help schema with `schema_version`, `commands[]`, `global_options[]`, `slash_commands[]`, `examples[]`, and `related_docs[]`; (b) include per-command fields such as `name`, `aliases`, `usage`, `description`, `supports_json`, `requires_auth`, `side_effects`, and `resume_safe` where applicable; (c) expose slash-command metadata without requiring prose scraping; (d) add regression coverage proving root help JSON has stable structured fields and that old `message` remains optional/backward-compatible. **Why this matters:** help JSON is the bootstrap discovery surface for claws. Valid JSON that contains only prose still forces automation to scrape text before choosing safe commands or resume paths. Source: gaebal-gajae dogfood response to Clawhip message `1506340691431657472` on 2026-05-19.
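A consumer-side check for the schema in (a) and (b), sketched in Python: it lists which structured fields are missing, so the observed `{kind, message}` payload fails while a payload carrying the proposed fields passes. The required-field lists mirror this entry; treating all of them as mandatory is an assumption.

```python
from typing import List

# Field lists mirror the proposed schema in this entry; "mandatory" is an assumption.
REQUIRED_HELP_FIELDS = ("schema_version", "commands", "global_options",
                        "slash_commands", "examples", "related_docs")
REQUIRED_COMMAND_FIELDS = ("name", "usage", "supports_json", "requires_auth", "side_effects")


def help_schema_gaps(payload: dict) -> List[str]:
    """Return missing structured fields; an empty list means the payload meets the proposed schema."""
    gaps = [name for name in REQUIRED_HELP_FIELDS if name not in payload]
    for i, command in enumerate(payload.get("commands", [])):
        gaps.extend(f"commands[{i}].{name}" for name in REQUIRED_COMMAND_FIELDS if name not in command)
    return gaps
```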
+
+
+461. **Command-specific `--help --output-format json` reproducibly zero-byte hangs even though root JSON help returns bounded prose JSON** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 17:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. After #460 confirmed root `help --output-format json` is bounded but schema-opaque, two clean-home attempts each for `status --help --output-format json`, `doctor --help --output-format json`, `version --help --output-format json`, and `sandbox --help --output-format json` all timed out after 8s with `stdout=0` and `stderr=0`. This establishes a split contract: root JSON help reaches the help serializer, while command-specific JSON help falls into a non-returning command execution/parser path. **Required fix shape:** (a) add a command-help dispatch layer that catches `<command> --help` before entering the command's runtime handler; (b) share the same bounded JSON help schema from #460 for command-specific help, with fields `kind:"help"`, `command`, `usage`, `options`, `examples`, `supports_json`, `requires_auth`, and `side_effects`; (c) ensure local/static commands like `version`, `status`, `doctor`, and `sandbox` never initialize slow providers just to render help; (d) add clean-home regression coverage proving command-specific JSON help emits bytes promptly for representative static and lifecycle commands. **Why this matters:** claws often discover command contracts one command at a time. If root help is available but every command's JSON help hangs, automation still cannot inspect option-level semantics safely and must scrape root prose or guess. Source: gaebal-gajae dogfood response to Clawhip message `1506348241128788111` on 2026-05-19.
+
+
+462. **Text-mode `<command> --help` also zero-byte hangs, proving the bug is command-help dispatch rather than JSON serialization** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 18:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. After #461 showed command-specific JSON help hangs, a text-mode retry matrix in isolated clean `HOME`/`CLAW_CONFIG_HOME` showed two attempts each for `status --help`, `doctor --help`, `version --help`, `sandbox --help`, `mcp --help`, `agents --help`, and `skills --help` all timed out after 6s with `stdout=0` and `stderr=0`. Root `--help` and `help` remain bounded, so the split is not help text rendering generally and not JSON formatting specifically; it is the command-specific help dispatch path failing to intercept `<command> --help` before some non-returning runtime path. **Required fix shape:** (a) implement a first-stage argv parser that recognizes `<command> --help` and `<command> help` for every registered command before command runtime initialization; (b) render static text help in text mode and structured JSON help in JSON mode from the same command metadata registry; (c) add regression coverage for both text and JSON help for representative static commands (`version`, `status`, `doctor`, `sandbox`) and lifecycle commands (`mcp`, `agents`, `skills`, `plugins`); (d) ensure every command-help path has a bounded no-provider/no-auth/no-config execution budget. **Why this matters:** users do not only run root `--help`; they naturally ask `claw status --help` or `claw mcp --help`. If that path hangs silently, the product loses the most basic local recovery surface before any real action starts. Source: gaebal-gajae dogfood response to Clawhip message `1506355792582938665` on 2026-05-19.
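Fix (b) amounts to one metadata record per command with two renderers, answered by a first-stage check before any runtime is built. A Python sketch under that assumption; `CommandMeta`, `REGISTRY`, `first_stage`, and the placeholder description are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class CommandMeta:
    name: str
    usage: str
    description: str
    options: Tuple[str, ...] = ()


# Hypothetical registry entry; the description text is a placeholder.
REGISTRY: Dict[str, CommandMeta] = {
    "status": CommandMeta("status", "claw status", "Report local workspace status (placeholder).",
                          ("--output-format text|json",)),
}


def render_help(meta: CommandMeta, output_format: str) -> str:
    """Text and JSON help rendered from one metadata record, so the two cannot drift."""
    if output_format == "json":
        return json.dumps({"kind": "help", "command": meta.name, "usage": meta.usage,
                           "description": meta.description, "options": list(meta.options)})
    lines = [f"Usage: {meta.usage}", "", meta.description]
    if meta.options:
        lines += ["", "Options:"] + [f"  {opt}" for opt in meta.options]
    return "\n".join(lines)


def first_stage(argv: List[str], output_format: str) -> Optional[str]:
    """Answer `<command> --help` / `<command> help` before any command runtime is built."""
    if len(argv) >= 2 and argv[0] in REGISTRY and argv[1] in ("--help", "help"):
        return render_help(REGISTRY[argv[0]], output_format)
    return None
```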
+
+
+463. **Root help advertises direct slash examples like `claw /skills`, but direct slash behavior is inconsistent: `/skills` runs a huge local report, `/help` aliases root help, while `/status` is rejected as interactive-only despite being marked resume-safe** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 18:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. Root help examples include `claw /skills`, and clean-home probes showed direct `claw /skills` exits 0 and prints a 27KB skill inventory, while direct `claw /help` exits 0 and prints root help. But direct `claw /status` exits 1 with `slash command /status is interactive-only... use claw --resume SESSION.jsonl /status ... when the command is marked [resume] in /help`, even though `/status` is explicitly marked `[resume]` in root help and has a top-level sibling `claw status`. The same surface therefore mixes three semantics for direct slash invocation: accepted alias, accepted local slash report, and rejected interactive-only/resume-only command. **Required fix shape:** (a) define a single direct-slash CLI contract: either reject all slash commands outside REPL/resume with structured guidance, or allow resume-safe/local slash commands consistently; (b) if allowing direct slash commands, route `/status` to the same local/resume-safe status serializer as `claw status` or require `--resume` with a typed `resume_required` code; (c) make help examples distinguish top-level commands from slash commands and avoid advertising `claw /skills` unless direct slash invocation is intentional for the whole supported set; (d) add regression coverage for `/help`, `/skills`, `/status`, `/mcp`, `/agents`, and their top-level equivalents proving consistent direct/resume/text/json behavior. **Why this matters:** claws copy help examples literally. 
If `claw /skills` works but `claw /status` says interactive-only despite `[resume]`, automation cannot infer which slash commands are safe outside the REPL and will oscillate between direct, top-level, and resume forms. Source: gaebal-gajae dogfood response to Clawhip message `1506363340048564425` on 2026-05-19.
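One way to make (a) and (b) consistent, sketched in Python: resume-safe slash commands with a top-level sibling are routed to it, resume-safe ones without a sibling require `--resume` via a typed `resume_required` error, and the rest are `interactive_only`. Apart from `/status` being marked `[resume]` with a `claw status` sibling (per this entry), the registry entries are hypothetical.

```python
from typing import Dict, Optional

# Only /status reflects this entry; the two *-example commands are hypothetical placeholders.
SLASH_COMMANDS: Dict[str, dict] = {
    "/status": {"resume_safe": True, "top_level": "status"},
    "/resume-only-example": {"resume_safe": True, "top_level": None},
    "/interactive-only-example": {"resume_safe": False, "top_level": None},
}


def direct_slash(command: str, resume_session: Optional[str] = None) -> dict:
    """Decide how `claw /cmd` (or `claw --resume SESSION /cmd`) is handled under one rule."""
    meta = SLASH_COMMANDS.get(command)
    if meta is None:
        return {"type": "error", "kind": "unknown_slash_command", "command": command}
    if not meta["resume_safe"]:
        return {"type": "error", "kind": "interactive_only", "command": command}
    if resume_session is not None:
        return {"route": "resume", "session": resume_session, "command": command}
    if meta["top_level"]:
        return {"route": "top_level", "command": meta["top_level"]}
    return {"type": "error", "kind": "resume_required", "command": command,
            "hint": f"claw --resume SESSION.jsonl {command}"}
```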
+
+
+464. **Global `--output-format json` placement is broken for top-level subcommands: post-subcommand placement silently hangs, while pre-subcommand placement is rejected as `cli_parse` despite help documenting `--output-format` as a global flag** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 19:00/19:30 UTC nudges on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. Root help lists `--output-format FORMAT` in the global Flags section and examples like `claw [--model MODEL] [--output-format text|json] prompt TEXT`, while top-level commands (`claw status`, `claw doctor`, `claw mcp`, `claw skills`, `claw version`) advertise local diagnostic/report surfaces. Clean-home probes showed two distinct bad paths: `claw version --output-format json`, `claw status --output-format json`, and `claw skills --output-format json` timed out after 5s with `stdout=0`/`stderr=0`; the more canonical GNU-style global placement `claw --output-format json version`, `claw --output-format json status`, and `claw --output-format json skills` exited 1 with `[error-kind: cli_parse] error: unknown option: --output-format json <command>`. The only working form observed for `version` is the special local parser path used by `claw version --output-format json` in a non-clean environment/provenance preflight, which contradicts the clean-home bounded test and suggests the parser/runtime path is environment-sensitive as well as placement-sensitive. 
**Required fix shape:** (a) make `--output-format` a true global flag accepted before any subcommand and before slash commands; (b) make top-level local commands also accept trailing `--output-format` if the project wants common CLI ergonomics; (c) normalize both placements into one parsed `OutputFormat` before command dispatch; (d) ensure parse errors in JSON-requested mode emit structured JSON rather than prose-on-stderr; (e) add clean-home regression coverage for `claw --output-format json version|status|doctor|mcp|skills` and `claw version|status|doctor|mcp|skills --output-format json`, with bounded no-provider execution. **Why this matters:** claws and shell users routinely place global flags either before or after subcommands. A machine-readable output flag that sometimes hangs and sometimes parse-errors means automation cannot reliably request JSON for the exact local diagnostics it needs during recovery. Source: gaebal-gajae dogfood response to Clawhip messages `1506370889653027012` and `1506378443812765756` on 2026-05-19.
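A minimal Rust sketch of fix shapes (a) and (c). It assumes the CLI can run a pre-dispatch pass over the raw arguments. `OutputFormat`, `Parsed`, `ParseError`, and `extract_output_format` are hypothetical names for illustration, not the existing claw parser API. The pass removes `--output-format VALUE` or `--output-format=VALUE` from any position, so the prefix and suffix placements collapse into one parsed format before the subcommand is looked up:

```rust
/// Output formats the CLI accepts (hypothetical mirror of the real enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputFormat {
    Text,
    Json,
}

/// Result of the pre-dispatch pass: one resolved format plus the other tokens.
#[derive(Debug, PartialEq, Eq)]
pub struct Parsed {
    pub format: OutputFormat,
    pub rest: Vec<String>,
}

/// Typed errors, so JSON mode can serialize them instead of printing prose.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    MissingValue,
    UnknownFormat(String),
    ConflictingFormats,
}

fn parse_format(value: &str) -> Result<OutputFormat, ParseError> {
    match value {
        "text" => Ok(OutputFormat::Text),
        "json" => Ok(OutputFormat::Json),
        other => Err(ParseError::UnknownFormat(other.to_string())),
    }
}

/// Strip every `--output-format VALUE` / `--output-format=VALUE`, whether it
/// appears before or after the subcommand, and return the remaining tokens.
pub fn extract_output_format(args: &[&str]) -> Result<Parsed, ParseError> {
    let mut format: Option<OutputFormat> = None;
    let mut rest = Vec::new();
    let mut iter = args.iter();
    while let Some(&arg) = iter.next() {
        let value = if arg == "--output-format" {
            Some(*iter.next().ok_or(ParseError::MissingValue)?)
        } else {
            arg.strip_prefix("--output-format=")
        };
        match value {
            Some(v) => {
                let requested = parse_format(v)?;
                // Repeating the flag is tolerated only when the values agree.
                if format.is_some_and(|seen| seen != requested) {
                    return Err(ParseError::ConflictingFormats);
                }
                format = Some(requested);
            }
            None => rest.push(arg.to_string()),
        }
    }
    Ok(Parsed {
        format: format.unwrap_or(OutputFormat::Text),
        rest,
    })
}
```

Because `ParseError` is a typed value rather than a printed string, the caller can render it as a JSON envelope when JSON was requested (fix shape (d)). A production version would also need to stop scanning at `--` so prompt text is never rewritten.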
+
+
+465. **`claw skills` is technically bounded but operationally noisy: the default text output dumps full descriptions for every discovered skill (~27KB / 65 skills in clean-home dogfood), unlike `status`, `doctor`, `mcp`, `sandbox`, and `agents` which stay compact enough for recovery logs** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 20:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. Clean-home text probes showed `version`, `status`, `doctor`, `mcp`, `sandbox`, and `agents` all exit promptly with compact diagnostic reports (`version` 136 bytes, `status` 1274 bytes, `doctor` 2519 bytes, `mcp` 117 bytes, `sandbox` 352 bytes, `agents` 17 bytes). `skills` also exits 0, but emits 27,573 bytes by default because it prints every skill name plus full description. This makes `claw skills` a poor first-response diagnostic in clawhip logs, tmux tails, CI failure artifacts, and copied support snippets: the useful inventory count and roots are buried under pages of prose. This is distinct from discovery-scope/security issues (#85/#95): even when the skill set is legitimate, the default report shape is too verbose for a recovery command. **Required fix shape:** (a) make default text `claw skills` compact: counts by source/root plus names only, truncated with an explicit `--verbose` hint; (b) add `claw skills --verbose` or `claw skills list --verbose` for full descriptions; (c) add `--format compact|verbose` or reuse `--compact` if the CLI standardizes it; (d) keep JSON mode complete but add `summary` and `entries[].description` fields so consumers can choose; (e) add regression coverage enforcing that default text output for 50+ skills stays under a small byte/line budget while verbose preserves current detail. **Why this matters:** local diagnostics should be safe to paste and scan during failures. 
A 27KB default skill dump hides the actual signal and makes every dogfood/support loop noisier than necessary. Source: gaebal-gajae dogfood response to Clawhip message `1506385992368652489` on 2026-05-19.
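One possible shape for fix (a), sketched in Rust under the assumption that discovery already yields a name, source root, and description per skill. `Skill`, `render_compact`, `render_verbose`, and the `max_names` cap are hypothetical, and `claw skills --verbose` is the flag proposed in fix (b), not an existing option. The compact renderer prints the total, counts per source, and names only, so its size is bounded by the name cap rather than by description length:

```rust
use std::collections::BTreeMap;

/// One discovered skill (hypothetical shape; the real inventory type may differ).
pub struct Skill {
    pub name: String,
    pub source: String,
    pub description: String,
}

/// Compact default report: total, counts per source root, then names only,
/// capped at `max_names` with an explicit pointer to the verbose form.
pub fn render_compact(skills: &[Skill], max_names: usize) -> String {
    let mut by_source: BTreeMap<&str, usize> = BTreeMap::new();
    for skill in skills {
        *by_source.entry(skill.source.as_str()).or_insert(0) += 1;
    }
    let mut out = format!("Skills {} available\n", skills.len());
    for (source, count) in &by_source {
        out.push_str(&format!("  {source}: {count}\n"));
    }
    for skill in skills.iter().take(max_names) {
        out.push_str(&format!("  - {}\n", skill.name));
    }
    if skills.len() > max_names {
        out.push_str(&format!(
            "  ... {} more; run `claw skills --verbose` for names and descriptions\n",
            skills.len() - max_names
        ));
    }
    out
}

/// Verbose form keeps today's detail: every skill with its full description.
pub fn render_verbose(skills: &[Skill]) -> String {
    skills
        .iter()
        .map(|s| format!("{} ({})\n    {}\n", s.name, s.source, s.description))
        .collect()
}
```

A regression test for fix (e) can then assert a byte or line budget on `render_compact` for a synthetic inventory of 50+ skills with long descriptions.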
+
+
+466. **Syntactically valid MCP config with a nonexistent command is reported as healthy/configured by `doctor`, `mcp`, and `status` instead of surfacing executable reachability as degraded** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 20:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. In a temp workspace containing only `.claw.json` with `mcpServers.broken.command:"/definitely/not/a/real/mcp-server"`, clean-home text probes showed `doctor` exits 0 with `Failures 0`, `Config Status ok`, and `Boot preflight ... mcp=true · servers 1`; `mcp` exits 0 and lists `broken stdio project /definitely/not/a/real/mcp-server --serve`; `status` exits 0 and reports `Boot preflight ... mcp=true plugins=true last_failed=none`. None of the three surfaces attempts even a cheap executable existence/PATH reachability check, so a server that cannot possibly launch is indistinguishable from a configured usable MCP server until the first runtime tool call fails. This is distinct from malformed-config degraded-mode items (#143/#144/#440): here the JSON shape is valid and the parser succeeds, but lifecycle readiness is false. 
**Required fix shape:** (a) add an MCP executable preflight pass that classifies each stdio server as `reachable:true|false|unknown` using absolute-path existence/executable-bit checks and PATH lookup for bare commands, without launching untrusted code; (b) expose per-server fields in `claw mcp` / JSON (`launch_status`, `command_exists`, `command_executable`, `path_resolution`, `error_kind:"mcp_command_not_found"`); (c) make `doctor` add an `mcp_reachability` check with `status:"warn"` or `"fail"` when any configured server command is missing; (d) make `status` distinguish `mcp_configured:true` from `mcp_reachable:false` instead of the current single `mcp=true`; (e) add a regression test with one valid `/bin/echo` server and one nonexistent absolute path proving partial status is reported. **Fresh mixed-PATH proof (21:00 UTC):** a temp config containing one PATH-resolvable server (`pathEcho.command:"echo"`) and one missing bare command (`missingBare.command:"definitely-not-a-real-mcp-bare-command-xyz"`) still made `mcp` list both as configured, `doctor` report `Failures 0` and `mcp=true · servers 2`, and `status` report `mcp=true` with no degraded marker. So the fix must handle both absolute paths and PATH lookup for bare commands, and it must preserve partial success (`pathEcho` reachable, `missingBare` not found) instead of collapsing both into `configured`. **Why this matters:** MCP failures are often setup/path mistakes. If `doctor` says `Failures 0` and `mcp=true` while the command path does not exist, claws and users waste turns discovering the break only after a blocked tool call. Source: gaebal-gajae dogfood response to Clawhip messages `1506393539385491598` and `1506401088801083495` on 2026-05-19.
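A Unix-only Rust sketch of the preflight in fix (a). It resolves a stdio `command` the way a shell would (anything containing `/` is checked as a path, bare names are searched on `PATH`) and inspects file metadata only, so no configured server is ever launched. `Reachability` and `classify_command` are hypothetical names; the `PATH` value is passed in explicitly so the check stays deterministic in tests:

```rust
use std::ffi::OsStr;
use std::os::unix::fs::PermissionsExt;
use std::path::{Path, PathBuf};

/// Launch readiness for one stdio MCP server, computed without spawning it.
#[derive(Debug, PartialEq, Eq)]
pub enum Reachability {
    /// Resolved to an existing regular file with at least one execute bit.
    Reachable(PathBuf),
    /// Something exists at the path but it is not an executable regular file.
    NotExecutable(PathBuf),
    /// Path missing, or bare name absent from every PATH entry.
    NotFound,
}

/// `None` when nothing exists at `path`; otherwise whether it is runnable.
fn executable_status(path: &Path) -> Option<bool> {
    let meta = std::fs::metadata(path).ok()?;
    Some(meta.is_file() && (meta.permissions().mode() & 0o111) != 0)
}

/// Resolve `command` like a shell: anything containing `/` is checked as a
/// path, bare names are searched in `path_var` (the caller's PATH value).
pub fn classify_command(command: &str, path_var: Option<&OsStr>) -> Reachability {
    if command.contains('/') {
        let path = PathBuf::from(command);
        return match executable_status(&path) {
            Some(true) => Reachability::Reachable(path),
            Some(false) => Reachability::NotExecutable(path),
            None => Reachability::NotFound,
        };
    }
    let Some(path_var) = path_var else {
        return Reachability::NotFound;
    };
    std::env::split_paths(path_var)
        .map(|dir| dir.join(command))
        .find(|candidate| executable_status(candidate) == Some(true))
        .map_or(Reachability::NotFound, Reachability::Reachable)
}
```

For the mixed-PATH proof above, `pathEcho` (`echo`) would classify as `Reachable` and `missingBare` as `NotFound`, which is the partial result `doctor` and `status` need to report instead of a single `mcp=true`. Metadata checks cannot prove the binary actually speaks MCP, so a passing result means "launchable", not "healthy".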
+
+
+467. **Bare `claw plugins` / `claw plugin` returns the plugin inventory, but the natural explicit action `claw plugins list` / `claw plugin list` zero-byte hangs; singular/plural aliases are wired for the default action but not for the `list` subaction** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 21:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. In isolated clean `HOME`/`CLAW_CONFIG_HOME`, `plugins` and `plugin` each exited 0 and printed the two bundled plugins (`example-bundled`, `sample-hooks`) in text mode. But `plugins list` and `plugin list` each timed out after 8s with `stdout=0` and `stderr=0`. `plugins help` and `plugin help` also timed out, while direct `/plugins` and `/plugin` were rejected as interactive-only, showing that there are at least three parser paths for the same lifecycle namespace. This is a narrower current-main follow-up to the historical plugin route/help items (#78/#145/#348/#420/#454/#455): the route now exists for the default inventory, but adding the explicit canonical `list` action drops into a different non-returning path. **Required fix shape:** (a) normalize `plugin` and `plugins` aliases plus omitted action into one parser action before dispatch (`None` and `Some("list")` must be equivalent); (b) route `help` to static plugin help before plugin registry/lifecycle initialization; (c) add a deterministic timeout guard or typed `plugin_action_unavailable` envelope around plugin action dispatch so unsupported/misparsed actions cannot hang silently; (d) add clean-home regression coverage for `plugins`, `plugins list`, `plugin`, `plugin list`, `plugins help`, and `plugin help` in text and JSON modes, proving the first four produce the same inventory and help emits bounded usage. 
**Why this matters:** `list` is the most obvious plugin lifecycle action and is already the documented discovery verb in other lifecycle namespaces (`mcp list`, `agents list`, `skills list`). If the bare inventory works but explicit `list` hangs, claws cannot safely choose between the terse default and the explicit action form, and plugin support remains trial-and-error. Source: gaebal-gajae dogfood response to Clawhip message `1506408638883954768` on 2026-05-19.
+
+
+468. **Explicit subactions hang across local lifecycle namespaces: bare `agents`, `mcp`, `skills`, `sandbox`, and `doctor` return bounded diagnostics, but natural explicit forms (`agents list`, `mcp list`, `skills list`, `sandbox status`, `doctor check`) zero-byte hang** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 22:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. Clean-home text probes showed the bare commands are usable: `agents` exits 0 (`No agents found.`), `mcp` exits 0 with `Configured servers 0`, `skills` exits 0 with inventory, `sandbox` exits 0 with sandbox state, and `doctor` exits 0 with the full health report. But adding the most natural explicit action caused zero-byte timeouts after 8s: `agents list`, `mcp list`, `skills list`, `sandbox status`, and `doctor check` all produced `stdout=0` and `stderr=0`. This broadens #467's plugin-specific `plugins list` hang into a shared parser/action-normalization bug: omitted default action works, but spelling the default action explicitly routes to a non-returning path. **Required fix shape:** (a) define a per-namespace action table where omitted action normalizes to the documented default (`list` for inventory namespaces, `status` for status namespaces, `check`/`run` only if supported); (b) reject unsupported explicit actions with typed bounded errors (`unknown_action`, `supported_actions[]`) instead of falling through to prompt/runtime dispatch; (c) make `agents list`, `mcp list`, and `skills list` equivalent to their bare forms; decide/document whether `sandbox status` and `doctor check` are aliases or unsupported; (d) add clean-home regression coverage for bare/default/unsupported explicit actions in text and JSON across agents, mcp, skills, plugins, sandbox, and doctor. 
**Why this matters:** users and claws naturally prefer explicit commands in automation (`mcp list`, `agents list`) because they are self-documenting. If explicit defaults hang while bare commands work, every script has to learn undocumented terse forms and cannot safely derive commands from help/usage text. Source: gaebal-gajae dogfood response to Clawhip message `1506416189616947210` on 2026-05-19.
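A Rust sketch of the per-namespace action table from fix (a), including the `plugin`/`plugins` alias from #467. The table contents are illustrative assumptions (whether `sandbox status` and `doctor check` should exist is exactly the open question in fix (c)), and `normalize_action` / `ActionError` are hypothetical names. The key property is that an omitted action and the explicit default resolve to the same value, while anything else returns a typed error listing the supported actions instead of falling through to prompt dispatch:

```rust
/// Illustrative action table: the first action listed is the documented
/// default used when the action is omitted. Contents are assumptions.
const ACTIONS: &[(&str, &[&str])] = &[
    ("agents", &["list"]),
    ("mcp", &["list"]),
    ("skills", &["list", "install"]),
    ("plugins", &["list"]),
    ("sandbox", &["status"]),
    ("doctor", &["check"]),
];

#[derive(Debug, PartialEq, Eq)]
pub enum ActionError {
    UnknownNamespace(String),
    /// Typed, bounded error instead of a fall-through to prompt dispatch.
    UnknownAction {
        namespace: &'static str,
        action: String,
        supported_actions: &'static [&'static str],
    },
}

fn lookup(namespace: &str) -> Option<(&'static str, &'static [&'static str])> {
    // Singular and plural spellings share one entry (see #467).
    let canonical = if namespace == "plugin" { "plugins" } else { namespace };
    ACTIONS.iter().copied().find(|(name, _)| *name == canonical)
}

pub fn normalize_action(
    namespace: &str,
    action: Option<&str>,
) -> Result<(&'static str, &'static str), ActionError> {
    let (canonical, supported) = lookup(namespace)
        .ok_or_else(|| ActionError::UnknownNamespace(namespace.to_string()))?;
    match action {
        // Omitted action resolves to the same default an explicit spelling names.
        // Every table entry is non-empty, so index 0 always exists.
        None => Ok((canonical, supported[0])),
        Some(requested) => supported
            .iter()
            .copied()
            .find(|known| *known == requested)
            .map(|known| (canonical, known))
            .ok_or_else(|| ActionError::UnknownAction {
                namespace: canonical,
                action: requested.to_string(),
                supported_actions: supported,
            }),
    }
}
```

With this shape, `claw mcp` and `claw mcp list` reach the same handler by construction, and `claw agents garbage` fails fast with `supported_actions: ["list"]`.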
+
+
+469. **Unexpected positional arguments after local verbs zero-byte hang instead of returning bounded parse errors; even `version extra` hangs, proving the parser does not fail closed once a local subcommand has extra tokens** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 22:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. In isolated clean `HOME`/`CLAW_CONFIG_HOME`, bounded 8s probes of `version extra`, `status extra`, `doctor extra`, `sandbox extra`, `agents extra`, `mcp extra`, `skills extra`, and `plugins extra` all timed out with `stdout=0` and `stderr=0`. This is broader than #468's explicit-default-action hang: the extra token does not need to be a plausible action like `list` or `status`; any unexpected positional can route a local, no-auth diagnostic command into a non-returning path. It also refreshes the older extra-arg/fallthrough findings (#127/#147) with current-binary evidence where the failure mode is now a silent hang rather than a visible missing-credentials prompt. **Required fix shape:** (a) every local subcommand parser must declare accepted positional arity and reject extra args before prompt/runtime dispatch; (b) emit a typed bounded error such as `kind:"unexpected_argument"`, `command`, `argument`, `supported_usage`, and `supported_actions[]` where relevant; (c) in JSON-requested mode, route that envelope to the documented JSON stream with nonzero exit; (d) add regression coverage for `version extra`, `status extra`, `doctor extra`, `sandbox extra`, `agents extra`, `mcp extra`, `skills extra`, and `plugins extra` proving they return promptly and do not enter provider/prompt/plugin/session runtime. **Why this matters:** typos and stale wrapper scripts commonly append leftover tokens. 
A local health command must fail closed with usage guidance, not hang with no bytes; otherwise orchestrators cannot distinguish typo, parser deadlock, provider stall, or lifecycle startup block. Source: gaebal-gajae dogfood response to Clawhip message `1506423740597141665` on 2026-05-19.
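A Rust sketch of fixes (a) and (b): each local verb declares how many positionals it accepts, and surplus tokens produce a typed error before any prompt, provider, or session code runs. The arity values and the `UnexpectedArgument` type are assumptions for illustration; the `[error-kind: ...]` prefix mirrors the existing `cli_parse` text format. Whether the single positional allowed for inventory verbs is a valid action is a separate check:

```rust
use std::fmt;

/// How many positional tokens each local verb accepts after its name.
/// Illustrative values: the real table must mirror each command's usage line.
fn max_positionals(command: &str) -> Option<usize> {
    match command {
        "version" | "status" | "doctor" | "sandbox" => Some(0),
        // Inventory verbs take at most one optional action word (e.g. `list`).
        "agents" | "mcp" | "skills" | "plugins" => Some(1),
        // Not a local verb: the caller decides (e.g. prompt fallback).
        _ => None,
    }
}

#[derive(Debug, PartialEq, Eq)]
pub struct UnexpectedArgument {
    pub command: String,
    pub argument: String,
    pub max_positionals: usize,
}

impl fmt::Display for UnexpectedArgument {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "[error-kind: unexpected_argument] `claw {}` does not accept `{}` (max {} positional argument(s))",
            self.command, self.argument, self.max_positionals
        )
    }
}

/// Fail closed: reject surplus positionals before any prompt/runtime dispatch.
pub fn check_arity(command: &str, positionals: &[&str]) -> Result<(), UnexpectedArgument> {
    let Some(max) = max_positionals(command) else {
        return Ok(());
    };
    match positionals.get(max) {
        // The first token past the allowed arity is the one reported.
        Some(extra) => Err(UnexpectedArgument {
            command: command.to_string(),
            argument: extra.to_string(),
            max_positionals: max,
        }),
        None => Ok(()),
    }
}
```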
+
+
+470. **Root flag aliases fail closed on extra tokens, but the word alias `help` zero-byte hangs with any trailing token (`help extra`, `help --help`, `help --version`) instead of returning root help or a bounded arity error** — dogfooded 2026-05-19 from the `#clawcode-building-in-public` 23:00/23:30 UTC nudges on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. Clean-home probes showed the flag aliases are healthy/bounded: `--version` and `-V` exit 0 with version text; `--version extra`, `-V extra`, and `--help extra` exit 1 with `[error-kind: cli_parse]`. But the word help alias is fragile: `help extra`, `help --help`, and `help --version` each timed out after 6s with `stdout=0` and `stderr=0`. This narrows #469: not all root alias extra-token paths hang; the flag parser has a fail-closed path, while the `help` word alias enters the same non-returning command/prompt dispatch when trailing tokens are present. **Required fix shape:** (a) treat `help` as a first-class root command with strict arity: bare `help` renders root help; `help <topic>` either renders a static topic if supported or returns typed `unknown_help_topic`; extra flags like `help --help` and `help --version` must be handled locally; (b) never route `help ...` into provider/prompt/runtime dispatch; (c) make JSON mode preserve the same bounded behavior with `kind:"help"` or `kind:"unknown_help_topic"`; (d) add clean-home regression coverage for `help`, `help extra`, `help --help`, `help --version`, `--help extra`, `--version extra`, and `-V extra`. **Why this matters:** `help` is the recovery primitive users type when every other command is broken. If `--help extra` reports a parse error but `help extra` hangs silently, claws cannot rely on the documented `claw help` alias as a safe discovery surface. 
Source: gaebal-gajae dogfood response to Clawhip messages `1506431285977940009` and `1506438835775737939` on 2026-05-19.
+
+
+471. **Root version aliases (`--version`, `-V`) have no JSON form and reject `--output-format json`, while the `version --output-format json` subcommand hangs under clean home — version provenance is not reliably machine-readable from the safest startup path** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 00:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with provenance preflight `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"` in the normal environment. In isolated clean `HOME`/`CLAW_CONFIG_HOME`, `version --output-format json` timed out after 6s with `stdout=0` and `stderr=0`, while `--version --output-format json`, `--output-format json --version`, `-V --output-format json`, and `--output-format json -V` all exited 1 with `[error-kind: cli_parse] unknown option: ...`. Plain `--version`/`-V` work in text mode, but there is no bounded JSON equivalent through the root flag aliases, and the subcommand JSON path is environment/placement-sensitive. This is a focused provenance/startup slice of #464: the one command claws use to prove binary identity has split semantics across root flags and subcommand forms. **Required fix shape:** (a) make `--version --output-format json` and `--output-format json --version` valid aliases for `version --output-format json`; (b) make `-V` support the same formatting contract; (c) ensure `version --output-format json` is config-free/help-free/provider-free and bounded under clean home; (d) emit a typed JSON parse error on stdout/stderr according to the project-wide JSON error-stream contract if an unsupported combination remains; (e) add clean-home regression coverage for all five forms above. **Why this matters:** binary provenance is the first fact every dogfood report and orchestrator needs. 
If root version flags only produce text and the JSON subcommand can hang, claws cannot safely establish which executable they are testing without relying on a special normal-environment preflight. Source: gaebal-gajae dogfood response to Clawhip message `1506446390212169839` on 2026-05-20.
+
+
+472. **Malformed project `.claw.json` is silently treated as `loaded 0/N` while `doctor` still says `Config Status ok` and `runtime config loaded successfully`; `status` reports no config error at all** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 00:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. In a temp workspace with an invalid `.claw.json` containing truncated JSON (`{"mcpServers":{"bad":{}`), root recovery surfaces correctly stayed config-free: `--help`, `help`, `--version`, `-V`, and `version` all exited 0. But config-aware diagnostics silently downplayed the broken file: `status` exited 0 and reported `Config files loaded 0/5` with no parse error or degraded marker; `doctor` exited 0 with `Failures 0`, `Config Status ok`, `Summary runtime config loaded successfully`, and details `Config files loaded 0/1` plus the discovered malformed file path. This is not the older #143 fatal-status path; it is the opposite failure mode: broken config is detected enough to count as present/discovered, but the parse error is dropped and the health summary remains OK. **Required fix shape:** (a) preserve config parse/load errors per discovered file even when continuing with defaults; (b) make `doctor` config check `status:"fail"` or `"warn"` when any discovered config file fails to parse, never `ok`; (c) make `status` expose `config_load_error` / `config_files[].error` and a degraded marker while still reporting independent workspace fields; (d) distinguish `present_count`, `loaded_count`, and `failed_count`; (e) add regression coverage with malformed/truncated `.claw.json` proving root help/version remain config-free while status/doctor surface the parse error. **Why this matters:** a user with a broken config needs diagnostics to point at the exact file and parse problem. 
Saying `runtime config loaded successfully` while also saying `loaded 0/1` is contradictory and makes the tool look healthy while silently ignoring the user's intended MCP/plugins/permissions/model settings. Source: gaebal-gajae dogfood response to Clawhip message `1506453939443204247` on 2026-05-20.
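A Rust sketch of fixes (a), (b), and (d), assuming the loader records a per-file outcome instead of discarding parse errors. `ConfigFile`, `ConfigSummary`, and `summarize` are hypothetical names. The roadmap leaves `warn` vs `fail` open; this sketch picks `fail` whenever any present file failed to parse, and its headline never says "loaded successfully" in that case:

```rust
/// Outcome of loading one discovered config location. Parse errors are
/// kept here instead of being dropped when the loader falls back to defaults.
pub struct ConfigFile {
    pub path: String,
    pub present: bool,
    pub error: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CheckStatus {
    Ok,
    Fail,
}

#[derive(Debug, PartialEq, Eq)]
pub struct ConfigSummary {
    pub present_count: usize,
    pub loaded_count: usize,
    pub failed_count: usize,
    pub failed_paths: Vec<String>,
    pub status: CheckStatus,
}

pub fn summarize(files: &[ConfigFile]) -> ConfigSummary {
    let present: Vec<&ConfigFile> = files.iter().filter(|f| f.present).collect();
    let failed_paths: Vec<String> = present
        .iter()
        .filter(|f| f.error.is_some())
        .map(|f| f.path.clone())
        .collect();
    let failed_count = failed_paths.len();
    ConfigSummary {
        present_count: present.len(),
        loaded_count: present.len() - failed_count,
        failed_count,
        status: if failed_count == 0 { CheckStatus::Ok } else { CheckStatus::Fail },
        failed_paths,
    }
}

impl ConfigSummary {
    /// Human-readable doctor headline that cannot contradict the counts.
    pub fn headline(&self) -> String {
        if self.failed_count == 0 {
            format!("runtime config loaded successfully ({} file(s))", self.loaded_count)
        } else {
            format!(
                "{} of {} config file(s) failed to parse: {}",
                self.failed_count,
                self.present_count,
                self.failed_paths.join(", ")
            )
        }
    }
}
```

For the truncated `.claw.json` repro this yields `present_count: 1`, `loaded_count: 0`, `failed_count: 1`, and a failing check that names the file, rather than `Config Status ok` next to `loaded 0/1`.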
+
+
+473. **Slash-only command guidance is inconsistent: bare `claw compact` fails fast with useful guidance, but `claw compact --help` and `claw compact extra` zero-byte hang instead of returning the same guidance/help or a bounded arity error** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 01:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`. After confirming the remote `docs/roadmap-workdir-provenance` branch still had HEAD `e9db12d` (Jobdori's claimed `f997a1a8` was not present on origin at probe time), compact-surface probes showed `claw compact` exits 1 with a clear local message: `` `claw compact` is a slash command. Use `claw --resume SESSION.jsonl /compact` or start `claw` and run `/compact`.`` Direct `/compact` also exits 1 with interactive/resume guidance, and `/compact --confirm` returns a bounded unexpected-argument help block. But `compact --help` and `compact extra` each timed out after 6s with `stdout=0` and `stderr=0`. This is a parser/help arity bug distinct from the compaction-internals sentinel issue Jobdori described: the CLI already knows how to explain that `compact` is slash-only, but adding `--help` or any extra token bypasses that safe guidance path. **Required fix shape:** (a) route `compact --help` to static compact help/guidance before prompt/runtime dispatch; (b) route `compact <unexpected>` to a typed bounded `unexpected_argument` or `slash_only_command` error with resume usage; (c) make slash-only top-level shims share one helper so bare/--help/extra forms cannot drift; (d) add clean-home regressions for `compact`, `compact --help`, `compact extra`, `/compact`, `/compact --confirm`, and JSON-mode equivalents. **Why this matters:** users who follow the bare-command guidance and ask for help on the same word hit a silent hang. 
Recovery commands must converge toward usage, not fall off a different parser cliff as soon as a user adds `--help`. Source: gaebal-gajae dogfood response to Clawhip message `1506461484950093967` on 2026-05-20.
+
+
+474. **Local diagnostic verbs and slash-only shims hang on `--help`/extra-token forms even when their bare or slash forms already have bounded guidance (`status --help`, `status extra`, `doctor --help`, `doctor extra`, `clear --help`, `clear --confirm`, `clear extra`)** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 01:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`; branch and origin were both `docs/roadmap-workdir-provenance@1208b9a` before editing. Bounded clean-home probes showed `claw clear` exits 1 with useful slash-only guidance, `/clear` and `/clear --confirm` return bounded interactive/resume guidance, and `/clear extra` returns a bounded usage block. But `clear --help`, `clear --confirm`, and `clear extra` each timed out after 6s with `stdout=0`/`stderr=0`. The same arity/help drift affects bare local diagnostics: `status --help`, `status extra`, `doctor --help`, and `doctor extra` also timed out after 6s with no bytes, despite bare `status`/`doctor` being local, bounded diagnostic commands in other probes. This generalizes #473 from `compact` to the parser layer: after a recognized local verb, trailing tokens are not routed through a command-specific help/arity table and can enter the non-returning prompt/runtime path. **Required fix shape:** (a) define per-command arity/help metadata for every local verb before prompt fallback; (b) `status --help` and `doctor --help` render static help; `status extra` and `doctor extra` return typed `unexpected_argument`; (c) `clear --help` renders slash/resume guidance, while `clear --confirm` either maps to `/clear --confirm` guidance or returns typed `slash_only_command` without dispatch; (d) centralize slash-only top-level shim handling for `compact`, `clear`, and future resume-capable slash commands; (e) add clean-home timeout-guarded regression coverage for all forms above. 
**Why this matters:** `status` and `doctor` are the commands operators reach for during startup/config breakage. If adding `--help` or a stale wrapper token makes them silently hang, the recovery surface is unreliable and automation cannot distinguish operator typo from runtime deadlock. Source: gaebal-gajae dogfood response to Clawhip message `1506469039562555573` on 2026-05-20.
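A Rust sketch of fix (d): one helper that every slash-only top-level shim calls, so the bare, `--help`, and extra-token forms share a single code path. The guidance string is copied from the bare `claw compact` output quoted in #473; `ShimOutcome`, `SLASH_ONLY`, and `slash_only_shim` are hypothetical names, and treating `clear --confirm` as an unexpected argument is one of the two options fix (c) allows:

```rust
/// What a top-level shim for a slash-only command returns. Neither variant
/// dispatches a prompt; the caller prints the text and exits nonzero.
#[derive(Debug, PartialEq, Eq)]
pub enum ShimOutcome {
    /// Bare form or `--help`: guidance only.
    Guidance(String),
    /// Any other trailing token: typed arity error that still carries guidance.
    UnexpectedArgument { argument: String, guidance: String },
}

/// Illustrative list of commands that only make sense in the REPL or via `--resume`.
const SLASH_ONLY: &[&str] = &["compact", "clear"];

fn guidance(command: &str) -> String {
    format!(
        "`claw {command}` is a slash command. Use `claw --resume SESSION.jsonl /{command}` or start `claw` and run `/{command}`."
    )
}

/// Shared entry point for every slash-only shim; `None` means "not slash-only".
pub fn slash_only_shim(command: &str, rest: &[&str]) -> Option<ShimOutcome> {
    if !SLASH_ONLY.contains(&command) {
        return None;
    }
    let text = guidance(command);
    Some(match rest {
        [] | ["--help"] => ShimOutcome::Guidance(text),
        [first, ..] => ShimOutcome::UnexpectedArgument {
            argument: first.to_string(),
            guidance: text,
        },
    })
}
```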
+
+
+475. **Local diagnostic commands (`status`, `doctor`, `sandbox`) have no bounded JSON output-form contract: bare text works, suffix `--output-format json` hangs, and prefix `--output-format json <command>` is rejected as a parse error** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 02:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`; branch and origin were both `docs/roadmap-workdir-provenance@9495dbe` before editing. Clean-home probes showed `status`, `doctor`, and `sandbox` exit 0 with useful human-readable text. But `status --output-format json`, `doctor --output-format json`, and `sandbox --output-format json` each timed out after 6s with `stdout=0` and `stderr=0`. The prefix spelling `--output-format json status|doctor|sandbox` exits 1 with `[error-kind: cli_parse] unknown option`, also with no JSON envelope. This is distinct from #474's help/arity hang: here the option is a documented global formatting concept already used by `version` and prompt mode, but local diagnostics neither honor it nor fail closed consistently. **Required fix shape:** (a) define a project-wide rule for `--output-format` placement and apply it to local diagnostics; (b) support `status --output-format json`, `doctor --output-format json`, and `sandbox --output-format json` with stable `kind:"status"|"doctor"|"sandbox"` envelopes; (c) either support prefix global placement or reject it with a typed JSON-capable parse envelope; (d) ensure these paths are config/provider/prompt-free and bounded under clean home; (e) add timeout-guarded clean-home regressions for bare text, suffix JSON, and prefix JSON forms. **Why this matters:** operators and orchestrators need machine-readable health snapshots during startup failures. If the only working forms are text and JSON requests hang, automation has to scrape prose or misclassify a formatting request as runtime deadlock. 
Source: gaebal-gajae dogfood response to Clawhip message `1506476585543532596` on 2026-05-20.
+
+
+476. **Read-only inventory commands (`agents`, `mcp`, `skills`, `plugins`) have no bounded JSON/list contract: bare text works, suffix `--output-format json` hangs, prefix global JSON is rejected, and `skills` dumps a huge prose list with no machine-readable counts/paths** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 02:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`; branch and origin were both `docs/roadmap-workdir-provenance@bb2cf3f` before editing. Clean-home probes showed bare inventory commands are local and bounded: `agents` prints `No agents found.`, `mcp` prints a short `Configured servers 0` report, `plugins` prints two disabled bundled plugins, and `skills` prints `65 available skills` plus ~27KB of prose descriptions. But `agents --output-format json`, `mcp --output-format json`, `skills --output-format json`, and `plugins --output-format json` each timed out after 6s with `stdout=0`/`stderr=0`. Prefix forms `--output-format json agents|mcp|skills|plugins` exit 1 with `[error-kind: cli_parse] unknown option` and no JSON envelope. This is adjacent to #475 but distinct: these are inventory surfaces that orchestrators need for routing, MCP/plugin lifecycle checks, and skill selection, not just health diagnostics. **Required fix shape:** (a) provide stable JSON envelopes for each command (`kind:"agents"|"mcp"|"skills"|"plugins"`) with counts, source paths, enabled/disabled state, and parse/load errors where relevant; (b) make `--output-format json` suffix bounded and optionally support/predictably reject prefix placement via the global formatting contract; (c) add compact text/list modes for `skills` so default output is not an unbounded prose wall; (d) ensure inventory JSON does not initialize providers or interactive prompt runtime; (e) add clean-home timeout-guarded regressions for bare text, suffix JSON, and prefix JSON forms. 
**Why this matters:** claws need structured inventories to decide which skills/plugins/MCP servers are available. Scraping 27KB of human prose or hitting a silent hang on JSON makes routing brittle and masks lifecycle breakage. Source: gaebal-gajae dogfood response to Clawhip message `1506484138927067286` on 2026-05-20.
+
+
+477. **`system-prompt` only works as a bare command; every documented modifier (`--date`, `--cwd`, `--help`, `--output-format json`) and even invalid modifier cases zero-byte hang, making prompt provenance/reproducibility unusable outside the current cwd/date defaults** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 03:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`; branch and origin were both `docs/roadmap-workdir-provenance@8afdb94` before editing. Bare `system-prompt` exits 0 and prints the expected prompt with dynamic context, including working directory, date, git status, recent commits, and `CLAUDE.md` content. But `system-prompt --date 2026-05-20`, `system-prompt --cwd /tmp --date 2026-05-20`, `system-prompt --help`, `system-prompt extra`, `system-prompt --date nope`, `system-prompt --cwd /definitely/not/here`, and `system-prompt --output-format json` each timed out after 6s with `stdout=0` and `stderr=0`. Prefix global JSON (`--output-format json system-prompt`) exits 1 with `[error-kind: cli_parse] unknown option`. This is worse than a missing JSON parity issue: the help text advertises `claw system-prompt [--cwd PATH] [--date YYYY-MM-DD]`, but the advertised valid modifiers enter the non-returning path. **Required fix shape:** (a) parse `system-prompt` options before prompt/runtime fallback and make `--cwd`/`--date` valid, bounded, and deterministic; (b) validate `--date` format and nonexistent `--cwd` with typed errors; (c) implement `system-prompt --help`; (d) support or typed-reject `--output-format json` with a stable prompt-provenance envelope (`kind:"system_prompt"`, `cwd`, `date`, `project_context_sources`, `git_snapshot`, `text`); (e) add clean-home regression tests for bare, valid modifier, invalid modifier, help, extra arg, and JSON forms. 
**Why this matters:** `system-prompt` is the main way to debug prompt misdelivery and stale-context complaints. If users cannot pin cwd/date or get structured provenance, they cannot reproduce why a session received the wrong instructions. Source: gaebal-gajae dogfood response to Clawhip message `1506491684064723004` on 2026-05-20.
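A Rust sketch of fixes (a) and (b), assuming `system-prompt` options are parsed before any prompt or runtime fallback. It validates `--date` as a real calendar `YYYY-MM-DD` (without a date crate) and requires `--cwd` to be an existing directory, returning typed errors otherwise. `SystemPromptArgs`, `SystemPromptArgError`, and `parse_system_prompt_args` are hypothetical names, and the `--help` and `--output-format` handling from fixes (c) and (d) is left out:

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq, Eq)]
pub enum SystemPromptArgError {
    MissingValue(&'static str),
    InvalidDate(String),
    CwdNotFound(PathBuf),
    UnexpectedArgument(String),
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct SystemPromptArgs {
    pub cwd: Option<PathBuf>,
    /// (year, month, day) after calendar validation.
    pub date: Option<(u32, u32, u32)>,
}

fn days_in_month(year: u32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if (year % 4 == 0 && year % 100 != 0) || year % 400 == 0 => 29,
        2 => 28,
        _ => 0, // invalid month: no day can be valid
    }
}

/// Strict `YYYY-MM-DD` that must also be a real calendar date.
fn parse_date(raw: &str) -> Option<(u32, u32, u32)> {
    let b = raw.as_bytes();
    if !raw.is_ascii() || b.len() != 10 || b[4] != b'-' || b[7] != b'-' {
        return None;
    }
    let field = |start: usize, end: usize| -> Option<u32> {
        let digits = &raw[start..end];
        if digits.bytes().all(|c| c.is_ascii_digit()) {
            digits.parse().ok()
        } else {
            None
        }
    };
    let (year, month, day) = (field(0, 4)?, field(5, 7)?, field(8, 10)?);
    (day >= 1 && day <= days_in_month(year, month)).then_some((year, month, day))
}

/// Parse `system-prompt` options up front so no input can reach prompt dispatch.
pub fn parse_system_prompt_args(args: &[&str]) -> Result<SystemPromptArgs, SystemPromptArgError> {
    let mut parsed = SystemPromptArgs::default();
    let mut iter = args.iter();
    while let Some(&arg) = iter.next() {
        match arg {
            "--date" => {
                let raw = *iter.next().ok_or(SystemPromptArgError::MissingValue("--date"))?;
                let date = parse_date(raw)
                    .ok_or_else(|| SystemPromptArgError::InvalidDate(raw.to_string()))?;
                parsed.date = Some(date);
            }
            "--cwd" => {
                let raw = *iter.next().ok_or(SystemPromptArgError::MissingValue("--cwd"))?;
                let path = PathBuf::from(raw);
                if !path.is_dir() {
                    return Err(SystemPromptArgError::CwdNotFound(path));
                }
                parsed.cwd = Some(path);
            }
            other => return Err(SystemPromptArgError::UnexpectedArgument(other.to_string())),
        }
    }
    Ok(parsed)
}
```

Each repro from this entry (`--date nope`, `--cwd /definitely/not/here`, `extra`) maps to a distinct error variant, so none of them can fall through to the non-returning path.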
+
+
+478. **`export` has a bounded bare no-session error, but every advertised flag/help form (`--help`, `--session`, `--output`, `--output-format json`) zero-byte hangs, so session artifact export cannot be scripted or debugged from the documented CLI surface** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 03:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with `./rust/target/debug/claw version --output-format json` reporting `git_sha:"25d663d"`; branch and origin were both `docs/roadmap-workdir-provenance@90a0d38` before editing. Clean-home probes showed bare `export` exits 1 with a good typed error: `[error-kind: no_managed_sessions]`, the partitioned `.claw/sessions/<fingerprint>/` path, and guidance to start `claw` or rerun with `--resume latest`. But `export --help`, `export extra`, `export --session latest`, `export --session definitely-missing`, `export --output /tmp/claw-export-test.md`, and `export --output-format json` each timed out after 6s with `stdout=0`/`stderr=0`. Prefix `--output-format json export` exits 1 with `[error-kind: cli_parse]` and no JSON envelope. The help text advertises `claw export [PATH] [--session SESSION] [--output PATH]`, but those flags are not safely parsed. **Required fix shape:** (a) parse `export` options before prompt/runtime fallback; (b) implement `export --help`; (c) make `--session latest`, missing session IDs, positional PATH, and `--output PATH` return bounded results/errors; (d) support or typed-reject `--output-format json` with an envelope including `kind:"export"`, session fingerprint/path, output path, and skipped/error reason; (e) add clean-home regressions for bare no-session, help, extra arg, missing/latest session, output path, positional path, and JSON forms. **Why this matters:** `export` is the evidence path for stale session, prompt misdelivery, and event/log opacity bugs. 
If bare no-session works but every real option hangs, operators cannot reliably produce or inspect artifacts when debugging clawability issues. Source: gaebal-gajae dogfood response to Clawhip message `1506499237972545576` on 2026-05-20.
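
A minimal sketch of fix (a) to (c), assuming a hypothetical `parse_export_args` that runs on the tokens after `export` before any prompt/runtime fallback. `ExportArgs` and `ExportParseError` are illustrative names, not existing claw types, and treating a positional PATH plus `--output` as a conflict is an assumption. If a global flag such as `--output-format` reaches this parser, it comes back as a typed `UnknownFlag` (one way to satisfy the typed-reject option in (d)) instead of hanging.

```rust
/// Hypothetical parsed form of `claw export [PATH] [--session SESSION] [--output PATH]`.
#[derive(Debug, Default, PartialEq)]
pub struct ExportArgs {
    pub help: bool,
    pub session: Option<String>,
    pub output: Option<String>,
}

/// Typed parse failures that can be reported immediately instead of hanging.
#[derive(Debug, PartialEq)]
pub enum ExportParseError {
    MissingValue(&'static str),
    UnknownFlag(String),
    ExtraArgument(String),
    ConflictingOutput,
}

/// Parses the tokens that follow `export`; never touches the runtime.
pub fn parse_export_args(args: &[&str]) -> Result<ExportArgs, ExportParseError> {
    let mut parsed = ExportArgs::default();
    let mut positional: Option<String> = None;
    let mut tokens = args.iter();
    while let Some(&token) = tokens.next() {
        match token {
            "--help" | "-h" => parsed.help = true,
            "--session" => {
                let value = tokens.next().ok_or(ExportParseError::MissingValue("--session"))?;
                parsed.session = Some(value.to_string());
            }
            "--output" => {
                let value = tokens.next().ok_or(ExportParseError::MissingValue("--output"))?;
                parsed.output = Some(value.to_string());
            }
            flag if flag.starts_with('-') => {
                return Err(ExportParseError::UnknownFlag(flag.to_string()));
            }
            path => {
                // Only one positional PATH is documented; a second one is an error.
                if positional.is_some() {
                    return Err(ExportParseError::ExtraArgument(path.to_string()));
                }
                positional = Some(path.to_string());
            }
        }
    }
    // Assumption: PATH and --output name the same destination, so both at once is ambiguous.
    match (positional, parsed.output.is_some()) {
        (Some(_), true) => Err(ExportParseError::ConflictingOutput),
        (Some(path), false) => {
            parsed.output = Some(path);
            Ok(parsed)
        }
        (None, _) => Ok(parsed),
    }
}
```

Because every token either updates `ExportArgs` or returns an error, the clean-home regressions in (e) can assert on the parse result directly, with no timeout wrapper needed for this stage.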
+
+479. **`skills` accepts listing/help surfaces, but unknown skill invocation names zero-byte hang instead of returning a typed `skill_not_found`/usage error, so typoed skill calls look like runtime stalls** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 04:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@4d52703` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes confirmed healthy bounded surfaces for `skills` (exit 0, `No skills found.`), `skills help`/`skills --help` (usage), and `skills --output-format json` (empty typed inventory). But the actual invocation/error path hangs silently: `skills garbage --output-format json`, `skills missing --output-format json`, `skills help missing --output-format json`, and `skills /nope --output-format json` each timed out after 6s with `stdout=0` and `stderr=0`. In contrast, sibling surfaces are bounded: `agents garbage --output-format json` exits 0 with a typed help envelope including `unexpected:"garbage"`, `mcp garbage --output-format json` does the same, and `skills install /nope --output-format json` exits 1 with a JSON error envelope. **Required fix shape:** (a) resolve the first post-`skills` token against installed skill names before falling into runtime prompt/tool execution; (b) for unknown names return a bounded JSON/text error with `kind:"skill_not_found"`, `name`, `available_count`, and a hint to run `claw skills list`; (c) reject extra args after `skills help` with typed usage instead of trying to invoke `help` as a skill; (d) preserve `skills list <extra>` behavior only if intentionally documented, otherwise reject the extra arg; (e) add clean-home regressions for list/help/json, unknown-name JSON/text, `help extra`, `/pathlike` unknown names, and install missing-path. **Why this matters:** skills are a plugin-like lifecycle surface. 
A typo or missing local skill should be diagnosable immediately; a zero-byte hang is indistinguishable from MCP startup deadlock or prompt misdelivery and makes automation wrappers unable to classify the failure. Source: gaebal-gajae dogfood response to Clawhip message `1506514333394403409` on 2026-05-20.
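
A sketch of fix (a) to (d), assuming installed skill names are already loaded into a slice and that real subcommands such as `install` are matched before this function runs. `SkillsCommand` and `classify_skills_args` are hypothetical. The point is that an unknown name becomes a `NotFound` value carrying the `name` and `available_count` fields proposed in (b) before anything reaches prompt or tool execution.

```rust
/// Hypothetical dispatch decision for the tokens after `claw skills`.
#[derive(Debug, PartialEq)]
pub enum SkillsCommand<'a> {
    List,
    Help,
    Invoke { name: &'a str, args: &'a [&'a str] },
    UsageError { unexpected: &'a str },
    NotFound { name: &'a str, available_count: usize },
}

/// Classifies `skills` arguments without starting the runtime.
/// Assumes `install` and other built-in subcommands were handled earlier.
pub fn classify_skills_args<'a>(args: &'a [&'a str], installed: &[&str]) -> SkillsCommand<'a> {
    match args {
        [] | ["list"] => SkillsCommand::List,
        ["help"] | ["--help"] => SkillsCommand::Help,
        // Extra tokens after help/list are usage errors, never skill names.
        ["help" | "--help" | "list", extra, ..] => SkillsCommand::UsageError { unexpected: *extra },
        [name, rest @ ..] => {
            if installed.iter().any(|skill| skill == name) {
                SkillsCommand::Invoke { name: *name, args: rest }
            } else {
                SkillsCommand::NotFound { name: *name, available_count: installed.len() }
            }
        }
    }
}
```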
+
+480. **`claw session` correctly says it is resume-only when bare, but `session --help` and `session list` zero-byte hang instead of returning bounded guidance, so users trying to discover session management from the advertised slash command get no usable recovery path** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 05:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@93f20df` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probe: bare `claw session` exits 1 with a useful typed stderr error: `[error-kind: unknown] error: \`claw session\` is a slash command. Use \`claw --resume SESSION.jsonl /session\` or start \`claw\` and run \`/session\`.` But the natural discovery and list forms do not share that bounded path: `claw session --help` timed out after 6s with `stdout=0`/`stderr=0`, and `claw session list` timed out after 6s with `stdout=0`/`stderr=0`; a longer wrapper run had to be killed after the `session list` probe stopped producing output. **Required fix shape:** (a) treat `session` as an explicit top-level alias for resume-safe session inspection rather than falling into prompt/runtime dispatch; (b) implement `session --help` with the same usage as `/session [list|exists|switch|fork|delete]` plus the `--resume` requirement where applicable; (c) make `session list` return a bounded empty-list result or a typed `resume_required`/`no_managed_sessions` error rather than hanging; (d) support or typed-reject `--output-format json` for session discovery/list/exists forms; (e) add clean-home regressions for bare `session`, `session --help`, `session list`, `session exists missing`, and the documented `--resume latest /session list` no-session path. **Why this matters:** session management is the recovery surface for stale-session confusion and prompt misdelivery. 
A user who sees `/session` in help naturally tries `claw session --help` or `claw session list`; hanging there blocks the exact diagnostics needed to find or resume the broken session. Source: gaebal-gajae dogfood response to Clawhip message `1506521887231180986` on 2026-05-20.
+
+481. **Slash-only top-level aliases (`files`, `hooks`, `memory`) emit a useful bare resume-only error, but `--help` hangs with zero output, so users cannot discover the documented resume-safe command shape from the command they just tried** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 05:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@51e6040` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes show the bare aliases are bounded: `claw files`, `claw hooks`, and `claw memory` each exit 1 with `[error-kind: unknown]` and a helpful message: `Use \`claw --resume SESSION.jsonl /<verb>\` or start \`claw\` and run \`/<verb>\`.` Their JSON forms are also bounded (`files --output-format json`, `hooks --output-format json`, `memory --output-format json`) with the same message in a JSON error envelope. But adding the most natural discovery flag hangs: `claw files --help`, `claw hooks --help`, and `claw memory --help` each timed out after 6s with `stdout=0`/`stderr=0`. In the same clean-home sweep, real top-level local verbs (`config`, `config --help`, `config env --output-format json`) returned immediately, proving this is specific to slash-only alias help fallback. **Required fix shape:** (a) centralize slash-only alias handling so `--help`/`help` never enter prompt/runtime dispatch; (b) return a bounded help page for every slash-only alias, including direct CLI usage (`claw --resume SESSION.jsonl /files`) and whether the command is resume-safe; (c) make JSON help/error envelopes include `kind:"slash_command_alias"`, `slash:"/files"`, `resume_required:true`, and supported resume forms; (d) add clean-home regressions for bare, `--help`, `help`, and JSON forms for `files`, `hooks`, `memory`, plus the already-found `session` sibling (#480). 
**Why this matters:** these are exactly the diagnostic surfaces users reach for during prompt misdelivery and stale-session debugging. A helpful bare error followed by a zero-byte hang on `--help` is a recovery dead-end. Source: gaebal-gajae dogfood response to Clawhip message `1506529436466675713` on 2026-05-20.
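
A table-driven sketch of the centralized handler in (a) to (c). The alias list covers only the verbs named in #480 to #485, classified by the bare-mode messages those probes reported; `AliasMode`, `AliasOutcome`, and `intercept_slash_alias` are hypothetical, and a real table would presumably come from wherever the REPL already registers slash commands. The property that matters is ordering: the first token is looked up, and `--help`/`help` is checked, before any prompt or runtime dispatch.

```rust
/// How a slash-only alias can be used outside the REPL.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AliasMode {
    /// Usable as `claw --resume SESSION.jsonl /<verb>`.
    ResumeSafe,
    /// Only meaningful inside an interactive `claw` REPL.
    ReplOnly,
}

use AliasMode::{ReplOnly, ResumeSafe};

/// Hypothetical alias table built from the verbs probed in #480 to #485.
const SLASH_ALIASES: &[(&str, AliasMode)] = &[
    ("session", ResumeSafe), ("files", ResumeSafe), ("hooks", ResumeSafe),
    ("memory", ResumeSafe), ("cost", ResumeSafe), ("stats", ResumeSafe),
    ("history", ResumeSafe), ("tokens", ResumeSafe), ("cache", ResumeSafe),
    ("providers", ResumeSafe), ("clear", ResumeSafe),
    ("approve", ReplOnly), ("deny", ReplOnly), ("model", ReplOnly),
    ("permissions", ReplOnly), ("debug-tool-call", ReplOnly), ("bughunter", ReplOnly),
    ("teleport", ReplOnly), ("ultraplan", ReplOnly), ("release-notes", ReplOnly),
];

/// Outcome of checking argv before any prompt/runtime dispatch.
#[derive(Debug, PartialEq)]
pub enum AliasOutcome<'a> {
    /// Not a known slash alias: continue with normal CLI dispatch.
    NotAnAlias,
    /// `--help`, `-h`, or a leading `help`: print bounded per-alias usage.
    Help { verb: &'a str, mode: AliasMode },
    /// Any other form, including argument-bearing ones such as `bughunter src`.
    Redirect { verb: &'a str, mode: AliasMode, args: &'a [&'a str] },
}

pub fn intercept_slash_alias<'a>(argv: &'a [&'a str]) -> AliasOutcome<'a> {
    let Some((&verb, rest)) = argv.split_first() else {
        return AliasOutcome::NotAnAlias;
    };
    let Some(&(_, mode)) = SLASH_ALIASES.iter().find(|(name, _)| *name == verb) else {
        return AliasOutcome::NotAnAlias;
    };
    let wants_help = rest.first().copied() == Some("help")
        || rest.iter().any(|arg| matches!(*arg, "--help" | "-h"));
    if wants_help {
        AliasOutcome::Help { verb, mode }
    } else {
        AliasOutcome::Redirect { verb, mode, args: rest }
    }
}
```

Callers would print bounded help for `Help`, print the existing resume-only or REPL-only message for `Redirect`, and fall through to normal dispatch only on `NotAnAlias`.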
+
+482. **Session-inspection slash aliases (`cost`, `stats`, `history`, `tokens`) have bounded bare/JSON resume-only errors, but `--help` hangs with zero output, extending the slash-alias help dead-end to the telemetry/recovery commands users need during stale-session debugging** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 06:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@111e7e8` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home sweep showed local direct verbs behave differently: `diff` returns bounded no-git text/JSON and rejects `diff --help` with a bounded `unexpected extra arguments`; interactive-only write verbs (`commit`, `pr`, `issue`) return bounded bare/JSON errors and `--help` prints the global help. The broken cluster is the resume-safe telemetry alias family: `claw cost`, `claw stats`, `claw history`, and `claw tokens` each exit 1 with a useful resume-only message (`Use \`claw --resume SESSION.jsonl /<verb>\` or start \`claw\` and run \`/<verb>\`.`), and their `--output-format json` forms return bounded JSON error envelopes, but `claw cost --help`, `claw stats --help`, `claw history --help`, and `claw tokens --help` time out after 6s with `stdout=0`/`stderr=0`; the sweep had to be killed while stuck at `tokens --help`. **Required fix shape:** (a) fold the resume-safe telemetry slash aliases into the centralized slash-only alias help handler proposed in #481; (b) for each alias, return bounded text/JSON help listing the `--resume SESSION.jsonl /<verb>` form, accepted optional args (`history [count]`), and JSON support status; (c) do not fall through to prompt/runtime dispatch when `--help` follows any known slash alias; (d) add clean-home regressions for bare, `--help`, `help`, and JSON forms for `cost`, `stats`, `history`, `tokens`, `cache`, and `providers`. 
**Why this matters:** these commands are the observability surface for event/log opacity and stale-session confusion. If the first attempt to ask for help on `/tokens` or `/history` silently hangs, operators cannot tell whether the session store, runtime, or help parser is broken. Source: gaebal-gajae dogfood response to Clawhip message `1506536984976425070` on 2026-05-20.
+
+483. **Interactive-only slash aliases (`approve`, `deny`, `model`) hang on `--help`, while `permissions --help` already returns bounded inline usage, proving the alias-help behavior is inconsistent within the same command family** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 06:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@625b8b0` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home sweep: `claw approve`, `claw deny`, and `claw model` each exit 1 with a bounded interactive-only message (`Start \`claw\` and run \`/<verb>\` inside the REPL.`), and their `--output-format json` forms return bounded JSON error envelopes. But `claw approve --help`, `claw deny --help`, and `claw model --help` time out after 6s with `stdout=0`/`stderr=0`. Same sweep shows a nearby positive control: `claw permissions --help` exits 1 with the interactive-only message plus an inline usage block (`Usage /permissions [read-only|workspace-write|danger-full-access]`). Resume-safe siblings `cache`, `providers`, and `clear` also still hang on `--help`, matching #481/#482. **Required fix shape:** (a) extend the centralized slash-alias help handler to interactive-only aliases, not just resume-safe aliases; (b) model it after the existing bounded `permissions --help` behavior, returning the interactive-only reason plus per-command usage; (c) include aliases (`/approve` = `/yes`/`/y`, `/deny` = `/no`/`/n`) in help text/JSON; (d) add clean-home regressions for `approve --help`, `deny --help`, `model --help`, `debug-tool-call --help`, and assert `permissions --help` remains bounded. **Why this matters:** permission prompts and model switching are high-friction startup/operator surfaces. If users naturally type `claw approve --help` or `claw model --help`, the CLI must explain that these are REPL-only instead of silently hanging and looking like a dead runtime. 
Source: gaebal-gajae dogfood response to Clawhip message `1506544532026687562` on 2026-05-20.
+
+484. **Interactive helper slash aliases (`debug-tool-call`, `bughunter`, `teleport`) hang on `--help`, and argument-bearing invocations like `bughunter src --output-format json` also hang instead of returning an interactive-only error, so high-value debug/navigation entrypoints are misclassified as dead runtime work** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 07:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@dbd04ad` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home sweep: `claw bughunter` and `claw teleport` exit 1 with bounded interactive-only messages (`Start \`claw\` and run \`/<verb>\` inside the REPL.`), but `claw debug-tool-call --help`, `claw bughunter --help`, and `claw teleport --help` each timed out after 6s with `stdout=0`/`stderr=0`. Worse, a realistic argument-bearing form, `claw bughunter src --output-format json`, also timed out after 6s with zero output instead of saying `/bughunter` is REPL-only; the sweep had to be killed after `teleport --help` hung. **Required fix shape:** (a) classify debug/navigation slash aliases before prompt/runtime fallback, including when they carry positional arguments; (b) return bounded interactive-only help for `debug-tool-call`, `bughunter [scope]`, and `teleport <symbol-or-path>`; (c) for JSON mode, return an error envelope with `kind:"slash_command_repl_only"`, `slash`, `args`, and `usage`; (d) add clean-home regressions for bare, `--help`, JSON, and argument-bearing forms (`bughunter src`, `teleport main`, `ultraplan task`, `release-notes`). **Why this matters:** these are the tools operators reach for to diagnose brittle tests, find files, or replay tool calls. Hanging before the REPL boundary makes a simple usage mistake indistinguishable from prompt misdelivery or a stuck MCP/plugin lifecycle. Source: gaebal-gajae dogfood response to Clawhip message `1506552081589342383` on 2026-05-20.
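
A std-only sketch of the JSON error envelope proposed in (c). The field names `kind`, `slash`, `args`, and `usage` come from the fix shape; `render_repl_only_error` and its minimal string escaper are hypothetical, and a real implementation would more likely reuse whatever serializer the CLI already uses for its other JSON envelopes.

```rust
/// Encodes `value` as a JSON string literal, escaping quotes, backslashes, and control characters.
fn json_string(value: &str) -> String {
    let mut out = String::with_capacity(value.len() + 2);
    out.push('"');
    for ch in value.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Hypothetical envelope for a slash alias that only works inside the REPL.
pub fn render_repl_only_error(verb: &str, args: &[&str], usage: &str) -> String {
    let args_json: Vec<String> = args.iter().map(|arg| json_string(arg)).collect();
    format!(
        "{{\"kind\":\"slash_command_repl_only\",\"slash\":{},\"args\":[{}],\"usage\":{}}}",
        json_string(&format!("/{verb}")),
        args_json.join(","),
        json_string(usage),
    )
}
```

For `bughunter src` with a usage string of `/bughunter [scope]`, this yields `{"kind":"slash_command_repl_only","slash":"/bughunter","args":["src"],"usage":"/bughunter [scope]"}`, so the argument-bearing form gets a bounded, classifiable error instead of a hang.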
+
+485. **Planning/output slash aliases (`ultraplan`, `release-notes`) still enter zero-byte hang paths on `--help`, and `ultraplan <task> --output-format json` hangs instead of returning a REPL-only error, while nearby `release-notes --output-format json` is bounded — proving slash-alias argument classification is incomplete, not uniformly broken** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 07:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@51a450e` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes: `claw ultraplan --help` timed out after 6s with `stdout=0`/`stderr=0`; `claw ultraplan task --output-format json` also timed out with zero output; `claw release-notes --help` timed out with zero output. But `claw release-notes --output-format json` exits 1 with a bounded JSON error (`Start \`claw\` and run \`/release-notes\` inside the REPL.`). Nearby non-slash/plugin-like controls are bounded: `agents missing --output-format json` returns typed help with `unexpected:"missing"`; `mcp show missing --output-format json` returns `found:false`; `mcp list extra --output-format json` returns `unsupported_action`. **Required fix shape:** (a) add `ultraplan` and `release-notes` to the centralized slash-alias help/REPL-only classifier; (b) classify slash aliases before attempting to treat positional args as prompt text or runtime work; (c) return bounded text/JSON help for `ultraplan [task]` and `release-notes`, including whether any direct non-REPL invocation is intentionally unsupported; (d) add clean-home regressions for `ultraplan --help`, `ultraplan task --output-format json`, `release-notes --help`, `release-notes --output-format json`, plus positive controls for `agents`/`mcp` bounded unknown handling. **Why this matters:** `/ultraplan` is a prompt-construction/planning surface and `/release-notes` is an artifact/reporting surface. 
If direct invocations silently hang, users cannot distinguish unsupported REPL-only usage from prompt misdelivery, model startup, or artifact-generation deadlock. Source: gaebal-gajae dogfood response to Clawhip message `1506559635996414132` on 2026-05-20.
+
+486. **`prompt` has bounded help and missing-prompt errors, but real one-shot prompt invocations zero-byte hang under clean-home/no-credentials conditions instead of returning an auth/config error, so wrappers cannot distinguish model startup from CLI dispatch deadlock** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 08:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@8a2e133` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes: `claw prompt --help` exits 0 with the global help, and `claw prompt --output-format json` exits 1 with a bounded JSON error (`prompt subcommand requires a prompt string`). But actual prompt dispatch hangs silently: `claw prompt hello --output-format json` timed out after 6s with `stdout=0`/`stderr=0`; the prefix form `claw --output-format json prompt hello` also timed out with zero output; `claw prompt --compact hello` timed out with zero output, and the sweep had to be killed before the remaining modifier probes. In a clean-home environment with isolated `HOME`, `CLAW_CONFIG_HOME`, `XDG_CONFIG_HOME`, and `XDG_DATA_HOME`, a one-shot prompt without usable credentials should fail quickly with a typed auth/config error, not enter an opaque silent wait. **Required fix shape:** (a) add a preflight auth/provider/config check before one-shot prompt runtime startup; (b) if credentials/config are missing, return bounded text/JSON error (`kind:"missing_credentials"` or provider-specific typed error) before opening any long-running runtime path; (c) preserve bounded behavior for `prompt --help` and missing prompt string; (d) add clean-home regressions for `prompt hello --output-format json`, `--output-format json prompt hello`, `prompt --compact hello`, and `--compact prompt hello` verifying nonzero bounded error with stderr/stdout body; (e) include elapsed-time assertion so prompt startup failures cannot regress into hangs. 
**Why this matters:** `prompt` is the primary non-interactive automation surface. A zero-byte hang on a basic clean-home one-shot prompt turns missing setup into event/log opacity and makes CI/wrappers classify the CLI as dead rather than misconfigured. Source: gaebal-gajae dogfood response to Clawhip message `1506567181570277556` on 2026-05-20.
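
A sketch of the preflight in (a) and (b), assuming credentials are discovered through environment variables. Which variables claw actually reads is not assumed here; the caller passes candidate names, and `preflight_credentials` and `PreflightError` are hypothetical. Injecting the lookup as a closure keeps the check pure, so a clean-home regression can drive it without touching the real environment.

```rust
/// Typed preflight failure, reported before any runtime or network startup.
#[derive(Debug, PartialEq)]
pub enum PreflightError {
    MissingCredentials { checked: Vec<String> },
}

/// Succeeds if at least one candidate variable resolves to a non-blank value.
/// `lookup` is injected so tests can run without touching the real environment.
pub fn preflight_credentials<F>(candidates: &[&str], lookup: F) -> Result<(), PreflightError>
where
    F: Fn(&str) -> Option<String>,
{
    let found = candidates
        .iter()
        .any(|&name| lookup(name).map_or(false, |value| !value.trim().is_empty()));
    if found {
        Ok(())
    } else {
        // Echo what was checked so the JSON error can say why the prompt never started.
        Err(PreflightError::MissingCredentials {
            checked: candidates.iter().map(|name| name.to_string()).collect(),
        })
    }
}
```

In production the closure would be something like `|name| std::env::var(name).ok()`, and a `MissingCredentials` result would be rendered as the bounded `kind:"missing_credentials"` error before any long-running runtime path opens.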
+
+487. **`prompt` validates missing strings and bad model syntax before runtime, but unknown/permissive post-prompt flags are swallowed into the runtime path and hang, so typoed one-shot modifiers become silent prompt misdelivery instead of CLI parse errors** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 08:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@32961cf` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes: bare shorthand `claw hello` and `claw --output-format json hello` are bounded unknown-subcommand errors with a hint to use `claw prompt -- ...`; global unknown flags (`claw --definitely-unknown hello`) are bounded `cli_parse` errors; `claw prompt --output-format json` is a bounded missing-prompt error; `claw prompt hello --model bad` is a bounded invalid-model-syntax error. But `claw prompt hello --definitely-unknown`, `claw prompt hello --foo`, `claw prompt hello --allowedTools read`, and `claw prompt hello --permission-mode read-only` each entered runtime and timed out after 6s instead of validating or rejecting the modifier; the `--permission-mode` case printed nothing but spinner control bytes (`⠋ 🦀 Thinking...`) before hanging. **Required fix shape:** (a) give `prompt` a strict subcommand-specific parser that separates prompt text from supported modifiers; (b) reject unknown post-prompt flags with `kind:"cli_parse"` and echo the offending flag; (c) support documented global modifiers consistently before or after `prompt` only if intentionally allowed, otherwise reject with a hint to put them before `prompt`; (d) in JSON mode, never emit spinner/control bytes and always return a bounded JSON error for preflight failures; (e) add clean-home regressions for `prompt hello --foo`, `prompt hello --allowedTools read`, `prompt hello --permission-mode read-only`, `prompt hello --model bad`, and global-vs-post-subcommand flag placement. 
**Why this matters:** one-shot `prompt` is the automation entrypoint. If a typoed or misplaced flag silently becomes part of the model runtime path, wrappers cannot tell whether the prompt was delivered, rejected, or is waiting on model startup. Source: gaebal-gajae dogfood response to Clawhip message `1506574734610137200` on 2026-05-20.
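
A sketch of the strict parser in (a) to (c). Tokens after `prompt` are split into prompt text and a small set of accepted modifiers; a flag from a separate placement list is rejected with a hint to move it before `prompt`, and anything else flag-shaped is returned with the offending token so it can be reported as `kind:"cli_parse"`. Both lists are assumptions chosen to match the probes above (`--model` and `--output-format` already work after `prompt`; `--permission-mode` and `--allowedTools` do not), as is treating `--` as the end of options. `parse_prompt_args` and its types are hypothetical.

```rust
/// A parsed one-shot invocation; `options` keeps accepted (flag, value) pairs in order.
#[derive(Debug, Default, PartialEq)]
pub struct PromptInvocation {
    pub text: String,
    pub compact: bool,
    pub options: Vec<(String, String)>,
}

#[derive(Debug, PartialEq)]
pub enum PromptParseError {
    MissingPrompt,
    MissingValue(String),
    /// A known global flag placed after `prompt`; the hint is to move it before `prompt`.
    MisplacedGlobalFlag(String),
    /// Anything else flag-shaped; reported as `kind:"cli_parse"` with the token echoed.
    UnknownFlag(String),
}

/// Assumed: value-taking modifiers that are accepted after `prompt`.
const POST_PROMPT_VALUE_FLAGS: &[&str] = &["--model", "--output-format"];
/// Assumed: global flags that must appear before `prompt`.
const PRE_PROMPT_ONLY_FLAGS: &[&str] = &["--permission-mode", "--allowedTools"];

fn listed(list: &[&str], flag: &str) -> bool {
    list.iter().any(|known| *known == flag)
}

pub fn parse_prompt_args(args: &[&str]) -> Result<PromptInvocation, PromptParseError> {
    let mut invocation = PromptInvocation::default();
    let mut words: Vec<&str> = Vec::new();
    let mut literal = false; // after `--`, every token is prompt text
    let mut tokens = args.iter().copied();
    while let Some(arg) = tokens.next() {
        if literal || !arg.starts_with('-') {
            words.push(arg);
        } else if arg == "--" {
            literal = true;
        } else if arg == "--compact" {
            invocation.compact = true;
        } else if listed(POST_PROMPT_VALUE_FLAGS, arg) {
            let value = tokens
                .next()
                .ok_or_else(|| PromptParseError::MissingValue(arg.to_string()))?;
            invocation.options.push((arg.to_string(), value.to_string()));
        } else if listed(PRE_PROMPT_ONLY_FLAGS, arg) {
            return Err(PromptParseError::MisplacedGlobalFlag(arg.to_string()));
        } else {
            return Err(PromptParseError::UnknownFlag(arg.to_string()));
        }
    }
    if words.is_empty() {
        return Err(PromptParseError::MissingPrompt);
    }
    invocation.text = words.join(" ");
    Ok(invocation)
}
```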
+
+488. **`status --output-format json` reports only git dirty counts (`changed_files`, `untracked_files`) but omits the actual changed/untracked path list, so automation cannot identify the files that make a workspace dirty without shelling out to git** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 09:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@92e97e0` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home git fixture: initialized a repo with one committed tracked file, then added untracked `untracked.txt` plus untracked `.gitignore` (with ignored `ignored.log`). `claw status --output-format json` returned `workspace.git_state:"dirty · 2 files · 2 untracked"`, `changed_files:2`, `staged_files:0`, `unstaged_files:0`, and `untracked_files:2`, but the JSON contained neither `untracked.txt` nor `.gitignore` anywhere; `ignored.log` was correctly absent, but there is no positive path inventory for the two files counted as dirty. Text status similarly summarizes counts without paths. **Required fix shape:** (a) add `workspace.changed_file_paths` or structured `workspace.git_files:{staged:[], unstaged:[], untracked:[]}` to status JSON; (b) cap the list with `truncated:true` and `total` fields for large repos; (c) preserve ignore behavior (ignored files stay absent unless an explicit include-ignored option lands); (d) add fixture regressions proving untracked `.gitignore` and regular untracked files appear by path while ignored files do not. **Why this matters:** status is the main machine-readable workspace preflight. Counts alone are insufficient for stale-branch/dirty-worktree gating, cleanup decisions, or explaining why a launch is blocked; wrappers must currently run their own `git status --porcelain`, duplicating logic and losing parity with claw's own dirty classification. 
Source: gaebal-gajae dogfood response to Clawhip message `1506582285292535818` on 2026-05-20.
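
A sketch of (a) and (b), assuming the inventory is derived from `git status --porcelain=v1 -z`. That is a documented git format: two status columns (index, then worktree), a space, and the path, each entry NUL-terminated, with a rename or copy followed by a separate NUL field holding the original path. The `GitFiles`/`PathList` types and `parse_porcelain_z` are hypothetical; `!!` entries only appear with `--ignored` but are dropped anyway to preserve (c). Whether claw's own dirty classification maps onto these columns exactly is an assumption to check.

```rust
/// One capped list of paths, mirroring the proposed `truncated`/`total` fields.
#[derive(Debug, Default, PartialEq)]
pub struct PathList {
    pub paths: Vec<String>,
    pub total: usize,
    pub truncated: bool,
}

impl PathList {
    /// Counts every path but keeps at most `limit` of them.
    fn record(&mut self, path: &str, limit: usize) {
        self.total += 1;
        if self.paths.len() < limit {
            self.paths.push(path.to_string());
        } else {
            self.truncated = true;
        }
    }
}

#[derive(Debug, Default, PartialEq)]
pub struct GitFiles {
    pub staged: PathList,
    pub unstaged: PathList,
    pub untracked: PathList,
}

/// Parses `git status --porcelain=v1 -z` output into capped per-category path lists.
pub fn parse_porcelain_z(output: &str, limit: usize) -> GitFiles {
    let mut files = GitFiles::default();
    let mut fields = output.split('\0').filter(|field| !field.is_empty());
    while let Some(entry) = fields.next() {
        let bytes = entry.as_bytes();
        // Expect "XY PATH": index status, worktree status, one space, then the path.
        if bytes.len() < 4 || !bytes[0].is_ascii() || !bytes[1].is_ascii() || bytes[2] != b' ' {
            continue;
        }
        let (index, worktree, path) = (bytes[0] as char, bytes[1] as char, &entry[3..]);
        match (index, worktree) {
            ('?', '?') => files.untracked.record(path, limit),
            ('!', '!') => {} // ignored files stay out of the inventory
            _ => {
                if index != ' ' {
                    files.staged.record(path, limit);
                }
                if worktree != ' ' {
                    files.unstaged.record(path, limit);
                }
            }
        }
        // With -z, a rename or copy is followed by a separate field holding the original path.
        if matches!(index, 'R' | 'C') || matches!(worktree, 'R' | 'C') {
            fields.next();
        }
    }
    files
}
```

Parsing the same porcelain output that counts are derived from keeps `changed_files`/`untracked_files` and the path lists from disagreeing, which is the parity concern raised above.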
+
+489. **`status --output-format json` branch freshness is computed from local remote-tracking refs only and does not fetch or report ref age, so a branch can be reported `fresh:true behind:0` while the remote already has new commits** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 09:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@d541130` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Fixture: created a repo with `origin/master`, pushed initial commit, made one local commit (ahead 1), then from a second clone pushed one remote commit without fetching in the first worktree. Before `git fetch`, `claw status --output-format json` reported `workspace.branch_freshness:{"ahead":1,"behind":0,"fresh":true,"upstream":"origin/master"}` and `git_state:"clean"`. After a manual `git fetch`, the same command reported `ahead:1, behind:1, fresh:false`. This means the preflight freshness field can be stale-but-green whenever the local remote-tracking ref is old. **Required fix shape:** (a) either fetch (bounded/optional) before computing freshness, or expose `remote_ref_observed_at` / `fetch_age_seconds` and `freshness_source:"local_ref"`; (b) if no recent fetch occurred, mark freshness as `unknown` or `stale_reference` rather than `fresh:true`; (c) add a `--refresh`/`--no-refresh` policy if network access is intentionally avoided; (d) add fixture regression with a bare remote + second clone proving status does not report `fresh:true` from stale local refs. **Why this matters:** stale-branch confusion is a core clawability gap. Orchestrators gating launches/merges on `branch_freshness.fresh` will make the wrong decision if `status` presents old local refs as authoritative remote freshness. Source: gaebal-gajae dogfood response to Clawhip message `1506589831097221120` on 2026-05-20.
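
A sketch of (a) and (b): freshness is reported with its source and the age of the last fetch, and degrades to `unknown` when that age is missing or exceeds a threshold. Using the modification time of `FETCH_HEAD` as the fetch age, the threshold value, and the `Freshness`/`classify_freshness` names are all assumptions; `git fetch` does write `FETCH_HEAD`, but a fresh clone may not have one, so a missing file is reported as `no_fetch_record` rather than as stale.

```rust
use std::path::Path;
use std::time::{Duration, SystemTime};

#[derive(Debug, PartialEq)]
pub enum Freshness {
    Fresh,
    Behind { ahead: u32, behind: u32 },
    /// Local refs are too old, or have no fetch record, to vouch for the remote.
    Unknown { reason: &'static str },
}

#[derive(Debug, PartialEq)]
pub struct FreshnessReport {
    pub freshness: Freshness,
    /// Always "local_ref" in this sketch: no network call is made.
    pub freshness_source: &'static str,
    pub fetch_age_seconds: Option<u64>,
}

/// Assumption: the age of `FETCH_HEAD` approximates when remote-tracking refs were last refreshed.
pub fn fetch_age(git_dir: &Path) -> Option<Duration> {
    let modified = std::fs::metadata(git_dir.join("FETCH_HEAD")).ok()?.modified().ok()?;
    SystemTime::now().duration_since(modified).ok()
}

pub fn classify_freshness(
    ahead: u32,
    behind: u32,
    last_fetch_age: Option<Duration>,
    max_age: Duration,
) -> FreshnessReport {
    let freshness = match last_fetch_age {
        None => Freshness::Unknown { reason: "no_fetch_record" },
        Some(age) if age > max_age => Freshness::Unknown { reason: "stale_reference" },
        Some(_) if behind == 0 => Freshness::Fresh,
        Some(_) => Freshness::Behind { ahead, behind },
    };
    FreshnessReport {
        freshness,
        freshness_source: "local_ref",
        fetch_age_seconds: last_fetch_age.map(|age| age.as_secs()),
    }
}
```

Under this policy the fixture's pre-fetch state (`ahead:1, behind:0` with an old fetch) reports `unknown`/`stale_reference` instead of `fresh:true`, which is the behavior the regression in (d) should pin.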
+
+490. **`status`/`doctor` still run boot-preflight git metadata probes with blocking `git` subprocesses and no deadline, so slow `rev-parse`/branch/root discovery can zero-byte hang local diagnostics before any JSON is emitted** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 10:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@edcf5bf` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Fixture: clean isolated repo plus a fake `git` shim that sleeps 20s only for metadata probes (`rev-parse --is-inside-work-tree`, `rev-parse --git-dir`, `branch --show-current`, `rev-parse --show-toplevel`) and delegates all other git commands to `/usr/bin/git`. `claw status --output-format json` timed out after 6s with `stdout=0`/`stderr=0`; `claw doctor --output-format json` did the same. A control shim that delayed only `fetch`/`ls-remote` did not affect status/doctor, confirming the hang is local metadata probing, not network refresh. Code path: `build_boot_preflight_snapshot` calls `run_git_bool` and `run_git_capture_in` with `.output()` and no timeout; `parse_git_status_metadata_for` calls `resolve_git_branch_for` (`branch --show-current`, fallback `rev-parse --abbrev-ref HEAD`) and `find_git_root_in` (`rev-parse --show-toplevel`) similarly. **Required fix shape:** (a) route all local diagnostic git subprocesses through a shared `git_with_timeout(cwd,args,deadline)` helper; (b) use `--no-optional-locks` for read-only git probes; (c) on timeout, return bounded JSON with `git_probe_timeout`/`unknown` fields instead of aborting the whole status/doctor response; (d) add regressions with a fake `git` shim proving status/doctor still return within deadline and mark git metadata degraded. **Why this matters:** status and doctor are supposed to be the escape hatches when startup is broken. 
If local git metadata can hang them before emitting JSON, stale-branch and boot-preflight diagnostics fail exactly when a repo or filesystem is slow/locked. Source: gaebal-gajae dogfood response to Clawhip message `1506597387534209085` on 2026-05-20.
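
A std-only sketch of the shared helper from (a): spawn the child with piped stdout, drain that pipe on a thread, poll `Child::try_wait` until the deadline, and `Child::kill` on overrun. `run_with_deadline` and `ProbeResult` are hypothetical names; a `git_with_timeout(cwd,args,deadline)` wrapper would call it with `git` and prepend `--no-optional-locks` (a real git global option) per (b), then map `TimedOut` to the degraded `git_probe_timeout` field from (c). Stderr is discarded here only to keep the sketch short.

```rust
use std::io::Read;
use std::path::Path;
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum ProbeResult {
    /// The command exited before the deadline.
    Finished { success: bool, stdout: String },
    /// The deadline passed (or polling failed) and the child was killed.
    TimedOut,
    /// The binary could not be started at all.
    SpawnFailed(String),
}

pub fn run_with_deadline(program: &str, args: &[&str], cwd: &Path, deadline: Duration) -> ProbeResult {
    let mut child = match Command::new(program)
        .args(args)
        .current_dir(cwd)
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::null())
        .spawn()
    {
        Ok(child) => child,
        Err(err) => return ProbeResult::SpawnFailed(err.to_string()),
    };
    // Drain stdout on another thread so a chatty child cannot fill the pipe and stall.
    let mut stdout = child.stdout.take().expect("stdout was piped");
    let reader = thread::spawn(move || {
        let mut buf = String::new();
        let _ = stdout.read_to_string(&mut buf);
        buf
    });
    let started = Instant::now();
    loop {
        match child.try_wait() {
            Ok(Some(status)) => {
                let stdout = reader.join().unwrap_or_default();
                return ProbeResult::Finished { success: status.success(), stdout };
            }
            Ok(None) if started.elapsed() < deadline => thread::sleep(Duration::from_millis(10)),
            _ => {
                // Kill and reap the direct child; the reader thread is left detached.
                let _ = child.kill();
                let _ = child.wait();
                return ProbeResult::TimedOut;
            }
        }
    }
}
```

The reader thread is deliberately not joined after a timeout: a shell-script shim like the fixture's fake `git` typically runs `sleep` as its own child, and that grandchild can keep the pipe open after the direct child is killed. For example, `run_with_deadline("git", &["--no-optional-locks", "rev-parse", "--show-toplevel"], cwd, Duration::from_secs(2))` would return within roughly two seconds regardless of the shim.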
+
+491. **`status`, `doctor`, and direct `diff` all block on dirty-state/diff git probes with no timeout, so a slow `git status` or `git diff` makes every local diagnostic surface zero-byte hang** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 10:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@d5aa815` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Fixture: clean isolated git repo plus fake `git` shim that sleeps 20s only for dirty/diff probes (`status --short`, `diff --cached`, and `diff`) and delegates all other git commands to `/usr/bin/git`. Results: `claw status --output-format json` timed out after 6s with `stdout=0`/`stderr=0`; `claw doctor --output-format json` timed out with zero output; direct `claw diff --output-format json` also timed out with zero output. This is distinct from the metadata-probe hang in #490: even when branch/root metadata is fast, dirty-state and diff collection can deadlock the supposedly local escape-hatch commands before they emit any degraded JSON. **Required fix shape:** (a) route `read_git_status`, `read_git_diff`, and direct `diff` command helpers through shared timeout-aware git execution; (b) emit partial/degraded status JSON with `git_status_timeout:true` and omit/cap diff payload instead of blocking; (c) make direct `claw diff --output-format json` return `kind:"diff", result:"git_timeout"` with command/stage metadata; (d) add fake-git shim regressions for slow `git status`, `git diff --cached`, and `git diff` proving status/doctor/diff stay bounded. **Why this matters:** dirty-state and diff are central to stale-branch, cleanup, and prompt-context decisions. If they can hang the health commands, operators cannot tell whether the repo is dirty, the runtime is stuck, or git itself is wedged. Source: gaebal-gajae dogfood response to Clawhip message `1506604934555242546` on 2026-05-20.
+
+492. **`system-prompt --output-format json` and `status --output-format json` can both zero-byte hang on `GitContext`'s unbounded branch/log/staged-file probes, so prompt provenance and local diagnostics share the same startup deadlock surface** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 11:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@56555a3` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Fixture: clean isolated git repo plus fake `git` shim that sleeps 20s only for `rev-parse --abbrev-ref HEAD`, `log --oneline`, and `diff --cached --name-only` (the `GitContext::detect` branch/recent-commits/staged-files probes) while delegating all other git commands to `/usr/bin/git`. Results: `claw system-prompt --output-format json` timed out after 6s with `stdout=0`/`stderr=0`; `claw status --output-format json` also timed out with zero output. Earlier controls showed shims delaying unrelated `fetch`/`ls-remote` do not affect these commands. **Required fix shape:** (a) route `GitContext::detect` probes through the same timeout-aware git helper as #490/#491; (b) make `system-prompt` emit a bounded degraded prompt-provenance envelope when git context times out (`git_context:{status:"timeout", branch:null, recent_commits_truncated:true}`) instead of hanging; (c) make `status` omit/degrade `GitContext` fields independently from boot-preflight metadata; (d) add fake-git shim regressions for each GitContext probe (`rev-parse --abbrev-ref`, `log --oneline`, `diff --cached --name-only`) across both `system-prompt` and `status`. **Why this matters:** `system-prompt` is the primary prompt-misdelivery/provenance debugger. If the prompt renderer itself blocks on git context before emitting JSON, users cannot inspect what prompt would have been delivered when startup is slow or git is locked. 
Source: gaebal-gajae dogfood response to Clawhip message `1506612484520542258` on 2026-05-20.
+
+493. **`status` and `doctor` block on `tmux --version` availability checks with no timeout, so a wedged tmux binary/socket makes the local health surfaces zero-byte hang even though prompt rendering still works** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 11:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@2bf6924` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Fixture: clean isolated git repo plus fake `tmux` shim at the front of `PATH` that sleeps 20s for every invocation; `git` was normal. Results: `claw status --output-format json` timed out after 6s with `stdout=0`/`stderr=0`; `claw doctor --output-format json` also timed out with zero output; control `claw system-prompt --output-format json` exited 0 with prompt JSON under the same fake-tmux environment. Code path: `build_boot_preflight_snapshot` populates `required_binaries` via `command_available("tmux")`, which runs `tmux --version` through blocking `.output()` and no deadline. **Required fix shape:** (a) route `command_available` checks through a small timeout helper; (b) mark slow binary probes as `{name:"tmux", available:null, timeout:true}` instead of blocking the entire status/doctor response; (c) avoid invoking `tmux` at all when `$TMUX` is absent if the field is only informational, or cache the availability probe; (d) add fake-binary shim regressions for slow `tmux` and slow `git --version` proving status/doctor stay bounded. **Why this matters:** status/doctor are the recovery surfaces for tmux/session lifecycle breakage. If a broken `tmux` itself prevents the health command from returning JSON, orchestrators lose the exact diagnostic path needed to explain session/pane failures. Source: gaebal-gajae dogfood response to Clawhip message `1506620033886060665` on 2026-05-20.
+
+494. **`status`/`doctor` also block on `git --version` binary-availability checks with no timeout, while `diff` and `system-prompt` still work, so an unhealthy git binary can kill the health surfaces before they report degraded tool availability** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 12:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@4d185f0` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Fixture: clean isolated git repo plus fake `git` shim at the front of `PATH` that sleeps 20s only for `git --version` and delegates all other git commands to `/usr/bin/git`. Results: `claw status --output-format json` timed out after 6s with `stdout=0`/`stderr=0`; `claw doctor --output-format json` timed out with zero output. Positive controls under the same environment: `claw diff --output-format json` returned `{"kind":"diff","result":"clean"...}` and `claw system-prompt --output-format json` returned prompt JSON, proving ordinary git operations were fine and the hang is specifically `command_available("git")` in boot preflight `required_binaries`. **Required fix shape:** (a) same timeout-aware binary probe as #493, but cover `git` and `claw` too; (b) represent slow probes as timeout/degraded availability instead of blocking status/doctor; (c) prefer `which`/PATH existence plus optional bounded version string over mandatory `--version`; (d) add fake-binary regressions for slow `git --version`, slow `tmux --version`, and slow current-exe/version probe. **Why this matters:** status/doctor should explain missing or broken binaries. If the binary availability probe itself hangs, health checks fail before they can say `git` is unhealthy, forcing operators back to shell debugging. Source: gaebal-gajae dogfood response to Clawhip message `1506627582756651018` on 2026-05-20.
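Fix (c) prefers a PATH existence check over a mandatory `--version` call. A sketch of that lookup with `std::env::split_paths`; `find_in_path` is a hypothetical name, and the Unix branch treats any execute bit as executable. Windows would additionally need `PATHEXT` handling, which is not shown.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Resolves `name` against a PATH-style string without executing anything.
pub fn find_in_path(name: &str, path_var: &str) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| is_executable_file(candidate))
}

#[cfg(unix)]
fn is_executable_file(path: &Path) -> bool {
    use std::os::unix::fs::PermissionsExt;
    // `metadata` follows symlinks, so `/bin -> /usr/bin` layouts still resolve.
    path.metadata()
        .map(|meta| meta.is_file() && meta.permissions().mode() & 0o111 != 0)
        .unwrap_or(false)
}

#[cfg(not(unix))]
fn is_executable_file(path: &Path) -> bool {
    // Simplification: no PATHEXT handling on Windows.
    path.is_file()
}
```

Because nothing is executed, this check cannot hang on a broken binary; an optional bounded `--version` string can then be layered on top for display only.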
+
+495. **The actual `PermissionEnforcer::check_bash` read-only heuristic still whitelists `tee`, so `tee file.txt` can write in read-only mode despite the richer `bash_validation` module correctly classifying `tee` as a write command** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 12:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@916bf5f`. Code inspection shows two divergent validators: `rust/crates/runtime/src/bash_validation.rs` defines `WRITE_COMMANDS` including `tee` and would block it in read-only mode, but `PermissionEnforcer::check_bash` does not call that pipeline. It calls local `is_read_only_command` in `permission_enforcer.rs`, whose allowlist explicitly includes `tee` and only rejects commands containing `" > "`, `" >> "`, `"-i "`, or `"--in-place"`. Plain `tee out.txt` writes to `out.txt` without any redirection token, so `is_read_only_command("tee out.txt")` returns true and `check_bash` allows it under `PermissionMode::ReadOnly`. **Required fix shape:** (a) remove `tee` from `is_read_only_command` or, better, replace that local heuristic with the canonical `bash_validation::validate_command`/`classify_command` pipeline; (b) add regression tests proving `tee out.txt`, `tee -a out.txt`, `printf hi | tee out.txt`, and `cat a | tee b` are denied in read-only mode; (c) add a consistency test that every `WRITE_COMMANDS` entry in `bash_validation` is denied by `PermissionEnforcer::check_bash` in read-only mode. **Why this matters:** permission-mode enforcement is only as strong as the runtime path actually used. Having a stricter validator module sitting unused while `check_bash` allows a common write tool creates a real read-only bypass and contradicts the documented command semantics. Source: gaebal-gajae dogfood response to Clawhip message `1506635133468282950` on 2026-05-20.
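Fixes (a) and (c) amount to deriving the enforcer's answer from one shared write list. A sketch assuming both lists are plain `&[&str]` constants; the contents are illustrative, not the real `WRITE_COMMANDS` from `bash_validation.rs`, and the pipeline split on `|` is deliberately naive (it ignores quoting).

```rust
/// Illustrative stand-in for the canonical write list in `bash_validation.rs`.
pub const WRITE_COMMANDS: &[&str] = &["tee", "cp", "mv", "rm", "touch", "mkdir"];

/// Illustrative read-only allowlist; `tee` is deliberately absent.
pub const READ_ONLY_ALLOWLIST: &[&str] = &["cat", "ls", "grep", "head", "tail", "wc"];

/// Read-only only if there is no redirection and every pipeline segment starts
/// with an allowlisted command that is not also a known write command.
pub fn is_read_only_command(command: &str) -> bool {
    if command.contains('>') {
        return false;
    }
    command.split('|').all(|segment| match segment.split_whitespace().next() {
        Some(program) => {
            READ_ONLY_ALLOWLIST.contains(&program) && !WRITE_COMMANDS.contains(&program)
        }
        None => false,
    })
}
```

Iterating `WRITE_COMMANDS` and asserting that each entry is denied is the consistency test fix (c) describes; it fails as soon as the two lists drift.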
+
+496. **`PermissionEnforcer::check_bash` treats `python`/`python3`/`node`/`ruby` as read-only commands, so inline script execution (`python -c ...`, `node -e ...`, `ruby -e ...`) can write files in `read-only` mode** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 13:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@8382e1e`. Code inspection: `rust/crates/runtime/src/permission_enforcer.rs::is_read_only_command` allowlists `python3`, `python`, `node`, and `ruby`, and only rejects commands containing `-i `, `--in-place`, ` > `, or ` >> `. It does not inspect interpreter flags or inline code. Therefore `python -c 'open("pwned.txt","w").write("x")'`, `node -e 'require("fs").writeFileSync("pwned.txt","x")'`, and `ruby -e 'File.write("pwned.txt","x")'` are classified as read-only and allowed by `check_bash` under `PermissionMode::ReadOnly`, despite arbitrary filesystem writes. The richer `bash_validation` module's semantic list does not cover these interpreters either, and the runtime enforcer does not consult that module anyway; it uses this separate local heuristic. **Required fix shape:** (a) remove general-purpose interpreters (`python`, `python3`, `node`, `ruby`) from the read-only allowlist, or allow only explicitly safe invocations such as `python --version` (note that `python -m pytest` is not read-only, even under test gating); (b) if kept, detect `-c`, `-e`, here-doc, stdin script, and file path script execution as non-read-only; (c) replace the local heuristic with the canonical `bash_validation` pipeline to avoid future divergence; (d) add regressions proving inline interpreter writes are denied in read-only mode while harmless version/help invocations remain allowed. **Why this matters:** read-only mode is supposed to prevent writes.
Any general interpreter with inline code is equivalent to arbitrary shell execution; allowing it because the first token is `python` or `node` is a direct permission bypass and contradicts the safety story for exploratory sessions. Source: gaebal-gajae dogfood response to Clawhip message `1506642678996013207` on 2026-05-20.
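For fixes (a) and (b), a conservative rule is easier to audit than flag detection: treat an interpreter as read-only only when its sole argument is a known version/help flag. `interpreter_is_read_only` is a hypothetical helper; returning `None` means "not an interpreter, defer to the other rules".

```rust
const INTERPRETERS: &[&str] = &["python", "python3", "node", "ruby"];
// Deliberately excludes `-v`: for `python` it means verbose mode, not version.
const SAFE_SOLE_ARGS: &[&str] = &["--version", "--help"];

/// `Some(true)` only for `<interpreter> --version` / `--help`; `Some(false)` for any
/// other interpreter invocation (`-c`, `-e`, script paths, bare stdin REPL);
/// `None` when the first token is not an interpreter at all.
pub fn interpreter_is_read_only(command: &str) -> Option<bool> {
    let tokens: Vec<&str> = command.split_whitespace().collect();
    let (program, args) = tokens.split_first()?;
    if !INTERPRETERS.contains(program) {
        return None;
    }
    Some(args.len() == 1 && SAFE_SOLE_ARGS.contains(&args[0]))
}
```

An allowlist of exact invocations fails closed: new interpreter flags, here-docs, and script paths are all denied without needing to be recognized.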
+
+497. **`PermissionEnforcer::check_bash` also allowlists `gh` as read-only, so GitHub-mutating commands (`gh pr merge`, `gh issue edit`, `gh repo delete`, etc.) can run in `read-only` mode** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 13:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@214176d`. Code inspection: `rust/crates/runtime/src/permission_enforcer.rs::is_read_only_command` includes `gh` in the read-only allowlist and only rejects `-i`, `--in-place`, ` > `, or ` >> `. There is no `gh` subcommand classifier analogous to `bash_validation.rs::validate_git_read_only` for `git`. Therefore `gh pr merge 123 --merge`, `gh issue edit 5 --add-label done`, `gh repo delete owner/repo --yes`, and `gh api repos/:owner/:repo/actions/runs/1/approve -X POST` are all classified as read-only by the runtime enforcer, even though they mutate remote GitHub state. **Required fix shape:** (a) remove `gh` from the blanket read-only allowlist or implement a conservative `gh` subcommand classifier; (b) allow only clearly read-only forms (`gh pr view/list`, `gh issue view/list`, `gh run view/list`, `gh api` without mutating method) and require workspace-write/danger or prompt for merge/edit/create/delete/api `-X POST|PATCH|PUT|DELETE`; (c) add regressions proving mutating `gh` commands are denied in `PermissionMode::ReadOnly` while view/list commands remain allowed if desired; (d) preferably replace the local heuristic with the canonical bash-validation pipeline so shell permission logic has one source of truth. **Why this matters:** read-only mode is not just filesystem safety; it should prevent external state mutation. A `gh pr merge` or `gh issue edit` from a supposedly read-only lane is a serious control-plane bypass and can alter public repo state without the permission escalation the mode implies. Source: gaebal-gajae dogfood response to Clawhip message `1506650232937513140` on 2026-05-20.
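A sketch of the conservative `gh` classifier from fixes (a) and (b): `view`/`list` on `pr`/`issue`/`run` are allowed, `gh api` is allowed only without a non-GET method, and everything else is denied. One detail worth encoding: `gh api` switches its default method from GET to POST when request fields (`-f`/`-F`) or `--input` are supplied, so those flags count as mutating too. Function names are hypothetical and shell quoting is ignored.

```rust
/// Conservative read-only check for `gh` invocations (hypothetical helper).
pub fn gh_is_read_only(command: &str) -> bool {
    let tokens: Vec<&str> = command.split_whitespace().collect();
    match tokens.as_slice() {
        ["gh", "pr" | "issue" | "run", "view" | "list", ..] => true,
        ["gh", "api", rest @ ..] => gh_api_is_read_only(rest),
        _ => false,
    }
}

fn gh_api_is_read_only(args: &[&str]) -> bool {
    let mut i = 0;
    while i < args.len() {
        let arg = args[i];
        // Explicit method: `-X POST`, `--method POST`, `-XPOST`, `--method=POST`.
        let method = match arg {
            "-X" | "--method" => {
                i += 1;
                match args.get(i) {
                    Some(value) => Some(*value),
                    None => return false, // malformed: refuse rather than guess
                }
            }
            _ => arg
                .strip_prefix("--method=")
                .or_else(|| arg.strip_prefix("-X").filter(|m| !m.is_empty())),
        };
        if let Some(m) = method {
            if !m.eq_ignore_ascii_case("GET") {
                return false;
            }
        }
        // Request fields or a body flip gh's default method to POST.
        let sends_body = arg.starts_with("-f")
            || arg.starts_with("-F")
            || arg.starts_with("--field")
            || arg.starts_with("--raw-field")
            || arg.starts_with("--input");
        if sends_body {
            return false;
        }
        i += 1;
    }
    true
}
```

This also denies read-only GraphQL queries (`gh api graphql -f query=...`), since they are sent as POST; that is the conservative side of the trade-off.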
+
+498. **`PermissionEnforcer::check_bash` allowlists `cargo` and `rustc` as read-only, bypassing the canonical package/build-state classifier and allowing state-mutating Rust toolchain commands in `read-only` mode** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 14:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@5a4a8eb`. Code inspection: `rust/crates/runtime/src/permission_enforcer.rs::is_read_only_command` includes `cargo` and `rustc` in the read-only allowlist and only rejects `-i`, `--in-place`, ` > `, or ` >> `. The canonical `bash_validation.rs` pipeline treats `cargo` as package/build-state management (`STATE_MODIFYING_COMMANDS` / `PACKAGE_COMMANDS`) and has a regression that `npm install` is blocked in read-only mode, but the runtime enforcer does not call that pipeline. As a result, commands like `cargo install cargo-edit`, `cargo add anyhow`, `cargo generate-lockfile`, `cargo update`, `cargo fix --allow-dirty`, and `cargo build` (writes `target/`) are classified as read-only by the actual enforcer. `rustc -o out main.rs` likewise writes an output binary without shell redirection and is allowed. **Required fix shape:** (a) remove `cargo` and `rustc` from the blanket read-only allowlist; (b) optionally add a conservative subcommand classifier where only clearly non-mutating forms (`cargo --version`, `cargo metadata --no-deps` if proven side-effect-free, `rustc --version`) are read-only; (c) route `check_bash` through the canonical `bash_validation` pipeline; (d) add regressions for `cargo install`, `cargo add`, `cargo update`, `cargo build`, `cargo test`, and `rustc -o` under `PermissionMode::ReadOnly`. **Why this matters:** build/package commands routinely modify the workspace, global cargo home, lockfiles, and build artifacts. A read-only exploratory lane should not be able to install packages or rewrite lock/build outputs just because the first token is `cargo`. 
Source: gaebal-gajae dogfood response to Clawhip message `1506657779883184250` on 2026-05-20.
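Fix (c) is structural: the enforcer should ask one classifier instead of keeping its own allowlist. A sketch under stated assumptions: `CommandIntent` and `classify_command` are hypothetical stand-ins for the canonical `bash_validation` pipeline (only a few rows are shown), and only `PermissionMode::ReadOnly` comes from the source; `WorkspaceWrite` is a placeholder for the other modes. Unknown commands fail closed in read-only mode.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PermissionMode {
    ReadOnly,
    WorkspaceWrite, // placeholder for the non-read-only modes
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommandIntent {
    ReadOnly,
    Write,
    PackageManagement,
    Unknown,
}

/// Hypothetical stand-in for the canonical classifier; a few rows only.
pub fn classify_command(command: &str) -> CommandIntent {
    let tokens: Vec<&str> = command.split_whitespace().collect();
    match tokens.as_slice() {
        ["cargo" | "rustc", "--version" | "-V"] => CommandIntent::ReadOnly,
        ["cargo", ..] => CommandIntent::PackageManagement,
        // `rustc` writes an output artifact unless told otherwise.
        ["rustc", ..] => CommandIntent::Write,
        ["ls" | "cat" | "grep", ..] => CommandIntent::ReadOnly,
        _ => CommandIntent::Unknown,
    }
}

/// The enforcer delegates instead of consulting a private allowlist.
pub fn check_bash(command: &str, mode: PermissionMode) -> Result<(), String> {
    match (mode, classify_command(command)) {
        (PermissionMode::ReadOnly, CommandIntent::ReadOnly) => Ok(()),
        (PermissionMode::ReadOnly, intent) => Err(format!(
            "read-only mode denies {intent:?} command: {command}"
        )),
        // Rules for the other modes are out of scope for this sketch.
        (PermissionMode::WorkspaceWrite, _) => Ok(()),
    }
}
```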
+
+499. **`TodoWrite` returns both `oldTodos` and `newTodos` in every tool result, so large task boards are echoed twice per update and repeatedly burn context even though the model only needs the delta/current list** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 14:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@3d2a047`. Code inspection: `run_todo_write` serializes `execute_todo_write(input)?` via `to_pretty_json`; `execute_todo_write` reads the persisted todo store into `old_todos`, writes the new/persisted list, then returns `TodoWriteOutput { old_todos, new_todos: input.todos, verification_nudge_needed }`. The JSON field names are `oldTodos` and `newTodos`, so every TodoWrite result contains the entire previous board plus the entire submitted board. For a 200-item task board, a one-item status change returns roughly 400 todo objects to the model; repeated status updates multiply the same backlog text across the context window. This is the same output-amplification class as NotebookEdit (#500), but on the core planning/task-control surface rather than notebooks. **Required fix shape:** (a) replace `oldTodos` with a compact diff (`changed:[{id/content,status_before,status_after}]`, `added`, `removed`, `unchanged_count`) or hide it behind a debug flag; (b) keep `newTodos` only if the current board is below a safe size, otherwise return `current_count`, `open_count`, `completed_count`, and a truncated active subset; (c) include `truncated:true`/`omitted_old_count` metadata for large boards; (d) add regressions proving single-item updates on large boards do not serialize the entire old board. **Why this matters:** TodoWrite is called frequently in multi-step sessions. Echoing full before/after state on every update creates context-window pressure, increases cost, and makes compaction summaries noisier without adding useful operator signal. 
Source: gaebal-gajae dogfood response to Clawhip message `1506665332478050344` on 2026-05-20.
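A sketch of the compact delta from fix (a), assuming todo items can be keyed by their `content` string. The real store may have a stable id, which would be a better key; duplicate contents collapse here. `Todo`, `TodoDelta`, and `diff_todos` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, PartialEq)]
pub struct Todo {
    pub content: String,
    pub status: String,
}

/// Compact replacement for echoing the full `oldTodos` list.
#[derive(Debug, Default, PartialEq)]
pub struct TodoDelta {
    /// (content, status_before, status_after)
    pub changed: Vec<(String, String, String)>,
    pub added: Vec<String>,
    pub removed: Vec<String>,
    pub unchanged_count: usize,
}

pub fn diff_todos(old: &[Todo], new: &[Todo]) -> TodoDelta {
    let old_status: HashMap<&str, &str> = old
        .iter()
        .map(|t| (t.content.as_str(), t.status.as_str()))
        .collect();
    let new_contents: HashSet<&str> = new.iter().map(|t| t.content.as_str()).collect();

    let mut delta = TodoDelta::default();
    for todo in new {
        match old_status.get(todo.content.as_str()) {
            Some(&before) if before == todo.status => delta.unchanged_count += 1,
            Some(&before) => delta.changed.push((
                todo.content.clone(),
                before.to_string(),
                todo.status.clone(),
            )),
            None => delta.added.push(todo.content.clone()),
        }
    }
    for todo in old {
        if !new_contents.contains(todo.content.as_str()) {
            delta.removed.push(todo.content.clone());
        }
    }
    delta
}
```

For the 200-item board in the example, a one-item status change serializes one `changed` tuple plus `unchanged_count: 199` instead of roughly 400 todo objects.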
+
+500. **`REPL` tool returns unbounded raw `stdout`/`stderr` strings, so a tiny inline snippet can inject megabytes of output into the model context just like the NotebookEdit/TodoWrite amplification class** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 15:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@25b8dbb`. Code inspection: `execute_repl` in `rust/crates/tools/src/lib.rs` runs python/node/shell snippets with piped stdout/stderr, then returns `ReplOutput { stdout: String::from_utf8_lossy(&output.stdout).into_owned(), stderr: String::from_utf8_lossy(&output.stderr).into_owned(), ... }` serialized via `to_pretty_json`. There is no byte cap, line cap, truncation marker, or structured artifact path. A user/model can run `REPL(language:"python", code:"print('x'*5_000_000)")` and the full 5MB output is returned to the model as a JSON string; stderr has the same issue. This is distinct from bash timeout/provenance handling because REPL is marketed as a structured execution helper, yet it has no output budget. **Required fix shape:** (a) cap `stdout` and `stderr` in `ReplOutput` (e.g. first/last 64KB) with `stdout_truncated`, `stderr_truncated`, `stdout_bytes`, `stderr_bytes`; (b) optionally spill full output to an artifact file and return `artifact_path` only when safe; (c) apply the same cap to PowerShell/bash if not already covered; (d) add regressions for large stdout/stderr from python/node/shell proving the serialized tool result stays bounded and includes truncation metadata. **Why this matters:** REPL is an easy path for accidental context-window blowups (`print(large_df)`, stack traces, generated JSON). Without output budgets, a single tool call can consume the context window, trigger compaction, or hide the useful signal behind megabytes of raw output. Source: gaebal-gajae dogfood response to Clawhip message `1506672878047989812` on 2026-05-20.
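Fix (a) in sketch form: keep the first and last slices of a stream and record what was dropped. Capping happens on the raw bytes before `String::from_utf8_lossy`, so a multibyte character cut at a slice edge degrades to U+FFFD instead of panicking. `CappedStream` and `cap_stream` are hypothetical; the 64KB figure comes from the fix shape above.

```rust
#[derive(Debug)]
pub struct CappedStream {
    pub text: String,
    pub truncated: bool,
    pub total_bytes: usize,
}

/// Keeps at most `head` leading and `tail` trailing bytes of a captured stream.
pub fn cap_stream(raw: &[u8], head: usize, tail: usize) -> CappedStream {
    let total = raw.len();
    if total <= head + tail {
        return CappedStream {
            text: String::from_utf8_lossy(raw).into_owned(),
            truncated: false,
            total_bytes: total,
        };
    }
    let omitted = total - head - tail;
    let text = format!(
        "{}\n[... {omitted} bytes omitted ...]\n{}",
        String::from_utf8_lossy(&raw[..head]),
        String::from_utf8_lossy(&raw[total - tail..]),
    );
    CappedStream { text, truncated: true, total_bytes: total }
}
```

`execute_repl` would call this once per stream, e.g. `cap_stream(&output.stdout, 64 * 1024, 64 * 1024)`, and copy `truncated`/`total_bytes` into the `stdout_truncated`/`stdout_bytes` fields.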
+
+501. **`PowerShell`/shared `execute_shell_command` returns unbounded raw stdout/stderr in every foreground path, so shell output can still flood the model context even after the REPL output-amplification finding** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 15:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@7e73cdb`. Code inspection: `execute_powershell` delegates to `execute_shell_command`, whose timeout-success, timeout-kill, and no-timeout paths all construct `runtime::BashCommandOutput` with `stdout: String::from_utf8_lossy(&output.stdout).into_owned()` and `stderr: String::from_utf8_lossy(&output.stderr).into_owned()`. The background path discards output, but every foreground path serializes the full captured streams. A command like `PowerShell(command:"1..1000000 | % { 'x' }")` or any verbose test/log script can return megabytes of JSON-escaped output to the model. The timeout path is especially bad: it kills the process but still returns everything produced before the timeout plus the timeout footer, with no cap or `truncated` metadata. **Required fix shape:** (a) introduce a shared output-budget helper for all shell-like tools (`bash`, `PowerShell`, `REPL`) with byte caps, first/last slicing, and `stdout_truncated`/`stderr_truncated`/byte-count metadata; (b) preserve existing `raw_output_path`/`persisted_output_path` semantics by spilling full streams to artifact files when safe; (c) apply caps before JSON serialization in both success and timeout branches; (d) add regressions using stub `pwsh` and shell commands that emit >cap stdout/stderr and timeout-with-output. **Why this matters:** verbose tests and build tools are normal claw-code workflows. A single noisy PowerShell/shell command should not silently consume the context window or force compaction; the model needs a bounded summary plus a way to fetch artifacts if the full log is needed. 
Source: gaebal-gajae dogfood response to Clawhip message `1506680431620395091` on 2026-05-20.
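Fix (b) keeps the full log reachable while bounding what the model sees. A sketch that writes the complete stream to an artifact file only when it exceeds the cap; `budget_stream`, `BudgetedStream`, and `artifact_path` are hypothetical names, not the existing `raw_output_path`/`persisted_output_path` fields. Per fix (c), the same function would be called from both the success and the timeout branches before JSON serialization.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

#[derive(Debug)]
pub struct BudgetedStream {
    pub preview: String,
    pub truncated: bool,
    pub total_bytes: usize,
    pub artifact_path: Option<PathBuf>,
}

/// Returns a preview of at most `cap` bytes; spills the full stream to disk when larger.
pub fn budget_stream(
    raw: &[u8],
    cap: usize,
    artifact_dir: &Path,
    artifact_name: &str,
) -> io::Result<BudgetedStream> {
    if raw.len() <= cap {
        return Ok(BudgetedStream {
            preview: String::from_utf8_lossy(raw).into_owned(),
            truncated: false,
            total_bytes: raw.len(),
            artifact_path: None,
        });
    }
    fs::create_dir_all(artifact_dir)?;
    let path = artifact_dir.join(artifact_name);
    fs::write(&path, raw)?; // full stream preserved for later retrieval
    Ok(BudgetedStream {
        preview: String::from_utf8_lossy(&raw[..cap]).into_owned(),
        truncated: true,
        total_bytes: raw.len(),
        artifact_path: Some(path),
    })
}
```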
+
+502. **HTTP request tool truncates response bodies with byte indexing (`&body[..8192]`), so a multibyte UTF-8 character straddling the 8192-byte boundary makes the slice panic instead of returning a bounded tool result** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 16:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@51ea1aa`. Code inspection: the HTTP request handler reads `let body = response.text().unwrap_or_default();` then, when `body.len() > 8192`, builds `format!("{}\n\n[response truncated — {} bytes total]", &body[..8192], body.len())`. `String::len()` is bytes, and Rust string slicing requires a character boundary. A response like `"a".repeat(8191) + "é" + ...` has length >8192 but byte offset 8192 is in the middle of the two-byte `é`; `&body[..8192]` panics. Nearby `preview_text` correctly truncates by chars, so the safe helper already exists but is not used here. **Required fix shape:** (a) replace direct byte slicing with a UTF-8-safe truncation helper (`char_indices`/`floor_char_boundary` or reuse `preview_text` plus byte-count metadata); (b) report both original byte length and whether truncation occurred; (c) apply the same helper to all response/body truncation paths; (d) add a regression with a local HTTP response in which byte offset 8192 is not a UTF-8 character boundary (it falls inside a multibyte character) and assert the tool returns JSON with `truncated:true` instead of panicking. **Why this matters:** non-English pages, emoji-heavy logs, and binary-ish HTTP responses are common. A truncation path intended to protect the context window should never crash the tool runtime on valid UTF-8. Source: gaebal-gajae dogfood response to Clawhip message `1506687983397376103` on 2026-05-20.
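Fix (a) without relying on `floor_char_boundary`: walk back from the byte cap with `str::is_char_boundary`, which takes at most three steps for valid UTF-8. `truncate_utf8` is a hypothetical helper name.

```rust
/// Returns the longest prefix of `body` that fits in `max_bytes` without splitting
/// a character, plus whether anything was cut off.
pub fn truncate_utf8(body: &str, max_bytes: usize) -> (&str, bool) {
    if body.len() <= max_bytes {
        return (body, false);
    }
    let mut end = max_bytes;
    // Byte 0 is always a boundary, so this loop terminates.
    while !body.is_char_boundary(end) {
        end -= 1;
    }
    (&body[..end], true)
}
```

The handler would format the returned prefix together with `body.len()` as the original byte length, which covers fix (b) as well.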
+
+503. **`WebFetch`/`WebSearch` download full response bodies with `response.text()` before previewing, so large pages can allocate unbounded memory and stall the tool despite returning only a 900-char preview or eight search hits** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 16:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@f751c98`. Code inspection: `execute_web_fetch` calls `response.text()` into `body`, records `bytes = body.len()`, normalizes the whole body (`html_to_text` for HTML), collapses all whitespace, and only then trims via `preview_text(..., 900)`. `execute_web_search` similarly calls `response.text()` on the search response, then parses links and truncates hits to 8. The HTTP client has a 20s timeout and redirect cap, but there is no `Content-Length` guard, streaming byte cap, decompressed-size cap, or early abort. A large/decompression-bomb HTML response can force multi-megabyte/GB allocation and full text normalization even though the returned result is tiny. **Required fix shape:** (a) add a max download/decompressed body size for web tools (configurable but safe default); (b) reject/abort early on `Content-Length` above cap and enforce a streaming cap while reading chunks; (c) record `body_truncated:true`, `bytes_read`, and `content_length` metadata in the tool output; (d) make `html_to_text`/search extraction operate on the capped buffer; (e) add local HTTP regressions for huge `Content-Length`, chunked oversized body, and compressed oversized body proving bounded memory/time. **Why this matters:** `WebFetch` is a common lightweight alternative to browser automation. Its output is intentionally small, but the hidden pre-output work is unbounded; a hostile or simply large page can make a dogfood session look hung or OOM the runtime before any useful event/log signal is emitted. 
Source: gaebal-gajae dogfood response to Clawhip message `1506695531307733082` on 2026-05-20.
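Fixes (a) through (c) as a sketch over any `std::io::Read` body source. It rejects an oversized `Content-Length` up front, then reads at most `cap + 1` bytes via `Read::take` so "exactly at cap" and "over cap" can be told apart. If the reader passed in is the decompressing one, the cap bounds decompressed bytes, as fix (a) requires. `read_capped` and `CappedBody` are hypothetical names.

```rust
use std::io::{self, Read};

#[derive(Debug)]
pub struct CappedBody {
    pub bytes: Vec<u8>,
    pub body_truncated: bool,
    /// Bytes pulled from the reader (at most `cap + 1`).
    pub bytes_read: usize,
}

pub fn read_capped<R: Read>(
    reader: R,
    cap: usize,
    content_length: Option<u64>,
) -> io::Result<CappedBody> {
    // Early abort when the server announces a body larger than the cap.
    if let Some(len) = content_length {
        if len > cap as u64 {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                format!("content-length {len} exceeds cap {cap}"),
            ));
        }
    }
    // Streaming cap: never buffer more than cap + 1 bytes, regardless of headers.
    let mut bytes = Vec::new();
    reader.take(cap as u64 + 1).read_to_end(&mut bytes)?;
    let bytes_read = bytes.len();
    let body_truncated = bytes_read > cap;
    bytes.truncate(cap);
    Ok(CappedBody { bytes, body_truncated, bytes_read })
}
```

`html_to_text` and the search-link extraction from fix (d) would then run over `bytes` only.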
+
+504. **`AgentOutput.laneEvents` produced by real agent runs violates the G004 conformance contract because production `LaneEvent::new` emits `metadata.seq=0` for every event while the validator requires strictly increasing sequence numbers** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 17:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@8e8dea5`, building on Jobdori's live #505 `LaneEvent::new seq=0` report. Code inspection: `AgentOutput` manifests are initialized with `LaneEvent::started(...)` and terminal persistence appends `LaneEvent::blocked/failed/finished/commit_created(...)`; all of those convenience constructors route through `LaneEvent::new(... metadata: LaneEventMetadata::new(0, EventProvenance::LiveLane))`. `write_agent_manifest` only dedupes commit events and does not restamp sequences. Meanwhile `runtime/src/g004_conformance.rs::validate_lane_events` explicitly requires `/metadata/seq` to be present and strictly increasing (`if seq <= previous { "sequence must be strictly increasing" }`). Therefore any successful agent manifest with `lane.started` + `lane.finished` or failed manifest with `lane.started` + `lane.blocked` + `lane.failed` is invalid under the repo's own G004 contract, even before external consumers sort by seq. **Required fix shape:** (a) restamp lane event metadata seqs before manifest write (`for (idx,event) in lane_events.iter_mut().enumerate() { event.metadata.seq = idx as u64 + 1; }`) as an immediate containment, or better stamp from a per-session event counter at creation; (b) run `validate_g004_contract_bundle` (or an AgentOutput-specific wrapper) in tests against real initialized/success/failed manifests; (c) add a regression that `write_agent_manifest` never persists duplicate/non-increasing seqs after terminal append/dedupe; (d) keep `reconcile_terminal_events` sorting semantics meaningful by ensuring production seqs are nonzero and monotonic. 
**Why this matters:** this is event/log opacity in the literal contract layer: the product advertises machine-checkable event ordering, but real persisted manifests fail that checker. Downstream clawhips/watchers either cannot trust the conformance helper or must special-case production data. Source: gaebal-gajae dogfood response to Clawhip message `1506703078492082197` on 2026-05-20.
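A sketch of the containment in fix (a), paired with a strictly-increasing check in the spirit of the G004 validator. The structs are trimmed, hypothetical mirrors of `LaneEvent`/`LaneEventMetadata` (only `seq` is kept), and `validate_seqs` is not the real `validate_lane_events`; it starts from 0 so a leading `seq=0` is also rejected, which is an assumption about the intended contract.

```rust
#[derive(Debug, Clone)]
pub struct LaneEventMetadata {
    pub seq: u64,
}

#[derive(Debug, Clone)]
pub struct LaneEvent {
    pub name: &'static str,
    pub metadata: LaneEventMetadata,
}

/// Containment from fix (a): restamp 1-based seqs just before the manifest is written.
pub fn restamp_seqs(events: &mut [LaneEvent]) {
    for (idx, event) in events.iter_mut().enumerate() {
        event.metadata.seq = idx as u64 + 1;
    }
}

/// Stand-in for the strictly-increasing check; starting from 0 also rejects seq=0.
pub fn validate_seqs(events: &[LaneEvent]) -> Result<(), String> {
    let mut previous = 0;
    for event in events {
        let seq = event.metadata.seq;
        if seq <= previous {
            return Err(format!("{}: sequence must be strictly increasing", event.name));
        }
        previous = seq;
    }
    Ok(())
}
```

Running `validate_seqs` on a freshly built failed manifest (`lane.started` + `lane.blocked` + `lane.failed`) reproduces the bug; running it after `restamp_seqs` is the regression from fix (c).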
+
+505. **G004 report conformance expects `schemaVersion:"g004.report.v1"`, but the runtime's canonical report implementation emits `schemaVersion:"claw.report.v1"`, so first-party canonical reports cannot pass the repo's own G004 bundle validator without an undocumented schema rewrite** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 17:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@19a1918`. Code inspection: `runtime/src/g004_conformance.rs` hardcodes `const REPORT_SCHEMA_VERSION: &str = "g004.report.v1"` and `validate_reports` requires every `/reports/*/schemaVersion` to equal that value. Separately, the actual report schema module defines `pub const REPORT_SCHEMA_V1: &str = "claw.report.v1"`; `canonicalize_report` overwrites reports with `report.schema_version = REPORT_SCHEMA_V1.to_string()`, and the report registry/capability projection also key off `claw.report.v1`. Grep shows no adapter mapping `claw.report.v1` to `g004.report.v1`; the G004 fixture is hand-authored with `g004.report.v1`. Result: a real `CanonicalReportV1` produced by runtime and inserted into a G004 contract bundle is rejected by `validate_g004_contract_bundle` solely on schema-version mismatch. **Required fix shape:** (a) decide whether G004 should validate the first-party `claw.report.v1` schema directly or introduce an explicit projection adapter from `CanonicalReportV1` to `g004.report.v1`; (b) do not hardcode a competing report schema string in the conformance helper without a conversion path; (c) add a regression that builds a canonical report via `canonicalize_report`, wraps it in a G004 bundle with valid lane events, and verifies either acceptance or a typed `unsupported_schema_version` with documented adapter guidance; (d) update fixtures to use the same path real producers use. 
**Why this matters:** conformance tests should protect interoperability, not validate an artificial fixture dialect that production cannot emit. Otherwise downstream report consumers see event/log opacity: the report looks valid to the runtime registry and invalid to the G004 bundle validator. Source: gaebal-gajae dogfood response to Clawhip message `1506710631254986793` on 2026-05-20.
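One way to realize the adapter option in fix (a) together with the typed error from fix (c). The two constants reuse the schema strings quoted above; `Report`, `SchemaError`, and `project_report_to_g004` are hypothetical, and a real projection would map fields rather than only relabel the version.

```rust
pub const REPORT_SCHEMA_V1: &str = "claw.report.v1";
pub const G004_REPORT_SCHEMA: &str = "g004.report.v1";

#[derive(Debug, Clone, PartialEq)]
pub struct Report {
    pub schema_version: String,
    pub title: String,
}

#[derive(Debug, PartialEq)]
pub enum SchemaError {
    UnsupportedSchemaVersion(String),
}

/// Explicit, documented projection instead of a silent hardcoded mismatch.
pub fn project_report_to_g004(report: &Report) -> Result<Report, SchemaError> {
    match report.schema_version.as_str() {
        REPORT_SCHEMA_V1 | G004_REPORT_SCHEMA => Ok(Report {
            schema_version: G004_REPORT_SCHEMA.to_string(),
            ..report.clone()
        }),
        other => Err(SchemaError::UnsupportedSchemaVersion(other.to_string())),
    }
}
```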
+
+506. **SSE stream parsers repeatedly rescan and drain from the front of a growing buffer, making large batched streams quadratic and adding avoidable latency to provider streaming/event handling** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 18:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@ad8a0b3`, after Jobdori noted the unfiled parser shape. Code inspection: `api/src/sse.rs::SseParser::push` appends a chunk, then loops `while let Some(frame) = self.next_frame()`. `next_frame` searches `self.buffer.windows(2)` from byte 0 for `\n\n` (then separately scans `windows(4)` for `\r\n\r\n`), drains `..position+separator_len` into a new `Vec`, and converts to `String`. `api/src/providers/openai_compat.rs::next_sse_frame` duplicates the same algorithm. `runtime/src/sse.rs::IncrementalSseParser::push_chunk` does the string analogue with repeated `self.buffer.find('\n')` plus `drain(..=index)`. For a single network read or proxy flush containing thousands of small SSE frames/lines, each extracted frame/line rescans and moves the remaining buffer from the front; total work trends O(N²) in bytes/frames and allocates a fresh buffer per frame. **Required fix shape:** (a) replace front-drain parsing with an index/cursor-based parser (`scan_pos`, `consumed_until`) and compact the buffer only occasionally; (b) search for `\n\n`/`\r\n\r\n` from the previous scan position, not from 0 every loop; (c) share one bounded SSE framing helper between Anthropic and OpenAI-compatible providers; (d) add a micro-benchmark or regression that pushes one chunk containing 10k tiny frames and asserts linear-ish parse time/allocation behavior; (e) add a max pending-buffer size and emit a typed stream framing error when no separator arrives before the cap. **Why this matters:** streaming is the main event/log surface for prompt delivery, tool calls, and usage. 
A proxy, provider, or test harness that batches many small SSE frames into one chunk should not turn the parser into a CPU/allocation hotspot or make streaming look stalled before any model event is delivered. Source: gaebal-gajae dogfood response to Clawhip message `1506718176509952130` on 2026-05-20.
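A sketch of the cursor-based framing from fixes (a), (b), and (e): searches resume from `scan_pos` instead of byte 0, emitted frames only advance `consumed`, and the buffer is compacted once consumed bytes make up at least half of it, so total copying stays linear. `SseFramer` and `FramingError` are hypothetical names; the separators (`\n\n`, `\r\n\r\n`) mirror the existing parser.

```rust
/// Typed error for fix (e): no separator arrived before the pending cap.
#[derive(Debug, PartialEq)]
pub struct FramingError {
    pub pending: usize,
    pub cap: usize,
}

/// Cursor-based SSE framer: finds separators without rescanning from byte 0.
pub struct SseFramer {
    buffer: Vec<u8>,
    consumed: usize, // bytes already emitted as frames
    scan_pos: usize, // where the next separator search resumes
    max_pending: usize,
}

impl SseFramer {
    pub fn new(max_pending: usize) -> Self {
        Self { buffer: Vec::new(), consumed: 0, scan_pos: 0, max_pending }
    }

    pub fn push(&mut self, chunk: &[u8]) -> Result<Vec<String>, FramingError> {
        self.buffer.extend_from_slice(chunk);
        let mut frames = Vec::new();
        while let Some((end, sep_len)) = self.find_separator() {
            frames.push(String::from_utf8_lossy(&self.buffer[self.consumed..end]).into_owned());
            self.consumed = end + sep_len;
            self.scan_pos = self.consumed;
        }
        // Compact only once consumed bytes are at least half the buffer.
        if self.consumed > 0 && self.consumed * 2 >= self.buffer.len() {
            self.buffer.drain(..self.consumed);
            self.scan_pos -= self.consumed;
            self.consumed = 0;
        }
        let pending = self.buffer.len() - self.consumed;
        if pending > self.max_pending {
            return Err(FramingError { pending, cap: self.max_pending });
        }
        Ok(frames)
    }

    fn find_separator(&mut self) -> Option<(usize, usize)> {
        let buf = &self.buffer;
        let mut i = self.scan_pos;
        while i + 1 < buf.len() {
            if buf[i] == b'\n' && buf[i + 1] == b'\n' {
                return Some((i, 2));
            }
            if i + 3 < buf.len() && &buf[i..i + 4] == b"\r\n\r\n" {
                return Some((i, 4));
            }
            i += 1;
        }
        // Back up a few bytes so a separator split across chunks is still found.
        self.scan_pos = i.saturating_sub(3).max(self.consumed);
        None
    }
}
```

Both the Anthropic and OpenAI-compatible providers could share this one type, which is the consolidation fix (c) asks for.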
+
+507. **Anthropic requests can serialize/render the full message body three times before the real `/v1/messages` call: local preflight JSON byte estimate, remote `count_tokens` request body, then final message request body** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 18:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@7bc373f`, after Jobdori filed the sibling OpenAI-compatible double-build issue (#508). Code inspection: `AnthropicProvider::message` and `stream_message` call `self.preflight_message_request(...)`; that first invokes `super::preflight_message_request(request)`, whose `estimate_serialized_tokens` serializes `messages`, `system`, `tools`, and `tool_choice` via `serde_json::to_vec`. If a model limit is known, `preflight_message_request` then calls `count_tokens`, which does `self.request_profile.render_json_body(request)?`, strips fields, and posts the full `/v1/messages/count_tokens` body. After preflight succeeds, `send_raw_request` renders the same full body again with `self.request_profile.render_json_body(request)?` and sends `/v1/messages`. So a large session pays at least one local serialization plus two full Anthropic-body renders; if `count_tokens` fails, the fallback still paid for rendering that body before the final render. **Required fix shape:** (a) memoize/render the Anthropic request body once per call and reuse it for count_tokens and final send where schemas are identical or share a base projection; (b) use a streaming/estimated byte counter for the local guard instead of serializing large subtrees into throwaway `Vec`s; (c) skip remote `count_tokens` when the local estimate is far below known limits unless strict mode requires it; (d) add an instrumentation test with a large message vector proving one-shot and streaming calls do not render/serialize the full request more than once per network target. 
**Why this matters:** long-context sessions are already context-window and latency sensitive. Doing multiple complete JSON render/serialization passes before every Anthropic call wastes CPU/memory and makes prompt delivery look slower or stalled under large histories, especially when paired with retries and stream startup. Source: gaebal-gajae dogfood response to Clawhip message `1506725730392870952` on 2026-05-20.
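Fix (a) in miniature: cache the rendered body in a `std::cell::OnceCell` (stable since Rust 1.70) so the local size guard, `count_tokens`, and the final send all reuse one render. This assumes the count and send bodies are identical or derived from a shared projection, which the fix shape itself flags as a condition. `MessageRequest` and `render_json_body` are hypothetical, and the renderer is a toy that assumes messages need no JSON escaping.

```rust
use std::cell::{Cell, OnceCell};

pub struct MessageRequest {
    pub messages: Vec<String>,
    rendered: OnceCell<Vec<u8>>,
    /// Instrumentation for the regression in fix (d).
    pub render_count: Cell<usize>,
}

impl MessageRequest {
    pub fn new(messages: Vec<String>) -> Self {
        Self { messages, rendered: OnceCell::new(), render_count: Cell::new(0) }
    }

    /// Renders on first use, then returns the cached bytes.
    pub fn body(&self) -> &[u8] {
        self.rendered.get_or_init(|| {
            self.render_count.set(self.render_count.get() + 1);
            render_json_body(&self.messages)
        })
    }

    /// Local size guard reuses the cached body instead of a throwaway serialization.
    pub fn serialized_len(&self) -> usize {
        self.body().len()
    }
}

/// Toy renderer standing in for the real request profile; no JSON escaping.
fn render_json_body(messages: &[String]) -> Vec<u8> {
    let quoted: Vec<String> = messages.iter().map(|m| format!("\"{m}\"")).collect();
    format!("{{\"messages\":[{}]}}", quoted.join(",")).into_bytes()
}
```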
+
+508. **Config/plan-mode writes use direct `std::fs::write` with no atomic rename or advisory lock, so crashes and concurrent tool calls can corrupt `settings.json`, `settings.local.json`, or plan-mode state** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 19:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@e2b96ea`. Code inspection: `execute_config` reads a settings JSON object, mutates it, and persists through `write_json_object`; `EnterPlanMode`/`ExitPlanMode` also call `write_json_object` for `.claw/settings.local.json` and `write_plan_mode_state` for `.claw/tool-state/plan-mode.json`. Both `write_json_object` and `write_plan_mode_state` create parent directories and then call `std::fs::write(path, serde_json::to_string_pretty(...))` directly on the destination. There is no temp file + fsync + rename, and no lock around the read-modify-write window. By contrast, the OAuth credentials writer already uses a safer `.tmp` then rename pattern. Two concurrent `Config`/plan-mode calls can both read the same old document, and whichever writes last silently discards the other's update; a crash or interruption mid-write can leave a truncated JSON settings file that later startup/doctor must diagnose. **Required fix shape:** (a) introduce a shared `atomic_write_json(path, value)` helper using same-directory temp file, flush/fsync, and rename; (b) wrap read-modify-write config mutations in an advisory lock (`settings.json.lock` / `settings.local.json.lock`) or a compare-and-swap retry loop; (c) use the same helper for `write_plan_mode_state`, `write_agent_manifest` where appropriate, and any future JSON state files; (d) add stress/regression tests with two concurrent config writes and a simulated partial-write failure proving no malformed JSON and no lost sibling setting. **Why this matters:** config and plan-mode are control-plane state.
A supposedly safe tool call should not be able to silently lose another setting or leave the workspace unable to load settings after an interrupted write; that turns a small config action into startup friction and stale permission-mode confusion. Source: gaebal-gajae dogfood response to Clawhip message `1506733276037910700` on 2026-05-20.
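A sketch of the `atomic_write_json` idea from fix (a), generalized to bytes so a JSON wrapper would just serialize and call it: write to a uniquely named temp file in the same directory, `sync_all`, then `rename` over the target. `atomic_write` is a hypothetical name; the advisory lock from fix (b) is not shown, and a fully durable version would also fsync the parent directory after the rename.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;
use std::process;
use std::sync::atomic::{AtomicU64, Ordering};

static TMP_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Replaces `path` with `contents` so readers see either the old file or the new
/// one, never a half-written file.
pub fn atomic_write(path: &Path, contents: &[u8]) -> io::Result<()> {
    let dir = match path.parent() {
        Some(parent) if !parent.as_os_str().is_empty() => parent,
        _ => Path::new("."),
    };
    fs::create_dir_all(dir)?;
    let file_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;
    // Same directory keeps the rename on one filesystem; pid + counter avoid
    // collisions between concurrent writers.
    let unique = TMP_COUNTER.fetch_add(1, Ordering::Relaxed);
    let tmp_path = dir.join(format!(
        ".{}.tmp.{}.{}",
        file_name.to_string_lossy(),
        process::id(),
        unique
    ));
    let result = (|| -> io::Result<()> {
        let mut tmp = File::create(&tmp_path)?;
        tmp.write_all(contents)?;
        tmp.sync_all() // data reaches disk before the new name becomes visible
    })()
    .and_then(|()| fs::rename(&tmp_path, path));
    if result.is_err() {
        // Best-effort cleanup so failed writes do not leave temp files behind.
        let _ = fs::remove_file(&tmp_path);
    }
    result
}
```

Atomic replacement fixes the truncated-file case only; the lost-update race between two read-modify-write callers still needs the lock or compare-and-swap loop from fix (b).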
+
+509. **`write_file`/`edit_file` tool results echo full file contents (`content`, `original_file`) even though they also compute structured patches, so large file edits can double-return megabytes of text into the model context** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 19:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@8c62fff`. Code inspection: `runtime/src/file_ops.rs::write_file` reads the prior file (if any), writes the new content, then returns `WriteFileOutput { content: content.to_owned(), original_file, structured_patch: make_patch(...) }`. `edit_file` similarly reads the full original file, computes `updated`, writes it, and returns `EditFileOutput { old_string, new_string, original_file: original_file.clone(), structured_patch: make_patch(&original_file, &updated), ... }`. The tool already has a structured patch/diff, but the serialized result still includes full pre/post content fields. Updating a 1MB file can return roughly 1MB `content` plus 1MB `original_file` plus patch metadata; a tiny `edit_file` change on a large file returns the entire original file even when a short diff would suffice. This is the file-edit sibling of the NotebookEdit/TodoWrite/REPL output-amplification cluster. **Required fix shape:** (a) make write/edit results patch-first and omit full `content`/`original_file` by default; (b) include bounded previews plus `original_bytes`, `new_bytes`, `content_truncated`, and `original_truncated` metadata when useful; (c) expose an explicit debug/full-output flag only for small files or trusted callers; (d) add regressions for editing/writing a large file proving serialized tool output remains bounded while the structured patch still identifies the change. **Why this matters:** file editing is the core coding surface. 
Returning full file bodies after every update wastes context, raises costs, and can force compaction precisely during code-review/debug loops where the model only needs a concise diff and path/byte metadata. Source: gaebal-gajae dogfood response to Clawhip message `1506740829912567824` on 2026-05-20.
+
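A sketch of #509 (b), assuming a hypothetical `bounded_preview` helper that does not exist in `file_ops.rs` today. It keeps at most `max_bytes` of a body. It backs off to a UTF-8 character boundary so the preview never splits a code point. It also reports the full byte count, so a patch-first result can carry `new_bytes`/`content_truncated`-style metadata instead of the whole file. The `Preview` struct is illustrative, not the real `WriteFileOutput` shape.

```rust
/// Hypothetical preview carried in a patch-first write/edit result.
#[derive(Debug, PartialEq)]
pub struct Preview {
    pub text: String,
    pub total_bytes: usize,
    pub truncated: bool,
}

/// Keep at most `max_bytes` of `content`, cut on a char boundary.
pub fn bounded_preview(content: &str, max_bytes: usize) -> Preview {
    if content.len() <= max_bytes {
        return Preview { text: content.to_owned(), total_bytes: content.len(), truncated: false };
    }
    let mut cut = max_bytes;
    // Byte 0 is always a char boundary, so this loop terminates.
    while !content.is_char_boundary(cut) {
        cut -= 1;
    }
    Preview { text: content[..cut].to_owned(), total_bytes: content.len(), truncated: true }
}
```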
+510. **`read_file` with no `limit` can return an entire text file of up to 10MB into the model context, because the file-size guard is a disk-read cap, not an output budget** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 20:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@154e7ed`. Code inspection: `runtime/src/file_ops.rs::read_file` rejects files larger than `MAX_READ_SIZE` (10MB) and binary-looking files, then reads the entire file via `fs::read_to_string`, splits it into `lines`, and when `limit` is absent sets `end_index = lines.len()`. The serialized `ReadFileOutput.file.content` is therefore the full selected content; for any text file at or below 10MB, the default read emits all of it. `limit` is line-based and optional, with no byte/token cap, no `content_truncated` metadata, and no default windowing. This is distinct from #509 write/edit amplification: a read-only exploratory call can still inject megabytes into context by omitting `limit`, even though most callers need the first window plus total metadata. **Required fix shape:** (a) add a default output byte/line cap for `read_file` (for example, the first 200 lines or 64KB) unless the caller explicitly requests a bounded range; (b) enforce a hard serialized-output byte cap even when `limit` is huge; (c) return `truncated`, `total_lines`, `selected_start_line`, `selected_end_line`, and `total_bytes` so callers can page intentionally; (d) add regressions for 1MB and 10MB text files proving default read output is bounded and explicit paging works without exceeding the cap. **Why this matters:** `read_file` is allowed in read-only mode and is the first tool a claw uses during debugging. A single accidental full-file read of a generated JSON/log/source bundle can consume the context window and force compaction before any useful analysis happens. Source: gaebal-gajae dogfood response to Clawhip message `1506755925225111724` on 2026-05-20.
+
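A sketch of the windowing in #510 (a)-(c). Several details are assumptions: the function name, the 0-based `offset`, and re-terminating every emitted line with `\n` are illustrative, and the real `ReadFileOutput` is not reproduced. The caller supplies both caps, for example 200 lines and 64KB. The line cap applies only when `limit` is absent, while the byte cap always applies, matching (a) and (b).

```rust
/// Hypothetical windowed read result with the metadata #510 (c) asks for.
#[derive(Debug)]
pub struct ReadWindow {
    pub content: String,
    pub selected_start_line: usize, // 1-based
    pub selected_end_line: usize,   // 1-based inclusive; start - 1 when nothing fit
    pub total_lines: usize,
    pub total_bytes: usize,
    pub truncated: bool, // true when lines remain after the window
}

/// Select lines from 0-based `offset`. Without `limit`, at most `default_lines`
/// are returned; `max_bytes` is enforced even when `limit` is huge.
pub fn read_window(
    text: &str,
    offset: usize,
    limit: Option<usize>,
    default_lines: usize,
    max_bytes: usize,
) -> ReadWindow {
    let lines: Vec<&str> = text.lines().collect();
    let start = offset.min(lines.len());
    let wanted = limit.unwrap_or(default_lines);
    let mut content = String::new();
    // Exclusive 0-based end index, which is also the last selected 1-based line.
    let mut end = start;
    for line in lines.iter().skip(start).take(wanted) {
        // Each line is re-terminated with '\n'; stop before crossing the byte cap.
        if content.len() + line.len() + 1 > max_bytes {
            break;
        }
        content.push_str(line);
        content.push('\n');
        end += 1;
    }
    ReadWindow {
        content,
        selected_start_line: start + 1,
        selected_end_line: end,
        total_lines: lines.len(),
        total_bytes: text.len(),
        truncated: end < lines.len(),
    }
}
```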
+511. **`grep_search` collects every file and every matching content line before applying `head_limit`, so a small requested result can still scan/read/store an unbounded workspace's worth of data** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 21:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@02f2887`. Code inspection: `grep_search_impl` first calls `collect_search_files(&base_path)?`, which walks the entire tree into a `Vec<PathBuf>` with no ignored-dir policy, file-count cap, or early stop. For every candidate it then does `fs::read_to_string(&file_path)` with no per-file size guard (unlike `read_file`'s 10MB max) and, for `output_mode == "content"`, pushes every matched/context line into `content_lines`. Only after the full traversal does it call `apply_limit(filenames, input.head_limit, input.offset)` and later `apply_limit(content_lines, head_limit, offset)`. The default limit is 250 output items, but it is not an execution budget: a repo with huge generated text files or thousands of matches still pays the full read/regex/memory cost before returning 250 lines. **Required fix shape:** (a) stream search results and stop once `offset + head_limit` content lines/files have been collected, while continuing only if `count` mode explicitly needs totals; (b) add skip-dir/file-size guards shared with `glob_search` (`.git`, `node_modules`, `target`, etc.) and binary detection; (c) expose `truncated:true`, `files_scanned`, `files_skipped_size`, and `matches_seen` metadata; (d) add regression fixtures with a huge file and many matches proving `head_limit:1` does not read/accumulate the entire workspace. **Why this matters:** grep is a read-only diagnostic primitive. `head_limit` currently protects only the final JSON size, not runtime CPU/memory or accidental context blowups, so common searches in generated/vendor-heavy repos can look like tool hangs even when the caller asked for one line. 
Source: gaebal-gajae dogfood response to Clawhip message `1506763474955403414` on 2026-05-20.
+
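A sketch of #511 (a)-(c), with hypothetical names (`grep_bounded`, `GrepOutcome`). Files arrive as a lazy iterator of `(path, text)` pairs. The scan returns as soon as one match beyond `offset + head_limit` proves the result is truncated. Plain `str::contains` stands in for the real regex engine. In a real walker the iterator would check size via metadata and read files lazily, so files after the stop point are never opened.

```rust
#[derive(Debug, Default)]
pub struct GrepOutcome {
    pub lines: Vec<String>,
    pub files_scanned: usize,
    pub files_skipped_size: usize,
    pub truncated: bool,
}

pub fn grep_bounded<'a, I>(
    files: I,
    needle: &str,
    offset: usize,
    head_limit: usize,
    max_file_bytes: usize,
) -> GrepOutcome
where
    I: IntoIterator<Item = (&'a str, &'a str)>,
{
    let budget = offset + head_limit;
    let mut out = GrepOutcome::default();
    let mut seen = 0;
    for (path, text) in files {
        if text.len() > max_file_bytes {
            out.files_skipped_size += 1; // per-file size guard, as in (b)
            continue;
        }
        out.files_scanned += 1;
        for (idx, line) in text.lines().enumerate() {
            if !line.contains(needle) {
                continue;
            }
            if seen == budget {
                // One match past the budget proves truncation: stop the whole walk.
                out.truncated = true;
                return out;
            }
            if seen >= offset {
                out.lines.push(format!("{path}:{}:{line}", idx + 1));
            }
            seen += 1;
        }
    }
    out
}
```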
+512. **`glob_search` traverses, stores, and sorts every match by modification time before truncating to 100, so `truncated:true` still means the full workspace scan already happened** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 21:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@0864f39`. Code inspection: `glob_search_impl` expands brace patterns, derives a walk root, then runs `WalkDir::new(&walk_root).filter_entry(...)` and pushes every matching file into `matches`. Only after all patterns and all entries are exhausted does it sort the entire `matches` vector by `fs::metadata(path).modified()` and compute `truncated = matches.len() > 100`, then `take(100)` for output. The ignored-dir list helps common vendor dirs, but there is no max traversal count, max match count, timeout/deadline, or early stop once enough results are known. A broad pattern like `**/*` in a generated workspace can collect/sort/stat tens or hundreds of thousands of paths just to return 100 names. **Required fix shape:** (a) add execution budgets for glob traversal (`max_entries_scanned`, `max_matches_collected`, optional deadline); (b) stream/top-k results instead of collecting every match before truncation, or make `sort_by_mtime` opt-in when exact newest-100 is required; (c) return `entries_scanned`, `matches_seen`, `truncated_reason`, and `ignored_dirs` metadata; (d) share traversal budget primitives with `grep_search` so read-only discovery tools fail/degrade consistently; (e) add a regression with >1000 generated files proving `glob_search` returns promptly without storing/sorting every match when capped. **Why this matters:** glob is a safe-looking read-only discovery tool, but broad globs in large repos are a common source of startup friction and apparent hangs. Output truncation alone is not enough; the work done before truncation must also be bounded and observable. 
Source: gaebal-gajae dogfood response to Clawhip message `1506771028834123986` on 2026-05-20.
+
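For #512 (a)-(c), a bounded min-heap can keep only the newest `k` matches while walking. Memory stays O(k), and an entry budget stops the walk early. Assumptions: the walker yields `(path, mtime)` pairs with mtime as plain seconds rather than `SystemTime`, pattern matching is a caller-supplied predicate, and `newest_k`/`GlobTopK` are hypothetical names. When the entry budget fires, the result is the newest of the entries scanned, not the newest overall. That trade-off is why (b) suggests making an exact newest-100 sort opt-in.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

#[derive(Debug)]
pub struct GlobTopK {
    pub newest: Vec<(u64, String)>, // (mtime, path), newest first
    pub entries_scanned: usize,
    pub matches_seen: usize,
    pub truncated_reason: Option<&'static str>,
}

pub fn newest_k<I, F>(entries: I, is_match: F, k: usize, max_entries_scanned: usize) -> GlobTopK
where
    I: IntoIterator<Item = (String, u64)>,
    F: Fn(&str) -> bool,
{
    // Min-heap by mtime: the root is the oldest of the current top-k.
    let mut heap: BinaryHeap<Reverse<(u64, String)>> = BinaryHeap::new();
    let (mut scanned, mut seen) = (0usize, 0usize);
    let mut reason = None;
    for (path, mtime) in entries {
        if scanned == max_entries_scanned {
            reason = Some("max_entries_scanned"); // more entries exist; walk stopped
            break;
        }
        scanned += 1;
        if !is_match(path.as_str()) {
            continue;
        }
        seen += 1;
        heap.push(Reverse((mtime, path)));
        if heap.len() > k {
            heap.pop(); // evict the oldest, keeping memory O(k)
        }
    }
    if reason.is_none() && seen > k {
        reason = Some("max_results");
    }
    let mut newest: Vec<(u64, String)> = heap.into_iter().map(|Reverse(entry)| entry).collect();
    newest.sort_by(|a, b| b.cmp(a));
    GlobTopK { newest, entries_scanned: scanned, matches_seen: seen, truncated_reason: reason }
}
```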
+513. **`make_patch` is not a diff hunk generator; it emits the entire old file as removed lines plus the entire new file as added lines, so `structured_patch` itself can double the output size for every write/edit** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 22:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@b3b9eb2`. Code inspection: `runtime/src/file_ops.rs::make_patch(original, updated)` builds one `StructuredPatchHunk` by iterating `for line in original.lines() { lines.push(format!("-{line}")); }` and then `for line in updated.lines() { lines.push(format!("+{line}")); }`; `old_lines` and `new_lines` are the full line counts. This means the supposedly structured patch is a whole-file delete+whole-file add, not a minimal/contextual diff. Combined with #509's full `content`/`original_file` fields, editing a 1MB file can return the full original file, the full new file, and a `structured_patch` that repeats both in full. Even after removing raw content fields, `structured_patch` would still be an output-amplification bug unless it becomes a real bounded diff. **Required fix shape:** (a) replace `make_patch` with a real line diff/hunk algorithm that emits only changed hunks plus configurable context; (b) cap patch lines/bytes with `patch_truncated`, `omitted_hunks`, and full old/new byte counts; (c) for full-file rewrites, return a summary (`rewrite:true`, changed line counts, previews) rather than every line; (d) add regressions for a one-line edit in a 10k-line file proving patch output is O(changed lines + context), not O(file size). **Why this matters:** structured patches are the right contract for coding tools, but a whole-file pseudo-patch creates the same context-window blowup as raw file echoes while looking machine-friendly. Review/debug loops need concise, truthful diffs. Source: gaebal-gajae dogfood response to Clawhip message `1506778574143754422` on 2026-05-20.
+
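A sketch of #513 (a) that runs in three steps. First, trim the common prefix and suffix. Second, run a textbook LCS table only on the differing middle. Third, group changed lines into hunks with `context` unchanged lines around them. This is an illustrative algorithm with hypothetical names (`make_hunks`, `Hunk`), not the existing `make_patch`. The middle LCS is O(n·m), so a production fix would more likely use a Myers-style diff. For a one-line edit, though, the middle is 1×1 and the output is O(changed lines + context).

```rust
/// Hunk shaped loosely like `StructuredPatchHunk`; field names are illustrative.
#[derive(Debug, PartialEq)]
pub struct Hunk {
    pub old_start: usize, // 1-based
    pub old_lines: usize,
    pub new_start: usize, // 1-based
    pub new_lines: usize,
    pub lines: Vec<String>, // " ctx", "-removed", "+added"
}

#[derive(Clone, Copy)]
enum Op { Equal(usize, usize), Delete(usize), Insert(usize) }

fn edit_script(a: &[&str], b: &[&str]) -> Vec<Op> {
    let (n, m) = (a.len(), b.len());
    let mut p = 0; // common prefix length
    while p < n && p < m && a[p] == b[p] { p += 1; }
    let mut s = 0; // common suffix length, never overlapping the prefix
    while s < n - p && s < m - p && a[n - 1 - s] == b[m - 1 - s] { s += 1; }
    let (am, bm) = (&a[p..n - s], &b[p..m - s]);
    // lcs[i][j] = LCS length of am[i..] and bm[j..]
    let mut lcs = vec![vec![0usize; bm.len() + 1]; am.len() + 1];
    for i in (0..am.len()).rev() {
        for j in (0..bm.len()).rev() {
            lcs[i][j] = if am[i] == bm[j] { lcs[i + 1][j + 1] + 1 } else { lcs[i + 1][j].max(lcs[i][j + 1]) };
        }
    }
    let mut ops: Vec<Op> = (0..p).map(|k| Op::Equal(k, k)).collect();
    let (mut i, mut j) = (0, 0);
    while i < am.len() || j < bm.len() {
        if i < am.len() && j < bm.len() && am[i] == bm[j] {
            ops.push(Op::Equal(p + i, p + j));
            i += 1;
            j += 1;
        } else if i < am.len() && (j == bm.len() || lcs[i + 1][j] >= lcs[i][j + 1]) {
            ops.push(Op::Delete(p + i)); // deletions before insertions, as in unified diffs
            i += 1;
        } else {
            ops.push(Op::Insert(p + j));
            j += 1;
        }
    }
    ops.extend((0..s).map(|k| Op::Equal(n - s + k, m - s + k)));
    ops
}

pub fn make_hunks(original: &str, updated: &str, context: usize) -> Vec<Hunk> {
    let a: Vec<&str> = original.lines().collect();
    let b: Vec<&str> = updated.lines().collect();
    let ops = edit_script(&a, &b);
    // (old lines, new lines) consumed before each op, used for hunk start positions.
    let mut pos = Vec::with_capacity(ops.len());
    let (mut old_seen, mut new_seen) = (0, 0);
    for op in &ops {
        pos.push((old_seen, new_seen));
        match op {
            Op::Equal(..) => { old_seen += 1; new_seen += 1; }
            Op::Delete(_) => old_seen += 1,
            Op::Insert(_) => new_seen += 1,
        }
    }
    let changed: Vec<usize> = (0..ops.len()).filter(|&k| !matches!(ops[k], Op::Equal(..))).collect();
    let mut hunks = Vec::new();
    let mut c = 0;
    while c < changed.len() {
        // Merge changes separated by at most 2 * context unchanged lines.
        let mut last = c;
        while last + 1 < changed.len() && changed[last + 1] - changed[last] <= 2 * context + 1 { last += 1; }
        let lo = changed[c].saturating_sub(context);
        let hi = (changed[last] + context + 1).min(ops.len());
        let mut h = Hunk { old_start: pos[lo].0 + 1, old_lines: 0, new_start: pos[lo].1 + 1, new_lines: 0, lines: Vec::new() };
        for op in &ops[lo..hi] {
            match *op {
                Op::Equal(i, _) => { h.lines.push(format!(" {}", a[i])); h.old_lines += 1; h.new_lines += 1; }
                Op::Delete(i) => { h.lines.push(format!("-{}", a[i])); h.old_lines += 1; }
                Op::Insert(j) => { h.lines.push(format!("+{}", b[j])); h.new_lines += 1; }
            }
        }
        hunks.push(h);
        c = last + 1;
    }
    hunks
}
```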
+514. **`write_file` silently overwrites existing binary/non-UTF8 files while reporting them as creates because the previous-file read uses `fs::read_to_string(...).ok()` and drops decode/errors** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 22:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@ea29fbe`. Code inspection: `runtime/src/file_ops.rs::write_file` enforces only the new content byte length, resolves the destination, then does `let original_file = fs::read_to_string(&absolute_path).ok();`. Any read error — file missing, permission denied, directory race, or existing binary/non-UTF8 content — is collapsed to `None`. The function then creates parents and `fs::write(&absolute_path, content)?`. Because `kind` is computed as `if original_file.is_some() { "update" } else { "create" }`, overwriting a binary file is reported as a create with no warning and no binary guard. `read_file` has binary detection; `write_file` does not apply it before clobbering existing files. **Required fix shape:** (a) distinguish `NotFound` from other read/decode errors instead of using `.ok()`; (b) if the destination exists and is not valid UTF-8 or appears binary, require an explicit `overwrite_binary:true` / `force:true` flag or return `kind:"binary_overwrite_refused"`; (c) report `existed:true` based on metadata, not successful UTF-8 decoding; (d) preserve structured diff only for text files; (e) add regressions for overwriting a binary file and a permission-denied/unreadable file proving the tool does not silently treat them as creates. **Why this matters:** write tools are allowed to create/update source artifacts, but silently clobbering a binary asset or unreadable file is data-lossy and misleading. The model/operator needs to know whether a file existed and whether a safe text diff is possible before replacement. 
Source: gaebal-gajae dogfood response to Clawhip message `1506786124176293919` on 2026-05-20.
+
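A sketch of #514 (a)-(c), assuming a hypothetical `classify_destination` step that runs before `fs::write`. Only `NotFound` means "create"; permission and other I/O errors propagate instead of collapsing to `None`. A NUL byte or invalid UTF-8 marks the prior file as binary. That NUL check is a common heuristic, not necessarily the one `read_file` uses. The `"binary_overwrite"` kind for the forced path is likewise an invented label.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;

/// What the destination held before the write (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Prior {
    Missing,
    Text(String),
    Binary { bytes: usize },
}

/// Only `NotFound` means "create"; every other read error is returned.
pub fn classify_destination(path: &Path) -> io::Result<Prior> {
    match fs::read(path) {
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(Prior::Missing),
        Err(e) => Err(e),
        Ok(bytes) if bytes.contains(&0) => Ok(Prior::Binary { bytes: bytes.len() }),
        Ok(bytes) => match String::from_utf8(bytes) {
            Ok(text) => Ok(Prior::Text(text)),
            Err(e) => Ok(Prior::Binary { bytes: e.into_bytes().len() }),
        },
    }
}

/// Map the prior state to a result kind, refusing binary clobbers unless forced.
pub fn check_overwrite(prior: &Prior, overwrite_binary: bool) -> Result<&'static str, &'static str> {
    match prior {
        Prior::Missing => Ok("create"),
        Prior::Text(_) => Ok("update"),
        Prior::Binary { .. } if overwrite_binary => Ok("binary_overwrite"),
        Prior::Binary { .. } => Err("binary_overwrite_refused"),
    }
}
```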
+515. **Config `model` values are parsed with `JsonValue::as_str` and otherwise silently ignored, so non-string model config falls back to defaults with no error or warning** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 23:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@07a12d4`, following Jobdori's config-loader diagnostic sweep. Code inspection: after validation/merge, `ConfigLoader::load` builds `RuntimeFeatureConfig { model: parse_optional_model(&merged_value), ... }`. `parse_optional_model` is `root.as_object().and_then(|object| object.get("model")).and_then(JsonValue::as_str).map(ToOwned::to_owned)`. If a config file contains `{"model": 123}`, `{"model": null}`, `{"model": ["opus"]}`, or an object, `parse_optional_model` returns `None` exactly as if no model key existed. Other config fields use typed helpers that return `ConfigError::Parse` on wrong types (`permissions.defaultMode`, plugin fields, hooks, etc.), so `model` is an outlier. **Required fix shape:** (a) replace `parse_optional_model -> Option<String>` with `Result<Option<String>, ConfigError>`; (b) when `model` exists but is not a non-empty string, emit `ConfigError::Parse` or a structured warning with path/key/type; (c) run the same model-syntax validator used for `--model` and env model values so config, env, and CLI flag agree; (d) expose `model_source`, `model_raw`, and validation diagnostics in status/config JSON; (e) add regressions for numeric/null/array/object model values proving they are not silently treated as missing. **Why this matters:** model selection is control-plane state. A typo like `model: ["opus"]` should not silently revert to the default model while status appears healthy; that creates prompt misdelivery and cost surprises that are hard to attribute back to config. Source: gaebal-gajae dogfood response to Clawhip message `1506793682257580144` on 2026-05-20.
+
+516. **`config env --output-format json` prints raw environment secret values from config files, including API-key-shaped entries, instead of redacting sensitive keys** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 23:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@fbd2f01` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Reproduction in a clean temp workspace: create `.claw.json` with `{"env":{"ANTHROPIC_API_KEY":"sk-SECRET-should-not-print","SAFE":"ok"}}`, then run `claw config env --output-format json`. The command exits 0 and emits `section_value` containing the raw `ANTHROPIC_API_KEY` value unchanged while also showing benign keys like `SAFE`. This makes the config-inspection surface unsafe to paste into issue reports, Discord, CI logs, or support threads. **Required fix shape:** (a) redact values for sensitive key patterns (`*_API_KEY`, `*_TOKEN`, `*_SECRET`, `PASSWORD`, `AUTH`, etc.) in both JSON and text config inspection output; (b) preserve enough metadata for debugging (`redacted:true`, maybe `value_present:true`, source file, key name) without exposing bytes; (c) keep non-sensitive env values visible; (d) add a `--show-secrets`/`--unsafe-show-secrets` escape hatch only if explicitly confirmed and never enabled by default; (e) add regression coverage for `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, token/password variants, and safe keys. **Why this matters:** `config env` is exactly the surface operators use when debugging configuration. If it dumps credentials by default, the first support/debug paste can leak provider keys. Automation-friendly JSON must be safer than prose, not a secret exfiltration footgun. Source: gaebal-gajae dogfood response to Clawhip message `1506801223465046210` on 2026-05-20.
+
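One way to express #516 (a)-(c), sketched with hypothetical names (`redact_env`, `EnvDisplay`) and an assumed substring list. Sensitive keys keep their name plus a `value_present` flag. Everything else passes through unchanged. The key list is illustrative and deliberately errs toward over-redaction; `AUTHOR`, for example, matches `AUTH`.

```rust
use std::collections::BTreeMap;

/// Hypothetical display form for one `config env` entry.
#[derive(Debug, PartialEq)]
pub enum EnvDisplay {
    Value(String),
    Redacted { value_present: bool },
}

/// Illustrative patterns loosely following #516 (a); case-insensitive substring match.
fn is_sensitive(key: &str) -> bool {
    let upper = key.to_ascii_uppercase();
    ["API_KEY", "TOKEN", "SECRET", "PASSWORD", "AUTH"]
        .iter()
        .any(|pattern| upper.contains(*pattern))
}

pub fn redact_env(env: &BTreeMap<String, String>) -> BTreeMap<String, EnvDisplay> {
    env.iter()
        .map(|(key, value)| {
            let shown = if is_sensitive(key) {
                EnvDisplay::Redacted { value_present: !value.is_empty() }
            } else {
                EnvDisplay::Value(value.clone())
            };
            (key.clone(), shown)
        })
        .collect()
}
```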
+517. **`config env --output-format json` accepts and reports non-string env values instead of validating that environment variables are strings** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 00:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@0bbe19e` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Reproduction in a clean temp workspace: create `.claw.json` with `{"env":{"SAFE":123}}`, then run `claw config env --output-format json`. The command exits 0 and emits `section_value:{"SAFE":123}`. Runtime environment variables cannot be numeric/array/object/null values; any later application step must stringify, ignore, or fail elsewhere. This is inconsistent with the stricter config helpers for hooks/plugins/provider fallbacks and with the model-type gap recorded in #515. **Required fix shape:** (a) validate `env` config as a string map during config load/section inspection; (b) for non-string values, return a typed parse/config diagnostic naming the file/key/type, or preserve partial success with `invalid_env:[{key, reason}]` while excluding the bad key from resolved env; (c) keep `config env` JSON machine-usable by returning only string values plus explicit invalid-entry metadata; (d) combine with #516 secret redaction so valid secret strings are redacted and invalid secret-shaped keys do not leak values; (e) add regressions for numeric, boolean, null, array, and object env values. **Why this matters:** config inspection should reflect values that can actually be exported. Accepting JSON-native non-string values as env config makes status/config look healthy while leaving the eventual runtime behavior ambiguous and brittle for automation. Source: gaebal-gajae dogfood response to Clawhip message `1506808785480454194` on 2026-05-21.
+
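A combined sketch for #515 (a)/(b) and #517 (b). A minimal hypothetical `Json` enum stands in for the runtime's `JsonValue`, which is not reproduced here. `parse_optional_model` returns `Result<Option<String>, _>` instead of treating wrong types as absent. `parse_env` keeps string values and lists every non-string entry as a diagnostic rather than passing it through. The diagnostic shape and the `env.<KEY>` naming are assumptions.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for the runtime's `JsonValue` (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

fn type_name(value: &Json) -> &'static str {
    match value {
        Json::Null => "null",
        Json::Bool(_) => "boolean",
        Json::Number(_) => "number",
        Json::String(_) => "string",
        Json::Array(_) => "array",
        Json::Object(_) => "object",
    }
}

/// Hypothetical diagnostic naming the offending key and the type found.
#[derive(Debug, PartialEq)]
pub struct ConfigDiagnostic {
    pub key: String,
    pub found: &'static str,
}

/// #515: absent -> Ok(None); non-empty string -> Ok(Some); anything else -> Err.
pub fn parse_optional_model(root: &Json) -> Result<Option<String>, ConfigDiagnostic> {
    let Json::Object(map) = root else { return Ok(None) };
    match map.get("model") {
        None => Ok(None),
        Some(Json::String(s)) if !s.trim().is_empty() => Ok(Some(s.clone())),
        Some(Json::String(_)) => Err(ConfigDiagnostic { key: "model".into(), found: "empty string" }),
        Some(other) => Err(ConfigDiagnostic { key: "model".into(), found: type_name(other) }),
    }
}

/// #517: string values are exportable; everything else is reported, not kept.
#[derive(Debug, Default, PartialEq)]
pub struct EnvSection {
    pub valid: BTreeMap<String, String>,
    pub invalid: Vec<ConfigDiagnostic>,
}

pub fn parse_env(root: &Json) -> EnvSection {
    let mut out = EnvSection::default();
    let env = match root {
        Json::Object(map) => map.get("env"),
        _ => None,
    };
    match env {
        None => {}
        Some(Json::Object(entries)) => {
            for (key, value) in entries {
                match value {
                    Json::String(s) => {
                        out.valid.insert(key.clone(), s.clone());
                    }
                    other => out.invalid.push(ConfigDiagnostic { key: format!("env.{key}"), found: type_name(other) }),
                }
            }
        }
        Some(other) => out.invalid.push(ConfigDiagnostic { key: "env".into(), found: type_name(other) }),
    }
    out
}
```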
+518. **Unsupported `--output-format` values (`xml`, `yaml`, etc.) enter silent runtime/config paths and time out with zero output instead of failing at CLI parse time** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 00:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@918f8f7` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes with an otherwise valid `.claw.json`: `claw config env --output-format xml`, `claw config env --output-format yaml`, and `claw status --output-format xml` each timed out after 6s with `stdout=0` and `stderr=0`. The documented output formats are `text|json`; unsupported values should be rejected before any runtime/config work begins. **Required fix shape:** (a) validate `--output-format` centrally during argument parsing; (b) accept only documented enum values (`text`, `json` unless more are implemented); (c) return bounded `kind:"cli_parse"` / `kind:"invalid_output_format"` diagnostics naming the unsupported value and supported values; (d) ensure the error obeys the JSON-output contract if a valid JSON mode was selected before the invalid value position, otherwise text stderr is fine; (e) add clean-home timeout-guarded regressions for `status --output-format xml`, `config env --output-format yaml`, `--output-format xml version`, and duplicate/late output-format placements. **Why this matters:** output format is a machine-contract selector. If a typo in that selector turns into a zero-byte hang, wrappers cannot distinguish parse failure from runtime deadlock and will often retry or kill healthy automation. Source: gaebal-gajae dogfood response to Clawhip message `1506816322795732993` on 2026-05-21.
+
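For #518 (a)-(c), parsing the flag into a closed enum at argument time turns an unsupported value into a plain `Err` before any runtime/config work can start. A sketch using `std::str::FromStr`; `OutputFormat` and `InvalidOutputFormat` are hypothetical names, and the message wording is illustrative.

```rust
use std::fmt;
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OutputFormat {
    Text,
    Json,
}

#[derive(Debug, PartialEq)]
pub struct InvalidOutputFormat {
    pub value: String,
}

impl fmt::Display for InvalidOutputFormat {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unsupported --output-format '{}' (supported: text, json)", self.value)
    }
}

impl FromStr for OutputFormat {
    type Err = InvalidOutputFormat;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "text" => Ok(OutputFormat::Text),
            "json" => Ok(OutputFormat::Json),
            other => Err(InvalidOutputFormat { value: other.to_owned() }),
        }
    }
}
```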
+519. **`--output-format` without a value hangs silently on real subcommands instead of returning a missing-value CLI parse error** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 01:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@62f5ab5` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes: `claw status --output-format`, `claw config env --output-format`, and `claw version --output-format` each timed out after 6s with `stdout=0` and `stderr=0`. A control probe `claw --output-format status` returned a bounded `cli_parse` unknown-option error, proving the parser can fail fast in some malformed placements but not when the flag is trailing/missing an argument after a recognized command. **Required fix shape:** (a) parse `--output-format` as an option that requires a value in all command positions; (b) if the next token is absent, return bounded `kind:"cli_parse"` or `kind:"missing_option_value"` naming `--output-format` and supported values; (c) if the next token is present but unsupported, combine with #518's enum validation; (d) add clean-home regressions for `status --output-format`, `config env --output-format`, `version --output-format`, and prefix form `--output-format` alone, all with elapsed-time guards; (e) ensure JSON error routing remains deterministic when a valid JSON mode was not selected. **Why this matters:** a missing value after a machine-output selector is a simple syntax error. Turning it into a zero-byte timeout makes wrappers classify a typo as a dead runtime and causes pointless retries/kills. Source: gaebal-gajae dogfood response to Clawhip message `1506823873176535133` on 2026-05-21.
+
+520. **`--model` without a value hangs silently after recognized subcommands instead of returning a missing-value CLI parse error** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 01:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@ce9ae27` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes: `claw status --model`, `claw version --model`, and `claw config env --model` each timed out after 6s with `stdout=0` and `stderr=0`. A control probe `claw --model status` returned a bounded `cli_parse` unknown-option error, so the parser/runtime only fails fast for some malformed placements while trailing model-option arity is swallowed into the command path. This is the model-selector sibling of #519's missing `--output-format` value hang. **Required fix shape:** (a) parse `--model` as an option requiring a following value wherever global flags are accepted; (b) if absent, return bounded `kind:"cli_parse"` or `kind:"missing_option_value"` naming `--model` and accepted forms/aliases; (c) if present, continue to apply existing syntax validation (#128/#424/#426 family); (d) add clean-home elapsed-time regressions for `status --model`, `version --model`, `config env --model`, bare `--model`, and valid `--model opus status`; (e) ensure JSON/text error routing remains deterministic. **Why this matters:** model selection controls cost/provider/routing. A missing model argument is a simple syntax error; turning it into a zero-byte timeout makes wrappers and users diagnose a dead runtime instead of a bad command line. Source: gaebal-gajae dogfood response to Clawhip message `1506831427013181570` on 2026-05-21.
+
+521. **`--permission-mode` without a value hangs silently after recognized subcommands instead of returning a missing-value CLI parse error** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 02:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@41a17e2` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes: `claw status --permission-mode`, `claw version --permission-mode`, and `claw config env --permission-mode` each timed out after 6s with `stdout=0` and `stderr=0`. A control probe `claw --permission-mode status` returned a bounded `cli_parse` unknown-option error. This is the permission-selector sibling of #519 (`--output-format`) and #520 (`--model`): recognized-command trailing global options are not enforcing required argument arity. **Required fix shape:** (a) parse `--permission-mode` as an option requiring a following value wherever global flags are accepted; (b) if absent, return bounded `kind:"cli_parse"` or `kind:"missing_option_value"` naming `--permission-mode` and supported modes; (c) if present but unsupported, return the existing unsupported-mode diagnostic before runtime startup; (d) add clean-home elapsed-time regressions for `status --permission-mode`, `version --permission-mode`, `config env --permission-mode`, bare `--permission-mode`, and valid `--permission-mode read-only status`; (e) consider a centralized required-value table for all global flags to close this family at once. **Why this matters:** permission mode controls write/delete authority. A missing permission argument should be a deterministic parse error, not a zero-byte hang that makes operators think the CLI or runtime is wedged. Source: gaebal-gajae dogfood response to Clawhip message `1506838972062761041` on 2026-05-21.
+
+522. **`--allowedTools` without a value hangs silently after recognized subcommands instead of returning a missing-value CLI parse error** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 02:30/03:00 UTC nudges on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@aacabdf` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes: `claw status --allowedTools`, `claw version --allowedTools`, and `claw config env --allowedTools` each timed out after 6s with `stdout=0` and `stderr=0`. A control probe `claw --allowedTools status` returned a bounded `cli_parse` unknown-option error. This extends the trailing required-value family already captured for `--output-format` (#519), `--model` (#520), and `--permission-mode` (#521). **Required fix shape:** (a) parse `--allowedTools` as an option requiring a following value wherever it is accepted; (b) if absent, return bounded `kind:"cli_parse"` or `kind:"missing_option_value"` naming `--allowedTools` and the expected comma/list syntax; (c) if present, validate/normalize the tool list before runtime startup; (d) add clean-home elapsed-time regressions for `status --allowedTools`, `version --allowedTools`, `config env --allowedTools`, bare `--allowedTools`, and valid list forms; (e) preferably centralize required-argument metadata for every global option so this class closes once instead of flag-by-flag. **Why this matters:** allowed-tools constrains tool authority. A missing value should never look like a runtime deadlock; wrappers need deterministic parse errors before starting model/session machinery. Source: gaebal-gajae dogfood response to Clawhip messages `1506846526276763658` and `1506854073998118922` on 2026-05-21.
+
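#521 (e) and #522 (e) suggest one required-value table for all global options. A sketch of that idea, with a hypothetical `take_value_flags` and an assumed flag list. Value-taking flags are recognized at any position. A missing value (end of argv, or another `--flag` next) fails immediately with `MissingOptionValue` instead of reaching the command path. The `--flag=value` form is not handled, and the real parser's structure is not reproduced here.

```rust
/// Assumed table of global options that require a value (#519-#522).
const VALUE_FLAGS: &[&str] = &["--output-format", "--model", "--permission-mode", "--allowedTools"];

#[derive(Debug, PartialEq)]
pub enum ParseError {
    MissingOptionValue { flag: String },
}

/// Split `args` into (flag, value) pairs and remaining tokens, failing fast
/// when a value-taking flag is last or followed by another `--flag`.
pub fn take_value_flags(args: &[&str]) -> Result<(Vec<(String, String)>, Vec<String>), ParseError> {
    let mut values = Vec::new();
    let mut rest = Vec::new();
    let mut i = 0;
    while i < args.len() {
        let arg = args[i];
        if VALUE_FLAGS.contains(&arg) {
            match args.get(i + 1) {
                Some(v) if !v.starts_with("--") => {
                    values.push((arg.to_owned(), v.to_string()));
                    i += 2;
                }
                _ => return Err(ParseError::MissingOptionValue { flag: arg.to_owned() }),
            }
        } else {
            rest.push(arg.to_owned());
            i += 1;
        }
    }
    Ok((values, rest))
}
```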
+523. **`--compact` after recognized non-prompt subcommands hangs silently instead of being rejected as unsupported flag placement/scope** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 03:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@5665ca1` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes: `claw status --compact`, `claw version --compact`, and `claw config env --compact` each timed out after 6s with `stdout=0` and `stderr=0`. A control probe `claw --compact status` returned a bounded `cli_parse` unknown-option error. Unlike #519-#522, `--compact` is a boolean flag, not a missing-value case; the gap is that trailing global/prompt-only flags after recognized subcommands are swallowed into a path that waits silently instead of either applying or rejecting the flag. Help documents `--compact` as "text mode only; useful for piping" for one-shot prompt output, not as a status/config/version modifier. **Required fix shape:** (a) define per-command accepted global/late flags and reject unsupported trailing flags before runtime startup; (b) for `status`, `version`, and `config`, return bounded `kind:"cli_parse"` / `kind:"unsupported_flag_for_command"` with the offending flag and supported alternatives; (c) if `--compact` is intended to be global, make it a no-op or documented mode for these commands, but never hang; (d) add clean-home elapsed-time regressions for `status --compact`, `version --compact`, `config env --compact`, valid prompt compact forms, and prefix/late placements; (e) close this alongside the option-arity family by centralizing command-specific flag metadata. **Why this matters:** `--compact` is a piping/automation affordance. If users add it to diagnostic commands and get a zero-byte timeout, compact output becomes a source of apparent runtime deadlocks rather than lower-noise automation. 
Source: gaebal-gajae dogfood response to Clawhip message `1506861625829752852` on 2026-05-21.
+
+524. **`--dangerously-skip-permissions` is accepted after non-mutating diagnostic subcommands and changes their reported permission mode, so a capability-escalation flag can be silently treated as relevant control-plane state for `status`/`version`/`config`/`doctor`/`sandbox` instead of being scoped to prompt/runtime execution** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 04:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@88c4412` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes with a minimal `.claw.json`: `claw status --dangerously-skip-permissions` exits 0 and reports `Permission mode danger-full-access`; `claw version --dangerously-skip-permissions`, `claw config env --dangerously-skip-permissions`, `claw doctor --dangerously-skip-permissions`, and `claw sandbox --dangerously-skip-permissions` also exit 0. Prefix form `claw --dangerously-skip-permissions status` behaves the same. This flag is documented as “Skip all permission checks” and is meaningful for model/tool execution, not read-only diagnostics like version/config/status. Accepting it everywhere makes diagnostic output look like an authority escalation happened and gives wrappers no way to detect accidental dangerous flag bleed-through from prompt invocations into health checks. 
**Required fix shape:** (a) define command-specific acceptance for capability-changing flags; (b) reject `--dangerously-skip-permissions` on non-executing diagnostics (`version`, `status`, `config`, `doctor`, `sandbox`, maybe `system-prompt`) with bounded `kind:"unsupported_flag_for_command"`, or explicitly mark it ignored with `ignored_flags` metadata and never report `danger-full-access` for non-execution commands; (c) keep the flag valid only for prompt/REPL/runtime paths where permission checks actually apply; (d) add clean-home regressions for both trailing and prefix placement across diagnostics and valid prompt usage; (e) ensure status distinguishes configured/default permission mode from an execution override. **Why this matters:** permission-mode reporting is a control-plane trust signal. If a dangerous runtime escape hatch is silently accepted by local diagnostics, users and orchestrators can misread a harmless status probe as running under danger-full-access, or fail to catch dangerous flag leakage before executing real tool work. Source: gaebal-gajae dogfood response to Clawhip message `1506869175522693160` on 2026-05-21.
+
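A sketch of the command-specific flag metadata that #523 (a)/(e) and #524 (a)/(b) call for. Each command gets an allow-list, and any other `--flag` is rejected with the offending flag before runtime startup. The allow-lists are assumptions for illustration, for instance that the diagnostics accept only `--output-format`; they are not the CLI's documented scopes.

```rust
/// Illustrative allow-lists; real scopes would come from the CLI's own metadata.
const PROMPT_FLAGS: &[&str] = &["--compact", "--output-format", "--dangerously-skip-permissions"];
const DIAGNOSTIC_FLAGS: &[&str] = &["--output-format"];
const NO_FLAGS: &[&str] = &[];

fn accepted_flags(command: &str) -> &'static [&'static str] {
    match command {
        "prompt" => PROMPT_FLAGS,
        "status" | "version" | "config" | "doctor" | "sandbox" => DIAGNOSTIC_FLAGS,
        _ => NO_FLAGS,
    }
}

#[derive(Debug, PartialEq)]
pub struct UnsupportedFlagForCommand {
    pub command: String,
    pub flag: String,
}

/// Reject the first `--flag` outside the command's allow-list; bare values pass.
pub fn check_flags(command: &str, args: &[&str]) -> Result<(), UnsupportedFlagForCommand> {
    let allowed = accepted_flags(command);
    match args.iter().find(|arg| arg.starts_with("--") && !allowed.contains(*arg)) {
        Some(flag) => Err(UnsupportedFlagForCommand { command: command.to_owned(), flag: flag.to_string() }),
        None => Ok(()),
    }
}
```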
+525. **`--allow-broad-cwd` is accepted after normal diagnostic subcommands even when the current directory is not broad, and `status` reports `danger-full-access`, so a broad-directory bypass flag silently bleeds into unrelated health/config surfaces instead of being scoped to the broad-cwd guard** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 04:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@0651749` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes from a narrow temp workspace with a minimal `.claw.json`: `claw status --allow-broad-cwd`, `claw version --allow-broad-cwd`, `claw doctor --allow-broad-cwd`, `claw sandbox --allow-broad-cwd`, and prefix `claw --allow-broad-cwd status` all exit 0; `status` reports `Permission mode danger-full-access`. `claw config env --allow-broad-cwd` is worse: it timed out after 6s with `stdout=0`/`stderr=0`, showing the same trailing-flag swallowing path can still hit a silent hang on one config subcommand. `--allow-broad-cwd` exists to explicitly bypass the broad-cwd safety guard when running from `/`, `$HOME`, `/tmp`, etc.; it should not be a no-op/permission-shaped accepted flag on `version`, `doctor`, `sandbox`, or config inspection from an ordinary project directory. 
**Required fix shape:** (a) scope `--allow-broad-cwd` to the broad-cwd preflight only and expose it in diagnostics as `broad_cwd_override:true` only when the cwd is actually broad; (b) reject it on non-broad cwd or non-workspace-executing diagnostics with bounded `kind:"unsupported_flag_for_command"` / `kind:"unnecessary_broad_cwd_override"`; (c) never let this flag affect or appear as `permission_mode`; (d) fix the `config env --allow-broad-cwd` trailing-flag hang with the same command-specific flag metadata used for #523/#524; (e) add clean-home regressions for narrow cwd, `/`, `$HOME`, and `/tmp` across `status`, `doctor`, `config env`, and valid prompt/resume paths. **Why this matters:** broad-cwd override is a blast-radius escape hatch. If automation accidentally carries it into every diagnostic call, the CLI should either ignore it with explicit metadata or reject it, rather than making status look like a danger-full-access runtime or hanging a config probe. Source: gaebal-gajae dogfood response to Clawhip message `1506876721666719805` on 2026-05-21.
+
+526. **`config env` swallows unsupported trailing runtime/session flags into a zero-byte hang while sibling diagnostics reject the same flags with `cli_parse`, so config inspection has a unique prompt-fallback/argument-drain bug** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 05:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@24ccc6e` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes from a minimal temp workspace compared the same unsupported trailing flags across `status`, `version`, `config env`, `doctor`, and `sandbox`. For `--resume`, `--max-turns`, `--continue`, and `--verbose`, the non-config diagnostics all returned bounded `cli_parse` errors like `unrecognized argument '--resume' for subcommand 'status'`. But `claw config env --resume`, `claw config env --max-turns`, `claw config env --continue`, and `claw config env --verbose` each timed out after 6s with `stdout=0`/`stderr=0`. This is the same family as #525's `config env --allow-broad-cwd` hang, but the broader sweep shows `config env` is the outlier parser path for multiple unsupported runtime/session flags, not just one safety override. **Required fix shape:** (a) give `config env` and all `config <section>` forms the same strict trailing-argument validation as `status`/`doctor`/`sandbox`; (b) reject unsupported flags before any config/runtime fallback with `kind:"cli_parse"` / `kind:"unsupported_flag_for_command"`, echoing the offending flag and valid config options; (c) centralize config-subcommand argument parsing so `config env`, `config model`, `config hooks`, and `config plugins` cannot drift; (d) add clean-home elapsed-time regressions for `config env --resume`, `--max-turns`, `--continue`, `--verbose`, `--allow-broad-cwd`, and a valid `config env --output-format json`; (e) audit whether the same hang exists for `config model/hooks/plugins` and close it in the same parser fix. 
**Why this matters:** `config env` is a support/debug surface users paste into issue reports. Unsupported flags should fail fast and visibly; a zero-byte hang makes config troubleshooting look like startup deadlock and hides the actual CLI typo. Source: gaebal-gajae dogfood response to Clawhip message `1506884276522848317` on 2026-05-21.
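Fix shapes (a) to (c) amount to one shared validator that every `config <section>` form calls before any config or runtime work. A minimal sketch, assuming a hypothetical `validate_trailing` helper and assuming `--output-format` is the only trailing flag config sections accept:

```rust
/// Flags config sections accept (assumption: only the format selector).
const SHARED_FLAGS: &[&str] = &["--output-format"];

/// Bounded parse error carrying the offending flag (hypothetical shape).
#[derive(Debug, PartialEq)]
pub struct CliParseError {
    pub kind: &'static str,
    pub command: String,
    pub flag: String,
    pub supported_flags: Vec<&'static str>,
}

fn parse_error(command: &str, flag: &str) -> CliParseError {
    CliParseError {
        kind: "cli_parse",
        command: command.to_string(),
        flag: flag.to_string(),
        supported_flags: SHARED_FLAGS.to_vec(),
    }
}

/// Validates everything after `claw config <section>` up front and returns
/// the requested output format, if any. Any other token is an immediate error,
/// so nothing can drain into a prompt or runtime fallback.
pub fn validate_trailing(command: &str, args: &[&str]) -> Result<Option<String>, CliParseError> {
    let mut format = None;
    let mut iter = args.iter();
    while let Some(&arg) = iter.next() {
        if arg != "--output-format" {
            return Err(parse_error(command, arg));
        }
        // `--output-format` must be followed by a value.
        match iter.next() {
            Some(&value) => format = Some(value.to_string()),
            None => return Err(parse_error(command, arg)),
        }
    }
    Ok(format)
}
```

Because `config env`, `config model`, `config hooks`, and `config plugins` would all call the same function, they cannot drift apart the way #526 and #527 describe.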
+
+527. **`config model/hooks/plugins` reject unsupported trailing flags with `kind:"unknown"` while sibling commands use `kind:"cli_parse"`, and `config env` hangs on the same flags, so config-section argument errors have three different contracts** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 05:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@4639a58` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes with `.claw.json` containing `model`, `env`, `hooks`, and `plugins`: `claw config model --resume`, `config model --verbose`, `config model --continue`, and the same flags for `config hooks`/`config plugins` all return nonzero bounded errors, but the discriminator is `[error-kind: unknown] error: unexpected extra arguments after \`claw config <section>\``. In the same parser family, `status`/`version`/`doctor`/`sandbox` report `[error-kind: cli_parse] unrecognized argument ... for subcommand ...`, while `config env` still zero-byte hangs on the same flags (#526). This means automation cannot switch on one stable error kind for a simple bad argument: it sees `cli_parse`, `unknown`, or timeout depending only on which config section was requested. **Required fix shape:** (a) route every `config <section>` extra-argument failure through the same CLI-parse error constructor as other subcommands; (b) reserve `kind:"unknown"` for truly unclassified internal failures, never deterministic parser errors; (c) give `config env` the same bounded path instead of the hang in #526; (d) include `command:"config"`, `section`, `unexpected_args`, and `supported_flags` in JSON/text diagnostics; (e) add regression coverage for `config env/model/hooks/plugins --resume|--verbose|--continue` verifying exit nonzero, bounded output, and `kind:"cli_parse"` or `kind:"unsupported_flag_for_command"` consistently. **Why this matters:** config inspection is a core support surface. 
If identical typo/flag-placement errors produce three contracts, wrappers have to special-case config sections and users see random-looking behavior instead of a clear command-line mistake. Source: gaebal-gajae dogfood response to Clawhip message `1506891824852242513` on 2026-05-21.
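One way to enforce fix shapes (a), (b), and (d) is to make the section a data field rather than a reason to pick a different error kind. A minimal sketch with hypothetical `ErrorKind` and `ConfigArgError` types; the JSON field names come from the fix shape above, and the small hand-rolled encoder only avoids assuming a serde dependency:

```rust
/// Hypothetical error-kind enum. Deterministic parser failures get named kinds;
/// `Unknown` is reserved for genuinely unclassified internal errors.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ErrorKind {
    CliParse,
    UnsupportedFlagForCommand,
    Unknown,
}

impl ErrorKind {
    pub fn as_str(self) -> &'static str {
        match self {
            ErrorKind::CliParse => "cli_parse",
            ErrorKind::UnsupportedFlagForCommand => "unsupported_flag_for_command",
            ErrorKind::Unknown => "unknown",
        }
    }
}

/// Extra-argument failure for any `config <section>`.
pub struct ConfigArgError {
    pub section: String,
    pub unexpected_args: Vec<String>,
    pub supported_flags: Vec<&'static str>,
}

impl ConfigArgError {
    /// Same kind for every section: env, model, hooks, and plugins alike.
    pub fn kind(&self) -> ErrorKind {
        ErrorKind::CliParse
    }

    pub fn to_json(&self) -> String {
        format!(
            "{{\"type\":\"error\",\"kind\":{},\"command\":\"config\",\"section\":{},\"unexpected_args\":{},\"supported_flags\":{}}}",
            json_string(self.kind().as_str()),
            json_string(&self.section),
            json_array(self.unexpected_args.as_slice()),
            json_array(self.supported_flags.as_slice()),
        )
    }
}

/// Minimal JSON string encoder: escapes quotes, backslashes, and control characters.
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

fn json_array<S: AsRef<str>>(items: &[S]) -> String {
    let parts: Vec<String> = items.iter().map(|s| json_string(s.as_ref())).collect();
    format!("[{}]", parts.join(","))
}
```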
+
+528. **`config` unknown sections and extra arguments fall into zero-byte hangs, including JSON-mode invocations, instead of returning bounded config-section parse errors** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 06:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@63f4865` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes from a minimal temp workspace: `claw config bogus`, `claw config bogus --output-format json`, `claw config --output-format json bogus`, `claw config env extra`, `claw config env --definitely-unknown`, `claw config env --output-format json --definitely-unknown`, and `claw config env --output-format json extra` all timed out after 6s with `stdout=0`/`stderr=0`. This extends #526/#527: unsupported runtime flags on `config env` hang, known sections `model/hooks/plugins` produce bounded-but-wrong `kind:"unknown"`, and now arbitrary unknown config sections or plain extra args also hang. The config parser is not enforcing the advertised section grammar (`env|hooks|model|plugins`) before prompt/runtime fallback. **Required fix shape:** (a) parse `config` with an explicit section enum and reject unknown sections (`bogus`) before runtime startup; (b) reject any extra positional/flag arguments after a valid section unless explicitly supported; (c) in JSON mode, return a typed envelope such as `kind:"unknown_config_section"` or `kind:"unsupported_config_argument"` with `section`, `unexpected_args`, and `supported_sections`; (d) keep text mode bounded with the same information and exit nonzero; (e) add elapsed-time regressions for unknown section before/after `--output-format json`, valid section plus extra positional, valid section plus unknown flag, and bare `config` success. **Why this matters:** config inspection is a primary startup/debug surface. 
A typo like `config enb` or an extra copied flag should produce an immediate parse diagnostic, not an opaque no-output timeout that makes users think config loading or provider startup is dead. Source: gaebal-gajae dogfood response to Clawhip message `1506899371755569274` on 2026-05-21.
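Fix shapes (a) to (c) can be sketched as an explicit section enum parsed through `FromStr`, with `--output-format` accepted before or after the section. The `ConfigSection`, `ConfigParseError`, and `parse_config_args` names are hypothetical; the section list is the advertised `env|hooks|model|plugins` grammar, and the kind strings come from fix shape (c).

```rust
use std::str::FromStr;

/// The advertised section grammar: `env|hooks|model|plugins`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ConfigSection {
    Env,
    Hooks,
    Model,
    Plugins,
}

#[derive(Debug, PartialEq)]
pub enum ConfigParseError {
    UnknownConfigSection { section: String },
    UnsupportedConfigArgument { section: ConfigSection, unexpected_args: Vec<String> },
}

impl ConfigParseError {
    pub fn kind(&self) -> &'static str {
        match self {
            ConfigParseError::UnknownConfigSection { .. } => "unknown_config_section",
            ConfigParseError::UnsupportedConfigArgument { .. } => "unsupported_config_argument",
        }
    }
}

impl FromStr for ConfigSection {
    type Err = ConfigParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "env" => Ok(ConfigSection::Env),
            "hooks" => Ok(ConfigSection::Hooks),
            "model" => Ok(ConfigSection::Model),
            "plugins" => Ok(ConfigSection::Plugins),
            other => Err(ConfigParseError::UnknownConfigSection { section: other.to_string() }),
        }
    }
}

/// Parses `config [SECTION] [--output-format FMT]` with the format flag in either
/// position. Bare `config` yields `None` (show everything). The format value is
/// ignored here to keep the sketch short.
pub fn parse_config_args(args: &[&str]) -> Result<Option<ConfigSection>, ConfigParseError> {
    let mut positionals = Vec::new();
    let mut i = 0;
    while i < args.len() {
        if args[i] == "--output-format" && i + 1 < args.len() {
            i += 2;
            continue;
        }
        positionals.push(args[i]);
        i += 1;
    }
    let Some((first, rest)) = positionals.split_first() else {
        return Ok(None);
    };
    // Unknown sections such as `bogus` or `enb` fail here, before any runtime startup.
    let section: ConfigSection = first.parse()?;
    if !rest.is_empty() {
        return Err(ConfigParseError::UnsupportedConfigArgument {
            section,
            unexpected_args: rest.iter().map(|s| s.to_string()).collect(),
        });
    }
    Ok(Some(section))
}
```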
+
+529. **Unknown `agents`/`mcp` subcommands and malformed JSON placements zero-byte hang instead of returning bounded unsupported-action errors, so lifecycle inventory surfaces share the config parser fallthrough class** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 06:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@66e3c8c` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes from a minimal temp workspace: `claw agents bogus`, `claw agents bogus --output-format json`, `claw agents --output-format json bogus`, `claw mcp bogus`, and `claw mcp bogus --output-format json` each timed out after 6s with `stdout=0`/`stderr=0`; the broader sweep was killed after `skills bogus` also entered the same no-output wait. These commands are advertised lifecycle/introspection surfaces (`agents`, `mcp`, `skills`) and should have small, closed subcommand grammars (`list|show|help`, etc.). Instead, unknown subcommands and misplaced JSON selectors fall through into the same silent runtime path recently found for `config` (#526-#528). **Required fix shape:** (a) define explicit subcommand enums for `agents`, `mcp`, and `skills` before prompt/runtime fallback; (b) reject unknown actions with typed bounded errors (`kind:"unsupported_action"` / `kind:"unknown_subcommand"`) including `command`, `action`, and supported actions; (c) support or deterministically reject `--output-format json` in both prefix and suffix positions without hanging; (d) add elapsed-time regressions for `agents bogus`, `agents bogus --output-format json`, `agents --output-format json bogus`, `mcp bogus`, `mcp bogus --output-format json`, and equivalent `skills` forms; (e) audit every top-level introspection command for the same fallthrough. **Why this matters:** MCP/plugin/agent lifecycle debugging depends on these inventory commands being safe escape hatches. 
If a typo in an introspection command looks like a dead runtime, operators cannot tell whether the lifecycle subsystem is broken or the CLI parser silently routed to prompt startup. Source: gaebal-gajae dogfood response to Clawhip message `1506906920089550859` on 2026-05-21.
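Fix shapes (a) to (c) can share one small parser: strip `--output-format json` wherever it appears, so prefix and suffix placements behave identically, then check the first positional against a closed action table. A minimal sketch; `parse_inventory` is hypothetical, and the `list|show|help` tables and the "bare command means `list`" default are assumptions taken loosely from the entry.

```rust
#[derive(Debug, PartialEq)]
pub struct InventoryCall {
    pub action: String,
    pub rest: Vec<String>,
    pub json: bool,
}

#[derive(Debug, PartialEq)]
pub struct UnsupportedAction {
    pub kind: &'static str, // always "unsupported_action" in this sketch
    pub command: String,
    pub action: String,
    pub supported_actions: Vec<&'static str>,
}

/// Hypothetical closed action tables for the inventory commands.
fn supported_actions(command: &str) -> &'static [&'static str] {
    match command {
        "agents" | "mcp" => &["list", "show", "help"],
        _ => &[],
    }
}

pub fn parse_inventory(command: &str, args: &[&str]) -> Result<InventoryCall, UnsupportedAction> {
    // Remove `--output-format json` from any position.
    let mut json = false;
    let mut positionals: Vec<String> = Vec::new();
    let mut i = 0;
    while i < args.len() {
        if args[i] == "--output-format" && args.get(i + 1).copied() == Some("json") {
            json = true;
            i += 2;
        } else {
            positionals.push(args[i].to_string());
            i += 1;
        }
    }
    // Bare `agents` / `mcp` defaults to `list` (assumption).
    let action = if positionals.is_empty() { "list".to_string() } else { positionals.remove(0) };
    let supported = supported_actions(command);
    if !supported.contains(&action.as_str()) {
        return Err(UnsupportedAction {
            kind: "unsupported_action",
            command: command.to_string(),
            action,
            supported_actions: supported.to_vec(),
        });
    }
    // Remaining positionals (e.g. the name for `show`) go to action-specific parsing.
    Ok(InventoryCall { action, rest: positionals, json })
}
```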
+
+530. **`skills` and `dump-manifests` unknown actions/extra args zero-byte hang, so the skill/manifest discovery surfaces are not safe typed inventories when users typo a subcommand** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 07:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@4fc57a3` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes from a minimal temp workspace: `claw skills bogus`, `claw skills bogus --output-format json`, `claw skills --output-format json bogus`, `claw dump-manifests bogus`, and `claw dump-manifests bogus --output-format json` each timed out after 6s with `stdout=0`/`stderr=0`; the sweep was stopped before probing later diagnostics. #529 captured `agents`/`mcp` and noted that `skills` also appeared to hang; this entry isolates the skill/manifest inventory surfaces. `skills` should have a closed grammar (`list|install|help|<skill> [args]`) and `dump-manifests` should either accept only documented `--manifests-dir`/format flags or return a typed unsupported-arg error. Instead, malformed inventory calls fall through into a silent runtime/prompt wait. **Required fix shape:** (a) implement explicit parser branches for `skills` and `dump-manifests` before any prompt fallback; (b) for `skills <unknown>`, decide whether an unknown name means “try to invoke a skill” or “unknown skill”, but return a bounded `kind:"unknown_skill"` / `kind:"unsupported_action"` when no such skill exists; (c) reject extra positionals for `dump-manifests` with `kind:"unsupported_argument"` and list supported flags; (d) honor JSON mode with machine-readable errors and elapsed-time guards; (e) add regressions for the five probes above plus valid `skills list`/`dump-manifests --manifests-dir <missing>` controls. **Why this matters:** skills and manifests are the self-description layer for downstream claws. 
A typo while inspecting capabilities should never look like MCP/plugin lifecycle deadlock or prompt startup; it should be a small typed inventory error. Source: gaebal-gajae dogfood response to Clawhip message `1506914474010087594` on 2026-05-21.
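Fix shape (b) leaves the "invoke vs. unknown" decision open. One possible resolution, sketched under stated assumptions: the parser resolves a non-keyword token against the installed skill names, invokes it if present, and otherwise returns `unknown_skill` before any prompt fallback. `parse_skills`, its error type, and the "bare `skills` lists" default are hypothetical, and `install` is omitted to keep the sketch short.

```rust
#[derive(Debug, PartialEq)]
pub enum SkillsCommand {
    List,
    Help,
    Invoke { skill: String, args: Vec<String> },
}

#[derive(Debug, PartialEq)]
pub enum SkillsError {
    UnknownSkill { name: String, installed: Vec<String> },
    UnsupportedArgument { action: String, unexpected_args: Vec<String> },
}

impl SkillsError {
    pub fn kind(&self) -> &'static str {
        match self {
            SkillsError::UnknownSkill { .. } => "unknown_skill",
            SkillsError::UnsupportedArgument { .. } => "unsupported_argument",
        }
    }
}

/// Resolves `skills [ACTION|SKILL] [ARGS...]` against installed skill names.
pub fn parse_skills(args: &[&str], installed: &[&str]) -> Result<SkillsCommand, SkillsError> {
    let Some((&first, rest)) = args.split_first() else {
        return Ok(SkillsCommand::List); // bare `skills` lists (assumption)
    };
    let rest: Vec<String> = rest.iter().map(|s| s.to_string()).collect();
    match first {
        // Keyword actions take no extra arguments.
        "list" | "help" if !rest.is_empty() => Err(SkillsError::UnsupportedArgument {
            action: first.to_string(),
            unexpected_args: rest,
        }),
        "list" => Ok(SkillsCommand::List),
        "help" => Ok(SkillsCommand::Help),
        // Only installed skills are invoked; everything else is a typed error.
        name if installed.contains(&name) => Ok(SkillsCommand::Invoke {
            skill: name.to_string(),
            args: rest,
        }),
        name => Err(SkillsError::UnknownSkill {
            name: name.to_string(),
            installed: installed.iter().map(|s| s.to_string()).collect(),
        }),
    }
}
```

Returning the installed names with `unknown_skill` lets a wrapper offer a "did you mean" without a second probe.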
+
+531. **`sandbox` and `doctor` unknown positional arguments zero-byte hang, so health/safety diagnostics lose their escape-hatch value when a user adds a typo or copied JSON placement** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 07:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@297a9c4` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes from a minimal temp workspace: `claw sandbox bogus`, `claw sandbox bogus --output-format json`, `claw sandbox --output-format json bogus`, `claw doctor bogus`, and `claw doctor bogus --output-format json` each timed out after 6s with `stdout=0`/`stderr=0`; the sweep was killed before reaching `status/version` controls. These are supposed to be bounded local health/safety commands. A stray positional arg should be a deterministic `cli_parse` / `unexpected_argument` error, not a silent wait that is indistinguishable from the diagnostic itself hanging. **Required fix shape:** (a) define strict no-extra-positional grammars for `sandbox` and `doctor` before any prompt/runtime fallback; (b) reject unknown positionals/flags in text and JSON modes with bounded errors carrying `command`, `unexpected_args`, and supported flags; (c) support `--help`/`--output-format json` placements consistently, or reject unsupported placement explicitly; (d) add elapsed-time regressions for `sandbox bogus`, `sandbox bogus --output-format json`, `sandbox --output-format json bogus`, `doctor bogus`, and `doctor bogus --output-format json`; (e) audit `status`/`version` for the same positional fallthrough after this fix. **Why this matters:** `doctor` and `sandbox` are the commands operators run when startup, permissions, or isolation look broken. If a typo makes the diagnostic command itself hang with zero output, the user loses the very surface meant to explain the system state. 
Source: gaebal-gajae dogfood response to Clawhip message `1506922019906916403` on 2026-05-21.
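Every entry in this family asks for elapsed-time regressions. A minimal harness sketch using only `std::process`: spawn the probe, poll `try_wait`, and kill and reap it once a deadline passes, so a hang shows up as a typed `TimedOut` result instead of a stuck test run. `run_with_timeout` and `ProbeOutcome` are hypothetical names; stdin is redirected from `/dev/null` so the child cannot block on the test runner's terminal, and output is discarded because this sketch only checks timing.

```rust
use std::process::{Command, ExitStatus, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Outcome of one bounded CLI probe.
#[derive(Debug)]
pub enum ProbeOutcome {
    Finished { status: ExitStatus, elapsed: Duration },
    TimedOut { elapsed: Duration },
}

/// Runs `program args...` and kills it if it is still running after `limit`.
pub fn run_with_timeout(program: &str, args: &[&str], limit: Duration) -> std::io::Result<ProbeOutcome> {
    let start = Instant::now();
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn()?;
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(ProbeOutcome::Finished { status, elapsed: start.elapsed() });
        }
        if start.elapsed() >= limit {
            child.kill()?;
            child.wait()?; // reap the killed child so no zombie is left behind
            return Ok(ProbeOutcome::TimedOut { elapsed: start.elapsed() });
        }
        thread::sleep(Duration::from_millis(20));
    }
}
```

A regression for this entry would then assert that `run_with_timeout("./rust/target/debug/claw", &["doctor", "bogus"], Duration::from_secs(6))` returns `Finished` with a nonzero status rather than `TimedOut`.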
+
+532. **`status` and `version` unknown positional arguments zero-byte hang, so the two most basic local introspection commands are not parse-safe when a stray token is present** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 08:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@fc591b6` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes from a minimal temp workspace: `claw status bogus`, `claw status bogus --output-format json`, and `claw status --output-format json bogus` each timed out after 6s with `stdout=0`/`stderr=0`; the sweep then reached `claw version bogus`, which also entered the same zero-output wait before the run was killed. This extends #531's `doctor`/`sandbox` positional hang to the most commonly scripted diagnostics. `status` and `version` should reject unexpected positionals before any runtime/prompt fallback and should never need model/config startup to explain a syntax typo. **Required fix shape:** (a) make `status` and `version` strict no-extra-positional subcommands; (b) reject malformed text and JSON placements with bounded `kind:"cli_parse"` / `kind:"unexpected_argument"`, echoing `unexpected_args` and supported flags; (c) keep valid `status --output-format json` and `version --output-format json` behavior unchanged; (d) add elapsed-time regressions for the four probes above plus `version bogus --output-format json` and `version --output-format json bogus`; (e) consolidate this with #531 into a shared parser rule for all no-positional diagnostic commands. **Why this matters:** `status` and `version` are the first commands wrappers call to decide whether claw is alive and which binary is running. A stray token from command composition should produce a tiny parse error, not a silent timeout that marks the binary as dead. Source: gaebal-gajae dogfood response to Clawhip message `1506929574771163176` on 2026-05-21.
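Fix shape (e) asks for one shared rule across the no-positional diagnostics from #531 and this entry. A minimal sketch, assuming the only accepted flag is `--output-format text|json` in any position; `parse_no_positional` and `UnexpectedArgument` are hypothetical names. It collects every stray token rather than stopping at the first, so the error can echo all of `unexpected_args` at once.

```rust
#[derive(Debug, PartialEq)]
pub struct UnexpectedArgument {
    pub kind: &'static str,
    pub command: String,
    pub unexpected_args: Vec<String>,
    pub supported_flags: Vec<&'static str>,
}

/// Shared rule for `status`, `version`, `doctor`, and `sandbox`:
/// returns the output format, or a bounded error listing every stray token.
pub fn parse_no_positional(command: &str, args: &[&str]) -> Result<String, UnexpectedArgument> {
    let mut format = String::from("text");
    let mut unexpected = Vec::new();
    let mut i = 0;
    while i < args.len() {
        match (args[i], args.get(i + 1)) {
            ("--output-format", Some(&value)) if value == "text" || value == "json" => {
                format = value.to_string();
                i += 2;
            }
            (other, _) => {
                unexpected.push(other.to_string());
                i += 1;
            }
        }
    }
    if unexpected.is_empty() {
        Ok(format)
    } else {
        Err(UnexpectedArgument {
            kind: "unexpected_argument",
            command: command.to_string(),
            unexpected_args: unexpected,
            supported_flags: vec!["--output-format"],
        })
    }
}
```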
+
+533. **`export` and `init` unknown/extra arguments zero-byte hang, so artifact/setup commands are not parse-safe and can masquerade as session-store or project-initialization deadlocks** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 08:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@7e35ed0` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes from a minimal temp workspace: `claw export bogus`, `claw export bogus --output-format json`, and `claw export --output-format json bogus` each timed out after 6s with `stdout=0`/`stderr=0`; the sweep then reached `claw init bogus`, which also entered the same zero-output wait before the run was killed. `export` is documented as `claw export [PATH] [--session SESSION] [--output PATH]`, and `init` should have either no positionals or a documented target path. Unknown/ambiguous argument forms must be validated before the command can fall into prompt/runtime/session paths. **Required fix shape:** (a) give `export` a strict positional contract: a single optional output path, with extra positionals rejected and JSON/output flags parsed deterministically; (b) make `init` reject unexpected positionals/flags with bounded `kind:"unexpected_argument"` unless a target path is intentionally supported; (c) when `export <PATH>` is valid but no sessions exist, return the existing `no_managed_sessions` error only after path validation, not a hang; (d) add elapsed-time regressions for `export bogus`, `export bogus --output-format json`, `export --output-format json bogus`, `init bogus`, and valid `init`/`export --session missing` controls; (e) fold these artifact/setup commands into the shared no-fallthrough parser rule from #531/#532. **Why this matters:** `export` and `init` are file/artifact setup surfaces. 
If a typo or copied flag yields a silent timeout, users cannot tell whether session discovery, markdown export, or project initialization is stuck; the CLI should fail before touching runtime state. Source: gaebal-gajae dogfood response to Clawhip message `1506937123373187183` on 2026-05-21.
+
+534. **`bootstrap-plan` and `acp` unknown/extra arguments zero-byte hang, so setup/editor-integration diagnostics still fall through to runtime instead of enforcing their tiny command grammars** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 09:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@d2a4b7b` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes from a minimal temp workspace: `claw bootstrap-plan bogus`, `claw bootstrap-plan bogus --output-format json`, and `claw bootstrap-plan --output-format json bogus` each timed out after 6s with `stdout=0`/`stderr=0`; the sweep then reached `claw acp bogus`, which also entered the same zero-output wait before the run was killed. `bootstrap-plan` should be a bounded setup/planning helper, and `acp [serve]` is documented as an editor-integration status/server surface with a closed action set. Unknown positionals or misplaced JSON selectors should never start prompt/runtime machinery. **Required fix shape:** (a) make `bootstrap-plan` strict about accepted args/flags and reject extras with `kind:"unexpected_argument"`; (b) make `acp` accept only the documented empty/status form and `serve`, with typed `kind:"unsupported_action"` for anything else; (c) support or explicitly reject `--output-format json` placements without hanging; (d) add elapsed-time regressions for the four probes above plus `acp serve`/bare `bootstrap-plan` controls; (e) include these in the shared no-fallthrough parser audit covering #529-#533. **Why this matters:** setup and editor integration commands are often called by installers, editors, and wrappers. A typo in those integration paths should return a clear command error, not look like editor server startup, prompt delivery, or bootstrap planning has deadlocked. Source: gaebal-gajae dogfood response to Clawhip message `1506944672889831516` on 2026-05-21.
+
+535. **`help` unknown/extra arguments zero-byte hang, so the command meant to explain CLI usage is itself not parse-safe when users ask for a bad topic or misplace JSON flags** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 09:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@c8e2ae4` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes from a minimal temp workspace: `claw help bogus`, `claw help bogus --output-format json`, and `claw help --output-format json bogus` each timed out after 6s with `stdout=0`/`stderr=0`; the sweep then reached `claw acp bogus --output-format json`, which also entered the same wait before the run was killed. #534 already captured bare `acp bogus`; this entry isolates the help surface. `help` should either print top-level usage, show a known topic, or return a bounded `unknown_help_topic` / `unexpected_argument` diagnostic. It must never fall through to prompt/runtime startup because a user asked for an invalid help topic. **Required fix shape:** (a) give `help` a strict topic parser with known topics/aliases and a bounded unknown-topic error; (b) if JSON help output is unsupported, reject `--output-format json` with `kind:"unsupported_flag_for_command"` rather than hanging; (c) include `topic`, `supported_topics`, and examples in the diagnostic; (d) add elapsed-time regressions for `help bogus`, `help bogus --output-format json`, `help --output-format json bogus`, valid `help`, and valid `--help`; (e) fold `help` into the global no-fallthrough audit so usage-discovery paths remain safe even when malformed. **Why this matters:** help is the recovery path after a parse error. If malformed help requests hang silently, users lose the most basic self-service diagnostic and wrappers cannot safely probe supported command topics. Source: gaebal-gajae dogfood response to Clawhip message `1506952218346127553` on 2026-05-21.
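Fix shapes (a) to (c) can be sketched as a slice-pattern match over the topic arguments. This assumes JSON help is unsupported, so `--output-format` is rejected up front in either position (the other branch of fix shape (b)); `parse_help` and the `HELP_TOPICS` list are hypothetical, and a real list would come from the command registry.

```rust
/// Hypothetical topic list; the real one would come from the command registry.
const HELP_TOPICS: &[&str] = &["config", "doctor", "status", "version"];

#[derive(Debug, PartialEq)]
pub enum HelpRequest {
    TopLevel,
    Topic(String),
}

#[derive(Debug, PartialEq)]
pub enum HelpError {
    UnknownHelpTopic { topic: String, supported_topics: Vec<&'static str> },
    UnsupportedFlagForCommand { flag: String },
    UnexpectedArgument { unexpected_args: Vec<String> },
}

impl HelpError {
    pub fn kind(&self) -> &'static str {
        match self {
            HelpError::UnknownHelpTopic { .. } => "unknown_help_topic",
            HelpError::UnsupportedFlagForCommand { .. } => "unsupported_flag_for_command",
            HelpError::UnexpectedArgument { .. } => "unexpected_argument",
        }
    }
}

/// Parses `help [TOPIC]` without ever reaching prompt/runtime startup.
pub fn parse_help(args: &[&str]) -> Result<HelpRequest, HelpError> {
    if args.contains(&"--output-format") {
        return Err(HelpError::UnsupportedFlagForCommand { flag: "--output-format".to_string() });
    }
    match args {
        [] => Ok(HelpRequest::TopLevel),
        [topic] if HELP_TOPICS.contains(topic) => Ok(HelpRequest::Topic(topic.to_string())),
        [topic] => Err(HelpError::UnknownHelpTopic {
            topic: topic.to_string(),
            supported_topics: HELP_TOPICS.to_vec(),
        }),
        [_, rest @ ..] => Err(HelpError::UnexpectedArgument {
            unexpected_args: rest.iter().map(|s| s.to_string()).collect(),
        }),
    }
}
```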
+
+536. **`diff` and `commit` malformed arguments zero-byte hang, so core git/tooling surfaces fall through to runtime instead of rejecting stray tokens before touching workspace state** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 10:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@03fefd2` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes from a minimal temp workspace: `claw diff bogus`, `claw diff bogus --output-format json`, and `claw diff --output-format json bogus` each timed out after 6s with `stdout=0`/`stderr=0`; the sweep then reached `claw commit bogus`, which also entered the same zero-output wait before the run was killed. `diff` should be a bounded local git inspection command and `commit` should have a documented commit-message/context grammar or reject unexpected args. Neither should silently route malformed invocations into prompt/runtime wait. **Required fix shape:** (a) make `diff` strict about accepted flags/positionals and reject extras with `kind:"unexpected_argument"`; (b) make `commit` either document/parse a single optional context/message argument or reject stray positionals before runtime startup; (c) support `--output-format json` consistently in valid positions and reject malformed placements with a bounded JSON/text parse error; (d) add elapsed-time regressions for the four probes above plus `commit bogus --output-format json`, `pr bogus`, and `issue bogus` if those share the same tool-command parser; (e) fold these git/tool commands into the global no-fallthrough parser audit. **Why this matters:** diff/commit are the coding loop's safety rails. A malformed diff command should not look like a stuck git process, and a malformed commit command should not start opaque model/session work before proving the CLI syntax is valid. Source: gaebal-gajae dogfood response to Clawhip message `1506959769838026895` on 2026-05-21.
+
+537. **`pr` and `issue` malformed arguments zero-byte hang, so GitHub artifact helpers fall through to runtime instead of validating their context/output grammar** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 10:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@87fa106` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes from a minimal temp workspace: `claw pr bogus`, `claw pr bogus --output-format json`, `claw pr --output-format json bogus`, and `claw issue bogus` each timed out after 6s with `stdout=0`/`stderr=0`; `claw issue bogus --output-format json` then entered the same zero-output wait before the run was killed. `pr [context]` and `issue [context]` may intentionally accept freeform context. If so, they should treat trailing text as context and then fail fast with a bounded missing-auth/no-session/model-preflight diagnostic; if not, they should document that direct non-REPL invocation is unsupported and reject it. Either way, they should never block silently without emitting any event or log signal. **Required fix shape:** (a) define the direct CLI contract for `pr`/`issue`: supported one-shot artifact generation vs REPL-only helper; (b) if REPL-only, return `kind:"slash_command_repl_only"` with usage for bare/context/JSON forms; (c) if one-shot is supported, parse context and JSON/output flags strictly, then preflight auth/session requirements with bounded typed errors; (d) add elapsed-time regressions for `pr bogus`, `pr bogus --output-format json`, `pr --output-format json bogus`, `issue bogus`, and `issue bogus --output-format json`; (e) align with #536's `commit` handling so all GitHub/artifact helpers share one no-fallthrough parser. **Why this matters:** PR/issue generation is a high-value automation surface. 
A malformed or unsupported direct invocation should not look like prompt delivery, GitHub auth, or artifact generation has deadlocked; users need an immediate contract/error before trusting the helper in scripts. Source: gaebal-gajae dogfood response to Clawhip message `1506967318054436914` on 2026-05-21.
+
+538. **Direct slash-command invocations return bounded interactive-only errors but use `kind:"unknown"`, so automation cannot distinguish expected REPL-only usage from unclassified failures** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 11:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@236d345` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes from a minimal temp workspace: `claw /status`, `claw /diff`, `claw /version`, and `claw /session list` all returned bounded nonzero errors explaining the slash command is interactive-only and suggesting `claw --resume ... /<command>` where applicable, but every envelope used `[error-kind: unknown]`. `claw /status --output-format json` also returned a bounded usage error for unexpected args, again with `kind:"unknown"`. Positive control `claw /help` exited 0 with usage. This is better than the zero-byte fallthrough family, but still a machine-contract gap: direct slash command misuse is a deterministic, expected CLI condition, not an unknown internal failure. **Required fix shape:** (a) add typed error kinds for direct slash command routing: `slash_command_repl_only`, `slash_command_unexpected_args`, and/or `slash_command_resume_required`; (b) include fields such as `slash`, `resume_supported`, `usage`, `suggested_invocations`, and `unexpected_args`; (c) make JSON mode return the same typed envelope without prose scraping; (d) add regressions for `/status`, `/diff`, `/version`, `/session list`, `/status --output-format json`, and `/help`; (e) reserve `kind:"unknown"` for genuine unclassified failures only. **Why this matters:** slash commands are a major entrypoint for users migrating from REPL to scripts. Wrappers need to distinguish “this command is REPL-only; use --resume or top-level equivalent” from actual CLI/runtime failure without parsing English help text. 
Source: gaebal-gajae dogfood response to Clawhip message `1506974870372745368` on 2026-05-21.
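Fix shapes (a), (b), and (e) can be sketched as a routing table consulted before any slash dispatch. The kind strings `slash_command_repl_only` and `slash_command_unexpected_args` come from the fix shape; `unknown_slash_command`, the table contents, and the `claw --resume latest` suggestion format are assumptions for illustration, not the CLI's actual behavior.

```rust
#[derive(Debug, PartialEq)]
pub enum SlashError {
    Unknown { slash: String },
    ReplOnly { slash: String, resume_supported: bool, suggested_invocations: Vec<String> },
    UnexpectedArgs { slash: String, unexpected_args: Vec<String> },
}

impl SlashError {
    pub fn kind(&self) -> &'static str {
        match self {
            SlashError::Unknown { .. } => "unknown_slash_command", // hypothetical kind
            SlashError::ReplOnly { .. } => "slash_command_repl_only",
            SlashError::UnexpectedArgs { .. } => "slash_command_unexpected_args",
        }
    }
}

/// Hypothetical routing table: (name, max args, runs directly, resume-safe).
const SLASH_TABLE: &[(&str, usize, bool, bool)] = &[
    ("help", 0, true, true),
    ("status", 0, false, true),
    ("diff", 0, false, true),
    ("version", 0, false, true),
    ("session", 1, false, false),
];

/// Classifies a direct `claw /<command> [args]` call; `Ok(())` means it may run.
pub fn route_direct_slash(slash: &str, args: &[&str]) -> Result<(), SlashError> {
    let name = slash.trim_start_matches('/');
    let Some(&(_, max_args, runs_directly, resume_supported)) =
        SLASH_TABLE.iter().find(|(n, ..)| *n == name)
    else {
        return Err(SlashError::Unknown { slash: slash.to_string() });
    };
    if args.len() > max_args {
        return Err(SlashError::UnexpectedArgs {
            slash: slash.to_string(),
            unexpected_args: args[max_args..].iter().map(|a| a.to_string()).collect(),
        });
    }
    if runs_directly {
        return Ok(());
    }
    // Rebuild the original invocation so the suggestion is copy-pasteable.
    let mut invocation = slash.to_string();
    for arg in args {
        invocation.push(' ');
        invocation.push_str(arg);
    }
    let suggested_invocations = if resume_supported {
        vec![format!("claw --resume latest {invocation}")]
    } else {
        Vec::new()
    };
    Err(SlashError::ReplOnly { slash: slash.to_string(), resume_supported, suggested_invocations })
}
```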
+
+539. **Direct slash commands reject `--output-format json` as unexpected args and still print prose `kind:"unknown"` errors to stderr, so JSON-mode callers cannot get machine-readable slash diagnostics** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 11:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@ad6b573` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes from a minimal temp workspace: `claw /diff --output-format json`, `claw /version --output-format json`, `claw /session list --output-format json`, `claw /config env --output-format json`, and `claw /agents list --output-format json` all exited nonzero with `stdout=0` and stderr-only prose usage/error output. Every case used `[error-kind: unknown]`; none emitted a JSON error envelope despite an explicit `--output-format json` token. This is distinct from #538's generic slash `unknown` kind drift: JSON-mode slash invocations are both rejected as unexpected args and denied machine-readable error output. **Required fix shape:** (a) decide whether direct slash commands support `--output-format json`; if yes, parse it before slash-arg validation and return structured errors; if no, reject with a typed JSON-capable `unsupported_flag_for_slash_command` envelope; (b) include `slash`, `unexpected_args`, `usage`, `resume_supported`, and `suggested_top_level_command` fields; (c) return JSON on stdout or stderr consistently with other JSON error contracts; (d) add regressions for the five probes above plus valid `/help --output-format json` behavior (support or typed reject); (e) replace `kind:"unknown"` with slash-specific kinds from #538. **Why this matters:** wrappers often add `--output-format json` globally to every probe. Slash-command errors currently force prose scraping and make expected REPL-only or unsupported-format cases indistinguishable from unknown failures. 
Source: gaebal-gajae dogfood response to Clawhip message `1506982418014277683` on 2026-05-21.
+
+540. **`--resume latest /<slash> --output-format json` fails before slash dispatch with `session_load_failed` when no session exists, so resume-safe slash commands cannot expose static usage/JSON-format errors in cold workspaces** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 12:00/12:30 UTC nudges on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@3b9332b` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Clean-home probes from a minimal temp workspace with no sessions: `claw --resume latest /status --output-format json`, `/diff --output-format json`, `/help --output-format json`, `/config env --output-format json`, and `/agents list --output-format json` all returned the same stderr JSON error: `kind:"session_load_failed"`, `failed to restore session: no managed sessions found in .claw/sessions/<fingerprint>/`, with `stdout=0`. The parser never reaches slash-command usage/argument validation, so callers cannot learn whether `--output-format json` is supported for that slash command, whether the command is resume-safe, or what static usage applies unless a session already exists. This compounds #538/#539 (direct slash errors are `unknown`/prose) with a resume-path ordering issue: cold workspaces collapse every slash probe into a generic session-load error. 
**Required fix shape:** (a) parse resume slash commands and static help/usage/format flags before attempting to load `latest`; (b) for slash commands that can answer without session state (`/help`, `/agents list`, maybe `/config env`), return the bounded result even when no session exists; (c) for stateful slash commands (`/status`, `/diff`) return a typed `kind:"no_managed_sessions"` / `reason:"session_required"` envelope that preserves `slash`, `usage`, and `resume_supported`; (d) route JSON-mode error envelopes according to the global JSON stream contract; (e) add cold-workspace regressions for the five probes above plus a warm-session control proving real dispatch still works. **Why this matters:** resume-safe slash commands are the documented way to inspect sessions without entering the REPL. In a cold or wrong-cwd workspace, automation needs to distinguish “no session exists” from “slash command/JSON args unsupported” and still retrieve usage; a generic pre-dispatch session-load failure hides the real command contract. Source: gaebal-gajae dogfood response to Clawhip messages `1506989972564086886` and `1506997517198430249` on 2026-05-21.
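Fix shapes (a) to (c) are an ordering change: validate the slash command and its arguments statically, answer stateless commands immediately, and only then touch the session store. A minimal sketch in which `load_latest` is a hypothetical stand-in for the real `latest` lookup, passed as a closure so a test can prove it is never called for static commands. The table classifying `/help` and `/agents` as stateless follows the entry's "maybe" list and is an assumption.

```rust
#[derive(Debug, PartialEq)]
pub enum ResumeOutcome {
    /// Answered from static data; the session store was never touched.
    Static { slash: String },
    /// Dispatched against a loaded session.
    Dispatched { slash: String, session: String },
}

#[derive(Debug, PartialEq)]
pub enum ResumeError {
    UnknownSlash { slash: String },
    UnsupportedArgs { slash: String, unexpected_args: Vec<String> },
    NoManagedSessions { slash: String, reason: &'static str, resume_supported: bool },
}

/// Hypothetical table: (slash, accepted positionals, needs session state).
const RESUME_SLASH: &[(&str, &[&str], bool)] = &[
    ("/help", &[], false),
    ("/agents", &["list"], false),
    ("/status", &[], true),
    ("/diff", &[], true),
];

pub fn resume_slash<F>(slash: &str, args: &[&str], load_latest: F) -> Result<ResumeOutcome, ResumeError>
where
    F: FnOnce() -> Option<String>,
{
    let Some(&(_, allowed, needs_session)) = RESUME_SLASH.iter().find(|(name, ..)| *name == slash) else {
        return Err(ResumeError::UnknownSlash { slash: slash.to_string() });
    };
    // 1. Argument and format validation happens before any session I/O.
    let mut unexpected = Vec::new();
    let mut i = 0;
    while i < args.len() {
        if args[i] == "--output-format" && i + 1 < args.len() {
            i += 2;
            continue;
        }
        if !allowed.contains(&args[i]) {
            unexpected.push(args[i].to_string());
        }
        i += 1;
    }
    if !unexpected.is_empty() {
        return Err(ResumeError::UnsupportedArgs { slash: slash.to_string(), unexpected_args: unexpected });
    }
    // 2. Stateless commands answer even in a cold workspace.
    if !needs_session {
        return Ok(ResumeOutcome::Static { slash: slash.to_string() });
    }
    // 3. Only stateful commands reach the session store.
    match load_latest() {
        Some(session) => Ok(ResumeOutcome::Dispatched { slash: slash.to_string(), session }),
        None => Err(ResumeError::NoManagedSessions {
            slash: slash.to_string(),
            reason: "session_required",
            resume_supported: true,
        }),
    }
}
```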
+
+541. **`SessionStore::from_cwd` and `from_data_dir` eagerly `create_dir_all` the sessions namespace, so read-only session discovery and failed resume attempts mutate the workspace before proving a session will be read or written** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 13:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@8a92f0e` and binary built from source SHA `25d663d`. Code inspection: `runtime/src/session_control.rs:32-43` builds `<cwd>/.claw/sessions/<workspace_hash>/` and immediately calls `fs::create_dir_all(&sessions_root)?`; `from_data_dir` does the same at lines 54-67. These constructors are used by read/discovery paths (`session_control.rs` list/exists/resolve helpers, `main.rs` session store setup, tests, and resume command routing), so merely asking to list sessions, check existence, or resume `latest` in a fresh workspace can create `.claw/sessions/<fingerprint>/` even when no session exists and the command ultimately fails. This is the root cause behind the repeated failed-resume droppings noted in #435/#444/#540, but the actual product gap is broader: the constructor for a session *store handle* performs durable filesystem mutation instead of deferring directory creation to the first successful save/create operation. **Required fix shape:** (a) split `SessionStore::from_cwd` into non-mutating construction and an explicit `ensure_exists_for_write()` path; (b) make list/exists/resolve/latest handle missing session directories as empty/not-found without creating them; (c) call `create_dir_all` only when creating a new session file or writing a session snapshot; (d) expose a `store_exists` / `created:false` diagnostic if needed, but do not mutate during read-only probes; (e) add regressions proving `--resume latest`, session list/exists, and cold-workspace slash probes leave no `.claw/` tree behind on failure, while a real successful save creates the namespace. 
**Why this matters:** session discovery is supposed to be a read-only observability surface. Eagerly creating workspace metadata during failed reads pollutes repos, confuses `git status`, and makes cold-workspace probes look like they partially initialized state even though no usable session exists. Source: gaebal-gajae dogfood response to Clawhip message `1507005067016929364` on 2026-05-21.
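A std-only sketch of the split in (a)-(c). It assumes a simplified `SessionStore` that holds only the computed namespace path; the struct shape, `ensure_exists_for_write` body, and the `.jsonl` file naming are illustrative, not taken from `session_control.rs`. Construction and listing never create directories, and only the write path calls `create_dir_all`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Handle to `<cwd>/.claw/sessions/<workspace_hash>/`; building it has no side effects.
struct SessionStore {
    sessions_root: PathBuf,
}

impl SessionStore {
    /// Non-mutating constructor: computes the path only.
    fn from_cwd(cwd: &Path, workspace_hash: &str) -> Self {
        let sessions_root = cwd.join(".claw").join("sessions").join(workspace_hash);
        Self { sessions_root }
    }

    /// Read path: a missing namespace means "no sessions", is not an error, and is never created.
    fn list(&self) -> io::Result<Vec<String>> {
        let entries = match fs::read_dir(&self.sessions_root) {
            Ok(entries) => entries,
            Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(Vec::new()),
            Err(err) => return Err(err),
        };
        let mut names = Vec::new();
        for entry in entries {
            names.push(entry?.file_name().to_string_lossy().into_owned());
        }
        names.sort();
        Ok(names)
    }

    /// The only place that creates the namespace on disk.
    fn ensure_exists_for_write(&self) -> io::Result<&Path> {
        fs::create_dir_all(&self.sessions_root)?;
        Ok(self.sessions_root.as_path())
    }

    /// Write path: directory creation happens here, right before the first save.
    fn save(&self, session_id: &str, body: &str) -> io::Result<()> {
        let dir = self.ensure_exists_for_write()?;
        fs::write(dir.join(format!("{session_id}.jsonl")), body)
    }
}
```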
+
+542. **`claw session --help --output-format json` and other top-level `session` invocations fall through to prompt/runtime startup, then hang behind config/plugin initialization instead of returning bounded local help or typed unsupported-subcommand JSON** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 13:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@adaf8c3` and binary built from source SHA `25d663d`. Reproduction in normal env: `timeout --kill-after=1s 5s ./rust/target/debug/claw session --help --output-format json` exits 124 with zero stdout and only the settings deprecation warning on stderr (`enabledPlugins` deprecated). `session list --output-format json` and `session exists latest --output-format json` show the same zero-stdout timeout pattern. Code inspection explains it: `parse_local_help_action` only recognizes `status|sandbox|doctor|acp|init|state|export|version|system-prompt|dump-manifests|bootstrap-plan`; `LocalHelpTopic` has no `Session` variant; and the top-level parser has no `session` branch even though slash/resume `/session list|exists|switch|fork|delete` has structured helpers (`render_session_list`, `session_exists_json`). As a result, a user asking for the session command contract is routed into the generic prompt path and waits on runtime startup instead of getting a small control-plane response. 
**Required fix shape:** (a) add a top-level `session` command parser with `help|list|exists <id>|show <id>|delete?` or explicitly documented unsupported actions; (b) add `LocalHelpTopic::Session` and bounded JSON help showing supported top-level vs slash/resume session operations; (c) make `session list` and `session exists` use non-mutating session-store read paths from #541 and return typed empty/not-found results; (d) reject unsupported `session` actions with `kind:"unsupported_session_action"` including `action`, `supported_actions`, and whether slash/resume alternatives exist; (e) add elapsed-time regressions for `session --help --output-format json`, `session list --output-format json`, `session exists latest --output-format json`, and `session bogus --output-format json`. **Why this matters:** session management is the backbone of resume/export automation. If the natural top-level `session` spelling hangs, operators cannot safely discover or preflight sessions without already knowing the slash-only internal contract, and a control-plane typo looks like runtime/plugin deadlock. Source: gaebal-gajae dogfood response to Clawhip message `1507012621079937165` on 2026-05-21.
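A sketch of the parser shape in (a) and (d). It assumes global flags such as `--output-format json` have already been stripped by the top-level parser; `SessionAction`, `SessionCliError`, the `missing_argument`/`unexpected_arguments` kinds, and the split between top-level and slash-only actions are hypothetical. Every input resolves locally to either an action or a typed error, so nothing falls through to prompt startup.

```rust
/// Top-level `claw session ...` actions (hypothetical set).
#[derive(Debug, PartialEq)]
enum SessionAction {
    Help,
    List,
    Exists(String),
}

/// Typed error carrying enough context for automation to self-correct.
#[derive(Debug, PartialEq)]
struct SessionCliError {
    kind: &'static str,
    action: String,
    supported_actions: Vec<&'static str>,
    slash_alternative: Option<String>,
}

const SUPPORTED_ACTIONS: &[&str] = &["help", "list", "exists"];
/// Actions assumed to exist only as `/session <action>` in slash/resume mode.
const SLASH_ONLY_ACTIONS: &[&str] = &["switch", "fork", "delete"];

fn session_error(kind: &'static str, action: &str) -> SessionCliError {
    SessionCliError {
        kind,
        action: action.to_string(),
        supported_actions: SUPPORTED_ACTIONS.to_vec(),
        slash_alternative: SLASH_ONLY_ACTIONS
            .contains(&action)
            .then(|| format!("/session {action}")),
    }
}

/// Parses `claw session <args>` without starting the runtime.
fn parse_session(args: &[&str]) -> Result<SessionAction, SessionCliError> {
    match args {
        [] | ["help"] | ["--help"] => Ok(SessionAction::Help),
        ["list"] => Ok(SessionAction::List),
        ["exists", id] => Ok(SessionAction::Exists(id.to_string())),
        ["exists"] => Err(session_error("missing_argument", "exists")),
        // Known action, wrong arity: say so instead of calling it unsupported.
        [action, ..] if SUPPORTED_ACTIONS.contains(action) => {
            Err(session_error("unexpected_arguments", action))
        }
        [action, ..] => Err(session_error("unsupported_session_action", action)),
    }
}
```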
+
+543. **Top-level local inventory commands (`plugins list`, `mcp list`, `agents list`, `skills list`) hang in normal env when deprecated config warnings are present, so a non-fatal settings migration warning blocks JSON inventory output entirely** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 14:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@dbc7125` and binary built from source SHA `25d663d`. Reproduction in the operator's normal env: bounded probes `timeout --kill-after=1s 6s ./rust/target/debug/claw plugins list --output-format json`, `mcp list --output-format json`, `agents list --output-format json`, and `skills list --output-format json` each exited 124 with zero stdout and only one stderr line: `warning: /home/bellman/.claw/settings.json: field "enabledPlugins" is deprecated (line 2). Use "plugins.enabled" instead`. These are supposed to be local discovery surfaces and the code has direct top-level branches (`CliAction::Plugins`, `CliAction::Mcp`, `CliAction::Agents`, `CliAction::Skills`) plus JSON renderers, but a non-fatal config deprecation warning appears before output and the command never completes. This is distinct from unknown-subcommand parser fallthrough (#529/#530) and missing top-level `session` (#542): here the requested actions are valid, parsed, and should be bounded, yet the normal-env warning path wedges them. 
**Required fix shape:** (a) make config deprecation warnings non-blocking for read-only inventory commands and never require interaction/cleanup before emitting JSON; (b) include warnings in a structured `warnings[]` field for JSON mode instead of stderr-only prelude that can precede a hang; (c) ensure inventory handlers can load partial/deprecated config and return `status:"degraded"` with `config_warnings` rather than timing out; (d) add normal-env/fixture regressions with deprecated `enabledPlugins` proving `plugins list`, `mcp list`, `agents list`, and `skills list` all produce bounded JSON within a small budget; (e) audit `dump-manifests` and `system-prompt` for the same warning-induced wedge. **Why this matters:** inventory commands are the emergency observability path for MCP/plugin/agent lifecycle issues. A migration warning must not make those commands look dead, especially in the exact long-lived user configs where deprecated fields are most likely to exist. Source: gaebal-gajae dogfood response to Clawhip message `1507020170671947888` on 2026-05-21.
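A sketch of (b) and (c), assuming the settings file has already been parsed far enough to list its top-level keys; `ConfigWarning`, `InventoryReport`, `collect_deprecations`, and `plugins_list` are hypothetical names. Deprecations become data on the report, and their presence downgrades `status` instead of gating output.

```rust
/// One deprecated settings field, reported as data rather than a stderr prelude.
#[derive(Debug, Clone, PartialEq)]
struct ConfigWarning {
    path: String,
    field: String,
    replacement: String,
}

#[derive(Debug, PartialEq)]
enum Status {
    Ok,
    Degraded,
}

/// JSON-shaped inventory result: items plus structured warnings.
#[derive(Debug)]
struct InventoryReport {
    items: Vec<String>,
    status: Status,
    warnings: Vec<ConfigWarning>,
}

/// (old field, replacement) pairs; the single entry mirrors the warning quoted above.
const DEPRECATED_FIELDS: &[(&str, &str)] = &[("enabledPlugins", "plugins.enabled")];

/// Pure scan over already-parsed top-level keys: it cannot block, prompt, or exit.
fn collect_deprecations(path: &str, keys: &[&str]) -> Vec<ConfigWarning> {
    let mut warnings = Vec::new();
    for key in keys {
        for (old, new) in DEPRECATED_FIELDS {
            if key == old {
                warnings.push(ConfigWarning {
                    path: path.to_string(),
                    field: old.to_string(),
                    replacement: new.to_string(),
                });
            }
        }
    }
    warnings
}

/// Inventory handler: deprecated fields degrade the status but never withhold the items.
fn plugins_list(settings_path: &str, settings_keys: &[&str], installed: &[&str]) -> InventoryReport {
    let warnings = collect_deprecations(settings_path, settings_keys);
    let status = if warnings.is_empty() { Status::Ok } else { Status::Degraded };
    InventoryReport {
        items: installed.iter().map(|name| name.to_string()).collect(),
        status,
        warnings,
    }
}
```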
+
+544. **Local help-topic commands (`export --help`, `status --help`, `version --help`, `bootstrap-plan --help`) hang in normal env when the deprecated settings warning is present, so usage discovery is blocked by a non-fatal config migration warning** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 14:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@785d4bd` and binary built from source SHA `25d663d`. Reproduction in the operator's normal env: bounded probes `timeout --kill-after=1s 5s ./rust/target/debug/claw export --help --output-format json`, `status --help --output-format json`, and `version --help --output-format json` each exited 124 with zero stdout and only `warning: /home/bellman/.claw/settings.json: field "enabledPlugins" is deprecated (line 2). Use "plugins.enabled" instead` on stderr; `bootstrap-plan --help --output-format json` entered the same wait before the sweep was killed. Code already has `parse_local_help_action` and `LocalHelpTopic` support for these commands, so help rendering should be a pure parser/static-output path. Instead, the presence of a config deprecation warning appears to push even `--help` into the same startup/config/plugin wedge as #543's inventory commands. **Required fix shape:** (a) route local help-topic requests before any config/plugin/runtime loading that can emit warnings or hang; (b) guarantee `claw <local-command> --help --output-format json` produces bounded static JSON independent of user config validity/deprecation state; (c) if warnings are intentionally collected, attach them only after the help payload as structured `warnings[]`, never as a blocking stderr prelude; (d) add fixture regressions with deprecated `enabledPlugins` for `export/status/version/bootstrap-plan --help --output-format json` plus text-mode controls; (e) audit all `LocalHelpTopic` variants for the same warning-induced hang. 
**Why this matters:** help is the recovery path when startup/config is broken. If a deprecated config warning prevents help from rendering, users cannot learn the command contract needed to fix the config or avoid the bad path. Source: gaebal-gajae dogfood response to Clawhip message `1507027716191551630` on 2026-05-21.
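A sketch of (a), assuming a hypothetical pre-dispatch step that runs before any settings, plugin, or runtime loading; `PreDispatch` and `pre_dispatch` are illustrative, and the topic list is a subset of the commands named above rather than the full `LocalHelpTopic` set. A local `--help` request resolves to a static topic without reading config at all.

```rust
/// Result of the pre-dispatch step (hypothetical).
#[derive(Debug, PartialEq)]
enum PreDispatch {
    /// Render static help for this topic; no settings/plugin/runtime loading.
    StaticHelp(&'static str),
    /// Everything else continues to the normal (config-loading) path.
    Continue(Vec<String>),
}

/// Illustrative subset of local help topics.
const LOCAL_HELP_TOPICS: &[&str] = &["export", "status", "version", "bootstrap-plan", "doctor", "sandbox"];

fn pre_dispatch(args: &[&str]) -> PreDispatch {
    let wants_help = args.iter().any(|arg| *arg == "--help" || *arg == "-h");
    if wants_help {
        if let Some(command) = args.first() {
            // Only the first word selects a topic; flags like `--output-format json` are ignored here.
            if let Some(topic) = LOCAL_HELP_TOPICS.iter().copied().find(|topic| topic == command) {
                return PreDispatch::StaticHelp(topic);
            }
        }
    }
    PreDispatch::Continue(args.iter().map(|arg| arg.to_string()).collect())
}
```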
+
+545. **Core local diagnostics (`doctor`, `status`, `sandbox`, `config env`, `state`) hang in normal env when the deprecated settings warning is present, so the config-warning wedge blocks both inventory/help and the primary health surfaces** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 15:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@8d0dc50` and binary built from source SHA `25d663d`. Reproduction in the operator's normal env: bounded probes `timeout --kill-after=1s 5s ./rust/target/debug/claw doctor --output-format json`, `status --output-format json`, `sandbox --output-format json`, `config env --output-format json`, and `state --output-format json` each exited 124 with zero stdout and only `warning: /home/bellman/.claw/settings.json: field "enabledPlugins" is deprecated (line 2). Use "plugins.enabled" instead` on stderr. #543 captured inventory commands and #544 captured help topics; this entry shows the same non-fatal migration warning also disables the core health/config/state probes users need to diagnose startup. These commands are local and should be resilient to partial/deprecated config, especially `doctor`, whose purpose is to classify config problems. **Required fix shape:** (a) move deprecation collection into the diagnostic payload path instead of printing and then wedging before command dispatch; (b) make `doctor/status/sandbox/config/state` produce bounded JSON even when config has deprecated fields, with `warnings[]` and `status:"degraded"` where appropriate; (c) guarantee `doctor` never depends on successful modern config normalization to report config migration issues; (d) add fixture regressions with `settings.json` containing `enabledPlugins` for all five probes, asserting no timeout, the documented zero/nonzero exit codes, and structured warning metadata; (e) unify this with #543/#544 as one startup/config-warning no-hang invariant across every local command. 
**Why this matters:** users run `doctor`, `status`, and `config env` precisely when startup/config looks wrong. If a deprecated config warning makes those commands hang with no JSON, the CLI loses its self-diagnostic escape hatch and every support workflow devolves into manual stderr archaeology. Source: gaebal-gajae dogfood response to Clawhip message `1507035269830934600` on 2026-05-21.
+
+546. **One-shot prompt/setup/git commands (`prompt`, `init`, `diff`) hang in normal env when the deprecated settings warning is present, so the warning wedge reaches executing and workspace-mutating entrypoints, not only diagnostics** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 15:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@0a2542a` and binary built from source SHA `25d663d`. Reproduction in the operator's normal env: bounded probes `timeout --kill-after=1s 5s ./rust/target/debug/claw prompt hello --output-format json`, `init --output-format json`, and `diff --output-format json` each exited 124 with zero stdout and only `warning: /home/bellman/.claw/settings.json: field "enabledPlugins" is deprecated (line 2). Use "plugins.enabled" instead` on stderr. #543 covered inventory, #544 help, and #545 diagnostics; this entry shows the same non-fatal config migration warning also blocks the main one-shot prompt path, project initialization, and local git diff inspection. `init` is especially problematic because it is the command a user might run to refresh/migrate workspace scaffolding, yet the warning prevents it from producing the artifact report. 
**Required fix shape:** (a) make deprecated settings warnings non-blocking for every dispatch class, including prompt, init, and diff; (b) for prompt/runtime paths, surface migration warnings as structured preflight metadata before model work without preventing bounded startup/result/error output; (c) for `init`, avoid loading user runtime/plugin config before emitting or applying idempotent project scaffolding; (d) for `diff`, keep the command pure git/local and warning-resilient with `warnings[]` in JSON mode; (e) add fixture regressions with `settings.json` containing `enabledPlugins` for `prompt hello --output-format json`, `init --output-format json`, and `diff --output-format json`, asserting deterministic output or typed prompt preflight failure rather than timeout. **Why this matters:** once the warning wedge reaches prompt/init/diff, users cannot even run the normal work loop or bootstrap/migrate away from the warning state. A deprecation notice must never be a hidden global startup blocker. Source: gaebal-gajae dogfood response to Clawhip message `1507042815606259972` on 2026-05-21.
+
+547. **Text-mode static/local commands still emit deprecated-settings warnings to stderr, and `system-prompt` emits the same warning twice, so successful output is polluted by unstructured migration noise even when the command completes** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 16:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@10f5dac` and binary built from source SHA `25d663d`. Normal-env probes showed `claw --help`, `claw help`, `claw version`, and `claw --version` complete cleanly with no warning, but `claw bootstrap-plan` exits 0 with the static phase list on stdout and the deprecated `enabledPlugins` settings warning on stderr; `claw system-prompt` exits 0 with the rendered system prompt on stdout and prints the same deprecation warning twice on stderr. `claw dump-manifests` exits 1 for missing manifests and also prefixes the structured error with the warning. #543-#546 cover JSON-mode warning-induced hangs; this entry captures the surviving text-mode event/log opacity: even when commands complete, successful static/local output is coupled to unstructured warning chatter, and some paths load config twice. **Required fix shape:** (a) dedupe config warnings per process/command before writing to stderr or structured output; (b) do not emit runtime config migration warnings for static commands that do not semantically need runtime config (`bootstrap-plan`, likely help/version); (c) for commands that intentionally inspect/render config-dependent state (`system-prompt`, `dump-manifests`), collect warnings into one structured/clearly phased diagnostic block instead of repeated raw lines; (d) add regressions with deprecated `enabledPlugins` proving text-mode `bootstrap-plan` has clean stderr or an intentional single warning, and `system-prompt` emits at most one warning; (e) align warning collection with the JSON `warnings[]` fix required by #543-#546. 
**Why this matters:** claws and shell pipelines treat stderr as the failure/diagnostic channel. Duplicate or irrelevant migration warnings make successful static outputs look degraded, break golden-output tests, and hide real errors behind repeated noise. Source: gaebal-gajae dogfood response to Clawhip message `1507050372181524622` on 2026-05-21.
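A sketch of (a) and (b), assuming one hypothetical `WarningSink` per command invocation that every config load reports into, plus a hypothetical `loads_runtime_config` gate whose static-command list follows the entry's own guess (`bootstrap-plan`, likely help/version). The caller prints or serializes `finish()` once, so a second config load cannot repeat a warning.

```rust
use std::collections::HashSet;

/// Per-command warning collector (hypothetical).
#[derive(Default)]
struct WarningSink {
    seen: HashSet<String>,
    ordered: Vec<String>,
}

impl WarningSink {
    /// Records `message` once; returns false for repeats, such as a second config load.
    fn push(&mut self, message: impl Into<String>) -> bool {
        let message = message.into();
        if self.seen.insert(message.clone()) {
            self.ordered.push(message);
            true
        } else {
            false
        }
    }

    /// Consumes the sink and yields each warning once, in first-seen order.
    fn finish(self) -> Vec<String> {
        self.ordered
    }
}

/// Static commands skip runtime config entirely, so they have nothing to warn about.
fn loads_runtime_config(command: &str) -> bool {
    !matches!(command, "bootstrap-plan" | "help" | "version")
}
```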
+
+548. **Two timestamp helpers in `tools/src/lib.rs` disagree on their contract: `iso8601_timestamp()` shells out to `/bin/date` for RFC3339, then falls back to `iso8601_now()` which returns epoch seconds, so the same JSON field can change format by host/tool availability** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 16:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@832098c` and binary built from source SHA `25d663d`. Code inspection: `tools/src/lib.rs:6091-6100` implements `iso8601_timestamp()` by spawning `date -u +%Y-%m-%dT%H:%M:%SZ`; if that command is unavailable or fails, it calls `iso8601_now()`. But `iso8601_now()` at `tools/src/lib.rs:5262-5268` serializes `SystemTime::now().duration_since(UNIX_EPOCH).as_secs().to_string()`, yielding strings like `1748004000`, not an ISO/RFC3339 timestamp. `execute_brief` uses `iso8601_timestamp()` for `BriefOutput.sent_at`, so `sent_at` is RFC3339 on hosts with GNU/BSD `date`, but silently becomes epoch-seconds on minimal/sandboxed hosts where `date` is missing or fails. This is adjacent to the LaneEvent timestamp bug reported in the same channel, but distinct: the brief/notification surface has an environment-dependent timestamp format because one helper's fallback violates the other helper's named contract. 
**Required fix shape:** (a) replace both helpers with one pure Rust RFC3339 UTC formatter using an existing time crate or a tiny internal formatter; (b) stop shelling out to `date` for timestamp generation so the output format is host-independent and deterministic; (c) if an epoch timestamp is desired anywhere, name it `unix_epoch_seconds_now` and type it separately, never as an ISO helper fallback; (d) add tests asserting `BriefOutput.sent_at`, AgentOutput `created_at/completed_at`, and LaneEvent timestamps parse as RFC3339 under both normal and forced-fallback conditions; (e) add a contract comment/schema note that string timestamps are RFC3339/ISO-8601 UTC. **Why this matters:** timestamp fields are coordination/log ordering primitives. If the same field is `"2026-05-21T16:30:00Z"` on one host and `"1779381000"` on another, downstream parsers, JSON schemas, and UI timelines cannot trust event ordering or distinguish parse failures from host quirks. Source: gaebal-gajae dogfood response to Clawhip message `1507057919278190654` on 2026-05-21.
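A std-only sketch of (a) and (b), assuming no extra crate is added: it formats Unix seconds as RFC3339 UTC using Howard Hinnant's `civil_from_days` calendar algorithm, so no subprocess is involved and the format cannot vary by host. `rfc3339_utc` and `rfc3339_now` are hypothetical names; an existing time crate would serve equally well.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Days since 1970-01-01 -> proleptic Gregorian (year, month, day).
/// Howard Hinnant's `civil_from_days` algorithm.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = z - era * 146_097; // day of era, 0..=146_096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era, 0..=399
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year, 0..=365
    let mp = (5 * doy + 2) / 153; // month index with March = 0
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + i64::from(month <= 2);
    (year, month, day)
}

/// Formats Unix seconds as RFC3339 UTC, e.g. `2026-05-21T16:30:00Z`.
fn rfc3339_utc(unix_secs: i64) -> String {
    let (year, month, day) = civil_from_days(unix_secs.div_euclid(86_400));
    let secs_of_day = unix_secs.rem_euclid(86_400);
    format!(
        "{year:04}-{month:02}-{day:02}T{:02}:{:02}:{:02}Z",
        secs_of_day / 3_600,
        secs_of_day % 3_600 / 60,
        secs_of_day % 60
    )
}

/// Single "now" helper: no subprocess, so the format cannot depend on the host.
fn rfc3339_now() -> String {
    // A clock set before 1970 is clamped to the epoch in this sketch.
    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|elapsed| elapsed.as_secs() as i64)
        .unwrap_or(0);
    rfc3339_utc(secs)
}
```

Clamping a pre-1970 clock is a simplifying choice for the sketch; a production helper might prefer to surface it as an error.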
+
+549. **G004 conformance validates `laneEvents[].emittedAt` only as a non-empty string, so epoch-seconds timestamps pass the contract even though fixtures and field names imply RFC3339/ISO date-time** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 17:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@d670b43` and binary built from source SHA `25d663d`. Code inspection: `runtime/src/g004_conformance.rs:66-69` requires `/event`, `/status`, and `/emittedAt` to be non-empty strings, but never parses or pattern-checks `/emittedAt`. In `runtime/src/lane_events.rs`, `LaneEvent.emitted_at` is serialized as `emittedAt` and tests/fixtures use RFC3339-looking values like `2026-04-04T00:00:00Z`, but the validator would accept `"1748004000"`, `"not-a-date"`, or any other non-empty string. This compounds #548 and Jobdori's timestamp helper finding: even if producers emit epoch strings from `iso8601_now()`, the advertised machine-checkable G004 contract will not catch the timestamp-format regression. **Required fix shape:** (a) add an RFC3339/ISO-8601 UTC validator for `laneEvents[].emittedAt` in `validate_lane_events`; (b) reject numeric-looking epoch strings and arbitrary prose with a clear error like `expected RFC3339 timestamp`; (c) apply the same validation to report/approval-token date-time fields if the bundle schema has them; (d) add negative tests showing `"1748004000"` and `"not-a-date"` fail while `"2026-04-04T00:00:00Z"` passes; (e) align producer helpers from #548/#549 so conformance and production agree on timestamp contract. **Why this matters:** conformance helpers are supposed to prevent log/event opacity bugs from reaching downstream claws. If the contract only checks non-empty strings, timestamp regressions become invisible until UI parsers, JSON-schema validators, or ordering logic fail later. Source: gaebal-gajae dogfood response to Clawhip message `1507065465552507033` on 2026-05-21.
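A std-only sketch of the validator in (a) and (b), assuming the contract is UTC-only (a trailing `Z`, optional fractional seconds). It is a shape-and-range check, not a full calendar check (it accepts `2026-02-30T00:00:00Z`), and `validate_rfc3339_utc` is a hypothetical name rather than an existing `g004_conformance.rs` function.

```rust
/// Shape-and-range check for RFC3339 UTC: `YYYY-MM-DDTHH:MM:SS[.fraction]Z`.
fn validate_rfc3339_utc(value: &str) -> Result<(), String> {
    let error = || format!("expected RFC3339 timestamp, got {value:?}");
    let bytes = value.as_bytes();
    if bytes.len() < 20 {
        return Err(error());
    }
    // Parses a fixed-width, all-digit field; rejects signs, spaces, and prose.
    let field = |range: std::ops::Range<usize>| -> Option<u32> {
        let part = value.get(range)?;
        if part.bytes().all(|b| b.is_ascii_digit()) { part.parse().ok() } else { None }
    };
    let (Some(_year), Some(month), Some(day), Some(hour), Some(minute), Some(second)) = (
        field(0..4), field(5..7), field(8..10), field(11..13), field(14..16), field(17..19),
    ) else {
        return Err(error());
    };
    let separators_ok = bytes[4] == b'-' && bytes[7] == b'-' && bytes[10] == b'T'
        && bytes[13] == b':' && bytes[16] == b':';
    let ranges_ok = (1..=12).contains(&month) && (1..=31).contains(&day)
        && hour <= 23 && minute <= 59 && second <= 60; // 60 allows a leap second
    if !separators_ok || !ranges_ok {
        return Err(error());
    }
    // Optional fractional seconds, then a mandatory `Z` (UTC-only contract).
    let mut rest = &value[19..];
    if let Some(fraction) = rest.strip_prefix('.') {
        let digits = fraction.bytes().take_while(u8::is_ascii_digit).count();
        if digits == 0 {
            return Err(error());
        }
        rest = &fraction[digits..];
    }
    if rest == "Z" { Ok(()) } else { Err(error()) }
}
```

The same helper could back the shared test assertion asked for in #550 and the ledger checks in #554.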
+
+550. **Agent manifest timestamp tests only assert non-empty/presence, so epoch-seconds `createdAt`/`startedAt`/`completedAt` and lane-event timestamps are baked into tests as acceptable output** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 17:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@02b6855` and binary built from source SHA `25d663d`. Code inspection: `tools/src/lib.rs:2734-2755` defines `AgentOutput.createdAt`, `startedAt`, `completedAt`, and `laneEvents`. Production fills them via `iso8601_now()` at `tools/src/lib.rs:3663`, `3693-3696`, `3912`, and terminal `LaneEvent::*` calls. But the main manifest regression at `tools/src/lib.rs:8147-8152` only asserts `!manifest.created_at.is_empty()`, `started_at.is_some()`, and `completed_at.is_none()`; later completed/failed manifest tests assert event names/status/details/data but never assert that `createdAt`, `startedAt`, `completedAt`, or `laneEvents[].emittedAt` parse as RFC3339. As a result, the current epoch-seconds strings from `iso8601_now()` satisfy the test suite, and a future producer fix could regress again without a red test. **Required fix shape:** (a) add a shared test helper that validates RFC3339/ISO-8601 UTC strings for AgentOutput top-level timestamps and every lane event `emittedAt`; (b) update creation, completion, failure, spawn-error, recovery, review, selection, and artifact manifest tests to call it; (c) add explicit negative/unit coverage proving epoch strings like `"1748004000"` fail the helper; (d) align test fixtures with the G004 conformance timestamp validation required by #549; (e) ensure tests check serialized JSON fields, not only in-memory struct presence. **Why this matters:** tests are currently encoding the broken contract as acceptable by checking only non-empty strings. 
Without semantic timestamp assertions, timestamp-format fixes in #548/#549 have no durable safety net and downstream parser breakage can reappear silently. Source: gaebal-gajae dogfood response to Clawhip message `1507073019296874608` on 2026-05-21.
+
+551. **Lane completion ignores timestamp validity and recency entirely, so stale or epoch-formatted AgentOutput manifests can be auto-marked completed if status/tests/push flags are green** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 18:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@2bc85c5` and binary built from source SHA `25d663d`. Code inspection: `tools/src/lane_completion.rs::detect_lane_completion` decides completion only from `error.is_none()`, `status` being completed/finished, `current_blocker.is_none()`, `test_green`, and `has_pushed`. It never reads or validates `AgentOutput.createdAt`, `startedAt`, `completedAt`, or `laneEvents[].emittedAt`. The test fixture at `lane_completion.rs:103-120` uses RFC3339-looking timestamps, but there is no negative coverage for epoch strings or stale completedAt values. Therefore a manifest with `completedAt:"1748004000"` (from the broken `iso8601_now()` producer) or a very old completion timestamp can still generate a completed `LaneContext` as long as the status/tests/push booleans are favorable. This is distinct from #550's manifest test coverage gap: the completion policy itself treats temporal fields as irrelevant even though stale-session cleanup and lane closeout depend on trustworthy event time. 
**Required fix shape:** (a) parse and validate `completedAt` (and ideally `createdAt`/`startedAt`/terminal lane event `emittedAt`) before auto-completion; (b) reject or mark degraded manifests whose timestamp fields are missing, non-RFC3339, or implausibly stale relative to the current run; (c) thread a typed temporal blocker/degraded reason into `LaneContext` instead of silently completing; (d) add tests where status/tests/push are green but `completedAt` is `"1748004000"`, `"not-a-date"`, missing, or older than a configured freshness window, proving completion is withheld or degraded; (e) align this with #548-#550 so timestamp producer, validator, tests, and completion policy share one contract. **Why this matters:** completion is an action trigger, not just a display label. If stale or malformed manifests can auto-close lanes, claws may cleanup sessions, suppress reminders, or report success based on artifacts whose time identity is untrustworthy. Source: gaebal-gajae dogfood response to Clawhip message `1507080567731261655` on 2026-05-21.
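A sketch of (a)-(d). It assumes a hypothetical `ManifestSummary` carrying only the fields the completion decision needs, timestamps restricted to second-precision `YYYY-MM-DDTHH:MM:SSZ`, and a caller-chosen freshness window; the withheld-reason strings are illustrative. Date arithmetic uses Howard Hinnant's `days_from_civil`.

```rust
/// Howard Hinnant's `days_from_civil`: (year, month, day) -> days since 1970-01-01.
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    let y = if month <= 2 { year - 1 } else { year };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400;
    let doy = (153 * ((month + 9) % 12) + 2) / 5 + day - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Parses exactly `YYYY-MM-DDTHH:MM:SSZ` into Unix seconds; anything else is None.
fn parse_utc_seconds(value: &str) -> Option<i64> {
    let b = value.as_bytes();
    if b.len() != 20 || b[4] != b'-' || b[7] != b'-' || b[10] != b'T'
        || b[13] != b':' || b[16] != b':' || b[19] != b'Z'
    {
        return None;
    }
    let num = |range: std::ops::Range<usize>| -> Option<i64> {
        let part = value.get(range)?;
        if part.bytes().all(|c| c.is_ascii_digit()) { part.parse().ok() } else { None }
    };
    let (y, mo, d) = (num(0..4)?, num(5..7)?, num(8..10)?);
    let (h, mi, s) = (num(11..13)?, num(14..16)?, num(17..19)?);
    if !(1..=12).contains(&mo) || !(1..=31).contains(&d) || h > 23 || mi > 59 || s > 59 {
        return None;
    }
    Some(days_from_civil(y, mo, d) * 86_400 + h * 3_600 + mi * 60 + s)
}

#[derive(Debug, PartialEq)]
enum Completion {
    Completed,
    /// Completion withheld, with a typed reason for the lane context.
    Withheld(&'static str),
}

/// The subset of an AgentOutput manifest this decision reads (hypothetical).
struct ManifestSummary<'a> {
    status_completed: bool,
    tests_green: bool,
    pushed: bool,
    completed_at: Option<&'a str>,
}

fn decide_completion(m: &ManifestSummary<'_>, now_unix: i64, max_age_secs: i64) -> Completion {
    if !(m.status_completed && m.tests_green && m.pushed) {
        return Completion::Withheld("not_ready");
    }
    let Some(raw) = m.completed_at else {
        return Completion::Withheld("completed_at_missing");
    };
    let Some(completed) = parse_utc_seconds(raw) else {
        return Completion::Withheld("completed_at_not_rfc3339");
    };
    if now_unix - completed > max_age_secs {
        return Completion::Withheld("completed_at_stale");
    }
    Completion::Completed
}
```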
+
+552. **Anthropic retry tests cover `/v1/messages` but not the preflight `/v1/messages/count_tokens` endpoint, so a transient 429 during token counting can be silently swallowed, skipping refined context-window enforcement with no regression coverage** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 18:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@762ee65` and binary built from source SHA `25d663d`. Code/test inspection: `api/src/providers/anthropic.rs::preflight_message_request` calls `self.count_tokens(request).await`, but on any error it immediately returns `Ok(())` and falls back to the coarse local byte guard. The `count_tokens` implementation sends one bare request to `/v1/messages/count_tokens` with no retry. Existing integration tests in `api/tests/client_integration.rs` cover retry behavior for `/v1/messages` (`retries_retryable_failures_before_succeeding`, `surfaces_retry_exhaustion_for_persistent_retryable_errors`, `retries_multiple_retryable_failures_with_exponential_backoff_and_jitter`) and local oversized-byte blocking, but there is no test where count_tokens returns 429/503 once and then succeeds, nor a test asserting refined context-window rejection still happens after a retry. Therefore a count_tokens rate limit can disable the precise token guard and still let `send_message` proceed to `/v1/messages`, while the retry suite stays green. 
**Required fix shape:** (a) add integration tests with mock server sequence `429 count_tokens -> 200 count_tokens -> ...` proving the same retry policy applies before `/v1/messages`; (b) add a test where retried count_tokens returns a token count that exceeds the model window and assert no `/v1/messages` call is sent; (c) when count_tokens errors are intentionally best-effort, expose structured telemetry/warnings so skipped refined preflight is observable; (d) then implement retry or explicitly document degraded behavior; (e) keep request-count assertions path-aware so count_tokens and messages attempts are distinguishable. **Why this matters:** the retry suite currently gives false confidence because it exercises only the final message endpoint. Under rate pressure, the preflight endpoint is exactly where retries matter for safe compaction/context-window decisions. Source: gaebal-gajae dogfood response to Clawhip message `1507088113892331622` on 2026-05-21.
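A synchronous sketch of the behavior (a)-(e) would pin down. It assumes a hypothetical `Transport` trait, a scripted mock server, and "429 or any 5xx" as the retryable set; the real client's retry policy may differ, and backoff sleeps are omitted. The mock logs each endpoint so count_tokens and messages attempts stay distinguishable, and an exhausted preflight comes back as an explicit `Skipped` outcome instead of a silent `Ok(())`.

```rust
use std::cell::RefCell;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Endpoint {
    CountTokens, // /v1/messages/count_tokens
    Messages,    // /v1/messages
}

/// Hypothetical transport: returns (HTTP status, input token count for count_tokens).
trait Transport {
    fn send(&self, endpoint: Endpoint) -> (u16, u64);
}

/// Assumed retryable set for the sketch: 429 plus any 5xx.
fn is_retryable(status: u16) -> bool {
    status == 429 || (500..600).contains(&status)
}

#[derive(Debug, PartialEq)]
enum Preflight {
    Allowed { input_tokens: u64 },
    ContextWindowExceeded { input_tokens: u64 },
    /// Refined check skipped after errors; returned so callers can log or surface it.
    Skipped { last_status: u16 },
}

fn count_tokens_preflight(t: &dyn Transport, context_window: u64, max_attempts: u32) -> Preflight {
    let mut last_status = 0;
    for _ in 0..max_attempts {
        let (status, input_tokens) = t.send(Endpoint::CountTokens);
        if status == 200 {
            return if input_tokens > context_window {
                Preflight::ContextWindowExceeded { input_tokens }
            } else {
                Preflight::Allowed { input_tokens }
            };
        }
        last_status = status;
        if !is_retryable(status) {
            break;
        }
        // A real client would sleep here with exponential backoff and jitter.
    }
    Preflight::Skipped { last_status }
}

/// Sends /v1/messages only when the preflight did not prove the request too large.
fn send_message(t: &dyn Transport, context_window: u64) -> Result<Preflight, Preflight> {
    let preflight = count_tokens_preflight(t, context_window, 3);
    if let Preflight::ContextWindowExceeded { .. } = preflight {
        return Err(preflight);
    }
    t.send(Endpoint::Messages);
    Ok(preflight)
}

/// Scripted mock that logs every endpoint hit, keeping request counts path-aware.
struct ScriptedServer {
    count_tokens_replies: RefCell<Vec<(u16, u64)>>,
    log: RefCell<Vec<Endpoint>>,
}

impl Transport for ScriptedServer {
    fn send(&self, endpoint: Endpoint) -> (u16, u64) {
        self.log.borrow_mut().push(endpoint);
        match endpoint {
            Endpoint::CountTokens => {
                let mut replies = self.count_tokens_replies.borrow_mut();
                if replies.is_empty() { (200, 0) } else { replies.remove(0) }
            }
            Endpoint::Messages => (200, 0),
        }
    }
}
```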
+
+553. **Worker restart reuses the original `created_at`, so startup-timeout elapsed time after restart includes the previous worker lifetime and stale pre-restart events** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 19:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@51c05ac` and binary built from source SHA `25d663d`. Code inspection: `WorkerRegistry::restart` at `runtime/src/worker_boot.rs:600-621` resets status, prompt fields, trust, and error state, then appends `WorkerEventKind::Restarted`, but it neither resets `worker.created_at` nor records a separate boot-start timestamp. `observe_startup_timeout` at `worker_boot.rs:713-735` computes `elapsed = now.saturating_sub(worker.created_at)` and reports `command_started_at: worker.created_at`. Therefore a worker restarted after a long previous lifetime can immediately report a huge startup-timeout `elapsed` and a `command_started_at` taken from the original boot, not from the restart. Because events are also retained, the timeout evidence can mix pre-restart trust/tool-permission detections with the new boot attempt. This is distinct from Jobdori's #554 O(3N) scan/unbounded-events finding: even with O(1) caches, the temporal anchor for a restarted boot is wrong. **Required fix shape:** (a) add `boot_started_at` or `current_attempt_started_at` distinct from immutable worker creation time; (b) set that timestamp on create and restart, and use it for the startup-timeout `elapsed` and `command_started_at`; (c) either scope trust/tool-permission evidence to events since the current boot attempt or store per-attempt cached flags; (d) include `attempt_index`/`restart_count` in startup evidence and worker events so old/new attempts are separable; (e) add a regression where a worker is created, time advances, the worker restarts, and `observe_startup_timeout` then reports elapsed time from the restart rather than from original creation and ignores pre-restart prompts. 
**Why this matters:** restart is supposed to create a fresh startup attempt. If timeout evidence is anchored to the first creation, operators see misleading "stalled for hours" reports and stale blocker classifications for a brand-new restart, which breaks recovery decisions. Source: gaebal-gajae dogfood response to Clawhip message `1507095667959402638` on 2026-05-21.
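A sketch of (a)-(d), assuming integer-second timestamps and a simplified hypothetical `Worker` rather than the real `WorkerRegistry` types: `created_at` stays immutable, each restart records a fresh `boot_started_at` and `attempt_index`, and timeout evidence includes only events from the current attempt.

```rust
/// Startup-timeout evidence scoped to the current boot attempt (hypothetical shape).
#[derive(Debug, PartialEq)]
struct StartupEvidence {
    worker_created_at: u64,
    command_started_at: u64,
    elapsed_secs: u64,
    attempt_index: u32,
    events: Vec<&'static str>,
}

struct WorkerEvent {
    attempt_index: u32,
    kind: &'static str,
}

struct Worker {
    created_at: u64,      // immutable: when the worker record was first created
    boot_started_at: u64, // reset on every create/restart
    attempt_index: u32,
    events: Vec<WorkerEvent>,
}

impl Worker {
    fn new(now: u64) -> Self {
        Self { created_at: now, boot_started_at: now, attempt_index: 0, events: Vec::new() }
    }

    /// Tags every event with the attempt that produced it.
    fn record(&mut self, kind: &'static str) {
        self.events.push(WorkerEvent { attempt_index: self.attempt_index, kind });
    }

    fn restart(&mut self, now: u64) {
        self.attempt_index += 1;
        self.boot_started_at = now;
        self.record("restarted");
    }

    fn startup_evidence(&self, now: u64) -> StartupEvidence {
        StartupEvidence {
            worker_created_at: self.created_at,
            command_started_at: self.boot_started_at,
            elapsed_secs: now.saturating_sub(self.boot_started_at),
            attempt_index: self.attempt_index,
            // Pre-restart trust/tool-permission detections are excluded.
            events: self
                .events
                .iter()
                .filter(|event| event.attempt_index == self.attempt_index)
                .map(|event| event.kind)
                .collect(),
        }
    }
}
```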
+
+554. **Recovery ledger tests assert only `started_at.is_some()` / `finished_at.is_some()`, so fake tick-counter timestamps are explicitly accepted by the machine-readable ledger suite** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 19:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@1540e3a` and binary built from source SHA `25d663d`. Code inspection: `RecoveryContext::next_timestamp` in `runtime/src/recovery_recipes.rs:276-279` returns `recovery-ledger-tick-N`, and `attempt_recovery` stores those strings into public `RecoveryLedgerEntry.started_at` / `finished_at`. The test named `recovery_context_exposes_machine_readable_ledger` at `recovery_recipes.rs:688-720` only asserts `entry.started_at.is_some()` and `entry.finished_at.is_some()`; exhaustion/failure ledger tests likewise validate state/result/command details but not timestamp parseability. Therefore the test suite labels the ledger machine-readable while allowing non-date sentinel strings in fields named `started_at` and `finished_at`. This is the test-coverage sibling of Jobdori's #555 public API timestamp bug: even after production is fixed, nothing in the ledger tests prevents a regression back to tick strings or other unparseable data. **Required fix shape:** (a) add a recovery timestamp assertion helper that parses `started_at` and `finished_at` as RFC3339/ISO-8601 UTC; (b) update success, exhausted, and failed ledger tests to use it; (c) add a negative unit test proving `recovery-ledger-tick-1` is rejected by the helper/contract; (d) document whether recovery ledger timestamps are wall-clock instants or monotonic attempt IDs, and if both are needed, add separate `attempt_seq` instead of overloading timestamp fields; (e) align with the timestamp contract fixes in #548-#551. **Why this matters:** tests currently make the wrong semantic promise: "machine-readable" only means present. 
Recovery ledgers drive retries/escalation audit trails, so timestamp fields must be parseable dates or consumers cannot sort, correlate, or display recovery attempts reliably. Source: gaebal-gajae dogfood response to Clawhip message `1507103214665859264` on 2026-05-21.
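
A minimal sketch of the helper from fix shape (a), assuming stdlib-only Rust and that "RFC3339 UTC" means the `Z`-suffixed form. `is_rfc3339_utc` and `assert_ledger_timestamp` are hypothetical names, and the check is shape-level only (day-of-month is range-checked, not validated per month).

```rust
/// Hypothetical shape check for `YYYY-MM-DDTHH:MM:SS[.fraction]Z`.
/// Offsets other than `Z` are rejected because the ledger contract asks for UTC.
fn is_rfc3339_utc(value: &str) -> bool {
    // "2026-05-21T19:30:00Z" is the shortest accepted form (20 bytes).
    if value.len() < 20 || !value.is_ascii() {
        return false;
    }
    let b = value.as_bytes();
    if !(b[4] == b'-' && b[7] == b'-' && b[10] == b'T' && b[13] == b':' && b[16] == b':') {
        return false;
    }
    // Parse a fixed-width run of ASCII digits; any non-digit fails the field.
    let num = |start: usize, end: usize| -> Option<u32> {
        let s = &value[start..end];
        if s.bytes().all(|c| c.is_ascii_digit()) {
            s.parse().ok()
        } else {
            None
        }
    };
    let (Some(_year), Some(month), Some(day), Some(hour), Some(minute), Some(second)) =
        (num(0, 4), num(5, 7), num(8, 10), num(11, 13), num(14, 16), num(17, 19))
    else {
        return false;
    };
    let in_range = (1..=12).contains(&month)
        && (1..=31).contains(&day)
        && hour <= 23
        && minute <= 59
        && second <= 60; // RFC 3339 permits a leap second.
    if !in_range {
        return false;
    }
    // The remainder must be "Z" or ".<digits>Z".
    let rest = &value[19..];
    match rest.strip_prefix('.') {
        Some(fraction_and_z) => fraction_and_z
            .strip_suffix('Z')
            .is_some_and(|f| !f.is_empty() && f.bytes().all(|c| c.is_ascii_digit())),
        None => rest == "Z",
    }
}

/// Hypothetical assertion helper for the success, exhausted, and failed ledger tests.
fn assert_ledger_timestamp(field: &str, value: Option<&str>) {
    let value = value.unwrap_or_else(|| panic!("{field} is missing"));
    assert!(is_rfc3339_utc(value), "{field} is not RFC3339 UTC: {value:?}");
}
```

Assuming the ledger fields are `Option<String>`, a test would call `assert_ledger_timestamp("started_at", entry.started_at.as_deref())` instead of `entry.started_at.is_some()`.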
+
+555. **Workspace-test stale-branch preflight can compare against stale local `main` instead of `origin/main`, letting branches behind the remote base run full workspace tests as “fresh”** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 20:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@a036293` and binary built from source SHA `25d663d`. Live channel context had multiple open claw-code PRs whose head ref is `main`, and the watchdog target is specifically stale-branch confusion. Code inspection: `tools/src/lib.rs::workspace_test_branch_preflight` reads the current branch then calls `resolve_main_ref(&branch)` before `check_freshness`. `resolve_main_ref` at `tools/src/lib.rs:2020-2032` returns local `main` whenever it exists, except when the current branch itself is `main` and `origin/main` exists. In the common feature-branch case with both refs present, the stale-branch guard compares feature branch to local `main`, not `origin/main`. If local `main` has not been fetched/updated, a branch can be behind `origin/main` but equal to local `main`, so `check_freshness` returns `Fresh` and `cargo test --workspace` proceeds without the preflight block. **Required fix shape:** (a) prefer `origin/main` (or the configured protected/base remote ref) for non-main branches when present; (b) fetch or verify the remote ref freshness before using it, or emit a degraded `branch.remote_base_unknown`/`branch.base_ref_stale` lane event instead of silently falling back; (c) include `baseRefSource` and `baseRefCommit` in the blocked lane event payload so operators know whether freshness was checked against local or remote state; (d) add a regression with local `main` stale, `origin/main` ahead, and a feature branch equal to local `main`, proving workspace tests are blocked; (e) keep the current `branch == main -> origin/main` behavior but cover it separately. 
**Why this matters:** full workspace tests are used as green evidence. If the guard checks an outdated local main, agents can burn time and report green against a stale base while missing fixes already in the remote protected branch. Source: gaebal-gajae dogfood response to Clawhip message `1507110763301437440` on 2026-05-21.
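
Fix shapes (a) and (c) can be sketched as a pure ref-selection function, assuming the caller already knows which refs exist. `resolve_base_ref`, `BaseRef`, and `BaseRefSource` are hypothetical names. The sketch does not fetch or verify remote freshness, which fix shape (b) still requires.

```rust
/// Where the freshness baseline came from (a candidate for `baseRefSource`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BaseRefSource {
    Remote,
    LocalFallback,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BaseRef {
    name: &'static str,
    source: BaseRefSource,
}

/// Prefer `origin/main` for every branch when it exists; fall back to local `main`
/// only for non-main branches, and report the fallback instead of hiding it.
/// `None` means "no usable base": the caller should emit a degraded lane event
/// (for example `branch.remote_base_unknown`) rather than treating the branch as fresh.
fn resolve_base_ref(branch: &str, has_local_main: bool, has_origin_main: bool) -> Option<BaseRef> {
    if has_origin_main {
        return Some(BaseRef { name: "origin/main", source: BaseRefSource::Remote });
    }
    if has_local_main && branch != "main" {
        return Some(BaseRef { name: "main", source: BaseRefSource::LocalFallback });
    }
    // Comparing `main` against itself proves nothing about freshness.
    None
}
```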
+
+556. **Workspace-test stale-branch preflight only matches fixed argument order, so broad workspace test commands like `cargo test --all-targets --workspace` bypass the stale-base guard** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 20:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@6ef5457` and binary built from source SHA `25d663d`. Active tmux list was empty at probe time. Code inspection: `tools/src/lib.rs::is_workspace_test_command` normalizes whitespace/lowercase, then checks substring needles in this exact order: `cargo test --workspace`, `cargo test --all`, `cargo nextest run --workspace`, `cargo nextest run --all`. The existing regression `bash_workspace_tests_are_blocked_when_branch_is_behind_main` uses `cargo test --workspace --all-targets`, which matches the fixed-order needle. But semantically equivalent broad commands such as `cargo test --all-targets --workspace`, `cargo test --locked --workspace`, `cargo test --all-features --workspace`, or `cargo nextest run --all-features --workspace` do not contain the exact substring `cargo test --workspace` / `cargo nextest run --workspace`, so `workspace_test_branch_preflight` returns `None` and the command executes even on a stale branch. Targeted-test skip coverage does not protect this because the bypassed commands are still workspace-wide tests. 
**Required fix shape:** (a) parse shell command tokens enough to identify `cargo test` and `cargo nextest run` invocations independent of flag order; (b) classify workspace-wide tests when any token is `--workspace` or `--all` for those subcommands, regardless of intervening flags; (c) add negative coverage for targeted package tests that include workspace-looking strings only in quoted args/comments; (d) add regressions proving stale-branch preflight blocks `cargo test --all-targets --workspace`, `cargo test --locked --workspace`, and `cargo nextest run --all-features --workspace`; (e) include the normalized detected test scope in the structured branch-divergence event so operators can see why a command was blocked. **Why this matters:** agents often reorder cargo flags. A stale-branch safety guard that depends on one flag order gives false confidence and lets expensive full-suite green evidence be produced against stale code. Source: gaebal-gajae dogfood response to Clawhip message `1507118317528158370` on 2026-05-21.
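
A token-based sketch of fix shapes (a) and (b), assuming whitespace tokenization is acceptable for a first pass (it does not implement shell quoting). `classify_cargo_tests` and `TestScope` are hypothetical names. Arguments after `--` go to the test binary, so they are ignored, and shell separators end an invocation.

```rust
/// Hypothetical scope label that could be carried into the branch-divergence event.
#[derive(Debug, PartialEq, Eq)]
enum TestScope {
    Workspace,
    Targeted,
}

/// Classify every `cargo test` / `cargo nextest run` invocation in a command,
/// independent of flag order. Returns `None` when no test invocation is present.
fn classify_cargo_tests(command: &str) -> Option<TestScope> {
    let tokens: Vec<&str> = command.split_whitespace().collect();
    let mut scope = None;
    for (i, token) in tokens.iter().enumerate() {
        if *token != "cargo" {
            continue;
        }
        let args_start = match (tokens.get(i + 1).copied(), tokens.get(i + 2).copied()) {
            (Some("test"), _) => i + 2,
            (Some("nextest"), Some("run")) => i + 3,
            _ => continue,
        };
        // Stop at shell separators and at `--` (arguments for the test binary).
        let is_workspace = tokens[args_start..]
            .iter()
            .take_while(|t| !matches!(**t, "&&" | "||" | ";" | "|" | "--"))
            .any(|t| matches!(*t, "--workspace" | "--all"));
        if is_workspace {
            return Some(TestScope::Workspace);
        }
        scope = Some(TestScope::Targeted);
    }
    scope
}
```

Because tokens are compared exactly, `--all-targets` and `--all-features` are not mistaken for `--all` in this sketch.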
+
+557. **Wrong-task prompt-misdelivery detection only recognizes `›` prompt echoes, so `>` / `❯` agent prompts can hide mismatched-task receipts until timeout** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 21:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@bc55711` and binary built from source SHA `25d663d`. Active tmux session at probe time: `gajae-issue-311-auto-merge-race-receipt`. Code inspection: worker readiness accepts multiple prompt glyphs (`>`, `›`, `❯`) in `detect_ready_for_prompt` at `runtime/src/worker_boot.rs:1079-1113`, but `detect_prompt_echo` at `worker_boot.rs:1210-1217` only strips a leading `›`. `detect_prompt_misdelivery` relies on `detect_prompt_echo` for the `mismatched_prompt_visible` path that catches wrong-task receipts when the screen shows a different task prompt. The existing regression `wrong_task_receipt_mismatch_is_detected_before_execution_continues` uses `› Explain this KakaoTalk screenshot...`, so it exercises only the single supported glyph. If the coding agent UI echoes `> Explain...` or `❯ Explain...`, `observed_prompt_preview` is `None`; when the expected prompt text is not also visible, the wrong-task mismatch is not detected and the worker stays `Running` until coarse startup timeout classification. **Required fix shape:** (a) make prompt-echo parsing share the same glyph set as `detect_ready_for_prompt` (`>`, `›`, `❯`, and boxed `│ >` variants if present); (b) add wrong-task receipt tests for `>`, `›`, and `❯` echoes; (c) include the raw echo line/glyph in `WorkerEventPayload::PromptDelivery` or event detail so operators can diagnose UI variant drift; (d) ensure shell prompt detection remains separate so real shell prompts are still classified as `Shell`, not wrong-task agent echoes; (e) add a timeout evidence regression proving observed prompt preview is populated for all supported glyphs. 
**Why this matters:** prompt-misdelivery protection is only as good as the UI echo parser. Supporting multiple ready glyphs but only one echo glyph creates event/log opacity: operators see a generic timeout instead of a precise wrong-task replay condition for common terminal themes or agent UIs. Source: gaebal-gajae dogfood response to Clawhip message `1507125863408341102` on 2026-05-21.
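
A sketch of fix shapes (a) and (c), assuming the glyph set is exactly the one this entry lists for `detect_ready_for_prompt`. `PROMPT_GLYPHS`, `PromptEcho`, and `parse_prompt_echo` are hypothetical names; returning the matched glyph lets the caller record it in event detail. Shell prompt glyphs such as `$` are deliberately absent so shell detection stays separate.

```rust
/// Assumed to mirror the glyph set accepted by `detect_ready_for_prompt`.
const PROMPT_GLYPHS: [&str; 3] = [">", "›", "❯"];

/// Hypothetical parsed echo; the glyph is kept for event detail.
#[derive(Debug, PartialEq, Eq)]
struct PromptEcho<'a> {
    glyph: &'static str,
    text: &'a str,
}

fn parse_prompt_echo(line: &str) -> Option<PromptEcho<'_>> {
    // Tolerate a boxed UI border such as `│ > ...` before the glyph.
    let trimmed = line.trim_start();
    let trimmed = trimmed.strip_prefix('│').map_or(trimmed, str::trim_start);
    PROMPT_GLYPHS.iter().find_map(|&glyph| {
        let text = trimmed.strip_prefix(glyph)?.trim();
        (!text.is_empty()).then_some(PromptEcho { glyph, text })
    })
}
```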
+
+558. **Tool-permission gate detection is hard-coded to one English MCP prompt shape, so alternate MCP approval wording can fall through as startup-no-evidence instead of `ToolPermissionRequired`** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 21:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@9fd61af` and binary built from source SHA `25d663d`. Active tmux session at probe time: `omx-issue-2443-ralplan-consensus-resume`. Code inspection: `detect_tool_permission_prompt` in `runtime/src/worker_boot.rs:958-999` only enters when the full screen contains either `allow the` + `server` + `tool` + `run`, or `allow tool` + `run`. The only production-shaped tests at `worker_boot.rs:1387-1474` use exactly `Allow the omx_memory MCP server to run tool "..."?`. Common equivalent approval copy such as `Allow MCP server omx_memory to call tool "project_memory_read"?`, `Allow server omx_memory to execute tool ...`, `Approve tool project_memory_read from omx_memory?`, or localized/shorter plugin prompts do not contain the exact `allow the ... server ... run tool` / `allow tool ... run` token pattern. When those appear during boot, `observe` will not set `WorkerStatus::ToolPermissionRequired`, no structured `ToolPermissionPrompt` payload is emitted, and later timeout evidence can degrade to generic `startup_no_evidence` or worker-crashed classification even though the pane clearly showed an approval gate. 
**Required fix shape:** (a) replace phrase-order checks with a tolerant classifier over permission verbs (`allow`/`approve`/`permit`), execution verbs (`run`/`call`/`execute`), and MCP/tool tokens independent of order; (b) add fixture tests for at least three real-world prompt variants, including `call tool` and prompts where the tool name appears before the server; (c) preserve extracted `server_name`, `tool_name`, allow-scope, and raw `prompt_preview` even when fields are partial; (d) emit an `Unknown`-scope tool-permission event rather than falling through when the approval intent is clear but parsing is incomplete; (e) include classifier confidence/reason in startup timeout evidence so UI wording drift is visible. **Why this matters:** MCP permission prompts are exactly the kind of boot blocker operators need to resolve quickly. A brittle single-template detector converts an actionable “click allow” condition into opaque startup failure noise whenever plugin/UI copy drifts. Source: gaebal-gajae dogfood response to Clawhip message `1507133416884404254` on 2026-05-21.
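
An order-independent sketch of fix shapes (a), (d), and (e), assuming word-level matching over the lowercased screen is an acceptable first pass. `PermissionSignal`, `first_match`, and `classify_tool_permission` are hypothetical, and the `"high"`/`"partial"` confidence labels are illustrative. Server/tool name extraction from fix shape (c) is not shown.

```rust
/// Hypothetical classifier result; `reason` records which words matched.
#[derive(Debug, PartialEq, Eq)]
struct PermissionSignal {
    confidence: &'static str,
    reason: String,
}

/// Return the first candidate that appears as a whole word.
fn first_match(words: &[&str], candidates: &[&'static str]) -> Option<&'static str> {
    candidates.iter().copied().find(|c| words.contains(c))
}

fn classify_tool_permission(screen: &str) -> Option<PermissionSignal> {
    let lower = screen.to_lowercase();
    // Split on punctuation/whitespace but keep `_` so names like `omx_memory` stay whole.
    let words: Vec<&str> = lower
        .split(|c: char| !c.is_alphanumeric() && c != '_')
        .filter(|w| !w.is_empty())
        .collect();
    let permission = first_match(&words, &["allow", "approve", "permit"]);
    let execution = first_match(&words, &["run", "call", "execute"]);
    let target = first_match(&words, &["tool", "mcp", "server"]);
    match (permission, execution, target) {
        (Some(p), Some(e), Some(t)) => Some(PermissionSignal {
            confidence: "high",
            reason: format!("{p}+{e}+{t}"),
        }),
        // Approval intent is clear but no execution verb: emit a partial signal
        // (an `Unknown`-scope event) instead of falling through to a timeout.
        (Some(p), None, Some(t)) => Some(PermissionSignal {
            confidence: "partial",
            reason: format!("{p}+{t}"),
        }),
        _ => None,
    }
}
```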
+
+559. **Auto-compaction can refuse to compact very large short sessions because `compact_session` still enforces the default message-count gate even after real provider usage crosses the input-token threshold** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 22:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@9ef521b` and binary built from source SHA `25d663d`. Active tmux sessions at probe time: `gajae-issue-313-omx-launch-resilience-receipt`, `omx-pr-2447-ralplan-consensus-final-review`. Code inspection: `ConversationRuntime::maybe_auto_compact` at `runtime/src/conversation.rs:559-572` triggers from actual cumulative provider `input_tokens`, then calls `compact_session` with only `max_estimated_tokens: 0` overridden. But `compact_session` first calls `should_compact`, and `should_compact` at `runtime/src/compact.rs:41-50` still requires `compactable.len() > config.preserve_recent_messages` before considering token budget. Because `CompactionConfig::default().preserve_recent_messages` is 4, a session with 1-4 extremely large messages and real provider usage above `auto_compaction_input_tokens_threshold` returns `removed_message_count == 0`; `maybe_auto_compact` then silently returns `None`. The existing auto-compaction regression at `conversation.rs:1520-1572` seeds enough turns so the message-count predicate passes, but it does not cover a short huge transcript that crosses the real usage threshold. 
**Required fix shape:** (a) distinguish manual estimated-token compaction from auto-compaction-after-real-usage; (b) when actual provider usage crosses the threshold, allow compaction even if message count is <= default preserved tail, while preserving at least the latest user/assistant boundary safely; (c) add a regression with one or two huge messages plus `AssistantEvent::Usage { input_tokens: 120_000 }` proving auto-compaction emits `AutoCompactionEvent`; (d) make the skip reason observable when auto-compaction threshold is crossed but no messages are removed (`too_few_messages`, `tool_boundary`, `empty_prefix`, etc.); (e) ensure tool-use/tool-result boundary protection still wins over unsafe compaction. **Why this matters:** provider-reported usage is the trusted signal that the context window is hot. If auto-compaction ignores that signal for short but huge sessions, users hit context exhaustion with no auto-compaction event and no explanation. Source: gaebal-gajae dogfood response to Clawhip message `1507140966774079521` on 2026-05-21.
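
A decision-function sketch of fix shapes (a), (b), and (d), assuming the caller already knows how many messages are compactable. `CompactionTrigger`, `CompactionDecision`, and `decide_compaction` are hypothetical. The sketch only guarantees that the newest message survives; the tool-use/tool-result boundary protection from fix shape (e) is not modeled.

```rust
/// Hypothetical trigger source; only provider usage relaxes the message-count gate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CompactionTrigger {
    ManualEstimate,
    ProviderUsage,
}

#[derive(Debug, PartialEq, Eq)]
enum CompactionDecision {
    /// Summarize everything except the newest `keep_tail` messages.
    Compact { keep_tail: usize },
    /// Observable skip reason instead of a silent `None`.
    Skip(&'static str),
}

fn decide_compaction(trigger: CompactionTrigger, compactable: usize, preserve_recent: usize) -> CompactionDecision {
    match trigger {
        CompactionTrigger::ManualEstimate if compactable <= preserve_recent => {
            CompactionDecision::Skip("too_few_messages")
        }
        CompactionTrigger::ManualEstimate => CompactionDecision::Compact { keep_tail: preserve_recent },
        // Real usage crossed the threshold: shrink the preserved tail so at least one
        // message can be summarized, while always keeping the newest message.
        CompactionTrigger::ProviderUsage if compactable < 2 => CompactionDecision::Skip("empty_prefix"),
        CompactionTrigger::ProviderUsage => CompactionDecision::Compact {
            keep_tail: preserve_recent.min(compactable - 1).max(1),
        },
    }
}
```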
+
+560. **Auto-compaction observability reports only removed message count, not token pressure or before/after token estimates, so operators cannot tell whether compaction actually relieved context risk** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 22:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@a8c67b0` and binary built from source SHA `25d663d`. Active tmux sessions at probe time: `gajae-issue-314-omx-launch-resilience-review`, `gajae-pr-314-final-review`, `omx-pr-2447-ralplan-consensus-review3`. Code inspection: `AutoCompactionEvent` at `runtime/src/conversation.rs:123-127` contains only `removed_message_count`; `maybe_auto_compact` at `conversation.rs:559-581` has access to cumulative provider usage and the pre/post session but drops all token-pressure context. The CLI notice at `rusty-claude-cli/src/main.rs:3560-3562` prints only `[auto-compacted: removed N messages]`, and JSON output at `main.rs:5111-5114` exposes only `removed_messages` plus that same notice. The parity harness assertion at `mock_parity_harness.rs:691-712` only checks that the `auto_compaction` key exists and usage input tokens are large; it does not require any before/after estimate, threshold, trigger usage, or reduction metadata. **Required fix shape:** (a) extend `AutoCompactionEvent` with `trigger_input_tokens`, `threshold_input_tokens`, `estimated_tokens_before`, `estimated_tokens_after`, `removed_message_count`, and a `reason/trigger` enum; (b) compute before/after estimates around `compact_session` and include them in CLI text and JSON; (c) add tests proving JSON contains these fields for auto-compaction and that the CLI notice includes enough context to debug risk relief; (d) include a skipped/degraded event shape when threshold crossed but no messages removed, tying into #559; (e) update parity harness to assert semantic values, not merely key presence. 
**Why this matters:** auto-compaction is a context-safety action. Without token-pressure and reduction telemetry, a user sees “removed 2 messages” but cannot tell if the session went from 120k to 5k tokens, 120k to 119k, or failed to relieve the risk at all. Source: gaebal-gajae dogfood response to Clawhip message `1507148512159203338` on 2026-05-21.
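
A sketch of the extended event from fix shape (a), using the field names proposed there. The `AutoCompactionTrigger` variant, the `relieved_tokens` helper, and the `notice` text format are hypothetical. Carrying before/after estimates is what lets a CLI line distinguish 120k to 5k from 120k to 119k.

```rust
/// Hypothetical trigger enum; more variants could be added later.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AutoCompactionTrigger {
    ProviderInputTokens,
}

/// Field names follow fix shape (a) of this entry.
#[derive(Debug, Clone, PartialEq, Eq)]
struct AutoCompactionEvent {
    trigger: AutoCompactionTrigger,
    trigger_input_tokens: u64,
    threshold_input_tokens: u64,
    estimated_tokens_before: u64,
    estimated_tokens_after: u64,
    removed_message_count: usize,
}

impl AutoCompactionEvent {
    /// Estimated tokens freed; saturates at zero if the estimate grew.
    fn relieved_tokens(&self) -> u64 {
        self.estimated_tokens_before.saturating_sub(self.estimated_tokens_after)
    }

    /// CLI notice carrying enough context to judge whether risk was relieved.
    fn notice(&self) -> String {
        format!(
            "[auto-compacted: removed {} messages; est. tokens {} -> {}; trigger {} >= threshold {}]",
            self.removed_message_count,
            self.estimated_tokens_before,
            self.estimated_tokens_after,
            self.trigger_input_tokens,
            self.threshold_input_tokens,
        )
    }
}
```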
+
+561. **Branch-lock collision detection treats module strings literally, so equivalent paths with `./`, duplicate slashes, or trailing slashes evade same-branch overlap detection** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 23:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@f45b651` and binary built from source SHA `25d663d`. Active tmux session at probe time: `gajae-issue-316-inflight-review-superseded-merge`. Code inspection: `detect_branch_lock_collisions` in `runtime/src/branch_lock.rs:23-49` delegates to `overlapping_modules`, which compares raw `modules: Vec<String>` entries via `modules_overlap` at `branch_lock.rs:65-69`: exact equality or prefix with `format!("{right}/")` / `format!("{left}/")`. No normalization is applied. As a result, two lanes on the same branch with semantically identical module scopes like `runtime/mcp` vs `./runtime/mcp`, `runtime/mcp` vs `runtime/mcp/`, `runtime//mcp` vs `runtime/mcp`, or `crates/runtime/../runtime/mcp` vs `crates/runtime/mcp` are not detected as collisions, even though they target the same files. The existing tests cover exact same module, nested raw-prefix module, and different branches only; they do not exercise path normalization or malformed-but-common module inputs. This is distinct from Jobdori's #562 empty-modules whole-branch gap: even non-empty module locks can bypass collision checks when strings differ syntactically. 
**Required fix shape:** (a) normalize module scopes before comparison by trimming whitespace, removing leading `./`, collapsing repeated slashes, dropping trailing slashes, and resolving `.`/`..` components without escaping repo root; (b) store/report both normalized collision scope and raw input modules for auditability; (c) reject or explicitly mark invalid module paths that normalize outside the repo; (d) add tests for `./runtime/mcp`, `runtime/mcp/`, `runtime//mcp`, nested normalized paths, and invalid `../` escape attempts; (e) keep deterministic sorted/deduped collision output after normalization. **Why this matters:** branch locks are a coordination safety rail. If two agents can claim the same branch/module by spelling the path differently, lock receipts give false confidence and concurrent lanes can still overwrite or review-stomp each other. Source: gaebal-gajae dogfood response to Clawhip message `1507156065941192705` on 2026-05-21.
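
A lexical sketch of fix shapes (a) and (c), assuming module scopes are repo-relative `/`-separated paths and that symlinks need not be resolved. `normalize_module` is a hypothetical name; `modules_overlap` reuses the comparison rule described above, now applied to normalized scopes.

```rust
/// Hypothetical lexical normalizer for branch-lock module scopes.
/// Rejects absolute paths and `..` that would climb above the repo root.
fn normalize_module(raw: &str) -> Result<String, String> {
    let trimmed = raw.trim();
    if trimmed.starts_with('/') {
        return Err(format!("absolute module path not allowed: {raw:?}"));
    }
    let mut parts: Vec<&str> = Vec::new();
    for component in trimmed.split('/') {
        match component {
            // Covers a leading `./`, repeated slashes, and trailing slashes.
            "" | "." => {}
            ".." => {
                if parts.pop().is_none() {
                    return Err(format!("module path escapes repo root: {raw:?}"));
                }
            }
            other => parts.push(other),
        }
    }
    if parts.is_empty() {
        return Err(format!("empty module path: {raw:?}"));
    }
    Ok(parts.join("/"))
}

/// Exact equality or a `/`-bounded prefix, as in the existing raw comparison.
fn modules_overlap(left: &str, right: &str) -> bool {
    left == right || left.starts_with(&format!("{right}/")) || right.starts_with(&format!("{left}/"))
}
```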
+
+562. **Approval tokens are still accepted at their exact expiry second because validation uses `now > expires_at` instead of `now >= expires_at`** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 23:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@3d877d7` and binary built from source SHA `25d663d`. Active tmux session at probe time: `gajae-issue-317-inflight-review-restart`. Code inspection: `ApprovalTokenGrant::expires_at` stores `expires_at_epoch_seconds`, and `ApprovalTokenLedger::validate_grant` at `runtime/src/approval_tokens.rs:286-290` rejects only when `now_epoch_seconds > expires_at`. Therefore a token with `expires_at(20)` remains valid for verification/consumption when `now_epoch_seconds == 20`; it expires only at 21. The existing test `approval_token_rejects_scope_expansion_expiry_and_revocation` checks `ledger.verify(..., now=21)` for `expires_at(20)` but does not cover the boundary at exactly 20. This differs from the OAuth expiry helper in `api/src/providers/anthropic.rs`, which uses `expires_at <= now_unix_timestamp()` and treats the timestamp as no longer valid once reached. **Required fix shape:** (a) change approval-token expiry validation to `now_epoch_seconds >= expires_at` or explicitly rename/document the field as `valid_through_epoch_seconds` if inclusive semantics are intended; (b) add boundary tests for `now=expires_at-1`, `now=expires_at`, and `now=expires_at+1` for both `verify` and `consume`; (c) include `expires_at_epoch_seconds` and `now_epoch_seconds` in expiry audit/error metadata so operators can see boundary decisions; (d) align approval-token expiry semantics with OAuth/token-cache expiry conventions across the codebase; (e) add conformance coverage if approval tokens are exported in G004 bundles. **Why this matters:** approval tokens authorize policy exceptions. 
An off-by-one expiry window is small but security-sensitive and, without boundary tests, future short-lived approval grants can be consumed exactly at the moment operators expect them to be dead. Source: gaebal-gajae dogfood response to Clawhip message `1507163615596380191` on 2026-05-21.
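
A minimal sketch of fix shapes (a) and (c), assuming exclusive expiry (`now >= expires_at`) is the intended contract. `ExpiryCheck` and `check_expiry` are hypothetical names; the `Expired` variant carries both timestamps so audit or error metadata can show the boundary decision.

```rust
#[derive(Debug, PartialEq, Eq)]
enum ExpiryCheck {
    Valid,
    /// Both values are kept so audit/error metadata can show the boundary decision.
    Expired { expires_at_epoch_seconds: u64, now_epoch_seconds: u64 },
}

/// Exclusive expiry: a grant is dead once `now` reaches `expires_at`, matching the
/// `expires_at <= now` convention this entry cites for the OAuth helper.
fn check_expiry(expires_at_epoch_seconds: u64, now_epoch_seconds: u64) -> ExpiryCheck {
    if now_epoch_seconds >= expires_at_epoch_seconds {
        ExpiryCheck::Expired { expires_at_epoch_seconds, now_epoch_seconds }
    } else {
        ExpiryCheck::Valid
    }
}
```

The boundary tests from fix shape (b) then reduce to `now` = 19, 20, and 21 against `expires_at` = 20.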
+
+563. **G004 conformance `get_path` falls back to suffix pointers, so nested required fields can be satisfied by same-named root fields and invalid bundles pass** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 00:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@8ea54d3` and binary built from source SHA `25d663d`. Active tmux session at probe time: `gajae-issue-316-fix-still-open-fixture`. Code inspection: every conformance field helper ultimately calls `get_path(root, pointer)` in `runtime/src/g004_conformance.rs:386-398`. If `root.pointer(path)` misses, `get_path` iterates suffixes of the requested path and tries those against the same root object. For example, validating a report's `/identity/contentHash` at `g004_conformance.rs:132-137` first checks that nested field; if it is missing, `get_path(report, "/identity/contentHash")` then tries `"/contentHash"`. A malformed report with top-level `contentHash` but no `identity.contentHash` therefore passes the nested identity requirement. The same suffix fallback affects `/projection/provenance`, `/redaction/provenance`, `/metadata/provenance`, `/metadata/seq`, `/metadata/eventFingerprint`, and similar nested contract fields. This makes the machine-checkable G004 validator structurally permissive in exactly the fields that are supposed to prove provenance/identity, and tests/fixtures do not appear to include negative cases for misplaced nested fields. 
**Required fix shape:** (a) remove suffix fallback from `get_path` and use strict JSON Pointer lookup for conformance validation; (b) if fallback is needed for legacy aliases, make it explicit per field with a deprecation warning/error code, never implicit for all nested fields; (c) add negative tests where top-level `contentHash`, `provenance`, `seq`, or `eventFingerprint` exist but the required nested path is absent, proving validation fails; (d) update error messages to report the exact missing nested path; (e) run existing G004 fixtures through strict validation and fix any accidental reliance on suffix aliases. **Why this matters:** conformance validators are trust boundaries. A suffix fallback lets artifacts satisfy provenance/identity requirements from the wrong location, so downstream consumers can accept bundles whose shape does not match the advertised contract. Source: gaebal-gajae dogfood response to Clawhip message `1507171165033201735` on 2026-05-22.
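
A stdlib-only sketch of strict lookup for fix shapes (a) and (d), using a toy `Value` enum in place of `serde_json::Value` so the block is self-contained. With `serde_json`, `Value::pointer` already performs strict JSON Pointer lookup; the point here is only that a miss must stay a miss. `get_path_strict` and `require_path` are hypothetical names, and array indices are omitted.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a JSON value (objects and strings only).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Object(BTreeMap<String, Value>),
}

/// Strict JSON Pointer lookup over objects: no suffix fallback, a miss is a miss.
fn get_path_strict<'a>(root: &'a Value, pointer: &str) -> Option<&'a Value> {
    if pointer.is_empty() {
        return Some(root);
    }
    let rest = pointer.strip_prefix('/')?;
    rest.split('/').try_fold(root, |node, token| {
        // RFC 6901 unescaping order: `~1` -> `/` first, then `~0` -> `~`.
        let key = token.replace("~1", "/").replace("~0", "~");
        match node {
            Value::Object(map) => map.get(key.as_str()),
            Value::Str(_) => None,
        }
    })
}

/// Error text names the exact nested path that is missing.
fn require_path<'a>(root: &'a Value, pointer: &str) -> Result<&'a Value, String> {
    get_path_strict(root, pointer).ok_or_else(|| format!("missing required field {pointer}"))
}
```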
+
+564. **G004 approval-token conformance checks only that delegation hops are non-empty, not that the chain is contiguous from owner to executor, so broken approval provenance passes** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 00:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@54bbee2` and binary built from source SHA `25d663d`. Active tmux sessions at probe time: none. Code inspection: `validate_approval_tokens` in `runtime/src/g004_conformance.rs:226-253` requires `tokenId`, `owner`, `scope`, `issuedAt`, `oneTimeUse`, `replayPreventionNonce`, and calls `validate_delegation_chain`. But `validate_delegation_chain` at `g004_conformance.rs:256-272` only checks each hop has non-empty `from`, `to`, `action`, and `at`; it never verifies that the first hop starts at the token `owner`, that each hop's `to` equals the next hop's `from`, or that the final `to` matches an approved/executing actor field. The valid fixture has one clean `leader-fixed -> worker-3` hop, and the only negative test uses an empty chain; there is no fixture where a non-empty but discontinuous chain like `owner -> lead`, then `unrelated -> worker` is rejected. Therefore an approval token can advertise an auditable delegation chain while the conformance helper accepts broken provenance continuity. **Required fix shape:** (a) add explicit `approvedExecutor`/`executingActor` (or equivalent) to the G004 approval-token schema if final actor is part of the contract; (b) validate first hop `from == owner`; (c) validate adjacent hop continuity (`hop[i].to == hop[i+1].from`); (d) validate final hop reaches the approved/executing actor when present; (e) add negative tests for owner mismatch, discontinuous middle hop, and wrong final executor, plus a positive multi-hop fixture. **Why this matters:** approval tokens are policy-exception evidence. 
A non-empty list of disconnected hops is not an audit trail; downstream claws can mistakenly trust a token whose delegation provenance does not actually connect the owner approval to the actor consuming it. Source: gaebal-gajae dogfood response to Clawhip message `1507178711106064444` on 2026-05-22.
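
A minimal sketch of fix shapes (b) through (d), assuming each hop reduces to `from`/`to` strings and that an optional approved-executor value exists as proposed in fix shape (a). `Hop`, `ChainError`, and `check_chain_continuity` are hypothetical names; the existing non-empty checks on `action` and `at` would still run alongside it.

```rust
/// Minimal hop view; the real hop also carries `action` and `at`.
struct Hop<'a> {
    from: &'a str,
    to: &'a str,
}

#[derive(Debug, PartialEq, Eq)]
enum ChainError {
    Empty,
    OwnerMismatch { first_from: String },
    Discontinuous { hop_index: usize },
    WrongExecutor { last_to: String },
}

/// Owner -> ... -> executor continuity check.
fn check_chain_continuity(owner: &str, hops: &[Hop<'_>], approved_executor: Option<&str>) -> Result<(), ChainError> {
    let first = hops.first().ok_or(ChainError::Empty)?;
    if first.from != owner {
        return Err(ChainError::OwnerMismatch { first_from: first.from.to_string() });
    }
    // Each hop must start where the previous one ended.
    for (i, pair) in hops.windows(2).enumerate() {
        if pair[0].to != pair[1].from {
            return Err(ChainError::Discontinuous { hop_index: i + 1 });
        }
    }
    if let (Some(expected), Some(last)) = (approved_executor, hops.last()) {
        if last.to != expected {
            return Err(ChainError::WrongExecutor { last_to: last.to.to_string() });
        }
    }
    Ok(())
}
```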
+565. **MCP lifecycle hardened diagnostics expose `timestamp` / `phase_timestamps` as raw epoch seconds while adjacent machine-readable event contracts are being tightened around RFC3339, so MCP/plugin lifecycle logs remain format-inconsistent and parser-hostile** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 02:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@665dd1f`. Active tmux sessions at probe time: none. Code inspection: `runtime/src/mcp_lifecycle_hardened.rs:6-12` defines `now_secs()` as `SystemTime` seconds since `UNIX_EPOCH`; `McpErrorSurface.timestamp` at `mcp_lifecycle_hardened.rs:50-56` stores that raw `u64`, and `McpLifecycleState.phase_timestamps` at `mcp_lifecycle_hardened.rs:125-129` / `180-186` records the same epoch seconds per phase. Tests only assert presence via `phase_timestamp(...).is_some()` and preserve `waited_ms`; they never assert a stable date-time contract or expose both wall-clock and monotonic sequencing. This is a plugin lifecycle variant of the timestamp-contract findings in #548-#551/#554: MCP diagnostics are explicitly machine-readable, but a field named `timestamp` differs from the RFC3339-looking `emittedAt` / `createdAt` surfaces and will force downstream claws to special-case one lifecycle stream. 
**Required fix shape:** (a) decide and document MCP lifecycle timestamp contract; prefer RFC3339 UTC string fields like `occurredAt` / `phaseStartedAt` for wall-clock log interoperability; (b) if epoch seconds are retained, name them `timestamp_epoch_seconds` and optionally add a separate monotonic `phase_seq`; (c) update `McpErrorSurface` and `phase_timestamps` serialization with backward-compatible aliases or migration notes; (d) add tests that parse MCP lifecycle wall-clock fields as RFC3339 and reject ambiguous raw epoch strings when serialized under the public contract; (e) align MCP lifecycle diagnostics with lane events/recovery/G004 timestamp validators so plugin lifecycle failures can be sorted and displayed without bespoke parser branches. **Why this matters:** MCP startup/plugin lifecycle failures are already hard to debug. If the lifecycle stream uses raw epoch seconds while the rest of claw-code moves to RFC3339 event timestamps, logs and UIs lose a uniform temporal contract exactly on the surface operators need during plugin breakage. Source: gaebal-gajae dogfood response to Clawhip message `1507208914046025731` on 2026-05-22.
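
If fix shape (a) is adopted, existing epoch-second values can be rendered as RFC3339 UTC strings during migration. This stdlib-only sketch uses Howard Hinnant's `civil_from_days` algorithm, specialised to non-negative epoch seconds. `epoch_seconds_to_rfc3339` is a hypothetical name; a real change would more likely use an established date crate.

```rust
/// Render non-negative Unix epoch seconds as `YYYY-MM-DDTHH:MM:SSZ`.
fn epoch_seconds_to_rfc3339(secs: u64) -> String {
    let days = secs / 86_400;
    let rem = secs % 86_400;
    let (hh, mm, ss) = (rem / 3_600, (rem % 3_600) / 60, rem % 60);
    // civil_from_days: shift the epoch to 0000-03-01 so leap days fall at year end.
    let z = days + 719_468;
    let era = z / 146_097; // 400-year eras
    let doe = z - era * 146_097; // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
    let mp = (5 * doy + 2) / 153; // March-based month, [0, 11]
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    // January and February belong to the next civil year.
    let year = yoe + era * 400 + u64::from(month <= 2);
    format!("{year:04}-{month:02}-{day:02}T{hh:02}:{mm:02}:{ss:02}Z")
}
```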
+566. **Workspace-write outside-workspace detection misses exact sensitive directory targets like `/etc` because `command_targets_outside_workspace` only searches for paths with trailing slashes** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 03:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@665dd1f`. Active tmux sessions at probe time: none. Code inspection: `runtime/src/bash_validation.rs:286-295` routes `PermissionMode::WorkspaceWrite` through `command_targets_outside_workspace`, and that helper at `bash_validation.rs:302-323` only warns for write/state-modifying commands when the command string contains one of `"/etc/", "/usr/", "/var/", "/boot/", "/sys/", "/proc/", "/dev/", "/sbin/", "/lib/", "/opt/"`. Existing coverage checks `validate_mode("cp file.txt /etc/config", PermissionMode::WorkspaceWrite)` and therefore exercises the trailing-slash case. But a write target that is exactly the directory name, such as `cp file.txt /etc`, `mv config /usr`, `install file /opt`, or `tee /etc`, does not contain `"/etc/"` / `"/usr/"` / `"/opt/"`; the helper returns false and `validate_mode` falls through to `Allow`. This is adjacent to, but distinct from, Jobdori's #570 traversal-substring bypass: even without `..`, exact system directory operands evade the workspace-write out-of-scope warning because the heuristic uses substring sentinels instead of token/path boundary checks. 
**Required fix shape:** (a) tokenize shell words enough to inspect path-like operands rather than raw substring matching; (b) treat exact sensitive directories (`/etc`, `/usr`, `/var`, `/boot`, `/sys`, `/proc`, `/dev`, `/sbin`, `/lib`, `/opt`) and descendants equivalently; (c) include command/path evidence in the warning so operators see which operand escaped workspace scope; (d) add regressions for `cp file.txt /etc`, `mv file /usr`, `install file /opt`, and keep the existing `/etc/config` case green; (e) consider unifying this with the #570 path-resolution hardening so workspace-write/read-only path scope checks use one boundary-aware path classifier. **Why this matters:** workspace-write mode is supposed to allow project edits while warning on system-level writes. Missing exact-directory operands lets an agent attempt writes into sensitive roots with no permission warning, which is precisely the boundary the mode is meant to make visible. Source: gaebal-gajae dogfood response to Clawhip message `1507216459665772677` on 2026-05-22.
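
A minimal sketch of fix shapes (b) and (c), assuming whitespace tokenization and the sentinel list quoted in this entry. `SENSITIVE_ROOTS`, `sensitive_root_for`, and `sensitive_operands` are hypothetical names; each hit returns the operand and the matched root so the warning can name it. The write-command detection that gates this check is not shown.

```rust
/// Sensitive roots from the existing sentinel list, without trailing slashes.
const SENSITIVE_ROOTS: [&str; 10] = [
    "/etc", "/usr", "/var", "/boot", "/sys", "/proc", "/dev", "/sbin", "/lib", "/opt",
];

/// Match a root exactly or at a `/` boundary, so `/etc` and `/etc/config` hit
/// while `/etcetera` does not.
fn sensitive_root_for(operand: &str) -> Option<&'static str> {
    let path = operand.trim_matches(|c: char| c == '"' || c == '\'');
    SENSITIVE_ROOTS.iter().copied().find(|root| {
        path == *root || path.strip_prefix(*root).is_some_and(|rest| rest.starts_with('/'))
    })
}

/// (operand, matched root) pairs for the warning text; the first word is the command.
fn sensitive_operands(command: &str) -> Vec<(&str, &'static str)> {
    command
        .split_whitespace()
        .skip(1)
        .filter_map(|token| sensitive_root_for(token).map(|root| (token, root)))
        .collect()
}
```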
+
+567. **Path traversal validation treats any command string containing the workspace path as safe, so mixed outside-target commands can smuggle `../` escapes past the warning** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 04:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@665dd1f`. Active tmux session at probe time: `omc-issue-3079-plugin-cache-commands`; no active claw-code implementation session. Code inspection: `rust/crates/runtime/src/bash_validation.rs::validate_paths` checks `if command.contains("../")`, then suppresses the traversal warning whenever the *entire command string* contains `workspace.to_string_lossy()`. That means a command with one operand inside the workspace and a separate traversal operand outside it, such as `cp /workspace/file ../../../etc/passwd` (workspace `/workspace`) or `tar -cf /workspace/out.tar ../../..`, is treated as `Allow` because the workspace prefix appears somewhere in the command. The validator never tokenizes operands, resolves each candidate path, or proves that the specific `../` path stays under the workspace root. Existing tests cover `cat ../../../etc/passwd` without a workspace substring, but not mixed safe+unsafe operands. This is a separate path-scope gap from #566's exact sensitive-directory miss: here the escape uses traversal and is hidden by an unrelated in-workspace token. 
**Required fix shape:** (a) tokenize shell words enough to collect path-like operands independently; (b) resolve each relative/absolute candidate against cwd/workspace and warn/block when any operand escapes, rather than checking whether the command string contains the workspace root; (c) keep quoted/non-path text from causing false positives; (d) add regressions for `cp /workspace/file ../../../etc/passwd`, `tar -cf /workspace/out.tar ../../..`, and a positive `cp /workspace/file /workspace/sub/../safe`; (e) align this with the boundary-aware classifier required by #566 so workspace-write/read-only path validation shares one per-operand path-scope primitive. **Why this matters:** path validation is supposed to make workspace escapes visible. A global substring exemption lets one benign workspace path launder a different `../` operand, producing false `Allow` evidence for commands that still target outside the workspace. Source: gaebal-gajae dogfood response to Clawhip message `1507231563484500048` on 2026-05-22.
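
A per-operand sketch of fix shapes (a) and (b), assuming Unix paths, whitespace tokenization, and purely lexical resolution; symlinks are not followed, which a production check may need. `lexically_resolve` and `escaping_operands` are hypothetical names. An operand counts as path-like when it contains `/` or is `..`, and flags are skipped.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically resolve `operand` against `cwd` (absolute operands replace `cwd`).
/// `..` at the filesystem root stays at the root.
fn lexically_resolve(cwd: &Path, operand: &str) -> PathBuf {
    let mut out = PathBuf::new();
    for component in cwd.join(operand).components() {
        match component {
            Component::ParentDir => {
                out.pop();
            }
            Component::CurDir => {}
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Every path-like operand whose resolved form leaves `workspace`.
fn escaping_operands<'a>(command: &'a str, cwd: &Path, workspace: &Path) -> Vec<&'a str> {
    command
        .split_whitespace()
        .skip(1)
        .filter(|token| !token.starts_with('-') && (token.contains('/') || *token == ".."))
        .filter(|token| !lexically_resolve(cwd, token).starts_with(workspace))
        .collect()
}
```

Because each operand is checked on its own, an in-workspace path elsewhere in the command can no longer exempt a `../` escape.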
+
+568. **Read-only bash validation classifies `git fetch` as safe even though fetch mutates `.git/FETCH_HEAD`, remote-tracking refs, packfiles, and optional tags** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 04:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@1882c95`. Active tmux sessions at probe time: `gajae-pr-323-bootstrap-context-readiness-review`, `omx-issue-2451-madmax-detached-lock-timeout`; no active claw-code implementation session. Code inspection: `rust/crates/runtime/src/bash_validation.rs::validate_read_only` delegates `git` commands to `validate_git_read_only`, and `GIT_READ_ONLY_SUBCOMMANDS` includes `fetch` alongside pure inspection commands like `status`, `log`, and `show`. But `git fetch` is not read-only: even without checkout it writes `.git/FETCH_HEAD`, may update remote-tracking refs, downloads pack/object data, updates tags depending on flags/config, and can run hooks/config-driven network behavior. There is no mode distinction for `git fetch --dry-run` versus mutating fetch forms, and no test asserts that read-only mode blocks repository/network mutation. This creates a stale-branch/confusion footgun in the opposite direction: agents can claim read-only validation while changing the local base/ref state used by later freshness checks. 
**Required fix shape:** (a) remove plain `fetch` from `GIT_READ_ONLY_SUBCOMMANDS` or split it into a separate `NetworkMetadataMutation`/workspace-write classification; (b) allow only explicitly non-mutating forms if Git actually guarantees them for the chosen flags, otherwise require WorkspaceWrite/Prompt; (c) add read-only regressions proving `git fetch`, `git fetch origin main`, and `git fetch --tags` are blocked/warned, while `git status`, `git log`, and `git diff` remain allowed; (d) include a clear reason mentioning `.git` metadata/ref mutation and network access; (e) align stale-branch preflight code so any fetch it performs is an explicit trusted preflight action, not hidden inside user-command read-only permission. **Why this matters:** read-only mode is used as safety evidence. Treating `git fetch` as read-only lets commands mutate repository metadata and remote refs under a permission label that promises observation only, making later branch-freshness and audit results harder to trust. Source: gaebal-gajae dogfood response to Clawhip message `1507239113550725231` on 2026-05-22.
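
The following is a minimal sketch of fix shapes (a) and (d). It assumes a simplified whitespace tokenizer. `Verdict`, `git_subcommand`, and `classify_git_read_only` are hypothetical names, and the allowlist is an illustrative subset rather than the actual `GIT_READ_ONLY_SUBCOMMANDS` in `bash_validation.rs`. Global options such as `-C <dir>` are skipped before the subcommand is picked, so `git -C repo fetch` is still caught.

```rust
/// Outcome of read-only validation (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Allow,
    Block(String),
}

/// Illustrative subset of inspection-only subcommands; `fetch` is deliberately absent.
const GIT_READ_ONLY_SUBCOMMANDS: &[&str] = &["status", "log", "show", "diff", "blame", "rev-parse"];

/// Returns the git subcommand, skipping global flags; `-C` and `-c` consume the next token.
fn git_subcommand(command: &str) -> Option<&str> {
    let mut tokens = command.split_whitespace();
    if tokens.next()? != "git" {
        return None;
    }
    while let Some(token) = tokens.next() {
        match token {
            "-C" | "-c" => {
                tokens.next()?;
            }
            flag if flag.starts_with('-') => continue,
            sub => return Some(sub),
        }
    }
    None
}

fn classify_git_read_only(command: &str) -> Verdict {
    match git_subcommand(command) {
        Some(sub) if GIT_READ_ONLY_SUBCOMMANDS.contains(&sub) => Verdict::Allow,
        // Fetch gets its own reason so operators see why it was blocked (fix shape d).
        Some("fetch") => Verdict::Block(
            "git fetch writes .git/FETCH_HEAD, may update remote-tracking refs and tags, and needs network access"
                .to_string(),
        ),
        Some(other) => Verdict::Block(format!("git {other} is not in the read-only allowlist")),
        None => Verdict::Block("could not determine the git subcommand".to_string()),
    }
}
```

Anything outside the allowlist is blocked by default, so a new mutating subcommand fails closed rather than silently inheriting read-only status.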
+
+569. **Read-only bash validation does not unwrap `/usr/bin/env` command prefixes, so `env FOO=bar rm file` is allowed even though the executed command is write/destructive** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 05:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@2bb5d49`. Active tmux sessions at probe time: `gajae-pr-323-fix-windows-host-path-safety2`, `omx-pr-2452-madmax-lock-timeout-review`; no active claw-code implementation session. Channel context also had new PR claw-code#3056, but this pinpoint is from local permission-validation inspection. Code inspection: `rust/crates/runtime/src/bash_validation.rs::extract_first_command` strips leading `KEY=value` assignments and `validate_read_only` has a special recursive path for `sudo`, but there is no equivalent handling for the common `env KEY=value <cmd>` launcher form. `env` itself appears in `SEMANTIC_READ_ONLY_COMMANDS`, and in read-only mode `validate_read_only("env FOO=bar rm -rf target", PermissionMode::ReadOnly)` sees first command `env`, does not recurse into the inner `rm`, finds no redirection/git state path, and returns `Allow`. The same bypass applies to `env PATH=/tmp cp a b`, `env -i sh -c 'rm file'`, and likely `/usr/bin/env` shebang-style invocations if the first token is a path. Existing tests cover bare env assignments like `FOO=bar ls -la` and sudo-wrapped writes, but not `env`-wrapped writes. 
**Required fix shape:** (a) teach first-command extraction or read-only validation to recognize `env`/`/usr/bin/env` prefixes, skip env flags/assignments, and validate the inner command recursively; (b) block or warn if the inner command cannot be parsed safely (for example `env -i sh -c ...`) under read-only mode; (c) add regressions for `env FOO=bar rm -rf target`, `env PATH=/tmp cp a b`, `env -i npm install`, and safe `env FOO=bar grep x file`; (d) keep the existing `KEY=value cmd` behavior but cover quoted assignment values; (e) report the unwrapped inner command in the block reason so operators can see which payload violated read-only mode. **Why this matters:** permission validation must classify the command that will actually execute, not the wrapper binary. `env` is a routine launcher in scripts; allowing it as read-only lets agents hide write/package/destructive commands behind environment setup and produces false safety evidence. Source: gaebal-gajae dogfood response to Clawhip message `1507246662630903910` on 2026-05-22.
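
The following is a minimal sketch of fix shapes (a), (b), and (e). It assumes commands are already split on whitespace, so quoted payloads such as `sh -c 'rm file'` are not re-parsed and are simply blocked. `is_assignment`, `unwrap_env`, `check_read_only`, and the tiny `READ_ONLY_PROGRAMS` allowlist are hypothetical. The option handling covers only `-i`/`-`, `-u NAME`/`--unset NAME`, and `--`, not every GNU `env` flag.

```rust
/// True for `NAME=value` words where NAME is a valid shell identifier.
fn is_assignment(token: &str) -> bool {
    match token.split_once('=') {
        Some((name, _)) => {
            !name.is_empty()
                && !name.starts_with(|c: char| c.is_ascii_digit())
                && name.chars().all(|c| c == '_' || c.is_ascii_alphanumeric())
        }
        None => false,
    }
}

/// Skips leading `KEY=value` words and an `env` launcher (flags plus assignments),
/// returning the tokens of the command that will actually run.
fn unwrap_env(command: &str) -> Vec<&str> {
    let tokens: Vec<&str> = command.split_whitespace().collect();
    let mut i = 0;
    while i < tokens.len() && is_assignment(tokens[i]) {
        i += 1;
    }
    if i < tokens.len() && (tokens[i] == "env" || tokens[i].ends_with("/env")) {
        i += 1;
        let mut options_done = false;
        while i < tokens.len() {
            let token = tokens[i];
            if is_assignment(token) {
                i += 1;
            } else if !options_done && token == "--" {
                options_done = true;
                i += 1;
            } else if !options_done && (token == "-u" || token == "--unset") {
                i += 2; // these options consume the following NAME
            } else if !options_done && token.starts_with('-') {
                i += 1;
            } else {
                break;
            }
        }
    }
    tokens[i.min(tokens.len())..].to_vec()
}

/// Illustrative allowlist; the real classifier is richer.
const READ_ONLY_PROGRAMS: &[&str] = &["cat", "grep", "head", "ls", "tail", "wc"];

fn check_read_only(command: &str) -> Result<(), String> {
    let inner = unwrap_env(command);
    // A bare `env` or `KEY=value` line only prints or sets the environment.
    let Some(&program) = inner.first() else {
        return Ok(());
    };
    let name = program.rsplit('/').next().unwrap_or(program);
    if READ_ONLY_PROGRAMS.contains(&name) {
        Ok(())
    } else {
        // Report the unwrapped payload, not the `env` wrapper.
        Err(format!("read-only mode blocks inner command `{}`", inner.join(" ")))
    }
}
```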
+
+570. **`read_file` binary detection scans only the first 8192 bytes for NUL, so text-prefixed binaries can pass the binary gate and fail later as generic UTF-8 read errors** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 05:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@a49d8e8`. Active tmux session at probe time: `gajae-issue-324-review-gate-degradation`; no active claw-code implementation session. Channel context from Jobdori pointed at `is_binary_file` line 30. Code inspection confirmed `rust/crates/runtime/src/file_ops.rs::is_binary_file` opens the path, reads exactly one 8192-byte chunk, and returns true only if that first chunk contains NUL. `read_file` then calls `fs::read_to_string` over the whole file. A file with an 8KB text header followed by NUL/non-UTF8 binary payload is therefore not rejected by the explicit binary gate; it reaches `read_to_string` and surfaces as a generic invalid UTF-8/io error instead of the stable `InvalidData: file appears to be binary` contract. Existing coverage writes NUL at byte 0 (`rejects_binary_files`) and does not cover delayed-NUL or delayed-non-UTF8 payloads. **Required fix shape:** (a) make binary/text validation cover the entire allowed read range (bounded by `MAX_READ_SIZE`) or stream decode as UTF-8 while detecting NUL/non-text bytes; (b) map any delayed NUL/non-UTF8 failure to the same stable binary-file error kind/message used by the early gate; (c) add regressions with NUL at byte 8192+, non-UTF8 after an ASCII prefix, and valid large UTF-8 text controls; (d) avoid loading oversized files by preserving the metadata size gate before full scan/decode; (e) include byte offset/evidence in diagnostics only if it does not leak file contents. **Why this matters:** `read_file` should fail predictably on binary artifacts. 
A small text header is common in mixed/generated files, and letting those bypass the binary gate creates inconsistent errors that look like brittle UTF-8/tool failures rather than an intentional binary-read refusal. Source: gaebal-gajae dogfood response to Clawhip message `1507254212403138672` on 2026-05-22.
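
The following is a std-only sketch of fix shapes (a), (b), and (d). It assumes a hypothetical `MAX_READ_SIZE` of 10 MiB, since the real constant's value is not stated here. The metadata size gate runs first. The bounded buffer is then checked for NUL and decoded as UTF-8, and both failures map to the same `InvalidData` / `file appears to be binary` error regardless of byte offset.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical cap; the real `MAX_READ_SIZE` may differ.
const MAX_READ_SIZE: u64 = 10 * 1024 * 1024;

fn binary_file_error() -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, "file appears to be binary")
}

/// Reads a text file, rejecting NUL bytes or invalid UTF-8 anywhere in the
/// file (not just the first 8192 bytes) with one stable error.
fn read_text_file(path: &Path) -> io::Result<String> {
    // Size gate first, so oversized files are never loaded.
    let len = fs::metadata(path)?.len();
    if len > MAX_READ_SIZE {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "file exceeds maximum read size"));
    }
    let mut bytes = Vec::with_capacity(len as usize);
    // `take` bounds memory even if the file grows after the metadata check.
    File::open(path)?.take(MAX_READ_SIZE).read_to_end(&mut bytes)?;
    if bytes.contains(&0) {
        return Err(binary_file_error());
    }
    String::from_utf8(bytes).map_err(|_| binary_file_error())
}
```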
+
+571. **`grep_search` silently drops unreadable or non-UTF8 files, so search results can look complete while binary/permission/encoding failures are hidden from the output contract** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 06:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@8632f90`. Active tmux session at probe time: `gajae-issue-326-backlog-zero-continuation`; no active claw-code implementation session. Channel context included Jobdori's glob-sort #576 finding, so this probe stayed in the adjacent file-ops search surface but chose a different contract gap. Code inspection: `rust/crates/runtime/src/file_ops.rs::grep_search_impl` iterates `collect_search_files`, applies workspace/filter checks, then does `let Ok(file_contents) = fs::read_to_string(&file_path) else { continue; };`. Any file that exists in the search set but cannot be decoded as UTF-8, is transiently unreadable, or hits another read error is silently skipped. The returned `GrepSearchOutput` has `num_files`, `filenames`, optional `content`, and `num_matches`, but no `skipped_files`, `read_errors`, `binary_files`, or `truncated_due_to_errors` field. This differs from `read_file`, which intentionally rejects binary files with a stable error, and it makes grep output appear authoritative even when a subset of candidate files was ignored. Existing grep tests cover happy-path content matches but not a directory containing one matching text file plus one unreadable/binary/non-UTF8 file. 
**Required fix shape:** (a) track skipped candidate files by reason (`binary_or_non_utf8`, `permission_denied`, `read_error`) without leaking file contents; (b) expose counts and optionally bounded path samples in `GrepSearchOutput`; (c) preserve best-effort search behavior if desired, but mark the result as partial/degraded when any candidate is skipped; (d) add regressions with delayed-non-UTF8/binary and permission-denied files proving the output reports skipped counts; (e) align binary/non-UTF8 classification with the `read_file` fix required by #570 so file tools share one text/binary contract. **Why this matters:** grep is an observability surface. If it silently ignores files, agents can conclude “no matches” or report an incomplete match set without any evidence that encoding or permission failures narrowed the search. Source: gaebal-gajae dogfood response to Clawhip message `1507269308277850223` on 2026-05-22.
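
The following is a std-only sketch of fix shapes (a) through (c). It assumes the candidate file list has already been collected. `SkipCounts` and `grep_files` are hypothetical names. Skipped files are counted by reason instead of being dropped, and a caller can treat `total() > 0` as the partial/degraded flag.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Per-reason counts of candidate files that could not be searched.
#[derive(Debug, Default, PartialEq, Eq)]
struct SkipCounts {
    binary_or_non_utf8: usize,
    permission_denied: usize,
    read_error: usize,
}

impl SkipCounts {
    fn total(&self) -> usize {
        self.binary_or_non_utf8 + self.permission_denied + self.read_error
    }
}

/// Returns matching files plus per-reason skip counts.
fn grep_files(candidates: &[PathBuf], needle: &str) -> (Vec<PathBuf>, SkipCounts) {
    let mut matches = Vec::new();
    let mut skipped = SkipCounts::default();
    for path in candidates {
        let bytes = match fs::read(path) {
            Ok(bytes) => bytes,
            Err(err) if err.kind() == io::ErrorKind::PermissionDenied => {
                skipped.permission_denied += 1;
                continue;
            }
            Err(_) => {
                skipped.read_error += 1;
                continue;
            }
        };
        // Same text contract as read_file: NUL or invalid UTF-8 means "binary".
        match String::from_utf8(bytes) {
            Ok(text) if !text.contains('\0') => {
                if text.contains(needle) {
                    matches.push(path.clone());
                }
            }
            _ => skipped.binary_or_non_utf8 += 1,
        }
    }
    (matches, skipped)
}
```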
+
+572. **`grep_search` accepts unknown `output_mode` strings as filename-mode success, so typos like `contents` or `json` silently change the result contract instead of failing fast** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 07:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@ef67aa9`. Active tmux session at probe time: `gajae-pr-327-backlog-zero-review`; no active claw-code implementation session. Channel context included Jobdori's adjacent grep duplicate-context #577 finding; this probe stayed in `grep_search_impl` but checked argument-contract handling. Code inspection: `rust/crates/runtime/src/file_ops.rs::grep_search_impl` sets `output_mode = input.output_mode.clone().unwrap_or_else(|| "files_with_matches")`, then special-cases only `"count"` and `"content"`. Any other string falls through to the default filename-list path and is echoed back as `mode: Some(output_mode.clone())`, with `content: None` and `num_matches: None`. That means misspellings such as `output_mode:"contents"`, `"files_with_match"`, or unsupported values like `"json"` return a successful-looking response whose `mode` claims the unsupported value while the payload shape is actually files-with-matches. There is no enum validation, no `InvalidInput`, and no test for unsupported output modes. **Required fix shape:** (a) validate `output_mode` against an explicit enum (`files_with_matches`, `content`, `count`) before reading files; (b) return a typed `InvalidInput`/machine-readable error listing supported values for unknown modes; (c) make the returned `mode` always match the actual payload semantics; (d) add regressions for `contents`, `files_with_match`, and valid `content`/`count`/default behavior; (e) keep this aligned with CLI `--output-format` enum validation gaps so search/tool contracts fail fast on typos. **Why this matters:** grep output shape drives model/tool parsing. 
A typo should not silently downgrade from content/count mode into filename mode while preserving the bogus mode label; that creates false negative evidence and parser confusion with no visible error. Source: gaebal-gajae dogfood response to Clawhip message `1507276861917368421` on 2026-05-22.
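
The following is a sketch of fix shapes (a) through (c). It assumes the tool input carries `output_mode` as an optional string. `OutputMode` and `resolve_output_mode` are hypothetical. Parsing into an enum up front makes an unknown value fail before any file is read. Echoing `mode.as_str()` instead of the raw input keeps the reported mode tied to the payload shape.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OutputMode {
    FilesWithMatches,
    Content,
    Count,
}

impl OutputMode {
    /// Canonical label to echo back in the output's `mode` field.
    fn as_str(self) -> &'static str {
        match self {
            OutputMode::FilesWithMatches => "files_with_matches",
            OutputMode::Content => "content",
            OutputMode::Count => "count",
        }
    }
}

impl FromStr for OutputMode {
    type Err = String;

    fn from_str(raw: &str) -> Result<Self, Self::Err> {
        match raw {
            "files_with_matches" => Ok(OutputMode::FilesWithMatches),
            "content" => Ok(OutputMode::Content),
            "count" => Ok(OutputMode::Count),
            other => Err(format!(
                "unsupported output_mode `{other}`; expected one of: files_with_matches, content, count"
            )),
        }
    }
}

/// A missing mode defaults to files_with_matches; anything unknown is an error.
fn resolve_output_mode(raw: Option<&str>) -> Result<OutputMode, String> {
    match raw {
        None => Ok(OutputMode::FilesWithMatches),
        Some(value) => value.parse(),
    }
}
```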
+
+573. **`grep_search` treats `head_limit:0` as unlimited, so callers cannot request an empty/page-metadata probe and may accidentally dump the full match set** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 07:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@eb12c3d`. Active tmux session at probe time: `gajae-pr-330-review-v2`; no active claw-code implementation session. Code inspection: `rust/crates/runtime/src/file_ops.rs::apply_limit` is used for grep filenames and content lines. It computes `explicit_limit = limit.unwrap_or(250)`, but if `explicit_limit == 0` it returns the full post-offset item vector with `appliedLimit: None` instead of truncating to zero or rejecting the value. Therefore a caller using `head_limit:0` as a common “no rows, just metadata/count/page preflight” convention gets every matching filename/content line after the offset, bypassing the default 250 cap and potentially injecting a large result into the model context. Existing grep tests pass `head_limit: Some(10)` and do not cover zero-limit semantics. This also makes `appliedLimit` misleading: an explicit limit was supplied, but the output reports no applied limit. **Required fix shape:** (a) define `head_limit` as a positive integer and reject zero with `InvalidInput`, or make zero return an empty result with `appliedLimit:0` consistently; (b) never let zero disable truncation unless a separately named `unlimited:true` escape hatch exists; (c) add regressions for `head_limit:0` in filename and content modes, positive limits, default 250 truncation, and offset-only behavior; (d) ensure `appliedLimit` reflects the caller-supplied limit when accepted; (e) document pagination semantics so wrappers do not accidentally turn metadata probes into full dumps. **Why this matters:** search pagination is a context-window safety control. 
A zero limit should be safe or invalid, not the one value that disables the cap and can flood the assistant with every match. Source: gaebal-gajae dogfood response to Clawhip message `1507284411995918427` on 2026-05-22.
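
The following is a sketch of the "reject zero" option in fix shape (a), with (b) and (d) folded in. `Page`, `apply_head_limit`, and `DEFAULT_HEAD_LIMIT` are hypothetical names, though the default of 250 mirrors the value described above. Zero is rejected before any slicing, so it can never disable truncation, and the accepted limit is always reported.

```rust
/// Default cap applied when the caller omits `head_limit` (250, as described above).
const DEFAULT_HEAD_LIMIT: usize = 250;

#[derive(Debug, PartialEq, Eq)]
struct Page<T> {
    items: Vec<T>,
    applied_limit: usize,
    truncated: bool,
}

fn apply_head_limit<T>(items: Vec<T>, limit: Option<usize>, offset: Option<usize>) -> Result<Page<T>, String> {
    let limit = match limit {
        // Zero must never mean "unlimited".
        Some(0) => return Err("head_limit must be a positive integer".to_string()),
        Some(n) => n,
        None => DEFAULT_HEAD_LIMIT,
    };
    let remaining: Vec<T> = items.into_iter().skip(offset.unwrap_or(0)).collect();
    let truncated = remaining.len() > limit;
    let items: Vec<T> = remaining.into_iter().take(limit).collect();
    Ok(Page { items, applied_limit: limit, truncated })
}
```

The alternative in (a), returning an empty page for zero, would change only the `Some(0)` arm. Either way the value cannot bypass the cap.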
+
+574. **Kimi compatibility helpers only strip the `kimi/` routing prefix, so documented `dashscope/kimi-*` and `moonshot/kimi-*` slugs can leak provider prefixes onto the wire** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 09:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@c1bb355`. Active tmux session at probe time: `omx-issue-2462-madmax-lock-diagnostic`; no active claw-code implementation session. Channel context included Jobdori's reasoning-history #581 finding in `openai_compat.rs`; this probe inspected the same provider file for another Kimi/OpenAI-compatible model-routing contract gap. Code inspection: `model_rejects_is_error_field` intentionally recognizes `dashscope/kimi-k2.5` and `moonshot/kimi-k2.5` by stripping any prefix with `rsplit('/')`, and tests assert those slugs reject `is_error`. But `wire_model_for_base_url` / `strip_routing_prefix` only strip prefixes matching `openai|xai|grok|qwen|kimi`; `dashscope` and `moonshot` are not included. Therefore a request model like `dashscope/kimi-k2.5` can be treated as Kimi for tool-result compatibility while the serialized `model` sent to a DashScope/Moonshot-compatible endpoint remains `dashscope/kimi-k2.5` instead of the expected `kimi-k2.5`. Existing tests cover `strip_routing_prefix("kimi/kimi-k2.5")` but not `dashscope/kimi-k2.5` or `moonshot/kimi-k2.5`, despite those exact prefixes being listed in Kimi compatibility tests. 
**Required fix shape:** (a) decide the supported routing prefixes for Kimi/DashScope/Moonshot and use one shared prefix-strip helper for compatibility checks and wire model serialization; (b) add `dashscope` and `moonshot` where appropriate, or reject those prefixed slugs early with a typed configuration error; (c) add tests proving `wire_model_for_base_url` and `strip_routing_prefix` produce `kimi-k2.5` for every documented Kimi prefix; (d) keep OpenRouter/non-default OpenAI base-url slash-slug preservation semantics intact; (e) update docs/config examples so model slugs and provider routing prefixes are unambiguous. **Why this matters:** provider compatibility logic and wire serialization must agree on the model identity. If one path says “this is Kimi” while another sends a prefixed slug that the backend may not understand, users get avoidable 400/model-not-found errors that look like provider instability. Source: gaebal-gajae dogfood response to Clawhip message `1507307060998443059` on 2026-05-22.
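
The following is a sketch of the shared helper in fix shapes (a) and (b). `ROUTING_PREFIXES` and `is_kimi_model` are hypothetical. Whether `dashscope` and `moonshot` belong in the list is exactly the decision (a) asks for. Compatibility detection and wire serialization both call the same function, so they cannot disagree about model identity. Unlisted prefixes (the `some-vendor/` slug below is a placeholder) pass through untouched, which preserves the slash-slug behavior in (d).

```rust
/// Prefixes treated as routing hints and stripped before the model goes on the wire.
/// Including `dashscope` and `moonshot` is the assumption being illustrated.
const ROUTING_PREFIXES: &[&str] = &["openai", "xai", "grok", "qwen", "kimi", "dashscope", "moonshot"];

/// Strips one known routing prefix; unknown prefixes are preserved verbatim.
fn strip_routing_prefix(model: &str) -> &str {
    match model.split_once('/') {
        Some((prefix, rest)) if ROUTING_PREFIXES.contains(&prefix) => rest,
        _ => model,
    }
}

/// Compatibility checks reuse the same helper, so detection and serialization agree.
fn is_kimi_model(model: &str) -> bool {
    strip_routing_prefix(model).starts_with("kimi-")
}
```

Unlike the current `rsplit('/')` check, `is_kimi_model` here no longer recognizes Kimi behind an unlisted prefix. That is the trade-off of making both paths share one definition.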
+
+575. **`grep_search` walks `.git`, `node_modules`, `target`, and other heavy directories even though `glob_search` skips them, so grep can waste startup/context time scanning generated/vendor files by default** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 10:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@819f67b`. Active tmux sessions at probe time: none. Code inspection: `rust/crates/runtime/src/file_ops.rs` defines `GLOB_SEARCH_IGNORED_DIRS` (`.git`, `node_modules`, `.build`, `target`, `dist`, `coverage`) and `glob_search_impl` applies it via `WalkDir::filter_entry(|entry| !should_skip_glob_dir(entry))`. But `grep_search_impl` calls `collect_search_files(&base_path)`, and `collect_search_files` uses raw `WalkDir::new(base_path)` with no `filter_entry` or ignore list. As a result, a default grep over a repo can descend through `.git/objects`, Rust `target/`, vendored `node_modules`, coverage, and dist outputs, then silently skip unreadable/non-UTF8 files (#571) or spend time decoding generated blobs before any model-visible answer. Existing tests prove `glob_search_skips_common_heavy_directories`, but no equivalent grep test exists. **Required fix shape:** (a) make `collect_search_files` share the same ignored-directory policy as `glob_search`, or define an explicit grep ignore policy with opt-in overrides; (b) add skipped/ignored directory counts to grep output so operators know scope was pruned; (c) add regressions where `.git`/`target` contain matching files but default grep ignores them, while direct path-to-file behavior remains explicit; (d) allow explicit include/override if users really want generated/vendor search; (e) align docs/tool schema so glob and grep default search scope semantics match. **Why this matters:** grep is a high-frequency dogfood/search tool. 
Scanning generated and VCS internals by default creates startup friction, noisy false matches, and token waste, especially in Rust/JS workspaces with huge `target` or `node_modules` trees. Source: gaebal-gajae dogfood response to Clawhip message `1507322157561151671` on 2026-05-22.
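
The following is a std-only sketch of fix shapes (a) and (b). It uses `fs::read_dir` in place of `WalkDir` and assumes grep adopts the same six directory names listed for `GLOB_SEARCH_IGNORED_DIRS`. The function signature and the `pruned` counter are hypothetical. A base path that names a file directly is always searched, so explicit targets keep working.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Same names `glob_search` skips, per the finding above.
const SEARCH_IGNORED_DIRS: &[&str] = &[".git", "node_modules", ".build", "target", "dist", "coverage"];

/// Collects files under `base`, pruning ignored directories and counting them.
/// Symlinked entries below `base` are neither followed nor collected in this sketch.
fn collect_search_files(base: &Path, files: &mut Vec<PathBuf>, pruned: &mut usize) -> io::Result<()> {
    if base.is_file() {
        // An explicit file path is searched even if it lives under an ignored dir.
        files.push(base.to_path_buf());
        return Ok(());
    }
    for entry in fs::read_dir(base)? {
        let entry = entry?;
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            let ignored = entry
                .file_name()
                .to_str()
                .is_some_and(|name| SEARCH_IGNORED_DIRS.contains(&name));
            if ignored {
                *pruned += 1;
                continue;
            }
            collect_search_files(&entry.path(), files, pruned)?;
        } else if file_type.is_file() {
            files.push(entry.path());
        }
    }
    Ok(())
}
```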
+
+576. **`glob_search` brace expansion has no fan-out cap, so a single pattern with large brace groups can explode into thousands of `WalkDir` traversals before any timeout/result limit applies** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 10:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@ab5754a`. Active tmux sessions at probe time: none. Code inspection: `rust/crates/runtime/src/file_ops.rs::expand_braces` recursively expands the first `{...}` group by splitting on every comma and calling itself on each alternative. `glob_search_impl` then iterates every expanded pattern and creates a separate `WalkDir` traversal for each one before applying the hard `take(100)` result cap. There is no maximum number of alternatives, no recursion-depth/expanded-pattern cap, and no early deduplication by shared walk root. A user/model-supplied pattern like `**/*.{a,b,c,...hundreds}` or multiple brace groups can produce hundreds/thousands of full repo walks even though only 100 filenames can be returned. Existing tests cover a tiny `*.{rs,toml}` happy path and unmatched braces, but not expansion fan-out or timeout behavior. **Required fix shape:** (a) impose a small maximum expanded-pattern count and return `InvalidInput`/typed error when exceeded; (b) deduplicate shared walk roots or compile multiple patterns per root so alternatives do not trigger repeated full-tree walks; (c) include `expanded_pattern_count` and truncation/fanout metadata in diagnostics; (d) add regressions for large single brace groups, multiple nested brace groups, and normal small brace patterns; (e) keep the final 100-result cap but do not rely on it as protection against pre-result traversal explosions. **Why this matters:** glob is a model-facing discovery tool. 
Unbounded brace fan-out turns a compact pattern into many expensive filesystem walks, creating startup friction and an easy self-inflicted DoS before the existing result limit can help. Source: gaebal-gajae dogfood response to Clawhip message `1507329710257078353` on 2026-05-22.
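
The following is a sketch of fix shape (a) with nesting-aware splitting. It assumes a hypothetical `max_patterns` argument: the `expand_braces` described above takes only the pattern, and the right cap is a policy choice. Expansion stops as soon as the cap would be exceeded, so an oversized pattern is rejected before any `WalkDir` traversal starts. An unmatched `{` is kept literally.

```rust
/// Splits the first top-level `{...}` group into (prefix, alternatives, suffix).
/// Returns `None` when there is no complete group, so the pattern stays literal.
fn split_first_group(pattern: &str) -> Option<(&str, Vec<&str>, &str)> {
    let open = pattern.find('{')?;
    let mut depth = 0usize;
    let mut start = open + 1;
    let mut alternatives = Vec::new();
    for (offset, ch) in pattern[open..].char_indices() {
        let index = open + offset;
        match ch {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    alternatives.push(&pattern[start..index]);
                    return Some((&pattern[..open], alternatives, &pattern[index + 1..]));
                }
            }
            // Only commas at the outer level separate alternatives.
            ',' if depth == 1 => {
                alternatives.push(&pattern[start..index]);
                start = index + 1;
            }
            _ => {}
        }
    }
    None
}

/// Expands brace groups, failing once more than `max_patterns` results would be produced.
fn expand_braces(pattern: &str, max_patterns: usize) -> Result<Vec<String>, String> {
    let mut out = Vec::new();
    expand_into(pattern, max_patterns, &mut out)?;
    Ok(out)
}

fn expand_into(pattern: &str, max_patterns: usize, out: &mut Vec<String>) -> Result<(), String> {
    match split_first_group(pattern) {
        None => {
            if out.len() >= max_patterns {
                return Err(format!("brace expansion exceeds {max_patterns} patterns"));
            }
            out.push(pattern.to_string());
        }
        Some((prefix, alternatives, suffix)) => {
            for alternative in alternatives {
                expand_into(&format!("{prefix}{alternative}{suffix}"), max_patterns, out)?;
            }
        }
    }
    Ok(())
}
```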
+
+577. **`WebFetch` upgrades only the initial HTTP URL, but follows redirects without revalidating the final target, so HTTPS pages can redirect the tool into localhost/private HTTP endpoints** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 11:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@571b4c2`. Active tmux sessions at probe time: none. Code inspection: `rust/crates/tools/src/lib.rs::normalize_fetch_url` upgrades an initial `http://` URL to HTTPS unless the host is exactly `localhost`, `127.0.0.1`, or `::1`. But `build_http_client` enables `redirect(reqwest::redirect::Policy::limited(10))`, and `execute_web_fetch` sends the request once, then trusts `response.url()` as the final URL. There is no redirect policy that re-runs scheme/host/IP checks for each hop, no block for redirects from public HTTPS to `http://localhost`, `http://127.0.0.1`, RFC1918/link-local/metadata IPs, or non-HTTPS final URLs. As a result, a model-requested public URL can legally redirect the tool into local/private network resources even though direct non-local HTTP is upgraded and no SSRF boundary is documented. **Required fix shape:** (a) implement a custom redirect policy that validates every `Location` target before following it; (b) reject redirects to localhost, loopback, link-local, RFC1918/private ranges, cloud metadata hosts/IPs, and/or downgrade-to-HTTP unless explicitly allowed; (c) resolve hostnames before fetch or after redirect to prevent DNS-based private IP hops; (d) add tests with public HTTPS -> localhost/private/metadata redirects and safe HTTPS->HTTPS redirects; (e) expose final URL plus redirect-block reason in the `WebFetch` output/error without leaking response bodies. **Why this matters:** WebFetch is an external content tool. 
Redirects are part of the fetch boundary, not a postscript; without per-hop validation, a harmless-looking public URL can become an internal network probe or local service read. Source: gaebal-gajae dogfood response to Clawhip message `1507337256107642961` on 2026-05-22.
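
The following is a std-only sketch of the per-hop check in fix shapes (a) and (b). It assumes the caller has already parsed each redirect `Location` into a scheme and host. `validate_redirect_hop` and `is_private_ip` are hypothetical. In reqwest, a function like this could be called from a `reqwest::redirect::Policy::custom` closure using `attempt.url()`, returning `attempt.error(...)` on failure. It only inspects host literals, so (c), checking the addresses a hostname actually resolves to, is still required.

```rust
use std::net::IpAddr;

/// Loopback, private, link-local (which covers 169.254.169.254), unspecified, and broadcast.
fn is_private_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified() || v4.is_broadcast()
        }
        IpAddr::V6(v6) => {
            let first = v6.segments()[0];
            v6.is_loopback()
                || v6.is_unspecified()
                || (first & 0xfe00) == 0xfc00 // unique local, fc00::/7
                || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
                || v6.to_ipv4_mapped().is_some_and(|v4| is_private_ip(IpAddr::V4(v4)))
        }
    }
}

/// Validates one redirect target before it is followed.
fn validate_redirect_hop(scheme: &str, host: &str) -> Result<(), String> {
    if !scheme.eq_ignore_ascii_case("https") {
        return Err(format!("redirect to non-HTTPS scheme `{scheme}` blocked"));
    }
    let host = host.trim_start_matches('[').trim_end_matches(']').to_ascii_lowercase();
    if host == "localhost" || host.ends_with(".localhost") || host == "metadata.google.internal" {
        return Err(format!("redirect to local/metadata host `{host}` blocked"));
    }
    if let Ok(ip) = host.parse::<IpAddr>() {
        if is_private_ip(ip) {
            return Err(format!("redirect to private address {ip} blocked"));
        }
    }
    Ok(())
}
```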
+
+578. **`WebSearch` accepts `CLAWD_WEB_SEARCH_BASE_URL` with any scheme/host and follows the shared redirect policy, so a local environment variable can turn search into a private-network fetch without guardrails** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 11:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@bff67c2`. Active tmux session at probe time: `gajae-issue-339-backlog-zero-candidate-selection`; no active claw-code implementation session. Code inspection: `rust/crates/tools/src/lib.rs::build_search_url` checks `CLAWD_WEB_SEARCH_BASE_URL`, parses it with `reqwest::Url::parse`, appends the `q` query parameter, and otherwise returns it unchanged. Unlike `WebFetch`, there is no `normalize_fetch_url`-style scheme upgrade or host validation at all. `execute_web_search` then uses the same `build_http_client` as WebFetch, which follows up to 10 redirects. A poisoned or stale local env var can point search at `http://localhost`, RFC1918 services, metadata endpoints, or a redirector to those targets. The tool will fetch that endpoint and parse generic links from it, because domain allow/block filters apply only to the extracted result URLs, not to the search endpoint itself. Existing tests/logic focus on hit filtering, not search-provider endpoint validation. **Required fix shape:** (a) validate `CLAWD_WEB_SEARCH_BASE_URL` at parse time against allowed schemes/hosts or require an explicit unsafe/local-test opt-in; (b) apply the same per-redirect target validation required by #577 to WebSearch; (c) distinguish configured test search providers from production search with a typed config/source field in output; (d) add regressions for env base URLs pointing to localhost/private/metadata and safe HTTPS test endpoints; (e) document the env var as test-only or constrain it to trusted public search domains. **Why this matters:** search is often treated as a low-risk public-web tool. 
An unvalidated base URL makes it an environment-controlled internal fetch surface, and result-domain filters do not protect the initial request or redirects. Source: gaebal-gajae dogfood response to Clawhip message `1507344805167108257` on 2026-05-22.
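
The following is a sketch of fix shapes (a) and (c). It assumes a hypothetical trusted-host list (`search.example.com` is a placeholder) and a hypothetical local-test opt-in flag. `EndpointSource` and `validate_search_base` are not existing names. The authority parsing is deliberately crude and fails closed: userinfo, IPv6 literals, and any unlisted host are rejected unless the opt-in is set. The returned source tag is what (c) would surface in tool output. Per-redirect validation from (b) is still needed on top of this.

```rust
/// Where the search endpoint came from, for reporting in tool output.
#[derive(Debug, PartialEq, Eq)]
enum EndpointSource {
    TrustedOverride,
    LocalTestOverride,
}

/// Placeholder allowlist; a real deployment would choose its own providers.
const TRUSTED_SEARCH_HOSTS: &[&str] = &["search.example.com"];

/// Validates a `CLAWD_WEB_SEARCH_BASE_URL` value before any request is sent.
fn validate_search_base(raw: &str, allow_local_test: bool) -> Result<EndpointSource, String> {
    let (scheme, rest) = raw
        .split_once("://")
        .ok_or_else(|| format!("search base URL `{raw}` has no scheme"))?;
    // The authority ends at the first '/', '?', or '#'.
    let authority = rest
        .split(|c: char| matches!(c, '/' | '?' | '#'))
        .next()
        .unwrap_or_default();
    // Drop a trailing ":port"; IPv6 literals and userinfo never match the allowlist.
    let host = authority
        .rsplit_once(':')
        .map_or(authority, |(host, _port)| host)
        .to_ascii_lowercase();
    if scheme.eq_ignore_ascii_case("https") && TRUSTED_SEARCH_HOSTS.contains(&host.as_str()) {
        return Ok(EndpointSource::TrustedOverride);
    }
    if allow_local_test {
        return Ok(EndpointSource::LocalTestOverride);
    }
    Err(format!("search base URL host `{host}` over `{scheme}` is not a trusted endpoint"))
}
```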
+
+579. **Streaming flushes pending markdown at `ContentBlockStop` before pending tool calls are rendered, so text immediately followed by a tool call can be displayed before the tool-use event that actually interrupted/ended the content block** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 12:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@d8864ff`. Active tmux sessions at probe time: none. Channel context included Jobdori's #587 finding that `MarkdownStreamState::push` can hold prose until flush; this probe inspected the caller ordering. Code inspection: in `rust/crates/rusty-claude-cli/src/main.rs` stream handling, `ContentBlockDelta::TextDelta` buffers text through `markdown_stream.push`. On `ApiStreamEvent::ContentBlockStop`, the code first calls `markdown_stream.flush(&renderer)` and writes any pending prose, then only afterward checks `pending_tool.take()` and renders `format_tool_call_start`. If a provider emits a text block that has no safe boundary (single paragraph or unclosed markdown) followed by a tool-use block, the held text is flushed at the stop event just before the pending tool call display. Operators watching the terminal can see the assistant's prose burst immediately before the tool start marker, even though the next actionable event is the tool call; any timing/ordering cue that the model paused to call a tool is blurred by the delayed text flush. There is no test asserting streamed text/tool display ordering when markdown buffering holds content until block stop. 
**Required fix shape:** (a) make stream rendering preserve block/event order explicitly, perhaps flushing text at the exact text block stop and rendering tool-use starts at their own block starts/stops with clear separators; (b) when a text block is followed by a tool block, include a newline/phase boundary so delayed text does not visually merge into the tool call; (c) add streaming tests with single-paragraph text immediately followed by a tool call and with safe-boundary text, asserting terminal output order and separators; (d) consider the #587 word-boundary fallback so prose is not entirely held until the tool boundary; (e) keep persisted `AssistantEvent` ordering aligned with displayed output. **Why this matters:** streaming UI is also an event log. If buffered prose appears only when the next block stops and visually collides with a tool-use marker, users cannot tell whether the model is still speaking, has switched to tool execution, or the stream stalled. Source: gaebal-gajae dogfood response to Clawhip message `1507352360379482193` on 2026-05-22.
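
The following is a simplified model of fix shapes (a) and (b). Block kinds are tracked explicitly. Text is flushed at its own block stop with a guaranteed line break, and the tool marker is rendered only when the tool block stops. The `Event`/`OpenBlock` types and the `[tool]` marker are hypothetical stand-ins for the real `ApiStreamEvent` handling and `format_tool_call_start` output. Markdown rendering and the word-boundary fallback from (d) are omitted.

```rust
/// Hypothetical, simplified stream events.
enum Event {
    TextStart,
    TextDelta(String),
    ToolStart(String),
    BlockStop,
}

/// The block currently being streamed.
enum OpenBlock {
    Text(String),
    Tool(String),
}

fn render(events: Vec<Event>) -> String {
    let mut out = String::new();
    let mut open: Option<OpenBlock> = None;
    for event in events {
        match event {
            Event::TextStart => open = Some(OpenBlock::Text(String::new())),
            Event::ToolStart(name) => open = Some(OpenBlock::Tool(name)),
            Event::TextDelta(delta) => {
                if let Some(OpenBlock::Text(buffer)) = open.as_mut() {
                    buffer.push_str(&delta);
                }
            }
            Event::BlockStop => match open.take() {
                Some(OpenBlock::Text(text)) => {
                    out.push_str(&text);
                    // Phase boundary: buffered prose never runs into the next marker.
                    if !text.is_empty() && !text.ends_with('\n') {
                        out.push('\n');
                    }
                }
                Some(OpenBlock::Tool(name)) => {
                    out.push_str("[tool] ");
                    out.push_str(&name);
                    out.push('\n');
                }
                None => {}
            },
        }
    }
    out
}
```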
+
+580. **Streaming safe-boundary detection opens a fence on a triple-backtick line indented up to three spaces but will not close it if the closing fence is indented more than three spaces, so quoted/list code blocks can freeze streaming until final flush** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 12:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@9f762b2`. Active tmux sessions at probe time: `gajae-pr-340-backlog-zero-candidate-selection-final-review`, `gajae-pr-340-backlog-zero-rereview2`; no active claw-code implementation session. Code inspection: `rust/crates/rusty-claude-cli/src/render.rs::parse_fence_opener` counts only literal spaces and accepts fence openers with indent <=3. Once inside a fence, `line_closes_fence` also requires the closing marker indent <=3. That matches CommonMark top-level fenced blocks, but streaming markdown from models often nests code fences under bullets, block quotes, or copied indentation where both opener and closer are indented four or more spaces. In that case the opener with >3 spaces is ignored (fine), but mixed/normalized output can still open at <=3 and then fail to close if the closer is rendered with extra list/quote indentation; `open_fence` remains set, `last_boundary` stops updating, and `MarkdownStreamState::push` returns `None` until `flush()`. Existing tests cover nested fence marker lengths and tilde/backtick distinction, but not list/blockquote/indented fence streaming behavior or a mismatched indentation close. 
**Required fix shape:** (a) decide whether the stream boundary detector should follow strict CommonMark or be tolerant for model-generated/list-nested fences; (b) if tolerant, close fences when the same marker appears after quote/list prefixes or consistent extra indentation, while still avoiding false closes inside literal code; (c) add tests for bullet-nested fences, blockquote fences, and an opener at indent <=3 with a closer at indent >3; (d) include a max-buffer/word-boundary fallback from #587 so a malformed fence cannot suppress all output indefinitely; (e) keep rendering tests aligned with boundary tests so visual output and streaming segmentation share one markdown dialect. **Why this matters:** model answers frequently include code inside bullets or quoted explanations. If the stream boundary tracker misses the closing fence, the terminal looks stalled even though deltas are arriving, turning a formatting edge case into startup/streaming opacity. Source: gaebal-gajae dogfood response to Clawhip message `1507359904959172758` on 2026-05-22.
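
The following is a sketch of the tolerant option in fix shape (b). It assumes the tracker remembers the opener's marker character and length. `strip_container_prefix` and `closes_fence` are hypothetical helpers. Leading whitespace and `>` quote markers are stripped before the marker check, so a closer under a bullet or block quote still ends the fence. A line with trailing text after the marker does not close it. The cost is the false-close risk that (b) warns about: an indented bare fence line inside literal code would also close the block.

```rust
/// Strips leading whitespace and block-quote markers, e.g. "  >  ```" -> "```".
fn strip_container_prefix(line: &str) -> &str {
    line.trim_start_matches(|c: char| c == ' ' || c == '\t' || c == '>')
}

/// True if `line` closes a fence opened with `marker_len` copies of `marker`
/// ('`' or '~'), at any indentation and under any quote depth.
fn closes_fence(line: &str, marker: char, marker_len: usize) -> bool {
    let body = strip_container_prefix(line);
    let run = body.chars().take_while(|&c| c == marker).count();
    // A closer needs at least as many markers as the opener and only whitespace after them.
    run >= marker_len && body[run * marker.len_utf8()..].trim().is_empty()
}
```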
+
+581. **Terminal markdown renderer drops styling inside link labels because link text is accumulated as raw text while emphasis/inline-code events inside links are flattened before final rendering** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 13:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@8c4b33d`. Active tmux sessions at probe time: `gajae-issue-341-post-merge-cleanup-retention`, `omc-issue-3087-windows-hud-execfile`; no active claw-code implementation session. Code inspection: `rust/crates/rusty-claude-cli/src/render.rs::RenderState::append_raw` diverts all text into `link_stack.last_mut().text` whenever a link is open. `Event::Start(Tag::Emphasis)`, `Strong`, and inline `Code` still update global style/rendering, but when the current context is a link, the rendered text is appended to `LinkState.text` as plain accumulated label text. At `Event::End(TagEnd::Link)`, the renderer emits one uniformly underlined blue `[label](destination)` string. This means markdown like `[**important** docs](https://...)`, `[*emphasis* link](...)`, or ``[`code` API](...)`` loses bold/italic/inline-code styling inside the label, and ANSI escape sequences from nested inline code can be embedded into the label string before the final link styling in ways not covered by tests. Existing link coverage only asserts a simple `[Claw](url)` label. **Required fix shape:** (a) decide whether link labels should preserve nested inline styles or intentionally flatten them; (b) if preserving, store rendered segments or nested style spans in `LinkState` instead of one raw label string; (c) if flattening, strip ANSI/control styling from accumulated labels before final link render and document the limitation; (d) add tests for bold/emphasis/code inside links and nested links/images where the parser permits them; (e) ensure streaming chunk boundaries do not split ANSI state inside a link label. 
**Why this matters:** terminal rendering is the user-facing event log. Links often carry emphasized API names or inline-code identifiers; flattening or double-styling labels makes important documentation output less readable and can leak nested ANSI into one final styled link span. Source: gaebal-gajae dogfood response to Clawhip message `1507367459689201715` on 2026-05-22.
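
The following is a sketch of the flattening option in fix shape (c). ANSI CSI sequences (`ESC [`, parameter bytes, then a final byte in `@`..=`~`) are removed from the accumulated label before the final link style is applied, so nested inline-code or emphasis styling cannot leak into the link span. `strip_ansi` and `render_link` are hypothetical. The SGR codes used (underline `4`, blue `34`, reset `0`) are standard parameters, not necessarily the renderer's actual palette.

```rust
/// Removes ANSI CSI escape sequences (ESC '[' ... final byte in '@'..='~').
fn strip_ansi(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\u{1b}' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Skip parameter/intermediate bytes up to and including the final byte.
            for next in chars.by_ref() {
                if ('@'..='~').contains(&next) {
                    break;
                }
            }
        } else {
            out.push(c);
        }
    }
    out
}

/// Renders `[label](destination)` with one uniform style after flattening nested styling.
fn render_link(raw_label: &str, destination: &str) -> String {
    let label = strip_ansi(raw_label);
    format!("\u{1b}[4;34m[{label}]({destination})\u{1b}[0m")
}
```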
+
+582. **Terminal table renderer ignores markdown alignment markers, so right/center-aligned numeric columns are rendered left-aligned with no test coverage** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 13:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@ccb319a`. Active tmux session at probe time: `omx-issue-2466-ultragoal-get-goal-recovery-resume`; no active claw-code implementation session. Code inspection: `rust/crates/rusty-claude-cli/src/render.rs` starts tables with `Event::Start(Tag::Table(..))` but discards the alignment vector carried by pulldown-cmark. `TableState` stores headers/rows/current cells only, and `render_table_row` always writes the cell text followed by padding spaces (`left` alignment) regardless of whether the markdown separator uses `:---`, `---:`, or `:---:`. Existing test `renders_tables_with_alignment` checks column width alignment only, using `| ---- | ----- |`, and does not cover right/center alignment semantics. **Required fix shape:** (a) store pulldown-cmark table alignment metadata in `TableState`; (b) apply left/center/right padding in `render_table_row` for headers and body cells; (c) add tests for `| left | center | right |` with `| :--- | :---: | ---: |`, including numeric columns; (d) ensure `visible_width` still handles ANSI-styled header cells correctly when padding is split before/after centered text; (e) document whether terminal rendering intentionally supports GitHub-style table alignment. **Why this matters:** tables are common in status reports, benchmarks, and cost/test summaries. Dropping right alignment makes numeric columns harder to scan and means the terminal renderer does not faithfully reflect markdown that users expect from GitHub/Discord-style output. Source: gaebal-gajae dogfood response to Clawhip message `1507375004671672391` on 2026-05-22.
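
The following is a sketch of fix shapes (b) and (d). It uses a hypothetical `Align` enum that mirrors pulldown-cmark's `Alignment` variants (`None`, `Left`, `Center`, `Right`) without depending on the crate. `visible_width` here only skips ANSI CSI sequences and counts one column per `char`, which is an approximation for wide glyphs. Centered padding puts the odd extra space on the right.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Align {
    None,
    Left,
    Center,
    Right,
}

/// Display width ignoring ANSI CSI sequences (approximation: one column per char).
fn visible_width(text: &str) -> usize {
    let mut width = 0;
    let mut in_escape = false;
    for c in text.chars() {
        if in_escape {
            // A CSI sequence ends at a final byte in '@'..='~'; its '[' opener is excluded.
            if c != '[' && ('@'..='~').contains(&c) {
                in_escape = false;
            }
        } else if c == '\u{1b}' {
            in_escape = true;
        } else {
            width += 1;
        }
    }
    width
}

/// Pads `cell` to `width` columns according to the column's markdown alignment.
fn pad_cell(cell: &str, width: usize, align: Align) -> String {
    let gap = width.saturating_sub(visible_width(cell));
    let (left, right) = match align {
        Align::Right => (gap, 0),
        Align::Center => (gap / 2, gap - gap / 2),
        Align::None | Align::Left => (0, gap),
    };
    format!("{}{}{}", " ".repeat(left), cell, " ".repeat(right))
}
```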
+
+583. **`PermissionEnforcer::check_file_write` uses string-prefix workspace checks, so relative traversal like `../../outside` is allowed as if it were inside the workspace** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 14:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@aa9efe5`. Active tmux session at probe time: `omx-issue-2468-autoresearch-docs-css`; no active claw-code implementation session. Channel context included Jobdori's #591 `Prompt`-mode bypass in `permission_enforcer.rs`, so this probe inspected the adjacent file-write boundary helper. Code inspection: `rust/crates/runtime/src/permission_enforcer.rs::is_within_workspace` treats relative paths by string-concatenating `format!("{workspace_root}/{path}")`, then checks `normalized.starts_with(root) || normalized == workspace_root.trim_end_matches('/')`. It never normalizes `.`/`..`, canonicalizes symlinks, or resolves the candidate path. Therefore `check_file_write("../../outside/secret.txt", "/workspace")` builds `/workspace/../../outside/secret.txt`, which still starts with `/workspace/`, and returns `Allowed` in `WorkspaceWrite` mode even though the resolved target is outside the workspace. Existing tests cover absolute outside paths and simple relative `src/main.rs`, but not relative traversal escapes. 
**Required fix shape:** (a) replace string-prefix checks with path normalization/canonicalization against a workspace root, resolving `.`/`..` before comparison; (b) reject escaping relative paths even if the file does not exist yet, using lexical normalization plus parent canonicalization where needed; (c) add regressions for `../outside.txt`, `../../outside/secret.txt`, `src/../safe.txt`, and symlink escape cases; (d) share the same boundary primitive with runtime file ops and bash path-scope checks so permission enforcement cannot drift; (e) include resolved/candidate path evidence in denial messages without leaking sensitive contents. **Why this matters:** workspace-write mode's core promise is “project files only.” A string-prefix check lets a caller smuggle outside writes through relative traversal while the permission enforcer reports success, defeating the boundary before lower-level file tools can help. Source: gaebal-gajae dogfood response to Clawhip message `1507382558340681779` on 2026-05-22.
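
The following is a minimal sketch of fix (a). It resolves `.` and `..` lexically and then performs a component-wise `Path::starts_with` comparison. `is_within_workspace_lexical` is a hypothetical name. The sketch never touches the filesystem, so symlink escapes (fix (c)) would still require canonicalizing the nearest existing parent on top of it.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolves `.` and `..` without touching the filesystem.
/// `..` at the filesystem root stays at the root.
fn lexical_normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::CurDir => {}
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Hypothetical replacement for the string-prefix check: join relative
/// candidates onto the root, normalize, then compare whole components.
fn is_within_workspace_lexical(candidate: &str, workspace_root: &str) -> bool {
    let root = lexical_normalize(Path::new(workspace_root));
    // `join` replaces the base when `candidate` is absolute.
    let resolved = lexical_normalize(&root.join(candidate));
    resolved.starts_with(&root)
}
```

Because the comparison works on whole components, a sibling directory such as `/workspace2/file` also fails to match the root.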
+
+584. **`WorkerRegistry::restart` clears prompt/trust state but leaves old events and the original `created_at`, so post-restart timeout evidence can mix prior-attempt blockers with the new boot attempt** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 14:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@ac09033`. Active tmux sessions at probe time: none. Channel context included Jobdori's worker-boot #592 finding, so this probe stayed in `worker_boot.rs` but checked restart lifecycle continuity. Code inspection: `rust/crates/runtime/src/worker_boot.rs::restart` resets `status`, trust flags, prompt fields, `last_error`, attempts, and `prompt_in_flight`, then appends `WorkerEventKind::Restarted`. It does not reset `worker.created_at`, does not create a per-attempt `boot_started_at`, does not increment a restart/attempt counter, and does not clear or partition previous events. Later `observe_startup_timeout` computes `elapsed = now.saturating_sub(worker.created_at)` and derives `trust_prompt_detected`, `tool_permission_detected`, and `ready_for_prompt_detected` by scanning the entire `worker.events` history. A worker that previously hit trust/tool/ready evidence, then restarts cleanly, can have the new timeout classified with old pre-restart evidence and an elapsed time anchored to the original creation. Existing `restart_and_terminate_reset_or_finish_worker` only asserts prompt fields/attempts reset; it does not assert event scoping or elapsed-time reset. 
**Required fix shape:** (a) add `current_attempt_started_at`/`boot_started_at` and `attempt_index` fields; (b) set them on create and restart and use them for startup timeout elapsed/command_started_at; (c) scope timeout evidence scans to events since the current attempt or store per-attempt cached evidence; (d) add tests where trust/tool evidence exists before restart but not after, proving the post-restart timeout does not inherit stale blockers; (e) include attempt index in worker events so dashboards can separate old and new boot attempts. **Why this matters:** restart is supposed to produce a fresh boot attempt. If old events and timestamps remain authoritative, operators see misleading “stalled for hours / trust required” evidence for a brand-new restart and recovery automation can choose the wrong next action. Source: gaebal-gajae dogfood response to Clawhip message `1507390103956226228` on 2026-05-22.
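
The following is a minimal sketch of fixes (a) through (c). It assumes every event is tagged with the attempt that produced it. `Worker`, `boot_started_at`, and `attempt_index` are illustrative only; the real `worker_boot.rs` types carry many more fields and a richer `WorkerEventKind`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EventKind {
    TrustRequired,
    Restarted,
}

#[derive(Debug, Clone, Copy)]
struct WorkerEvent {
    attempt_index: u32,
    kind: EventKind,
}

/// Hypothetical, trimmed-down worker with per-attempt bookkeeping.
struct Worker {
    attempt_index: u32,
    boot_started_at: u64,
    events: Vec<WorkerEvent>,
}

impl Worker {
    fn new(now: u64) -> Self {
        Worker { attempt_index: 0, boot_started_at: now, events: Vec::new() }
    }

    /// Records an event under the current attempt.
    fn record(&mut self, kind: EventKind) {
        let attempt_index = self.attempt_index;
        self.events.push(WorkerEvent { attempt_index, kind });
    }

    /// Starts a fresh attempt. Old events stay for audit but stop counting
    /// as timeout evidence.
    fn restart(&mut self, now: u64) {
        self.attempt_index += 1;
        self.boot_started_at = now;
        self.record(EventKind::Restarted);
    }

    /// Elapsed time since the current boot attempt, not since creation.
    fn startup_elapsed(&self, now: u64) -> u64 {
        now.saturating_sub(self.boot_started_at)
    }

    /// Timeout evidence scoped to the current attempt only.
    fn current_attempt_saw(&self, kind: EventKind) -> bool {
        self.events
            .iter()
            .any(|e| e.attempt_index == self.attempt_index && e.kind == kind)
    }
}
```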
+
+585. **Prompt-misdelivery auto-recovery arms replay without clearing `last_error`, so a worker can be `ReadyForPrompt` while still carrying a stale prompt-delivery failure** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 15:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@b0bca2e`. Active tmux session at probe time: `gajae-pr-346-session-gateway-continuity-digest-review`; no active claw-code implementation session. Code inspection: in `rust/crates/runtime/src/worker_boot.rs::observe`, prompt misdelivery sets `worker.last_error = Some(WorkerFailureKind::PromptDelivery)` and `prompt_in_flight = false`, then pushes a `PromptMisdelivery` event. If `worker.auto_recover_prompt_misdelivery` is true, it sets `worker.replay_prompt = worker.last_prompt.clone()` and `worker.status = WorkerStatus::ReadyForPrompt`, then pushes `PromptReplayArmed`; however it never clears or demotes `last_error`. `await_ready` then returns `ready: true` and `last_error: Some(PromptDelivery)`, so callers/dashboards can see a ready replay state and a failure state simultaneously. Later `send_prompt` clears `last_error`, but until replay is actually sent the state snapshot is contradictory. Existing replay tests assert `status == ReadyForPrompt` and `replay_prompt` contents, but do not assert `last_error` semantics while replay is armed. 
**Required fix shape:** (a) when auto-recovery arms replay, either clear `last_error` or replace it with a non-fatal/degraded `replay_armed` status separate from failure; (b) include `recovery_armed:true` in the worker snapshot or ready result so callers can distinguish a recoverable ready state from a failed state; (c) add tests asserting `await_ready` after auto-recovery does not report contradictory ready+fatal error; (d) preserve the original misdelivery event for audit history while keeping current worker state coherent; (e) ensure manual/non-auto recovery still reports failure until explicitly resolved. **Why this matters:** recovery state is an operator contract. A worker that is ready to replay should not also advertise a current fatal prompt-delivery error, or automation may both retry and escalate the same incident. Source: gaebal-gajae dogfood response to Clawhip message `1507397657763778562` on 2026-05-22.
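
The following is a minimal sketch of fixes (a), (b), (d), and (e). Under auto-recovery, the failure lives only in the event history and a separate `recovery_armed` flag is set. Under manual recovery, `last_error` is kept. The struct and method names are hypothetical simplifications of `worker_boot.rs`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WorkerStatus {
    Running,
    ReadyForPrompt,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WorkerFailureKind {
    PromptDelivery,
}

/// Hypothetical, trimmed-down worker state.
struct Worker {
    status: WorkerStatus,
    last_error: Option<WorkerFailureKind>,
    recovery_armed: bool,
    last_prompt: Option<String>,
    replay_prompt: Option<String>,
    events: Vec<&'static str>,
}

impl Worker {
    fn new(prompt: &str) -> Self {
        Worker {
            status: WorkerStatus::Running,
            last_error: None,
            recovery_armed: false,
            last_prompt: Some(prompt.to_string()),
            replay_prompt: None,
            events: Vec::new(),
        }
    }

    fn on_prompt_misdelivery(&mut self, auto_recover: bool) {
        // The audit history keeps the original failure in both paths.
        self.events.push("prompt_misdelivery");
        if auto_recover {
            self.replay_prompt = self.last_prompt.clone();
            self.status = WorkerStatus::ReadyForPrompt;
            // Replay is armed, so the current state is recoverable, not failed.
            self.last_error = None;
            self.recovery_armed = true;
            self.events.push("prompt_replay_armed");
        } else {
            // Manual recovery keeps reporting the failure until it is resolved.
            self.last_error = Some(WorkerFailureKind::PromptDelivery);
        }
    }

    /// True when "ready" never coexists with a current fatal error.
    fn is_coherent(&self) -> bool {
        !(self.status == WorkerStatus::ReadyForPrompt && self.last_error.is_some())
    }
}
```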
+
+586. **Plugin registry reads synchronously mutate bundled plugin installs without a lock/atomic swap, so startup/listing can race or fail while merely trying to aggregate hooks/tools** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 15:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@8043090`. Active tmux session at probe time: `gajae-pr-348-package-release-drift-review`; no active claw-code implementation session. Code inspection: every `PluginManager::plugin_registry_report` call begins with `self.sync_bundled_plugins()?`, and read-style paths (`plugin_registry`, `list_plugins`, `discover_plugins`, `aggregated_hooks`, `aggregated_tools`, startup `build_runtime_plugin_state_with_loader`) all flow through it. `sync_bundled_plugins` loads bundled manifests, then for each stale/outdated bundled plugin does `fs::remove_dir_all(&install_path)?; copy_dir_all(&source_root, &install_path)?;`, removes stale bundled IDs/directories, and finally writes `plugins/registry.json`. There is no process-wide file lock, no temp-dir + atomic rename, and no read-only/degraded mode. Two concurrent CLI startups or a startup plus `claw plugins list` can both decide a sync is needed; one can remove an install dir while the other is loading/copying it, yielding transient missing/partial plugin directories or a registry write race. Even a purely diagnostic/list/aggregate command can fail because bundled-plugin self-sync mutates disk before returning registry data. Existing tests cover sync happy paths and load-failure reporting, but not concurrent registry readers or a simulated remove/copy interruption. 
**Required fix shape:** (a) separate read-only registry discovery from bundled-plugin reconciliation, or gate reconciliation behind an explicit locked startup/update phase; (b) protect bundled install sync and registry writes with an interprocess lock; (c) copy bundled plugins into a temp dir and atomically rename/swap, never exposing partial installs; (d) if sync fails during a read-style command, return a degraded registry report with load failures instead of aborting all plugin aggregation where safe; (e) add concurrency/interruption tests with two managers racing `plugin_registry_report` and with `copy_dir_all` failure after removal, proving readers see either old or new complete plugin installs. **Why this matters:** plugin/MCP startup already has lifecycle friction. Registry reads should be safe and mostly observational; making them perform unlocked destructive replacement means diagnostics and startup can create the very plugin-load failures they are trying to observe. Source: gaebal-gajae dogfood response to Clawhip message `1507405207602987138` on 2026-05-22.
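
The following is a minimal sketch of fix (c) that uses only `std::fs`. It copies into a same-filesystem staging directory first and then swaps with renames, so a failed copy never destroys the existing install. `staged_replace` and the dot-prefixed sibling names are hypothetical.

This is not a fully atomic swap. A plain `rename` cannot replace a non-empty directory, so there is a brief window between the two renames in which the install path is absent. A truly atomic swap would need something like a versioned directory plus a pointer flip. The interprocess lock from fix (b) is also still needed to keep two processes from racing.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively copies `src` into `dst`, creating `dst`.
fn copy_dir_all(src: &Path, dst: &Path) -> io::Result<()> {
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let target = dst.join(entry.file_name());
        if entry.file_type()?.is_dir() {
            copy_dir_all(&entry.path(), &target)?;
        } else {
            fs::copy(entry.path(), &target)?;
        }
    }
    Ok(())
}

/// Hypothetical hidden sibling of `install` on the same filesystem,
/// so `fs::rename` between them is a cheap metadata operation.
fn sibling(install: &Path, suffix: &str) -> PathBuf {
    let name = install
        .file_name()
        .map(|n| n.to_string_lossy().into_owned())
        .unwrap_or_default();
    install.with_file_name(format!(".{name}.{suffix}-{}", std::process::id()))
}

/// Replaces `install` with a copy of `source` without ever exposing a
/// half-copied tree at the install path.
fn staged_replace(source: &Path, install: &Path) -> io::Result<()> {
    let staging = sibling(install, "staging");
    if staging.exists() {
        fs::remove_dir_all(&staging)?;
    }
    // A failed copy leaves the old install untouched.
    if let Err(err) = copy_dir_all(source, &staging) {
        let _ = fs::remove_dir_all(&staging);
        return Err(err);
    }
    let backup = sibling(install, "old");
    let had_old = install.exists();
    if had_old {
        fs::rename(install, &backup)?;
    }
    // Brief window between these two renames (see the note above).
    if let Err(err) = fs::rename(&staging, install) {
        if had_old {
            let _ = fs::rename(&backup, install);
        }
        return Err(err);
    }
    if had_old {
        fs::remove_dir_all(&backup)?;
    }
    Ok(())
}
```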
+
+587. **`REPL` has an optional timeout with no default, so model-supplied code can block the tool dispatch thread indefinitely when `timeout_ms` is omitted** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 16:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@9843ccf`. Active tmux sessions at probe time: none. Channel context included Jobdori's adjacent #595 Sleep finding, so this probe inspected the other model-facing blocking execution surfaces instead of duplicating Sleep. Code inspection: `rust/crates/tools/src/lib.rs::execute_repl` validates code/language, spawns `python -c` / `node -e`, and only enters the polling timeout loop when `input.timeout_ms` is `Some`. If `timeout_ms` is omitted, it directly calls `process.spawn()?.wait_with_output()?`, with no default deadline, no backgrounding, and no abort-signal checks. The tool schema makes `timeout_ms` optional (`"timeout_ms": { "type": "integer", "minimum": 1 }`), and existing REPL success coverage passes `timeout_ms:500`; the timeout regression passes `timeout_ms:10`; there is no test for omitted-timeout long-running code. Therefore `REPL({language:"python", code:"import time; time.sleep(999999)"})` can freeze the same tool dispatch path indefinitely, which is worse than Sleep's 5-minute cap and easy for a model to trigger by forgetting the optional field. **Required fix shape:** (a) make `timeout_ms` default to a conservative deadline (for example 30s) instead of unbounded wait; (b) enforce an upper cap and return a structured timeout error matching bash/PowerShell timeout surfaces; (c) poll in small intervals that can observe a shared abort signal if/when the tool registry gains one; (d) add regressions for omitted timeout, explicit timeout, excessive timeout, and successful short code; (e) include elapsed/timeout metadata in the JSON error so claws can distinguish user-code hang from interpreter startup failure. 
**Why this matters:** REPL is a model-facing code execution tool. Optional timeout means the safest path depends on the model remembering to provide a guard every time; one missing field can wedge unattended claw runs forever with no heartbeat or typed recovery event. Source: gaebal-gajae dogfood response to Clawhip message `1507412753374249032` on 2026-05-22.
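
The following is a minimal sketch of fixes (a), (b), and (e). It assumes the 30s default suggested above and a hypothetical 5-minute cap that mirrors the Sleep cap mentioned above. The loop polls `Child::try_wait`, which leaves a natural place to check an abort signal later (fix (c)).

One caveat applies: stdout and stderr are not drained while polling. A child that writes more than the pipe buffer will therefore stall until the deadline. A production version would read the pipes on separate threads.

```rust
use std::process::{Child, Command, Output, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical defaults for illustration.
const DEFAULT_REPL_TIMEOUT_MS: u64 = 30_000;
const MAX_REPL_TIMEOUT_MS: u64 = 300_000;

#[derive(Debug)]
enum ReplError {
    Io(std::io::Error),
    Timeout { timeout_ms: u64, elapsed_ms: u64 },
}

/// An omitted timeout gets the default; an excessive one is capped.
fn resolve_timeout(requested: Option<u64>) -> u64 {
    requested.unwrap_or(DEFAULT_REPL_TIMEOUT_MS).min(MAX_REPL_TIMEOUT_MS)
}

fn run_with_deadline(mut command: Command, requested: Option<u64>) -> Result<Output, ReplError> {
    let timeout_ms = resolve_timeout(requested);
    let started = Instant::now();
    let mut child: Child = command
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()
        .map_err(ReplError::Io)?;
    loop {
        // A shared abort signal could also be checked here.
        if child.try_wait().map_err(ReplError::Io)?.is_some() {
            return child.wait_with_output().map_err(ReplError::Io);
        }
        let elapsed_ms = started.elapsed().as_millis() as u64;
        if elapsed_ms >= timeout_ms {
            let _ = child.kill();
            let _ = child.wait(); // reap the killed process
            return Err(ReplError::Timeout { timeout_ms, elapsed_ms });
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```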
+
+588. **Prompt-mode `read_piped_stdin()` still reads the entire pipe with no cap before merging it into the prompt, so the one-shot prompt path can OOM and API/session history can diverge from the persisted truncated JSONL** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 16:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@a1728b2`. Active tmux session at probe time: `omx-fooks-issue-1046-clean-epic-only-spawn-child`; no active claw-code implementation session. Channel context included Jobdori's #596 `LineEditor::read_line_fallback` unbounded line-read finding, so this probe checked the other piped-stdin entrypoint. Code inspection: `rust/crates/rusty-claude-cli/src/main.rs::read_piped_stdin` returns `None` for TTY stdin, then does `let mut buffer = String::new(); io::stdin().read_to_string(&mut buffer)` with no byte/char cap. In `CliAction::Prompt`, when `permission_mode == DangerFullAccess`, this entire buffer is appended by `merge_prompt_with_stdin` to the user prompt and sent to `LiveCli::run_turn_with_output`. The session layer later truncates JSONL fields to `MAX_JSONL_FIELD_CHARS = 16 * 1024`, so a huge piped context can be fully allocated and sent to the provider while the persisted session records only a truncated copy. This is distinct from #596: `read_line_fallback` covers non-TTY REPL line input; `read_piped_stdin` covers explicit one-shot prompt + piped context. Existing tests exercise merge formatting but not large stdin caps or persistence parity. 
**Required fix shape:** (a) introduce one shared `MAX_STDIN_PROMPT_BYTES`/`MAX_STDIN_PROMPT_CHARS` boundary for both `read_line_fallback` and `read_piped_stdin`; (b) read through `take(limit + 1)` or chunked bounded reads so oversize input is detected before unbounded allocation; (c) fail with a typed “stdin prompt too large” error or summarize/truncate before both API send and session persistence using the same content; (d) add tests for empty stdin, normal small piped context, limit-boundary input, and oversize input; (e) include the stdin byte/char count and cap in JSON/text diagnostics without echoing the large payload. **Why this matters:** piped stdin is a primary automation path (`cat file | claw prompt ...`). If it reads unbounded context and then persists a different truncated transcript, claws cannot replay, audit, or recover the same conversation the provider actually saw. Source: gaebal-gajae dogfood response to Clawhip message `1507420302924185720` on 2026-05-22.
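
The following is a minimal sketch of fix (b). Wrapping the reader in `Read::take(limit + 1)` buffers at most one byte past the cap. That one extra byte is enough to distinguish input that is exactly at the limit from oversize input. The 1 MiB `MAX_STDIN_PROMPT_BYTES` value and the `read_bounded` name are assumptions. A caller would pass `io::stdin().lock()` and share the same constant with `read_line_fallback`.

```rust
use std::io::{self, Read};

/// Hypothetical shared cap for piped prompt context.
const MAX_STDIN_PROMPT_BYTES: usize = 1024 * 1024;

#[derive(Debug)]
enum StdinError {
    TooLarge { limit: usize },
    Io(io::Error),
    NotUtf8,
}

/// Reads at most `limit + 1` bytes, so oversize input is detected
/// without buffering the whole pipe.
fn read_bounded<R: Read>(reader: R, limit: usize) -> Result<String, StdinError> {
    let mut buf = Vec::new();
    reader
        .take(limit as u64 + 1)
        .read_to_end(&mut buf)
        .map_err(StdinError::Io)?;
    if buf.len() > limit {
        return Err(StdinError::TooLarge { limit });
    }
    String::from_utf8(buf).map_err(|_| StdinError::NotUtf8)
}
```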
+
+589. **Managed-proxy MCP transport has no auth field or `requires_user_auth` path, so protected proxy endpoints cannot use the same OAuth/auth lifecycle as HTTP/SSE** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 17:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@a93f36d`. Active tmux sessions at probe time: `gajae-issue-351-ops-surface-inventory-snapshot`, `omx-fooks-issue-1046-clean-epic-only-spawn-child`; no active claw-code implementation session. Channel context included Jobdori's #597 WebSocket OAuth gap, so this probe checked the remaining remote/proxy MCP transport shapes. Code inspection: `runtime/src/config.rs::McpManagedProxyServerConfig` has only `{ url, id }`, `parse_mcp_server_config` for `"claudeai-proxy"` parses only those two fields, `runtime/src/mcp_client.rs::McpClientTransport::ManagedProxy` stores `McpManagedProxyTransport { url, id }` with no `auth`, and `commands/src/lib.rs` reports only URL/proxy id in text/JSON. Unlike SSE/HTTP `McpRemoteTransport`, the managed-proxy path cannot carry `McpOAuthConfig`, cannot report `requires_user_auth()`, and cannot participate in the same user-auth/preflight lifecycle. A protected managed proxy must either be unauthenticated, rely on credentials encoded in URL/query (already a reporting leak class), or use an out-of-band mechanism invisible to `mcp list/show/doctor`. 
**Required fix shape:** (a) add `oauth: Option<McpOAuthConfig>` or an explicit managed-proxy auth config to `McpManagedProxyServerConfig`; (b) include `auth: McpClientAuth` in `McpManagedProxyTransport` or otherwise expose a shared `requires_user_auth` trait across all remote transports; (c) parse/report the auth requirement in `mcp show/list` without leaking tokens; (d) add tests for `claudeai-proxy` with OAuth proving bootstrap requires user auth and JSON/text surfaces expose non-secret auth metadata; (e) align with the WebSocket fix so every network MCP transport (SSE/HTTP/WS/managed-proxy) has symmetric auth configuration or an explicit documented reason why not. **Why this matters:** managed proxy is a remote MCP lifecycle path. If it cannot express auth, operators lose preflight visibility and are pushed toward URL/header secret hacks that diagnostics either leak (#90) or cannot validate. Source: gaebal-gajae dogfood response to Clawhip message `1507427853031968799` on 2026-05-22.
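
The following is a minimal sketch of fix (b). A single trait is implemented by every network transport, so `requires_user_auth()` and a non-secret auth label come from the same place for SSE, HTTP, WS, and managed proxy. `McpClientAuth::OAuth { client_id }` is a hypothetical stand-in for the real `McpOAuthConfig`. The `auth` field on `McpManagedProxyTransport` is the proposed change, not current code.

```rust
/// Hypothetical auth shape; the real OAuth config carries more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
enum McpClientAuth {
    None,
    OAuth { client_id: String },
}

/// Shared contract every network MCP transport would implement.
trait RemoteMcpTransport {
    fn auth(&self) -> &McpClientAuth;

    fn requires_user_auth(&self) -> bool {
        matches!(self.auth(), McpClientAuth::OAuth { .. })
    }

    /// Non-secret label that `mcp list/show/doctor` can print safely.
    fn auth_summary(&self) -> &'static str {
        match self.auth() {
            McpClientAuth::None => "none",
            McpClientAuth::OAuth { .. } => "oauth",
        }
    }
}

/// Managed-proxy transport with the proposed `auth` field.
struct McpManagedProxyTransport {
    url: String,
    id: String,
    auth: McpClientAuth,
}

impl RemoteMcpTransport for McpManagedProxyTransport {
    fn auth(&self) -> &McpClientAuth {
        &self.auth
    }
}
```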
+
+590. **Configured remote MCP transports are parsed and shown as first-class, but runtime `McpServerManager` only starts stdio servers and silently degrades every HTTP/SSE/WS/SDK/managed-proxy server into an unsupported registration failure** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 17:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@106c243`. Active tmux sessions at probe time: `gajae-issue-353-review-ci-verdict-transition-receipt`, `omx-fooks-issue-1046-clean-epic-only-spawn-child`; no active claw-code implementation session. Code inspection: `runtime/src/config.rs` parses transport variants `stdio`, `sse`, `http`, `ws`, `sdk`, and `claudeai-proxy`; `commands/src/lib.rs` renders their URL/header/OAuth/proxy details in `mcp list/show`. But `runtime/src/mcp_stdio.rs::McpServerManager::from_servers` inserts only `McpTransport::Stdio` into `managed_servers`; every other transport is pushed into `unsupported_servers` with reason `transport X is not supported by McpServerManager`. `discover_tools_best_effort` later turns those into `McpLifecyclePhase::ServerRegistration` failures/degraded startup, while `discover_tools` over `server_names()` ignores them entirely because `server_names()` only returns managed stdio server keys. Existing test `manager_records_unsupported_non_stdio_servers_without_panicking` locks the current behavior for http/sdk/ws as expected. 
**Required fix shape:** (a) decide whether remote transports are supported in this build; if not, make config parsing/show/list label them as `configured_but_runtime_unsupported` rather than implying they are usable; (b) if they are intended to work, implement separate managers/clients for HTTP/SSE/WS/managed-proxy and route discovery/tool calls by transport; (c) surface unsupported required remote servers as hard startup blockers in prompt/runtime preflight, and optional ones as typed degraded state, not just best-effort discovery noise; (d) add JSON fields in `mcp list/show/doctor` such as `runtime_supported:false`, `unsupported_reason`, and `required`; (e) add regressions proving `discover_tools` and prompt startup cannot silently ignore required non-stdio servers. **Why this matters:** users can configure and inspect remote MCP servers today, including auth fields for HTTP/SSE, but the actual runtime path does not run them. That split creates event/log opacity: `mcp show` looks configured while prompt-time tool discovery either degrades or omits the server, so operators cannot tell from control-plane surfaces whether remote MCP tools are actually reachable. Source: gaebal-gajae dogfood response to Clawhip message `1507435406277476393` on 2026-05-22.
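
The following is a minimal sketch of fixes (a) and (c). Each configured server is classified by whether this build's runtime can start its transport. Unsupported required servers then block startup, and unsupported optional servers degrade it. The `required` flag, the `Preflight` enum, and the `runtime_supported` helper are hypothetical. The stdio-only rule reflects `McpServerManager::from_servers` as inspected above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum McpTransport {
    Stdio,
    Sse,
    Http,
    Ws,
    Sdk,
    ManagedProxy,
}

/// What this build's runtime can actually start (currently stdio only).
fn runtime_supported(transport: McpTransport) -> bool {
    matches!(transport, McpTransport::Stdio)
}

struct ConfiguredServer {
    name: &'static str,
    transport: McpTransport,
    required: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum Preflight {
    Ready,
    Degraded { unsupported_optional: Vec<&'static str> },
    Blocked { unsupported_required: Vec<&'static str> },
}

fn preflight(servers: &[ConfiguredServer]) -> Preflight {
    let unsupported = servers.iter().filter(|s| !runtime_supported(s.transport));
    let (required, optional): (Vec<_>, Vec<_>) = unsupported.partition(|s| s.required);
    if !required.is_empty() {
        // A required server that cannot run is a hard startup blocker.
        Preflight::Blocked { unsupported_required: required.iter().map(|s| s.name).collect() }
    } else if !optional.is_empty() {
        // Optional ones become typed degraded state, not discovery noise.
        Preflight::Degraded { unsupported_optional: optional.iter().map(|s| s.name).collect() }
    } else {
        Preflight::Ready
    }
}
```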
+
+591. **MCP degraded reports always compute `missing_tools` as empty because degraded construction passes `available_tools` (or `Vec::new()`) as the expected-tool set, so failed/unsupported servers never surface which tools disappeared** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 18:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@f50625a`. Active tmux sessions at probe time: `gajae-issue-355-backlog-zero-next-artifact-queue`, `omx-fooks-issue-1046-clean-epic-only-spawn-child`; no active claw-code implementation session. Code inspection: `runtime/src/mcp_lifecycle_hardened.rs::McpDegradedReport::new` can compute `missing_tools` by subtracting `available_tools` from `expected_tools`. But both producers discard the useful expected set: `runtime/src/mcp_stdio.rs::discover_tools_best_effort` passes `Vec::new()` for `expected_tools`, and `rusty-claude-cli/src/main.rs::RuntimeMcpState::new` passes `available_tools.clone()` as both `available_tools` and `expected_tools`. Therefore `missing_tools` is always empty, even when required servers fail discovery or unsupported remote transports are configured. Existing tests assert `degraded.missing_tools.is_empty()`, locking the broken contract. The only visible degraded data is failed server names/phases; operators cannot tell which qualified tool names vanished from the tool surface. 
**Required fix shape:** (a) retain each server's advertised/expected tool list from prior successful discovery, static config, manifest metadata, or at least the expected server/tool namespace when known; (b) pass that set into `McpDegradedReport::new` instead of `available_tools`/empty; (c) if exact tools are unknown, expose `missing_tool_names_unknown_for_servers:[...]` so the report is honest rather than falsely empty; (d) update tests to assert non-empty `missing_tools` or explicit unknown-missing metadata when a server with known tools fails; (e) thread the same degraded metadata into `ToolSearch` so models see which tools are unavailable, not just that a server failed. **Why this matters:** degraded startup is supposed to make partial success first-class. An always-empty `missing_tools` field falsely suggests no capability loss, hiding the actual impact of MCP failures and unsupported remote transports from both humans and automation. Source: gaebal-gajae dogfood response to Clawhip message `1507442954308948040` on 2026-05-22.
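
The following is a minimal sketch of fixes (a) through (c). It assumes each server's tool list from its last successful discovery is cached in `expected_by_server`. A failed server with no cached list is reported as unknown rather than silently counted as losing nothing. The function and struct names are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, PartialEq, Eq)]
struct MissingToolReport {
    missing_tools: Vec<String>,
    missing_tool_names_unknown_for_servers: Vec<String>,
}

fn missing_tool_report(
    expected_by_server: &BTreeMap<String, Vec<String>>,
    failed_servers: &[&str],
    available_tools: &BTreeSet<String>,
) -> MissingToolReport {
    // Every previously advertised tool that is not available now is missing.
    let missing: BTreeSet<String> = expected_by_server
        .values()
        .flatten()
        .filter(|tool| !available_tools.contains(*tool))
        .cloned()
        .collect();
    // Failed servers with no cached tool list: report "unknown", not "none".
    let unknown = failed_servers
        .iter()
        .filter(|server| !expected_by_server.contains_key(**server))
        .map(|server| server.to_string())
        .collect();
    MissingToolReport {
        missing_tools: missing.into_iter().collect(),
        missing_tool_names_unknown_for_servers: unknown,
    }
}
```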
+
+592. **Plugin degraded-mode reporting drops degraded-server errors because `PluginState::Degraded` only preserves failed server details, so a server marked degraded can appear fully healthy/available in the operator contract** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 18:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@1df4114`. Active tmux session at probe time: `gajae-issue-357-review-session-vanish-replacement`; no active claw-code implementation session. Focused validation: `cd rust && cargo test -p runtime plugin_lifecycle -- --nocapture` passed 6/6, confirming the current locked contract. Code inspection: `runtime/src/plugin_lifecycle.rs::PluginState::from_servers` treats `ServerStatus::Degraded` as usable by placing it in `healthy_servers`, but the `PluginState::Degraded` variant stores only `healthy_servers: Vec<String>` and `failed_servers: Vec<ServerHealth>`. `PluginHealthcheck::degraded_mode` then builds `unavailable_tools` only from `failed_servers` and reports a reason of `"N servers healthy, M servers failed"`. Existing test `degraded_server_status_keeps_server_usable` asserts a degraded `beta` server is included in `healthy_servers` and `failed_servers` is empty, but it does not assert that `beta`'s `last_error` (`high latency`) or degraded status is visible in `degraded_mode`. Result: a plugin with one healthy server and one degraded-but-usable server can emit `startup_degraded` while its degraded-mode payload says `2 servers healthy, 0 servers failed`, has no degraded server list, and can mark the degraded server's tools as simply available if discovery returns them. Operators and automation see the terminal state but lose the actual degraded reason/capability risk. 
**Required fix shape:** (a) split `PluginState::Degraded` into `healthy_servers`, `degraded_servers: Vec<ServerHealth>`, and `failed_servers`, or preserve all non-healthy server health records; (b) make `degraded_mode` include `degraded_tools`/`degraded_servers` with `last_error` separately from unavailable failed tools; (c) update the reason string/counts to distinguish healthy, degraded, and failed rather than folding degraded into healthy; (d) add tests proving a `ServerStatus::Degraded` server's status/error/capabilities appear in the degraded-mode JSON while remaining callable when appropriate; (e) align this with MCP degraded reports so partial plugin startup carries both availability and quality-of-service impact. **Why this matters:** partial startup needs impact opacity removed. A degraded server is not failed, but it also is not fully healthy; folding it into the healthy list makes `startup_degraded` hard to diagnose and can trick recovery logic into thinking no server has actionable degradation details. Source: gaebal-gajae dogfood response to Clawhip message `1507450501925441616` on 2026-05-22.
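
The following is a minimal sketch of fixes (a) and (c). Degraded servers, along with their `last_error`, go into their own bucket, and the report gives all three counts. `DegradedState` is a hypothetical replacement for the current `PluginState::Degraded` payload.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ServerStatus {
    Healthy,
    Degraded,
    Failed,
}

#[derive(Debug, Clone)]
struct ServerHealth {
    name: String,
    status: ServerStatus,
    last_error: Option<String>,
}

/// Hypothetical three-way split instead of folding degraded into healthy.
#[derive(Debug, Default)]
struct DegradedState {
    healthy_servers: Vec<String>,
    degraded_servers: Vec<ServerHealth>,
    failed_servers: Vec<ServerHealth>,
}

impl DegradedState {
    fn from_servers(servers: &[ServerHealth]) -> Self {
        let mut state = DegradedState::default();
        for server in servers {
            match server.status {
                ServerStatus::Healthy => state.healthy_servers.push(server.name.clone()),
                // Keep the full record so `last_error` reaches the report.
                ServerStatus::Degraded => state.degraded_servers.push(server.clone()),
                ServerStatus::Failed => state.failed_servers.push(server.clone()),
            }
        }
        state
    }

    fn reason(&self) -> String {
        format!(
            "{} healthy, {} degraded, {} failed",
            self.healthy_servers.len(),
            self.degraded_servers.len(),
            self.failed_servers.len()
        )
    }
}
```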
+
+593. **Plugin uninstall is a three-store non-atomic transaction: it deletes the plugin directory before persisting registry/settings cleanup, so an IO failure can leave registry and `enabledPlugins` pointing at a removed plugin** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 19:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@244bdb7`. Active tmux session at probe time: `gajae-pr-358-review-session-vanish-replacement-review`; no active claw-code implementation session. Focused validation: `cd rust && cargo test -p plugins installs_enables_updates_and_uninstalls_external_plugins -- --nocapture` passed 1/1, confirming the happy-path lifecycle test does not cover interrupted persistence. Code inspection: `plugins/src/lib.rs::PluginManager::uninstall` removes the plugin record from the in-memory registry, then `remove_dir_all(record.install_path)`, then `store_registry(&registry)`, then `write_enabled_state(plugin_id, None)`. If directory removal succeeds but `store_registry` fails, `installed.json` still records the old plugin while the on-disk install directory is gone. If `store_registry` succeeds but `write_enabled_state` fails, the registry and disk are removed but `.claw/settings.json` can still contain `enabledPlugins[plugin_id]=true`. The next `plugin_registry_report` may silently prune the stale registry entry, but the enabled state can remain as orphaned configuration noise and the original uninstall command has already destroyed the install before it can report a clean transactional result. Existing `installs_enables_updates_and_uninstalls_external_plugins` asserts only the success path; it does not simulate registry write failure, settings write failure, or crash after `remove_dir_all`. 
**Required fix shape:** (a) make plugin uninstall a transaction with a tombstone/backup phase: first persist intended disabled/removed state or acquire a lock, then move the install directory to a same-filesystem trash/backup path, then update registry/settings, then remove backup after all metadata commits succeed; (b) if metadata cleanup fails, restore or report a recoverable tombstoned state with `rollback_available:true`; (c) make stale enabled settings for missing plugins produce a structured warning and auto-clean only through a safe metadata path; (d) add tests injecting failures after directory move/removal, after `store_registry`, and after `write_enabled_state`; (e) reuse the atomic JSON/write-lock helper from #508 so registry/settings writes cannot be partially written. **Why this matters:** uninstall is the destructive plugin lifecycle verb. Deleting files before committing metadata means a transient settings/registry IO failure turns a local cleanup into stale-branch-style plugin confusion: inventory can say installed/enabled while the executable hook/tool files are already gone. Source: gaebal-gajae dogfood response to Clawhip message `1507458055812546630` on 2026-05-22.
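
The following is a minimal sketch of fixes (a) and (b). It assumes the registry and settings writes are wrapped in one closure. Instead of being deleted up front, the install directory is renamed aside and removed only after the metadata commits. If the commit fails, the directory is moved back. `uninstall_with_rollback` and the trash naming are hypothetical. A crash between the rename and the commit would still leave a tombstone that startup would need to reconcile.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical same-filesystem tombstone path next to the install.
fn trash_path(install_path: &Path) -> PathBuf {
    let name = install_path
        .file_name()
        .map(|n| n.to_string_lossy().into_owned())
        .unwrap_or_default();
    install_path.with_file_name(format!(".{name}.uninstall-trash"))
}

/// Moves the install aside, runs the metadata commit, and deletes files
/// only after metadata succeeds; on failure the install is restored.
fn uninstall_with_rollback<F>(install_path: &Path, commit_metadata: F) -> io::Result<()>
where
    F: FnOnce() -> io::Result<()>,
{
    let trash = trash_path(install_path);
    fs::rename(install_path, &trash)?;
    match commit_metadata() {
        Ok(()) => fs::remove_dir_all(&trash),
        Err(err) => {
            fs::rename(&trash, install_path)?;
            Err(err)
        }
    }
}
```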
+
+594. **MCP stdio frame reader trusts arbitrary `Content-Length` and allocates that many bytes before reading, so a buggy or hostile MCP server can OOM the runtime with one header** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 19:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@d996b65`. Active tmux session at probe time: `gajae-pr-358-review-session-vanish-replacement-review`; no active claw-code implementation session. Focused validation: `cd rust && cargo test -p runtime mcp_stdio -- --nocapture` passed 22/22, confirming current MCP stdio tests cover normal/mismatched/lowercase frames but not oversized frames. Code inspection: `runtime/src/mcp_stdio.rs::McpStdioProcess::read_frame` parses `Content-Length` into `usize`, then immediately does `let mut payload = vec![0_u8; content_length]; self.stdout.read_exact(&mut payload).await?;`. There is no maximum frame size, no per-server byte budget, no early rejection on huge lengths, and no streaming cap. A server can send `Content-Length: 10000000000\r\n\r\n` (or any value near available memory) and force allocation before any payload bytes arrive; the surrounding `run_process_request` timeout does not protect the allocation itself. This is distinct from HTTP body caps (#503) and SSE parser buffering (#506): it is the MCP JSON-RPC stdio framing layer. Existing tests assert lowercase `Content-Length`, missing/mismatched IDs, timeout, and retry/reset behavior, but none assert a maximum accepted frame length. **Required fix shape:** (a) add a conservative `MAX_MCP_STDIO_FRAME_BYTES` default and optional per-server override; (b) after parsing `Content-Length`, reject values above the cap with `io::ErrorKind::InvalidData` carrying `content_length` and `max_frame_bytes`; (c) read the body through a bounded buffer/helper so allocation is capped and timeout/error surfaces stay typed as MCP invalid response; (d) add regression scripts that emit huge `Content-Length` with no body and oversized body, proving no large allocation and a structured invalid-response error; (e) include frame-size metadata in MCP degraded/error reports so operators can distinguish protocol abuse from transport EOF. **Why this matters:** MCP servers are extension processes. The client must treat their stdio as untrusted protocol input; one oversized length header should not be able to OOM a prompt startup, tool discovery, or resource read before degraded-mode reporting can fire. Source: gaebal-gajae dogfood response to Clawhip message `1507465601499660349` on 2026-05-22.
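
The following is a minimal synchronous sketch of fixes (a) and (b). It parses the headers, checks `Content-Length` against the cap, and allocates only after that check. The real reader is async over the child's stdout. The 16 MiB default and the `read_frame` signature are assumptions. Header lines themselves are still read with an unbounded `read_line`, so a complete fix would also cap header length.

```rust
use std::io::{self, BufRead, Read};

/// Hypothetical default cap for one MCP stdio frame.
const MAX_MCP_STDIO_FRAME_BYTES: usize = 16 * 1024 * 1024;

fn invalid(msg: String) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

/// Reads one `Content-Length`-framed message, rejecting oversized
/// lengths before allocating the payload buffer.
fn read_frame<R: BufRead>(reader: &mut R, max_frame_bytes: usize) -> io::Result<Vec<u8>> {
    let mut content_length = None;
    loop {
        let mut line = String::new();
        if reader.read_line(&mut line)? == 0 {
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "EOF in MCP frame headers"));
        }
        let line = line.trim_end_matches(|c: char| c == '\r' || c == '\n');
        if line.is_empty() {
            break; // a blank line ends the header block
        }
        if let Some((name, value)) = line.split_once(':') {
            if name.trim().eq_ignore_ascii_case("content-length") {
                let parsed = value
                    .trim()
                    .parse::<usize>()
                    .map_err(|_| invalid(format!("bad Content-Length: {value}")))?;
                content_length = Some(parsed);
            }
        }
    }
    let content_length =
        content_length.ok_or_else(|| invalid("missing Content-Length".to_string()))?;
    if content_length > max_frame_bytes {
        return Err(invalid(format!(
            "content_length {content_length} exceeds max_frame_bytes {max_frame_bytes}"
        )));
    }
    // Allocation happens only after the cap check.
    let mut payload = vec![0_u8; content_length];
    reader.read_exact(&mut payload)?;
    Ok(payload)
}
```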
+
+595. **OAuth authorize URL builder allows `extra_params` to override core PKCE/OAuth parameters after they were already set, so plugin/config extras can replace `state`, `code_challenge`, `redirect_uri`, or `response_type`** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 20:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@46f3bff`. Active tmux sessions at probe time: none; no active claw-code implementation session. Focused validation: `cd rust && cargo test -p runtime oauth -- --nocapture` passed 9/9, confirming current OAuth tests cover happy-path URL/form/callback parsing but not reserved extra-param collisions. Code inspection: `runtime/src/oauth.rs::OAuthAuthorizationRequest::build_url` creates a `params` vector containing core parameters (`response_type=code`, `client_id`, `redirect_uri`, `scope`, `state`, `code_challenge`, `code_challenge_method`), then blindly `extend`s `self.extra_params` into the same query. `with_extra_param` accepts any key and stores it in a `BTreeMap`, with no reserved-name validation. A caller that passes `state` (e.g. `with_extra_param("state", "attacker")`), `code_challenge`, `redirect_uri`, `response_type`, `client_id`, or `scope` as an extra key produces a URL with duplicate query parameters in which the extra value appears after the core value. Because many OAuth parsers use last-value-wins semantics, this can desynchronize the locally expected state/PKCE verifier from the values the authorization server sees, or change redirect/scope semantics. Jobdori separately filed duplicate callback parameters (#603); this is the outbound sibling: the client itself generates the duplicates before the browser redirect, rather than merely accepting them on callback. **Required fix shape:** (a) define a reserved parameter set for OAuth authorization requests (`response_type`, `client_id`, `redirect_uri`, `scope`, `state`, `code_challenge`, `code_challenge_method`) and reject attempts to add them via `with_extra_param`; (b) make `with_extra_param` return `Result<Self, OAuthError>` or validate in `build_url` with a typed error rather than silently emitting duplicates; (c) add tests for reserved collisions (`state`, `code_challenge`, `redirect_uri`) and a safe extension like `login_hint`; (d) if an override is intentionally supported, make it explicit and update the stored expected state/verifier/redirect to match so callback/token exchange cannot drift; (e) document provider-specific extra params as additive-only. **Why this matters:** `state` and PKCE are the OAuth anti-CSRF/proof-of-possession controls. Letting arbitrary extras duplicate or override them in the authorization URL creates prompt/auth lifecycle ambiguity and can turn a provider-specific hint hook into a security-sensitive parameter injection footgun. Source: gaebal-gajae dogfood response to Clawhip message `1507473155273265172` on 2026-05-22.
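
Fix shape (a) to (c) and (e) can be sketched as follows. `AuthorizeRequestSketch` is a hypothetical stand-in for `OAuthAuthorizationRequest`, reduced to its extras map, and this `OAuthError` is a hypothetical enum rather than the crate's real type. Reserved keys are rejected at insertion time. Because of that check, `query_pairs` can append extras after the core pairs without any key appearing twice.

```rust
use std::collections::BTreeMap;

/// Core authorization-request parameters listed in the roadmap entry.
pub const RESERVED_AUTHORIZE_PARAMS: [&str; 7] = [
    "response_type", "client_id", "redirect_uri", "scope",
    "state", "code_challenge", "code_challenge_method",
];

/// Hypothetical error type; the real crate's `OAuthError` may differ.
#[derive(Debug, PartialEq, Eq)]
pub enum OAuthError {
    ReservedParam(String),
}

/// Hypothetical stand-in for `OAuthAuthorizationRequest`, reduced to its extras.
#[derive(Debug, Default)]
pub struct AuthorizeRequestSketch {
    extra_params: BTreeMap<String, String>,
}

impl AuthorizeRequestSketch {
    /// Additive-only: reserved keys are rejected instead of being emitted as duplicates.
    pub fn with_extra_param(
        mut self,
        key: impl Into<String>,
        value: impl Into<String>,
    ) -> Result<Self, OAuthError> {
        let key = key.into();
        if RESERVED_AUTHORIZE_PARAMS.iter().any(|reserved| *reserved == key) {
            return Err(OAuthError::ReservedParam(key));
        }
        self.extra_params.insert(key, value.into());
        Ok(self)
    }

    /// Core pairs first, then extras; validation above guarantees no key repeats.
    pub fn query_pairs<'a>(&'a self, core: &[(&'a str, &'a str)]) -> Vec<(&'a str, &'a str)> {
        core.iter()
            .copied()
            .chain(self.extra_params.iter().map(|(k, v)| (k.as_str(), v.as_str())))
            .collect()
    }
}
```

Failing at insertion time, rather than in `build_url`, surfaces the misuse at the plugin/config call site. Either placement satisfies fix shape (b).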
+
+596. **MCP tool bridge gates `call_tool` on a stale in-memory registry snapshot before doing live discovery, so newly discovered tools can be rejected and removed tools can be offered until runtime failure** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 20:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@20c9d9d`. Active tmux sessions at probe time: none; no active claw-code implementation session. Focused validation: `cd rust && cargo test -p runtime mcp_tool_bridge -- --nocapture` passed 19/19, confirming current bridge tests cover pre-registered happy-path tools but not registry/manager drift. Code inspection: `runtime/src/mcp_tool_bridge.rs::McpToolRegistry::call_tool` first locks `self.inner`, requires `state.status == Connected`, and checks `state.tools.iter().any(|t| t.name == tool_name)`. Only after that snapshot gate does it drop the registry lock and call `spawn_tool_call`, which creates a runtime and runs `manager.discover_tools().await` followed by `manager.call_tool(...)`. The live discovery result updates the manager/tool index, but the registry snapshot used for admission is never refreshed from that discovery. Therefore a tool that becomes available after startup/discovery refresh is still rejected as `tool not found` if it is absent from the stale registry, while a tool that disappeared can remain listed/accepted by the registry and then fail later as an `UnknownTool`/runtime error from the manager. Existing tests explicitly register `echo` in the registry before calling the live manager, so they lock in the assumption that the registry already matches runtime discovery. **Required fix shape:** (a) make `call_tool` reconcile registry state from `manager.discover_tools()` before the tool-existence gate, or delegate existence checks entirely to the authoritative manager and then update the registry snapshot; (b) add a registry `last_discovered_at`/generation or per-server tool-source metadata so `list_tools` can report stale/degraded state; (c) add regressions where the registry starts empty but the manager discovers `echo`, and where the registry advertises a stale tool that the manager no longer exposes; (d) ensure `list_tools`, `ToolSearch`, and `call_tool` agree on the same generation/tool surface; (e) surface drift as a typed MCP lifecycle/degraded event rather than a generic `tool not found`. **Why this matters:** the bridge is a control-plane contract between model-visible MCP tools and the live server manager. If admission checks use an old snapshot while execution uses fresh discovery, claws get stale tool availability evidence and confusing failures exactly where autonomous recovery needs a single source of truth. Source: gaebal-gajae dogfood response to Clawhip message `1507480700868362384` on 2026-05-22.
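
A synchronous sketch of fix shape (a) to (c) follows. `ToolDiscovery`, `FixedDiscovery`, `RegistrySnapshot`, and `admit_tool_call` are hypothetical names. The real manager's `discover_tools()` is async, and the real registry sits behind a lock. The idea is to refresh the snapshot from the authoritative discovery result before the existence gate, and to bump a generation counter whenever the two disagree.

```rust
use std::collections::BTreeSet;

/// Hypothetical synchronous stand-in for the live manager's `discover_tools()`.
pub trait ToolDiscovery {
    fn discover_tools(&mut self) -> Result<Vec<String>, String>;
}

/// Test double that always reports a fixed tool list.
pub struct FixedDiscovery(pub Vec<String>);

impl ToolDiscovery for FixedDiscovery {
    fn discover_tools(&mut self) -> Result<Vec<String>, String> {
        Ok(self.0.clone())
    }
}

/// Registry snapshot plus a generation counter that `list_tools` could report.
#[derive(Debug, Default)]
pub struct RegistrySnapshot {
    pub tools: BTreeSet<String>,
    pub generation: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum AdmissionError {
    Discovery(String),
    ToolNotFound { tool: String, generation: u64 },
}

/// Reconciles the snapshot from live discovery *before* the existence gate,
/// so admission and execution consult the same tool surface.
pub fn admit_tool_call<D: ToolDiscovery>(
    registry: &mut RegistrySnapshot,
    manager: &mut D,
    tool: &str,
) -> Result<u64, AdmissionError> {
    let live: BTreeSet<String> = manager
        .discover_tools()
        .map_err(AdmissionError::Discovery)?
        .into_iter()
        .collect();
    if live != registry.tools {
        // Drift detected: adopt the authoritative set and bump the generation.
        registry.tools = live;
        registry.generation += 1;
    }
    if registry.tools.contains(tool) {
        Ok(registry.generation)
    } else {
        Err(AdmissionError::ToolNotFound { tool: tool.to_string(), generation: registry.generation })
    }
}
```

Returning the generation on success lets the caller tag the later execution with the tool surface it was admitted against. That tag is one way to meet fix shape (d).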
+
+597. **Worker prompt replay accepts a new `task_receipt` even when replaying a recovered prompt, so auto-recovery can pair the old prompt text with a new/different execution receipt** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 21:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@71f8554`. Active tmux sessions at probe time: none; no active claw-code implementation session. Focused validation: `cd rust && cargo test -p runtime worker_boot -- --nocapture` passed 22/22, confirming current worker-boot tests cover the replay happy path and receipt-mismatch detection after observation but not replay receipt integrity. Code inspection: `runtime/src/worker_boot.rs::WorkerRegistry::send_prompt` computes `next_prompt` from the explicit `prompt` argument or falls back to `worker.replay_prompt.clone()`, then unconditionally sets `worker.expected_receipt = task_receipt`. The tool input (`tools/src/lib.rs::WorkerSendPromptInput`) allows `prompt: Option<String>` and `task_receipt: Option<WorkerTaskReceipt>` independently. After prompt misdelivery, `observe` sets `worker.replay_prompt = worker.last_prompt.clone()` and payloads include the original `worker.expected_receipt`. But a subsequent `send_prompt(worker_id, None, Some(new_receipt))` replays the old recovered prompt while replacing the expected receipt with a new one. Conversely, `send_prompt(worker_id, None, None)` replays the same prompt but clears the expected receipt entirely. The later wrong-task-receipt detector therefore compares observed receipts against this newly supplied (or cleared) value, not against the receipt that belonged to the misdelivered prompt. Existing tests `prompt_misdelivery_is_detected_and_replay_can_be_rearmed` and `wrong_task_receipt_mismatch_is_detected_before_execution_continues` do not assert that replay preserves the original receipt or rejects receipt changes. **Required fix shape:** (a) when `prompt` is omitted and `worker.replay_prompt` is used, preserve the original `worker.expected_receipt` and reject any conflicting `task_receipt`; (b) store replay as a structured `{prompt, task_receipt}` bundle instead of a bare `String`; (c) if callers intentionally want a new prompt/receipt, require an explicit non-empty `prompt` and clear the replay bundle with an event; (d) add regressions for replay-with-new-receipt rejection, replay-with-no-receipt preserving the original receipt, and explicit new prompt replacing both prompt and receipt; (e) include receipt id/hash in `PromptReplayArmed`/`Running` event payloads without leaking sensitive prompt text. **Why this matters:** prompt recovery is supposed to resend the same task that was misdelivered. Letting the control plane silently combine old prompt text with a new or missing receipt breaks the provenance chain, causing stale/incorrect task execution evidence and making misdelivery recovery untrustworthy. Source: gaebal-gajae dogfood response to Clawhip message `1507488254734106685` on 2026-05-22.
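
Fix shape (a) to (c) can be sketched for a single worker. `TaskReceipt`, `ReplayBundle`, `WorkerSketch`, and `SendPromptError` are hypothetical simplifications of `WorkerTaskReceipt` and the registry state. Event emission is omitted. Replay stores the prompt and its receipt together, so omitting `prompt` restores both. A conflicting receipt is rejected, and the bundle stays armed for a correct retry.

```rust
/// Hypothetical receipt type standing in for `WorkerTaskReceipt`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TaskReceipt(pub String);

/// Replay stored as a bundle so prompt text and its receipt travel together.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ReplayBundle {
    pub prompt: String,
    pub receipt: Option<TaskReceipt>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SendPromptError {
    EmptyPrompt,
    NothingToReplay,
    ReplayReceiptConflict { expected: Option<TaskReceipt>, supplied: TaskReceipt },
}

#[derive(Debug, Default)]
pub struct WorkerSketch {
    pub last_prompt: Option<String>,
    pub expected_receipt: Option<TaskReceipt>,
    pub replay: Option<ReplayBundle>,
}

impl WorkerSketch {
    /// Mirrors `observe` arming a replay after misdelivery, but captures the receipt too.
    pub fn arm_replay(&mut self) {
        if let Some(prompt) = self.last_prompt.clone() {
            self.replay = Some(ReplayBundle { prompt, receipt: self.expected_receipt.clone() });
        }
    }

    pub fn send_prompt(
        &mut self,
        prompt: Option<String>,
        task_receipt: Option<TaskReceipt>,
    ) -> Result<(), SendPromptError> {
        match prompt {
            Some(text) => {
                if text.trim().is_empty() {
                    return Err(SendPromptError::EmptyPrompt);
                }
                // Explicit new task: prompt and receipt are replaced together; any armed replay is dropped.
                self.replay = None;
                self.last_prompt = Some(text);
                self.expected_receipt = task_receipt;
            }
            None => {
                let bundle = self.replay.take().ok_or(SendPromptError::NothingToReplay)?;
                if let Some(supplied) = task_receipt {
                    if bundle.receipt.as_ref() != Some(&supplied) {
                        let expected = bundle.receipt.clone();
                        self.replay = Some(bundle); // keep the replay armed for a correct retry
                        return Err(SendPromptError::ReplayReceiptConflict { expected, supplied });
                    }
                }
                // Replay restores the original prompt *and* its original receipt.
                self.last_prompt = Some(bundle.prompt);
                self.expected_receipt = bundle.receipt;
            }
        }
        Ok(())
    }
}
```

Re-supplying the *same* receipt is accepted, so callers that always echo the receipt they originally sent are unaffected.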
+
+598. **Sub-agent lane manifests emit `LaneEvent::started/blocked/failed/finished/commit_created` with minimal optional metadata, so exported lane events fail the stricter G004 `emitterIdentity`/`environmentLabel` contract and have duplicate `seq=0` ordering** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 21:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@d2018d7`. Active tmux sessions at probe time: none; no active claw-code implementation session. Channel context included Jobdori filing #606 for the general G004 validator/model mismatch; this pinpoint is the concrete producer-side lane-manifest impact. Focused validation: `cd rust && cargo test -p tools lane -- --nocapture` passed 8/8, confirming current tools lane tests cover canonical event names/failure taxonomy but not G004 conformance of generated sub-agent manifests. Code inspection: `tools/src/lib.rs::create_agent_with_spawn` initializes `AgentOutput.lane_events` with `LaneEvent::started(iso8601_now())`; `persist_agent_terminal_state` appends `LaneEvent::blocked`, `LaneEvent::failed`, `LaneEvent::finished(...).with_data(...)`, and `LaneEvent::commit_created(...)`. All of these constructors call `LaneEvent::new`, which uses `LaneEventMetadata::new(0, EventProvenance::LiveLane)` and leaves `environment_label`/`emitter_identity` as `None`. Because metadata fields are skipped when `None`, serialized manifests omit the exact fields that `g004_conformance.rs::validate_lane_events` requires, and every appended event also carries `metadata.seq = 0`, violating the validator's strictly-increasing sequence rule as soon as a manifest has multiple events. So even if the data model is fixed broadly, sub-agent manifest production still needs a generation path that populates `emitter_identity`/`environment_label` and assigns monotonic `seq` values. **Required fix shape:** (a) route all sub-agent lane event creation through `LaneEventBuilder` or a manifest-local helper that assigns monotonic sequence numbers; (b) set non-secret defaults such as `emitterIdentity:"tools.subagent"` and `environmentLabel` from cwd/session/channel/config, and never leave them absent for exported manifests; (c) add a test that creates and completes/fails a sub-agent manifest, then validates its `lane_events` with `validate_g004_contract_bundle` or a lane-event-specific conformance helper; (d) update helper constructors or document them as internal/non-G004 only; (e) align with #606 so constructor defaults and validator requirements agree. **Why this matters:** lane manifests are the event-log surface for autonomous sub-agents. If their generated events cannot pass the repo's own conformance validator and all share seq 0, downstream dashboards/reconcilers cannot rely on ordering, ownership, or environment provenance. Source: gaebal-gajae dogfood response to Clawhip message `1507495807052550174` on 2026-05-22.
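
Fix shape (a) to (c) amount to a manifest-local helper that stamps every event with identity, environment, and the next sequence number. In this sketch, `LaneEventSketch`, `ManifestLaneEvents`, and `check_lane_events` are hypothetical. `check_lane_events` only mirrors the two rules the entry attributes to `validate_lane_events`: required identity fields are present, and `seq` strictly increases. It is not the repo's validator. The `"tools.subagent"` default comes from the entry. The environment label used in the usage check is made up.

```rust
/// Hypothetical, flattened lane event; the real `LaneEvent` carries more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LaneEventSketch {
    pub kind: &'static str,
    pub seq: u64,
    pub emitter_identity: String,
    pub environment_label: String,
}

/// Manifest-local helper: every event gets identity/environment and the next seq.
pub struct ManifestLaneEvents {
    next_seq: u64,
    emitter_identity: String,
    environment_label: String,
    events: Vec<LaneEventSketch>,
}

impl ManifestLaneEvents {
    pub fn new(emitter_identity: impl Into<String>, environment_label: impl Into<String>) -> Self {
        Self {
            next_seq: 0,
            emitter_identity: emitter_identity.into(),
            environment_label: environment_label.into(),
            events: Vec::new(),
        }
    }

    /// Appends an event and returns the sequence number it was assigned.
    pub fn push(&mut self, kind: &'static str) -> u64 {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.events.push(LaneEventSketch {
            kind,
            seq,
            emitter_identity: self.emitter_identity.clone(),
            environment_label: self.environment_label.clone(),
        });
        seq
    }

    pub fn events(&self) -> &[LaneEventSketch] {
        &self.events
    }
}

/// Mirrors the two G004 rules named in the entry; not the repo's validator.
pub fn check_lane_events(events: &[LaneEventSketch]) -> Result<(), String> {
    if let Some(e) = events
        .iter()
        .find(|e| e.emitter_identity.is_empty() || e.environment_label.is_empty())
    {
        return Err(format!("event seq={} lacks emitterIdentity/environmentLabel", e.seq));
    }
    for pair in events.windows(2) {
        if pair[1].seq <= pair[0].seq {
            return Err(format!("seq not strictly increasing: {} then {}", pair[0].seq, pair[1].seq));
        }
    }
    Ok(())
}
```

Keeping the counter inside the manifest, rather than in each `LaneEvent` constructor, means the started event and later terminal appends cannot reuse `seq = 0`.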
Author	SHA1	Message	Date
Yeachan-Heo	e5d2eb1423	docs(roadmap): add subagent lane event conformance gap	2026-05-22 21:31:15 +00:00
Yeachan-Heo	d2018d7aee	docs(roadmap): add worker replay receipt integrity gap	2026-05-22 21:01:16 +00:00
Yeachan-Heo	71f85541bd	docs(roadmap): add mcp tool bridge registry drift gap	2026-05-22 20:31:24 +00:00
Yeachan-Heo	20c9d9d6c3	docs(roadmap): add oauth authorize extra param collision gap	2026-05-22 20:01:13 +00:00
Yeachan-Heo	46f3bff7ef	docs(roadmap): add mcp stdio frame size cap gap	2026-05-22 19:31:27 +00:00
Yeachan-Heo	d996b65d64	docs(roadmap): add plugin uninstall transaction gap	2026-05-22 19:01:43 +00:00
Yeachan-Heo	244bdb78fd	docs(roadmap): add plugin degraded server reporting gap	2026-05-22 18:31:51 +00:00
Yeachan-Heo	1df41147e3	docs(roadmap): add mcp degraded missing tools gap	2026-05-22 18:01:37 +00:00
Yeachan-Heo	f50625a04b	docs(roadmap): add remote mcp unsupported runtime gap	2026-05-22 17:31:15 +00:00
Yeachan-Heo	106c243bd3	docs(roadmap): add managed proxy mcp auth gap	2026-05-22 17:01:25 +00:00
Yeachan-Heo	a93f36d0b9	docs(roadmap): add prompt piped stdin cap gap	2026-05-22 16:30:55 +00:00
Yeachan-Heo	a1728b2be8	docs(roadmap): add repl missing default timeout gap	2026-05-22 16:01:10 +00:00
Yeachan-Heo	9843ccfa28	docs(roadmap): add plugin registry sync race gap	2026-05-22 15:31:00 +00:00
Yeachan-Heo	8043090953	docs(roadmap): add replay armed stale error gap	2026-05-22 15:01:23 +00:00
Yeachan-Heo	b0bca2ea4f	docs(roadmap): add worker restart evidence scope gap	2026-05-22 14:30:51 +00:00
Yeachan-Heo	ac0903362c	docs(roadmap): add permission file traversal gap	2026-05-22 14:01:09 +00:00
Yeachan-Heo	aa9efe5474	docs(roadmap): add table alignment render gap	2026-05-22 13:30:55 +00:00
Yeachan-Heo	ccb319a3dc	docs(roadmap): add link label styling gap	2026-05-22 13:00:52 +00:00
Yeachan-Heo	8c4b33d9b1	docs(roadmap): add indented fence streaming gap	2026-05-22 12:30:49 +00:00
Yeachan-Heo	9f762b26fa	docs(roadmap): add stream text-tool ordering gap	2026-05-22 12:00:54 +00:00
Yeachan-Heo	d8864ff151	docs(roadmap): add websearch base url boundary gap	2026-05-22 11:30:40 +00:00
Yeachan-Heo	bff67c2b24	docs(roadmap): add webfetch redirect boundary gap	2026-05-22 11:00:48 +00:00
Yeachan-Heo	571b4c2cd1	docs(roadmap): add glob brace fanout gap	2026-05-22 10:30:50 +00:00
Yeachan-Heo	ab5754a524	docs(roadmap): add grep heavy-dir traversal gap	2026-05-22 10:00:54 +00:00
Yeachan-Heo	819f67b01b	docs(roadmap): add kimi prefix wire-model gap	2026-05-22 09:01:22 +00:00
Yeachan-Heo	c1bb355691	docs(roadmap): add grep zero limit gap	2026-05-22 07:30:46 +00:00
Yeachan-Heo	eb12c3d1ef	docs(roadmap): add grep output mode validation gap	2026-05-22 07:00:48 +00:00
Yeachan-Heo	ef67aa9f96	docs(roadmap): add silent grep skip gap	2026-05-22 06:30:54 +00:00
Yeachan-Heo	8632f90f3d	docs(roadmap): add delayed binary read gap	2026-05-22 05:30:46 +00:00
Yeachan-Heo	a49d8e8361	docs(roadmap): add env wrapper read-only gap	2026-05-22 05:01:15 +00:00
Yeachan-Heo	2bb5d49ce3	docs(roadmap): add git fetch read-only gap	2026-05-22 04:31:16 +00:00
Yeachan-Heo	1882c9582a	docs(roadmap): add path traversal validation gap	2026-05-22 04:01:56 +00:00
Yeachan-Heo	665dd1fe2a	docs(roadmap): add g004 approval delegation continuity gap	2026-05-22 00:30:49 +00:00
Yeachan-Heo	54bbee2680	docs(roadmap): add g004 suffix path validation gap	2026-05-22 00:01:11 +00:00
Yeachan-Heo	8ea54d3a50	docs(roadmap): add approval token expiry boundary gap	2026-05-21 23:30:43 +00:00
Yeachan-Heo	3d877d78f3	docs(roadmap): add branch lock module normalization gap	2026-05-21 23:00:47 +00:00
Yeachan-Heo	f45b651e18	docs(roadmap): add auto compaction telemetry gap	2026-05-21 22:30:42 +00:00
Yeachan-Heo	a8c67b08e2	docs(roadmap): add auto compaction short huge session gap	2026-05-21 22:00:55 +00:00
Yeachan-Heo	9ef521bb98	docs(roadmap): add tolerant tool permission prompt gap	2026-05-21 21:30:41 +00:00
Yeachan-Heo	9fd61af086	docs(roadmap): add prompt echo glyph mismatch gap	2026-05-21 21:00:45 +00:00
Yeachan-Heo	bc55711600	docs(roadmap): add workspace test flag order preflight gap	2026-05-21 20:30:47 +00:00
Yeachan-Heo	6ef54578e8	docs(roadmap): add workspace preflight remote base gap	2026-05-21 20:00:55 +00:00
Yeachan-Heo	a036293829	docs(roadmap): add recovery ledger timestamp test gap	2026-05-21 19:30:59 +00:00
Yeachan-Heo	1540e3a9ab	docs(roadmap): add worker restart timeout anchor gap	2026-05-21 19:00:52 +00:00
Yeachan-Heo	51c05ac359	docs(roadmap): add count_tokens retry coverage gap	2026-05-21 18:30:53 +00:00
Yeachan-Heo	762ee656f3	docs(roadmap): add lane completion timestamp gap	2026-05-21 18:00:48 +00:00
Yeachan-Heo	2bc85c54c8	docs(roadmap): add agent manifest timestamp test gap	2026-05-21 17:30:46 +00:00
Yeachan-Heo	02b68555f5	docs(roadmap): add G004 timestamp validation gap	2026-05-21 17:01:25 +00:00
Yeachan-Heo	d670b43535	docs(roadmap): add timestamp helper fallback gap	2026-05-21 16:31:31 +00:00
Yeachan-Heo	832098c18e	docs(roadmap): add text warning opacity gap	2026-05-21 16:00:51 +00:00
Yeachan-Heo	10f5daca43	docs(roadmap): add prompt init diff warning hang gap	2026-05-21 15:31:12 +00:00
Yeachan-Heo	0a2542abf7	docs(roadmap): add diagnostics warning hang gap	2026-05-21 15:01:16 +00:00
Yeachan-Heo	8d0dc50cef	docs(roadmap): add help warning hang gap	2026-05-21 14:31:15 +00:00
Yeachan-Heo	785d4bde40	docs(roadmap): add inventory warning hang gap	2026-05-21 14:01:31 +00:00
Yeachan-Heo	dbc712597a	docs(roadmap): add top-level session command gap	2026-05-21 13:33:09 +00:00
Yeachan-Heo	adaf8c37ac	docs(roadmap): add eager session store creation gap	2026-05-21 13:01:08 +00:00
Yeachan-Heo	8a92f0e467	docs(roadmap): add resume slash cold-workspace contract gap	2026-05-21 12:30:52 +00:00
Yeachan-Heo	3b9332b814	docs(roadmap): add slash json error contract gap	2026-05-21 11:30:52 +00:00
Yeachan-Heo	ad6b5734f1	docs(roadmap): add slash command error kind drift	2026-05-21 11:00:47 +00:00
Yeachan-Heo	236d345f56	docs(roadmap): add pr issue argument hang	2026-05-21 10:31:14 +00:00
Yeachan-Heo	87fa106dff	docs(roadmap): add diff commit argument hang	2026-05-21 10:01:42 +00:00
Yeachan-Heo	03fefd25c3	docs(roadmap): add help topic fallthrough hang	2026-05-21 09:31:05 +00:00
Yeachan-Heo	c8e2ae4ce4	docs(roadmap): add bootstrap acp argument hang	2026-05-21 09:01:12 +00:00
Yeachan-Heo	d2a4b7b3a9	docs(roadmap): add export init argument hang	2026-05-21 08:31:06 +00:00
Yeachan-Heo	7e35ed05f3	docs(roadmap): add status version positional hang	2026-05-21 08:01:23 +00:00
Yeachan-Heo	fc591b6aae	docs(roadmap): add doctor sandbox positional hang	2026-05-21 07:31:10 +00:00
Yeachan-Heo	297a9c4b3b	docs(roadmap): add skills manifest fallthrough hang	2026-05-21 07:01:22 +00:00
Yeachan-Heo	4fc57a3595	docs(roadmap): add lifecycle subcommand fallthrough hang	2026-05-21 06:31:30 +00:00
Yeachan-Heo	66e3c8c6d7	docs(roadmap): add config unknown section hang	2026-05-21 06:01:53 +00:00
Yeachan-Heo	63f4865c8d	docs(roadmap): add config section parse kind drift	2026-05-21 05:31:13 +00:00
Yeachan-Heo	4639a5820c	docs(roadmap): add config env unsupported flag hang	2026-05-21 05:01:10 +00:00
Yeachan-Heo	24ccc6e0d2	docs(roadmap): add broad-cwd override flag scope gap	2026-05-21 04:31:14 +00:00
Yeachan-Heo	06517490e6	docs(roadmap): add dangerous flag diagnostic scope gap	2026-05-21 04:01:41 +00:00
Yeachan-Heo	88c4412a93	docs(roadmap): add compact trailing flag hang gap	2026-05-21 03:31:25 +00:00
Yeachan-Heo	5665ca1581	docs(roadmap): add missing allowedTools value hang gap	2026-05-21 03:01:16 +00:00
Yeachan-Heo	aacabdfb89	docs(roadmap): add missing permission-mode value hang gap	2026-05-21 02:01:00 +00:00
Yeachan-Heo	41a17e26ca	docs(roadmap): add missing model value hang gap	2026-05-21 01:30:58 +00:00
Yeachan-Heo	ce9ae27f39	docs(roadmap): add missing output-format value hang gap	2026-05-21 01:01:05 +00:00
Yeachan-Heo	62f5ab5611	docs(roadmap): add invalid output-format hang gap	2026-05-21 00:31:09 +00:00
Yeachan-Heo	918f8f709d	docs(roadmap): add config env type validation gap	2026-05-21 00:00:53 +00:00
Yeachan-Heo	0bbe19eb7a	docs(roadmap): add config env secret redaction gap	2026-05-20 23:31:55 +00:00
Yeachan-Heo	fbd2f01347	docs(roadmap): add config model type silent ignore	2026-05-20 23:01:22 +00:00
Yeachan-Heo	07a12d4cf3	docs(roadmap): add binary overwrite misreported create	2026-05-20 22:30:53 +00:00
Yeachan-Heo	ea29fbeb44	docs(roadmap): add whole-file structured patch amplification	2026-05-20 22:00:40 +00:00
Yeachan-Heo	b3b9eb23b5	docs(roadmap): add glob search traversal cap gap	2026-05-20 21:30:39 +00:00
Yeachan-Heo	0864f39512	docs(roadmap): add grep search pre-limit scan gap	2026-05-20 21:00:44 +00:00
Yeachan-Heo	02f288724f	docs(roadmap): add read_file default output cap	2026-05-20 20:30:49 +00:00
Yeachan-Heo	154e7eda07	docs(roadmap): add file edit output amplification	2026-05-20 19:30:53 +00:00
Yeachan-Heo	8c62fffbd7	docs(roadmap): add non-atomic config writes	2026-05-20 19:00:50 +00:00
Yeachan-Heo	e2b96ead3d	docs(roadmap): add anthropic request triple serialization	2026-05-20 18:31:01 +00:00
Yeachan-Heo	7bc373f951	docs(roadmap): add sse parser quadratic buffering	2026-05-20 18:00:47 +00:00
Yeachan-Heo	ad8a0b3deb	docs(roadmap): add g004 report schema mismatch	2026-05-20 17:30:52 +00:00
Yeachan-Heo	19a19182ef	docs(roadmap): add lane event conformance mismatch	2026-05-20 17:00:50 +00:00
Yeachan-Heo	8e8dea5023	docs(roadmap): add web fetch body cap gap	2026-05-20 16:30:47 +00:00
Yeachan-Heo	f751c98ea5	docs(roadmap): add utf8 truncation panic	2026-05-20 16:00:53 +00:00
Yeachan-Heo	51ea1aa01e	docs(roadmap): add powershell output amplification	2026-05-20 15:30:42 +00:00
Yeachan-Heo	7e73cdb60f	docs(roadmap): add repl output amplification	2026-05-20 15:01:44 +00:00
Yeachan-Heo	25b8dbb313	docs(roadmap): add todowrite output amplification	2026-05-20 14:31:04 +00:00
Yeachan-Heo	3d2a047aaf	docs(roadmap): add cargo read-only bypass	2026-05-20 14:00:59 +00:00
Yeachan-Heo	5a4a8ebfb2	docs(roadmap): add gh read-only bypass	2026-05-20 13:31:00 +00:00
Yeachan-Heo	214176d6dc	docs(roadmap): add interpreter read-only bypass	2026-05-20 13:01:39 +00:00
Yeachan-Heo	8382e1ec51	docs(roadmap): add tee read-only bypass	2026-05-20 12:31:57 +00:00
Yeachan-Heo	916bf5f24d	docs(roadmap): add git availability hang	2026-05-20 12:01:19 +00:00
Yeachan-Heo	4d185f0dee	docs(roadmap): add tmux availability hang	2026-05-20 11:31:07 +00:00
Yeachan-Heo	2bf6924e01	docs(roadmap): add gitcontext system prompt hang	2026-05-20 11:01:37 +00:00
Yeachan-Heo	56555a3ad6	docs(roadmap): add dirty diff git hang	2026-05-20 10:31:18 +00:00
Yeachan-Heo	d5aa815b39	docs(roadmap): add status doctor git metadata hang	2026-05-20 10:01:41 +00:00
Yeachan-Heo	edcf5bf33a	docs(roadmap): add stale branch freshness ref gap	2026-05-20 09:30:51 +00:00
Yeachan-Heo	d541130121	docs(roadmap): add status dirty path inventory gap	2026-05-20 09:01:30 +00:00
Yeachan-Heo	92e97e0b63	docs(roadmap): add prompt post-flag hang	2026-05-20 08:31:34 +00:00
Yeachan-Heo	32961cfc5b	docs(roadmap): add prompt clean-home hang	2026-05-20 08:01:25 +00:00
Yeachan-Heo	8a2e133f66	docs(roadmap): add planning alias argument hang	2026-05-20 07:31:23 +00:00
Yeachan-Heo	51a450e473	docs(roadmap): add debug alias argument hang	2026-05-20 07:01:25 +00:00
Yeachan-Heo	dbd04ad334	docs(roadmap): add interactive alias help hang	2026-05-20 06:31:24 +00:00
Yeachan-Heo	625b8b06d8	docs(roadmap): add telemetry alias help hang	2026-05-20 06:01:16 +00:00
Yeachan-Heo	111e7e853c	docs(roadmap): add slash alias help hang	2026-05-20 05:31:22 +00:00
Yeachan-Heo	51e6040b23	docs(roadmap): add session direct command hang	2026-05-20 05:01:09 +00:00
Yeachan-Heo	93f20dfd25	docs(roadmap): add unknown skill invocation hang	2026-05-20 04:32:31 +00:00
Yeachan-Heo	4d52703ca9	docs(roadmap): add export option hang	2026-05-20 03:31:35 +00:00
Yeachan-Heo	90a0d38d52	docs(roadmap): add system-prompt modifier hang	2026-05-20 03:01:45 +00:00
Yeachan-Heo	8afdb9448a	docs(roadmap): add inventory json parity gap	2026-05-20 02:31:20 +00:00
Yeachan-Heo	bb2cf3f448	docs(roadmap): add local diagnostics json gap	2026-05-20 02:01:25 +00:00
Yeachan-Heo	9495dbee30	docs(roadmap): add local verb help arity hangs	2026-05-20 01:31:56 +00:00
Yeachan-Heo	1208b9a034	docs(roadmap): add compact slash-only help hang	2026-05-20 01:01:25 +00:00
Yeachan-Heo	e9db12d98b	docs(roadmap): add silent malformed config gap	2026-05-20 00:30:56 +00:00
Yeachan-Heo	a666fa6f10	docs(roadmap): add version json alias gap	2026-05-20 00:01:10 +00:00
Yeachan-Heo	ac3665fe2f	docs(roadmap): add help alias arity hang	2026-05-19 23:31:17 +00:00
Yeachan-Heo	7cbb6e7fa5	docs(roadmap): add unexpected positional hang	2026-05-19 22:32:00 +00:00
Yeachan-Heo	674cec191f	docs(roadmap): add lifecycle explicit action hangs	2026-05-19 22:01:43 +00:00
Yeachan-Heo	59df684a17	docs(roadmap): add explicit plugin list hang	2026-05-19 21:31:23 +00:00
Yeachan-Heo	31d5db7453	docs(roadmap): add mixed PATH mcp reachability proof	2026-05-19 21:00:48 +00:00
Yeachan-Heo	164589b8e6	docs(roadmap): add mcp executable reachability gap	2026-05-19 20:30:59 +00:00
Yeachan-Heo	b4a732f33d	docs(roadmap): add compact skills output gap	2026-05-19 20:01:01 +00:00
Yeachan-Heo	2980f6de6e	docs(roadmap): add output-format flag placement gap	2026-05-19 19:31:20 +00:00
Yeachan-Heo	2cca298f7a	docs(roadmap): add direct slash invocation contract	2026-05-19 18:31:28 +00:00
Yeachan-Heo	6b90f83b9d	docs(roadmap): add text command help hang	2026-05-19 18:02:27 +00:00
Yeachan-Heo	8303af0898	docs(roadmap): add command-specific json help hang	2026-05-19 17:32:04 +00:00
Yeachan-Heo	3e2d902271	docs(roadmap): add structured root help schema	2026-05-19 17:01:00 +00:00
Yeachan-Heo	78d334c4e2	docs(roadmap): add dogfood timeout retry evidence	2026-05-19 16:31:00 +00:00
Yeachan-Heo	0063c0d698	docs(roadmap): add global flag order and clean version hang	2026-05-19 16:01:15 +00:00
Yeachan-Heo	429671ec12	docs(roadmap): add root help json hang	2026-05-19 15:31:27 +00:00
Yeachan-Heo	ed9d387e9a	docs(roadmap): add shared lifecycle help hang	2026-05-19 15:01:46 +00:00
Yeachan-Heo	8d02077cfd	docs(roadmap): add plugin help hang contract	2026-05-19 14:32:04 +00:00
Yeachan-Heo	44bd2b54f5	docs(roadmap): add plugin lifecycle help contract	2026-05-19 14:01:01 +00:00
Yeachan-Heo	e2c310dc04	docs(roadmap): add stale bundled plugin source provenance	2026-05-19 13:32:37 +00:00
Yeachan-Heo	25d663d140	docs(roadmap): add missing dogfood binary preflight	2026-05-19 13:00:59 +00:00
Yeachan-Heo	6183d958ba	docs(roadmap): add dogfood workdir provenance guard	2026-05-19 12:31:56 +00:00