docs(roadmap): add #426 — plugins json has zero output_format_contract test coverage

docs(roadmap): add no-session kind drift item
Adds ROADMAP #422 documenting the concrete export/resume no-session ErrorKind drift.
2026-05-13 17:36:44 +00:00 · 2026-05-01 05:02:38 +09:00 · 2026-05-01 02:46:12 +09:00
1 changed files with 3 additions and 1 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -6301,4 +6301,6 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
 380. **Top-level `tokens --help --output-format json` hangs with zero stdout/stderr instead of returning bounded command help JSON** — dogfooded 2026-04-30 for the 02:30 nudge on current `origin/main` / rebuilt `./rust/target/debug/claw` with embedded `git_sha` `d95b230c`. After verifying #358 covered `cost --help`, a fresh adjacent probe on the token-budget surface showed the same silent failure class: repeated bounded runs of `timeout 8 ./rust/target/debug/claw tokens --help --output-format json` exited `124` with `stdout=0` and `stderr=0`. In the same rebuilt binary, `version --output-format json` returned promptly with version/build metadata, proving the binary itself and JSON output path are reachable. This is distinct from #358's cost help hang: the affected surface is the sibling `tokens` command help, which agents use before estimating prompt/session token budgets. **Required fix shape:** (a) make `tokens --help --output-format json` return static/bounded stdout JSON with `kind:"help"` or `kind:"tokens"`, `action:"help"`, usage, options, examples, supported output formats, and related slash/direct commands; (b) ensure help rendering does not initialize slow token accounting, session, or provider state; (c) if any dynamic provider is consulted, return a typed JSON timeout/unavailable error instead of hanging; (d) add regression coverage proving tokens help in JSON mode returns within a deterministic budget. **Why this matters:** token budgeting is a preflight clawability surface. If help hangs silently, automation cannot safely discover how to inspect or constrain token usage before running expensive prompts, and budget-aware wrappers stall at the discovery step. Source: gaebal-gajae dogfood follow-up for the 02:30 nudge on rebuilt `./rust/target/debug/claw` `d95b230c`.
 381. **Top-level `cache --help --output-format json` hangs with zero stdout/stderr instead of returning bounded command help JSON** — dogfooded 2026-04-30 for the 03:00 nudge on current `origin/main` / rebuilt `./rust/target/debug/claw` with embedded `git_sha` `d95b230c`. After #358 and #380 landed for the cost/tokens preflight help hangs, a fresh adjacent probe on the cache-control surface showed the same silent failure class: repeated bounded runs of `timeout --kill-after=1s 8s ./rust/target/debug/claw cache --help --output-format json` exited `124` with `stdout=0` and `stderr=0`. In the same rebuilt binary, `version --output-format json` returned promptly with version/build metadata, proving the binary itself and JSON output path are reachable. This is distinct from the separate `/cache` slash-command envelope mismatch class: the affected surface here is top-level `cache` command help, where agents need bounded local discovery before deciding whether to inspect, clear, or summarize cache state. **Required fix shape:** (a) make `cache --help --output-format json` return static/bounded stdout JSON with `kind:"help"` or `kind:"cache"`, `action:"help"`, usage, options, examples, supported output formats, and related slash/direct commands; (b) ensure help rendering does not initialize slow cache/session/provider state; (c) if any dynamic provider is consulted, return a typed JSON timeout/unavailable error instead of hanging; (d) add regression coverage proving cache help in JSON mode returns within a deterministic budget. **Why this matters:** cache inspection and cleanup are recovery/control-plane operations. If cache help hangs silently, claws cannot safely discover cache semantics before attempting cleanup, and automation stalls before it can choose a non-destructive cache action. Source: gaebal-gajae dogfood follow-up for the 03:00 nudge on rebuilt `./rust/target/debug/claw` `d95b230c`.

-418. **`system-prompt --output-format json` exposes `"__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__"` as a literal element in the `sections` array — an internal split delimiter leaked into the public structured output** — dogfooded 2026-04-30 by Jobdori on `e939777f`. Running `claw system-prompt --output-format json` returns `{"kind":"system-prompt","message":"<full prose>","sections":["You are an interactive agent...", "# System\n...", "# Doing tasks\n...", "# Executing actions with care\n...", "__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__", "# Environment context\n...", "# Project context\n...", "# Claude instructions\n...", "# Runtime config\n..."]}`. The `sections` array has 9 elements; element index 4 is the raw string `"__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__"`. This internal sentinel marks the boundary between the static and dynamic sections of the compiled system prompt, used during assembly to split the prompt at injection time. It appears in the public JSON output verbatim as a first-class section, indistinguishable from real sections by type alone. Automation that iterates `sections[]` must special-case this sentinel or it will process an internal implementation string as if it were a real system prompt section. **Required fix shape:** (a) strip `"__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__"` and any similar internal delimiters from the `sections` array before serializing to JSON; (b) if the static/dynamic boundary is semantically meaningful for callers, expose it as a structured metadata field such as `boundary_index:4` or as a `section_type:"static"|"dynamic"` field on each section entry, not as a raw sentinel string in the array; (c) rename the `sections` type from `string[]` to `[{id, type, content}]` to enable this without breaking the boundary signal; (d) add regression coverage proving the `system-prompt --output-format json` output's `sections` array contains no elements whose value equals `"__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__"` or matches `/__[A-Z_]+__/`. **Why this matters:** internal sentinel strings in public JSON are a contract liability — they couple the wire format to internal implementation details. Any refactor that renames or removes the sentinel breaks callers that don't special-case it, and automation that doesn't know to filter it will miscount, misparse, or misrender the system prompt. Source: Jobdori live dogfood, `e939777f`, 2026-04-30.
+422. **`export --output-format json` and `--resume latest` report the same "no managed sessions" scenario using two different `kind` codes — `no_managed_sessions` vs `session_load_failed` — making "no session found" undetectable by a single kind-code check** — dogfooded 2026-04-30 KST (UTC+9) by Jobdori on `e939777f`. Running `claw export --output-format json` with no session present returns (on stderr, exit 1): `{"error":"no managed sessions found in .claw/sessions/<fingerprint>/","hint":"Start \`claw\` to create a session, then rerun with \`--resume latest\`.\nNote: claw partitions sessions per workspace fingerprint; sessions from other CWDs are invisible.","kind":"no_managed_sessions","type":"error"}`. Running `claw --resume latest /status --output-format json` with no session present returns (on stderr, exit 1): `{"error":"failed to restore session: no managed sessions found in .claw/sessions/<fingerprint>/","hint":"Start \`claw\` to create a session, then rerun with \`--resume latest\`.\nNote: claw partitions sessions per workspace fingerprint; sessions from other CWDs are invisible.","kind":"session_load_failed","type":"error"}`. Both describe the same root condition — there are no sessions to operate on — but they expose it via different `kind` discriminants. Automation that checks `kind == "no_managed_sessions"` to detect a cold workspace will miss the `--resume` path's `session_load_failed`, and vice versa. A wrapper that guards "run with --resume only if a session exists" must special-case both codes. The hint text is identical between them, suggesting the messages are logically equivalent. Additionally neither code matches the proposed canonical names `session_not_found` / `session_load_failed` as stable `ErrorKind` discriminants described in ROADMAP #77's fix shape, which explicitly proposes typed error-kind codes for session lifecycle failures. **Required fix shape:** (a) unify "no sessions found for this workspace fingerprint" under a single canonical `kind` code — either `no_managed_sessions` or `session_not_found` — used consistently by every command path that encounters an empty session registry; (b) if `session_load_failed` is a more general category (covering e.g. corrupt session files, IO errors, schema version mismatches), it should nest a concrete `reason:"no_managed_sessions"` or `reason:"session_not_found"` sub-field so callers can distinguish "empty registry" from "found but unreadable"; (c) align with the canonical error-kind contract proposed in #77; (d) add regression coverage proving `export` and `--resume latest` in an empty workspace both return an error with the same top-level `kind` code. **Why this matters:** session guard-rails in orchestration need a single stable `kind` to detect cold workspaces without enumerating all possible no-session synonyms. Two divergent codes for the same condition make defensive automation brittle and contradict the promise of machine-readable error envelopes. Source: Jobdori live dogfood, `e939777f`, 2026-04-30 KST (UTC+9).
+
+426. **`claw plugins --output-format json` surface has zero test coverage in `output_format_contract.rs` — the contract tests that lock every other major JSON surface (`agents`, `mcp`, `skills`, `status`, `sandbox`, `doctor`, `help`, `version`, `acp`, `bootstrap-plan`, `system-prompt`, `init`, `diff`, `config`) do not include a single assertion about `plugins`, leaving the plugin JSON shape entirely untested and free to silently drift** — dogfooded 2026-05-01 KST (UTC+9) by Jobdori on `57096b0a`. Inspection of `rust/crates/rusty-claude-cli/tests/output_format_contract.rs` (11 tests) shows coverage for every major `--output-format json` surface except `plugins`. Live binary produces `{"action":"list","kind":"plugin","message":"<prose table>","reload_runtime":false,"target":null}` for `claw plugins --output-format json`, but no test asserts (a) that `kind == "plugin"`, (b) that `action == "list"`, (c) that the payload is valid JSON at all, (d) that `reload_runtime` is present and boolean, or (e) that `target` is present and null/typed. The only plugin-related test in `mock_parity_harness.rs` covers plugin tool execution round-trip, not the CLI inspection surface. This means the `plugins list` JSON schema documented in ROADMAP #416 (mutation-shape envelope, missing `plugins:[]` array) can silently regress or change shape between commits with no CI signal. By contrast, if `agents list` drops the `agents[]` field, `inventory_commands_emit_structured_json_when_requested` catches it immediately. **Required fix shape:** (a) add a `plugins_command_emits_json_when_requested` test in `output_format_contract.rs` that: runs `claw plugins --output-format json` in an isolated temp dir, asserts stdout is valid JSON, asserts `kind == "plugin"`, asserts `action == "list"`, asserts `reload_runtime` is boolean, asserts `target` is JSON null when no plugin is targeted; (b) add a second assertion variant for `claw plugins enable <name> --output-format json` to confirm the lifecycle-mutation envelope fields are stable; (c) pair with the existing ROADMAP #416 fix so when `plugins:[]` array is added, a test assertion locks it; (d) extend to `plugins disable`, `plugins install`, and `plugins uninstall` shapes once they are implemented. **Why this matters:** test brittleness — the plugin JSON surface is the only major `--output-format json` surface with zero contract coverage. Any schema change, regression, or panic in `plugins` dispatch goes undetected until a downstream claw hits it at runtime. CI's four-check suite gives a false assurance of coverage for plugin JSON because `cargo test --workspace` passes even with zero plugin contract assertions. Source: Jobdori dogfood + test-file inspection, `57096b0a`, 2026-05-01 KST (UTC+9). Cross-reference: #416 (plugins list returns mutation shape, no plugins array).
Author	SHA1	Message	Date
YeonGyu-Kim	527dd6b1ce	docs(roadmap): add #426 — plugins json has zero output_format_contract test coverage	2026-05-01 05:02:38 +09:00
YeonGyu-Kim	57096b0a1a	docs(roadmap): add no-session kind drift item Adds ROADMAP #422 documenting the concrete export/resume no-session ErrorKind drift.	2026-05-01 02:46:12 +09:00