docs(roadmap): add #330 — resume mode stats/cost always zero

Merge pull request #2838 from ultraworkers/docs/roadmap-322-323-clean
docs(roadmap): add #322 #323 — json stream corruption and session identity contradiction
2026-05-13 17:36:44 +00:00 · 2026-04-29 21:36:16 +09:00 · 2026-04-29 19:40:50 +09:00 · 2026-04-29 19:38:00 +09:00
1 changed files with 5 additions and 0 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -6258,3 +6258,8 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
 248. **Non-interactive prompt mode can exceed caller timeouts with no in-band startup/API phase event or partial status artifact** — dogfooded 2026-04-29 from live tmux session `claw-code-issue-247-human-fresh-run` after the owner explicitly asked gaebal-gajae to make a fresh session and use `claw-code` directly. The actual `./rust/target/debug/claw` binary was launched via `clawhip tmux new` on current main. `claw doctor --output-format json` and `claw status --output-format json` both succeeded and reported auth/config/workspace ok, but minimal non-interactive prompt calls (`timeout 120 ./rust/target/debug/claw --output-format json --dangerously-skip-permissions "echo hello"` and `timeout 120 ./rust/target/debug/claw --output-format json prompt "Reply with just the word hello"`) both timed out from the outer harness after roughly 150s with only `Command exceeded timeout` visible. There was no machine-readable `api_request_started`, `waiting_for_first_token`, provider/model/base-url identity, retry count, or partial status file/event that would let clawhip distinguish slow provider, network stall, auth/OAuth drift, stream parser hang, or prompt-mode bug. **Required fix shape:** (a) emit structured non-interactive lifecycle events for `startup_ok`, `api_request_started`, `first_byte/first_token`, retry/backoff, and terminal `timeout_or_stall` states; (b) include provider/model/base URL source and auth source category without leaking secrets; (c) support a CLI/request timeout flag or env override that returns a typed JSON error before the outer orchestrator kills the process; (d) write/emit a final partial status artifact on timeout so lane monitors do not have to infer state from a dead process. **Why this matters:** non-interactive prompt mode is the automation path; if it can hang past the caller's timeout while doctor/status are green, claws lose the ability to tell whether startup, auth, transport, provider latency, or stream consumption failed. Source: live session `claw-code-issue-247-human-fresh-run` on 2026-04-29.

 249. **`/issue` advertises GitHub issue creation but never reaches a GitHub/OAuth/auth preflight or creation path, and the non-interactive error suggests unusable resume forms** — dogfooded 2026-04-29 on current main `8e22f757` while chasing the remaining Phase-0 GitHub OAuth blocker. The visible help advertises `/issue [context]` as “Draft or create a GitHub issue from the conversation,” but the actual implementation path only renders a local `Issue` report (`format_issue_report`) and does not invoke `gh`, GitHub API, OAuth, token discovery, browser auth, or even a dry-run/auth-preflight surface. Direct non-interactive use (`./rust/target/debug/claw '/issue dogfood test'`) returns `slash command /issue dogfood test is interactive-only` and suggests `claw --resume SESSION.jsonl /issue ...` / `claw --resume latest /issue ...` “when the command is marked [resume]”, while `/help` does not mark `/issue` as resume-safe and resume dispatch rejects interactive-only commands. That leaves operators with a GitHub-labeled command whose real behavior is neither issue creation nor a clear GitHub OAuth blocker. **Required fix shape:** (a) split the contract explicitly: either rename/copy to “draft issue text” or implement a real `create` path with GitHub auth preflight; (b) surface a machine-readable GitHub auth state (`gh_cli_authenticated`, `github_token_present`, `oauth_required`, `creation_unavailable`) before any issue-create attempt; (c) make the direct-mode error avoid suggesting resume forms for commands not marked resume-safe; (d) add regression coverage proving `/issue` help, direct-mode rejection, resume support flags, and creation/draft behavior agree. **Why this matters:** Phase-0 GitHub OAuth verification cannot complete if the only GitHub issue surface stops at local prose while still advertising creation. Claws need to know whether they are missing GitHub auth, using a draft-only helper, or hitting an unimplemented creation path. Source: gaebal-gajae dogfood cycle in `#clawcode-building-in-public` on 2026-04-29.
+322. **Config deprecation warnings are emitted to stderr even under `--output-format json`, making JSON output unparseable from combined stdout+stderr capture** — dogfooded 2026-04-29 by Jobdori on current main (`8e22f75`). Running `cargo run --bin claw -- doctor --output-format json 2>&1 | python3 -c "import sys,json; json.loads(sys.stdin.read())"` fails with `Expecting value: line 1 column 1 (char 0)` because a `warning: /path/settings.json: field "enabledPlugins" is deprecated. Use "plugins.enabled" instead` line is emitted to stderr before the JSON body begins. When a caller captures combined output (the common automation pattern: `2>&1`, subprocess `STDOUT | STDERR`, PTY capture, or tmux pane scrape) the warning prefix breaks JSON parse for every downstream consumer. Root cause: `rust/crates/runtime/src/config.rs` line ~300 calls `eprintln!("warning: {warning}")` unconditionally during `ClawSettings::load_merged()` regardless of active output format. **Required fix shape:** (a) thread the active `CliOutputFormat` through the config loading path and suppress or defer human-readable warning strings when `json` mode is active; (b) instead, collect deprecation diagnostics and inject them into the JSON output as a top-level `"warnings": [...]` array (same field already used by `doctor`); (c) ensure the JSON body is always the first bytes on stdout and all prose warnings stay on stderr or are suppressed in json mode; (d) add regression coverage proving `claw <any-cmd> --output-format json` stdout is valid JSON regardless of config deprecation state. **Why this matters:** `--output-format json` is the automation/claw contract; if config warnings can silently corrupt the JSON stream, every orchestration layer that captures combined output gets broken parse-on-warning with no stable fallback. Source: Jobdori live dogfood on mengmotaHost, claw-code main `8e22f75`, 2026-04-29.
+
+323. **`status --output-format json` reports `session.session = "live-repl"` while simultaneously reporting `session_lifecycle.kind = "saved_only"` — contradictory session identity in a single status snapshot** — dogfooded 2026-04-29 by Jobdori on current main (`804d96b`). Running `claw status --output-format json` from an active REPL-style invocation produced `"session": "live-repl"` in the `workspace` block and `"session_lifecycle": {"kind": "saved_only", "pane_id": null, ...}` in the same object. Those two fields carry contradictory claims: `"live-repl"` asserts there is an active interactive session, while `"saved_only"` asserts there is no live tmux pane hosting the session — the session exists only as a saved artifact. A downstream claw reading this snapshot cannot tell which claim to trust: is this a running session whose pane is undetectable, or a saved-only session that the `session` field is misclassifying? Root cause: `"live-repl"` is a fallback sentinel emitted by `main.rs:6070` when `context.session_path` is `None`, while `session_lifecycle` is computed independently by `classify_session_lifecycle_for()` from tmux pane discovery; the two fields share no common source and can diverge. **Required fix shape:** (a) derive both `session.session` and `session_lifecycle.kind` from the same lifecycle classification result so they cannot diverge; (b) replace the `"live-repl"` free-form sentinel with a structured `session_kind` field (`live_repl`, `saved`, `resume`, etc.) that carries the same type vocabulary as `session_lifecycle.kind`; (c) when `session_lifecycle.kind = "saved_only"`, never emit `"session": "live-repl"` (or vice versa); (d) add a regression test proving `status --output-format json` never emits `session.kind = "live_repl"` and `session_lifecycle.kind = "saved_only"` simultaneously. **Why this matters:** `status --output-format json` is the machine-readable truth surface for session state; if two fields in the same snapshot contradict each other, every lane, monitor, and orchestrator has to pick a winner instead of reading a coherent state. Source: Jobdori live dogfood on mengmotaHost, claw-code `804d96b`, 2026-04-29.
+
+330. **`/stats` and `/cost` in `--resume` mode always return zero token counts regardless of saved session usage** — dogfooded 2026-04-29 by Jobdori on current main (`e7074f4`). Running `claw --output-format json --resume latest /stats` and `claw --output-format json --resume latest /cost` both return `{"input_tokens": 0, "output_tokens": 0, "total_tokens": 0, ...}` even when the session file was created by a real interactive claw run. The same zero-fill is produced by `claw --output-format json status` (usage sub-object). Inspection of the default session path (`~/.claw/sessions/.../session-*.jsonl`) shows the file has only 2 events (`session_meta`, `message`) with no usage/token records embedded. Two candidate root causes: (a) session serialization never writes usage events into the JSONL file even after real prompt exchanges, so resume-mode has no usage to replay; (b) resume-mode stat accumulation reads usage from memory-side counters that are always zero because no API calls happened in the resume-only invocation. Either way, the operator effect is the same: `--resume` + `/stats`/`/cost` cannot report what the session actually consumed. **Required fix shape:** (a) persist per-turn usage records (input/output/cache tokens per exchange) into the session JSONL at write time so resume-mode can reconstruct cumulative counts by replay; (b) expose a `"total_usage"` summary block at the tail of every session JSONL so resume-mode can read it in O(1) without full replay; (c) ensure `/stats` and `/cost` in resume-mode sum the persisted usage, not memory-side live counters; (d) add regression coverage proving `--resume SESSION.jsonl /stats` returns non-zero token counts after a session that made real API calls. **Why this matters:** token usage visibility is critical for quota management and cost attribution; if `--resume` mode always shows zeros, operators and orchestration lanes cannot trust usage data from saved sessions and must rely on external billing dashboards instead of in-band tooling. Source: Jobdori live dogfood on mengmotaHost, claw-code `e7074f4`, 2026-04-29.
Author	SHA1	Message	Date
YeonGyu-Kim	52161c781a	docs(roadmap): add #330 — resume mode stats/cost always zero	2026-04-29 21:36:16 +09:00
Bellman	e7074f47ee	Merge pull request #2838 from ultraworkers/docs/roadmap-322-323-clean docs(roadmap): add #322 #323 — json stream corruption and session identity contradiction	2026-04-29 19:40:50 +09:00
YeonGyu-Kim	9468383b67	docs(roadmap): add #322 #323 — json stream corruption and session identity contradiction	2026-04-29 19:38:00 +09:00