docs(roadmap): add todowrite output amplification

2026-05-20 20:56:44 +00:00 · 2026-05-20 14:31:04 +00:00
parent 3d2a047aaf
commit 25b8dbb313
1 changed files with 2 additions and 0 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -6551,3 +6551,5 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
 497. **`PermissionEnforcer::check_bash` also allowlists `gh` as read-only, so GitHub-mutating commands (`gh pr merge`, `gh issue edit`, `gh repo delete`, etc.) can run in `read-only` mode** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 13:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@214176d`. Code inspection: `rust/crates/runtime/src/permission_enforcer.rs::is_read_only_command` includes `gh` in the read-only allowlist and only rejects `-i`, `--in-place`, ` > `, or ` >> `. There is no `gh` subcommand classifier analogous to `bash_validation.rs::validate_git_read_only` for `git`. Therefore `gh pr merge 123 --merge`, `gh issue edit 5 --add-label done`, `gh repo delete owner/repo --yes`, and `gh api repos/:owner/:repo/actions/runs/1/approve -X POST` are all classified as read-only by the runtime enforcer, even though they mutate remote GitHub state. **Required fix shape:** (a) remove `gh` from the blanket read-only allowlist or implement a conservative `gh` subcommand classifier; (b) allow only clearly read-only forms (`gh pr view/list`, `gh issue view/list`, `gh run view/list`, `gh api` without mutating method) and require workspace-write/danger or prompt for merge/edit/create/delete/api `-X POST|PATCH|PUT|DELETE`; (c) add regressions proving mutating `gh` commands are denied in `PermissionMode::ReadOnly` while view/list commands remain allowed if desired; (d) preferably replace the local heuristic with the canonical bash-validation pipeline so shell permission logic has one source of truth. **Why this matters:** read-only mode is not just filesystem safety; it should prevent external state mutation. A `gh pr merge` or `gh issue edit` from a supposedly read-only lane is a serious control-plane bypass and can alter public repo state without the permission escalation the mode implies. Source: gaebal-gajae dogfood response to Clawhip message `1506650232937513140` on 2026-05-20.

 498. **`PermissionEnforcer::check_bash` allowlists `cargo` and `rustc` as read-only, bypassing the canonical package/build-state classifier and allowing state-mutating Rust toolchain commands in `read-only` mode** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 14:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@5a4a8eb`. Code inspection: `rust/crates/runtime/src/permission_enforcer.rs::is_read_only_command` includes `cargo` and `rustc` in the read-only allowlist and only rejects `-i`, `--in-place`, ` > `, or ` >> `. The canonical `bash_validation.rs` pipeline treats `cargo` as package/build-state management (`STATE_MODIFYING_COMMANDS` / `PACKAGE_COMMANDS`) and has a regression that `npm install` is blocked in read-only mode, but the runtime enforcer does not call that pipeline. As a result, commands like `cargo install cargo-edit`, `cargo add anyhow`, `cargo generate-lockfile`, `cargo update`, `cargo fix --allow-dirty`, and `cargo build` (writes `target/`) are classified as read-only by the actual enforcer. `rustc -o out main.rs` likewise writes an output binary without shell redirection and is allowed. **Required fix shape:** (a) remove `cargo` and `rustc` from the blanket read-only allowlist; (b) optionally add a conservative subcommand classifier where only clearly non-mutating forms (`cargo --version`, `cargo metadata --no-deps` if proven side-effect-free, `rustc --version`) are read-only; (c) route `check_bash` through the canonical `bash_validation` pipeline; (d) add regressions for `cargo install`, `cargo add`, `cargo update`, `cargo build`, `cargo test`, and `rustc -o` under `PermissionMode::ReadOnly`. **Why this matters:** build/package commands routinely modify the workspace, global cargo home, lockfiles, and build artifacts. A read-only exploratory lane should not be able to install packages or rewrite lock/build outputs just because the first token is `cargo`. Source: gaebal-gajae dogfood response to Clawhip message `1506657779883184250` on 2026-05-20.
+
+499. **`TodoWrite` returns both `oldTodos` and `newTodos` in every tool result, so large task boards are echoed twice per update and repeatedly burn context even though the model only needs the delta/current list** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 14:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@3d2a047`. Code inspection: `run_todo_write` serializes `execute_todo_write(input)?` via `to_pretty_json`; `execute_todo_write` reads the persisted todo store into `old_todos`, writes the new/persisted list, then returns `TodoWriteOutput { old_todos, new_todos: input.todos, verification_nudge_needed }`. The JSON field names are `oldTodos` and `newTodos`, so every TodoWrite result contains the entire previous board plus the entire submitted board. For a 200-item task board, a one-item status change returns roughly 400 todo objects to the model; repeated status updates multiply the same backlog text across the context window. This is the same output-amplification class as NotebookEdit (#500), but on the core planning/task-control surface rather than notebooks. **Required fix shape:** (a) replace `oldTodos` with a compact diff (`changed:[{id/content,status_before,status_after}]`, `added`, `removed`, `unchanged_count`) or hide it behind a debug flag; (b) keep `newTodos` only if the current board is below a safe size, otherwise return `current_count`, `open_count`, `completed_count`, and a truncated active subset; (c) include `truncated:true`/`omitted_old_count` metadata for large boards; (d) add regressions proving single-item updates on large boards do not serialize the entire old board. **Why this matters:** TodoWrite is called frequently in multi-step sessions. Echoing full before/after state on every update creates context-window pressure, increases cost, and makes compaction summaries noisier without adding useful operator signal. Source: gaebal-gajae dogfood response to Clawhip message `1506665332478050344` on 2026-05-20.