diff --git a/ROADMAP.md b/ROADMAP.md index ab56d275..fb524b62 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -6559,3 +6559,5 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed) 501. **`PowerShell`/shared `execute_shell_command` returns unbounded raw stdout/stderr in every foreground path, so shell output can still flood the model context even after the REPL output-amplification finding** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 15:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@7e73cdb`. Code inspection: `execute_powershell` delegates to `execute_shell_command`, whose timeout-success, timeout-kill, and no-timeout paths all construct `runtime::BashCommandOutput` with `stdout: String::from_utf8_lossy(&output.stdout).into_owned()` and `stderr: String::from_utf8_lossy(&output.stderr).into_owned()`. The background path discards output, but every foreground path serializes the full captured streams. A command like `PowerShell(command:"1..1000000 | % { 'x' }")` or any verbose test/log script can return megabytes of JSON-escaped output to the model. The timeout path is especially bad: it kills the process but still returns everything produced before the timeout plus the timeout footer, with no cap or `truncated` metadata. **Required fix shape:** (a) introduce a shared output-budget helper for all shell-like tools (`bash`, `PowerShell`, `REPL`) with byte caps, first/last slicing, and `stdout_truncated`/`stderr_truncated`/byte-count metadata; (b) preserve existing `raw_output_path`/`persisted_output_path` semantics by spilling full streams to artifact files when safe; (c) apply caps before JSON serialization in both success and timeout branches; (d) add regressions using stub `pwsh` and shell commands that emit >cap stdout/stderr and timeout-with-output. **Why this matters:** verbose tests and build tools are normal claw-code workflows. A single noisy PowerShell/shell command should not silently consume the context window or force compaction; the model needs a bounded summary plus a way to fetch artifacts if the full log is needed. Source: gaebal-gajae dogfood response to Clawhip message `1506680431620395091` on 2026-05-20. 502. **HTTP request tool truncates response bodies with byte indexing (`&body[..8192]`), so any multibyte UTF-8 character crossing the 8192-byte boundary can panic instead of returning a bounded tool result** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 16:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@51ea1aa`. Code inspection: the HTTP request handler reads `let body = response.text().unwrap_or_default();` then, when `body.len() > 8192`, builds `format!("{}\n\n[response truncated — {} bytes total]", &body[..8192], body.len())`. `String::len()` is bytes, and Rust string slicing requires a character boundary. A response like `"a".repeat(8191) + "é" + ...` has length >8192 but byte offset 8192 is in the middle of the two-byte `é`; `&body[..8192]` panics. Nearby `preview_text` correctly truncates by chars, so the safe helper already exists but is not used here. **Required fix shape:** (a) replace direct byte slicing with a UTF-8-safe truncation helper (`char_indices`/`floor_char_boundary` or reuse `preview_text` plus byte-count metadata); (b) report both original byte length and whether truncation occurred; (c) apply the same helper to all response/body truncation paths; (d) add a regression with a local HTTP response whose 8192nd byte is inside a multibyte character and assert the tool returns JSON with `truncated:true` instead of panicking. **Why this matters:** non-English pages, emoji-heavy logs, and binary-ish HTTP responses are common. A truncation path intended to protect the context window should never crash the tool runtime on valid UTF-8. Source: gaebal-gajae dogfood response to Clawhip message `1506687983397376103` on 2026-05-20. + +503. **`WebFetch`/`WebSearch` download full response bodies with `response.text()` before previewing, so large pages can allocate unbounded memory and stall the tool despite returning only a 900-char preview or eight search hits** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 16:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@f751c98`. Code inspection: `execute_web_fetch` calls `response.text()` into `body`, records `bytes = body.len()`, normalizes the whole body (`html_to_text` for HTML), collapses all whitespace, and only then trims via `preview_text(..., 900)`. `execute_web_search` similarly calls `response.text()` on the search response, then parses links and truncates hits to 8. The HTTP client has a 20s timeout and redirect cap, but there is no `Content-Length` guard, streaming byte cap, decompressed-size cap, or early abort. A large/decompression-bomb HTML response can force multi-megabyte/GB allocation and full text normalization even though the returned result is tiny. **Required fix shape:** (a) add a max download/decompressed body size for web tools (configurable but safe default); (b) reject/abort early on `Content-Length` above cap and enforce a streaming cap while reading chunks; (c) record `body_truncated:true`, `bytes_read`, and `content_length` metadata in the tool output; (d) make `html_to_text`/search extraction operate on the capped buffer; (e) add local HTTP regressions for huge `Content-Length`, chunked oversized body, and compressed oversized body proving bounded memory/time. **Why this matters:** `WebFetch` is a common lightweight alternative to browser automation. Its output is intentionally small, but the hidden pre-output work is unbounded; a hostile or simply large page can make a dogfood session look hung or OOM the runtime before any useful event/log signal is emitted. Source: gaebal-gajae dogfood response to Clawhip message `1506695531307733082` on 2026-05-20.