docs(roadmap): add utf8 truncation panic

2026-05-21 05:06:44 +00:00 · 2026-05-20 16:00:53 +00:00
parent 51ea1aa01e
commit f751c98ea5
1 changed files with 2 additions and 0 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -6557,3 +6557,5 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
 500. **`REPL` tool returns unbounded raw `stdout`/`stderr` strings, so a tiny inline snippet can inject megabytes of output into the model context just like the NotebookEdit/TodoWrite amplification class** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 15:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@25b8dbb`. Code inspection: `execute_repl` in `rust/crates/tools/src/lib.rs` runs python/node/shell snippets with piped stdout/stderr, then returns `ReplOutput { stdout: String::from_utf8_lossy(&output.stdout).into_owned(), stderr: String::from_utf8_lossy(&output.stderr).into_owned(), ... }` serialized via `to_pretty_json`. There is no byte cap, line cap, truncation marker, or structured artifact path. A user/model can run `REPL(language:"python", code:"print('x'*5_000_000)")` and the full 5MB output is returned to the model as a JSON string; stderr has the same issue. This is distinct from bash timeout/provenance handling because REPL is marketed as a structured execution helper, yet it has no output budget. **Required fix shape:** (a) cap `stdout` and `stderr` in `ReplOutput` (e.g. first/last 64KB) with `stdout_truncated`, `stderr_truncated`, `stdout_bytes`, `stderr_bytes`; (b) optionally spill full output to an artifact file and return `artifact_path` only when safe; (c) apply the same cap to PowerShell/bash if not already covered; (d) add regressions for large stdout/stderr from python/node/shell proving the serialized tool result stays bounded and includes truncation metadata. **Why this matters:** REPL is an easy path for accidental context-window blowups (`print(large_df)`, stack traces, generated JSON). Without output budgets, a single tool call can consume the context window, trigger compaction, or hide the useful signal behind megabytes of raw output. Source: gaebal-gajae dogfood response to Clawhip message `1506672878047989812` on 2026-05-20.
 501. **`PowerShell`/shared `execute_shell_command` returns unbounded raw stdout/stderr in every foreground path, so shell output can still flood the model context even after the REPL output-amplification finding** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 15:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@7e73cdb`. Code inspection: `execute_powershell` delegates to `execute_shell_command`, whose timeout-success, timeout-kill, and no-timeout paths all construct `runtime::BashCommandOutput` with `stdout: String::from_utf8_lossy(&output.stdout).into_owned()` and `stderr: String::from_utf8_lossy(&output.stderr).into_owned()`. The background path discards output, but every foreground path serializes the full captured streams. A command like `PowerShell(command:"1..1000000 | % { 'x' }")` or any verbose test/log script can return megabytes of JSON-escaped output to the model. The timeout path is especially bad: it kills the process but still returns everything produced before the timeout plus the timeout footer, with no cap or `truncated` metadata. **Required fix shape:** (a) introduce a shared output-budget helper for all shell-like tools (`bash`, `PowerShell`, `REPL`) with byte caps, first/last slicing, and `stdout_truncated`/`stderr_truncated`/byte-count metadata; (b) preserve existing `raw_output_path`/`persisted_output_path` semantics by spilling full streams to artifact files when safe; (c) apply caps before JSON serialization in both success and timeout branches; (d) add regressions using stub `pwsh` and shell commands that emit >cap stdout/stderr and timeout-with-output. **Why this matters:** verbose tests and build tools are normal claw-code workflows. A single noisy PowerShell/shell command should not silently consume the context window or force compaction; the model needs a bounded summary plus a way to fetch artifacts if the full log is needed. Source: gaebal-gajae dogfood response to Clawhip message `1506680431620395091` on 2026-05-20.
 502. **HTTP request tool truncates response bodies with byte indexing (`&body[..8192]`), so any multibyte UTF-8 character crossing the 8192-byte boundary can panic instead of returning a bounded tool result** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 16:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@51ea1aa`. Code inspection: the HTTP request handler reads `let body = response.text().unwrap_or_default();` then, when `body.len() > 8192`, builds `format!("{}\n\n[response truncated — {} bytes total]", &body[..8192], body.len())`. `String::len()` is bytes, and Rust string slicing requires a character boundary. A response like `"a".repeat(8191) + "é" + ...` has length >8192 but byte offset 8192 is in the middle of the two-byte `é`; `&body[..8192]` panics. Nearby `preview_text` correctly truncates by chars, so the safe helper already exists but is not used here. **Required fix shape:** (a) replace direct byte slicing with a UTF-8-safe truncation helper (`char_indices`/`floor_char_boundary` or reuse `preview_text` plus byte-count metadata); (b) report both original byte length and whether truncation occurred; (c) apply the same helper to all response/body truncation paths; (d) add a regression with a local HTTP response whose 8192nd byte is inside a multibyte character and assert the tool returns JSON with `truncated:true` instead of panicking. **Why this matters:** non-English pages, emoji-heavy logs, and binary-ish HTTP responses are common. A truncation path intended to protect the context window should never crash the tool runtime on valid UTF-8. Source: gaebal-gajae dogfood response to Clawhip message `1506687983397376103` on 2026-05-20.