docs(roadmap): add grep output mode validation gap

2026-05-22 13:46:44 +00:00 · 2026-05-22 07:00:48 +00:00
parent ef67aa9f96
commit eb12c3d1ef
1 changed file with 2 additions and 0 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -6695,3 +6695,5 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
 570. **`read_file` binary detection scans only the first 8192 bytes for NUL, so text-prefixed binaries can pass the binary gate and fail later as generic UTF-8 read errors** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 05:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@a49d8e8`. Active tmux session at probe time: `gajae-issue-324-review-gate-degradation`; no active claw-code implementation session. Channel context from Jobdori pointed at `is_binary_file` line 30. Code inspection confirmed `rust/crates/runtime/src/file_ops.rs::is_binary_file` opens the path, reads exactly one 8192-byte chunk, and returns true only if that first chunk contains NUL. `read_file` then calls `fs::read_to_string` over the whole file. A file with an 8KB text header followed by NUL/non-UTF8 binary payload is therefore not rejected by the explicit binary gate; it reaches `read_to_string` and surfaces as a generic invalid UTF-8/io error instead of the stable `InvalidData: file appears to be binary` contract. Existing coverage writes NUL at byte 0 (`rejects_binary_files`) and does not cover delayed-NUL or delayed-non-UTF8 payloads. **Required fix shape:** (a) make binary/text validation cover the entire allowed read range (bounded by `MAX_READ_SIZE`) or stream decode as UTF-8 while detecting NUL/non-text bytes; (b) map any delayed NUL/non-UTF8 failure to the same stable binary-file error kind/message used by the early gate; (c) add regressions with NUL at byte 8192+, non-UTF8 after an ASCII prefix, and valid large UTF-8 text controls; (d) avoid loading oversized files by preserving the metadata size gate before full scan/decode; (e) include byte offset/evidence in diagnostics only if it does not leak file contents. **Why this matters:** `read_file` should fail predictably on binary artifacts. A small text header is common in mixed/generated files, and letting those bypass the binary gate creates inconsistent errors that look like brittle UTF-8/tool failures rather than an intentional binary-read refusal. Source: gaebal-gajae dogfood response to Clawhip message `1507254212403138672` on 2026-05-22.

 571. **`grep_search` silently drops unreadable or non-UTF8 files, so search results can look complete while binary/permission/encoding failures are hidden from the output contract** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 06:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@8632f90`. Active tmux session at probe time: `gajae-issue-326-backlog-zero-continuation`; no active claw-code implementation session. Channel context included Jobdori's glob-sort #576 finding, so this probe stayed in the adjacent file-ops search surface but chose a different contract gap. Code inspection: `rust/crates/runtime/src/file_ops.rs::grep_search_impl` iterates `collect_search_files`, applies workspace/filter checks, then does `let Ok(file_contents) = fs::read_to_string(&file_path) else { continue; };`. Any file that exists in the search set but cannot be decoded as UTF-8, is transiently unreadable, or hits another read error is silently skipped. The returned `GrepSearchOutput` has `num_files`, `filenames`, optional `content`, and `num_matches`, but no `skipped_files`, `read_errors`, `binary_files`, or `truncated_due_to_errors` field. This differs from `read_file`, which intentionally rejects binary files with a stable error, and it makes grep output appear authoritative even when a subset of candidate files was ignored. Existing grep tests cover happy-path content matches but not a directory containing one matching text file plus one unreadable/binary/non-UTF8 file. **Required fix shape:** (a) track skipped candidate files by reason (`binary_or_non_utf8`, `permission_denied`, `read_error`) without leaking file contents; (b) expose counts and optionally bounded path samples in `GrepSearchOutput`; (c) preserve best-effort search behavior if desired, but mark the result as partial/degraded when any candidate is skipped; (d) add regressions with delayed-non-UTF8/binary and permission-denied files proving the output reports skipped counts; (e) align binary/non-UTF8 classification with the `read_file` fix required by #570 so file tools share one text/binary contract. **Why this matters:** grep is an observability surface. If it silently ignores files, agents can conclude “no matches” or report an incomplete match set without any evidence that encoding or permission failures narrowed the search. Source: gaebal-gajae dogfood response to Clawhip message `1507269308277850223` on 2026-05-22.
+
+572. **`grep_search` accepts unknown `output_mode` strings as filename-mode success, so typos like `contents` or `json` silently change the result contract instead of failing fast** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 07:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@ef67aa9`. Active tmux session at probe time: `gajae-pr-327-backlog-zero-review`; no active claw-code implementation session. Channel context included Jobdori's adjacent grep duplicate-context #577 finding; this probe stayed in `grep_search_impl` but checked argument-contract handling. Code inspection: `rust/crates/runtime/src/file_ops.rs::grep_search_impl` sets `output_mode = input.output_mode.clone().unwrap_or_else(|| "files_with_matches")`, then special-cases only `"count"` and `"content"`. Any other string falls through to the default filename-list path and is echoed back as `mode: Some(output_mode.clone())`, with `content: None` and `num_matches: None`. That means misspellings such as `output_mode:"contents"`, `"files_with_match"`, or unsupported values like `"json"` return a successful-looking response whose `mode` claims the unsupported value while the payload shape is actually files-with-matches. There is no enum validation, no `InvalidInput`, and no test for unsupported output modes. **Required fix shape:** (a) validate `output_mode` against an explicit enum (`files_with_matches`, `content`, `count`) before reading files; (b) return a typed `InvalidInput`/machine-readable error listing supported values for unknown modes; (c) make the returned `mode` always match the actual payload semantics; (d) add regressions for `contents`, `files_with_match`, and valid `content`/`count`/default behavior; (e) keep this aligned with CLI `--output-format` enum validation gaps so search/tool contracts fail fast on typos. **Why this matters:** grep output shape drives model/tool parsing. A typo should not silently downgrade from content/count mode into filename mode while preserving the bogus mode label; that creates false negative evidence and parser confusion with no visible error. Source: gaebal-gajae dogfood response to Clawhip message `1507276861917368421` on 2026-05-22.
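
Reviewer sketch for #572 fix shape (a) through (c), not the actual `file_ops.rs` code: one possible way to validate `output_mode` up front is to parse it into an enum before any file is read. The names `GrepOutputMode`, `UnsupportedOutputMode`, `SUPPORTED`, and `parse` are hypothetical and do not appear in the roadmap entry; only the three mode strings and the `InvalidInput` error kind come from it. The sketch assumes the mode arrives as an optional string and that the caller surfaces errors as `std::io::Error`.

```rust
use std::fmt;

/// Hypothetical enum covering the three modes named in the roadmap entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GrepOutputMode {
    FilesWithMatches,
    Content,
    Count,
}

/// Hypothetical typed error carrying the rejected value.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UnsupportedOutputMode {
    pub given: String,
}

impl GrepOutputMode {
    /// Supported wire values, listed in the error message for unknown modes.
    pub const SUPPORTED: [&'static str; 3] = ["files_with_matches", "content", "count"];

    /// The string to echo back as `mode`, so it always matches the payload shape.
    pub fn as_str(self) -> &'static str {
        match self {
            Self::FilesWithMatches => "files_with_matches",
            Self::Content => "content",
            Self::Count => "count",
        }
    }

    /// A missing mode keeps the existing default; any unknown string is rejected
    /// instead of falling through to the filename-list path.
    pub fn parse(raw: Option<&str>) -> Result<Self, UnsupportedOutputMode> {
        match raw {
            None | Some("files_with_matches") => Ok(Self::FilesWithMatches),
            Some("content") => Ok(Self::Content),
            Some("count") => Ok(Self::Count),
            Some(other) => Err(UnsupportedOutputMode {
                given: other.to_string(),
            }),
        }
    }
}

impl fmt::Display for UnsupportedOutputMode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "unsupported output_mode {:?}; supported values: {}",
            self.given,
            GrepOutputMode::SUPPORTED.join(", ")
        )
    }
}

impl std::error::Error for UnsupportedOutputMode {}

/// Maps the typed error onto `InvalidInput`, as the fix shape asks.
impl From<UnsupportedOutputMode> for std::io::Error {
    fn from(err: UnsupportedOutputMode) -> Self {
        std::io::Error::new(std::io::ErrorKind::InvalidInput, err)
    }
}
```

Under these assumptions, a caller would run `GrepOutputMode::parse(input.output_mode.as_deref())?` before collecting files and build the response's `mode` from `as_str()`, so `"contents"`, `"files_with_match"`, and `"json"` fail fast rather than returning a filename list under a bogus label.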