docs(roadmap): add whole-file structured patch amplification

2026-05-21 13:16:45 +00:00 · 2026-05-20 22:00:40 +00:00
parent b3b9eb23b5
commit ea29fbeb44
1 changed file with 2 additions and 0 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -6579,3 +6579,5 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
 511. **`grep_search` collects every file and every matching content line before applying `head_limit`, so a small requested result can still scan/read/store an unbounded workspace worth of data** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 21:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@02f2887`. Code inspection: `grep_search_impl` first calls `collect_search_files(&base_path)?`, which walks the entire tree into a `Vec<PathBuf>` with no ignored-dir policy, file-count cap, or early stop. For every candidate it then does `fs::read_to_string(&file_path)` with no per-file size guard (unlike `read_file`'s 10MB max) and, for `output_mode == "content"`, pushes every matched/context line into `content_lines`. Only after the full traversal does it call `apply_limit(filenames, input.head_limit, input.offset)` and later `apply_limit(content_lines, head_limit, offset)`. The default limit is 250 output items, but it is not an execution budget: a repo with huge generated text files or thousands of matches still pays full read/regex/memory cost before returning 250 lines. **Required fix shape:** (a) stream search results and stop once `offset + head_limit` content lines/files have been collected, while continuing only if `count` mode explicitly needs totals; (b) add skip dirs/file-size guards shared with `glob_search` (`.git`, `node_modules`, `target`, etc.) and binary detection; (c) expose `truncated:true`, `files_scanned`, `files_skipped_size`, and `matches_seen` metadata; (d) add regression fixtures with a huge file and many matches proving `head_limit:1` does not read/accumulate the entire workspace. **Why this matters:** grep is a read-only diagnostic primitive. `head_limit` currently protects only the final JSON size, not runtime CPU/memory or accidental context blowups, so common searches in generated/vendor-heavy repos can look like tool hangs even when the caller asked for one line. 
Source: gaebal-gajae dogfood response to Clawhip message `1506763474955403414` on 2026-05-20.

 512. **`glob_search` traverses and stores all matches before truncating to 100, then sorts them by metadata, so `truncated:true` still means the full workspace scan already happened** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 21:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@0864f39`. Code inspection: `glob_search_impl` expands brace patterns, derives a walk root, then runs `WalkDir::new(&walk_root).filter_entry(...)` and pushes every matching file into `matches`. Only after all patterns and all entries are exhausted does it sort the entire `matches` vector by `fs::metadata(path).modified()` and compute `truncated = matches.len() > 100`, then `take(100)` for output. The ignored-dir list helps common vendor dirs, but there is no max traversal count, max match count, timeout/deadline, or early stop once enough results are known. A broad pattern like `**/*` in a generated workspace can collect/sort/stat tens or hundreds of thousands of paths just to return 100 names. **Required fix shape:** (a) add execution budgets for glob traversal (`max_entries_scanned`, `max_matches_collected`, optional deadline); (b) stream/top-k results instead of collecting every match before truncation, or make `sort_by_mtime` opt-in when exact newest-100 is required; (c) return `entries_scanned`, `matches_seen`, `truncated_reason`, and `ignored_dirs` metadata; (d) share traversal budget primitives with `grep_search` so read-only discovery tools fail/degrade consistently; (e) add a regression with >1000 generated files proving `glob_search` returns promptly without storing/sorting every match when capped. **Why this matters:** glob is a safe-looking read-only discovery tool, but broad globs in large repos are a common source of startup friction and apparent hangs. Output truncation alone is not enough; the work done before truncation must also be bounded and observable. 
Source: gaebal-gajae dogfood response to Clawhip message `1506771028834123986` on 2026-05-20.
+
+513. **`make_patch` is not a diff hunk generator; it emits the entire old file as removed lines plus the entire new file as added lines, so `structured_patch` itself can double the output size for every write/edit** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 22:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@b3b9eb2`. Code inspection: `runtime/src/file_ops.rs::make_patch(original, updated)` builds one `StructuredPatchHunk` by iterating `for line in original.lines() { lines.push(format!("-{line}")); }` and then `for line in updated.lines() { lines.push(format!("+{line}")); }`; `old_lines` and `new_lines` are the full line counts. This means the supposedly structured patch is a whole-file delete plus a whole-file add, not a minimal/contextual diff. Combined with #509's full `content`/`original_file` fields, editing a 1MB file can return the full original file, the full new file, and a `structured_patch` containing the full original and full new file again. Even after removing raw content fields, `structured_patch` would still be an output-amplification bug unless it becomes a real bounded diff. **Required fix shape:** (a) replace `make_patch` with a real line diff/hunk algorithm that emits only changed hunks plus configurable context; (b) cap patch lines/bytes with `patch_truncated`, `omitted_hunks`, and full old/new byte counts; (c) for full-file rewrites, return a summary (`rewrite:true`, changed line counts, previews) rather than every line; (d) add regressions for a one-line edit in a 10k-line file proving patch output is O(changed lines + context), not O(file size). **Why this matters:** structured patches are the right contract for coding tools, but a whole-file pseudo-patch creates the same context-window blowup as raw file echoes while looking machine-friendly. Review/debug loops need concise, truthful diffs. Source: gaebal-gajae dogfood response to Clawhip message `1506778574143754422` on 2026-05-20.
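
Illustrative note on item 513, fix shapes (a) and (d): the sketch below is one way a bounded hunk could be produced; it is not the code in `runtime/src/file_ops.rs`. `Hunk` is a hypothetical stand-in for `StructuredPatchHunk` (its field names are assumptions), and `make_bounded_patch` and its `context` parameter are invented for illustration. The technique is plain common-prefix/common-suffix trimming that yields a single hunk, not a full LCS or Myers diff, so several scattered edits would collapse into one hunk spanning all of them. A real fix would likely use a proper diff algorithm and add the byte/line caps from (b).

```rust
/// Hypothetical stand-in for `StructuredPatchHunk`; field names are assumptions.
#[derive(Debug, PartialEq)]
pub struct Hunk {
    pub old_start: usize, // 1-based first line of the hunk in the original
    pub old_lines: usize,
    pub new_start: usize, // 1-based first line of the hunk in the update
    pub new_lines: usize,
    pub lines: Vec<String>, // prefixed with ' ', '-' or '+'
}

/// Emits one hunk covering only the changed middle of the file plus up to
/// `context` unchanged lines on each side. Returns `None` when the two
/// texts have identical lines.
pub fn make_bounded_patch(original: &str, updated: &str, context: usize) -> Option<Hunk> {
    let old: Vec<&str> = original.lines().collect();
    let new: Vec<&str> = updated.lines().collect();

    // Length of the shared leading run of lines.
    let prefix = old.iter().zip(new.iter()).take_while(|(a, b)| a == b).count();

    // Length of the shared trailing run, capped so it cannot overlap the prefix.
    let max_suffix = old.len().min(new.len()) - prefix;
    let suffix = old
        .iter()
        .rev()
        .zip(new.iter().rev())
        .take(max_suffix)
        .take_while(|(a, b)| a == b)
        .count();

    let old_changed = &old[prefix..old.len() - suffix];
    let new_changed = &new[prefix..new.len() - suffix];
    if old_changed.is_empty() && new_changed.is_empty() {
        return None;
    }

    // Context is clamped to what actually exists around the change.
    let lead = context.min(prefix);
    let trail = context.min(suffix);
    let tail_start = old.len() - suffix;

    let mut lines = Vec::with_capacity(lead + old_changed.len() + new_changed.len() + trail);
    for line in &old[prefix - lead..prefix] {
        lines.push(format!(" {line}"));
    }
    for line in old_changed {
        lines.push(format!("-{line}"));
    }
    for line in new_changed {
        lines.push(format!("+{line}"));
    }
    for line in &old[tail_start..tail_start + trail] {
        lines.push(format!(" {line}"));
    }

    Some(Hunk {
        old_start: prefix - lead + 1,
        old_lines: lead + old_changed.len() + trail,
        new_start: prefix - lead + 1,
        new_lines: lead + new_changed.len() + trail,
        lines,
    })
}
```

With `context = 3`, a one-line edit in the middle of a 10k-line file yields eight patch lines (three leading context lines, one removal, one addition, three trailing context lines) instead of roughly 20k. The function still splits both inputs into line vectors, so the bound applies to the patch output rather than to transient memory.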