docs(roadmap): add grep heavy-dir traversal gap

2026-05-22 13:46:44 +00:00 · 2026-05-22 10:00:54 +00:00
parent 819f67b01b
commit ab5754a524
1 changed files with 2 additions and 0 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -6701,3 +6701,5 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
 573. **`grep_search` treats `head_limit:0` as unlimited, so callers cannot request an empty/page-metadata probe and may accidentally dump the full match set** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 07:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@eb12c3d`. Active tmux session at probe time: `gajae-pr-330-review-v2`; no active claw-code implementation session. Code inspection: `rust/crates/runtime/src/file_ops.rs::apply_limit` is used for grep filenames and content lines. It computes `explicit_limit = limit.unwrap_or(250)`, but if `explicit_limit == 0` it returns the full post-offset item vector with `appliedLimit: None` instead of truncating to zero or rejecting the value. Therefore a caller using `head_limit:0` as a common “no rows, just metadata/count/page preflight” convention gets every matching filename/content line after the offset, bypassing the default 250 cap and potentially injecting a large result into the model context. Existing grep tests pass `head_limit: Some(10)` and do not cover zero-limit semantics. This also makes `appliedLimit` misleading: an explicit limit was supplied, but the output reports no applied limit. **Required fix shape:** (a) define `head_limit` as a positive integer and reject zero with `InvalidInput`, or make zero return an empty result with `appliedLimit:0` consistently; (b) never let zero disable truncation unless a separately named `unlimited:true` escape hatch exists; (c) add regressions for `head_limit:0` in filename and content modes, positive limits, default 250 truncation, and offset-only behavior; (d) ensure `appliedLimit` reflects the caller-supplied limit when accepted; (e) document pagination semantics so wrappers do not accidentally turn metadata probes into full dumps. **Why this matters:** search pagination is a context-window safety control. A zero limit should be safe or invalid, not the one value that disables the cap and can flood the assistant with every match. Source: gaebal-gajae dogfood response to Clawhip message `1507284411995918427` on 2026-05-22.

 574. **Kimi compatibility helpers only strip the `kimi/` routing prefix, so documented `dashscope/kimi-*` and `moonshot/kimi-*` slugs can leak provider prefixes onto the wire** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 09:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@c1bb355`. Active tmux session at probe time: `omx-issue-2462-madmax-lock-diagnostic`; no active claw-code implementation session. Channel context included Jobdori's reasoning-history #581 finding in `openai_compat.rs`; this probe inspected the same provider file for another Kimi/OpenAI-compatible model-routing contract gap. Code inspection: `model_rejects_is_error_field` intentionally recognizes `dashscope/kimi-k2.5` and `moonshot/kimi-k2.5` by stripping any prefix with `rsplit('/')`, and tests assert those slugs reject `is_error`. But `wire_model_for_base_url` / `strip_routing_prefix` only strip prefixes matching `openai|xai|grok|qwen|kimi`; `dashscope` and `moonshot` are not included. Therefore a request model like `dashscope/kimi-k2.5` can be treated as Kimi for tool-result compatibility while the serialized `model` sent to a DashScope/Moonshot-compatible endpoint remains `dashscope/kimi-k2.5` instead of the expected `kimi-k2.5`. Existing tests cover `strip_routing_prefix("kimi/kimi-k2.5")` but not `dashscope/kimi-k2.5` or `moonshot/kimi-k2.5`, despite those exact prefixes being listed in Kimi compatibility tests. **Required fix shape:** (a) decide the supported routing prefixes for Kimi/DashScope/Moonshot and use one shared prefix-strip helper for compatibility checks and wire model serialization; (b) add `dashscope` and `moonshot` where appropriate, or reject those prefixed slugs early with a typed configuration error; (c) add tests proving `wire_model_for_base_url` and `strip_routing_prefix` produce `kimi-k2.5` for every documented Kimi prefix; (d) keep OpenRouter/non-default OpenAI base-url slash-slug preservation semantics intact; (e) update docs/config examples so model slugs and provider routing prefixes are unambiguous. **Why this matters:** provider compatibility logic and wire serialization must agree on the model identity. If one path says “this is Kimi” while another sends a prefixed slug the backend may not understand, users get avoidable 400/model-not-found errors that look like provider instability. Source: gaebal-gajae dogfood response to Clawhip message `1507307060998443059` on 2026-05-22.
+
+575. **`grep_search` walks `.git`, `node_modules`, `target`, and other heavy directories even though `glob_search` skips them, so grep can waste startup/context time scanning generated/vendor files by default** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 10:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@819f67b`. Active tmux sessions at probe time: none. Code inspection: `rust/crates/runtime/src/file_ops.rs` defines `GLOB_SEARCH_IGNORED_DIRS` (`.git`, `node_modules`, `.build`, `target`, `dist`, `coverage`) and `glob_search_impl` applies it via `WalkDir::filter_entry(|entry| !should_skip_glob_dir(entry))`. But `grep_search_impl` calls `collect_search_files(&base_path)`, and `collect_search_files` uses raw `WalkDir::new(base_path)` with no `filter_entry` or ignore list. As a result, a default grep over a repo can descend through `.git/objects`, Rust `target/`, vendored `node_modules`, coverage, and dist outputs, then silently skip unreadable/non-UTF8 files (#571) or spend time decoding generated blobs before any model-visible answer. Existing tests prove `glob_search_skips_common_heavy_directories`, but no equivalent grep test exists. **Required fix shape:** (a) make `collect_search_files` share the same ignored-directory policy as `glob_search`, or define an explicit grep ignore policy with opt-in overrides; (b) add skipped/ignored directory counts to grep output so operators know scope was pruned; (c) add regressions where `.git`/`target` contain matching files but default grep ignores them, while direct path-to-file behavior remains explicit; (d) allow explicit include/override if users really want generated/vendor search; (e) align docs/tool schema so glob and grep default search scope semantics match. **Why this matters:** grep is a high-frequency dogfood/search tool. Scanning generated and VCS internals by default creates startup friction, noisy false matches, and token waste, especially in Rust/JS workspaces with huge `target` or `node_modules` trees. Source: gaebal-gajae dogfood response to Clawhip message `1507322157561151671` on 2026-05-22.