docs(roadmap): add webfetch redirect boundary gap

2026-05-22 13:46:44 +00:00 · 2026-05-22 11:00:48 +00:00
parent 571b4c2cd1
commit bff67c2b24
1 changed files with 2 additions and 0 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -6705,3 +6705,5 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
 575. **`grep_search` walks `.git`, `node_modules`, `target`, and other heavy directories even though `glob_search` skips them, so grep can waste startup/context time scanning generated/vendor files by default** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 10:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@819f67b`. Active tmux sessions at probe time: none. Code inspection: `rust/crates/runtime/src/file_ops.rs` defines `GLOB_SEARCH_IGNORED_DIRS` (`.git`, `node_modules`, `.build`, `target`, `dist`, `coverage`) and `glob_search_impl` applies it via `WalkDir::filter_entry(|entry| !should_skip_glob_dir(entry))`. But `grep_search_impl` calls `collect_search_files(&base_path)`, and `collect_search_files` uses raw `WalkDir::new(base_path)` with no `filter_entry` or ignore list. As a result, a default grep over a repo can descend through `.git/objects`, Rust `target/`, vendored `node_modules`, coverage, and dist outputs, then silently skip unreadable/non-UTF8 files (#571) or spend time decoding generated blobs before any model-visible answer. Existing tests prove `glob_search_skips_common_heavy_directories`, but no equivalent grep test exists. **Required fix shape:** (a) make `collect_search_files` share the same ignored-directory policy as `glob_search`, or define an explicit grep ignore policy with opt-in overrides; (b) add skipped/ignored directory counts to grep output so operators know scope was pruned; (c) add regressions where `.git`/`target` contain matching files but default grep ignores them, while direct path-to-file behavior remains explicit; (d) allow explicit include/override if users really want generated/vendor search; (e) align docs/tool schema so glob and grep default search scope semantics match. **Why this matters:** grep is a high-frequency dogfood/search tool. Scanning generated and VCS internals by default creates startup friction, noisy false matches, and token waste, especially in Rust/JS workspaces with huge `target` or `node_modules` trees. Source: gaebal-gajae dogfood response to Clawhip message `1507322157561151671` on 2026-05-22.

 576. **`glob_search` brace expansion has no fan-out cap, so a single pattern with large brace groups can explode into thousands of `WalkDir` traversals before any timeout/result limit applies** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 10:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@ab5754a`. Active tmux sessions at probe time: none. Code inspection: `rust/crates/runtime/src/file_ops.rs::expand_braces` recursively expands the first `{...}` group by splitting on every comma and calling itself on each alternative. `glob_search_impl` then iterates every expanded pattern and creates a separate `WalkDir` traversal for each one before applying the hard `take(100)` result cap. There is no maximum number of alternatives, no recursion-depth/expanded-pattern cap, and no early deduplication by shared walk root. A user/model-supplied pattern like `**/*.{a,b,c,...hundreds}` or multiple brace groups can produce hundreds/thousands of full repo walks even though only 100 filenames can be returned. Existing tests cover a tiny `*.{rs,toml}` happy path and unmatched braces, but not expansion fan-out or timeout behavior. **Required fix shape:** (a) impose a small maximum expanded-pattern count and return `InvalidInput`/typed error when exceeded; (b) deduplicate shared walk roots or compile multiple patterns per root so alternatives do not trigger repeated full-tree walks; (c) include `expanded_pattern_count` and truncation/fanout metadata in diagnostics; (d) add regressions for large single brace groups, multiple nested brace groups, and normal small brace patterns; (e) keep the final 100-result cap but do not rely on it as protection against pre-result traversal explosions. **Why this matters:** glob is a model-facing discovery tool. Unbounded brace fan-out turns a compact pattern into many expensive filesystem walks, creating startup friction and an easy self-inflicted DoS before the existing result limit can help. Source: gaebal-gajae dogfood response to Clawhip message `1507329710257078353` on 2026-05-22.
+
+577. **`WebFetch` upgrades only the initial HTTP URL, but follows redirects without revalidating the final target, so HTTPS pages can redirect the tool into localhost/private HTTP endpoints** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 11:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@571b4c2`. Active tmux sessions at probe time: none. Code inspection: `rust/crates/tools/src/lib.rs::normalize_fetch_url` upgrades an initial `http://` URL to HTTPS unless the host is exactly `localhost`, `127.0.0.1`, or `::1`. But `build_http_client` enables `redirect(reqwest::redirect::Policy::limited(10))`, and `execute_web_fetch` sends the request once, then trusts `response.url()` as the final URL. There is no redirect policy that re-runs scheme/host/IP checks for each hop, no block for redirects from public HTTPS to `http://localhost`, `http://127.0.0.1`, RFC1918/link-local/metadata IPs, or non-HTTPS final URLs. As a result, a model-requested public URL can legally redirect the tool into local/private network resources even though direct non-local HTTP is upgraded and no SSRF boundary is documented. **Required fix shape:** (a) implement a custom redirect policy that validates every `Location` target before following it; (b) reject redirects to localhost, loopback, link-local, RFC1918/private ranges, cloud metadata hosts/IPs, and/or downgrade-to-HTTP unless explicitly allowed; (c) resolve hostnames before fetch or after redirect to prevent DNS-based private IP hops; (d) add tests with public HTTPS -> localhost/private/metadata redirects and safe HTTPS->HTTPS redirects; (e) expose final URL plus redirect-block reason in the `WebFetch` output/error without leaking response bodies. **Why this matters:** WebFetch is an external content tool. Redirects are part of the fetch boundary, not a postscript; without per-hop validation, a harmless-looking public URL can become an internal network probe or local service read. Source: gaebal-gajae dogfood response to Clawhip message `1507337256107642961` on 2026-05-22.