diff --git a/ROADMAP.md b/ROADMAP.md index 3c6534fa..403f7743 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -6563,3 +6563,5 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed) 503. **`WebFetch`/`WebSearch` download full response bodies with `response.text()` before previewing, so large pages can allocate unbounded memory and stall the tool despite returning only a 900-char preview or eight search hits** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 16:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@f751c98`. Code inspection: `execute_web_fetch` calls `response.text()` into `body`, records `bytes = body.len()`, normalizes the whole body (`html_to_text` for HTML), collapses all whitespace, and only then trims via `preview_text(..., 900)`. `execute_web_search` similarly calls `response.text()` on the search response, then parses links and truncates hits to 8. The HTTP client has a 20s timeout and redirect cap, but there is no `Content-Length` guard, streaming byte cap, decompressed-size cap, or early abort. A large/decompression-bomb HTML response can force multi-megabyte/GB allocation and full text normalization even though the returned result is tiny. **Required fix shape:** (a) add a max download/decompressed body size for web tools (configurable but safe default); (b) reject/abort early on `Content-Length` above cap and enforce a streaming cap while reading chunks; (c) record `body_truncated:true`, `bytes_read`, and `content_length` metadata in the tool output; (d) make `html_to_text`/search extraction operate on the capped buffer; (e) add local HTTP regressions for huge `Content-Length`, chunked oversized body, and compressed oversized body proving bounded memory/time. **Why this matters:** `WebFetch` is a common lightweight alternative to browser automation. Its output is intentionally small, but the hidden pre-output work is unbounded; a hostile or simply large page can make a dogfood session look hung or OOM the runtime before any useful event/log signal is emitted. Source: gaebal-gajae dogfood response to Clawhip message `1506695531307733082` on 2026-05-20. 504. **`AgentOutput.laneEvents` produced by real agent runs violates the G004 conformance contract because production `LaneEvent::new` emits `metadata.seq=0` for every event while the validator requires strictly increasing sequence numbers** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 17:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@8e8dea5`, building on Jobdori's live #505 `LaneEvent::new seq=0` report. Code inspection: `AgentOutput` manifests are initialized with `LaneEvent::started(...)` and terminal persistence appends `LaneEvent::blocked/failed/finished/commit_created(...)`; all of those convenience constructors route through `LaneEvent::new(... metadata: LaneEventMetadata::new(0, EventProvenance::LiveLane))`. `write_agent_manifest` only dedupes commit events and does not restamp sequences. Meanwhile `runtime/src/g004_conformance.rs::validate_lane_events` explicitly requires `/metadata/seq` to be present and strictly increasing (`if seq <= previous { "sequence must be strictly increasing" }`). Therefore any successful agent manifest with `lane.started` + `lane.finished` or failed manifest with `lane.started` + `lane.blocked` + `lane.failed` is invalid under the repo's own G004 contract, even before external consumers sort by seq. **Required fix shape:** (a) restamp lane event metadata seqs before manifest write (`for (idx,event) in lane_events.iter_mut().enumerate() { event.metadata.seq = idx as u64 + 1; }`) as an immediate containment, or better stamp from a per-session event counter at creation; (b) run `validate_g004_contract_bundle` (or an AgentOutput-specific wrapper) in tests against real initialized/success/failed manifests; (c) add a regression that `write_agent_manifest` never persists duplicate/non-increasing seqs after terminal append/dedupe; (d) keep `reconcile_terminal_events` sorting semantics meaningful by ensuring production seqs are nonzero and monotonic. **Why this matters:** this is event/log opacity in the literal contract layer: the product advertises machine-checkable event ordering, but real persisted manifests fail that checker. Downstream clawhips/watchers either cannot trust the conformance helper or must special-case production data. Source: gaebal-gajae dogfood response to Clawhip message `1506703078492082197` on 2026-05-20. + +505. **G004 report conformance expects `schemaVersion:"g004.report.v1"`, but the runtime's canonical report implementation emits `schemaVersion:"claw.report.v1"`, so first-party canonical reports cannot pass the repo's own G004 bundle validator without an undocumented schema rewrite** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 17:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@19a1918`. Code inspection: `runtime/src/g004_conformance.rs` hardcodes `const REPORT_SCHEMA_VERSION: &str = "g004.report.v1"` and `validate_reports` requires every `/reports/*/schemaVersion` to equal that value. Separately, the actual report schema module defines `pub const REPORT_SCHEMA_V1: &str = "claw.report.v1"`; `canonicalize_report` overwrites reports with `report.schema_version = REPORT_SCHEMA_V1.to_string()`, and the report registry/capability projection also key off `claw.report.v1`. Grep shows no adapter mapping `claw.report.v1` to `g004.report.v1`; the G004 fixture is hand-authored with `g004.report.v1`. Result: a real `CanonicalReportV1` produced by runtime and inserted into a G004 contract bundle is rejected by `validate_g004_contract_bundle` solely on schema-version mismatch. **Required fix shape:** (a) decide whether G004 should validate the first-party `claw.report.v1` schema directly or introduce an explicit projection adapter from `CanonicalReportV1` to `g004.report.v1`; (b) do not hardcode a competing report schema string in the conformance helper without a conversion path; (c) add a regression that builds a canonical report via `canonicalize_report`, wraps it in a G004 bundle with valid lane events, and verifies either acceptance or a typed `unsupported_schema_version` with documented adapter guidance; (d) update fixtures to use the same path real producers use. **Why this matters:** conformance tests should protect interoperability, not validate an artificial fixture dialect that production cannot emit. Otherwise downstream report consumers see event/log opacity: the report looks valid to the runtime registry and invalid to the G004 bundle validator. Source: gaebal-gajae dogfood response to Clawhip message `1506710631254986793` on 2026-05-20.