docs(roadmap): add lane event conformance mismatch

Yeachan-Heo
2026-05-20 17:00:50 +00:00
parent 8e8dea5023
commit 19a19182ef


@@ -6561,3 +6561,5 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
502. **HTTP request tool truncates response bodies with byte indexing (`&body[..8192]`), so a response whose 8192-byte cut point falls inside a multibyte UTF-8 character panics instead of returning a bounded tool result** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 16:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@51ea1aa`. Code inspection: the HTTP request handler reads `let body = response.text().unwrap_or_default();` then, when `body.len() > 8192`, builds `format!("{}\n\n[response truncated — {} bytes total]", &body[..8192], body.len())`. `String::len()` counts bytes, and Rust string slicing panics unless both ends of the range fall on a character boundary. A response like `"a".repeat(8191) + "é" + ...` is longer than 8192 bytes, but byte offset 8192 falls in the middle of the two-byte `é`, so `&body[..8192]` panics. Nearby `preview_text` already truncates safely by chars; this path just does not use it. **Required fix shape:** (a) replace direct byte slicing with a UTF-8-safe truncation helper (`char_indices`/`floor_char_boundary`, or reuse `preview_text` plus byte-count metadata); (b) report both the original byte length and whether truncation occurred; (c) apply the same helper to every response/body truncation path; (d) add a regression with a local HTTP response whose byte offset 8192 lands inside a multibyte character, and assert the tool returns JSON with `truncated:true` instead of panicking. **Why this matters:** non-English pages, emoji-heavy logs, and binary-ish HTTP responses are common. A truncation path intended to protect the context window should never crash the tool runtime on valid UTF-8. Source: gaebal-gajae dogfood response to Clawhip message `1506687983397376103` on 2026-05-20.
503. **`WebFetch`/`WebSearch` download full response bodies with `response.text()` before previewing, so large pages can allocate unbounded memory and stall the tool even though it returns only a 900-char preview or eight search hits** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 16:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@f751c98`. Code inspection: `execute_web_fetch` calls `response.text()` into `body`, records `bytes = body.len()`, normalizes the whole body (`html_to_text` for HTML), collapses all whitespace, and only then trims via `preview_text(..., 900)`. `execute_web_search` similarly calls `response.text()` on the search response, then parses links and truncates hits to 8. The HTTP client has a 20s timeout and a redirect cap, but there is no `Content-Length` guard, streaming byte cap, decompressed-size cap, or early abort. A large HTML response, or a decompression bomb, can force multi-megabyte or even multi-gigabyte allocations and full text normalization even though the returned result is tiny. **Required fix shape:** (a) add a maximum download/decompressed body size for web tools (configurable, with a safe default); (b) reject or abort early when `Content-Length` exceeds the cap, and enforce a streaming cap while reading chunks; (c) record `body_truncated:true`, `bytes_read`, and `content_length` metadata in the tool output; (d) make `html_to_text`/search extraction operate on the capped buffer; (e) add local HTTP regressions for a huge `Content-Length`, an oversized chunked body, and an oversized compressed body, proving bounded memory and time. **Why this matters:** `WebFetch` is a common lightweight alternative to browser automation. Its output is intentionally small, but the hidden pre-output work is unbounded; a hostile or simply large page can make a dogfood session look hung, or OOM the runtime, before any useful event/log signal is emitted.
Source: gaebal-gajae dogfood response to Clawhip message `1506695531307733082` on 2026-05-20.
504. **`AgentOutput.laneEvents` produced by real agent runs violates the G004 conformance contract: production `LaneEvent::new` emits `metadata.seq=0` for every event, while the validator requires strictly increasing sequence numbers** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 17:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@8e8dea5`, building on Jobdori's live #505 `LaneEvent::new seq=0` report. Code inspection: `AgentOutput` manifests are initialized with `LaneEvent::started(...)`, and terminal persistence appends `LaneEvent::blocked/failed/finished/commit_created(...)`; all of those convenience constructors route through `LaneEvent::new(... metadata: LaneEventMetadata::new(0, EventProvenance::LiveLane))`. `write_agent_manifest` only dedupes commit events and does not restamp sequences. Meanwhile `runtime/src/g004_conformance.rs::validate_lane_events` explicitly requires `/metadata/seq` to be present and strictly increasing (`if seq <= previous { "sequence must be strictly increasing" }`). Therefore any successful agent manifest with `lane.started` + `lane.finished`, or any failed manifest with `lane.started` + `lane.blocked` + `lane.failed`, is invalid under the repo's own G004 contract, even before external consumers sort by seq. **Required fix shape:** (a) restamp lane event metadata seqs before the manifest write (`for (idx, event) in lane_events.iter_mut().enumerate() { event.metadata.seq = idx as u64 + 1; }`) as immediate containment, or better, stamp from a per-session event counter at creation; (b) run `validate_g004_contract_bundle` (or an AgentOutput-specific wrapper) in tests against real initialized/success/failed manifests; (c) add a regression that `write_agent_manifest` never persists duplicate or non-increasing seqs after terminal append/dedupe; (d) keep `reconcile_terminal_events` sorting semantics meaningful by ensuring production seqs are nonzero and monotonic.
**Why this matters:** this is event/log opacity in the literal contract layer: the product advertises machine-checkable event ordering, but real persisted manifests fail that checker. Downstream clawhips/watchers either cannot trust the conformance helper or must special-case production data. Source: gaebal-gajae dogfood response to Clawhip message `1506703078492082197` on 2026-05-20.
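A minimal sketch of the UTF-8-safe truncation asked for in #502, using only the standard library. `truncate_utf8` and `TruncatedBody` are hypothetical names, not existing helpers in the codebase. The sketch backs off from the byte cap with the stable `str::is_char_boundary` instead of `floor_char_boundary`, and carries the original byte length plus a truncation flag as in (b).

```rust
/// Hypothetical result type carrying the metadata proposed in (b).
#[derive(Debug)]
struct TruncatedBody {
    text: String,
    original_bytes: usize,
    truncated: bool,
}

/// Hypothetical helper: keep at most `max_bytes` bytes of `body`, backing off
/// to the previous char boundary so the slice never splits a UTF-8 sequence.
fn truncate_utf8(body: &str, max_bytes: usize) -> TruncatedBody {
    if body.len() <= max_bytes {
        return TruncatedBody { text: body.to_string(), original_bytes: body.len(), truncated: false };
    }
    let mut end = max_bytes;
    // A UTF-8 character is at most 4 bytes, so this loop steps back at most 3 times.
    while !body.is_char_boundary(end) {
        end -= 1;
    }
    TruncatedBody { text: body[..end].to_string(), original_bytes: body.len(), truncated: true }
}

fn main() {
    // 8191 ASCII bytes, then a two-byte 'é': byte offset 8192 is mid-character,
    // so the original `&body[..8192]` would panic on this input.
    let body = format!("{}é{}", "a".repeat(8191), "b".repeat(100));
    assert!(!body.is_char_boundary(8192));

    let result = truncate_utf8(&body, 8192);
    assert!(result.truncated);
    assert_eq!(result.text.len(), 8191);
    assert_eq!(result.original_bytes, 8293);
    println!("kept {} of {} bytes", result.text.len(), result.original_bytes);
}
```

The caller would then format the kept text and the byte count into the tool result, so truncation never depends on where a multibyte character happens to land.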
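For #503, a sketch of a capped body read, assuming the response body can be consumed as a `std::io::Read` positioned after any decompression, so the cap bounds decompressed bytes. `read_capped` and `CappedBody` are hypothetical names; the metadata fields follow (c). An endless `io::repeat` reader stands in for an oversized chunked body.

```rust
use std::io::{self, Read};

/// Hypothetical capped-read result; field names echo the metadata proposed in (c).
#[derive(Debug)]
struct CappedBody {
    bytes: Vec<u8>,
    /// Bytes pulled from the stream (at most cap + 1).
    bytes_read: usize,
    body_truncated: bool,
    content_length: Option<u64>,
}

/// Read at most `cap` bytes from `reader`, rejecting up front when a declared
/// Content-Length already exceeds the cap.
fn read_capped<R: Read>(reader: R, content_length: Option<u64>, cap: usize) -> io::Result<CappedBody> {
    if let Some(len) = content_length {
        if len > cap as u64 {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                format!("Content-Length {len} exceeds cap {cap}"),
            ));
        }
    }
    let mut buf = Vec::new();
    // Ask for one byte past the cap so "exactly cap bytes" and "more than cap" differ.
    reader.take((cap as u64).saturating_add(1)).read_to_end(&mut buf)?;
    let bytes_read = buf.len();
    let body_truncated = bytes_read > cap;
    buf.truncate(cap);
    Ok(CappedBody { bytes: buf, bytes_read, body_truncated, content_length })
}

fn main() -> io::Result<()> {
    // Endless stream, no Content-Length: reading stops after cap + 1 bytes.
    let oversized = read_capped(io::repeat(b'x'), None, 1024)?;
    assert!(oversized.body_truncated);
    assert_eq!(oversized.bytes.len(), 1024);

    // Declared Content-Length above the cap: rejected before any read.
    assert!(read_capped(io::empty(), Some(10_000_000), 1024).is_err());

    // Small body passes through untouched.
    let small = read_capped(&b"<html>ok</html>"[..], Some(15), 1024)?;
    assert!(!small.body_truncated);
    println!("{:?}", small);
    Ok(())
}
```

In this design `html_to_text` or link extraction would run only on `bytes`, so normalization cost is bounded by the same cap as memory.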
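For #504, a simplified model of the seq-0 failure and the containment fix in (a). `LaneEvent` and `LaneEventMetadata` below are pared-down stand-ins, not the real types, and `check_strictly_increasing` only mirrors the quoted `seq <= previous` rule; treating the first event as always valid is an assumption about the real validator.

```rust
/// Pared-down stand-in for the real metadata type; only `seq` is modeled.
#[derive(Debug, Clone)]
struct LaneEventMetadata {
    seq: u64,
}

/// Pared-down stand-in for the real event type.
#[derive(Debug, Clone)]
struct LaneEvent {
    name: &'static str,
    metadata: LaneEventMetadata,
}

/// Mimics the production constructors described above: every event gets seq 0.
fn new_event(name: &'static str) -> LaneEvent {
    LaneEvent { name, metadata: LaneEventMetadata { seq: 0 } }
}

/// Containment fix from (a): renumber 1..=n in persisted order.
fn restamp_seqs(events: &mut [LaneEvent]) {
    for (idx, event) in events.iter_mut().enumerate() {
        event.metadata.seq = idx as u64 + 1;
    }
}

/// Mirrors the quoted G004 rule: each seq must exceed the previous one.
fn check_strictly_increasing(events: &[LaneEvent]) -> Result<(), String> {
    let mut previous: Option<u64> = None;
    for event in events {
        let seq = event.metadata.seq;
        if let Some(prev) = previous {
            if seq <= prev {
                return Err(format!("{}: sequence must be strictly increasing", event.name));
            }
        }
        previous = Some(seq);
    }
    Ok(())
}

fn main() {
    let mut events = vec![new_event("lane.started"), new_event("lane.blocked"), new_event("lane.failed")];
    // As produced today: the second seq-0 event fails the check.
    assert!(check_strictly_increasing(&events).is_err());

    restamp_seqs(&mut events);
    assert!(check_strictly_increasing(&events).is_ok());
    let seqs: Vec<u64> = events.iter().map(|e| e.metadata.seq).collect();
    assert_eq!(seqs, vec![1, 2, 3]);
}
```

Restamping by persisted position only encodes append order at write time; stamping from a per-session counter at creation, as the entry prefers, would also keep ordering meaningful across separate writes.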