diff --git a/ROADMAP.md b/ROADMAP.md index 8a0aec37..56ecd867 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -6569,3 +6569,5 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed) 506. **SSE stream parsers repeatedly rescan and drain from the front of a growing buffer, making large batched streams quadratic and adding avoidable latency to provider streaming/event handling** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 18:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@ad8a0b3`, after Jobdori noted the unfiled parser shape. Code inspection: `api/src/sse.rs::SseParser::push` appends a chunk, then loops `while let Some(frame) = self.next_frame()`. `next_frame` searches `self.buffer.windows(2)` from byte 0 for `\n\n` (then separately scans `windows(4)` for `\r\n\r\n`), drains `..position+separator_len` into a new `Vec`, and converts to `String`. `api/src/providers/openai_compat.rs::next_sse_frame` duplicates the same algorithm. `runtime/src/sse.rs::IncrementalSseParser::push_chunk` does the string analogue with repeated `self.buffer.find('\n')` plus `drain(..=index)`. For a single network read or proxy flush containing thousands of small SSE frames/lines, each extracted frame/line rescans and moves the remaining buffer from the front; total work trends O(N²) in bytes/frames and allocates a fresh buffer per frame. **Required fix shape:** (a) replace front-drain parsing with an index/cursor-based parser (`scan_pos`, `consumed_until`) and compact the buffer only occasionally; (b) search for `\n\n`/`\r\n\r\n` from the previous scan position, not from 0 every loop; (c) share one bounded SSE framing helper between Anthropic and OpenAI-compatible providers; (d) add a micro-benchmark or regression that pushes one chunk containing 10k tiny frames and asserts linear-ish parse time/allocation behavior; (e) add a max pending-buffer size and emit a typed stream framing error when no separator arrives before the cap. **Why this matters:** streaming is the main event/log surface for prompt delivery, tool calls, and usage. A proxy, provider, or test harness that batches many small SSE frames into one chunk should not turn the parser into a CPU/allocation hotspot or make streaming look stalled before any model event is delivered. Source: gaebal-gajae dogfood response to Clawhip message `1506718176509952130` on 2026-05-20. 507. **Anthropic requests can serialize/render the full message body three times before the real `/v1/messages` call: local preflight JSON byte estimate, remote `count_tokens` request body, then final message request body** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 18:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@7bc373f`, after Jobdori filed the sibling OpenAI-compatible double-build issue (#508). Code inspection: `AnthropicProvider::message` and `stream_message` call `self.preflight_message_request(...)`; that first invokes `super::preflight_message_request(request)`, whose `estimate_serialized_tokens` serializes `messages`, `system`, `tools`, and `tool_choice` via `serde_json::to_vec`. If a model limit is known, `preflight_message_request` then calls `count_tokens`, which does `self.request_profile.render_json_body(request)?`, strips fields, and posts the full `/v1/messages/count_tokens` body. After preflight succeeds, `send_raw_request` renders the same full body again with `self.request_profile.render_json_body(request)?` and sends `/v1/messages`. So a large session pays at least one local serialization plus two full Anthropic-body renders; if `count_tokens` fails, the fallback still paid for rendering that body before the final render. **Required fix shape:** (a) memoize/render the Anthropic request body once per call and reuse it for count_tokens and final send where schemas are identical or share a base projection; (b) use a streaming/estimated byte counter for the local guard instead of serializing large subtrees into throwaway `Vec`s; (c) skip remote `count_tokens` when the local estimate is far below known limits unless strict mode requires it; (d) add an instrumentation test with a large message vector proving one-shot and streaming calls do not render/serialize the full request more than once per network target. **Why this matters:** long-context sessions are already context-window and latency sensitive. Doing multiple complete JSON render/serialization passes before every Anthropic call wastes CPU/memory and makes prompt delivery look slower or stalled under large histories, especially when paired with retries and stream startup. Source: gaebal-gajae dogfood response to Clawhip message `1506725730392870952` on 2026-05-20. + +508. **Config/plan-mode writes use direct `std::fs::write` with no atomic rename or advisory lock, so crashes and concurrent tool calls can corrupt `settings.json`, `settings.local.json`, or plan-mode state** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 19:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@e2b96ea`. Code inspection: `execute_config` reads a settings JSON object, mutates it, and persists through `write_json_object`; `EnterPlanMode`/`ExitPlanMode` also call `write_json_object` for `.claw/settings.local.json` and `write_plan_mode_state` for `.claw/tool-state/plan-mode.json`. Both `write_json_object` and `write_plan_mode_state` create parent directories and then call `std::fs::write(path, serde_json::to_string_pretty(...))` directly on the destination. There is no temp file + fsync + rename, and no lock around the read-modify-write window. By contrast, the OAuth credentials writer already uses a safer `.tmp` then rename pattern. Two concurrent `Config`/plan-mode calls can both read the same old document and last-writer-wins one update; a crash/interruption mid-write can leave a truncated JSON settings file that later startup/doctor must diagnose. **Required fix shape:** (a) introduce a shared `atomic_write_json(path, value)` helper using same-directory temp file, flush/fsync, and rename; (b) wrap read-modify-write config mutations in an advisory lock (`settings.json.lock` / `settings.local.json.lock`) or a compare-and-swap retry loop; (c) use the same helper for `write_plan_mode_state`, `write_agent_manifest` where appropriate, and any future JSON state files; (d) add stress/regression tests with two concurrent config writes and a simulated partial-write failure proving no malformed JSON and no lost sibling setting. **Why this matters:** config and plan-mode are control-plane state. A supposedly safe tool call should not be able to silently lose another setting or leave the workspace unable to load settings after an interrupted write; that turns a small config action into startup friction and stale permission-mode confusion. Source: gaebal-gajae dogfood response to Clawhip message `1506733276037910700` on 2026-05-20.