docs(roadmap): add anthropic request triple serialization

2026-05-20 20:56:44 +00:00 · 2026-05-20 18:31:01 +00:00
parent 7bc373f951
commit e2b96ead3d
1 changed file with 2 additions and 0 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -6567,3 +6567,5 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
 505. **G004 report conformance expects `schemaVersion:"g004.report.v1"`, but the runtime's canonical report implementation emits `schemaVersion:"claw.report.v1"`, so first-party canonical reports cannot pass the repo's own G004 bundle validator without an undocumented schema rewrite** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 17:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@19a1918`. Code inspection: `runtime/src/g004_conformance.rs` hardcodes `const REPORT_SCHEMA_VERSION: &str = "g004.report.v1"` and `validate_reports` requires every `/reports/*/schemaVersion` to equal that value. Separately, the actual report schema module defines `pub const REPORT_SCHEMA_V1: &str = "claw.report.v1"`; `canonicalize_report` overwrites reports with `report.schema_version = REPORT_SCHEMA_V1.to_string()`, and the report registry/capability projection also key off `claw.report.v1`. Grep shows no adapter mapping `claw.report.v1` to `g004.report.v1`; the G004 fixture is hand-authored with `g004.report.v1`. Result: a real `CanonicalReportV1` produced by runtime and inserted into a G004 contract bundle is rejected by `validate_g004_contract_bundle` solely on schema-version mismatch. **Required fix shape:** (a) decide whether G004 should validate the first-party `claw.report.v1` schema directly or introduce an explicit projection adapter from `CanonicalReportV1` to `g004.report.v1`; (b) do not hardcode a competing report schema string in the conformance helper without a conversion path; (c) add a regression that builds a canonical report via `canonicalize_report`, wraps it in a G004 bundle with valid lane events, and verifies either acceptance or a typed `unsupported_schema_version` with documented adapter guidance; (d) update fixtures to use the same path real producers use. 
**Why this matters:** conformance tests should protect interoperability, not validate an artificial fixture dialect that production cannot emit. Otherwise downstream report consumers get contradictory verdicts for the same document: the report is valid to the runtime registry and invalid to the G004 bundle validator, with nothing but a schema-version string to explain the difference. Source: gaebal-gajae dogfood response to Clawhip message `1506710631254986793` on 2026-05-20.
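A minimal sketch of the adapter branch of fix shape (a), paired with the typed rejection from (c). Only the two schema strings come from the code inspection above; `project_schema_version`, `SchemaError`, and the hint text are hypothetical names for illustration, and whether G004 should instead accept `claw.report.v1` directly is still the open decision.

```rust
/// Schema string emitted by the runtime's canonical report path (from the filing).
pub const CLAW_REPORT_V1: &str = "claw.report.v1";
/// Schema string the G004 conformance helper currently requires (from the filing).
pub const G004_REPORT_V1: &str = "g004.report.v1";

/// Hypothetical typed error; not an existing type in the repo.
#[derive(Debug, PartialEq)]
pub enum SchemaError {
    UnsupportedSchemaVersion { found: String, hint: &'static str },
}

/// Hypothetical projection: maps a producer schema version onto the version
/// the G004 validator checks, instead of comparing against one hardcoded string.
pub fn project_schema_version(found: &str) -> Result<&'static str, SchemaError> {
    match found {
        // Already in the G004 dialect: nothing to do.
        G004_REPORT_V1 => Ok(G004_REPORT_V1),
        // First-party canonical reports get an explicit, documented mapping.
        CLAW_REPORT_V1 => Ok(G004_REPORT_V1),
        // Anything else is rejected with a typed error and adapter guidance.
        other => Err(SchemaError::UnsupportedSchemaVersion {
            found: other.to_string(),
            hint: "register a projection adapter for this report schema",
        }),
    }
}
```

A regression along the lines of (c) would feed the output of `canonicalize_report` through such a projection before bundle validation and assert either acceptance or the typed error.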
 506. **SSE stream parsers repeatedly rescan and drain from the front of a growing buffer, making large batched streams quadratic and adding avoidable latency to provider streaming/event handling** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 18:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@ad8a0b3`, after Jobdori noted the unfiled parser shape. Code inspection: `api/src/sse.rs::SseParser::push` appends a chunk, then loops `while let Some(frame) = self.next_frame()`. `next_frame` searches `self.buffer.windows(2)` from byte 0 for `\n\n` (then separately scans `windows(4)` for `\r\n\r\n`), drains `..position+separator_len` into a new `Vec`, and converts to `String`. `api/src/providers/openai_compat.rs::next_sse_frame` duplicates the same algorithm. `runtime/src/sse.rs::IncrementalSseParser::push_chunk` does the string analogue with repeated `self.buffer.find('\n')` plus `drain(..=index)`. For a single network read or proxy flush containing thousands of small SSE frames/lines, each extracted frame/line rescans and moves the remaining buffer from the front; total work trends O(N²) in bytes/frames and allocates a fresh buffer per frame. **Required fix shape:** (a) replace front-drain parsing with an index/cursor-based parser (`scan_pos`, `consumed_until`) and compact the buffer only occasionally; (b) search for `\n\n`/`\r\n\r\n` from the previous scan position, not from 0 every loop; (c) share one bounded SSE framing helper between Anthropic and OpenAI-compatible providers; (d) add a micro-benchmark or regression that pushes one chunk containing 10k tiny frames and asserts linear-ish parse time/allocation behavior; (e) add a max pending-buffer size and emit a typed stream framing error when no separator arrives before the cap. **Why this matters:** streaming is the main event/log surface for prompt delivery, tool calls, and usage. 
A proxy, provider, or test harness that batches many small SSE frames into one chunk should not turn the parser into a CPU/allocation hotspot or make streaming look stalled before any model event is delivered. Source: gaebal-gajae dogfood response to Clawhip message `1506718176509952130` on 2026-05-20.
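A rough sketch of fix shapes (a), (b), and (e) for item 506, assuming a byte-buffer framer. `CursorSseFramer` and `FramingError` are hypothetical names, not existing repo types; `scan_pos` and `consumed_until` follow the names suggested above. The search resumes where the previous one stopped, and the buffer is compacted once per `push` rather than once per frame.

```rust
/// Hypothetical typed framing error; not an existing repo type.
#[derive(Debug, PartialEq)]
pub enum FramingError {
    /// No separator arrived before the pending-buffer cap was exceeded.
    PendingBufferOverflow { pending: usize, cap: usize },
}

/// Hypothetical cursor-based SSE framer.
pub struct CursorSseFramer {
    buffer: Vec<u8>,
    /// Index of the first byte not yet handed out as part of a frame.
    consumed_until: usize,
    /// Index from which the next separator search resumes.
    scan_pos: usize,
    max_pending: usize,
}

impl CursorSseFramer {
    pub fn new(max_pending: usize) -> Self {
        Self { buffer: Vec::new(), consumed_until: 0, scan_pos: 0, max_pending }
    }

    /// Appends a chunk and returns every complete frame it finishes.
    pub fn push(&mut self, chunk: &[u8]) -> Result<Vec<String>, FramingError> {
        self.buffer.extend_from_slice(chunk);
        let mut frames = Vec::new();
        while let Some((end, sep_len)) = self.find_separator() {
            let frame = &self.buffer[self.consumed_until..end];
            frames.push(String::from_utf8_lossy(frame).into_owned());
            self.consumed_until = end + sep_len;
            self.scan_pos = self.consumed_until;
        }
        // Compact once per push: a single move of the unconsumed tail.
        if self.consumed_until > 0 {
            self.buffer.drain(..self.consumed_until);
            self.scan_pos -= self.consumed_until;
            self.consumed_until = 0;
        }
        // Simplification: frames parsed earlier in this push are dropped on
        // overflow; a production version would likely deliver them first.
        if self.buffer.len() > self.max_pending {
            return Err(FramingError::PendingBufferOverflow {
                pending: self.buffer.len(),
                cap: self.max_pending,
            });
        }
        Ok(frames)
    }

    /// Finds the next `\n\n` or `\r\n\r\n` at or after `scan_pos`.
    /// Returns (index where the separator starts, separator length).
    fn find_separator(&mut self) -> Option<(usize, usize)> {
        let buf = &self.buffer;
        let mut i = self.scan_pos;
        while i + 1 < buf.len() {
            if buf[i] == b'\n' && buf[i + 1] == b'\n' {
                return Some((i, 2));
            }
            if buf[i] == b'\r' && buf[i..].starts_with(b"\r\n\r\n") {
                return Some((i, 4));
            }
            i += 1;
        }
        // Back up three bytes so a separator split across chunks is still
        // found, but never step back into already-consumed data.
        self.scan_pos = buf.len().saturating_sub(3).max(self.consumed_until);
        None
    }
}
```

With this shape, one chunk containing 10k tiny frames is scanned once end to end, which is the behavior the regression in (d) would assert. Sharing the helper between providers, as in (c), is not shown.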
 507. **Anthropic requests can serialize/render the full message body three times before the real `/v1/messages` call: local preflight JSON byte estimate, remote `count_tokens` request body, then final message request body** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 18:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@7bc373f`, after Jobdori filed the sibling OpenAI-compatible double-build issue (#508). Code inspection: `AnthropicProvider::message` and `stream_message` call `self.preflight_message_request(...)`; that first invokes `super::preflight_message_request(request)`, whose `estimate_serialized_tokens` serializes `messages`, `system`, `tools`, and `tool_choice` via `serde_json::to_vec`. If a model limit is known, `preflight_message_request` then calls `count_tokens`, which does `self.request_profile.render_json_body(request)?`, strips fields, and posts the full `/v1/messages/count_tokens` body. After preflight succeeds, `send_raw_request` renders the same full body again with `self.request_profile.render_json_body(request)?` and sends `/v1/messages`. So a large session pays at least one local serialization plus two full Anthropic-body renders; if `count_tokens` fails, the fallback still paid for rendering that body before the final render. **Required fix shape:** (a) memoize/render the Anthropic request body once per call and reuse it for count_tokens and final send where schemas are identical or share a base projection; (b) use a streaming/estimated byte counter for the local guard instead of serializing large subtrees into throwaway `Vec`s; (c) skip remote `count_tokens` when the local estimate is far below known limits unless strict mode requires it; (d) add an instrumentation test with a large message vector proving one-shot and streaming calls do not render/serialize the full request more than once per network target. 
**Why this matters:** long-context sessions are already context-window and latency sensitive. Doing multiple complete JSON render/serialization passes before every Anthropic call wastes CPU/memory and makes prompt delivery look slower or stalled under large histories, especially when paired with retries and stream startup. Source: gaebal-gajae dogfood response to Clawhip message `1506725730392870952` on 2026-05-20.
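One possible shape for fix shapes (a) and (c) of item 507, sketched with the standard library only. `MessageRequest`, `RenderedOnce`, `render_base`, and `needs_remote_count` are hypothetical stand-ins, not the repo's types; the toy renderer below is not a full JSON encoder and only marks where the real `render_json_body` work would happen. The half-of-limit threshold is likewise an assumption for illustration.

```rust
use std::cell::{Cell, OnceCell};

/// Stand-in for the real request type; fields are illustrative only.
pub struct MessageRequest {
    pub model: String,
    pub messages: Vec<String>,
    pub stream: bool,
}

/// Toy escaping for the sketch; a real encoder handles far more cases.
fn escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// The expensive part shared by the count_tokens and messages bodies.
fn render_base(req: &MessageRequest) -> String {
    let messages: Vec<String> = req
        .messages
        .iter()
        .map(|m| format!("\"{}\"", escape(m)))
        .collect();
    format!(
        "\"model\":\"{}\",\"messages\":[{}]",
        escape(&req.model),
        messages.join(",")
    )
}

/// Caches the shared projection for the lifetime of one provider call.
pub struct RenderedOnce<'a> {
    request: &'a MessageRequest,
    cached_base: OnceCell<String>,
    /// Instrumentation counter, so a test can prove the render ran once.
    renders: Cell<usize>,
}

impl<'a> RenderedOnce<'a> {
    pub fn new(request: &'a MessageRequest) -> Self {
        Self { request, cached_base: OnceCell::new(), renders: Cell::new(0) }
    }

    /// Renders the large shared subtree at most once.
    fn base(&self) -> &str {
        self.cached_base
            .get_or_init(|| {
                self.renders.set(self.renders.get() + 1);
                render_base(self.request)
            })
            .as_str()
    }

    /// Local guard input: reuses the cached render instead of re-serializing.
    pub fn estimated_bytes(&self) -> usize {
        self.base().len()
    }

    /// Body for the count_tokens target: the base projection only.
    pub fn count_tokens_body(&self) -> String {
        format!("{{{}}}", self.base())
    }

    /// Body for the final send: base projection plus send-only fields.
    pub fn messages_body(&self) -> String {
        format!("{{{},\"stream\":{}}}", self.base(), self.request.stream)
    }

    pub fn render_count(&self) -> usize {
        self.renders.get()
    }
}

/// Skips the remote count when the local estimate is under half the limit,
/// unless strict mode demands it. The threshold is an assumed placeholder.
pub fn needs_remote_count(estimated_tokens: usize, limit_tokens: usize, strict: bool) -> bool {
    strict || estimated_tokens.saturating_mul(2) >= limit_tokens
}
```

Each body still copies the cached string once, so the sketch trades repeated serialization for a plain byte copy; the instrumentation test in (d) could assert on a counter like `render_count`.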