docs(roadmap): add worker replay receipt integrity gap

2026-05-23 14:06:44 +00:00 · 2026-05-22 21:01:16 +00:00
parent 71f85541bd
commit d2018d7aee
1 changed files with 2 additions and 0 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -6747,3 +6747,5 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
 595. **OAuth authorize URL builder allows `extra_params` to override core PKCE/OAuth parameters after they were already set, so plugin/config extras can replace `state`, `code_challenge`, `redirect_uri`, or `response_type`** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 20:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@46f3bff`. Active tmux sessions at probe time: none; no active claw-code implementation session. Focused validation: `cd rust && cargo test -p runtime oauth -- --nocapture` passed 9/9, confirming current OAuth tests cover happy-path URL/form/callback parsing but not reserved extra-param collisions. Code inspection: `runtime/src/oauth.rs::OAuthAuthorizationRequest::build_url` creates a `params` vector containing core parameters (`response_type=code`, `client_id`, `redirect_uri`, `scope`, `state`, `code_challenge`, `code_challenge_method`), then blindly `extend`s `self.extra_params` into the same query. `with_extra_param` accepts any key and stores it in a `BTreeMap`, with no reserved-name validation. A caller that sets `with_extra_param("state", "attacker")`, `code_challenge`, `redirect_uri`, `response_type`, `client_id`, or `scope` produces a URL with duplicate query parameters where the extra value appears after the core value. Because many OAuth parsers use last-value-wins semantics, this can desynchronize the locally expected state/PKCE verifier from the authorization-server-visible values, or change redirect/scope semantics. Jobdori separately filed duplicate callback parameters (#603); this is the outbound sibling: duplicates are generated by the client itself before the browser redirect, not just accepted on callback. **Required fix shape:** (a) define a reserved parameter set for OAuth authorization requests (`response_type`, `client_id`, `redirect_uri`, `scope`, `state`, `code_challenge`, `code_challenge_method`) and reject attempts to add them via `with_extra_param`; (b) make `with_extra_param` return `Result<Self, OAuthError>` or validate in `build_url` with a typed error rather than silently emitting duplicates; (c) add tests for reserved collisions (`state`, `code_challenge`, `redirect_uri`) and a safe extension like `login_hint`; (d) if an override is intentionally supported, make it explicit and update the stored expected state/verifier/redirect to match so callback/token exchange cannot drift; (e) document provider-specific extra params as additive-only. **Why this matters:** `state` and PKCE are the OAuth anti-CSRF/proof-of-possession controls. Letting arbitrary extras duplicate or override them in the authorization URL creates prompt/auth lifecycle ambiguity and can turn a provider-specific hint hook into a security-sensitive parameter injection footgun. Source: gaebal-gajae dogfood response to Clawhip message `1507473155273265172` on 2026-05-22.

 596. **MCP tool bridge gates `call_tool` on a stale in-memory registry snapshot before doing live discovery, so newly discovered tools can be rejected and removed tools can be offered until runtime failure** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 20:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@20c9d9d`. Active tmux sessions at probe time: none; no active claw-code implementation session. Focused validation: `cd rust && cargo test -p runtime mcp_tool_bridge -- --nocapture` passed 19/19, confirming current bridge tests cover pre-registered happy-path tools but not registry/manager drift. Code inspection: `runtime/src/mcp_tool_bridge.rs::McpToolRegistry::call_tool` first locks `self.inner`, requires `state.status == Connected`, and checks `state.tools.iter().any(|t| t.name == tool_name)`. Only after that snapshot gate does it drop the registry lock and call `spawn_tool_call`, which creates a runtime and runs `manager.discover_tools().await` followed by `manager.call_tool(...)`. The live discovery result updates the manager/tool index, but the registry snapshot used for admission is never refreshed from that discovery. Therefore a tool that becomes available after startup/discovery refresh is still rejected as `tool not found` if it is absent from the stale registry, while a tool that disappeared can remain listed/accepted by the registry and then fail later as an `UnknownTool`/runtime error from the manager. Existing tests explicitly register `echo` in the registry before calling the live manager, so they lock in the assumption that the registry already matches runtime discovery. **Required fix shape:** (a) make `call_tool` reconcile registry state from `manager.discover_tools()` before the tool-existence gate, or delegate existence checks entirely to the authoritative manager and then update the registry snapshot; (b) add a registry `last_discovered_at`/generation or per-server tool-source metadata so `list_tools` can report stale/degraded state; (c) add regressions where the registry starts empty but the manager discovers `echo`, and where the registry advertises a stale tool that the manager no longer exposes; (d) ensure `list_tools`, `ToolSearch`, and `call_tool` agree on the same generation/tool surface; (e) surface drift as a typed MCP lifecycle/degraded event rather than a generic `tool not found`. **Why this matters:** the bridge is a control-plane contract between model-visible MCP tools and the live server manager. If admission checks use an old snapshot while execution uses fresh discovery, claws get stale tool availability evidence and confusing failures exactly where autonomous recovery needs a single source of truth. Source: gaebal-gajae dogfood response to Clawhip message `1507480700868362384` on 2026-05-22.
+
+597. **Worker prompt replay accepts a new `task_receipt` even when replaying a recovered prompt, so auto-recovery can pair the old prompt text with a new/different execution receipt** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 21:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@71f8554`. Active tmux sessions at probe time: none; no active claw-code implementation session. Focused validation: `cd rust && cargo test -p runtime worker_boot -- --nocapture` passed 22/22, confirming current worker-boot tests cover replay happy path and receipt mismatch detection after observation but not replay receipt integrity. Code inspection: `runtime/src/worker_boot.rs::WorkerRegistry::send_prompt` computes `next_prompt` from the explicit `prompt` argument or falls back to `worker.replay_prompt.clone()`, then unconditionally sets `worker.expected_receipt = task_receipt`. The tool input (`tools/src/lib.rs::WorkerSendPromptInput`) allows `prompt: Option<String>` and `task_receipt: Option<WorkerTaskReceipt>` independently. After prompt misdelivery, `observe` sets `worker.replay_prompt = worker.last_prompt.clone()` and payloads include the original `worker.expected_receipt`. But a subsequent `send_prompt(worker_id, None, Some(new_receipt))` replays the old recovered prompt while replacing the expected receipt with a new one. Conversely, replaying with `None` drops the expected receipt entirely. The later wrong-task-receipt detector compares observed receipts against this newly supplied/dropped value, not the receipt that belonged to the misdelivered prompt. Existing tests `prompt_misdelivery_is_detected_and_replay_can_be_rearmed` and `wrong_task_receipt_mismatch_is_detected_before_execution_continues` do not assert that replay preserves the original receipt or rejects receipt changes. **Required fix shape:** (a) when `prompt` is omitted and `worker.replay_prompt` is used, preserve the original `worker.expected_receipt` and reject any conflicting `task_receipt`; (b) store replay as a structured `{prompt, task_receipt}` bundle instead of only `String`; (c) if callers intentionally want a new prompt/receipt, require an explicit non-empty `prompt` and clear the replay bundle with an event; (d) add regressions for replay-with-new-receipt rejection, replay-with-no-receipt preserving the original receipt, and explicit new prompt replacing both prompt and receipt; (e) include receipt id/hash in `PromptReplayArmed`/`Running` event payloads without leaking sensitive prompt text. **Why this matters:** prompt recovery is supposed to resend the same task that was misdelivered. Letting the control plane silently combine old prompt text with a new or missing receipt breaks the provenance chain, causing stale/incorrect task execution evidence and making misdelivery recovery untrustworthy. Source: gaebal-gajae dogfood response to Clawhip message `1507488254734106685` on 2026-05-22.