docs(roadmap): add subagent lane event conformance gap

2026-05-23 06:06:43 +00:00 · 2026-05-22 21:31:15 +00:00
parent d2018d7aee
commit e5d2eb1423
1 changed files with 2 additions and 0 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -6749,3 +6749,5 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
 596. **MCP tool bridge gates `call_tool` on a stale in-memory registry snapshot before doing live discovery, so newly discovered tools can be rejected and removed tools can be offered until runtime failure** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 20:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@20c9d9d`. Active tmux sessions at probe time: none; no active claw-code implementation session. Focused validation: `cd rust && cargo test -p runtime mcp_tool_bridge -- --nocapture` passed 19/19, confirming current bridge tests cover pre-registered happy-path tools but not registry/manager drift. Code inspection: `runtime/src/mcp_tool_bridge.rs::McpToolRegistry::call_tool` first locks `self.inner`, requires `state.status == Connected`, and checks `state.tools.iter().any(|t| t.name == tool_name)`. Only after that snapshot gate does it drop the registry lock and call `spawn_tool_call`, which creates a runtime and runs `manager.discover_tools().await` followed by `manager.call_tool(...)`. The live discovery result updates the manager/tool index, but the registry snapshot used for admission is never refreshed from that discovery. Therefore a tool that becomes available after startup/discovery refresh is still rejected as `tool not found` if it is absent from the stale registry, while a tool that disappeared can remain listed/accepted by the registry and then fail later as an `UnknownTool`/runtime error from the manager. Existing tests explicitly register `echo` in the registry before calling the live manager, so they lock in the assumption that the registry already matches runtime discovery. **Required fix shape:** (a) make `call_tool` reconcile registry state from `manager.discover_tools()` before the tool-existence gate, or delegate existence checks entirely to the authoritative manager and then update the registry snapshot; (b) add a registry `last_discovered_at`/generation or per-server tool-source metadata so `list_tools` can report stale/degraded state; (c) add regressions where the registry starts empty but the manager discovers `echo`, and where the registry advertises a stale tool that the manager no longer exposes; (d) ensure `list_tools`, `ToolSearch`, and `call_tool` agree on the same generation/tool surface; (e) surface drift as a typed MCP lifecycle/degraded event rather than a generic `tool not found`. **Why this matters:** the bridge is a control-plane contract between model-visible MCP tools and the live server manager. If admission checks use an old snapshot while execution uses fresh discovery, claws get stale tool availability evidence and confusing failures exactly where autonomous recovery needs a single source of truth. Source: gaebal-gajae dogfood response to Clawhip message `1507480700868362384` on 2026-05-22.

 597. **Worker prompt replay accepts a new `task_receipt` even when replaying a recovered prompt, so auto-recovery can pair the old prompt text with a new/different execution receipt** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 21:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@71f8554`. Active tmux sessions at probe time: none; no active claw-code implementation session. Focused validation: `cd rust && cargo test -p runtime worker_boot -- --nocapture` passed 22/22, confirming current worker-boot tests cover replay happy path and receipt mismatch detection after observation but not replay receipt integrity. Code inspection: `runtime/src/worker_boot.rs::WorkerRegistry::send_prompt` computes `next_prompt` from the explicit `prompt` argument or falls back to `worker.replay_prompt.clone()`, then unconditionally sets `worker.expected_receipt = task_receipt`. The tool input (`tools/src/lib.rs::WorkerSendPromptInput`) allows `prompt: Option<String>` and `task_receipt: Option<WorkerTaskReceipt>` independently. After prompt misdelivery, `observe` sets `worker.replay_prompt = worker.last_prompt.clone()` and payloads include the original `worker.expected_receipt`. But a subsequent `send_prompt(worker_id, None, Some(new_receipt))` replays the old recovered prompt while replacing the expected receipt with a new one. Conversely, replaying with `None` drops the expected receipt entirely. The later wrong-task-receipt detector compares observed receipts against this newly supplied/dropped value, not the receipt that belonged to the misdelivered prompt. Existing tests `prompt_misdelivery_is_detected_and_replay_can_be_rearmed` and `wrong_task_receipt_mismatch_is_detected_before_execution_continues` do not assert that replay preserves the original receipt or rejects receipt changes. **Required fix shape:** (a) when `prompt` is omitted and `worker.replay_prompt` is used, preserve the original `worker.expected_receipt` and reject any conflicting `task_receipt`; (b) store replay as a structured `{prompt, task_receipt}` bundle instead of only `String`; (c) if callers intentionally want a new prompt/receipt, require an explicit non-empty `prompt` and clear the replay bundle with an event; (d) add regressions for replay-with-new-receipt rejection, replay-with-no-receipt preserving the original receipt, and explicit new prompt replacing both prompt and receipt; (e) include receipt id/hash in `PromptReplayArmed`/`Running` event payloads without leaking sensitive prompt text. **Why this matters:** prompt recovery is supposed to resend the same task that was misdelivered. Letting the control plane silently combine old prompt text with a new or missing receipt breaks the provenance chain, causing stale/incorrect task execution evidence and making misdelivery recovery untrustworthy. Source: gaebal-gajae dogfood response to Clawhip message `1507488254734106685` on 2026-05-22.
+
+598. **Sub-agent lane manifests emit `LaneEvent::started/blocked/failed/finished/commit_created` with minimal optional metadata, so exported lane events fail the stricter G004 `emitterIdentity`/`environmentLabel` contract and have duplicate `seq=0` ordering** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 21:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@d2018d7`. Active tmux sessions at probe time: none; no active claw-code implementation session. Channel context included Jobdori filing #606 for the general G004 validator/model mismatch; this pinpoint is the concrete producer-side lane-manifest impact. Focused validation: `cd rust && cargo test -p tools lane -- --nocapture` passed 8/8, confirming current tools lane tests cover canonical event names/failure taxonomy but not G004 conformance of generated sub-agent manifests. Code inspection: `tools/src/lib.rs::create_agent_with_spawn` initializes `AgentOutput.lane_events` with `LaneEvent::started(iso8601_now())`; `persist_agent_terminal_state` appends `LaneEvent::blocked`, `LaneEvent::failed`, `LaneEvent::finished(...).with_data(...)`, and `LaneEvent::commit_created(...)`. All of these constructors call `LaneEvent::new`, which uses `LaneEventMetadata::new(0, EventProvenance::LiveLane)` and leaves `environment_label`/`emitter_identity` as `None`. Because metadata fields are skipped when `None`, serialized manifests omit the exact fields that `g004_conformance.rs::validate_lane_events` requires, and every appended event also carries `metadata.seq = 0`, violating the validator's strictly-increasing sequence rule as soon as a manifest has multiple events. This means even if the data model is fixed broadly, current sub-agent manifest production still needs a generation path that fills emitter/environment and monotonic seq. **Required fix shape:** (a) route all sub-agent lane event creation through `LaneEventBuilder` or a manifest-local helper that assigns monotonic sequence numbers; (b) set non-secret defaults such as `emitterIdentity:"tools.subagent"` and `environmentLabel` from cwd/session/channel/config, never leave them absent for exported manifests; (c) add a test that creates and completes/fails a sub-agent manifest then validates its `lane_events` with `validate_g004_contract_bundle` or a lane-event-specific conformance helper; (d) update helper constructors or document them as internal/non-G004 only; (e) align with #606 so constructor defaults and validator requirements agree. **Why this matters:** lane manifests are the event-log surface for autonomous sub-agents. If their generated events cannot pass the repo's own conformance validator and all share seq 0, downstream dashboards/reconcilers cannot rely on ordering, ownership, or environment provenance. Source: gaebal-gajae dogfood response to Clawhip message `1507495807052550174` on 2026-05-22.