diff --git a/docs/g003-boot-session-verification-map.md b/docs/g003-boot-session-verification-map.md new file mode 100644 index 00000000..52659ba0 --- /dev/null +++ b/docs/g003-boot-session-verification-map.md @@ -0,0 +1,96 @@ +# G003 boot/session/preflight verification map + +Generated by `worker-1` for OMX team task 2 on 2026-05-14. + +## Scope and coordination + +- Active goal context: `G003-boot-session` / Stream 1 reliable worker boot and session control. +- Boundary: this artifact is an audit/integration map only. It does not mutate `.omx/ultragoal` and it does not change shared implementation or tests. +- Current worker split from leader mailbox: + - `worker-1`: task 1 worker boot / prompt SLA plus this task 2 audit map. + - `worker-2`: default trusted roots / trust resolver. + - `worker-3`: startup-no-evidence classifier. + - `worker-4`: session control plus preflight/doctor JSON surfaces. +- Native subagent probes were attempted for Task 2 (`test probe` and `debug/root-cause probe`) but both failed before returning findings with `429 Too Many Requests`; the map below is based on direct repository inspection. + +## Implementation surface map + +### Worker boot lifecycle and prompt SLA + +- `rust/crates/runtime/src/worker_boot.rs` + - Core state types: `WorkerStatus`, `WorkerFailureKind`, `WorkerEventKind`, `WorkerEventPayload`, `StartupFailureClassification`, `StartupEvidenceBundle`, `WorkerTaskReceipt`, and `WorkerReadySnapshot`. + - Control plane: `WorkerRegistry::{create,get,observe,resolve_trust,send_prompt,await_ready,restart,terminate,observe_completion,observe_startup_timeout}`. + - Lifecycle states currently covered in code: `spawning`, `trust_required`, `tool_permission_required`, `ready_for_prompt`, `running`, `finished`, and `failed`. + - Prompt delivery semantics currently use `Running` events and fields `prompt_in_flight`, `last_prompt`, `expected_receipt`, `replay_prompt`, and `prompt_delivery_attempts`. + - Startup-no-evidence surface: `observe_startup_timeout` builds `StartupEvidenceBundle` and classifies trust, tool permission, prompt acceptance timeout, prompt misdelivery, transport death, worker crash, or unknown. + - File observability surface: `emit_state_file` writes `.claw/worker-state.json` with status, readiness, trust state, prompt-in-flight flag, last event, and update age. + +- `rust/crates/tools/src/lib.rs` + - Tool APIs expose the worker control plane through `WorkerCreate`, `WorkerGet`, `WorkerObserve`, `WorkerResolveTrust`, `WorkerAwaitReady`, `WorkerSendPrompt`, `WorkerRestart`, `WorkerTerminate`, and `WorkerObserveCompletion`. + - `WorkerCreate` merges `ConfigLoader::trusted_roots()` with per-call `trusted_roots` before calling `WorkerRegistry::create`. + - Tool-level tests exercise worker create/observe/send/restart/terminate/completion and state-file transitions. + +### Trust resolver and default trusted roots + +- `rust/crates/runtime/src/trust_resolver.rs` + - `TrustConfig`, `TrustAllowlistEntry`, and `TrustResolver` model trust prompts, allowlist/denylist policy, auto-trust, manual approval, and emitted trust events. + - `path_matches_trusted_root` and internal `path_matches` canonicalize paths when possible. + - Hazard: prefix matching must avoid accidental sibling matches such as `/tmp/work` matching `/tmp/work-evil`; worker-2 owns any changes here. + +- `rust/crates/runtime/src/config.rs` + - `trustedRoots` is parsed by `parse_optional_trusted_roots` and exposed through `RuntimeConfig::trusted_roots()` / feature config accessors. + - Current default is empty when unset; any project default roots work belongs to worker-2. + +### Session control + +- `rust/crates/runtime/src/session_control.rs` + - `SessionStore` namespaces sessions by canonical workspace fingerprint. + - Key API: `from_cwd`, `from_data_dir`, `create_handle`, `resolve_reference`, `resolve_managed_path`, `list_sessions`, `latest_session`, `load_session`, and `fork_session`. + - Guardrail: `validate_loaded_session` rejects cross-workspace sessions and allows legacy sessions only when their path remains inside the current workspace. + - Worker-4 owns changes to this lane. + +### CLI doctor/status/preflight and bootstrap-adjacent surfaces + +- `rust/crates/commands/src/lib.rs` + - Slash command definitions include `/status`, `/sandbox`, and `/doctor`. + - JSON rendering for command surfaces exists through handler functions and tests in the same module. + +- `rust/crates/tools/src/lib.rs` + - Bash and PowerShell tool runners include `workspace_test_branch_preflight`, which returns structured output with `return_code_interpretation: preflight_blocked:branch_divergence` for broad workspace tests on stale branches. + - Tests around `bash_workspace_tests_are_blocked_when_branch_is_behind_main` and targeted-test skipping protect this preflight behavior. + +## Existing focused verification commands + +Run from `rust/` unless noted. + +- Worker boot runtime contract: + - `cargo test -p runtime worker_boot -- --nocapture` +- Worker tool API contract: + - `cargo test -p tools worker_ -- --nocapture` +- Session control contract: + - `cargo test -p runtime session_control -- --nocapture` +- Trust resolver/config trusted roots: + - `cargo test -p runtime trust_resolver -- --nocapture` + - `cargo test -p runtime config::tests::parses_trusted_roots_from_settings config::tests::trusted_roots_default_is_empty_when_unset -- --nocapture` +- Preflight/tool branch guardrails: + - `cargo test -p tools bash_workspace_tests_are_blocked_when_branch_is_behind_main bash_targeted_tests_skip_branch_preflight -- --nocapture` +- Formatting/type/lint baseline: + - `../scripts/fmt.sh --check` + - `cargo check -p runtime -p tools -p commands` + - `cargo clippy -p runtime -p tools -p commands --all-targets --no-deps -- -D warnings` + +## Gaps and hazards for leader integration + +- Prompt SLA event naming is partially implicit: `send_prompt` emits `WorkerEventKind::Running`; it does not expose separate `prompt.sent`, `prompt.accepted`, `prompt.acceptance_delayed`, or `prompt.acceptance_timeout` event names. The current equivalent evidence is `prompt_in_flight`, `Running`, `observe_completion`, and startup-timeout classification. +- `StartupFailureClassification::PromptAcceptanceTimeout` is covered in `worker_boot` tests; full terminal/transport integration should still be verified by the leader or worker-3 if a real pane watcher exists outside the in-memory registry. +- Default trusted roots are parsed and merged into `WorkerCreate`, but unset config currently means no default roots. Worker-2 owns any change to default root selection. +- Session control protects workspace fingerprints at load/fork time; worker-4 owns CLI/doctor/preflight JSON contract changes. +- Full-workspace clippy currently has known unrelated runtime findings observed during task 1 verification; do not block this docs-only map on those unless leader re-scopes cleanup. + +## Recommended safe integration order + +1. Integrate worker boot / prompt SLA changes first and run `cargo test -p runtime worker_boot -- --nocapture` plus `cargo test -p tools worker_ -- --nocapture`. +2. Integrate trust-root changes and rerun trust/config tests plus the worker create config merge test. +3. Integrate startup-no-evidence classifier changes and rerun `cargo test -p runtime worker_boot -- --nocapture`. +4. Integrate session control / preflight / doctor JSON changes and rerun session-control, commands JSON, and preflight tests. +5. Run final formatting, targeted cargo check/clippy, then broader workspace tests with known full-workspace failures documented separately.