docs(roadmap): add replay armed stale error gap

2026-05-23 06:06:43 +00:00 · 2026-05-22 15:01:23 +00:00
parent b0bca2ea4f
commit 8043090953
1 changed files with 2 additions and 0 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -6721,3 +6721,5 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
 583. **`PermissionEnforcer::check_file_write` uses string-prefix workspace checks, so relative traversal like `../../outside` is allowed as if it were inside the workspace** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 14:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@aa9efe5`. Active tmux session at probe time: `omx-issue-2468-autoresearch-docs-css`; no active claw-code implementation session. Channel context included Jobdori's #591 `Prompt`-mode bypass in `permission_enforcer.rs`, so this probe inspected the adjacent file-write boundary helper. Code inspection: `rust/crates/runtime/src/permission_enforcer.rs::is_within_workspace` treats relative paths by string-concatenating `format!("{workspace_root}/{path}")`, then checks `normalized.starts_with(root) || normalized == workspace_root.trim_end_matches('/')`. It never normalizes `.`/`..`, canonicalizes symlinks, or resolves the candidate path. Therefore `check_file_write("../../outside/secret.txt", "/workspace")` builds `/workspace/../../outside/secret.txt`, which still starts with `/workspace/`, and returns `Allowed` in `WorkspaceWrite` mode even though the resolved target is outside the workspace. Existing tests cover absolute outside paths and simple relative `src/main.rs`, but not relative traversal escapes. **Required fix shape:** (a) replace string-prefix checks with path normalization/canonicalization against a workspace root, resolving `.`/`..` before comparison; (b) reject escaping relative paths even if the file does not exist yet, using lexical normalization plus parent canonicalization where needed; (c) add regressions for `../outside.txt`, `../../outside/secret.txt`, `src/../safe.txt`, and symlink escape cases; (d) share the same boundary primitive with runtime file ops and bash path-scope checks so permission enforcement cannot drift; (e) include resolved/candidate path evidence in denial messages without leaking sensitive contents. **Why this matters:** workspace-write mode's core promise is “project files only.” A string-prefix check lets a caller smuggle outside writes through relative traversal while the permission enforcer reports success, defeating the boundary before lower-level file tools can help. Source: gaebal-gajae dogfood response to Clawhip message `1507382558340681779` on 2026-05-22.

 584. **`WorkerRegistry::restart` clears prompt/trust state but leaves old events and the original `created_at`, so post-restart timeout evidence can mix prior-attempt blockers with the new boot attempt** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 14:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@ac09033`. Active tmux sessions at probe time: none. Channel context included Jobdori's worker-boot #592 finding, so this probe stayed in `worker_boot.rs` but checked restart lifecycle continuity. Code inspection: `rust/crates/runtime/src/worker_boot.rs::restart` resets `status`, trust flags, prompt fields, `last_error`, attempts, and `prompt_in_flight`, then appends `WorkerEventKind::Restarted`. It does not reset `worker.created_at`, does not create a per-attempt `boot_started_at`, does not increment a restart/attempt counter, and does not clear or partition previous events. Later `observe_startup_timeout` computes `elapsed = now.saturating_sub(worker.created_at)` and derives `trust_prompt_detected`, `tool_permission_detected`, and `ready_for_prompt_detected` by scanning the entire `worker.events` history. A worker that previously hit trust/tool/ready evidence, then restarts cleanly, can have the new timeout classified with old pre-restart evidence and an elapsed time anchored to the original creation. Existing `restart_and_terminate_reset_or_finish_worker` only asserts prompt fields/attempts reset; it does not assert event scoping or elapsed-time reset. **Required fix shape:** (a) add `current_attempt_started_at`/`boot_started_at` and `attempt_index` fields; (b) set them on create and restart and use them for startup timeout elapsed/command_started_at; (c) scope timeout evidence scans to events since the current attempt or store per-attempt cached evidence; (d) add tests where trust/tool evidence exists before restart but not after, proving the post-restart timeout does not inherit stale blockers; (e) include attempt index in worker events so dashboards can separate old and new boot attempts. **Why this matters:** restart is supposed to produce a fresh boot attempt. If old events and timestamps remain authoritative, operators see misleading “stalled for hours / trust required” evidence for a brand-new restart and recovery automation can choose the wrong next action. Source: gaebal-gajae dogfood response to Clawhip message `1507390103956226228` on 2026-05-22.
+
+585. **Prompt-misdelivery auto-recovery arms replay without clearing `last_error`, so a worker can be `ReadyForPrompt` while still carrying a stale prompt-delivery failure** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 15:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@b0bca2e`. Active tmux session at probe time: `gajae-pr-346-session-gateway-continuity-digest-review`; no active claw-code implementation session. Code inspection: in `rust/crates/runtime/src/worker_boot.rs::observe`, prompt misdelivery sets `worker.last_error = Some(WorkerFailureKind::PromptDelivery)` and `prompt_in_flight = false`, then pushes a `PromptMisdelivery` event. If `worker.auto_recover_prompt_misdelivery` is true, it sets `worker.replay_prompt = worker.last_prompt.clone()` and `worker.status = WorkerStatus::ReadyForPrompt`, then pushes `PromptReplayArmed`; however it never clears or demotes `last_error`. `await_ready` then returns `ready: true` and `last_error: Some(PromptDelivery)`, so callers/dashboards can see a ready replay state and a failure state simultaneously. Later `send_prompt` clears `last_error`, but until replay is actually sent the state snapshot is contradictory. Existing replay tests assert `status == ReadyForPrompt` and `replay_prompt` contents, but do not assert `last_error` semantics while replay is armed. **Required fix shape:** (a) when auto-recovery arms replay, either clear `last_error` or replace it with a non-fatal/degraded `replay_armed` status separate from failure; (b) include `recovery_armed:true` in the worker snapshot or ready result so callers can distinguish a recoverable ready state from a failed state; (c) add tests asserting `await_ready` after auto-recovery does not report contradictory ready+fatal error; (d) preserve the original misdelivery event for audit history while keeping current worker state coherent; (e) ensure manual/non-auto recovery still reports failure until explicitly resolved. **Why this matters:** recovery state is an operator contract. A worker that is ready to replay should not also advertise a current fatal prompt-delivery error, or automation may both retry and escalate the same incident. Source: gaebal-gajae dogfood response to Clawhip message `1507397657763778562` on 2026-05-22.