docs(roadmap): add workspace preflight remote base gap

2026-05-22 13:46:44 +00:00 · 2026-05-21 20:00:55 +00:00
parent a036293829
commit 6ef54578e8
1 changed file with 2 additions and 0 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -6663,3 +6663,5 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
 553. **Worker restart reuses the original `created_at`, so startup-timeout elapsed time after restart includes the previous worker lifetime and stale pre-restart events** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 19:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@51c05ac` and binary built from source SHA `25d663d`. Code inspection: `WorkerRegistry::restart` at `runtime/src/worker_boot.rs:600-621` resets status, prompt fields, trust, and error state, then appends `WorkerEventKind::Restarted`, but it does not reset `worker.created_at` or record a separate boot-start timestamp. `observe_startup_timeout` at `worker_boot.rs:713-735` computes `elapsed = now.saturating_sub(worker.created_at)` and reports `command_started_at: worker.created_at`. Therefore a worker restarted after a long previous lifetime can immediately report a huge startup-timeout `elapsed` and a `command_started_at` taken from the original boot rather than from the restart. Because events are also retained, the timeout evidence can mix pre-restart trust/tool-permission detections with the new boot attempt. This is distinct from Jobdori's #554 O(3N) scan/unbounded-events finding: even with O(1) caches, the temporal anchor for a restarted boot is wrong. **Required fix shape:** (a) add `boot_started_at` or `current_attempt_started_at` distinct from the immutable worker creation time; (b) set that timestamp on create and on restart, and use it for the startup-timeout `elapsed` and `command_started_at`; (c) either scope trust/tool-permission evidence to events since the current boot attempt or store per-attempt cached flags; (d) include `attempt_index`/`restart_count` in startup evidence and worker events so old and new attempts are separable; (e) add a regression test in which a worker is created, time advances, and the worker restarts, proving that `observe_startup_timeout` then reports elapsed time from the restart rather than from the original creation and ignores pre-restart prompts. 
**Why this matters:** restart is supposed to create a fresh startup attempt. If timeout evidence is anchored to the first creation, operators see misleading "stalled for hours" reports and stale blocker classifications for a brand-new restart, which breaks recovery decisions. Source: gaebal-gajae dogfood response to Clawhip message `1507095667959402638` on 2026-05-21.
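A minimal Rust sketch of fix shapes (a) to (d), assuming `u64` epoch-second timestamps (consistent with the `now.saturating_sub(worker.created_at)` expression quoted in #553) and using hypothetical names (`EventKind`, `boot_started_at`, `attempt_index`, `StartupEvidence`) rather than the real `WorkerRegistry` types. It keeps `created_at` immutable, re-anchors the timeout math on every restart, and tags each event with the attempt that produced it so pre-restart prompts drop out of the evidence.

```rust
/// Hypothetical event kinds standing in for `WorkerEventKind`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EventKind {
    Restarted,
    TrustPromptDetected,
    ToolPermissionPromptDetected,
}

/// Each event records which boot attempt produced it.
#[derive(Debug, Clone)]
pub struct WorkerEvent {
    pub kind: EventKind,
    pub attempt_index: u32,
    pub at: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub struct StartupEvidence {
    pub elapsed_secs: u64,
    pub command_started_at: u64,
    pub attempt_index: u32,
    pub trust_prompt_seen: bool,
    pub tool_permission_prompt_seen: bool,
}

#[derive(Debug)]
pub struct Worker {
    /// Immutable: when the worker record was first created.
    pub created_at: u64,
    /// Set on create and on every restart; anchors startup-timeout math.
    pub boot_started_at: u64,
    /// 0 for the first boot, incremented by each restart.
    pub attempt_index: u32,
    pub events: Vec<WorkerEvent>,
}

impl Worker {
    pub fn new(now: u64) -> Self {
        Worker { created_at: now, boot_started_at: now, attempt_index: 0, events: Vec::new() }
    }

    /// Appends an event tagged with the current attempt.
    pub fn record(&mut self, kind: EventKind, now: u64) {
        let attempt_index = self.attempt_index;
        self.events.push(WorkerEvent { kind, attempt_index, at: now });
    }

    /// Starts a fresh attempt without touching `created_at`.
    pub fn restart(&mut self, now: u64) {
        self.attempt_index += 1;
        self.boot_started_at = now;
        self.record(EventKind::Restarted, now);
    }

    pub fn observe_startup_timeout(&self, now: u64) -> StartupEvidence {
        let current = self.attempt_index;
        // Only prompts detected during the current attempt count as evidence.
        let seen = |kind: EventKind| {
            self.events.iter().any(|e| e.attempt_index == current && e.kind == kind)
        };
        StartupEvidence {
            elapsed_secs: now.saturating_sub(self.boot_started_at),
            command_started_at: self.boot_started_at,
            attempt_index: current,
            trust_prompt_seen: seen(EventKind::TrustPromptDetected),
            tool_permission_prompt_seen: seen(EventKind::ToolPermissionPromptDetected),
        }
    }
}
```

This scans the event list on every observation for brevity; a registry that also needs O(1) lookups could instead keep per-attempt cached flags, which option (c) explicitly allows.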

 554. **Recovery ledger tests assert only `started_at.is_some()` / `finished_at.is_some()`, so fake tick-counter timestamps are explicitly accepted by the machine-readable ledger suite** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 19:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@1540e3a` and binary built from source SHA `25d663d`. Code inspection: `RecoveryContext::next_timestamp` in `runtime/src/recovery_recipes.rs:276-279` returns `recovery-ledger-tick-N`, and `attempt_recovery` stores those strings into the public `RecoveryLedgerEntry.started_at` / `finished_at` fields. The test `recovery_context_exposes_machine_readable_ledger` at `recovery_recipes.rs:688-720` only asserts `entry.started_at.is_some()` and `entry.finished_at.is_some()`; the exhaustion/failure ledger tests likewise validate state/result/command details but not timestamp parseability. Therefore the test suite labels the ledger machine-readable while allowing non-date sentinel strings in fields named `started_at` and `finished_at`. This is the test-coverage sibling of Jobdori's #555 public API timestamp bug: even after production is fixed, nothing in the ledger tests prevents a regression back to tick strings or other unparseable data. **Required fix shape:** (a) add a recovery timestamp assertion helper that parses `started_at` and `finished_at` as RFC3339/ISO-8601 UTC; (b) update the success, exhausted, and failed ledger tests to use it; (c) add a negative unit test proving `recovery-ledger-tick-1` is rejected by the helper/contract; (d) document whether recovery ledger timestamps are wall-clock instants or monotonic attempt IDs, and if both are needed, add a separate `attempt_seq` instead of overloading the timestamp fields; (e) align with the timestamp contract fixes in #548-#551. **Why this matters:** the tests currently make the wrong semantic promise: "machine-readable" only means "present." 
Recovery ledgers drive retries/escalation audit trails, so timestamp fields must be parseable dates or consumers cannot sort, correlate, or display recovery attempts reliably. Source: gaebal-gajae dogfood response to Clawhip message `1507103214665859264` on 2026-05-21.
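A std-only sketch of the helper from fix shapes (a) to (c). `is_rfc3339_utc` and `assert_ledger_timestamp` are hypothetical names; the check accepts only the UTC `Z` offset (with optional fractional seconds) and validates calendar ranges including leap years, so a tick string such as `recovery-ledger-tick-1` fails. A real suite might prefer an established date-time crate over hand-rolled parsing.

```rust
/// Returns true if `s` is an RFC 3339 UTC instant such as
/// `2026-05-21T19:30:00Z` or `2026-05-21T19:30:00.123Z`.
/// Hypothetical helper: numeric offsets like `+02:00` are rejected on purpose.
pub fn is_rfc3339_utc(s: &str) -> bool {
    let b = s.as_bytes();
    // Shortest accepted form is `YYYY-MM-DDTHH:MM:SSZ` (20 bytes).
    if b.len() < 20 {
        return false;
    }
    let separators_ok = b[4] == b'-'
        && b[7] == b'-'
        && matches!(b[10], b'T' | b't')
        && b[13] == b':'
        && b[16] == b':';
    if !separators_ok {
        return false;
    }
    // Parse a fixed-width run of ASCII digits.
    let num = |start: usize, end: usize| -> Option<u32> {
        let digits = &b[start..end];
        if digits.iter().all(u8::is_ascii_digit) {
            std::str::from_utf8(digits).ok()?.parse().ok()
        } else {
            None
        }
    };
    let (Some(year), Some(month), Some(day), Some(hour), Some(min), Some(sec)) =
        (num(0, 4), num(5, 7), num(8, 10), num(11, 13), num(14, 16), num(17, 19))
    else {
        return false;
    };
    // Optional fractional seconds, then a mandatory UTC designator.
    let mut rest = &b[19..];
    if rest.first() == Some(&b'.') {
        let frac_len = rest[1..].iter().take_while(|c| c.is_ascii_digit()).count();
        if frac_len == 0 {
            return false;
        }
        rest = &rest[1 + frac_len..];
    }
    if !matches!(rest, [b'Z'] | [b'z']) {
        return false;
    }
    let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    let days_in_month = match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if leap => 29,
        2 => 28,
        _ => return false,
    };
    // `sec <= 60` tolerates a leap second, as RFC 3339 permits.
    (1..=days_in_month).contains(&day) && hour < 24 && min < 60 && sec <= 60
}

/// Test-style assertion: the field must be present and parseable.
pub fn assert_ledger_timestamp(field: &str, value: Option<&str>) {
    let Some(value) = value else {
        panic!("{field} is missing");
    };
    assert!(
        is_rfc3339_utc(value),
        "{field} is not an RFC 3339 UTC timestamp: {value:?}"
    );
}
```

Assuming the ledger fields are `Option<String>`, each existing `is_some()` check could become `assert_ledger_timestamp("started_at", entry.started_at.as_deref())`, and the negative test for (c) only needs to confirm that the tick string makes the helper panic.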
+
+555. **Workspace-test stale-branch preflight can compare against stale local `main` instead of `origin/main`, letting branches behind the remote base run full workspace tests as "fresh"** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 20:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@a036293` and binary built from source SHA `25d663d`. Live channel context had multiple open claw-code PRs whose head ref is `main`, and the watchdog target is specifically stale-branch confusion. Code inspection: `tools/src/lib.rs::workspace_test_branch_preflight` reads the current branch, then calls `resolve_main_ref(&branch)` before `check_freshness`. `resolve_main_ref` at `tools/src/lib.rs:2020-2032` returns local `main` whenever it exists, except when the current branch itself is `main` and `origin/main` exists. In the common feature-branch case with both refs present, the stale-branch guard therefore compares the feature branch to local `main`, not `origin/main`. If local `main` has not been fetched/updated, a branch can be behind `origin/main` but equal to local `main`, so `check_freshness` returns `Fresh` and `cargo test --workspace` proceeds without the preflight block. **Required fix shape:** (a) prefer `origin/main` (or the configured protected/base remote ref) for non-main branches when present; (b) fetch or otherwise verify the freshness of the remote ref before using it, or emit a degraded `branch.remote_base_unknown`/`branch.base_ref_stale` lane event instead of silently falling back; (c) include `baseRefSource` and `baseRefCommit` in the blocked lane event payload so operators know whether freshness was checked against local or remote state; (d) add a regression with local `main` stale, `origin/main` ahead, and a feature branch equal to local `main`, proving workspace tests are blocked; (e) keep the current `branch == main -> origin/main` behavior but cover it with a separate test. 
**Why this matters:** full workspace tests are used as green evidence. If the guard checks an outdated local main, agents can burn time and report green against a stale base while missing fixes already in the remote protected branch. Source: gaebal-gajae dogfood response to Clawhip message `1507110763301437440` on 2026-05-21.
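A toy model of fix shapes (a) to (e), assuming linear histories held in memory instead of real git calls (a real implementation would shell out to something like `git rev-list --count`, and would fetch or verify the remote ref first). `Repo`, `BaseRefSource`, `Preflight`, and `workspace_test_preflight` are hypothetical names, not the existing `tools/src/lib.rs` API. The policy it illustrates: prefer `origin/main` for every branch, label any local fallback explicitly, and carry the base commit so a blocked event can report `baseRefSource`/`baseRefCommit`.

```rust
use std::collections::{HashMap, HashSet};

/// Toy repository: each ref maps to its linear history, oldest commit first.
pub struct Repo {
    pub refs: HashMap<&'static str, Vec<&'static str>>,
}

/// Where the freshness base came from (surfaced as `baseRefSource`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BaseRefSource {
    Remote,
    LocalFallback,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Preflight {
    Fresh { base_ref: &'static str, base_ref_source: BaseRefSource, base_ref_commit: &'static str },
    Blocked {
        base_ref: &'static str,
        base_ref_source: BaseRefSource,
        base_ref_commit: &'static str,
        behind_by: usize,
    },
    /// Degraded outcome reported instead of silently passing.
    RemoteBaseUnknown,
}

impl Repo {
    /// Number of base commits missing from `branch`.
    fn behind(&self, branch: &str, base: &str) -> usize {
        let mine: HashSet<&str> = self.refs[branch].iter().copied().collect();
        self.refs[base].iter().filter(|c| !mine.contains(*c)).count()
    }
}

/// Assumes `branch` exists in `repo.refs` whenever a base ref is found.
pub fn workspace_test_preflight(repo: &Repo, branch: &str) -> Preflight {
    // Prefer the remote base for every branch, including `main` itself.
    let (base_ref, base_ref_source) = if repo.refs.contains_key("origin/main") {
        ("origin/main", BaseRefSource::Remote)
    } else if repo.refs.contains_key("main") {
        ("main", BaseRefSource::LocalFallback)
    } else {
        return Preflight::RemoteBaseUnknown;
    };
    let base_ref_commit = repo.refs[base_ref].last().copied().unwrap_or("");
    match repo.behind(branch, base_ref) {
        0 => Preflight::Fresh { base_ref, base_ref_source, base_ref_commit },
        behind_by => Preflight::Blocked { base_ref, base_ref_source, base_ref_commit, behind_by },
    }
}
```

With local `main` at `[a, b]`, `origin/main` at `[a, b, c]`, and a feature branch equal to local `main`, this returns `Blocked` against `origin/main` with `behind_by: 1`, which is the regression scenario in (d); the same `origin/main` preference also covers the `branch == main` case from (e).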