From a1728b2be80e9fe9fb610f10ff8b92c15656952f Mon Sep 17 00:00:00 2001 From: Yeachan-Heo Date: Fri, 22 May 2026 16:01:10 +0000 Subject: [PATCH] docs(roadmap): add repl missing default timeout gap --- ROADMAP.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/ROADMAP.md b/ROADMAP.md index de4b7a74..5e65654b 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -6725,3 +6725,5 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed) 585. **Prompt-misdelivery auto-recovery arms replay without clearing `last_error`, so a worker can be `ReadyForPrompt` while still carrying a stale prompt-delivery failure** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 15:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@b0bca2e`. Active tmux session at probe time: `gajae-pr-346-session-gateway-continuity-digest-review`; no active claw-code implementation session. Code inspection: in `rust/crates/runtime/src/worker_boot.rs::observe`, prompt misdelivery sets `worker.last_error = Some(WorkerFailureKind::PromptDelivery)` and `prompt_in_flight = false`, then pushes a `PromptMisdelivery` event. If `worker.auto_recover_prompt_misdelivery` is true, it sets `worker.replay_prompt = worker.last_prompt.clone()` and `worker.status = WorkerStatus::ReadyForPrompt`, then pushes `PromptReplayArmed`; however it never clears or demotes `last_error`. `await_ready` then returns `ready: true` and `last_error: Some(PromptDelivery)`, so callers/dashboards can see a ready replay state and a failure state simultaneously. Later `send_prompt` clears `last_error`, but until replay is actually sent the state snapshot is contradictory. Existing replay tests assert `status == ReadyForPrompt` and `replay_prompt` contents, but do not assert `last_error` semantics while replay is armed. **Required fix shape:** (a) when auto-recovery arms replay, either clear `last_error` or replace it with a non-fatal/degraded `replay_armed` status separate from failure; (b) include `recovery_armed:true` in the worker snapshot or ready result so callers can distinguish a recoverable ready state from a failed state; (c) add tests asserting `await_ready` after auto-recovery does not report contradictory ready+fatal error; (d) preserve the original misdelivery event for audit history while keeping current worker state coherent; (e) ensure manual/non-auto recovery still reports failure until explicitly resolved. **Why this matters:** recovery state is an operator contract. A worker that is ready to replay should not also advertise a current fatal prompt-delivery error, or automation may both retry and escalate the same incident. Source: gaebal-gajae dogfood response to Clawhip message `1507397657763778562` on 2026-05-22. 586. **Plugin registry reads synchronously mutate bundled plugin installs without a lock/atomic swap, so startup/listing can race or fail while merely trying to aggregate hooks/tools** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 15:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@8043090`. Active tmux session at probe time: `gajae-pr-348-package-release-drift-review`; no active claw-code implementation session. Code inspection: every `PluginManager::plugin_registry_report` call begins with `self.sync_bundled_plugins()?`, and read-style paths (`plugin_registry`, `list_plugins`, `discover_plugins`, `aggregated_hooks`, `aggregated_tools`, startup `build_runtime_plugin_state_with_loader`) all flow through it. `sync_bundled_plugins` loads bundled manifests, then for each stale/outdated bundled plugin does `fs::remove_dir_all(&install_path)?; copy_dir_all(&source_root, &install_path)?;`, removes stale bundled IDs/directories, and finally writes `plugins/registry.json`. There is no process-wide file lock, no temp-dir + atomic rename, and no read-only/degraded mode. Two concurrent CLI startups or a startup plus `claw plugins list` can both decide a sync is needed; one can remove an install dir while the other is loading/copying it, yielding transient missing/partial plugin directories or a registry write race. Even a purely diagnostic/list/aggregate command can fail because bundled-plugin self-sync mutates disk before returning registry data. Existing tests cover sync happy paths and load-failure reporting, but not concurrent registry readers or a simulated remove/copy interruption. **Required fix shape:** (a) separate read-only registry discovery from bundled-plugin reconciliation, or gate reconciliation behind an explicit locked startup/update phase; (b) protect bundled install sync and registry writes with an interprocess lock; (c) copy bundled plugins into a temp dir and atomically rename/swap, never exposing partial installs; (d) if sync fails during a read-style command, return a degraded registry report with load failures instead of aborting all plugin aggregation where safe; (e) add concurrency/interruption tests with two managers racing `plugin_registry_report` and with `copy_dir_all` failure after removal, proving readers see either old or new complete plugin installs. **Why this matters:** plugin/MCP startup already has lifecycle friction. Registry reads should be safe and mostly observational; making them perform unlocked destructive replacement means diagnostics and startup can create the very plugin-load failures they are trying to observe. Source: gaebal-gajae dogfood response to Clawhip message `1507405207602987138` on 2026-05-22. + +587. **`REPL` has an optional timeout with no default, so model-supplied code can block the tool dispatch thread indefinitely when `timeout_ms` is omitted** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 16:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@9843ccf`. Active tmux sessions at probe time: none. Channel context included Jobdori's adjacent #595 Sleep finding, so this probe inspected the other model-facing blocking execution surfaces instead of duplicating Sleep. Code inspection: `rust/crates/tools/src/lib.rs::execute_repl` validates code/language, spawns `python -c` / `node -e`, and only enters the polling timeout loop when `input.timeout_ms` is `Some`. If `timeout_ms` is omitted, it directly calls `process.spawn()?.wait_with_output()?`, with no default deadline, no backgrounding, and no abort-signal checks. The tool schema makes `timeout_ms` optional (`"timeout_ms": { "type": "integer", "minimum": 1 }`), and existing REPL success coverage passes `timeout_ms:500`; the timeout regression passes `timeout_ms:10`; there is no test for omitted-timeout long-running code. Therefore `REPL({language:"python", code:"import time; time.sleep(999999)"})` can freeze the same tool dispatch path indefinitely, which is worse than Sleep's 5-minute cap and easy for a model to trigger by forgetting the optional field. **Required fix shape:** (a) make `timeout_ms` default to a conservative deadline (for example 30s) instead of unbounded wait; (b) enforce an upper cap and return a structured timeout error matching bash/PowerShell timeout surfaces; (c) poll in small intervals that can observe a shared abort signal if/when the tool registry gains one; (d) add regressions for omitted timeout, explicit timeout, excessive timeout, and successful short code; (e) include elapsed/timeout metadata in the JSON error so claws can distinguish user-code hang from interpreter startup failure. **Why this matters:** REPL is a model-facing code execution tool. Optional timeout means the safest path depends on the model remembering to provide a guard every time; one missing field can wedge unattended claw runs forever with no heartbeat or typed recovery event. Source: gaebal-gajae dogfood response to Clawhip message `1507412753374249032` on 2026-05-22.