diff --git a/ROADMAP.md b/ROADMAP.md index 5e65654b..455aaee0 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -6727,3 +6727,5 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed) 586. **Plugin registry reads synchronously mutate bundled plugin installs without a lock/atomic swap, so startup/listing can race or fail while merely trying to aggregate hooks/tools** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 15:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@8043090`. Active tmux session at probe time: `gajae-pr-348-package-release-drift-review`; no active claw-code implementation session. Code inspection: every `PluginManager::plugin_registry_report` call begins with `self.sync_bundled_plugins()?`, and read-style paths (`plugin_registry`, `list_plugins`, `discover_plugins`, `aggregated_hooks`, `aggregated_tools`, startup `build_runtime_plugin_state_with_loader`) all flow through it. `sync_bundled_plugins` loads bundled manifests, then for each stale/outdated bundled plugin does `fs::remove_dir_all(&install_path)?; copy_dir_all(&source_root, &install_path)?;`, removes stale bundled IDs/directories, and finally writes `plugins/registry.json`. There is no process-wide file lock, no temp-dir + atomic rename, and no read-only/degraded mode. Two concurrent CLI startups or a startup plus `claw plugins list` can both decide a sync is needed; one can remove an install dir while the other is loading/copying it, yielding transient missing/partial plugin directories or a registry write race. Even a purely diagnostic/list/aggregate command can fail because bundled-plugin self-sync mutates disk before returning registry data. Existing tests cover sync happy paths and load-failure reporting, but not concurrent registry readers or a simulated remove/copy interruption. 
**Required fix shape:** (a) separate read-only registry discovery from bundled-plugin reconciliation, or gate reconciliation behind an explicit locked startup/update phase; (b) protect bundled install sync and registry writes with an interprocess lock; (c) copy bundled plugins into a temp dir and atomically rename/swap, never exposing partial installs; (d) if sync fails during a read-style command, return a degraded registry report with load failures instead of aborting all plugin aggregation where safe; (e) add concurrency/interruption tests with two managers racing `plugin_registry_report` and with `copy_dir_all` failure after removal, proving readers see either old or new complete plugin installs. **Why this matters:** plugin/MCP startup already has lifecycle friction. Registry reads should be safe and mostly observational; making them perform unlocked destructive replacement means diagnostics and startup can create the very plugin-load failures they are trying to observe. Source: gaebal-gajae dogfood response to Clawhip message `1507405207602987138` on 2026-05-22. 587. **`REPL` has an optional timeout with no default, so model-supplied code can block the tool dispatch thread indefinitely when `timeout_ms` is omitted** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 16:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@9843ccf`. Active tmux sessions at probe time: none. Channel context included Jobdori's adjacent #595 Sleep finding, so this probe inspected the other model-facing blocking execution surfaces instead of duplicating Sleep. Code inspection: `rust/crates/tools/src/lib.rs::execute_repl` validates code/language, spawns `python -c` / `node -e`, and only enters the polling timeout loop when `input.timeout_ms` is `Some`. If `timeout_ms` is omitted, it directly calls `process.spawn()?.wait_with_output()?`, with no default deadline, no backgrounding, and no abort-signal checks. 
The tool schema makes `timeout_ms` optional (`"timeout_ms": { "type": "integer", "minimum": 1 }`), and existing REPL success coverage passes `timeout_ms:500`; the timeout regression passes `timeout_ms:10`; there is no test for omitted-timeout long-running code. Therefore `REPL({language:"python", code:"import time; time.sleep(999999)"})` can freeze the same tool dispatch path indefinitely, which is worse than Sleep's 5-minute cap and easy for a model to trigger by forgetting the optional field. **Required fix shape:** (a) make `timeout_ms` default to a conservative deadline (for example 30s) instead of unbounded wait; (b) enforce an upper cap and return a structured timeout error matching bash/PowerShell timeout surfaces; (c) poll in small intervals that can observe a shared abort signal if/when the tool registry gains one; (d) add regressions for omitted timeout, explicit timeout, excessive timeout, and successful short code; (e) include elapsed/timeout metadata in the JSON error so claws can distinguish user-code hang from interpreter startup failure. **Why this matters:** REPL is a model-facing code execution tool. Optional timeout means the safest path depends on the model remembering to provide a guard every time; one missing field can wedge unattended claw runs forever with no heartbeat or typed recovery event. Source: gaebal-gajae dogfood response to Clawhip message `1507412753374249032` on 2026-05-22. + +588. **Prompt-mode `read_piped_stdin()` still reads the entire pipe with no cap before merging it into the prompt, so the one-shot prompt path can OOM and API/session history can diverge from the persisted truncated JSONL** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 16:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@a1728b2`. Active tmux session at probe time: `omx-fooks-issue-1046-clean-epic-only-spawn-child`; no active claw-code implementation session. 
Channel context included Jobdori's #596 `LineEditor::read_line_fallback` unbounded line-read finding, so this probe checked the other piped-stdin entrypoint. Code inspection: `rust/crates/rusty-claude-cli/src/main.rs::read_piped_stdin` returns `None` for TTY stdin, then does `let mut buffer = String::new(); io::stdin().read_to_string(&mut buffer)` with no byte/char cap. In `CliAction::Prompt`, when `permission_mode == DangerFullAccess`, this entire buffer is appended by `merge_prompt_with_stdin` to the user prompt and sent to `LiveCli::run_turn_with_output`. The session layer later truncates JSONL fields to `MAX_JSONL_FIELD_CHARS = 16 * 1024`, so a huge piped context can be fully allocated and sent to the provider while the persisted session records only a truncated copy. This is distinct from #596: `read_line_fallback` covers non-TTY REPL line input; `read_piped_stdin` covers explicit one-shot prompt + piped context. Existing tests exercise merge formatting but not large stdin caps or persistence parity. **Required fix shape:** (a) introduce one shared `MAX_STDIN_PROMPT_BYTES`/`MAX_STDIN_PROMPT_CHARS` boundary for both `read_line_fallback` and `read_piped_stdin`; (b) read through `take(limit + 1)` or chunked bounded reads so oversize input is detected before unbounded allocation; (c) fail with a typed “stdin prompt too large” error or summarize/truncate before both API send and session persistence using the same content; (d) add tests for empty stdin, normal small piped context, limit-boundary input, and oversize input; (e) include the stdin byte/char count and cap in JSON/text diagnostics without echoing the large payload. **Why this matters:** piped stdin is a primary automation path (`cat file | claw prompt ...`). If it reads unbounded context and then persists a different truncated transcript, claws cannot replay, audit, or recover the same conversation the provider actually saw. 
Source: gaebal-gajae dogfood response to Clawhip message `1507420302924185720` on 2026-05-22.
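
A minimal sketch of fix shape (b) for #588, assuming a hypothetical helper `read_bounded_prompt` and a hypothetical `StdinPromptError` type; neither exists in the codebase described above. The cap value is a placeholder, since the entry names `MAX_STDIN_PROMPT_BYTES` but gives no number. The idea is to read through `take(limit + 1)`, so at most one byte beyond the cap is ever buffered and oversize input is detected before an unbounded allocation:

```rust
use std::io::{self, Read};

/// Placeholder cap (assumption): the roadmap entry names the constant but not its value.
const MAX_STDIN_PROMPT_BYTES: u64 = 256 * 1024;

/// Hypothetical typed error for the bounded stdin prompt path.
#[derive(Debug, PartialEq)]
enum StdinPromptError {
    /// Input exceeded `limit` bytes; the payload itself is never echoed.
    TooLarge { limit: u64 },
    /// Input fit within the cap but was not valid UTF-8.
    InvalidUtf8,
    /// Underlying read failure, stored as text so the enum stays comparable.
    Io(String),
}

/// Reads at most `limit` bytes from `reader` and returns them as a `String`.
/// Reading `limit + 1` bytes is what lets us tell "exactly at the cap"
/// apart from "over the cap" without buffering the whole pipe.
fn read_bounded_prompt<R: Read>(reader: R, limit: u64) -> Result<String, StdinPromptError> {
    let mut bytes = Vec::new();
    let mut limited = reader.take(limit.saturating_add(1));
    limited
        .read_to_end(&mut bytes)
        .map_err(|e| StdinPromptError::Io(e.to_string()))?;

    if bytes.len() as u64 > limit {
        return Err(StdinPromptError::TooLarge { limit });
    }
    String::from_utf8(bytes).map_err(|_| StdinPromptError::InvalidUtf8)
}

/// Hypothetical wiring for the one-shot prompt path. The TTY check that
/// `read_piped_stdin` performs first is omitted from this sketch.
#[allow(dead_code)]
fn read_piped_stdin_bounded() -> Result<String, StdinPromptError> {
    read_bounded_prompt(io::stdin().lock(), MAX_STDIN_PROMPT_BYTES)
}
```

Because the helper is generic over `Read`, the tests listed in (d) can pass in-memory byte slices instead of a real pipe. For example, `read_bounded_prompt(&b"abcde"[..], 4)` returns `Err(StdinPromptError::TooLarge { limit: 4 })`. Per (a), the same helper and constant could back `read_line_fallback`. Per (c), the `String` it returns would be the single value handed to both the API send and session persistence.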