mirror of
https://github.com/instructkr/claude-code.git
synced 2026-05-23 06:06:43 +00:00
docs(roadmap): add plugin uninstall transaction gap
This commit is contained in:
@@ -6737,3 +6737,5 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
|
||||
591. **MCP degraded reports always compute `missing_tools` as empty because degraded construction passes `available_tools` (or `Vec::new()`) as the expected-tool set, so failed/unsupported servers never surface which tools disappeared** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 18:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@f50625a`. Active tmux sessions at probe time: `gajae-issue-355-backlog-zero-next-artifact-queue`, `omx-fooks-issue-1046-clean-epic-only-spawn-child`; no active claw-code implementation session. Code inspection: `runtime/src/mcp_lifecycle_hardened.rs::McpDegradedReport::new` can compute `missing_tools` by subtracting `available_tools` from `expected_tools`. But both producers discard the useful expected set: `runtime/src/mcp_stdio.rs::discover_tools_best_effort` passes `Vec::new()` for `expected_tools`, and `rusty-claude-cli/src/main.rs::RuntimeMcpState::new` passes `available_tools.clone()` as both `available_tools` and `expected_tools`. Therefore `missing_tools` is always empty, even when required servers fail discovery or unsupported remote transports are configured. Existing tests assert `degraded.missing_tools.is_empty()`, locking the broken contract. The only visible degraded data is failed server names/phases; operators cannot tell which qualified tool names vanished from the tool surface. **Required fix shape:** (a) retain each server's advertised/expected tool list from prior successful discovery, static config, manifest metadata, or at least the expected server/tool namespace when known; (b) pass that set into `McpDegradedReport::new` instead of `available_tools`/empty; (c) if exact tools are unknown, expose `missing_tool_names_unknown_for_servers:[...]` so the report is honest rather than falsely empty; (d) update tests to assert non-empty `missing_tools` or explicit unknown-missing metadata when a server with known tools fails; (e) thread the same degraded metadata into `ToolSearch` so models see which tools are unavailable, not just that a server failed. **Why this matters:** degraded startup is supposed to make partial success first-class. An always-empty `missing_tools` field falsely suggests no capability loss, hiding the actual impact of MCP failures and unsupported remote transports from both humans and automation. Source: gaebal-gajae dogfood response to Clawhip message `1507442954308948040` on 2026-05-22.
|
||||
|
||||
592. **Plugin degraded-mode reporting drops degraded-server errors because `PluginState::Degraded` only preserves failed server details, so a server marked degraded can appear fully healthy/available in the operator contract** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 18:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@1df4114`. Active tmux session at probe time: `gajae-issue-357-review-session-vanish-replacement`; no active claw-code implementation session. Focused validation: `cd rust && cargo test -p runtime plugin_lifecycle -- --nocapture` passed 6/6, confirming the current locked contract. Code inspection: `runtime/src/plugin_lifecycle.rs::PluginState::from_servers` treats `ServerStatus::Degraded` as usable by placing it in `healthy_servers`, but the `PluginState::Degraded` variant stores only `healthy_servers: Vec<String>` and `failed_servers: Vec<ServerHealth>`. `PluginHealthcheck::degraded_mode` then builds `unavailable_tools` only from `failed_servers` and reports a reason of `"N servers healthy, M servers failed"`. Existing test `degraded_server_status_keeps_server_usable` asserts a degraded `beta` server is included in `healthy_servers` and `failed_servers` is empty, but it does not assert that `beta`'s `last_error` (`high latency`) or degraded status is visible in `degraded_mode`. Result: a plugin with one healthy server and one degraded-but-usable server can emit `startup_degraded` while its degraded-mode payload says `2 servers healthy, 0 servers failed`, has no degraded server list, and can mark the degraded server's tools as simply available if discovery returns them. Operators and automation see the terminal state but lose the actual degraded reason/capability risk. **Required fix shape:** (a) split `PluginState::Degraded` into `healthy_servers`, `degraded_servers: Vec<ServerHealth>`, and `failed_servers`, or preserve all non-healthy server health records; (b) make `degraded_mode` include `degraded_tools`/`degraded_servers` with `last_error` separately from unavailable failed tools; (c) update the reason string/counts to distinguish healthy, degraded, and failed rather than folding degraded into healthy; (d) add tests proving a `ServerStatus::Degraded` server's status/error/capabilities appear in the degraded-mode JSON while remaining callable when appropriate; (e) align this with MCP degraded reports so partial plugin startup carries both availability and quality-of-service impact. **Why this matters:** partial startup needs impact opacity removed. A degraded server is not failed, but it also is not fully healthy; folding it into the healthy list makes `startup_degraded` hard to diagnose and can trick recovery logic into thinking no server has actionable degradation details. Source: gaebal-gajae dogfood response to Clawhip message `1507450501925441616` on 2026-05-22.
|
||||
|
||||
593. **Plugin uninstall is a three-store non-atomic transaction: it deletes the plugin directory before persisting registry/settings cleanup, so an IO failure can leave registry and `enabledPlugins` pointing at a removed plugin** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 19:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@244bdb7`. Active tmux session at probe time: `gajae-pr-358-review-session-vanish-replacement-review`; no active claw-code implementation session. Focused validation: `cd rust && cargo test -p plugins installs_enables_updates_and_uninstalls_external_plugins -- --nocapture` passed 1/1, confirming the happy-path lifecycle test does not cover interrupted persistence. Code inspection: `plugins/src/lib.rs::PluginManager::uninstall` removes the plugin record from the in-memory registry, then `remove_dir_all(record.install_path)`, then `store_registry(®istry)`, then `write_enabled_state(plugin_id, None)`. If directory removal succeeds but `store_registry` fails, `installed.json` still records the old plugin while the on-disk install directory is gone. If `store_registry` succeeds but `write_enabled_state` fails, the registry and disk are removed but `.claw/settings.json` can still contain `enabledPlugins[plugin_id]=true`. The next `plugin_registry_report` may silently prune the stale registry entry, but the enabled state can remain as orphaned configuration noise and the original uninstall command has already destroyed the install before it can report a clean transactional result. Existing `installs_enables_updates_and_uninstalls_external_plugins` asserts only the success path; it does not simulate registry write failure, settings write failure, or crash after `remove_dir_all`. **Required fix shape:** (a) make plugin uninstall a transaction with a tombstone/backup phase: first persist intended disabled/removed state or acquire a lock, then move the install directory to a same-filesystem trash/backup path, then update registry/settings, then remove backup after all metadata commits succeed; (b) if metadata cleanup fails, restore or report a recoverable tombstoned state with `rollback_available:true`; (c) make stale enabled settings for missing plugins produce a structured warning and auto-clean only through a safe metadata path; (d) add tests injecting failures after directory move/removal, after `store_registry`, and after `write_enabled_state`; (e) reuse the atomic JSON/write-lock helper from #508 so registry/settings writes cannot be partially written. **Why this matters:** uninstall is the destructive plugin lifecycle verb. Deleting files before committing metadata means a transient settings/registry IO failure turns a local cleanup into stale-branch-style plugin confusion: inventory can say installed/enabled while the executable hook/tool files are already gone. Source: gaebal-gajae dogfood response to Clawhip message `1507458055812546630` on 2026-05-22.
|
||||
|
||||
Reference in New Issue
Block a user