docs(roadmap): add mcp degraded missing tools gap

2026-05-22 21:56:45 +00:00 · 2026-05-22 18:01:37 +00:00
parent f50625a04b
commit 1df41147e3
1 changed files with 2 additions and 0 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -6733,3 +6733,5 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
 589. **Managed-proxy MCP transport has no auth field or `requires_user_auth` path, so protected proxy endpoints cannot use the same OAuth/auth lifecycle as HTTP/SSE** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 17:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@a93f36d`. Active tmux sessions at probe time: `gajae-issue-351-ops-surface-inventory-snapshot`, `omx-fooks-issue-1046-clean-epic-only-spawn-child`; no active claw-code implementation session. Channel context included Jobdori's #597 WebSocket OAuth gap, so this probe checked the remaining remote/proxy MCP transport shapes. Code inspection: `runtime/src/config.rs::McpManagedProxyServerConfig` has only `{ url, id }`, `parse_mcp_server_config` for `"claudeai-proxy"` parses only those two fields, `runtime/src/mcp_client.rs::McpClientTransport::ManagedProxy` stores `McpManagedProxyTransport { url, id }` with no `auth`, and `commands/src/lib.rs` reports only URL/proxy id in text/JSON. Unlike SSE/HTTP `McpRemoteTransport`, the managed-proxy path cannot carry `McpOAuthConfig`, cannot report `requires_user_auth()`, and cannot participate in the same user-auth/preflight lifecycle. A protected managed proxy must either be unauthenticated, rely on credentials encoded in URL/query (already a reporting leak class), or use an out-of-band mechanism invisible to `mcp list/show/doctor`. **Required fix shape:** (a) add `oauth: Option<McpOAuthConfig>` or an explicit managed-proxy auth config to `McpManagedProxyServerConfig`; (b) include `auth: McpClientAuth` in `McpManagedProxyTransport` or otherwise expose a shared `requires_user_auth` trait across all remote transports; (c) parse/report the auth requirement in `mcp show/list` without leaking tokens; (d) add tests for `claudeai-proxy` with OAuth proving bootstrap requires user auth and JSON/text surfaces expose non-secret auth metadata; (e) align with the WebSocket fix so every network MCP transport (SSE/HTTP/WS/managed-proxy) has symmetric auth configuration or an explicit documented reason why not. **Why this matters:** managed proxy is a remote MCP lifecycle path. If it cannot express auth, operators lose preflight visibility and are pushed toward URL/header secret hacks that diagnostics either leak (#90) or cannot validate. Source: gaebal-gajae dogfood response to Clawhip message `1507427853031968799` on 2026-05-22.

 590. **Configured remote MCP transports are parsed and shown as first-class, but runtime `McpServerManager` only starts stdio servers and silently degrades every HTTP/SSE/WS/SDK/managed-proxy server into an unsupported registration failure** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 17:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@106c243`. Active tmux sessions at probe time: `gajae-issue-353-review-ci-verdict-transition-receipt`, `omx-fooks-issue-1046-clean-epic-only-spawn-child`; no active claw-code implementation session. Code inspection: `runtime/src/config.rs` parses transport variants `stdio`, `sse`, `http`, `ws`, `sdk`, and `claudeai-proxy`; `commands/src/lib.rs` renders their URL/header/OAuth/proxy details in `mcp list/show`. But `runtime/src/mcp_stdio.rs::McpServerManager::from_servers` inserts only `McpTransport::Stdio` into `managed_servers`; every other transport is pushed into `unsupported_servers` with reason `transport X is not supported by McpServerManager`. `discover_tools_best_effort` later turns those into `McpLifecyclePhase::ServerRegistration` failures/degraded startup, while `discover_tools` over `server_names()` ignores them entirely because `server_names()` only returns managed stdio server keys. Existing test `manager_records_unsupported_non_stdio_servers_without_panicking` locks the current behavior for http/sdk/ws as expected. **Required fix shape:** (a) decide whether remote transports are supported in this build; if not, make config parsing/show/list label them as `configured_but_runtime_unsupported` rather than implying they are usable; (b) if they are intended to work, implement separate managers/clients for HTTP/SSE/WS/managed-proxy and route discovery/tool calls by transport; (c) surface unsupported required remote servers as hard startup blockers in prompt/runtime preflight, and optional ones as typed degraded state, not just best-effort discovery noise; (d) add JSON fields in `mcp list/show/doctor` such as `runtime_supported:false`, `unsupported_reason`, and `required`; (e) add regressions proving `discover_tools` and prompt startup cannot silently ignore required non-stdio servers. **Why this matters:** users can configure and inspect remote MCP servers today, including auth fields for HTTP/SSE, but the actual runtime path does not run them. That split creates event/log opacity: `mcp show` looks configured while prompt-time tool discovery either degrades or omits the server, so operators cannot tell from control-plane surfaces whether remote MCP tools are actually reachable. Source: gaebal-gajae dogfood response to Clawhip message `1507435406277476393` on 2026-05-22.
+
+591. **MCP degraded reports always compute `missing_tools` as empty because degraded construction passes `available_tools` (or `Vec::new()`) as the expected-tool set, so failed/unsupported servers never surface which tools disappeared** — dogfooded 2026-05-22 from the `#clawcode-building-in-public` 18:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@f50625a`. Active tmux sessions at probe time: `gajae-issue-355-backlog-zero-next-artifact-queue`, `omx-fooks-issue-1046-clean-epic-only-spawn-child`; no active claw-code implementation session. Code inspection: `runtime/src/mcp_lifecycle_hardened.rs::McpDegradedReport::new` can compute `missing_tools` by subtracting `available_tools` from `expected_tools`. But both producers discard the useful expected set: `runtime/src/mcp_stdio.rs::discover_tools_best_effort` passes `Vec::new()` for `expected_tools`, and `rusty-claude-cli/src/main.rs::RuntimeMcpState::new` passes `available_tools.clone()` as both `available_tools` and `expected_tools`. Therefore `missing_tools` is always empty, even when required servers fail discovery or unsupported remote transports are configured. Existing tests assert `degraded.missing_tools.is_empty()`, locking the broken contract. The only visible degraded data is failed server names/phases; operators cannot tell which qualified tool names vanished from the tool surface. **Required fix shape:** (a) retain each server's advertised/expected tool list from prior successful discovery, static config, manifest metadata, or at least the expected server/tool namespace when known; (b) pass that set into `McpDegradedReport::new` instead of `available_tools`/empty; (c) if exact tools are unknown, expose `missing_tool_names_unknown_for_servers:[...]` so the report is honest rather than falsely empty; (d) update tests to assert non-empty `missing_tools` or explicit unknown-missing metadata when a server with known tools fails; (e) thread the same degraded metadata into `ToolSearch` so models see which tools are unavailable, not just that a server failed. **Why this matters:** degraded startup is supposed to make partial success first-class. An always-empty `missing_tools` field falsely suggests no capability loss, hiding the actual impact of MCP failures and unsupported remote transports from both humans and automation. Source: gaebal-gajae dogfood response to Clawhip message `1507442954308948040` on 2026-05-22.