mirror of
https://github.com/instructkr/claude-code.git
synced 2026-05-30 09:16:44 +00:00
Compare commits
16 Commits
2b0412043a
...
docs/roadm
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
3b5bafc847 | ||
|
|
e7074f47ee | ||
|
|
9468383b67 | ||
|
|
1da2781816 | ||
|
|
9037430d52 | ||
|
|
8e22f757d8 | ||
|
|
7676b376ae | ||
|
|
1011a83823 | ||
|
|
1376d92064 | ||
|
|
be53e04671 | ||
|
|
cb56dc12ab | ||
|
|
71686a20fc | ||
|
|
07992b8a1b | ||
|
|
74ea754d29 | ||
|
|
77afde768c | ||
|
|
6db68a2baa |
@@ -7,7 +7,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
|
||||
- Frameworks: none detected from the supported starter markers.
|
||||
|
||||
## Verification
|
||||
- Run Rust verification from `rust/`: `cargo fmt`, `cargo clippy --workspace --all-targets -- -D warnings`, `cargo test --workspace`
|
||||
- Run Rust verification from repo root: `scripts/fmt.sh --check`; for formatting use `scripts/fmt.sh`. Run Rust clippy/tests from `rust/`: `cargo clippy --workspace --all-targets -- -D warnings`, `cargo test --workspace`
|
||||
- `src/` and `tests/` are both present; update both surfaces together when behavior changes.
|
||||
|
||||
## Repository shape
|
||||
|
||||
16
ROADMAP.md
16
ROADMAP.md
@@ -2128,7 +2128,7 @@ Original filing (2026-04-13): user requested a `-acp` parameter to support ACP p
|
||||
|
||||
**Source.** Jobdori dogfood 2026-04-18 against `/tmp/cdJ` on main HEAD `b7539e6` in response to Clawhip pinpoint nudge at `1494744278423961742`. Adjacent to #85 (skill discovery ancestor walk) on the *discovery* side — #85 is "skills are discovered too broadly," #95 is "skills are *installed* too broadly." Together they bound the skill-surface trust problem from both the read and the write axes. Distinct sub-cluster from the permission-audit bundle (#50 / #87 / #91 / #94) and from the truth-audit cluster (#80–#87, #89): this is specifically about *scope asymmetry between install and settings* and the *missing uninstall verb*.
|
||||
|
||||
96. **`claw --help`'s "Resume-safe commands:" one-liner summary does not filter `STUB_COMMANDS` — 62 documented slash commands that are explicitly marked unimplemented still show up as valid resume-safe entries, contradicting the main Interactive slash commands list just above it (which *does* filter stubs per ROADMAP #39)** — dogfooded 2026-04-18 on main HEAD `8db8e49` from `/tmp/cdK`. The `render_help` output emits two separate enumerations of slash commands; only one of them applies the stub filter. The Resume-safe summary advertises `/budget`, `/rate-limit`, `/metrics`, `/diagnostics`, `/bookmarks`, `/workspace`, `/reasoning`, `/changelog`, `/vim`, `/summary`, `/brief`, `/advisor`, `/stickers`, `/insights`, `/thinkback`, `/keybindings`, `/privacy-settings`, `/output-style`, `/allowed-tools`, `/tool-details`, `/language`, `/max-tokens`, `/temperature`, `/system-prompt` — all of which are explicitly in `STUB_COMMANDS` with "Did you mean" guards and no parse arm.
|
||||
96. **`claw --help`'s "Resume-safe commands:" one-liner summary does not filter `STUB_COMMANDS` — 62 documented slash commands that are explicitly marked unimplemented still show up as valid resume-safe entries, contradicting the main Interactive slash commands list just above it (which *does* filter stubs per ROADMAP #39)** — **done (verified 2026-04-29):** the Resume-safe command summary now applies the same `STUB_COMMANDS` filter as the Interactive slash command block before rendering help, so unimplemented slash-command stubs no longer advertise as resume-safe. Added `stub_commands_absent_from_resume_safe_help` to lock the filtered one-liner contract alongside the existing REPL completion filter. Fresh proof: `cargo fmt --all --check`, `cargo test -p rusty-claude-cli stub_commands_absent_from_resume_safe_help -- --nocapture`, and `cargo test -p rusty-claude-cli parses_direct_cli_actions -- --nocapture` pass. Original filing below for traceability.
|
||||
|
||||
**Concrete repro.**
|
||||
```
|
||||
@@ -6158,7 +6158,8 @@ load_session('nonexistent') # raises FileNotFoundError with no structured error
|
||||
**Blocker.** None.
|
||||
|
||||
**Source.** Jobdori dogfood sweep 2026-04-22 08:46 KST — inspected `src/session_store.py` public API, confirmed only `save_session` + `load_session` present, no list/delete/exists surface.
|
||||
200. **Interactive MCP/tool permission prompts are invisible blockers** — dogfooded 2026-04-18 from `clawcode-human`. The session emitted `SessionStart hook (completed)` and `UserPromptSubmit hook (completed)`, then stalled on an interactive MCP permission gate (`Allow the omx_memory MCP server to run tool "project_memory_read"?`). From the outside this looks like a ready-but-quiet lane even though the real state is `blocked waiting for permission`. **Required fix shape:** (a) detect interactive MCP/tool permission prompts as a first-class blocked state instead of generic idle; (b) emit a typed event such as `blocked.mcp_permission` / `blocked.tool_permission` with tool/server name, prompt age, and whether the gate is session-only vs always-allow capable; (c) include this gate in startup/no-evidence evidence bundles and lane status surfaces so clawhip can say "blocked at MCP permission prompt" without pane scraping; (d) add a regression proving a prompt-gated session does not get misclassified as stale/idle/ready. **Why this matters:** prompt acceptance and startup telemetry are still incomplete if an interactive MCP gate can eat the first real action after hooks report success. Source: live dogfood session `clawcode-human` on 2026-04-18.
|
||||
200. **Interactive MCP/tool permission prompts are invisible blockers** — **done (verified 2026-04-27):** worker boot observation now detects interactive tool permission gates such as `Allow the omx_memory MCP server to run tool "project_memory_read"?` before generic readiness/idle handling, records `tool_permission_required` status, emits a structured `ToolPermissionPrompt` payload with server/tool identity, prompt age, allow-scope capability, and prompt preview, marks readiness snapshots as blocked, and carries `tool_permission_prompt_detected` through startup timeout evidence so the classifier returns `tool_permission_required` instead of a vague stale/idle/ready outcome. Regression coverage locks both the structured prompt-gate event metadata and startup-timeout classification paths. **Original filing below.**
|
||||
Original filing (2026-04-18): the session emitted `SessionStart hook (completed)` and `UserPromptSubmit hook (completed)`, then stalled on an interactive MCP permission gate (`Allow the omx_memory MCP server to run tool "project_memory_read"?`). From the outside this looks like a ready-but-quiet lane even though the real state is `blocked waiting for permission`. **Required fix shape:** (a) detect interactive MCP/tool permission prompts as a first-class blocked state instead of generic idle; (b) emit a typed event such as `blocked.mcp_permission` / `blocked.tool_permission` with tool/server name, prompt age, and whether the gate is session-only vs always-allow capable; (c) include this gate in startup/no-evidence evidence bundles and lane status surfaces so clawhip can say "blocked at MCP permission prompt" without pane scraping; (d) add a regression proving a prompt-gated session does not get misclassified as stale/idle/ready. **Why this matters:** prompt acceptance and startup telemetry are still incomplete if an interactive MCP gate can eat the first real action after hooks report success. Source: live dogfood session `clawcode-human` on 2026-04-18.
|
||||
|
||||
201. **`extract --model-payload` is not inspectable enough for deterministic dogfood: forced mode selection missing, and hybrid/no-snippet cases are opaque** — dogfooded 2026-04-19 from `dogfood-1776184671` against three real-repo files. `node dist/cli/index.js extract <file> --model-payload` succeeded and auto-selected `raw`, `raw`, and `hybrid`, but there is currently no CLI surface to force `raw` / `compressed` / `hybrid` for A/B comparison: `--mode raw` and `--mode compressed` both fail immediately with `Error: Unexpected extract argument: --mode`. That turns payload-shaping validation into guesswork because operators cannot ask the extractor to render the same file through each mode and compare the exact output. The opacity is worse in the observed hybrid case: the Formbricks checkbox file produced a hybrid payload with no snippets, leaving no visible explanation for why the extractor chose hybrid, what evidence it kept vs dropped, or whether the result is correct vs a silent fallback. **Required fix shape:** (a) add an explicit debug/inspection flag that forces extraction mode (`--mode raw|compressed|hybrid` or equivalent) without changing default auto-selection; (b) print/report the chosen mode and the decision reason in a machine-readable field when `--model-payload` is used; (c) when hybrid emits zero snippets, surface an explicit reason/count summary instead of making "no snippets" indistinguishable from silent loss; (d) add regression coverage on at least one real-world hybrid fixture so mode choice and snippet accounting stay stable. **Why this matters:** direct claw-code dogfood needs deterministic payload comparison to debug startup/context quality; without forced-mode inspection and snippet accounting, operators can see the outcome but not the extraction decision that produced it. Source: live dogfood session `dogfood-1776184671` on 2026-04-19.
|
||||
|
||||
@@ -6252,4 +6253,13 @@ load_session('nonexistent') # raises FileNotFoundError with no structured error
|
||||
|
||||
246. **Dogfood reminder cron can self-fail by timing out during active cycles, so the nudge loop itself is not trustworthy as an observability surface** — dogfooded 2026-04-21 in `#clawcode-building-in-public` after multiple consecutive alerts: `Cron job "clawcode-dogfood-cycle-reminder" failed: cron: job execution timed out` at 14:14, 14:24, 14:34, 14:44, 15:13, and 15:23 KST while the same dogfood cycle was actively producing reports and fixes. This is not just scheduler noise — it is a clawability gap in the reminder/control loop itself. A downstream claw seeing both repeated dogfood nudges and repeated cron timeouts cannot tell whether the reminder actually delivered, partially delivered, duplicated, or died after side effects. **Required fix shape:** (a) classify reminder execution outcome explicitly (`delivered`, `timed_out_after_send`, `timed_out_before_send`, `suppressed_as_duplicate`, `skipped_due_to_active_cycle`) instead of a single generic timeout; (b) attach the target message/report cycle id and whether a Discord post was already emitted before timeout; (c) add a fast-path/no-op path when the cycle state is unchanged or an active report is already in flight so the reminder job can exit cleanly instead of hanging; (d) add regression coverage proving repeated unchanged-state cycles do not stack timeouts or duplicate nudges. **Why this matters:** if the reminder loop itself is ambiguous, claws waste time responding to scheduler artifacts instead of real product state, and the dogfood surface stops being a reliable source of truth. Source: live clawhip/Jobdori dogfood cycle on 2026-04-21 with repeated timeout alerts in `#clawcode-building-in-public`.
|
||||
|
||||
247. **MCP memory permission prompts can recur after a transport failure, leaving an active worker blocked in a second consent loop instead of a typed degraded state** — dogfooded 2026-04-27 from live session `clawcode-human` while responding to the claw-code dogfood nudge. The session first asked permission for `omx_memory.project_memory_read`; after approval, the call failed with `Transport closed`, then the runtime immediately attempted `omx_memory.notepad_read` and blocked again on a fresh allow prompt. From the outside this looks like an automation-hostile MCP lifecycle gap: the worker is neither cleanly ready nor cleanly failed, and downstream claws must scrape the pane to learn that memory MCP is both consent-gated and transport-degraded. **Required fix shape:** (a) after an MCP transport closes, emit a typed degraded state such as `mcp_transport_closed` with server/tool identity; (b) suppress or batch follow-up permission prompts for the same failed MCP server until transport recovery is proven; (c) expose whether the task can continue without that MCP tool or is blocked on memory; (d) add regression coverage for `permission granted -> transport closed -> follow-up tool attempt` so it becomes one structured blocker instead of repeated interactive consent loops. **Why this matters:** MCP memory should either be available, explicitly degraded, or explicitly blocked; repeated permission prompts after a closed transport make prompt delivery and readiness ambiguous. Source: live `clawcode-human` pane on 2026-04-27 04:3x UTC.
|
||||
247. **MCP memory permission prompts can recur after a transport failure, leaving an active worker blocked in a second consent loop instead of a typed degraded state** — dogfooded 2026-04-27 from live session `clawcode-human` while responding to the claw-code dogfood nudge. The session first asked permission for `omx_memory.project_memory_read`; after approval, the call failed with `Transport closed`, then the runtime immediately attempted `omx_memory.notepad_read` and blocked again on a fresh allow prompt. From the outside this looks like an automation-hostile MCP lifecycle gap: the worker is neither cleanly ready nor cleanly failed, and downstream claws must scrape the pane to learn that memory MCP is both consent-gated and transport-degraded. **Required fix shape:** (a) after an MCP transport closes, emit a typed degraded state such as `mcp_transport_closed` with server/tool identity; (b) suppress or batch follow-up permission prompts for the same failed MCP server until transport recovery is proven; (c) expose whether the task can continue without that MCP tool or is blocked on memory; (d) add regression coverage for `permission granted -> transport closed -> follow-up tool attempt` so it becomes one structured blocker instead of repeated interactive consent loops. **Why this matters:** MCP memory should either be available, explicitly degraded, or explicitly blocked; repeated permission prompts after a closed transport make prompt delivery and readiness ambiguous. Source: live `clawcode-human` pane on 2026-04-27 04:3x UTC. **Fresh-run follow-up 2026-04-29:** owner-requested live session `claw-code-issue-247-human-fresh-run` used the actual `./rust/target/debug/claw` binary; `doctor` and `status` were green, so the remaining Phase-0 fresh-run evidence moved from MCP consent-loop reproduction to the non-interactive prompt silent-hang captured separately as #248.
|
||||
|
||||
248. **Non-interactive prompt mode can exceed caller timeouts with no in-band startup/API phase event or partial status artifact** — dogfooded 2026-04-29 from live tmux session `claw-code-issue-247-human-fresh-run` after the owner explicitly asked gaebal-gajae to make a fresh session and use `claw-code` directly. The actual `./rust/target/debug/claw` binary was launched via `clawhip tmux new` on current main. `claw doctor --output-format json` and `claw status --output-format json` both succeeded and reported auth/config/workspace ok, but minimal non-interactive prompt calls (`timeout 120 ./rust/target/debug/claw --output-format json --dangerously-skip-permissions "echo hello"` and `timeout 120 ./rust/target/debug/claw --output-format json prompt "Reply with just the word hello"`) both timed out from the outer harness after roughly 150s with only `Command exceeded timeout` visible. There was no machine-readable `api_request_started`, `waiting_for_first_token`, provider/model/base-url identity, retry count, or partial status file/event that would let clawhip distinguish slow provider, network stall, auth/OAuth drift, stream parser hang, or prompt-mode bug. **Required fix shape:** (a) emit structured non-interactive lifecycle events for `startup_ok`, `api_request_started`, `first_byte/first_token`, retry/backoff, and terminal `timeout_or_stall` states; (b) include provider/model/base URL source and auth source category without leaking secrets; (c) support a CLI/request timeout flag or env override that returns a typed JSON error before the outer orchestrator kills the process; (d) write/emit a final partial status artifact on timeout so lane monitors do not have to infer state from a dead process. **Why this matters:** non-interactive prompt mode is the automation path; if it can hang past the caller's timeout while doctor/status are green, claws lose the ability to tell whether startup, auth, transport, provider latency, or stream consumption failed. Source: live session `claw-code-issue-247-human-fresh-run` on 2026-04-29.
|
||||
|
||||
249. **`/issue` advertises GitHub issue creation but never reaches a GitHub/OAuth/auth preflight or creation path, and the non-interactive error suggests unusable resume forms** — dogfooded 2026-04-29 on current main `8e22f757` while chasing the remaining Phase-0 GitHub OAuth blocker. The visible help advertises `/issue [context]` as “Draft or create a GitHub issue from the conversation,” but the actual implementation path only renders a local `Issue` report (`format_issue_report`) and does not invoke `gh`, GitHub API, OAuth, token discovery, browser auth, or even a dry-run/auth-preflight surface. Direct non-interactive use (`./rust/target/debug/claw '/issue dogfood test'`) returns `slash command /issue dogfood test is interactive-only` and suggests `claw --resume SESSION.jsonl /issue ...` / `claw --resume latest /issue ...` “when the command is marked [resume]”, while `/help` does not mark `/issue` as resume-safe and resume dispatch rejects interactive-only commands. That leaves operators with a GitHub-labeled command whose real behavior is neither issue creation nor a clear GitHub OAuth blocker. **Required fix shape:** (a) split the contract explicitly: either rename/copy to “draft issue text” or implement a real `create` path with GitHub auth preflight; (b) surface a machine-readable GitHub auth state (`gh_cli_authenticated`, `github_token_present`, `oauth_required`, `creation_unavailable`) before any issue-create attempt; (c) make the direct-mode error avoid suggesting resume forms for commands not marked resume-safe; (d) add regression coverage proving `/issue` help, direct-mode rejection, resume support flags, and creation/draft behavior agree. **Why this matters:** Phase-0 GitHub OAuth verification cannot complete if the only GitHub issue surface stops at local prose while still advertising creation. Claws need to know whether they are missing GitHub auth, using a draft-only helper, or hitting an unimplemented creation path. Source: gaebal-gajae dogfood cycle in `#clawcode-building-in-public` on 2026-04-29.
|
||||
322. **Config deprecation warnings are emitted to stderr even under `--output-format json`, making JSON output unparseable from combined stdout+stderr capture** — dogfooded 2026-04-29 by Jobdori on current main (`8e22f75`). Running `cargo run --bin claw -- doctor --output-format json 2>&1 | python3 -c "import sys,json; json.loads(sys.stdin.read())"` fails with `Expecting value: line 1 column 1 (char 0)` because a `warning: /path/settings.json: field "enabledPlugins" is deprecated. Use "plugins.enabled" instead` line is emitted to stderr before the JSON body begins. When a caller captures combined output (the common automation pattern: `2>&1`, subprocess `STDOUT | STDERR`, PTY capture, or tmux pane scrape) the warning prefix breaks JSON parse for every downstream consumer. Root cause: `rust/crates/runtime/src/config.rs` line ~300 calls `eprintln!("warning: {warning}")` unconditionally during `ClawSettings::load_merged()` regardless of active output format. **Required fix shape:** (a) thread the active `CliOutputFormat` through the config loading path and suppress or defer human-readable warning strings when `json` mode is active; (b) instead, collect deprecation diagnostics and inject them into the JSON output as a top-level `"warnings": [...]` array (same field already used by `doctor`); (c) ensure the JSON body is always the first bytes on stdout and all prose warnings stay on stderr or are suppressed in json mode; (d) add regression coverage proving `claw <any-cmd> --output-format json` stdout is valid JSON regardless of config deprecation state. **Why this matters:** `--output-format json` is the automation/claw contract; if config warnings can silently corrupt the JSON stream, every orchestration layer that captures combined output gets broken parse-on-warning with no stable fallback. Source: Jobdori live dogfood on mengmotaHost, claw-code main `8e22f75`, 2026-04-29.
|
||||
|
||||
323. **`status --output-format json` reports `session.session = "live-repl"` while simultaneously reporting `session_lifecycle.kind = "saved_only"` — contradictory session identity in a single status snapshot** — dogfooded 2026-04-29 by Jobdori on current main (`804d96b`). Running `claw status --output-format json` from an active REPL-style invocation produced `"session": "live-repl"` in the `workspace` block and `"session_lifecycle": {"kind": "saved_only", "pane_id": null, ...}` in the same object. Those two fields carry contradictory claims: `"live-repl"` asserts there is an active interactive session, while `"saved_only"` asserts there is no live tmux pane hosting the session — the session exists only as a saved artifact. A downstream claw reading this snapshot cannot tell which claim to trust: is this a running session whose pane is undetectable, or a saved-only session that the `session` field is misclassifying? Root cause: `"live-repl"` is a fallback sentinel emitted by `main.rs:6070` when `context.session_path` is `None`, while `session_lifecycle` is computed independently by `classify_session_lifecycle_for()` from tmux pane discovery; the two fields share no common source and can diverge. **Required fix shape:** (a) derive both `session.session` and `session_lifecycle.kind` from the same lifecycle classification result so they cannot diverge; (b) replace the `"live-repl"` free-form sentinel with a structured `session_kind` field (`live_repl`, `saved`, `resume`, etc.) that carries the same type vocabulary as `session_lifecycle.kind`; (c) when `session_lifecycle.kind = "saved_only"`, never emit `"session": "live-repl"` (or vice versa); (d) add a regression test proving `status --output-format json` never emits `session.kind = "live_repl"` and `session_lifecycle.kind = "saved_only"` simultaneously. **Why this matters:** `status --output-format json` is the machine-readable truth surface for session state; if two fields in the same snapshot contradict each other, every lane, monitor, and orchestrator has to pick a winner instead of reading a coherent state. Source: Jobdori live dogfood on mengmotaHost, claw-code `804d96b`, 2026-04-29.
|
||||
|
||||
332. **`doctor --output-format json` has no top-level `status` field, breaking the `result["status"] == "ok"` check pattern that works for all other JSON commands** — dogfooded 2026-04-29 by Jobdori on current main (`e7074f4`). Running `claw doctor --output-format json | python3 -c "import sys,json; print(json.load(sys.stdin).get('status'))"` prints `None`. The doctor JSON uses `has_failures` (bool) + `summary: {failures, ok, total, warnings}` to express aggregate health, while every other JSON command (`status`, `sandbox`, `stats`, `cost`, etc.) uses a top-level `"status": "ok"|"warn"|"error"` field. A downstream lane checking `result.get("status") == "ok"` will silently misread a doctor output with 2 warnings as if no status were present, instead of getting `"warn"`. **Required fix shape:** (a) add a top-level `"status": "ok" | "warn" | "error"` field to doctor JSON output derived from the same `has_failures`/warning-count logic already present; (b) ensure the field is always present and uses the same vocabulary as other JSON commands; (c) keep `has_failures` and `summary` for backward compat but document `status` as the canonical machine-readable aggregate verdict; (d) add regression coverage proving `claw doctor --output-format json` always includes a non-null `"status"` field. **Why this matters:** `status` is the idiomatic machine-readable health verdict in claw's JSON surface; when `doctor` skips it, automation layers that unify health checks across commands (`doctor` + `status` + `sandbox`) must special-case doctor or silently miss warnings. Source: Jobdori live dogfood on mengmotaHost, claw-code `e7074f4`, 2026-04-29.
|
||||
|
||||
23
progress.txt
23
progress.txt
@@ -74,6 +74,18 @@ US-007 COMPLETE (Phase 5 - Plugin/MCP lifecycle maturity)
|
||||
- DegradedMode behavior
|
||||
- Tests: 11 unit tests passing
|
||||
|
||||
|
||||
Iteration 2026-04-27 - ROADMAP #200 COMPLETED
|
||||
------------------------------------------------
|
||||
- Selected next actionable backlog item because no active task was in progress.
|
||||
- ROADMAP #200: Interactive MCP/tool permission prompts are invisible blockers.
|
||||
- Files: rust/crates/runtime/src/worker_boot.rs, rust/crates/runtime/src/recovery_recipes.rs, ROADMAP.md, progress.txt.
|
||||
- Added tool_permission_required worker status and event classification for interactive MCP/tool permission gates.
|
||||
- Added structured ToolPermissionPrompt payload with server/tool identity and prompt preview.
|
||||
- Startup evidence now records tool_permission_prompt_detected and classifies timeout evidence as tool_permission_required.
|
||||
- Readiness snapshots now mark tool-permission-gated workers as blocked, not ready/idle.
|
||||
- Tests: targeted tool_permission regressions, full runtime test/clippy/fmt pending in Ralph verification loop.
|
||||
|
||||
VERIFICATION STATUS:
|
||||
------------------
|
||||
- cargo build --workspace: PASSED
|
||||
@@ -353,3 +365,14 @@ US-021 COMPLETED (Request body size pre-flight check - from dogfood findings)
|
||||
- Tests: 5 new tests for size estimation and limit checking
|
||||
|
||||
PROJECT STATUS: COMPLETE (21/21 stories)
|
||||
|
||||
Iteration 2026-04-29 - ROADMAP #96 COMPLETED
|
||||
------------------------------------------------
|
||||
- Pulled origin/main: already up to date.
|
||||
- Selected ROADMAP #96 as a small repo-local Immediate Backlog item: the `claw --help` Resume-safe command summary leaked slash-command stubs despite the main Interactive command listing filtering them.
|
||||
- Files: rust/crates/rusty-claude-cli/src/main.rs, ROADMAP.md, progress.txt.
|
||||
- Changed help rendering to filter `resume_supported_slash_commands()` through `STUB_COMMANDS` before building the Resume-safe one-liner.
|
||||
- Added `stub_commands_absent_from_resume_safe_help` regression coverage so future stub additions cannot leak into the Resume-safe summary.
|
||||
- Targeted verification: `cargo test -p rusty-claude-cli stub_commands_absent_from_resume_safe_help -- --nocapture` passed; `cargo test -p rusty-claude-cli parses_direct_cli_actions -- --nocapture` passed.
|
||||
- Format/check verification: `cargo fmt --all --check`, `git diff --check`, and `cargo check -p rusty-claude-cli` passed.
|
||||
- Broader clippy note: `cargo clippy -p rusty-claude-cli --all-targets -- -D warnings` is blocked by pre-existing `clippy::unnecessary_wraps` failures in `rust/crates/commands/src/lib.rs` (`render_mcp_report_for`, `render_mcp_report_json_for`), outside this diff.
|
||||
|
||||
@@ -7,7 +7,8 @@ This file provides guidance to Claw Code (clawcode.dev) when working with code i
|
||||
- Frameworks: none detected from the supported starter markers.
|
||||
|
||||
## Verification
|
||||
- Run Rust verification from the repo root: `cargo fmt`, `cargo clippy --workspace --all-targets -- -D warnings`, `cargo test --workspace`
|
||||
- From the repository root, run Rust formatting with `scripts/fmt.sh` (or `scripts/fmt.sh --check` for CI-style checks). From this `rust/` directory, the equivalent command is `../scripts/fmt.sh`. Root-level `cargo fmt --manifest-path rust/Cargo.toml` is not the supported formatting command.
|
||||
- From this `rust/` directory, run Rust verification with `cargo clippy --workspace --all-targets -- -D warnings` and `cargo test --workspace`.
|
||||
|
||||
## Working agreement
|
||||
- Prefer small, reviewable changes and keep generated bootstrap files aligned with actual repo workflows.
|
||||
|
||||
@@ -753,14 +753,14 @@ mod tests {
|
||||
#[test]
|
||||
fn returns_context_window_metadata_for_kimi_models() {
|
||||
// kimi-k2.5
|
||||
let k25_limit = model_token_limit("kimi-k2.5")
|
||||
.expect("kimi-k2.5 should have token limit metadata");
|
||||
let k25_limit =
|
||||
model_token_limit("kimi-k2.5").expect("kimi-k2.5 should have token limit metadata");
|
||||
assert_eq!(k25_limit.max_output_tokens, 16_384);
|
||||
assert_eq!(k25_limit.context_window_tokens, 256_000);
|
||||
|
||||
// kimi-k1.5
|
||||
let k15_limit = model_token_limit("kimi-k1.5")
|
||||
.expect("kimi-k1.5 should have token limit metadata");
|
||||
let k15_limit =
|
||||
model_token_limit("kimi-k1.5").expect("kimi-k1.5 should have token limit metadata");
|
||||
assert_eq!(k15_limit.max_output_tokens, 16_384);
|
||||
assert_eq!(k15_limit.context_window_tokens, 256_000);
|
||||
}
|
||||
@@ -768,11 +768,13 @@ mod tests {
|
||||
#[test]
|
||||
fn kimi_alias_resolves_to_kimi_k25_token_limits() {
|
||||
// The "kimi" alias resolves to "kimi-k2.5" via resolve_model_alias()
|
||||
let alias_limit = model_token_limit("kimi")
|
||||
.expect("kimi alias should resolve to kimi-k2.5 limits");
|
||||
let direct_limit = model_token_limit("kimi-k2.5")
|
||||
.expect("kimi-k2.5 should have limits");
|
||||
assert_eq!(alias_limit.max_output_tokens, direct_limit.max_output_tokens);
|
||||
let alias_limit =
|
||||
model_token_limit("kimi").expect("kimi alias should resolve to kimi-k2.5 limits");
|
||||
let direct_limit = model_token_limit("kimi-k2.5").expect("kimi-k2.5 should have limits");
|
||||
assert_eq!(
|
||||
alias_limit.max_output_tokens,
|
||||
direct_limit.max_output_tokens
|
||||
);
|
||||
assert_eq!(
|
||||
alias_limit.context_window_tokens,
|
||||
direct_limit.context_window_tokens
|
||||
|
||||
@@ -2195,9 +2195,16 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn provider_specific_size_limits_are_correct() {
|
||||
assert_eq!(OpenAiCompatConfig::dashscope().max_request_body_bytes, 6_291_456); // 6MB
|
||||
assert_eq!(OpenAiCompatConfig::openai().max_request_body_bytes, 104_857_600); // 100MB
|
||||
assert_eq!(OpenAiCompatConfig::xai().max_request_body_bytes, 52_428_800); // 50MB
|
||||
assert_eq!(
|
||||
OpenAiCompatConfig::dashscope().max_request_body_bytes,
|
||||
6_291_456
|
||||
); // 6MB
|
||||
assert_eq!(
|
||||
OpenAiCompatConfig::openai().max_request_body_bytes,
|
||||
104_857_600
|
||||
); // 100MB
|
||||
assert_eq!(OpenAiCompatConfig::xai().max_request_body_bytes, 52_428_800);
|
||||
// 50MB
|
||||
}
|
||||
|
||||
#[test]
|
||||
|
||||
@@ -2623,10 +2623,8 @@ fn render_mcp_report_json_for(
|
||||
// runs, the existing serializer adds `status: "ok"` below.
|
||||
match loader.load() {
|
||||
Ok(runtime_config) => {
|
||||
let mut value = render_mcp_summary_report_json(
|
||||
cwd,
|
||||
runtime_config.mcp().servers(),
|
||||
);
|
||||
let mut value =
|
||||
render_mcp_summary_report_json(cwd, runtime_config.mcp().servers());
|
||||
if let Some(map) = value.as_object_mut() {
|
||||
map.insert("status".to_string(), Value::String("ok".to_string()));
|
||||
map.insert("config_load_error".to_string(), Value::Null);
|
||||
|
||||
@@ -45,7 +45,9 @@ impl FailureScenario {
|
||||
#[must_use]
|
||||
pub fn from_worker_failure_kind(kind: WorkerFailureKind) -> Self {
|
||||
match kind {
|
||||
WorkerFailureKind::TrustGate => Self::TrustPromptUnresolved,
|
||||
WorkerFailureKind::TrustGate | WorkerFailureKind::ToolPermissionGate => {
|
||||
Self::TrustPromptUnresolved
|
||||
}
|
||||
WorkerFailureKind::PromptDelivery => Self::PromptMisdelivery,
|
||||
WorkerFailureKind::Protocol => Self::McpHandshakeFailure,
|
||||
WorkerFailureKind::Provider | WorkerFailureKind::StartupNoEvidence => {
|
||||
|
||||
@@ -30,6 +30,7 @@ fn now_secs() -> u64 {
|
||||
pub enum WorkerStatus {
|
||||
Spawning,
|
||||
TrustRequired,
|
||||
ToolPermissionRequired,
|
||||
ReadyForPrompt,
|
||||
Running,
|
||||
Finished,
|
||||
@@ -41,6 +42,7 @@ impl std::fmt::Display for WorkerStatus {
|
||||
match self {
|
||||
Self::Spawning => write!(f, "spawning"),
|
||||
Self::TrustRequired => write!(f, "trust_required"),
|
||||
Self::ToolPermissionRequired => write!(f, "tool_permission_required"),
|
||||
Self::ReadyForPrompt => write!(f, "ready_for_prompt"),
|
||||
Self::Running => write!(f, "running"),
|
||||
Self::Finished => write!(f, "finished"),
|
||||
@@ -53,6 +55,7 @@ impl std::fmt::Display for WorkerStatus {
|
||||
#[serde(rename_all = "snake_case")]
|
||||
pub enum WorkerFailureKind {
|
||||
TrustGate,
|
||||
ToolPermissionGate,
|
||||
PromptDelivery,
|
||||
Protocol,
|
||||
Provider,
|
||||
@@ -71,6 +74,7 @@ pub struct WorkerFailure {
|
||||
pub enum WorkerEventKind {
|
||||
Spawning,
|
||||
TrustRequired,
|
||||
ToolPermissionRequired,
|
||||
TrustResolved,
|
||||
ReadyForPrompt,
|
||||
PromptMisdelivery,
|
||||
@@ -104,6 +108,8 @@ pub enum WorkerPromptTarget {
|
||||
pub enum StartupFailureClassification {
|
||||
/// Trust prompt is required but not detected/resolved
|
||||
TrustRequired,
|
||||
/// Tool permission prompt is required before startup can continue
|
||||
ToolPermissionRequired,
|
||||
/// Prompt was delivered to wrong target (shell misdelivery)
|
||||
PromptMisdelivery,
|
||||
/// Prompt was sent but acceptance timed out
|
||||
@@ -130,6 +136,14 @@ pub struct StartupEvidenceBundle {
|
||||
pub prompt_acceptance_state: bool,
|
||||
/// Result of trust prompt detection at timeout
|
||||
pub trust_prompt_detected: bool,
|
||||
/// Result of tool permission prompt detection at timeout
|
||||
pub tool_permission_prompt_detected: bool,
|
||||
/// Age in seconds of the latest tool permission prompt, when observed
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
pub tool_permission_prompt_age_seconds: Option<u64>,
|
||||
/// Whether the prompt surface exposed only a session allow path or also an always-allow path
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
pub tool_permission_allow_scope: Option<ToolPermissionAllowScope>,
|
||||
/// Transport health summary (true = healthy/responsive)
|
||||
pub transport_healthy: bool,
|
||||
/// MCP health summary (true = all servers healthy)
|
||||
@@ -146,6 +160,15 @@ pub enum WorkerEventPayload {
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
resolution: Option<WorkerTrustResolution>,
|
||||
},
|
||||
ToolPermissionPrompt {
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
server_name: Option<String>,
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
tool_name: Option<String>,
|
||||
prompt_age_seconds: u64,
|
||||
allow_scope: ToolPermissionAllowScope,
|
||||
prompt_preview: String,
|
||||
},
|
||||
PromptDelivery {
|
||||
prompt_preview: String,
|
||||
observed_target: WorkerPromptTarget,
|
||||
@@ -163,6 +186,14 @@ pub enum WorkerEventPayload {
|
||||
},
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq)]
|
||||
#[serde(rename_all = "snake_case")]
|
||||
pub enum ToolPermissionAllowScope {
|
||||
SessionOnly,
|
||||
SessionOrAlways,
|
||||
Unknown,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
|
||||
pub struct WorkerTaskReceipt {
|
||||
pub repo: String,
|
||||
@@ -276,6 +307,29 @@ impl WorkerRegistry {
|
||||
.ok_or_else(|| format!("worker not found: {worker_id}"))?;
|
||||
let lowered = screen_text.to_ascii_lowercase();
|
||||
|
||||
if let Some(tool_prompt) = detect_tool_permission_prompt(screen_text, &lowered) {
|
||||
worker.status = WorkerStatus::ToolPermissionRequired;
|
||||
worker.last_error = Some(WorkerFailure {
|
||||
kind: WorkerFailureKind::ToolPermissionGate,
|
||||
message: tool_prompt.message(),
|
||||
created_at: now_secs(),
|
||||
});
|
||||
push_event(
|
||||
worker,
|
||||
WorkerEventKind::ToolPermissionRequired,
|
||||
WorkerStatus::ToolPermissionRequired,
|
||||
Some("tool permission prompt detected".to_string()),
|
||||
Some(WorkerEventPayload::ToolPermissionPrompt {
|
||||
server_name: tool_prompt.server_name,
|
||||
tool_name: tool_prompt.tool_name,
|
||||
prompt_age_seconds: 0,
|
||||
allow_scope: tool_prompt.allow_scope,
|
||||
prompt_preview: tool_prompt.prompt_preview,
|
||||
}),
|
||||
);
|
||||
return Ok(worker.clone());
|
||||
}
|
||||
|
||||
if !worker.trust_gate_cleared && detect_trust_prompt(&lowered) {
|
||||
worker.status = WorkerStatus::TrustRequired;
|
||||
worker.last_error = Some(WorkerFailure {
|
||||
@@ -503,7 +557,9 @@ impl WorkerRegistry {
|
||||
ready: worker.status == WorkerStatus::ReadyForPrompt,
|
||||
blocked: matches!(
|
||||
worker.status,
|
||||
WorkerStatus::TrustRequired | WorkerStatus::Failed
|
||||
WorkerStatus::TrustRequired
|
||||
| WorkerStatus::ToolPermissionRequired
|
||||
| WorkerStatus::Failed
|
||||
),
|
||||
replay_prompt_ready: worker.replay_prompt.is_some(),
|
||||
last_error: worker.last_error.clone(),
|
||||
@@ -624,6 +680,18 @@ impl WorkerRegistry {
|
||||
|
||||
let now = now_secs();
|
||||
let elapsed = now.saturating_sub(worker.created_at);
|
||||
let latest_tool_permission_event = worker
|
||||
.events
|
||||
.iter()
|
||||
.rev()
|
||||
.find(|event| event.kind == WorkerEventKind::ToolPermissionRequired);
|
||||
let tool_permission_allow_scope =
|
||||
latest_tool_permission_event.and_then(|event| match &event.payload {
|
||||
Some(WorkerEventPayload::ToolPermissionPrompt { allow_scope, .. }) => {
|
||||
Some(*allow_scope)
|
||||
}
|
||||
_ => None,
|
||||
});
|
||||
|
||||
// Build evidence bundle
|
||||
let evidence = StartupEvidenceBundle {
|
||||
@@ -640,6 +708,13 @@ impl WorkerRegistry {
|
||||
.events
|
||||
.iter()
|
||||
.any(|e| e.kind == WorkerEventKind::TrustRequired),
|
||||
tool_permission_prompt_detected: worker
|
||||
.events
|
||||
.iter()
|
||||
.any(|e| e.kind == WorkerEventKind::ToolPermissionRequired),
|
||||
tool_permission_prompt_age_seconds: latest_tool_permission_event
|
||||
.map(|event| now.saturating_sub(event.timestamp)),
|
||||
tool_permission_allow_scope,
|
||||
transport_healthy,
|
||||
mcp_healthy,
|
||||
elapsed_seconds: elapsed,
|
||||
@@ -694,6 +769,13 @@ fn classify_startup_failure(evidence: &StartupEvidenceBundle) -> StartupFailureC
|
||||
return StartupFailureClassification::TrustRequired;
|
||||
}
|
||||
|
||||
// Check for tool permission prompts that were not resolved
|
||||
if evidence.tool_permission_prompt_detected
|
||||
&& evidence.last_lifecycle_state == WorkerStatus::ToolPermissionRequired
|
||||
{
|
||||
return StartupFailureClassification::ToolPermissionRequired;
|
||||
}
|
||||
|
||||
// Check for prompt acceptance timeout
|
||||
if evidence.prompt_sent_at.is_some()
|
||||
&& !evidence.prompt_acceptance_state
|
||||
@@ -815,6 +897,140 @@ fn normalize_path(path: &str) -> PathBuf {
|
||||
std::fs::canonicalize(path).unwrap_or_else(|_| Path::new(path).to_path_buf())
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, PartialEq, Eq)]
|
||||
struct ToolPermissionPromptObservation {
|
||||
server_name: Option<String>,
|
||||
tool_name: Option<String>,
|
||||
allow_scope: ToolPermissionAllowScope,
|
||||
prompt_preview: String,
|
||||
}
|
||||
|
||||
impl ToolPermissionPromptObservation {
|
||||
fn message(&self) -> String {
|
||||
match (&self.server_name, &self.tool_name) {
|
||||
(Some(server), Some(tool)) => {
|
||||
format!("worker boot blocked on tool permission prompt for {server}.{tool}")
|
||||
}
|
||||
(Some(server), None) => {
|
||||
format!("worker boot blocked on tool permission prompt for {server}")
|
||||
}
|
||||
(None, Some(tool)) => {
|
||||
format!("worker boot blocked on tool permission prompt for {tool}")
|
||||
}
|
||||
(None, None) => "worker boot blocked on tool permission prompt".to_string(),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn detect_tool_permission_prompt(
|
||||
screen_text: &str,
|
||||
lowered: &str,
|
||||
) -> Option<ToolPermissionPromptObservation> {
|
||||
let looks_like_prompt = lowered.contains("allow the")
|
||||
&& lowered.contains("server")
|
||||
&& lowered.contains("tool")
|
||||
&& lowered.contains("run");
|
||||
let looks_like_tool_gate = lowered.contains("allow tool") && lowered.contains("run");
|
||||
if !looks_like_prompt && !looks_like_tool_gate {
|
||||
return None;
|
||||
}
|
||||
|
||||
let prompt_line = screen_text
|
||||
.lines()
|
||||
.rev()
|
||||
.find(|line| {
|
||||
let lowered_line = line.to_ascii_lowercase();
|
||||
lowered_line.contains("allow")
|
||||
&& lowered_line.contains("tool")
|
||||
&& (lowered_line.contains("run") || lowered_line.contains("server"))
|
||||
})
|
||||
.unwrap_or(screen_text)
|
||||
.trim();
|
||||
|
||||
let tool_name = extract_quoted_value(prompt_line)
|
||||
.or_else(|| extract_after(prompt_line, "tool ").map(|token| normalize_tool_token(&token)));
|
||||
let server_name = extract_between(prompt_line, "the ", " server")
|
||||
.map(|server| server.trim_end_matches(" MCP").to_string())
|
||||
.or_else(|| {
|
||||
tool_name
|
||||
.as_deref()
|
||||
.and_then(extract_server_from_qualified_tool)
|
||||
});
|
||||
|
||||
Some(ToolPermissionPromptObservation {
|
||||
server_name,
|
||||
tool_name,
|
||||
allow_scope: detect_tool_permission_allow_scope(lowered),
|
||||
prompt_preview: prompt_preview(prompt_line),
|
||||
})
|
||||
}
|
||||
|
||||
fn detect_tool_permission_allow_scope(lowered: &str) -> ToolPermissionAllowScope {
|
||||
let always_allow_capable = [
|
||||
"always allow",
|
||||
"allow always",
|
||||
"allow this tool always",
|
||||
"allow for all sessions",
|
||||
]
|
||||
.iter()
|
||||
.any(|needle| lowered.contains(needle));
|
||||
|
||||
if always_allow_capable {
|
||||
return ToolPermissionAllowScope::SessionOrAlways;
|
||||
}
|
||||
|
||||
let session_allow_capable = [
|
||||
"allow once",
|
||||
"allow for this session",
|
||||
"allow this session",
|
||||
"yes, allow",
|
||||
]
|
||||
.iter()
|
||||
.any(|needle| lowered.contains(needle));
|
||||
|
||||
if session_allow_capable {
|
||||
ToolPermissionAllowScope::SessionOnly
|
||||
} else {
|
||||
ToolPermissionAllowScope::Unknown
|
||||
}
|
||||
}
|
||||
|
||||
fn extract_quoted_value(text: &str) -> Option<String> {
|
||||
let start = text.find('"')? + 1;
|
||||
let rest = &text[start..];
|
||||
let end = rest.find('"')?;
|
||||
Some(rest[..end].to_string())
|
||||
}
|
||||
|
||||
fn extract_between(text: &str, prefix: &str, suffix: &str) -> Option<String> {
|
||||
let start = text.find(prefix)? + prefix.len();
|
||||
let rest = &text[start..];
|
||||
let end = rest.find(suffix)?;
|
||||
let value = rest[..end].trim();
|
||||
(!value.is_empty()).then(|| value.to_string())
|
||||
}
|
||||
|
||||
fn extract_after(text: &str, prefix: &str) -> Option<String> {
|
||||
let start = text.to_ascii_lowercase().find(prefix)? + prefix.len();
|
||||
let value = text[start..]
|
||||
.split_whitespace()
|
||||
.next()?
|
||||
.trim_matches(|ch: char| ch == '?' || ch == ':' || ch == '"' || ch == '\'');
|
||||
(!value.is_empty()).then(|| value.to_string())
|
||||
}
|
||||
|
||||
fn normalize_tool_token(token: &str) -> String {
|
||||
token
|
||||
.trim_matches(|ch: char| ch == '?' || ch == ':' || ch == '"' || ch == '\'')
|
||||
.to_string()
|
||||
}
|
||||
|
||||
fn extract_server_from_qualified_tool(tool: &str) -> Option<String> {
|
||||
let rest = tool.strip_prefix("mcp__")?;
|
||||
let (server, _) = rest.split_once("__")?;
|
||||
(!server.is_empty()).then(|| server.to_string())
|
||||
}
|
||||
|
||||
fn detect_trust_prompt(lowered: &str) -> bool {
|
||||
[
|
||||
"do you trust the files in this folder",
|
||||
@@ -1134,6 +1350,96 @@ mod tests {
|
||||
assert!(detect_ready_for_prompt("│ >", "│ >"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn tool_permission_prompt_blocks_worker_with_structured_event() {
|
||||
let registry = WorkerRegistry::new();
|
||||
let worker = registry.create("/tmp/repo-mcp", &[], true);
|
||||
|
||||
let blocked = registry
|
||||
.observe(
|
||||
&worker.worker_id,
|
||||
"Allow the omx_memory MCP server to run tool \"project_memory_read\"?\n\
|
||||
1. Yes, allow once\n\
|
||||
2. Always allow this tool",
|
||||
)
|
||||
.expect("tool permission observe should succeed");
|
||||
|
||||
assert_eq!(blocked.status, WorkerStatus::ToolPermissionRequired);
|
||||
assert_eq!(
|
||||
blocked
|
||||
.last_error
|
||||
.as_ref()
|
||||
.expect("tool permission error should exist")
|
||||
.kind,
|
||||
WorkerFailureKind::ToolPermissionGate
|
||||
);
|
||||
let event = blocked
|
||||
.events
|
||||
.iter()
|
||||
.find(|event| event.kind == WorkerEventKind::ToolPermissionRequired)
|
||||
.expect("tool permission event should exist");
|
||||
assert_eq!(
|
||||
event.payload,
|
||||
Some(WorkerEventPayload::ToolPermissionPrompt {
|
||||
server_name: Some("omx_memory".to_string()),
|
||||
tool_name: Some("project_memory_read".to_string()),
|
||||
prompt_age_seconds: 0,
|
||||
allow_scope: ToolPermissionAllowScope::SessionOrAlways,
|
||||
prompt_preview: prompt_preview(
|
||||
"Allow the omx_memory MCP server to run tool \"project_memory_read\"?",
|
||||
),
|
||||
})
|
||||
);
|
||||
|
||||
let readiness = registry
|
||||
.await_ready(&worker.worker_id)
|
||||
.expect("ready snapshot should load");
|
||||
assert!(readiness.blocked);
|
||||
assert!(!readiness.ready);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn startup_timeout_classifies_tool_permission_prompt() {
|
||||
let registry = WorkerRegistry::new();
|
||||
let worker = registry.create("/tmp/repo-mcp-timeout", &[], true);
|
||||
|
||||
registry
|
||||
.observe(
|
||||
&worker.worker_id,
|
||||
"Allow the omx_memory MCP server to run tool \"notepad_read\"?\n\
|
||||
1. Yes, allow once",
|
||||
)
|
||||
.expect("tool permission observe should succeed");
|
||||
|
||||
let timed_out = registry
|
||||
.observe_startup_timeout(&worker.worker_id, "claw prompt", true, true)
|
||||
.expect("startup timeout observe should succeed");
|
||||
let event = timed_out
|
||||
.events
|
||||
.iter()
|
||||
.find(|event| event.kind == WorkerEventKind::StartupNoEvidence)
|
||||
.expect("startup no evidence event should exist");
|
||||
|
||||
match event.payload.as_ref() {
|
||||
Some(WorkerEventPayload::StartupNoEvidence {
|
||||
classification,
|
||||
evidence,
|
||||
}) => {
|
||||
assert_eq!(
|
||||
*classification,
|
||||
StartupFailureClassification::ToolPermissionRequired
|
||||
);
|
||||
assert!(evidence.tool_permission_prompt_detected);
|
||||
assert_eq!(
|
||||
evidence.tool_permission_allow_scope,
|
||||
Some(ToolPermissionAllowScope::SessionOnly)
|
||||
);
|
||||
assert!(evidence.tool_permission_prompt_age_seconds.is_some());
|
||||
}
|
||||
_ => panic!("expected StartupNoEvidence payload"),
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn prompt_misdelivery_is_detected_and_replay_can_be_rearmed() {
|
||||
let registry = WorkerRegistry::new();
|
||||
@@ -1634,6 +1940,9 @@ mod tests {
|
||||
prompt_sent_at: Some(1_234_567_890),
|
||||
prompt_acceptance_state: false,
|
||||
trust_prompt_detected: true,
|
||||
tool_permission_prompt_detected: false,
|
||||
tool_permission_prompt_age_seconds: None,
|
||||
tool_permission_allow_scope: None,
|
||||
transport_healthy: true,
|
||||
mcp_healthy: false,
|
||||
elapsed_seconds: 60,
|
||||
@@ -1661,6 +1970,9 @@ mod tests {
|
||||
prompt_sent_at: None,
|
||||
prompt_acceptance_state: false,
|
||||
trust_prompt_detected: false,
|
||||
tool_permission_prompt_detected: false,
|
||||
tool_permission_prompt_age_seconds: None,
|
||||
tool_permission_allow_scope: None,
|
||||
transport_healthy: false,
|
||||
mcp_healthy: true,
|
||||
elapsed_seconds: 30,
|
||||
@@ -1678,6 +1990,9 @@ mod tests {
|
||||
prompt_sent_at: None,
|
||||
prompt_acceptance_state: false,
|
||||
trust_prompt_detected: false,
|
||||
tool_permission_prompt_detected: false,
|
||||
tool_permission_prompt_age_seconds: None,
|
||||
tool_permission_allow_scope: None,
|
||||
transport_healthy: true,
|
||||
mcp_healthy: true,
|
||||
elapsed_seconds: 10,
|
||||
@@ -1697,6 +2012,9 @@ mod tests {
|
||||
prompt_sent_at: None, // No prompt sent yet
|
||||
prompt_acceptance_state: false,
|
||||
trust_prompt_detected: false,
|
||||
tool_permission_prompt_detected: false,
|
||||
tool_permission_prompt_age_seconds: None,
|
||||
tool_permission_allow_scope: None,
|
||||
transport_healthy: true,
|
||||
mcp_healthy: false, // MCP unhealthy but transport healthy suggests crash
|
||||
elapsed_seconds: 45,
|
||||
|
||||
File diff suppressed because it is too large
Load Diff
@@ -172,7 +172,10 @@ stderr:
|
||||
);
|
||||
let stdout = String::from_utf8(output.stdout).expect("stdout should be utf8");
|
||||
let parsed: Value = serde_json::from_str(&stdout).expect("compact json stdout should parse");
|
||||
assert_eq!(parsed["message"], "Mock streaming says hello from the parity harness.");
|
||||
assert_eq!(
|
||||
parsed["message"],
|
||||
"Mock streaming says hello from the parity harness."
|
||||
);
|
||||
assert_eq!(parsed["compact"], true);
|
||||
assert_eq!(parsed["model"], "claude-sonnet-4-6");
|
||||
assert!(parsed["usage"].is_object());
|
||||
|
||||
@@ -240,6 +240,13 @@ impl GlobalToolRegistry {
|
||||
}
|
||||
}
|
||||
|
||||
if allowed.is_empty() {
|
||||
return Err(format!(
|
||||
"--allowedTools was provided with no usable tool names (got `{}`). Omit the flag to allow all tools.",
|
||||
values.join(" ")
|
||||
));
|
||||
}
|
||||
|
||||
Ok(Some(allowed))
|
||||
}
|
||||
|
||||
@@ -6883,6 +6890,21 @@ mod tests {
|
||||
assert!(empty_permission.contains("unsupported plugin permission: "));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn allowed_tools_rejects_empty_token_lists() {
|
||||
let registry = GlobalToolRegistry::builtin();
|
||||
|
||||
for raw in ["", ",,", " "] {
|
||||
let err = registry
|
||||
.normalize_allowed_tools(&[raw.to_string()])
|
||||
.expect_err("empty allow-list input should be rejected");
|
||||
assert!(
|
||||
err.contains("--allowedTools was provided with no usable tool names"),
|
||||
"unexpected error for {raw:?}: {err}"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn runtime_tools_extend_registry_definitions_permissions_and_search() {
|
||||
let registry = GlobalToolRegistry::builtin()
|
||||
|
||||
7
scripts/fmt.sh
Executable file
7
scripts/fmt.sh
Executable file
@@ -0,0 +1,7 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
|
||||
cd "$REPO_ROOT/rust"
|
||||
exec cargo fmt "$@"
|
||||
Reference in New Issue
Block a user