From f0e8896d2e244346136289c251cd66cd45a63e71 Mon Sep 17 00:00:00 2001 From: bellman Date: Fri, 15 May 2026 09:55:43 +0900 Subject: [PATCH] omx(team): auto-checkpoint worker-2 [2] --- PARITY.md | 8 ++-- docs/g007-plugin-mcp-verification-map.md | 54 ++++++++++++++++++++++++ rust/MOCK_PARITY_HARNESS.md | 4 +- 3 files changed, 62 insertions(+), 4 deletions(-) create mode 100644 docs/g007-plugin-mcp-verification-map.md diff --git a/PARITY.md b/PARITY.md index d67389f2..93fcb99f 100644 --- a/PARITY.md +++ b/PARITY.md @@ -8,7 +8,7 @@ Last updated: 2026-04-03 - Requested 9-lane checkpoint: **All 9 lanes merged on `main`.** - Current `main` HEAD: `ee31e00` (stub implementations replaced with real AskUserQuestion + RemoteTrigger). - Repository stats at this checkpoint: **292 commits on `main` / 293 across all branches**, **9 crates**, **48,599 tracked Rust LOC**, **2,568 test LOC**, **3 authors**, date range **2026-03-31 → 2026-04-03**. -- Mock parity harness stats: **10 scripted scenarios**, **19 captured `/v1/messages` requests** in `rust/crates/rusty-claude-cli/tests/mock_parity_harness.rs`. +- Mock parity harness stats: **12 scripted scenarios**, **21 captured `/v1/messages` requests** in `rust/crates/rusty-claude-cli/tests/mock_parity_harness.rs`. ## Mock parity harness — milestone 1 @@ -23,6 +23,8 @@ Last updated: 2026-04-03 - [x] Scripted permission prompt coverage: `bash_permission_prompt_approved`, `bash_permission_prompt_denied` - [x] Scripted plugin-path coverage: `plugin_tool_roundtrip` - [x] Behavioral diff/checklist runner: `rust/scripts/run_mock_parity_diff.py` +- [x] Scripted session-compaction metadata coverage: `auto_compact_triggered` +- [x] Scripted token/cost JSON coverage: `token_cost_reporting` ## Harness v2 behavioral checklist @@ -172,8 +174,8 @@ Canonical scenario map: `rust/mock_parity_scenarios.json` - [ ] End-to-end MCP runtime lifecycle beyond the registry bridge now on `main` - [x] Output truncation (large stdout/file content) -- [ ] Session compaction behavior matching -- [ ] Token counting / cost tracking accuracy +- [x] Session compaction behavior matching +- [x] Token counting / cost tracking accuracy - [x] Bash validation lane merged onto `main` - [ ] CI green on every commit diff --git a/docs/g007-plugin-mcp-verification-map.md b/docs/g007-plugin-mcp-verification-map.md new file mode 100644 index 00000000..b365e16f --- /dev/null +++ b/docs/g007-plugin-mcp-verification-map.md @@ -0,0 +1,54 @@ +# G007 Plugin/MCP Lifecycle Verification Map + +Goal: `G007-plugin-mcp` — Stream 5 plugin/MCP lifecycle maturity from ROADMAP Phase 5. + +Scope: worker-2 follow-up map for W4 mock integration and regression verification. This file intentionally does not mutate leader-owned `.omx/ultragoal` state. + +## Covered ROADMAP / CC2 anchors + +- `ROADMAP.md:55-57` — Current pain point §6: plugin/MCP startup failures, handshake failures, config errors, partial startup, and degraded mode need clean classification. +- `ROADMAP.md:67` — Product principle §5: MCP partial success must be first-class and structurally report successful and failed servers. +- `ROADMAP.md:1033-1059` — Phase 5: first-class plugin/MCP lifecycle contract and MCP end-to-end lifecycle parity. +- `.omx/cc2/board.md` Stream 5 active headings: `CC2-RM-H0010`, `CC2-RM-H0080`, `CC2-RM-H0081`, and `CC2-RM-H0082` remain the goal-level source-of-truth anchors for plugin/MCP lifecycle maturity. +- `PARITY.md` harness checklist: mock parity scenarios are the executable regression surface for streamed model turns, plugin tool roundtrips, permissions, compaction metadata, and token/cost output. + +## Mock integration anchors + +| Area | Artifact/evidence | +| --- | --- | +| Deterministic model server | `rust/crates/mock-anthropic-service/src/lib.rs` implements the Anthropic-compatible mock server and scenario router used by CLI parity tests. | +| End-to-end CLI mock harness | `rust/crates/rusty-claude-cli/tests/mock_parity_harness.rs` starts the mock server, runs clean-environment `claw` commands, asserts JSON output, and optionally writes a machine-readable report via `MOCK_PARITY_REPORT_PATH`. | +| Scenario manifest / docs parity guard | `rust/mock_parity_scenarios.json` is required to stay ordered with harness cases; `rust/scripts/run_mock_parity_diff.py --no-run` verifies every manifest `parity_refs[]` string exists in `PARITY.md`. | +| Convenience runner | `rust/scripts/run_mock_parity_harness.sh` runs `cargo test -p rusty-claude-cli --test mock_parity_harness -- --nocapture`. | +| Plugin-path regression | `plugin_tool_roundtrip` loads an external plugin fixture from isolated settings and executes `plugin_echo` through the runtime tool registry. | +| Lifecycle-adjacent regression | `auto_compact_triggered` and `token_cost_reporting` prove runtime JSON keeps compaction and usage/cost fields parseable under mock responses, preventing parity drift in machine-readable output. | +| MCP degraded-startup regression | `rust/crates/runtime/src/mcp_stdio.rs::manager_discovery_report_keeps_healthy_servers_when_one_server_fails` proves a healthy MCP server remains callable while a broken peer is surfaced in a structured degraded report. | +| Plugin lifecycle state regression | `rust/crates/runtime/src/plugin_lifecycle.rs` unit tests cover healthy, degraded, failed, and shutdown states plus startup-event mapping. | + +## Regression verification commands + +Use the smallest command that proves the changed or audited surface, then broaden only when integration risk requires it. + +- Mock scenario/docs map only: + - `cd rust && python3 scripts/run_mock_parity_diff.py --no-run` +- Full mock integration: + - `cd rust && cargo test -p rusty-claude-cli --test mock_parity_harness -- --nocapture` + - `cd rust && python3 scripts/run_mock_parity_diff.py` +- Plugin/MCP lifecycle contract: + - `cd rust && cargo test -p runtime plugin_lifecycle -- --nocapture` + - `cd rust && cargo test -p runtime mcp_stdio::tests::manager_discovery_report_keeps_healthy_servers_when_one_server_fails -- --exact --nocapture` +- Standard Rust gates for implementation changes touching these surfaces: + - `cd rust && cargo fmt --all -- --check` + - `cd rust && cargo check -p runtime -p rusty-claude-cli -p mock-anthropic-service` + - `cd rust && cargo clippy -p runtime --all-targets -- -D warnings` + +## Known gaps / follow-ups + +- The mock parity harness validates plugin tool execution but does not yet spin up a real MCP stdio server through the CLI prompt path; MCP degraded-startup remains covered by runtime manager tests. +- Worker-4 owns the plugin command fallthrough regression implementation lane (`task-10`); this map records the verification/docs boundary and should not duplicate that parser work. +- Full `cargo clippy -p runtime --all-targets -- -D warnings` can be blocked by unrelated `policy_engine.rs` clippy violations in this worktree; when that happens, report the exact pre-existing diagnostics and keep focused lifecycle tests green. +- No `.omx/ultragoal` files were changed; leader-owned Ultragoal checkpointing remains outside worker scope. + +## Delegation evidence + +Subagent spawn evidence: Task 9 spawned repository map probe `019e291d-e700-7171-b7bc-27ec0f6c850f`, debug/root-cause probe `019e291d-e86f-78d0-a137-214ede03285c`, and test/docs probe `019e291e-135c-79e1-80d0-9fd82866bd6e` before deeper local inspection. The repository-map probe errored with 429; the remaining probes did not return before the local verification map was grounded from repo evidence, so direct findings above were integrated. diff --git a/rust/MOCK_PARITY_HARNESS.md b/rust/MOCK_PARITY_HARNESS.md index bc384661..eeeab743 100644 --- a/rust/MOCK_PARITY_HARNESS.md +++ b/rust/MOCK_PARITY_HARNESS.md @@ -22,6 +22,8 @@ The harness runs these scripted scenarios against a fresh workspace and isolated 8. `bash_permission_prompt_approved` 9. `bash_permission_prompt_denied` 10. `plugin_tool_roundtrip` +11. `auto_compact_triggered` +12. `token_cost_reporting` ## Run @@ -37,7 +39,7 @@ cd rust/ python3 scripts/run_mock_parity_diff.py ``` -Scenario-to-PARITY mappings live in `mock_parity_scenarios.json`. +Scenario-to-PARITY mappings live in `mock_parity_scenarios.json`; keep this manifest aligned with `rust/crates/rusty-claude-cli/tests/mock_parity_harness.rs` and `PARITY.md` via `python3 scripts/run_mock_parity_diff.py --no-run`. ## Manual mock server