docs(roadmap): add #371 — /cache returns kind=stats; /providers returns doctor response

Merge pull request #2899 from ultraworkers/docs/roadmap-353-plugins-uninstall-stderr-only
docs(roadmap): add #353 — plugins uninstall json error is stderr-only
2026-05-13 17:36:44 +00:00 · 2026-04-30 09:31:01 +09:00 · 2026-04-30 09:01:02 +09:00 · 2026-04-29 23:31:56 +00:00 · 2026-04-30 08:30:58 +09:00 · 2026-04-29 23:02:03 +00:00
48 changed files with 3485 additions and 7300 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -8,8 +8,5 @@ archive/
 # Claw Code local artifacts
 .claw/settings.local.json
 .claw/sessions/
-# #160/#166: default session storage directory (flush-transcript output,
-# dogfood runs, etc.). Claws specifying --directory elsewhere are fine.
-.port_sessions/
 .clawhip/
 status-help.txt
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,195 +1,21 @@
-# CLAUDE.md — Python Reference Implementation
+# CLAUDE.md

-**This file guides work on `src/` and `tests/` — the Python reference harness for claw-code protocol.**
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

-The production CLI lives in `rust/`; this directory (`src/`, `tests/`, `.py` files) is a **protocol validation and dogfood surface**.
+## Detected stack
+- Languages: Rust.
+- Frameworks: none detected from the supported starter markers.

-## What this Python harness does
-
-**Machine-first orchestration layer** — proves that the claw-code JSON protocol is:
- Deterministic and recoverable (every output is reproducible)
- Self-describing (SCHEMAS.md documents every field)
- Clawable (external agents can build ONE error handler for all commands)
-
-## Stack
- **Language:** Python 3.13+
- **Dependencies:** minimal (no frameworks; pure stdlibs + attrs/dataclasses)
- **Test runner:** pytest
- **Protocol contract:** SCHEMAS.md (machine-readable JSON envelope)
-
-## Quick start
-
-```bash
-# 1. Install dependencies (if not already in venv)
-python3 -m venv .venv && source .venv/bin/activate
-# (dependencies minimal; standard library mostly)
-
-# 2. Run tests
-python3 -m pytest tests/ -q
-
-# 3. Try a command
-python3 -m src.main bootstrap "hello" --output-format json | python3 -m json.tool
-```
-
-## Verification workflow
-
-```bash
-# Unit tests (fast)
-python3 -m pytest tests/ -q 2>&1 | tail -3
-
-# Type checking (optional but recommended)
-python3 -m mypy src/ --ignore-missing-imports 2>&1 | tail -5
-```
+## Verification
+- Run Rust verification from repo root: `scripts/fmt.sh --check`; for formatting use `scripts/fmt.sh`. Run Rust clippy/tests from `rust/`: `cargo clippy --workspace --all-targets -- -D warnings`, `cargo test --workspace`
+- `src/` and `tests/` are both present; update both surfaces together when behavior changes.

 ## Repository shape
+- `rust/` contains the Rust workspace and active CLI/runtime implementation.
+- `src/` contains source files that should stay consistent with generated guidance and tests.
+- `tests/` contains validation surfaces that should be reviewed alongside code changes.

- **`src/`** — Python reference harness implementing SCHEMAS.md protocol
-  - `main.py` — CLI entry point; all 14 clawable commands
-  - `query_engine.py` — core TurnResult / QueryEngineConfig
-  - `runtime.py` — PortRuntime; turn loop + cancellation (#164 Stage A/B)
-  - `session_store.py` — session persistence
-  - `transcript.py` — turn transcript assembly
-  - `commands.py`, `tools.py` — simulated command/tool trees
-  - `models.py` — PermissionDenial, UsageSummary, etc.
-
- **`tests/`** — comprehensive protocol validation (22 baseline → 192 passing as of 2026-04-22)
-  - `test_cli_parity_audit.py` — proves all 14 clawable commands accept --output-format
-  - `test_json_envelope_field_consistency.py` — validates SCHEMAS.md contract
-  - `test_cancel_observed_field.py` — #164 Stage B: cancellation observability + safe-to-reuse semantics
-  - `test_run_turn_loop_*.py` — turn loop behavior (timeout, cancellation, continuation, permissions)
-  - `test_submit_message_*.py` — budget, cancellation contracts
-  - `test_*_cli.py` — command-specific JSON output validation
-
- **`SCHEMAS.md`** — canonical JSON contract
-  - Common fields (all envelopes): timestamp, command, exit_code, output_format, schema_version
-  - Error envelope shape
-  - Not-found envelope shape
-  - Per-command success schemas (14 commands documented)
-  - Turn Result fields (including cancel_observed as of #164 Stage B)
-
- **`.gitignore`** — excludes `.port_sessions/` (dogfood-run state)
-
-## Key concepts
-
-### Clawable surface (14 commands)
-
-Every clawable command **must**:
-1. Accept `--output-format {text,json}`
-2. Return JSON envelopes matching SCHEMAS.md
-3. Use common fields (timestamp, command, exit_code, output_format, schema_version)
-4. Exit 0 on success, 1 on error/not-found, 2 on timeout
-
-**Commands:** list-sessions, delete-session, load-session, flush-transcript, show-command, show-tool, exec-command, exec-tool, route, bootstrap, command-graph, tool-pool, bootstrap-graph, turn-loop
-
-**Validation:** `test_cli_parity_audit.py` auto-tests all 14 for --output-format acceptance.
-
-### OPT_OUT surfaces (12 commands)
-
-Explicitly exempt from --output-format requirement (for now):
- Rich-Markdown reports: summary, manifest, parity-audit, setup-report
- List commands with query filters: subsystems, commands, tools
- Simulation/debug: remote-mode, ssh-mode, teleport-mode, direct-connect-mode, deep-link-mode
-
-**Future work:** audit OPT_OUT surfaces for JSON promotion (post-#164).
-
-### Protocol layers
-
-**Coverage (#167–#170):** All clawable commands emit JSON
-**Enforcement (#171):** Parity CI prevents new commands skipping JSON
-**Documentation (#172):** SCHEMAS.md locks field contract
-**Alignment (#173):** Test framework validates docs ↔ code match
-**Field evolution (#164 Stage B):** cancel_observed proves protocol extensibility
-
-## Testing & coverage
-
-### Run full suite
-```bash
-python3 -m pytest tests/ -q
-```
-
-### Run one test file
-```bash
-python3 -m pytest tests/test_cancel_observed_field.py -v
-```
-
-### Run one test
-```bash
-python3 -m pytest tests/test_cancel_observed_field.py::TestCancelObservedField::test_default_value_is_false -v
-```
-
-### Check coverage (optional)
-```bash
-python3 -m pip install coverage  # if not already installed
-python3 -m coverage run -m pytest tests/
-python3 -m coverage report --skip-covered
-```
-
-Target: >90% line coverage for src/ (currently ~85%).
-
-## Common workflows
-
-### Add a new clawable command
-
-1. Add parser in `main.py` (argparse)
-2. Add `--output-format` flag
-3. Emit JSON envelope using `wrap_json_envelope(data, command_name)`
-4. Add command to CLAWABLE_SURFACES in test_cli_parity_audit.py
-5. Document in SCHEMAS.md (schema + example)
-6. Write test in tests/test_*_cli.py or tests/test_json_envelope_field_consistency.py
-7. Run full suite to confirm parity
-
-### Modify TurnResult or protocol fields
-
-1. Update dataclass in `query_engine.py`
-2. Update SCHEMAS.md with new field + rationale
-3. Write test in `tests/test_json_envelope_field_consistency.py` that validates field presence
-4. Update all places that construct TurnResult (grep for `TurnResult(`)
-5. Update bootstrap/turn-loop JSON builders in main.py
-6. Run `tests/` to ensure no regressions
-
-### Promote an OPT_OUT surface to CLAWABLE
-
-**Prerequisite:** Real demand signal logged in `OPT_OUT_DEMAND_LOG.md` (threshold: 2+ independent signals per surface). Speculative promotions are not allowed.
-
-Once demand is evidenced:
-1. Add --output-format flag to argparse
-2. Emit wrap_json_envelope() output in JSON path
-3. Move command from OPT_OUT_SURFACES to CLAWABLE_SURFACES
-4. Document in SCHEMAS.md
-5. Write test for JSON output
-6. Run parity audit to confirm no regressions
-7. Update `OPT_OUT_DEMAND_LOG.md` to mark signal as resolved
-
-### File a demand signal (when a claw actually needs JSON from an OPT_OUT surface)
-
-1. Open `OPT_OUT_DEMAND_LOG.md`
-2. Find the surface's entry under Group A/B/C
-3. Append a dated entry with Source, Use Case, and Markdown-alternative-checked explanation
-4. If this is the 2nd signal for the same surface, file a promotion pinpoint in ROADMAP.md
-
-## Dogfood principles
-
-The Python harness is continuously dogfood-tested:
- Every cycle ships to `main` with detailed commit messages
- New tests are written before/alongside implementation
- Test suite must pass before pushing (zero-regression principle)
- Commits grouped by pinpoint (#159, #160, ..., #174)
- Failure modes classified per exit code: 0=success, 1=error, 2=timeout
-
-## Protocol governance
-
- **SCHEMAS.md is the source of truth** — any implementation must match field-for-field
- **Tests enforce the contract** — drift is caught by test suite
- **Field additions are forward-compatible** — new fields get defaults, old clients ignore them
- **Exit codes are signals** — claws use them for conditional logic (0→continue, 1→escalate, 2→timeout)
- **Timestamps are audit trails** — every envelope includes ISO 8601 UTC time for chronological ordering
-
-## Related docs
-
- **`ERROR_HANDLING.md`** — Unified error-handling pattern for claws (one handler for all 14 clawable commands)
- **`SCHEMAS.md`** — JSON protocol specification (read before implementing)
- **`OPT_OUT_AUDIT.md`** — Governance for the 12 non-clawable surfaces
- **`OPT_OUT_DEMAND_LOG.md`** — Active survey recording real demand signals (evidence base for decisions)
- **`ROADMAP.md`** — macro roadmap and macro pain points
- **`PHILOSOPHY.md`** — system design intent
- **`PARITY.md`** — status of Python ↔ Rust protocol equivalence
+## Working agreement
+- Prefer small, reviewable changes and keep generated bootstrap files aligned with actual repo workflows.
+- Keep shared defaults in `.claude.json`; reserve `.claude/settings.local.json` for machine-local overrides.
+- Do not overwrite existing `CLAUDE.md` content automatically; update it intentionally when repo workflows change.
--- a/ERROR_HANDLING.md
+++ b/ERROR_HANDLING.md
@@ -1,489 +0,0 @@
-# Error Handling for Claw Code Claws
-
-**Purpose:** Build a unified error handler for orchestration code using claw-code as a library or subprocess.
-
-After cycles #178–#179 (parser-front-door hole closure), claw-code's error interface is deterministic, machine-readable, and clawable: **one error handler for all 14 clawable commands.**
-
---
-
-## Quick Reference: Exit Codes and Envelopes
-
-Every clawable command returns JSON on stdout when `--output-format json` is requested.
-
-**IMPORTANT:** The exit code contract below applies **only when `--output-format json` is explicitly set**. Text mode follows argparse conventions and may return different exit codes (e.g., `2` for argparse parse errors). Claws consuming claw-code as a subprocess MUST always pass `--output-format json` to get the documented contract.
-
-| Exit Code | Meaning | Response Format | Example |
-|---|---|---|---|
-| **0** | Success | `{success fields}` | `{"session_id": "...", "loaded": true}` |
-| **1** | Error / Not Found | `{error: {kind, message, ...}}` | `{"error": {"kind": "session_not_found", ...}}` |
-| **2** | Timeout | `{final_stop_reason: "timeout", final_cancel_observed: ...}` | `{"final_stop_reason": "timeout", ...}` |
-
-### Text mode vs JSON mode exit codes
-
-| Scenario | Text mode exit | JSON mode exit | Why |
-|---|---|---|---|
-| Unknown subcommand | 2 (argparse default) | 1 (parse error envelope) | argparse defaults to 2; JSON mode normalizes to contract |
-| Missing required arg | 2 (argparse default) | 1 (parse error envelope) | Same reason |
-| Session not found | 1 | 1 | Application-level error, same in both |
-| Command executed OK | 0 | 0 | Success path, identical |
-| Turn-loop timeout | 2 | 2 | Identical (#161 implementation) |
-
-**Practical rule for claws:** always pass `--output-format json`. This eliminates text-mode surprises and gives you the documented exit-code contract for every error path.
-
---
-
-## One-Handler Pattern
-
-Build a single error-recovery function that works for all 14 clawable commands:
-
-```python
-import subprocess
-import json
-import sys
-from typing import Any
-
-def run_claw_command(command: list[str], timeout_seconds: float = 30.0) -> dict[str, Any]:
-    """
-    Run a clawable claw-code command and handle errors uniformly.
-    
-    Args:
-        command: Full command list, e.g. ["claw", "load-session", "id", "--output-format", "json"]
-        timeout_seconds: Wall-clock timeout
-    
-    Returns:
-        Parsed JSON result from stdout
-    
-    Raises:
-        ClawError: Classified by error.kind (parse, session_not_found, runtime, timeout, etc.)
-    """
-    try:
-        result = subprocess.run(
-            command,
-            capture_output=True,
-            text=True,
-            timeout=timeout_seconds,
-        )
-    except subprocess.TimeoutExpired:
-        raise ClawError(
-            kind='subprocess_timeout',
-            message=f'Command exceeded {timeout_seconds}s wall-clock timeout',
-            retryable=True,  # Caller's decision; subprocess timeout != engine timeout
-        )
-    
-    # Parse JSON (valid for all success/error/timeout paths in claw-code)
-    try:
-        envelope = json.loads(result.stdout)
-    except json.JSONDecodeError as err:
-        raise ClawError(
-            kind='parse_failure',
-            message=f'Command output is not JSON: {err}',
-            hint='Check that --output-format json is being passed',
-            retryable=False,
-        )
-    
-    # Classify by exit code and error.kind
-    match (result.returncode, envelope.get('error', {}).get('kind')):
-        case (0, _):
-            # Success
-            return envelope
-        
-        case (1, 'parse'):
-            # #179: argparse error — typically a typo or missing required argument
-            raise ClawError(
-                kind='parse',
-                message=envelope['error']['message'],
-                hint=envelope['error'].get('hint'),
-                retryable=False,  # Typos don't fix themselves
-            )
-        
-        case (1, 'session_not_found'):
-            # Common: load-session on nonexistent ID
-            raise ClawError(
-                kind='session_not_found',
-                message=envelope['error']['message'],
-                session_id=envelope.get('session_id'),
-                retryable=False,  # Session won't appear on retry
-            )
-        
-        case (1, 'filesystem'):
-            # Directory missing, permission denied, disk full
-            raise ClawError(
-                kind='filesystem',
-                message=envelope['error']['message'],
-                retryable=True,  # Might be transient (disk space, NFS flake)
-            )
-        
-        case (1, 'runtime'):
-            # Generic engine error (unexpected exception, malformed input, etc.)
-            raise ClawError(
-                kind='runtime',
-                message=envelope['error']['message'],
-                retryable=envelope['error'].get('retryable', False),
-            )
-        
-        case (1, _):
-            # Catch-all for any new error.kind values
-            raise ClawError(
-                kind=envelope['error']['kind'],
-                message=envelope['error']['message'],
-                retryable=envelope['error'].get('retryable', False),
-            )
-        
-        case (2, _):
-            # Timeout (engine was asked to cancel and had fair chance to observe)
-            cancel_observed = envelope.get('final_cancel_observed', False)
-            raise ClawError(
-                kind='timeout',
-                message=f'Turn exceeded timeout (cancel_observed={cancel_observed})',
-                cancel_observed=cancel_observed,
-                retryable=True,  # Caller can retry with a fresh session
-                safe_to_reuse_session=(cancel_observed is True),
-            )
-        
-        case (exit_code, _):
-            # Unexpected exit code
-            raise ClawError(
-                kind='unexpected_exit_code',
-                message=f'Unexpected exit code {exit_code}',
-                retryable=False,
-            )
-
-
-class ClawError(Exception):
-    """Unified error type for claw-code commands."""
-    
-    def __init__(
-        self,
-        kind: str,
-        message: str,
-        hint: str | None = None,
-        retryable: bool = False,
-        cancel_observed: bool = False,
-        safe_to_reuse_session: bool = False,
-        session_id: str | None = None,
-    ):
-        self.kind = kind
-        self.message = message
-        self.hint = hint
-        self.retryable = retryable
-        self.cancel_observed = cancel_observed
-        self.safe_to_reuse_session = safe_to_reuse_session
-        self.session_id = session_id
-        super().__init__(self.message)
-    
-    def __str__(self) -> str:
-        parts = [f"{self.kind}: {self.message}"]
-        if self.hint:
-            parts.append(f"Hint: {self.hint}")
-        if self.retryable:
-            parts.append("(retryable)")
-        if self.cancel_observed:
-            parts.append(f"(safe_to_reuse_session={self.safe_to_reuse_session})")
-        return "\n".join(parts)
-```
-
---
-
-## Practical Recovery Patterns
-
-### Pattern 1: Retry on transient errors
-
-```python
-from time import sleep
-
-def run_with_retry(
-    command: list[str],
-    max_attempts: int = 3,
-    backoff_seconds: float = 0.5,
-) -> dict:
-    """Retry on transient errors (filesystem, timeout)."""
-    for attempt in range(1, max_attempts + 1):
-        try:
-            return run_claw_command(command)
-        except ClawError as err:
-            if not err.retryable:
-                raise  # Non-transient; fail fast
-            
-            if attempt == max_attempts:
-                raise  # Last attempt; propagate
-            
-            print(f"Attempt {attempt} failed ({err.kind}); retrying in {backoff_seconds}s...", file=sys.stderr)
-            sleep(backoff_seconds)
-            backoff_seconds *= 1.5  # exponential backoff
-    
-    raise RuntimeError("Unreachable")
-```
-
-### Pattern 2: Reuse session after timeout (if safe)
-
-```python
-def run_with_timeout_recovery(
-    command: list[str],
-    timeout_seconds: float = 30.0,
-    fallback_timeout: float = 60.0,
-) -> dict:
-    """
-    On timeout, check cancel_observed. If True, the session is safe for retry.
-    If False, the session is potentially wedged; use a fresh one.
-    """
-    try:
-        return run_claw_command(command, timeout_seconds=timeout_seconds)
-    except ClawError as err:
-        if err.kind != 'timeout':
-            raise
-        
-        if err.safe_to_reuse_session:
-            # Engine saw the cancel signal; safe to reuse this session with a larger timeout
-            print(f"Timeout observed (cancel_observed=true); retrying with {fallback_timeout}s...", file=sys.stderr)
-            return run_claw_command(command, timeout_seconds=fallback_timeout)
-        else:
-            # Engine didn't see the cancel signal; session may be wedged
-            print("Timeout not observed (cancel_observed=false); session is potentially wedged", file=sys.stderr)
-            raise  # Caller should allocate a fresh session
-```
-
-### Pattern 3: Detect parse errors (typos in command-line construction)
-
-```python
-def validate_command_before_dispatch(command: list[str]) -> None:
-    """
-    Dry-run with --help to detect obvious syntax errors before dispatching work.
-    
-    This is cheap (no API call) and catches typos like:
-    - Unknown subcommand: `claw typo-command`
-    - Unknown flag: `claw bootstrap --invalid-flag`
-    - Missing required argument: `claw load-session` (no session_id)
-    """
-    help_cmd = command + ['--help']
-    try:
-        result = subprocess.run(help_cmd, capture_output=True, timeout=2.0)
-        if result.returncode != 0:
-            print(f"Warning: {' '.join(help_cmd)} returned {result.returncode}", file=sys.stderr)
-            print("(This doesn't prove the command is invalid, just that --help failed)", file=sys.stderr)
-    except subprocess.TimeoutExpired:
-        pass  # --help shouldn't hang, but don't block on it
-```
-
-### Pattern 4: Log and forward errors to observability
-
-```python
-import logging
-
-logger = logging.getLogger(__name__)
-
-def run_claw_with_logging(command: list[str]) -> dict:
-    """Run command and log errors for observability."""
-    try:
-        result = run_claw_command(command)
-        logger.info(f"Claw command succeeded: {' '.join(command)}")
-        return result
-    except ClawError as err:
-        logger.error(
-            "Claw command failed",
-            extra={
-                'command': ' '.join(command),
-                'error_kind': err.kind,
-                'error_message': err.message,
-                'retryable': err.retryable,
-                'cancel_observed': err.cancel_observed,
-            },
-        )
-        raise
-```
-
---
-
-## Error Kinds (Enumeration)
-
-After cycles #178–#179, the complete set of `error.kind` values is:
-
-| Kind | Exit Code | Meaning | Retryable | Notes |
-|---|---|---|---|---|
-| **parse** | 1 | Argparse error (unknown command, missing arg, invalid flag) | No | Real error message included (#179); valid choices list for discoverability |
-| **session_not_found** | 1 | load-session target doesn't exist | No | session_id and directory included in envelope |
-| **filesystem** | 1 | Directory missing, permission denied, disk full | Yes | Transient issues (disk space, NFS flake) can be retried |
-| **runtime** | 1 | Engine error (unexpected exception, malformed input) | Depends | `error.retryable` field in envelope specifies |
-| **timeout** | 2 | Engine timeout with cooperative cancellation | Yes* | `cancel_observed` field signals session safety (#164) |
-
-*Retry safety depends on `cancel_observed`:
- `cancel_observed=true` → session is safe to reuse
- `cancel_observed=false` → session may be wedged; allocate fresh one
-
---
-
-## What We Did to Make This Work
-
-### Cycle #178: Parse-Error Envelope
-
-**Problem:** `claw nonexistent --output-format json` returned argparse help text on stderr instead of an envelope.
-**Solution:** Catch argparse `SystemExit` in JSON mode and emit a structured error envelope.
-**Benefit:** Claws no longer need to parse human help text to understand parse errors.
-
-### Cycle #179: Stderr Hygiene + Real Error Message
-
-**Problem:** Even after #178, argparse usage was leaking to stderr AND the envelope message was generic ("invalid command or argument").
-**Solution:** Monkey-patch `parser.error()` in JSON mode to raise an internal exception, preserving argparse's real message verbatim. Suppress stderr entirely in JSON mode.
-**Benefit:** Claws see one stream (stdout), one envelope, and real error context (e.g., "invalid choice: typo (choose from ...)") for discoverability.
-
-### Contract: #164 Stage B (`cancel_observed` field)
-
-**Problem:** Timeout results didn't signal whether the engine actually observed the cancellation request.
-**Solution:** Add `cancel_observed: bool` field to timeout TurnResult; signal true iff the engine had a fair chance to observe the cancel event.
-**Benefit:** Claws can decide "retry with fresh session" vs "reuse this session with larger timeout" based on a single boolean.
-
---
-
-## Common Mistakes to Avoid
-
-❌ **Don't parse exit code alone**  
-```python
-# BAD: Exit code 1 could mean parse error, not-found, filesystem, or runtime
-if result.returncode == 1:
-    # What should I do? Unclear.
-    pass
-```
-
-✅ **Do parse error.kind**  
-```python
-# GOOD: error.kind tells you exactly how to recover
-match envelope['error']['kind']:
-    case 'parse': ...
-    case 'session_not_found': ...
-    case 'filesystem': ...
-```
-
---
-
-❌ **Don't capture both stdout and stderr and assume they're separate concerns**  
-```python
-# BAD (pre-#179): Capture stdout + stderr, then parse stdout as JSON
-# But stderr might contain argparse noise that you have to string-match
-result = subprocess.run(..., capture_output=True, text=True)
-if "invalid choice" in result.stderr:
-    # ... custom error handling
-```
-
-✅ **Do silence stderr in JSON mode**  
-```python
-# GOOD (post-#179): In JSON mode, stderr is guaranteed silent
-# Envelope on stdout is your single source of truth
-result = subprocess.run(..., capture_output=True, text=True)
-envelope = json.loads(result.stdout)  # Always valid in JSON mode
-```
-
---
-
-❌ **Don't retry on parse errors**  
-```python
-# BAD: Typos don't fix themselves
-error_kind = envelope['error']['kind']
-if error_kind == 'parse':
-    retry()  # Will fail again
-```
-
-✅ **Do check retryable before retrying**  
-```python
-# GOOD: Let the error tell you
-error = envelope['error']
-if error.get('retryable', False):
-    retry()
-else:
-    raise
-```
-
---
-
-❌ **Don't reuse a session after timeout without checking cancel_observed**  
-```python
-# BAD: Reuse session = potential wedge
-result = run_claw_command(...)  # times out
-# ... later, reuse same session
-result = run_claw_command(...)  # might be stuck in the previous turn
-```
-
-✅ **Do allocate a fresh session if cancel_observed=false**  
-```python
-# GOOD: Allocate fresh session if wedge is suspected
-try:
-    result = run_claw_command(...)
-except ClawError as err:
-    if err.cancel_observed:
-        # Safe to reuse
-        result = run_claw_command(...)
-    else:
-        # Allocate fresh session
-        fresh_session = create_session()
-        result = run_claw_command_in_session(fresh_session, ...)
-```
-
---
-
-## Testing Your Error Handler
-
-```python
-def test_error_handler_parse_error():
-    """Verify parse errors are caught and classified."""
-    try:
-        run_claw_command(['claw', 'nonexistent', '--output-format', 'json'])
-        assert False, "Should have raised ClawError"
-    except ClawError as err:
-        assert err.kind == 'parse'
-        assert 'invalid choice' in err.message.lower()
-        assert err.retryable is False
-
-def test_error_handler_timeout_safe():
-    """Verify timeout with cancel_observed=true marks session as safe."""
-    # Requires a live claw-code server; mock this test
-    try:
-        run_claw_command(
-            ['claw', 'turn-loop', 'x', '--timeout-seconds', '0.0001', '--output-format', 'json'],
-            timeout_seconds=2.0,
-        )
-        assert False, "Should have raised ClawError"
-    except ClawError as err:
-        assert err.kind == 'timeout'
-        assert err.safe_to_reuse_session is True  # cancel_observed=true
-
-def test_error_handler_not_found():
-    """Verify session_not_found is clearly classified."""
-    try:
-        run_claw_command(['claw', 'load-session', 'nonexistent', '--output-format', 'json'])
-        assert False, "Should have raised ClawError"
-    except ClawError as err:
-        assert err.kind == 'session_not_found'
-        assert err.retryable is False
-```
-
---
-
-## Appendix: SCHEMAS.md Error Shape
-
-For reference, the canonical JSON error envelope shape (SCHEMAS.md):
-
-```json
-{
-  "timestamp": "2026-04-22T11:40:00Z",
-  "command": "load-session",
-  "exit_code": 1,
-  "output_format": "json",
-  "schema_version": "1.0",
-  "error": {
-    "kind": "session_not_found",
-    "operation": "session_store.load_session",
-    "target": "nonexistent",
-    "retryable": false,
-    "message": "session 'nonexistent' not found in .port_sessions",
-    "hint": "use 'list-sessions' to see available sessions"
-  }
-}
-```
-
-All commands that emit errors follow this shape (with error.kind varying). See `SCHEMAS.md` for the complete contract.
-
---
-
-## Summary
-
-After cycles #178–#179, **one error handler works for all 14 clawable commands.** No more string-matching, no more stderr parsing, no more exit-code ambiguity. Just parse the JSON, check `error.kind`, and decide: retry, escalate, or reuse session (if safe).
-
-The handler itself is ~80 lines of Python; the patterns are reusable across any language that can speak JSON.
--- a/OPT_OUT_AUDIT.md
+++ b/OPT_OUT_AUDIT.md
@@ -1,151 +0,0 @@
-# OPT_OUT Surface Audit Roadmap
-
-**Status:** Pre-audit (decision table ready, survey pending)
-
-This document governs the audit and potential promotion of 12 OPT_OUT surfaces (commands that currently do **not** support `--output-format json`).
-
-## OPT_OUT Classification Rationale
-
-A surface is classified as OPT_OUT when:
-1. **Human-first by nature:** Rich Markdown prose / diagrams / structured text where JSON would be information loss
-2. **Query-filtered alternative exists:** Commands with internal `--query` / `--limit` don't need JSON (users already have escape hatch)
-3. **Simulation/debug only:** Not meant for production orchestration (e.g., mode simulators)
-4. **Future JSON work is planned:** Documented in ROADMAP with clear upgrade path
-
---
-
-## OPT_OUT Surfaces (12 Total)
-
-### Group A: Rich-Markdown Reports (4 commands)
-
-**Rationale:** These emit structured narrative prose. JSON would require lossy serialization.
-
-| Command | Output | Current use | JSON case |
-|---|---|---|---|
-| `summary` | Multi-section workspace summary (Markdown) | Human readability | Not applicable; Markdown is the output |
-| `manifest` | Workspace manifest with project tree (Markdown) | Human readability | Not applicable; Markdown is the output |
-| `parity-audit` | TypeScript/Python port comparison report (Markdown) | Human readability | Not applicable; Markdown is the output |
-| `setup-report` | Preflight + startup diagnostics (Markdown) | Human readability | Not applicable; Markdown is the output |
-
-**Audit decision:** These likely remain OPT_OUT long-term (Markdown-as-output is intentional). If JSON version needed in future, would be a separate `--output-format json` path generating structured data (project summary object, manifest array, audit deltas, setup checklist) — but that's a **new contract**, not an addition to existing Markdown surfaces.
-
-**Pinpoint:** #175 (deferred) — audit whether `summary`/`manifest` should emit JSON structured versions *in parallel* with Markdown, or if Markdown-only is the right UX.
-
---
-
-### Group B: List Commands with Query Filters (3 commands)
-
-**Rationale:** These already support `--query` and `--limit` for filtering. JSON output would be redundant; users can pipe to `jq`.
-
-| Command | Filtering | Current output | JSON case |
-|---|---|---|---|
-| `subsystems` | `--limit` | Human-readable list | Use `--query` to filter, users can parse if needed |
-| `commands` | `--query`, `--limit`, `--no-plugin-commands`, `--no-skill-commands` | Human-readable list | Use `--query` to filter, users can parse if needed |
-| `tools` | `--query`, `--limit`, `--simple-mode` | Human-readable list | Use `--query` to filter, users can parse if needed |
-
-**Audit decision:** `--query` / `--limit` are already the machine-friendly escape hatch. These commands are **intentionally** list-filter-based (not orchestration-primary). Promoting to CLAWABLE would require:
-1. Formalizing what the structured output *is* (command array? tool array?)
-2. Versioning the schema per command
-3. Updating tests to validate per-command schemas
-
-**Cost-benefit:** Low. Users who need structured data can already use `--query` to narrow results, then parse. Effort to promote > value.
-
-**Pinpoint:** #176 (backlog) — audit `--query` UX; consider if a `--query-json` escape hatch (output JSON of matching items) is worth the schema tax.
-
---
-
-### Group C: Simulation / Debug Surfaces (5 commands)
-
-**Rationale:** These are intentionally **not production-orchestrated**. They simulate behavior, test modes, or debug scenarios. JSON output doesn't add value.
-
-| Command | Purpose | Output | Use case |
-|---|---|---|---|
-| `remote-mode` | Simulate remote execution | Text (mock session) | Testing harness behavior under remote constraints |
-| `ssh-mode` | Simulate SSH execution | Text (mock SSH session) | Testing harness behavior over SSH-like transport |
-| `teleport-mode` | Simulate teleport hop | Text (mock hop session) | Testing harness behavior with teleport bouncing |
-| `direct-connect-mode` | Simulate direct network | Text (mock session) | Testing harness behavior with direct connectivity |
-| `deep-link-mode` | Simulate deep-link invocation | Text (mock deep-link) | Testing harness behavior from URL/deeplink |
-
-**Audit decision:** These are **intentionally simulation-only**. Promoting to CLAWABLE means:
-1. "This simulated mode is now a valid orchestration surface"
-2. Need to define what JSON output *means* (mock session state? simulation log?)
-3. Need versioning + test coverage
-
-**Cost-benefit:** Very low. These are debugging tools, not orchestration endpoints. Effort to promote >> value.
-
-**Pinpoint:** #177 (backlog) — decide if mode simulators should ever be CLAWABLE (probably no).
-
---
-
-## Audit Workflow (Future Cycles)
-
-### For each surface:
-1. **Survey:** Check if any external claw actually uses `--output-format` with this surface
-2. **Cost estimate:** How much schema work + testing?
-3. **Value estimate:** How much demand for JSON version?
-4. **Decision:** CLAWABLE, remain OPT_OUT, or new pinpoint?
-
-### Promotion criteria (if promoting to CLAWABLE):
-
-A surface moves from OPT_OUT → CLAWABLE **only if**:
- ✅ Clear use case for JSON (not just "hypothetically could be JSON")
- ✅ Schema is simple and stable (not 20+ fields)
- ✅ At least one external claw has requested it
- ✅ Tests can be added without major refactor
- ✅ Maintainability burden is worth the value
-
-### Retention criteria (if staying OPT_OUT):
-
-A surface stays OPT_OUT **if**:
- ✅ JSON would be information loss (Markdown reports)
- ✅ Equivalent filtering already exists (`--query` / `--limit`)
- ✅ Use case is simulation/debug, not production
- ✅ Promotion effort > value to users
-
---
-
-## Post-Audit Outcomes
-
-### Likely scenario (high confidence)
-
-**Group A (Markdown reports):** Remain OPT_OUT
- `summary`, `manifest`, `parity-audit`, `setup-report` are **intentionally** human-first
- If JSON-like structure is needed in the future, it would be delivered as separate `*-json` commands or a distinct `--output-format`, not added to Markdown surfaces
-
-**Group B (List filters):** Remain OPT_OUT
- `subsystems`, `commands`, `tools` have `--query` / `--limit` as query layer
- Users who need structured data already have escape hatch
-
-**Group C (Mode simulators):** Remain OPT_OUT
- `remote-mode`, `ssh-mode`, etc. are debug tools, not orchestration endpoints
- No demand for JSON version; promotion would be forced, not driven
-
-**Result:** OPT_OUT audit concludes that 12/12 surfaces should **remain OPT_OUT** (no promotions).
-
-### If demand emerges
-
-If external claws report needing JSON from any OPT_OUT surface:
-1. File pinpoint with use case + rationale
-2. Estimate cost + value
-3. If value > cost, promote to CLAWABLE with full test coverage
-4. Update SCHEMAS.md
-5. Update CLAUDE.md
-
---
-
-## Timeline
-
- **Post-#174 (now):** OPT_OUT audit documented (this file)
- **Cycles #19–#21 (deferred):** Survey period — collect data on external demand
- **Cycle #22 (deferred):** Final audit decision + any promotions
- **Post-audit:** Move to protocol maintenance mode (new commands/fields/surfaces)
-
---
-
-## Related
-
- **OPT_OUT_DEMAND_LOG.md** — Active survey recording real demand signals (evidentiary base for any promotion decision)
- **SCHEMAS.md** — Clawable surface contracts
- **CLAUDE.md** — Development guidance
- **test_cli_parity_audit.py** — Parametrized tests for CLAWABLE_SURFACES enforcement
- **ROADMAP.md** — Macro phases (this audit is Phase 3 before Phase 2 closure)
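
Reviewer sketch: the promotion criteria in the removed `OPT_OUT_AUDIT.md` (a surface moves from OPT_OUT to CLAWABLE only if all five checks hold) can be read as a single conjunction. The Python below is a minimal illustration under that reading. `SurfaceAssessment`, its field names, and `should_promote` are hypothetical and do not appear in the repository; only the "not 20+ fields" bound and the five criteria come from the audit text.

```python
from dataclasses import dataclass

# The audit text describes a simple schema as "not 20+ fields".
MAX_SIMPLE_SCHEMA_FIELDS = 20


@dataclass(frozen=True)
class SurfaceAssessment:
    """Hypothetical record of one audit pass over an OPT_OUT surface."""
    name: str
    has_clear_json_use_case: bool    # not just "hypothetically could be JSON"
    schema_field_count: int
    schema_is_stable: bool
    external_requests: int           # external claws that asked for JSON
    tests_need_major_refactor: bool
    value_exceeds_burden: bool       # maintainability burden is worth the value


def should_promote(a: SurfaceAssessment) -> bool:
    """Return True only when every promotion criterion holds."""
    return (
        a.has_clear_json_use_case
        and a.schema_is_stable
        and a.schema_field_count < MAX_SIMPLE_SCHEMA_FIELDS
        and a.external_requests >= 1
        and not a.tests_need_major_refactor
        and a.value_exceeds_burden
    )
```

Under this sketch, a surface with no recorded external request stays OPT_OUT regardless of the other answers, which matches the audit's predicted outcome for all 12 surfaces.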
--- a/OPT_OUT_DEMAND_LOG.md
+++ b/OPT_OUT_DEMAND_LOG.md
@@ -1,167 +0,0 @@
-# OPT_OUT Demand Log
-
-**Purpose:** Record real demand signals for promoting OPT_OUT surfaces to CLAWABLE. Without this log, the audit criteria in `OPT_OUT_AUDIT.md` have no evidentiary base.
-
-**Status:** Active survey window (post-#178/#179, cycles #21+)
-
-## How to file a demand signal
-
-When any external claw, operator, or downstream consumer actually needs JSON output from one of the 12 OPT_OUT surfaces, add an entry below. **Speculation, "could be useful someday," and internal hypotheticals do NOT count.**
-
-A valid signal requires:
- **Source:** Who/what asked (human, automation, agent session, external tool)
- **Surface:** Which OPT_OUT command (from the 12)
- **Use case:** The concrete orchestration problem they're trying to solve
- **Would-parse-Markdown alternative checked?** Why the existing OPT_OUT output is insufficient
- **Date:** When the signal was received
-
-## Promotion thresholds
-
-Per `OPT_OUT_AUDIT.md` criteria:
- **2+ independent signals** for the same surface within a survey window → file promotion pinpoint
- **1 signal + existing stable schema** → file pinpoint for discussion
- **0 signals** → surface stays OPT_OUT (documented rationale in audit file)
-
-The threshold is intentionally high. Single-use hacks can be served via one-off Markdown parsing; schema promotion is expensive (docs, tests, maintenance).
-
---
-
-## Demand Signals Received
-
-### Group A: Rich-Markdown Reports
-
-#### `summary`
-**Signals received: 0**
-
-Notes: No demand recorded. Markdown output is intentional and useful for human review.
-
-#### `manifest`
-**Signals received: 0**
-
-Notes: No demand recorded.
-
-#### `parity-audit`
-**Signals received: 0**
-
-Notes: No demand recorded. Report consumers are humans reviewing porting progress, not automation.
-
-#### `setup-report`
-**Signals received: 0**
-
-Notes: No demand recorded.
-
---
-
-### Group B: List Commands with Query Filters
-
-#### `subsystems`
-**Signals received: 0**
-
-Notes: `--limit` already provides filtering. No claws requesting JSON.
-
-#### `commands`
-**Signals received: 0**
-
-Notes: `--query`, `--limit`, `--no-plugin-commands`, `--no-skill-commands` already allow filtering. No demand recorded.
-
-#### `tools`
-**Signals received: 0**
-
-Notes: `--query`, `--limit`, `--simple-mode` provide filtering. No demand recorded.
-
---
-
-### Group C: Simulation / Debug Surfaces
-
-#### `remote-mode`
-**Signals received: 0**
-
-Notes: Simulation-only. No production orchestration need.
-
-#### `ssh-mode`
-**Signals received: 0**
-
-Notes: Simulation-only.
-
-#### `teleport-mode`
-**Signals received: 0**
-
-Notes: Simulation-only.
-
-#### `direct-connect-mode`
-**Signals received: 0**
-
-Notes: Simulation-only.
-
-#### `deep-link-mode`
-**Signals received: 0**
-
-Notes: Simulation-only.
-
---
-
-## Survey Window Status
-
-| Cycle | Date | New Signals | Running Total | Action |
-|---|---|---|---|---|
-| #21 | 2026-04-22 | 0 | 0 | Survey opened; log established |
-
-**Current assessment:** Zero demand for any OPT_OUT surface promotion. This is consistent with `OPT_OUT_AUDIT.md` prediction that all 12 likely stay OPT_OUT long-term.
-
---
-
-## Signal Entry Template
-
-```
-### <surface-name>
-**Signal received: [N]**
-
-Entry N (YYYY-MM-DD):
- Source: <who/what>
- Use case: <concrete orchestration problem>
- Markdown-alternative-checked: <yes/no + why insufficient>
- Follow-up: <filed pinpoint / discussion thread / closed>
-```
-
---
-
-## Decision Framework
-
-At cycle #22 (or whenever survey window closes):
-
-### If 0 signals total (likely):
- Move all 12 surfaces to `PERMANENTLY_OPT_OUT` or similar
- Remove `OPT_OUT_SURFACES` from `test_cli_parity_audit.py` (everything is explicitly non-goal)
- Update `CLAUDE.md` to reflect maintainership mode
- Close `OPT_OUT_AUDIT.md` with "audit complete, no promotions"
-
-### If 1–2 signals on isolated surfaces:
- File individual promotion pinpoints per surface with demand evidence
- Each goes through standard #171/#172/#173 loop (parity audit, SCHEMAS.md, consistency test)
-
-### If high demand (3+ signals):
- Reopen audit: is the OPT_OUT classification actually correct?
- Review whether protocol expansion is warranted
-
---
-
-## Related Files
-
- **`OPT_OUT_AUDIT.md`** — Audit criteria, decision table, rationale by group
- **`SCHEMAS.md`** — JSON contract for the 14 CLAWABLE surfaces
- **`tests/test_cli_parity_audit.py`** — Machine enforcement of CLAWABLE/OPT_OUT classification
- **`CLAUDE.md`** — Development posture (maintainership mode)
-
---
-
-## Philosophy
-
-**Prevent speculative expansion.** The discipline of requiring real signals before promotion protects the protocol from schema bloat. Every new CLAWABLE surface adds:
- A SCHEMAS.md section (maintenance burden)
- Test coverage (test suite tax)
- Documentation (cognitive load for new developers)
- Version compatibility (schema_version bump risk)
-
-If a claw can't articulate *why* it needs JSON for `summary` beyond "it would be nice," then JSON for `summary` is not needed. The Markdown output is a feature, not a gap.
-
-The audit log closes the loop on "governed non-goals": OPT_OUT surfaces are intentionally not clawable until proven otherwise by evidence.
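
Reviewer sketch: the promotion thresholds in the removed `OPT_OUT_DEMAND_LOG.md` can be expressed as a small decision function. This is an illustration only. `DemandSignal`, `count_independent_signals`, and `promotion_action` are hypothetical names. Treating "independent" as "distinct source" is an assumption, and so is the outcome for one signal without a stable schema, which the log does not spell out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class DemandSignal:
    """One log entry, mirroring the fields a valid signal requires."""
    source: str
    surface: str
    use_case: str
    markdown_alternative_checked: bool
    date: str  # YYYY-MM-DD


def count_independent_signals(signals: Iterable[DemandSignal]) -> Dict[str, int]:
    """Count signals per surface, assuming independence means distinct sources."""
    sources_by_surface: Dict[str, Set[str]] = {}
    for signal in signals:
        sources_by_surface.setdefault(signal.surface, set()).add(signal.source)
    return {surface: len(srcs) for surface, srcs in sources_by_surface.items()}


def promotion_action(signal_count: int, has_stable_schema: bool) -> str:
    """Map a per-surface signal count to the action named in the thresholds."""
    if signal_count >= 2:
        return "file_promotion_pinpoint"
    if signal_count == 1 and has_stable_schema:
        return "file_pinpoint_for_discussion"
    # Assumption: a lone signal without a stable schema stays OPT_OUT,
    # the same outcome the log documents for zero signals.
    return "stay_opt_out"
```

With the log as recorded (zero signals for every surface), every surface maps to `stay_opt_out`.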
--- a/README.md
+++ b/README.md
@@ -5,8 +5,6 @@
  ·
  <a href="./USAGE.md">Usage</a>
  ·
-  <a href="./ERROR_HANDLING.md">Error Handling</a>
-  ·
  <a href="./rust/README.md">Rust workspace</a>
  ·
  <a href="./PARITY.md">Parity</a>
@@ -42,11 +40,9 @@ The canonical implementation lives in [`rust/`](./rust), and the current source

 - **`rust/`** — canonical Rust workspace and the `claw` CLI binary
 - **`USAGE.md`** — task-oriented usage guide for the current product surface
- **`ERROR_HANDLING.md`** — unified error-handling pattern for orchestration code
 - **`PARITY.md`** — Rust-port parity status and migration notes
 - **`ROADMAP.md`** — active roadmap and cleanup backlog
 - **`PHILOSOPHY.md`** — project intent and system-design framing
- **`SCHEMAS.md`** — JSON protocol contract (Python harness reference)
 - **`src/` + `tests/`** — companion Python/reference workspace and audit helpers; not the primary runtime surface

 ## Quick start
--- a/ROADMAP.md
+++ b/ROADMAP.md
--- a/SCHEMAS.md
+++ b/SCHEMAS.md
@@ -1,377 +0,0 @@
-# JSON Envelope Schemas — Clawable CLI Contract
-
-This document locks the field-level contract for all clawable-surface commands. Every command accepting `--output-format json` must conform to the envelope shapes below.
-
-**Target audience:** Claws building orchestrators, automation, or monitoring against claw-code's JSON output.
-
---
-
-## Common Fields (All Envelopes)
-
-Every command response, success or error, carries:
-
-```json
-{
-  "timestamp": "2026-04-22T10:10:00Z",
-  "command": "list-sessions",
-  "exit_code": 0,
-  "output_format": "json",
-  "schema_version": "1.0"
-}
-```
-
-| Field | Type | Required | Notes |
-|---|---|---|---|
-| `timestamp` | ISO 8601 UTC | Yes | Time command completed |
-| `command` | string | Yes | argv[1] (e.g. "list-sessions") |
-| `exit_code` | int (0/1/2) | Yes | 0=success, 1=error/not-found, 2=timeout |
-| `output_format` | string | Yes | Always "json" (for symmetry with text mode) |
-| `schema_version` | string | Yes | "1.0" (bump for breaking changes) |
-
---
-
-## Turn Result Fields (Multi-Turn Sessions)
-
-When a command's response includes a `turn` object (e.g., in `bootstrap` or `turn-loop`), it carries:
-
-| Field | Type | Required | Notes |
-|---|---|---|---|
-| `prompt` | string | Yes | User input for this turn |
-| `output` | string | Yes | Assistant response |
-| `stop_reason` | enum | Yes | One of: `completed`, `timeout`, `cancelled`, `max_budget_reached`, `max_turns_reached` |
-| `cancel_observed` | bool | Yes | #164 Stage B: cancellation was signaled and observed (#161/#164) |
-
---
-
-## Error Envelope
-
-When a command fails (exit code 1), responses carry:
-
-```json
-{
-  "timestamp": "2026-04-22T10:10:00Z",
-  "command": "exec-command",
-  "exit_code": 1,
-  "error": {
-    "kind": "filesystem",
-    "operation": "write",
-    "target": "/tmp/nonexistent/out.md",
-    "retryable": true,
-    "message": "No such file or directory",
-    "hint": "intermediate directory does not exist; try mkdir -p /tmp/nonexistent"
-  }
-}
-```
-
-| Field | Type | Required | Notes |
-|---|---|---|---|
-| `error.kind` | enum | Yes | One of: `filesystem`, `auth`, `session`, `parse`, `runtime`, `mcp`, `delivery`, `usage`, `policy`, `unknown` |
-| `error.operation` | string | Yes | Syscall/method that failed (e.g. "write", "open", "resolve_session") |
-| `error.target` | string | Yes | Resource that failed (path, session-id, server-name, etc.) |
-| `error.retryable` | bool | Yes | Whether caller can safely retry without intervention |
-| `error.message` | string | Yes | Platform error message (e.g. errno text) |
-| `error.hint` | string | No | Optional actionable next step |
-
---
-
-## Not-Found Envelope
-
-When an entity does not exist (exit code 1, but not a failure):
-
-```json
-{
-  "timestamp": "2026-04-22T10:10:00Z",
-  "command": "load-session",
-  "exit_code": 1,
-  "name": "does-not-exist",
-  "found": false,
-  "error": {
-    "kind": "session_not_found",
-    "message": "session 'does-not-exist' not found in .claw/sessions/",
-    "retryable": false
-  }
-}
-```
-
-| Field | Type | Required | Notes |
-|---|---|---|---|
-| `name` | string | Yes | Entity name/id that was looked up |
-| `found` | bool | Yes | Always `false` for not-found |
-| `error.kind` | enum | Yes | One of: `command_not_found`, `tool_not_found`, `session_not_found` |
-| `error.message` | string | Yes | User-visible explanation |
-| `error.retryable` | bool | Yes | Usually `false` (entity will not magically appear) |
-
---
-
-## Per-Command Success Schemas
-
-### `list-sessions`
-
-```json
-{
-  "timestamp": "2026-04-22T10:10:00Z",
-  "command": "list-sessions",
-  "exit_code": 0,
-  "output_format": "json",
-  "schema_version": "1.0",
-  "directory": ".claw/sessions",
-  "sessions_count": 2,
-  "sessions": [
-    {
-      "session_id": "sess_abc123",
-      "created_at": "2026-04-21T15:30:00Z",
-      "last_modified": "2026-04-22T09:45:00Z",
-      "prompt_count": 5,
-      "stopped": false
-    }
-  ]
-}
-```
-
-### `delete-session`
-
-```json
-{
-  "timestamp": "2026-04-22T10:10:00Z",
-  "command": "delete-session",
-  "exit_code": 0,
-  "session_id": "sess_abc123",
-  "deleted": true,
-  "directory": ".claw/sessions"
-}
-```
-
-### `load-session`
-
-```json
-{
-  "timestamp": "2026-04-22T10:10:00Z",
-  "command": "load-session",
-  "exit_code": 0,
-  "session_id": "sess_abc123",
-  "loaded": true,
-  "directory": ".claw/sessions",
-  "path": ".claw/sessions/sess_abc123.jsonl"
-}
-```
-
-### `flush-transcript`
-
-```json
-{
-  "timestamp": "2026-04-22T10:10:00Z",
-  "command": "flush-transcript",
-  "exit_code": 0,
-  "session_id": "sess_abc123",
-  "path": ".claw/sessions/sess_abc123.jsonl",
-  "flushed": true,
-  "messages_count": 12,
-  "input_tokens": 4500,
-  "output_tokens": 1200
-}
-```
-
-### `show-command`
-
-```json
-{
-  "timestamp": "2026-04-22T10:10:00Z",
-  "command": "show-command",
-  "exit_code": 0,
-  "name": "add-dir",
-  "found": true,
-  "source_hint": "commands/add-dir/add-dir.tsx",
-  "responsibility": "creates a new directory in the worktree"
-}
-```
-
-### `show-tool`
-
-```json
-{
-  "timestamp": "2026-04-22T10:10:00Z",
-  "command": "show-tool",
-  "exit_code": 0,
-  "name": "BashTool",
-  "found": true,
-  "source_hint": "tools/BashTool/BashTool.tsx"
-}
-```
-
-### `exec-command`
-
-```json
-{
-  "timestamp": "2026-04-22T10:10:00Z",
-  "command": "exec-command",
-  "exit_code": 0,
-  "name": "add-dir",
-  "prompt": "create src/util/",
-  "handled": true,
-  "message": "created directory",
-  "source_hint": "commands/add-dir/add-dir.tsx"
-}
-```
-
-### `exec-tool`
-
-```json
-{
-  "timestamp": "2026-04-22T10:10:00Z",
-  "command": "exec-tool",
-  "exit_code": 0,
-  "name": "BashTool",
-  "payload": "cargo build",
-  "handled": true,
-  "message": "exit code 0",
-  "source_hint": "tools/BashTool/BashTool.tsx"
-}
-```
-
-### `route`
-
-```json
-{
-  "timestamp": "2026-04-22T10:10:00Z",
-  "command": "route",
-  "exit_code": 0,
-  "prompt": "add a test",
-  "limit": 10,
-  "match_count": 3,
-  "matches": [
-    {
-      "kind": "command",
-      "name": "add-file",
-      "score": 0.92,
-      "source_hint": "commands/add-file/add-file.tsx"
-    }
-  ]
-}
-```
-
-### `bootstrap`
-
-```json
-{
-  "timestamp": "2026-04-22T10:10:00Z",
-  "command": "bootstrap",
-  "exit_code": 0,
-  "prompt": "hello",
-  "setup": {
-    "python_version": "3.13.12",
-    "implementation": "CPython",
-    "platform_name": "darwin",
-    "test_command": "pytest"
-  },
-  "routed_matches": [
-    {"kind": "command", "name": "init", "score": 0.85, "source_hint": "..."}
-  ],
-  "turn": {
-    "prompt": "hello",
-    "output": "...",
-    "stop_reason": "completed"
-  },
-  "persisted_session_path": ".claw/sessions/sess_abc.jsonl"
-}
-```
-
-### `command-graph`
-
-```json
-{
-  "timestamp": "2026-04-22T10:10:00Z",
-  "command": "command-graph",
-  "exit_code": 0,
-  "builtins_count": 185,
-  "plugin_like_count": 20,
-  "skill_like_count": 2,
-  "total_count": 207,
-  "builtins": [
-    {"name": "add-dir", "source_hint": "commands/add-dir/add-dir.tsx"}
-  ],
-  "plugin_like": [],
-  "skill_like": []
-}
-```
-
-### `tool-pool`
-
-```json
-{
-  "timestamp": "2026-04-22T10:10:00Z",
-  "command": "tool-pool",
-  "exit_code": 0,
-  "simple_mode": false,
-  "include_mcp": true,
-  "tool_count": 184,
-  "tools": [
-    {"name": "BashTool", "source_hint": "tools/BashTool/BashTool.tsx"}
-  ]
-}
-```
-
-### `bootstrap-graph`
-
-```json
-{
-  "timestamp": "2026-04-22T10:10:00Z",
-  "command": "bootstrap-graph",
-  "exit_code": 0,
-  "stages": ["stage 1", "stage 2", "..."],
-  "note": "bootstrap-graph is markdown-only in this version"
-}
-```
-
---
-
-## Versioning & Compatibility
-
- **schema_version = "1.0":** Current as of 2026-04-22. Covers all 13 clawable commands.
- **Breaking changes** (e.g. renaming a field) bump schema_version to "2.0".
- **Additive changes** (e.g. new optional field) stay at "1.0" and are backward compatible.
- Downstream claws **must** check `schema_version` before relying on field presence.
-
---
-
-## Regression Testing
-
-Each command is covered by:
-1. **Fixture file** (golden JSON snapshot under `tests/fixtures/json/<command>.json`)
-2. **Parametrised test** in `test_cli_parity_audit.py::TestJsonOutputContractEndToEnd`
-3. **Field consistency test** (new, tracked as ROADMAP #172)
-
-To update a fixture after an intentional schema change:
-```bash
-claw <command> --output-format json <args> > tests/fixtures/json/<command>.json
-# Review the diff, commit
-git add tests/fixtures/json/<command>.json
-```
-
-To verify no regressions:
-```bash
-cargo test --release test_json_envelope_field_consistency
-```
-
---
-
-## Design Notes
-
-**Why common fields on every response?**
- Downstream claws can build one error handler that works for all commands
- Timestamp + command + exit_code give context without scraping argv or timestamps from command output
- `schema_version` signals compatibility for future upgrades
-
-**Why both "found" and "error" on not-found?**
- Exit code 1 covers both "entity missing" and "operation failed"
- `found=false` distinguishes not-found from error without string matching
- `error.kind` and `error.retryable` let automation decide: retry a temporary miss vs escalate a permanent refusal
-
-**Why "operation" and "target" in error?**
- Claws can aggregate failures by operation type (e.g. "how many `write` ops failed?")
- Claws can implement per-target retry policy (e.g. "skip missing files, retry networking")
- Pure text errors ("No such file") do not provide enough structure for pattern matching
-
-**Why "handled" vs "found"?**
- `show-command` reports `found: bool` (inventory signal: "does this exist?")
- `exec-command` reports `handled: bool` (operational signal: "was this work performed?")
- The names matter: a command can be found but not handled (e.g. too large for context window), or handled silently (no output message)
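
Reviewer sketch: the design notes in the removed `SCHEMAS.md` argue that common fields let a downstream claw build one handler for all commands. The Python below illustrates that idea under stated assumptions. `classify_envelope` and its return labels are hypothetical. The field names (`exit_code`, `found`, `error.retryable`, `schema_version`) and the exit-code meanings come from the schema tables. Because the example error envelopes omit `schema_version`, this sketch tolerates its absence, which is an assumption and not part of the contract.

```python
import json

# Assumption: any "1.x" version is compatible, since the document says
# additive changes stay at "1.0" and breaking changes bump to "2.0".
SUPPORTED_MAJOR = "1"


def classify_envelope(raw: str) -> str:
    """Classify one JSON envelope without inspecting the command name."""
    envelope = json.loads(raw)

    version = envelope.get("schema_version")
    if version is not None and str(version).split(".")[0] != SUPPORTED_MAJOR:
        return "unsupported_schema"

    exit_code = envelope["exit_code"]
    if exit_code == 0:
        return "success"
    if exit_code == 2:
        return "timeout"

    # Exit code 1 covers both "entity missing" and "operation failed";
    # found=false separates the two without string matching.
    if envelope.get("found") is False:
        return "not_found"

    error = envelope.get("error", {})
    return "retry" if error.get("retryable") else "escalate"
```

Applied to the document's own examples, the filesystem error envelope (with `retryable: true`) classifies as `retry` and the `load-session` not-found envelope classifies as `not_found`.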
--- a/USAGE.md
+++ b/USAGE.md
@@ -2,9 +2,6 @@

 This guide covers the current Rust workspace under `rust/` and the `claw` CLI binary. If you are brand new, make the doctor health check your first run: start `claw`, then run `/doctor`.

-> [!TIP]
-> **Building orchestration code that calls `claw` as a subprocess?** See [`ERROR_HANDLING.md`](./ERROR_HANDLING.md) for the unified error-handling pattern (one handler for all 14 clawable commands, exit codes, JSON envelope contract, and recovery strategies).
-
 ## Quick-start health check

 Run this before prompts, sessions, or automation:
@@ -98,17 +95,11 @@ cd rust

 ### JSON output for scripting

-All clawable commands support `--output-format json` for machine-readable output. Every invocation returns a consistent JSON envelope with `exit_code`, `command`, `timestamp`, and either `{success fields}` or `{error: {kind, message, ...}}`.
-
 ```bash
 cd rust
 ./target/debug/claw --output-format json prompt "status"
-./target/debug/claw --output-format json load-session my-session-id
-./target/debug/claw --output-format json turn-loop "analyze logs" --max-turns 1
 ```

-**Building a dispatcher or orchestration script?** See [`ERROR_HANDLING.md`](./ERROR_HANDLING.md) for the unified error-handling pattern. One code example works for all 14 clawable commands: parse the exit code, classify by `error.kind`, apply recovery strategies (retry, timeout recovery, validation, logging). Use that pattern instead of reimplementing error handling per command.
-
 ### Inspect worker state

 The `claw state` command reads `.claw/worker-state.json`, which is written by the interactive REPL or a one-shot prompt when a worker executes a task. This file contains the worker ID, session reference, model, and permission mode.
--- a/progress.txt
+++ b/progress.txt
@@ -74,6 +74,18 @@ US-007 COMPLETE (Phase 5 - Plugin/MCP lifecycle maturity)
 - DegradedMode behavior
 - Tests: 11 unit tests passing

+
+Iteration 2026-04-27 - ROADMAP #200 COMPLETED
+------------------------------------------------
+- Selected next actionable backlog item because no active task was in progress.
+- ROADMAP #200: Interactive MCP/tool permission prompts are invisible blockers.
+- Files: rust/crates/runtime/src/worker_boot.rs, rust/crates/runtime/src/recovery_recipes.rs, ROADMAP.md, progress.txt.
+- Added tool_permission_required worker status and event classification for interactive MCP/tool permission gates.
+- Added structured ToolPermissionPrompt payload with server/tool identity and prompt preview.
+- Startup evidence now records tool_permission_prompt_detected and classifies timeout evidence as tool_permission_required.
+- Readiness snapshots now mark tool-permission-gated workers as blocked, not ready/idle.
+- Tests: targeted tool_permission regressions, full runtime test/clippy/fmt pending in Ralph verification loop.
+
 VERIFICATION STATUS:
 ------------------
 - cargo build --workspace: PASSED
@@ -108,6 +120,29 @@ US-010 COMPLETED (Add model compatibility documentation)
 - Cross-referenced with existing code comments in openai_compat.rs
 - cargo clippy passes

+Iteration 3: 2026-04-16
+------------------------
+
+US-012 COMPLETED (Trust prompt resolver with allowlist auto-trust)
+- Files: rust/crates/runtime/src/trust_resolver.rs
+- Enhanced TrustConfig with pattern matching and serde support:
+  - TrustAllowlistEntry struct with pattern, worktree_pattern, description
+  - TrustResolution enum (AutoAllowlisted, ManualApproval)
+  - Enhanced TrustEvent variants with serde tags and metadata
+  - Glob pattern matching with * and ? wildcards
+  - Support for path prefix matching and worktree patterns
+- Updated TrustResolver with new resolve() signature:
+  - Added worktree parameter for worktree pattern matching
+  - Proper event emission with TrustResolution
+  - Manual approval detection from screen text
+- Added helper functions:
+  - extract_repo_name() - extracts repo name from path
+  - detect_manual_approval() - detects manual trust from screen text
+  - glob_matches() - recursive backtracking glob matcher
+- Tests: 25 new tests for pattern matching, serialization, and resolver behavior
+- All 483 runtime tests pass
+- cargo clippy passes with no warnings
+
 US-011 COMPLETED (Performance optimization: reduce API request serialization overhead)
 - Files:
  - rust/crates/api/Cargo.toml (added criterion dev-dependency and bench config)
@@ -131,3 +166,213 @@ US-011 COMPLETED (Performance optimization: reduce API request serialization ove
  - is_reasoning_model detection: ~26-42ns depending on model
 - All tests pass (119 unit tests + 29 integration tests)
 - cargo clippy passes
+
+VERIFICATION STATUS (Iteration 3):
+----------------------------------
+- cargo build --workspace: PASSED
+- cargo test --workspace: PASSED (891+ tests)
+- cargo clippy --workspace --all-targets -- -D warnings: PASSED
+- cargo fmt -- --check: PASSED
+
+All 12 stories from prd.json now have passes: true
+- US-001 through US-007: Pre-existing implementations
+- US-008: kimi-k2.5 model API compatibility fix
+- US-009: Unit tests for kimi model compatibility
+- US-010: Model compatibility documentation
+- US-011: Performance optimization with criterion benchmarks
+- US-012: Trust prompt resolver with allowlist auto-trust
+
+Iteration 4: 2026-04-16
+------------------------
+
+US-013 COMPLETED (Phase 2 - Session event ordering + terminal-state reconciliation)
+- Files: rust/crates/runtime/src/lane_events.rs
+- Added EventTerminality enum (Terminal, Advisory, Uncertainty)
+- Added classify_event_terminality() function for event classification
+- Added reconcile_terminal_events() function for deterministic event ordering:
+  - Sorts events by monotonic sequence number
+  - Deduplicates terminal events by fingerprint
+  - Detects transport death uncertainty (terminal + transport death)
+  - Handles out-of-order event bursts
+- Added events_materially_differ() for detecting meaningful differences
+- Added 8 comprehensive tests for reconciliation logic:
+  - reconcile_terminal_events_sorts_by_monotonic_sequence
+  - reconcile_terminal_events_deduplicates_same_fingerprint
+  - reconcile_terminal_events_detects_transport_death_uncertainty
+  - reconcile_terminal_events_handles_completed_idle_error_completed_noise
+  - reconcile_terminal_events_returns_none_for_empty_input
+  - reconcile_terminal_events_preserves_advisory_events
+  - events_materially_differ_detects_real_differences
+  - classify_event_terminality_correctly_classifies
+- Fixed test compilation issues with LaneEventBuilder API
+
+VERIFICATION STATUS (Iteration 4):
+----------------------------------
+- cargo build --workspace: PASSED
+- cargo test --workspace: PASSED (891+ tests)
+- cargo clippy --workspace --all-targets -- -D warnings: PASSED
+- cargo fmt -- --check: PASSED
+
+US-013 marked passes: true in prd.json
+
+US-014 COMPLETED (Phase 2 - Event provenance / environment labeling)
+- Files: rust/crates/runtime/src/lane_events.rs
+- Added ConfidenceLevel enum (High, Medium, Low, Unknown)
+- Added fields to LaneEventMetadata:
+  - environment_label: Option<String> - environment/channel (production, staging, dev)
+  - emitter_identity: Option<String> - emitter (clawd, plugin-name, operator-id)
+  - confidence_level: Option<ConfidenceLevel> - trust level for automation
+- Added builder methods: with_environment(), with_emitter(), with_confidence()
+- Added filtering functions:
+  - filter_by_provenance() - select events by source
+  - filter_by_environment() - select events by environment label
+  - filter_by_confidence() - select events above confidence threshold
+  - is_test_event() - check if synthetic source (test, healthcheck, replay)
+  - is_live_lane_event() - check if production event
+- Added 7 comprehensive tests for US-014:
+  - confidence_level_round_trips_through_serialization
+  - filter_by_provenance_selects_only_matching_events
+  - filter_by_environment_selects_only_matching_environment
+  - filter_by_confidence_selects_events_above_threshold
+  - is_test_event_detects_synthetic_sources
+  - is_live_lane_event_detects_production_events
+  - lane_event_metadata_includes_us014_fields
+
+US-016 COMPLETED (Phase 2 - Duplicate terminal-event suppression)
+- Files: rust/crates/runtime/src/lane_events.rs
+- Event fingerprinting already implemented via compute_event_fingerprint()
+- Fingerprint attached via LaneEventMetadata.event_fingerprint
+- Deduplication via dedupe_terminal_events() - returns first occurrence of each fingerprint
+- Raw event history preserved separately from deduplicated actionable events
+- Material difference detection via events_materially_differ():
+  - Different event type (Finished vs Failed) is material
+  - Different status is material
+  - Different failure class is material
+  - Different data payload is material
+- Reconcile function surfaces latest terminal event when materially different
+- Added 5 comprehensive tests for US-016:
+  - canonical_terminal_event_fingerprint_attached_to_metadata
+  - dedupe_terminal_events_suppresses_repeated_fingerprints
+  - dedupe_preserves_raw_event_history_separately
+  - events_materially_differ_detects_payload_differences
+  - reconcile_terminal_events_surfaces_latest_when_different
+
+US-017 COMPLETED (Phase 2 - Lane ownership / scope binding)
+- Files: rust/crates/runtime/src/lane_events.rs
+- LaneOwnership struct already existed with:
+  - owner: String - owner/assignee identity
+  - workflow_scope: String - workflow scope (claw-code-dogfood, etc.)
+  - watcher_action: WatcherAction - Act, Observe, Ignore
+- Ownership preserved through lifecycle via with_ownership() builder method
+- All lifecycle events (Started -> Ready -> Finished) preserve ownership
+- Added 3 comprehensive tests for US-017:
+  - lane_ownership_attached_to_metadata
+  - lane_ownership_preserved_through_lifecycle_events
+  - lane_ownership_watcher_action_variants
+
+US-015 COMPLETED (Phase 2 - Session identity completeness at creation time)
+- Files: rust/crates/runtime/src/lane_events.rs
+- SessionIdentity struct already existed with:
+  - title: String - stable title for the session
+  - workspace: String - workspace/worktree path
+  - purpose: String - lane/session purpose
+  - placeholder_reason: Option<String> - reason for placeholder values
+- Added reconcile_enriched() method for updating session identity:
+  - Updates title/workspace/purpose with newly available data
+  - Clears placeholder_reason when real values are provided
+  - Preserves existing values for fields not being updated
+  - Allows incremental enrichment without ambiguity
+- Added 2 comprehensive tests:
+  - session_identity_reconcile_enriched_updates_fields
+  - session_identity_reconcile_preserves_placeholder_if_no_new_data
+
+US-018 COMPLETED (Phase 2 - Nudge acknowledgment / dedupe contract)
+- Files: rust/crates/runtime/src/lane_events.rs
+- Added NudgeTracking struct:
+  - nudge_id: String - unique nudge identifier
+  - delivered_at: String - timestamp of delivery
+  - acknowledged: bool - whether acknowledged
+  - acknowledged_at: Option<String> - when acknowledged
+  - is_retry: bool - whether this is a retry
+  - original_nudge_id: Option<String> - original ID if retry
+- Added NudgeClassification enum (New, Retry, StaleDuplicate)
+- Added classify_nudge() function for deduplication logic
+- Added 6 comprehensive tests for US-018
+
+US-019 COMPLETED (Phase 2 - Stable roadmap-id assignment)
+- Files: rust/crates/runtime/src/lane_events.rs
+- Added RoadmapId struct:
+  - id: String - canonical unique identifier
+  - filed_at: String - timestamp when filed
+  - is_new_filing: bool - new vs update
+  - supersedes: Option<String> - lineage for supersedes
+- Added builder methods: new_filing(), update(), supersedes()
+- Added 3 comprehensive tests for US-019
+
+US-020 COMPLETED (Phase 2 - Roadmap item lifecycle state contract)
+- Files: rust/crates/runtime/src/lane_events.rs
+- Added RoadmapLifecycleState enum (Filed, Acknowledged, InProgress, Blocked, Done, Superseded)
+- Added RoadmapLifecycle struct:
+  - state: RoadmapLifecycleState - current state
+  - state_changed_at: String - last transition timestamp
+  - filed_at: String - original filing timestamp
+  - lineage: Vec<String> - supersession chain
+- Added methods: new_filed(), transition(), superseded_by(), is_terminal(), is_active()
+- Added 5 comprehensive tests for US-020
+
+VERIFICATION STATUS (Iteration 7):
+----------------------------------
+- cargo build --workspace: PASSED
+- cargo test --workspace: PASSED (891+ tests)
+- cargo clippy --workspace --all-targets -- -D warnings: PASSED
+- cargo fmt -- --check: PASSED
+
+US-013 through US-015 and US-018 through US-020 now marked passes: true
+
+FINAL VERIFICATION (All 20 Stories Complete):
+------------------------------------------------
+- cargo build --workspace: PASSED
+- cargo test --workspace: PASSED (119+ API tests, 39 runtime tests, 12 integration tests)
+- cargo clippy --workspace --all-targets -- -D warnings: PASSED
+- cargo fmt -- --check: PASSED
+
+ALL 20 STORIES FROM PRD COMPLETE:
+- US-001 through US-012: Pre-existing implementations (verified working)
+- US-013: Session event ordering + terminal-state reconciliation
+- US-014: Event provenance / environment labeling
+- US-015: Session identity completeness at creation time
+- US-016: Duplicate terminal-event suppression
+- US-017: Lane ownership / scope binding
+- US-018: Nudge acknowledgment / dedupe contract
+- US-019: Stable roadmap-id assignment
+- US-020: Roadmap item lifecycle state contract
+
+Iteration 8: 2026-04-16
+------------------------
+
+US-021 COMPLETED (Request body size pre-flight check - from dogfood findings)
+- Files:
+  - rust/crates/api/src/error.rs (new error variant)
+  - rust/crates/api/src/providers/openai_compat.rs
+- Added RequestBodySizeExceeded error variant with actionable message
+- Added max_request_body_bytes to OpenAiCompatConfig:
+  - DashScope: 6MB (6_291_456 bytes) - from dogfood with kimi-k2.5
+  - OpenAI: 100MB (104_857_600 bytes)
+  - xAI: 50MB (52_428_800 bytes)
+- Added estimate_request_body_size() for pre-flight checks
+- Added check_request_body_size() for validation
+- Pre-flight check integrated in send_raw_request()
+- Tests: 5 new tests for size estimation and limit checking
+
+PROJECT STATUS: COMPLETE (21/21 stories)
+
+Iteration 2026-04-29 - ROADMAP #96 COMPLETED
+------------------------------------------------
+- Pulled origin/main: already up to date.
+- Selected ROADMAP #96 as a small repo-local Immediate Backlog item: the `claw --help` Resume-safe command summary leaked slash-command stubs despite the main Interactive command listing filtering them.
+- Files: rust/crates/rusty-claude-cli/src/main.rs, ROADMAP.md, progress.txt.
+- Changed help rendering to filter `resume_supported_slash_commands()` through `STUB_COMMANDS` before building the Resume-safe one-liner.
+- Added `stub_commands_absent_from_resume_safe_help` regression coverage so future stub additions cannot leak into the Resume-safe summary.
+- Targeted verification: `cargo test -p rusty-claude-cli stub_commands_absent_from_resume_safe_help -- --nocapture` passed; `cargo test -p rusty-claude-cli parses_direct_cli_actions -- --nocapture` passed.
+- Format/check verification: `cargo fmt --all --check`, `git diff --check`, and `cargo check -p rusty-claude-cli` passed.
+- Broader clippy note: `cargo clippy -p rusty-claude-cli --all-targets -- -D warnings` is blocked by pre-existing `clippy::unnecessary_wraps` failures in `rust/crates/commands/src/lib.rs` (`render_mcp_report_for`, `render_mcp_report_json_for`), outside this diff.
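
The US-021 pre-flight check recorded in the progress log above can be sketched roughly as follows. This is a minimal illustration, not the code in `openai_compat.rs`: the byte limits are the ones quoted in the log, but `ProviderLimits`, the error struct's fields, and the function signature are assumptions made for the example, as is the choice to allow a body exactly at the limit.

```rust
use std::fmt;

/// Hypothetical stand-in for the `RequestBodySizeExceeded` error variant
/// named in the log; the real variant's fields are not shown in this diff.
#[derive(Debug, PartialEq, Eq)]
struct RequestBodySizeExceeded {
    provider: &'static str,
    estimated_bytes: usize,
    max_bytes: usize,
}

impl fmt::Display for RequestBodySizeExceeded {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "request body for {} is about {} bytes, above the {} byte limit; reduce context and retry",
            self.provider, self.estimated_bytes, self.max_bytes
        )
    }
}

/// Hypothetical, trimmed-down view of a provider config.
struct ProviderLimits {
    name: &'static str,
    max_request_body_bytes: usize,
}

// Limits as quoted in the progress log.
const DASHSCOPE: ProviderLimits = ProviderLimits { name: "dashscope", max_request_body_bytes: 6_291_456 };
const OPENAI: ProviderLimits = ProviderLimits { name: "openai", max_request_body_bytes: 104_857_600 };
const XAI: ProviderLimits = ProviderLimits { name: "xai", max_request_body_bytes: 52_428_800 };

/// Reject an oversized body before any network call is made.
/// Assumption: a body exactly at the limit is still accepted.
fn check_request_body_size(
    limits: &ProviderLimits,
    body: &[u8],
) -> Result<(), RequestBodySizeExceeded> {
    let estimated_bytes = body.len();
    if estimated_bytes > limits.max_request_body_bytes {
        return Err(RequestBodySizeExceeded {
            provider: limits.name,
            estimated_bytes,
            max_bytes: limits.max_request_body_bytes,
        });
    }
    Ok(())
}
```

Checking the serialized length up front turns an opaque provider-side rejection into a local error that names the limit that was hit.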
--- a/rust/CLAUDE.md
+++ b/rust/CLAUDE.md
@@ -7,7 +7,8 @@ This file provides guidance to Claw Code (clawcode.dev) when working with code i
 - Frameworks: none detected from the supported starter markers.

 ## Verification
-- Run Rust verification from the repo root: `cargo fmt`, `cargo clippy --workspace --all-targets -- -D warnings`, `cargo test --workspace`
+- From the repository root, run Rust formatting with `scripts/fmt.sh` (or `scripts/fmt.sh --check` for CI-style checks). From this `rust/` directory, the equivalent command is `../scripts/fmt.sh`. Root-level `cargo fmt --manifest-path rust/Cargo.toml` is not the supported formatting command.
+- From this `rust/` directory, run Rust verification with `cargo clippy --workspace --all-targets -- -D warnings` and `cargo test --workspace`.

 ## Working agreement
 - Prefer small, reviewable changes and keep generated bootstrap files aligned with actual repo workflows.
--- a/rust/crates/api/src/providers/mod.rs
+++ b/rust/crates/api/src/providers/mod.rs
@@ -753,14 +753,14 @@ mod tests {
    #[test]
    fn returns_context_window_metadata_for_kimi_models() {
        // kimi-k2.5
-        let k25_limit = model_token_limit("kimi-k2.5")
-            .expect("kimi-k2.5 should have token limit metadata");
+        let k25_limit =
+            model_token_limit("kimi-k2.5").expect("kimi-k2.5 should have token limit metadata");
        assert_eq!(k25_limit.max_output_tokens, 16_384);
        assert_eq!(k25_limit.context_window_tokens, 256_000);

        // kimi-k1.5
-        let k15_limit = model_token_limit("kimi-k1.5")
-            .expect("kimi-k1.5 should have token limit metadata");
+        let k15_limit =
+            model_token_limit("kimi-k1.5").expect("kimi-k1.5 should have token limit metadata");
        assert_eq!(k15_limit.max_output_tokens, 16_384);
        assert_eq!(k15_limit.context_window_tokens, 256_000);
    }
@@ -768,11 +768,13 @@ mod tests {
    #[test]
    fn kimi_alias_resolves_to_kimi_k25_token_limits() {
        // The "kimi" alias resolves to "kimi-k2.5" via resolve_model_alias()
-        let alias_limit = model_token_limit("kimi")
-            .expect("kimi alias should resolve to kimi-k2.5 limits");
-        let direct_limit = model_token_limit("kimi-k2.5")
-            .expect("kimi-k2.5 should have limits");
-        assert_eq!(alias_limit.max_output_tokens, direct_limit.max_output_tokens);
+        let alias_limit =
+            model_token_limit("kimi").expect("kimi alias should resolve to kimi-k2.5 limits");
+        let direct_limit = model_token_limit("kimi-k2.5").expect("kimi-k2.5 should have limits");
+        assert_eq!(
+            alias_limit.max_output_tokens,
+            direct_limit.max_output_tokens
+        );
        assert_eq!(
            alias_limit.context_window_tokens,
            direct_limit.context_window_tokens
--- a/rust/crates/api/src/providers/openai_compat.rs
+++ b/rust/crates/api/src/providers/openai_compat.rs
@@ -2195,9 +2195,16 @@ mod tests {

    #[test]
    fn provider_specific_size_limits_are_correct() {
-        assert_eq!(OpenAiCompatConfig::dashscope().max_request_body_bytes, 6_291_456); // 6MB
-        assert_eq!(OpenAiCompatConfig::openai().max_request_body_bytes, 104_857_600); // 100MB
-        assert_eq!(OpenAiCompatConfig::xai().max_request_body_bytes, 52_428_800); // 50MB
+        assert_eq!(
+            OpenAiCompatConfig::dashscope().max_request_body_bytes,
+            6_291_456
+        ); // 6MB
+        assert_eq!(
+            OpenAiCompatConfig::openai().max_request_body_bytes,
+            104_857_600
+        ); // 100MB
+        assert_eq!(OpenAiCompatConfig::xai().max_request_body_bytes, 52_428_800);
+        // 50MB
    }

    #[test]
--- a/rust/crates/commands/src/lib.rs
+++ b/rust/crates/commands/src/lib.rs
@@ -2623,10 +2623,8 @@ fn render_mcp_report_json_for(
            // runs, the existing serializer adds `status: "ok"` below.
            match loader.load() {
                Ok(runtime_config) => {
-                    let mut value = render_mcp_summary_report_json(
-                        cwd,
-                        runtime_config.mcp().servers(),
-                    );
+                    let mut value =
+                        render_mcp_summary_report_json(cwd, runtime_config.mcp().servers());
                    if let Some(map) = value.as_object_mut() {
                        map.insert("status".to_string(), Value::String("ok".to_string()));
                        map.insert("config_load_error".to_string(), Value::Null);
--- a/rust/crates/runtime/src/bash.rs
+++ b/rust/crates/runtime/src/bash.rs
@@ -122,7 +122,7 @@ fn detect_and_emit_ship_prepared(command: &str) {
            actor: get_git_actor().unwrap_or_else(|| "unknown".to_string()),
            pr_number: None,
        };
-        let _event = LaneEvent::ship_prepared(format!("{}", now), &provenance);
+        let _event = LaneEvent::ship_prepared(format!("{now}"), &provenance);
        // Log to stderr as interim routing before event stream integration
        eprintln!(
            "[ship.prepared] branch={} -> main, commits={}, actor={}",
@@ -172,7 +172,7 @@ async fn execute_bash_async(
 ) -> io::Result<BashCommandOutput> {
    // Detect and emit ship provenance for git push operations
    detect_and_emit_ship_prepared(&input.command);
-    
+
    let mut command = prepare_tokio_command(&input.command, &cwd, &sandbox_status, true);

    let output_result = if let Some(timeout_ms) = input.timeout {
--- a/rust/crates/runtime/src/lane_events.rs
+++ b/rust/crates/runtime/src/lane_events.rs
--- a/rust/crates/runtime/src/recovery_recipes.rs
+++ b/rust/crates/runtime/src/recovery_recipes.rs
@@ -45,7 +45,9 @@ impl FailureScenario {
    #[must_use]
    pub fn from_worker_failure_kind(kind: WorkerFailureKind) -> Self {
        match kind {
-            WorkerFailureKind::TrustGate => Self::TrustPromptUnresolved,
+            WorkerFailureKind::TrustGate | WorkerFailureKind::ToolPermissionGate => {
+                Self::TrustPromptUnresolved
+            }
            WorkerFailureKind::PromptDelivery => Self::PromptMisdelivery,
            WorkerFailureKind::Protocol => Self::McpHandshakeFailure,
            WorkerFailureKind::Provider | WorkerFailureKind::StartupNoEvidence => {
--- a/rust/crates/runtime/src/session_control.rs
+++ b/rust/crates/runtime/src/session_control.rs
@@ -58,8 +58,8 @@ impl SessionStore {
        let workspace_root = workspace_root.as_ref();
        // #151: canonicalize workspace_root for consistent fingerprinting
        // across equivalent path representations.
-        let canonical_workspace = fs::canonicalize(workspace_root)
-            .unwrap_or_else(|_| workspace_root.to_path_buf());
+        let canonical_workspace =
+            fs::canonicalize(workspace_root).unwrap_or_else(|_| workspace_root.to_path_buf());
        let sessions_root = data_dir
            .as_ref()
            .join("sessions")
@@ -158,10 +158,9 @@ impl SessionStore {
    }

    pub fn latest_session(&self) -> Result<ManagedSessionSummary, SessionControlError> {
-        self.list_sessions()?
-            .into_iter()
-            .next()
-            .ok_or_else(|| SessionControlError::Format(format_no_managed_sessions(&self.sessions_root)))
+        self.list_sessions()?.into_iter().next().ok_or_else(|| {
+            SessionControlError::Format(format_no_managed_sessions(&self.sessions_root))
+        })
    }

    pub fn load_session(
--- a/rust/crates/runtime/src/trust_resolver.rs
+++ b/rust/crates/runtime/src/trust_resolver.rs
@@ -1,5 +1,7 @@
 use std::path::{Path, PathBuf};

+use serde::{Deserialize, Serialize};
+
 const TRUST_PROMPT_CUES: &[&str] = &[
    "do you trust the files in this folder",
    "trust the files in this folder",
@@ -8,24 +10,121 @@ const TRUST_PROMPT_CUES: &[&str] = &[
    "yes, proceed",
 ];

-#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+/// Resolution method for trust decisions.
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "snake_case")]
 pub enum TrustPolicy {
+    /// Automatically trust this path (allowlisted)
    AutoTrust,
+    /// Require manual approval
    RequireApproval,
+    /// Deny trust for this path
    Deny,
 }

-#[derive(Debug, Clone, PartialEq, Eq)]
+/// Events emitted during trust resolution lifecycle.
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(tag = "type", rename_all = "snake_case")]
 pub enum TrustEvent {
-    TrustRequired { cwd: String },
-    TrustResolved { cwd: String, policy: TrustPolicy },
-    TrustDenied { cwd: String, reason: String },
+    /// Trust prompt was detected and is required
+    TrustRequired {
+        /// Current working directory where trust is needed
+        cwd: String,
+        /// Optional repo identifier
+        #[serde(skip_serializing_if = "Option::is_none")]
+        repo: Option<String>,
+        /// Optional worktree path
+        #[serde(skip_serializing_if = "Option::is_none")]
+        worktree: Option<String>,
+    },
+    /// Trust was resolved (granted)
+    TrustResolved {
+        /// Current working directory
+        cwd: String,
+        /// The policy that was applied
+        policy: TrustPolicy,
+        /// How the trust was resolved
+        resolution: TrustResolution,
+    },
+    /// Trust was denied
+    TrustDenied {
+        /// Current working directory
+        cwd: String,
+        /// Reason for denial
+        reason: String,
+    },
 }

-#[derive(Debug, Clone, Default)]
+/// How trust was resolved.
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "snake_case")]
+pub enum TrustResolution {
+    /// Automatically granted due to allowlist
+    AutoAllowlisted,
+    /// Manually approved by user
+    ManualApproval,
+}
+
+/// Entry in the trust allowlist with pattern matching support.
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+pub struct TrustAllowlistEntry {
+    /// Repository path or glob pattern to match
+    pub pattern: String,
+    /// Optional worktree subpath pattern
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub worktree_pattern: Option<String>,
+    /// Human-readable description of why this is allowlisted
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub description: Option<String>,
+}
+
+impl TrustAllowlistEntry {
+    #[must_use]
+    pub fn new(pattern: impl Into<String>) -> Self {
+        Self {
+            pattern: pattern.into(),
+            worktree_pattern: None,
+            description: None,
+        }
+    }
+
+    #[must_use]
+    pub fn with_worktree_pattern(mut self, pattern: impl Into<String>) -> Self {
+        self.worktree_pattern = Some(pattern.into());
+        self
+    }
+
+    #[must_use]
+    pub fn with_description(mut self, desc: impl Into<String>) -> Self {
+        self.description = Some(desc.into());
+        self
+    }
+}
+
+/// Configuration for trust resolution with allowlist/denylist support.
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
 pub struct TrustConfig {
-    allowlisted: Vec<PathBuf>,
-    denied: Vec<PathBuf>,
+    /// Allowlisted paths with pattern matching
+    pub allowlisted: Vec<TrustAllowlistEntry>,
+    /// Denied paths (exact or prefix matches)
+    pub denied: Vec<PathBuf>,
+    /// Whether to emit events for trust decisions
+    #[serde(default = "default_emit_events")]
+    pub emit_events: bool,
+}
+
+fn default_emit_events() -> bool {
+    true
+}
+
+impl Default for TrustConfig {
+    fn default() -> Self {
+        Self {
+            allowlisted: Vec::new(),
+            denied: Vec::new(),
+            emit_events: true,
+        }
+    }
 }

 impl TrustConfig {
@@ -35,8 +134,14 @@ impl TrustConfig {
    }

    #[must_use]
-    pub fn with_allowlisted(mut self, path: impl Into<PathBuf>) -> Self {
-        self.allowlisted.push(path.into());
+    pub fn with_allowlisted(mut self, path: impl Into<String>) -> Self {
+        self.allowlisted.push(TrustAllowlistEntry::new(path));
+        self
+    }
+
+    #[must_use]
+    pub fn with_allowlisted_entry(mut self, entry: TrustAllowlistEntry) -> Self {
+        self.allowlisted.push(entry);
        self
    }

@@ -45,6 +150,147 @@ impl TrustConfig {
        self.denied.push(path.into());
        self
    }
+
+    /// Check if a path matches an allowlisted entry using glob patterns.
+    #[must_use]
+    pub fn is_allowlisted(
+        &self,
+        cwd: &str,
+        worktree: Option<&str>,
+    ) -> Option<&TrustAllowlistEntry> {
+        self.allowlisted.iter().find(|entry| {
+            let path_matches = Self::pattern_matches(&entry.pattern, cwd);
+            if !path_matches {
+                return false;
+            }
+
+            match (&entry.worktree_pattern, worktree) {
+                (Some(wt_pattern), Some(wt)) => Self::pattern_matches(wt_pattern, wt),
+                (Some(_), None) => false,
+                (None, _) => true,
+            }
+        })
+    }
+
+    /// Match a pattern against a path string.
+    /// Supports exact matching and glob patterns (* and ?).
+    fn pattern_matches(pattern: &str, path: &str) -> bool {
+        let pattern = pattern.trim();
+        let path = path.trim();
+
+        // Exact match
+        if pattern == path {
+            return true;
+        }
+
+        // Normalize paths for comparison
+        let pattern_normalized = pattern.replace("//", "/");
+        let path_normalized = path.replace("//", "/");
+
+        // Check if pattern is a path prefix (e.g., "/tmp/worktrees" matches "/tmp/worktrees/repo-a")
+        // This handles the common case of directory containment
+        if !pattern_normalized.contains('*') && !pattern_normalized.contains('?') {
+            // Prefix match: pattern is a directory that contains path
+            if path_normalized.starts_with(&pattern_normalized) {
+                let rest = &path_normalized[pattern_normalized.len()..];
+                // Must be exact match or continue with /
+                return rest.is_empty() || rest.starts_with('/');
+            }
+        }
+
+        // Check if pattern ends with wildcard (prefix match)
+        if pattern_normalized.ends_with("/*") {
+            let prefix = pattern_normalized.trim_end_matches("/*");
+            if let Some(rest) = path_normalized.strip_prefix(prefix) {
+                // Must either be exact match or continue with /
+                return rest.is_empty() || rest.starts_with('/');
+            }
+        } else if pattern_normalized.ends_with('*') && !pattern_normalized.contains("/*/") {
+            // Simple trailing * (not a path component wildcard)
+            let prefix = pattern_normalized.trim_end_matches('*');
+            if let Some(rest) = path_normalized.strip_prefix(prefix) {
+                return rest.is_empty() || !rest.starts_with('/');
+            }
+        }
+
+        // Check if pattern is a path component match (bounded by /)
+        if path_normalized
+            .split('/')
+            .any(|component| component == pattern_normalized)
+        {
+            return true;
+        }
+
+        // Check if pattern appears as a substring within a path component
+        // (e.g., "repo" matches "/tmp/worktrees/repo-a")
+        if path_normalized
+            .split('/')
+            .any(|component| component.contains(&pattern_normalized))
+        {
+            return true;
+        }
+
+        // Glob matching for patterns with ? or * in the middle
+        if pattern.contains('?') || pattern.contains("/*/") || pattern.starts_with("*/") {
+            return Self::glob_matches(&pattern_normalized, &path_normalized);
+        }
+
+        false
+    }
+
+    /// Simple glob pattern matching (? matches single char, * matches any sequence).
+    /// Handles patterns like /tmp/*/repo-* where * matches path components.
+    fn glob_matches(pattern: &str, path: &str) -> bool {
+        // Use recursive backtracking for proper glob matching
+        Self::glob_match_recursive(pattern, path, 0, 0)
+    }
+
+    fn glob_match_recursive(pattern: &str, path: &str, p_idx: usize, s_idx: usize) -> bool {
+        let p_chars: Vec<char> = pattern.chars().collect();
+        let s_chars: Vec<char> = path.chars().collect();
+
+        let mut p = p_idx;
+        let mut s = s_idx;
+
+        while p < p_chars.len() {
+            match p_chars[p] {
+                '*' => {
+                    // Try all possible matches for *
+                    p += 1;
+                    if p >= p_chars.len() {
+                        // * at end matches everything remaining
+                        return true;
+                    }
+                    // Try matching 0 or more characters
+                    for skip in 0..=(s_chars.len() - s) {
+                        if Self::glob_match_recursive(pattern, path, p, s + skip) {
+                            return true;
+                        }
+                    }
+                    return false;
+                }
+                '?' => {
+                    // ? matches exactly one character
+                    if s >= s_chars.len() {
+                        return false;
+                    }
+                    p += 1;
+                    s += 1;
+                }
+                c => {
+                    // Exact character match
+                    if s >= s_chars.len() || s_chars[s] != c {
+                        return false;
+                    }
+                    p += 1;
+                    s += 1;
+                }
+            }
+        }
+
+        // Pattern exhausted - path must also be exhausted
+        s >= s_chars.len()
+    }
 }

 #[derive(Debug, Clone, PartialEq, Eq)]
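
The glob semantics used by `glob_match_recursive` in the hunk above (`?` matches exactly one character, `*` matches any sequence, including `/`) can also be expressed without recursion. The sketch below is an illustrative alternative, not part of this diff; the name `glob_match_iterative` is hypothetical. It uses the common two-pointer approach that remembers the most recent `*` and backtracks to it on a mismatch, so the character vectors are built once per call.

```rust
/// Hypothetical iterative matcher with the same semantics as the recursive
/// helper in the diff: `?` = exactly one char, `*` = any sequence of chars.
fn glob_match_iterative(pattern: &str, path: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let s: Vec<char> = path.chars().collect();
    let (mut pi, mut si) = (0usize, 0usize);
    // Index of the most recent `*` in the pattern, and the path index
    // from which that `*` is currently assumed to stop matching.
    let mut star: Option<(usize, usize)> = None;

    while si < s.len() {
        if pi < p.len() && (p[pi] == '?' || (p[pi] != '*' && p[pi] == s[si])) {
            // Literal or single-character wildcard match: advance both.
            pi += 1;
            si += 1;
        } else if pi < p.len() && p[pi] == '*' {
            // Record the star and first try matching it against nothing.
            star = Some((pi, si));
            pi += 1;
        } else if let Some((star_pi, star_si)) = star {
            // Mismatch: let the last `*` absorb one more character and retry.
            pi = star_pi + 1;
            si = star_si + 1;
            star = Some((star_pi, star_si + 1));
        } else {
            return false;
        }
    }

    // Path exhausted: only trailing `*`s may remain in the pattern.
    p[pi..].iter().all(|&c| c == '*')
}
```

For example, `glob_match_iterative("/tmp/*/repo-*", "/tmp/worktrees/repo-123")` is true, while the same pattern against `/tmp/worktrees/other` is false, which agrees with the `complex_glob_patterns` expectations in this diff.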
@@ -86,15 +332,19 @@ impl TrustResolver {
    }

    #[must_use]
-    pub fn resolve(&self, cwd: &str, screen_text: &str) -> TrustDecision {
+    pub fn resolve(&self, cwd: &str, worktree: Option<&str>, screen_text: &str) -> TrustDecision {
        if !detect_trust_prompt(screen_text) {
            return TrustDecision::NotRequired;
        }

+        let repo = extract_repo_name(cwd);
        let mut events = vec![TrustEvent::TrustRequired {
            cwd: cwd.to_owned(),
+            repo: repo.clone(),
+            worktree: worktree.map(String::from),
        }];

+        // Check denylist first
        if let Some(matched_root) = self
            .config
            .denied
@@ -112,15 +362,12 @@ impl TrustResolver {
            };
        }

-        if self
-            .config
-            .allowlisted
-            .iter()
-            .any(|root| path_matches(cwd, root))
-        {
+        // Check allowlist with pattern matching
+        if self.config.is_allowlisted(cwd, worktree).is_some() {
            events.push(TrustEvent::TrustResolved {
                cwd: cwd.to_owned(),
                policy: TrustPolicy::AutoTrust,
+                resolution: TrustResolution::AutoAllowlisted,
            });
            return TrustDecision::Required {
                policy: TrustPolicy::AutoTrust,
@@ -128,6 +375,19 @@ impl TrustResolver {
            };
        }

+        // Check for manual trust resolution via screen text analysis
+        if detect_manual_approval(screen_text) {
+            events.push(TrustEvent::TrustResolved {
+                cwd: cwd.to_owned(),
+                policy: TrustPolicy::RequireApproval,
+                resolution: TrustResolution::ManualApproval,
+            });
+            return TrustDecision::Required {
+                policy: TrustPolicy::RequireApproval,
+                events,
+            };
+        }
+
        TrustDecision::Required {
            policy: TrustPolicy::RequireApproval,
            events,
@@ -135,17 +395,20 @@ impl TrustResolver {
    }

    #[must_use]
-    pub fn trusts(&self, cwd: &str) -> bool {
-        !self
+    pub fn trusts(&self, cwd: &str, worktree: Option<&str>) -> bool {
+        // Check denylist first
+        let denied = self
            .config
            .denied
            .iter()
-            .any(|root| path_matches(cwd, root))
-            && self
-                .config
-                .allowlisted
-                .iter()
-                .any(|root| path_matches(cwd, root))
+            .any(|root| path_matches(cwd, root));
+
+        if denied {
+            return false;
+        }
+
+        // Check allowlist using pattern matching
+        self.config.is_allowlisted(cwd, worktree).is_some()
    }
 }
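
The precedence applied by `trusts` above (denylist checked first, then allowlist) can be shown in isolation. This is a simplified sketch under stated assumptions: `SimpleTrustConfig` is a hypothetical type with plain directory roots only, no glob or worktree patterns, and no canonicalization. It relies on the standard library's `Path::starts_with`, which compares whole path components, so `/tmp/worktrees` does not match `/tmp/worktrees-other`.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical, reduced stand-in for the trust config in this diff:
/// directory roots only, no pattern matching.
struct SimpleTrustConfig {
    allowlisted: Vec<PathBuf>,
    denied: Vec<PathBuf>,
}

impl SimpleTrustConfig {
    /// Deny wins: a path under any denied root is never trusted,
    /// even when an allowlisted root also contains it.
    fn trusts(&self, cwd: &str) -> bool {
        let cwd = Path::new(cwd);
        if self.denied.iter().any(|root| cwd.starts_with(root)) {
            return false;
        }
        // `Path::starts_with` matches on component boundaries,
        // not on raw string prefixes.
        self.allowlisted.iter().any(|root| cwd.starts_with(root))
    }
}
```

Evaluating the denylist first means a narrow deny entry can carve an exception out of a broad allowlisted root.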

@@ -172,11 +435,240 @@ fn normalize_path(path: &Path) -> PathBuf {
    std::fs::canonicalize(path).unwrap_or_else(|_| path.to_path_buf())
 }

+/// Extract repository name from a path for event context.
+fn extract_repo_name(cwd: &str) -> Option<String> {
+    let path = Path::new(cwd);
+    // Try to find a .git directory to identify repo root
+    let mut current = Some(path);
+    while let Some(p) = current {
+        if p.join(".git").is_dir() {
+            return p.file_name().map(|n| n.to_string_lossy().to_string());
+        }
+        current = p.parent();
+    }
+    // Fallback: use the last component of the path
+    path.file_name().map(|n| n.to_string_lossy().to_string())
+}
+
+/// Detect if the screen text indicates manual approval was granted.
+fn detect_manual_approval(screen_text: &str) -> bool {
+    let lowered = screen_text.to_ascii_lowercase();
+    // Look for indicators that user manually approved
+    MANUAL_APPROVAL_CUES.iter().any(|cue| lowered.contains(cue))
+}
+
+const MANUAL_APPROVAL_CUES: &[&str] = &[
+    "yes, i trust",
+    "i trust this",
+    "trusted manually",
+    "approval granted",
+];
+
+#[cfg(test)]
+mod path_matching_tests {
+    use super::*;
+
+    #[test]
+    fn glob_pattern_star_matches_any_sequence() {
+        assert!(TrustConfig::pattern_matches("/tmp/*", "/tmp/foo"));
+        assert!(TrustConfig::pattern_matches("/tmp/*", "/tmp/bar/baz"));
+        assert!(!TrustConfig::pattern_matches("/tmp/*", "/other/tmp/foo"));
+    }
+
+    #[test]
+    fn glob_pattern_question_matches_single_char() {
+        assert!(TrustConfig::pattern_matches("/tmp/test?", "/tmp/test1"));
+        assert!(TrustConfig::pattern_matches("/tmp/test?", "/tmp/testA"));
+        assert!(!TrustConfig::pattern_matches("/tmp/test?", "/tmp/test12"));
+        assert!(!TrustConfig::pattern_matches("/tmp/test?", "/tmp/test"));
+    }
+
+    #[test]
+    fn pattern_matches_exact() {
+        assert!(TrustConfig::pattern_matches(
+            "/tmp/worktrees",
+            "/tmp/worktrees"
+        ));
+        assert!(!TrustConfig::pattern_matches(
+            "/tmp/worktrees",
+            "/tmp/worktrees-other"
+        ));
+    }
+
+    #[test]
+    fn pattern_matches_prefix_with_wildcard() {
+        assert!(TrustConfig::pattern_matches(
+            "/tmp/worktrees/*",
+            "/tmp/worktrees/repo-a"
+        ));
+        assert!(TrustConfig::pattern_matches(
+            "/tmp/worktrees/*",
+            "/tmp/worktrees/repo-a/subdir"
+        ));
+        assert!(!TrustConfig::pattern_matches(
+            "/tmp/worktrees/*",
+            "/tmp/other/repo"
+        ));
+    }
+
+    #[test]
+    fn pattern_matches_contains() {
+        // Pattern contained within path
+        assert!(TrustConfig::pattern_matches(
+            "worktrees",
+            "/tmp/worktrees/repo-a"
+        ));
+        assert!(TrustConfig::pattern_matches(
+            "repo",
+            "/tmp/worktrees/repo-a"
+        ));
+    }
+
+    #[test]
+    fn allowlist_entry_with_worktree_pattern() {
+        let config = TrustConfig::new().with_allowlisted_entry(
+            TrustAllowlistEntry::new("/tmp/worktrees/*")
+                .with_worktree_pattern("*/.git")
+                .with_description("Git worktrees"),
+        );
+
+        // Should match when both patterns match
+        assert!(config
+            .is_allowlisted("/tmp/worktrees/repo-a", Some("/tmp/worktrees/repo-a/.git"))
+            .is_some());
+
+        // Should not match when worktree pattern doesn't match
+        assert!(config
+            .is_allowlisted("/tmp/worktrees/repo-a", Some("/other/path"))
+            .is_none());
+
+        // Should not match when a worktree pattern is required but no worktree is supplied
+        assert!(config
+            .is_allowlisted("/tmp/worktrees/repo-a", None)
+            .is_none());
+
+        // Should match when no worktree pattern required and path matches
+        let config_no_worktree = TrustConfig::new().with_allowlisted("/tmp/worktrees/*");
+        assert!(config_no_worktree
+            .is_allowlisted("/tmp/worktrees/repo-a", None)
+            .is_some());
+    }
+
+    #[test]
+    fn allowlist_entry_returns_matched_entry() {
+        let entry = TrustAllowlistEntry::new("/tmp/worktrees/*").with_description("Test worktrees");
+        let config = TrustConfig::new().with_allowlisted_entry(entry.clone());
+
+        let matched = config.is_allowlisted("/tmp/worktrees/repo-a", None);
+        assert!(matched.is_some());
+        assert_eq!(
+            matched.unwrap().description,
+            Some("Test worktrees".to_string())
+        );
+    }
+
+    #[test]
+    fn complex_glob_patterns() {
+        // Multiple wildcards
+        assert!(TrustConfig::pattern_matches(
+            "/tmp/*/repo-*",
+            "/tmp/worktrees/repo-123"
+        ));
+        assert!(TrustConfig::pattern_matches(
+            "/tmp/*/repo-*",
+            "/tmp/other/repo-abc"
+        ));
+        assert!(!TrustConfig::pattern_matches(
+            "/tmp/*/repo-*",
+            "/tmp/worktrees/other"
+        ));
+
+        // Mixed ? and *
+        assert!(TrustConfig::pattern_matches(
+            "/tmp/test?/*.txt",
+            "/tmp/test1/file.txt"
+        ));
+        assert!(TrustConfig::pattern_matches(
+            "/tmp/test?/*.txt",
+            "/tmp/testA/subdir/file.txt"
+        ));
+    }
+
+    #[test]
+    fn serde_serialization_roundtrip() {
+        let config = TrustConfig::new()
+            .with_allowlisted_entry(
+                TrustAllowlistEntry::new("/tmp/worktrees/*")
+                    .with_worktree_pattern("*/.git")
+                    .with_description("Git worktrees"),
+            )
+            .with_denied("/tmp/malicious");
+
+        let json = serde_json::to_string(&config).expect("serialization failed");
+        let deserialized: TrustConfig =
+            serde_json::from_str(&json).expect("deserialization failed");
+
+        assert_eq!(config.allowlisted.len(), deserialized.allowlisted.len());
+        assert_eq!(config.denied.len(), deserialized.denied.len());
+        assert_eq!(config.emit_events, deserialized.emit_events);
+    }
+
+    #[test]
+    fn trust_event_serialization() {
+        let event = TrustEvent::TrustRequired {
+            cwd: "/tmp/test".to_string(),
+            repo: Some("test-repo".to_string()),
+            worktree: Some("/tmp/test/.git".to_string()),
+        };
+
+        let json = serde_json::to_string(&event).expect("serialization failed");
+        assert!(json.contains("trust_required"));
+        assert!(json.contains("/tmp/test"));
+        assert!(json.contains("test-repo"));
+
+        let deserialized: TrustEvent = serde_json::from_str(&json).expect("deserialization failed");
+        match deserialized {
+            TrustEvent::TrustRequired {
+                cwd,
+                repo,
+                worktree,
+            } => {
+                assert_eq!(cwd, "/tmp/test");
+                assert_eq!(repo, Some("test-repo".to_string()));
+                assert_eq!(worktree, Some("/tmp/test/.git".to_string()));
+            }
+            _ => panic!("wrong event type"),
+        }
+    }
+
+    #[test]
+    fn trust_event_resolved_serialization() {
+        let event = TrustEvent::TrustResolved {
+            cwd: "/tmp/test".to_string(),
+            policy: TrustPolicy::AutoTrust,
+            resolution: TrustResolution::AutoAllowlisted,
+        };
+
+        let json = serde_json::to_string(&event).expect("serialization failed");
+        assert!(json.contains("trust_resolved"));
+        assert!(json.contains("auto_allowlisted"));
+
+        let deserialized: TrustEvent = serde_json::from_str(&json).expect("deserialization failed");
+        match deserialized {
+            TrustEvent::TrustResolved { resolution, .. } => {
+                assert_eq!(resolution, TrustResolution::AutoAllowlisted);
+            }
+            _ => panic!("wrong event type"),
+        }
+    }
+}
+
 #[cfg(test)]
 mod tests {
    use super::{
-        detect_trust_prompt, path_matches_trusted_root, TrustConfig, TrustDecision, TrustEvent,
-        TrustPolicy, TrustResolver,
+        detect_manual_approval, detect_trust_prompt, path_matches_trusted_root,
+        TrustAllowlistEntry, TrustConfig, TrustDecision, TrustEvent, TrustPolicy, TrustResolution,
+        TrustResolver,
    };

    #[test]
@@ -197,7 +689,7 @@ mod tests {
        let resolver = TrustResolver::new(TrustConfig::new().with_allowlisted("/tmp/worktrees"));

        // when
-        let decision = resolver.resolve("/tmp/worktrees/repo-a", "Ready for your input\n>");
+        let decision = resolver.resolve("/tmp/worktrees/repo-a", None, "Ready for your input\n>");

        // then
        assert_eq!(decision, TrustDecision::NotRequired);
@@ -213,23 +705,23 @@ mod tests {
        // when
        let decision = resolver.resolve(
            "/tmp/worktrees/repo-a",
+            None,
            "Do you trust the files in this folder?\n1. Yes, proceed\n2. No",
        );

        // then
        assert_eq!(decision.policy(), Some(TrustPolicy::AutoTrust));
-        assert_eq!(
-            decision.events(),
-            &[
-                TrustEvent::TrustRequired {
-                    cwd: "/tmp/worktrees/repo-a".to_string(),
-                },
-                TrustEvent::TrustResolved {
-                    cwd: "/tmp/worktrees/repo-a".to_string(),
-                    policy: TrustPolicy::AutoTrust,
-                },
-            ]
-        );
+        let events = decision.events();
+        assert_eq!(events.len(), 2);
+        assert!(matches!(events[0], TrustEvent::TrustRequired { .. }));
+        assert!(matches!(
+            events[1],
+            TrustEvent::TrustResolved {
+                policy: TrustPolicy::AutoTrust,
+                resolution: TrustResolution::AutoAllowlisted,
+                ..
+            }
+        ));
    }

    #[test]
@@ -240,6 +732,7 @@ mod tests {
        // when
        let decision = resolver.resolve(
            "/tmp/other/repo-b",
+            None,
            "Do you trust the files in this folder?\n1. Yes, proceed\n2. No",
        );

@@ -249,6 +742,8 @@ mod tests {
            decision.events(),
            &[TrustEvent::TrustRequired {
                cwd: "/tmp/other/repo-b".to_string(),
+                repo: Some("repo-b".to_string()),
+                worktree: None,
            }]
        );
    }
@@ -265,6 +760,7 @@ mod tests {
        // when
        let decision = resolver.resolve(
            "/tmp/worktrees/repo-c",
+            None,
            "Do you trust the files in this folder?\n1. Yes, proceed\n2. No",
        );

@@ -275,6 +771,8 @@ mod tests {
            &[
                TrustEvent::TrustRequired {
                    cwd: "/tmp/worktrees/repo-c".to_string(),
+                    repo: Some("repo-c".to_string()),
+                    worktree: None,
                },
                TrustEvent::TrustDenied {
                    cwd: "/tmp/worktrees/repo-c".to_string(),
@@ -284,6 +782,66 @@ mod tests {
        );
    }

+    #[test]
+    fn auto_trusts_with_glob_pattern_allowlist() {
+        // given
+        let resolver = TrustResolver::new(TrustConfig::new().with_allowlisted("/tmp/worktrees/*"));
+
+        // when - any repo under /tmp/worktrees should auto-trust
+        let decision = resolver.resolve(
+            "/tmp/worktrees/repo-a",
+            None,
+            "Do you trust the files in this folder?\n1. Yes, proceed\n2. No",
+        );
+
+        // then
+        assert_eq!(decision.policy(), Some(TrustPolicy::AutoTrust));
+    }
+
+    #[test]
+    fn resolve_with_worktree_pattern_matching() {
+        // given
+        let config = TrustConfig::new().with_allowlisted_entry(
+            TrustAllowlistEntry::new("/tmp/worktrees/*").with_worktree_pattern("*/.git"),
+        );
+        let resolver = TrustResolver::new(config);
+
+        // when - with worktree that matches the pattern
+        let decision = resolver.resolve(
+            "/tmp/worktrees/repo-a",
+            Some("/tmp/worktrees/repo-a/.git"),
+            "Do you trust the files in this folder?\n1. Yes, proceed\n2. No",
+        );
+
+        // then - should auto-trust because both patterns match
+        assert_eq!(decision.policy(), Some(TrustPolicy::AutoTrust));
+    }
+
+    #[test]
+    fn manual_approval_detected_from_screen_text() {
+        // given
+        let resolver = TrustResolver::new(TrustConfig::new());
+
+        // when - screen text indicates manual approval
+        let decision = resolver.resolve(
+            "/tmp/some/repo",
+            None,
+            "Do you trust the files in this folder?\nUser selected: Yes, I trust this folder",
+        );
+
+        // then - should detect manual approval
+        assert_eq!(decision.policy(), Some(TrustPolicy::RequireApproval));
+        let events = decision.events();
+        assert!(events.len() >= 2);
+        assert!(matches!(
+            events[events.len() - 1],
+            TrustEvent::TrustResolved {
+                resolution: TrustResolution::ManualApproval,
+                ..
+            }
+        ));
+    }
+
    #[test]
    fn sibling_prefix_does_not_match_trusted_root() {
        // given
@@ -296,4 +854,70 @@ mod tests {
        // then
        assert!(!matched);
    }
+
+    #[test]
+    fn detects_manual_approval_cues() {
+        assert!(detect_manual_approval(
+            "User selected: Yes, I trust this folder"
+        ));
+        assert!(detect_manual_approval(
+            "I trust this repository and its contents"
+        ));
+        assert!(detect_manual_approval("Approval granted by user"));
+        assert!(!detect_manual_approval(
+            "Do you trust the files in this folder?"
+        ));
+        assert!(!detect_manual_approval("Some unrelated text"));
+    }
+
+    #[test]
+    fn trust_config_default_emit_events() {
+        let config = TrustConfig::default();
+        assert!(config.emit_events);
+    }
+
+    #[test]
+    fn trust_resolver_trusts_method() {
+        let resolver = TrustResolver::new(
+            TrustConfig::new()
+                .with_allowlisted("/tmp/worktrees/*")
+                .with_denied("/tmp/worktrees/bad-repo"),
+        );
+
+        // Should trust allowlisted paths
+        assert!(resolver.trusts("/tmp/worktrees/good-repo", None));
+
+        // Should not trust denied paths
+        assert!(!resolver.trusts("/tmp/worktrees/bad-repo", None));
+
+        // Should not trust unknown paths
+        assert!(!resolver.trusts("/tmp/other/repo", None));
+    }
+
+    #[test]
+    fn trust_policy_serde_roundtrip() {
+        for policy in [
+            TrustPolicy::AutoTrust,
+            TrustPolicy::RequireApproval,
+            TrustPolicy::Deny,
+        ] {
+            let json = serde_json::to_string(&policy).expect("serialization failed");
+            let deserialized: TrustPolicy =
+                serde_json::from_str(&json).expect("deserialization failed");
+            assert_eq!(policy, deserialized);
+        }
+    }
+
+    #[test]
+    fn trust_resolution_serde_roundtrip() {
+        for resolution in [
+            TrustResolution::AutoAllowlisted,
+            TrustResolution::ManualApproval,
+        ] {
+            let json = serde_json::to_string(&resolution).expect("serialization failed");
+            let deserialized: TrustResolution =
+                serde_json::from_str(&json).expect("deserialization failed");
+            assert_eq!(resolution, deserialized);
+        }
+    }
 }
--- a/rust/crates/runtime/src/worker_boot.rs
+++ b/rust/crates/runtime/src/worker_boot.rs
@@ -30,6 +30,7 @@ fn now_secs() -> u64 {
 pub enum WorkerStatus {
    Spawning,
    TrustRequired,
+    ToolPermissionRequired,
    ReadyForPrompt,
    Running,
    Finished,
@@ -41,6 +42,7 @@ impl std::fmt::Display for WorkerStatus {
        match self {
            Self::Spawning => write!(f, "spawning"),
            Self::TrustRequired => write!(f, "trust_required"),
+            Self::ToolPermissionRequired => write!(f, "tool_permission_required"),
            Self::ReadyForPrompt => write!(f, "ready_for_prompt"),
            Self::Running => write!(f, "running"),
            Self::Finished => write!(f, "finished"),
@@ -53,6 +55,7 @@ impl std::fmt::Display for WorkerStatus {
 #[serde(rename_all = "snake_case")]
 pub enum WorkerFailureKind {
    TrustGate,
+    ToolPermissionGate,
    PromptDelivery,
    Protocol,
    Provider,
@@ -71,6 +74,7 @@ pub struct WorkerFailure {
 pub enum WorkerEventKind {
    Spawning,
    TrustRequired,
+    ToolPermissionRequired,
    TrustResolved,
    ReadyForPrompt,
    PromptMisdelivery,
@@ -104,6 +108,8 @@ pub enum WorkerPromptTarget {
 pub enum StartupFailureClassification {
    /// Trust prompt is required but not detected/resolved
    TrustRequired,
+    /// Tool permission prompt is required before startup can continue
+    ToolPermissionRequired,
    /// Prompt was delivered to wrong target (shell misdelivery)
    PromptMisdelivery,
    /// Prompt was sent but acceptance timed out
@@ -130,6 +136,14 @@ pub struct StartupEvidenceBundle {
    pub prompt_acceptance_state: bool,
    /// Result of trust prompt detection at timeout
    pub trust_prompt_detected: bool,
+    /// Result of tool permission prompt detection at timeout
+    pub tool_permission_prompt_detected: bool,
+    /// Age in seconds of the latest tool permission prompt, when observed
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub tool_permission_prompt_age_seconds: Option<u64>,
+    /// Whether the prompt surface exposed only a session allow path or also an always-allow path
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub tool_permission_allow_scope: Option<ToolPermissionAllowScope>,
    /// Transport health summary (true = healthy/responsive)
    pub transport_healthy: bool,
    /// MCP health summary (true = all servers healthy)
@@ -146,6 +160,15 @@ pub enum WorkerEventPayload {
        #[serde(skip_serializing_if = "Option::is_none")]
        resolution: Option<WorkerTrustResolution>,
    },
+    ToolPermissionPrompt {
+        #[serde(skip_serializing_if = "Option::is_none")]
+        server_name: Option<String>,
+        #[serde(skip_serializing_if = "Option::is_none")]
+        tool_name: Option<String>,
+        prompt_age_seconds: u64,
+        allow_scope: ToolPermissionAllowScope,
+        prompt_preview: String,
+    },
    PromptDelivery {
        prompt_preview: String,
        observed_target: WorkerPromptTarget,
@@ -163,6 +186,14 @@ pub enum WorkerEventPayload {
    },
 }

+#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq)]
+#[serde(rename_all = "snake_case")]
+pub enum ToolPermissionAllowScope {
+    SessionOnly,
+    SessionOrAlways,
+    Unknown,
+}
+
 #[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
 pub struct WorkerTaskReceipt {
    pub repo: String,
@@ -276,6 +307,29 @@ impl WorkerRegistry {
            .ok_or_else(|| format!("worker not found: {worker_id}"))?;
        let lowered = screen_text.to_ascii_lowercase();

+        if let Some(tool_prompt) = detect_tool_permission_prompt(screen_text, &lowered) {
+            worker.status = WorkerStatus::ToolPermissionRequired;
+            worker.last_error = Some(WorkerFailure {
+                kind: WorkerFailureKind::ToolPermissionGate,
+                message: tool_prompt.message(),
+                created_at: now_secs(),
+            });
+            push_event(
+                worker,
+                WorkerEventKind::ToolPermissionRequired,
+                WorkerStatus::ToolPermissionRequired,
+                Some("tool permission prompt detected".to_string()),
+                Some(WorkerEventPayload::ToolPermissionPrompt {
+                    server_name: tool_prompt.server_name,
+                    tool_name: tool_prompt.tool_name,
+                    prompt_age_seconds: 0,
+                    allow_scope: tool_prompt.allow_scope,
+                    prompt_preview: tool_prompt.prompt_preview,
+                }),
+            );
+            return Ok(worker.clone());
+        }
+
        if !worker.trust_gate_cleared && detect_trust_prompt(&lowered) {
            worker.status = WorkerStatus::TrustRequired;
            worker.last_error = Some(WorkerFailure {
@@ -503,7 +557,9 @@ impl WorkerRegistry {
            ready: worker.status == WorkerStatus::ReadyForPrompt,
            blocked: matches!(
                worker.status,
-                WorkerStatus::TrustRequired | WorkerStatus::Failed
+                WorkerStatus::TrustRequired
+                    | WorkerStatus::ToolPermissionRequired
+                    | WorkerStatus::Failed
            ),
            replay_prompt_ready: worker.replay_prompt.is_some(),
            last_error: worker.last_error.clone(),
@@ -624,6 +680,18 @@ impl WorkerRegistry {

        let now = now_secs();
        let elapsed = now.saturating_sub(worker.created_at);
+        let latest_tool_permission_event = worker
+            .events
+            .iter()
+            .rev()
+            .find(|event| event.kind == WorkerEventKind::ToolPermissionRequired);
+        let tool_permission_allow_scope =
+            latest_tool_permission_event.and_then(|event| match &event.payload {
+                Some(WorkerEventPayload::ToolPermissionPrompt { allow_scope, .. }) => {
+                    Some(*allow_scope)
+                }
+                _ => None,
+            });

        // Build evidence bundle
        let evidence = StartupEvidenceBundle {
@@ -640,6 +708,13 @@ impl WorkerRegistry {
                .events
                .iter()
                .any(|e| e.kind == WorkerEventKind::TrustRequired),
+            tool_permission_prompt_detected: worker
+                .events
+                .iter()
+                .any(|e| e.kind == WorkerEventKind::ToolPermissionRequired),
+            tool_permission_prompt_age_seconds: latest_tool_permission_event
+                .map(|event| now.saturating_sub(event.timestamp)),
+            tool_permission_allow_scope,
            transport_healthy,
            mcp_healthy,
            elapsed_seconds: elapsed,
@@ -694,6 +769,13 @@ fn classify_startup_failure(evidence: &StartupEvidenceBundle) -> StartupFailureC
        return StartupFailureClassification::TrustRequired;
    }

+    // Check for tool permission prompts that were not resolved
+    if evidence.tool_permission_prompt_detected
+        && evidence.last_lifecycle_state == WorkerStatus::ToolPermissionRequired
+    {
+        return StartupFailureClassification::ToolPermissionRequired;
+    }
+
    // Check for prompt acceptance timeout
    if evidence.prompt_sent_at.is_some()
        && !evidence.prompt_acceptance_state
@@ -815,6 +897,140 @@ fn normalize_path(path: &str) -> PathBuf {
    std::fs::canonicalize(path).unwrap_or_else(|_| Path::new(path).to_path_buf())
 }

+#[derive(Debug, Clone, PartialEq, Eq)]
+struct ToolPermissionPromptObservation {
+    server_name: Option<String>,
+    tool_name: Option<String>,
+    allow_scope: ToolPermissionAllowScope,
+    prompt_preview: String,
+}
+
+impl ToolPermissionPromptObservation {
+    fn message(&self) -> String {
+        match (&self.server_name, &self.tool_name) {
+            (Some(server), Some(tool)) => {
+                format!("worker boot blocked on tool permission prompt for {server}.{tool}")
+            }
+            (Some(server), None) => {
+                format!("worker boot blocked on tool permission prompt for {server}")
+            }
+            (None, Some(tool)) => {
+                format!("worker boot blocked on tool permission prompt for {tool}")
+            }
+            (None, None) => "worker boot blocked on tool permission prompt".to_string(),
+        }
+    }
+}
+
+fn detect_tool_permission_prompt(
+    screen_text: &str,
+    lowered: &str,
+) -> Option<ToolPermissionPromptObservation> {
+    let looks_like_prompt = lowered.contains("allow the")
+        && lowered.contains("server")
+        && lowered.contains("tool")
+        && lowered.contains("run");
+    let looks_like_tool_gate = lowered.contains("allow tool") && lowered.contains("run");
+    if !looks_like_prompt && !looks_like_tool_gate {
+        return None;
+    }
+
+    let prompt_line = screen_text
+        .lines()
+        .rev()
+        .find(|line| {
+            let lowered_line = line.to_ascii_lowercase();
+            lowered_line.contains("allow")
+                && lowered_line.contains("tool")
+                && (lowered_line.contains("run") || lowered_line.contains("server"))
+        })
+        .unwrap_or(screen_text)
+        .trim();
+
+    let tool_name = extract_quoted_value(prompt_line)
+        .or_else(|| extract_after(prompt_line, "tool ").map(|token| normalize_tool_token(&token)));
+    let server_name = extract_between(prompt_line, "the ", " server")
+        .map(|server| server.trim_end_matches(" MCP").to_string())
+        .or_else(|| {
+            tool_name
+                .as_deref()
+                .and_then(extract_server_from_qualified_tool)
+        });
+
+    Some(ToolPermissionPromptObservation {
+        server_name,
+        tool_name,
+        allow_scope: detect_tool_permission_allow_scope(lowered),
+        prompt_preview: prompt_preview(prompt_line),
+    })
+}
+
+fn detect_tool_permission_allow_scope(lowered: &str) -> ToolPermissionAllowScope {
+    let always_allow_capable = [
+        "always allow",
+        "allow always",
+        "allow this tool always",
+        "allow for all sessions",
+    ]
+    .iter()
+    .any(|needle| lowered.contains(needle));
+
+    if always_allow_capable {
+        return ToolPermissionAllowScope::SessionOrAlways;
+    }
+
+    let session_allow_capable = [
+        "allow once",
+        "allow for this session",
+        "allow this session",
+        "yes, allow",
+    ]
+    .iter()
+    .any(|needle| lowered.contains(needle));
+
+    if session_allow_capable {
+        ToolPermissionAllowScope::SessionOnly
+    } else {
+        ToolPermissionAllowScope::Unknown
+    }
+}
+
+fn extract_quoted_value(text: &str) -> Option<String> {
+    let start = text.find('"')? + 1;
+    let rest = &text[start..];
+    let end = rest.find('"')?;
+    Some(rest[..end].to_string())
+}
+
+fn extract_between(text: &str, prefix: &str, suffix: &str) -> Option<String> {
+    let start = text.find(prefix)? + prefix.len();
+    let rest = &text[start..];
+    let end = rest.find(suffix)?;
+    let value = rest[..end].trim();
+    (!value.is_empty()).then(|| value.to_string())
+}
+
+fn extract_after(text: &str, prefix: &str) -> Option<String> {
+    let start = text.to_ascii_lowercase().find(prefix)? + prefix.len();
+    let value = text[start..]
+        .split_whitespace()
+        .next()?
+        .trim_matches(|ch: char| ch == '?' || ch == ':' || ch == '"' || ch == '\'');
+    (!value.is_empty()).then(|| value.to_string())
+}
+
+fn normalize_tool_token(token: &str) -> String {
+    token
+        .trim_matches(|ch: char| ch == '?' || ch == ':' || ch == '"' || ch == '\'')
+        .to_string()
+}
+
+fn extract_server_from_qualified_tool(tool: &str) -> Option<String> {
+    let rest = tool.strip_prefix("mcp__")?;
+    let (server, _) = rest.split_once("__")?;
+    (!server.is_empty()).then(|| server.to_string())
+}
+
 fn detect_trust_prompt(lowered: &str) -> bool {
    [
        "do you trust the files in this folder",
@@ -1134,6 +1350,96 @@ mod tests {
        assert!(detect_ready_for_prompt("│ >", "│ >"));
    }

+    #[test]
+    fn tool_permission_prompt_blocks_worker_with_structured_event() {
+        let registry = WorkerRegistry::new();
+        let worker = registry.create("/tmp/repo-mcp", &[], true);
+
+        let blocked = registry
+            .observe(
+                &worker.worker_id,
+                "Allow the omx_memory MCP server to run tool \"project_memory_read\"?\n\
+                 1. Yes, allow once\n\
+                 2. Always allow this tool",
+            )
+            .expect("tool permission observe should succeed");
+
+        assert_eq!(blocked.status, WorkerStatus::ToolPermissionRequired);
+        assert_eq!(
+            blocked
+                .last_error
+                .as_ref()
+                .expect("tool permission error should exist")
+                .kind,
+            WorkerFailureKind::ToolPermissionGate
+        );
+        let event = blocked
+            .events
+            .iter()
+            .find(|event| event.kind == WorkerEventKind::ToolPermissionRequired)
+            .expect("tool permission event should exist");
+        assert_eq!(
+            event.payload,
+            Some(WorkerEventPayload::ToolPermissionPrompt {
+                server_name: Some("omx_memory".to_string()),
+                tool_name: Some("project_memory_read".to_string()),
+                prompt_age_seconds: 0,
+                allow_scope: ToolPermissionAllowScope::SessionOrAlways,
+                prompt_preview: prompt_preview(
+                    "Allow the omx_memory MCP server to run tool \"project_memory_read\"?",
+                ),
+            })
+        );
+
+        let readiness = registry
+            .await_ready(&worker.worker_id)
+            .expect("ready snapshot should load");
+        assert!(readiness.blocked);
+        assert!(!readiness.ready);
+    }
+
+    #[test]
+    fn startup_timeout_classifies_tool_permission_prompt() {
+        let registry = WorkerRegistry::new();
+        let worker = registry.create("/tmp/repo-mcp-timeout", &[], true);
+
+        registry
+            .observe(
+                &worker.worker_id,
+                "Allow the omx_memory MCP server to run tool \"notepad_read\"?\n\
+                 1. Yes, allow once",
+            )
+            .expect("tool permission observe should succeed");
+
+        let timed_out = registry
+            .observe_startup_timeout(&worker.worker_id, "claw prompt", true, true)
+            .expect("startup timeout observe should succeed");
+        let event = timed_out
+            .events
+            .iter()
+            .find(|event| event.kind == WorkerEventKind::StartupNoEvidence)
+            .expect("startup no evidence event should exist");
+
+        match event.payload.as_ref() {
+            Some(WorkerEventPayload::StartupNoEvidence {
+                classification,
+                evidence,
+            }) => {
+                assert_eq!(
+                    *classification,
+                    StartupFailureClassification::ToolPermissionRequired
+                );
+                assert!(evidence.tool_permission_prompt_detected);
+                assert_eq!(
+                    evidence.tool_permission_allow_scope,
+                    Some(ToolPermissionAllowScope::SessionOnly)
+                );
+                assert!(evidence.tool_permission_prompt_age_seconds.is_some());
+            }
+            _ => panic!("expected StartupNoEvidence payload"),
+        }
+    }
+
    #[test]
    fn prompt_misdelivery_is_detected_and_replay_can_be_rearmed() {
        let registry = WorkerRegistry::new();
@@ -1634,6 +1940,9 @@ mod tests {
            prompt_sent_at: Some(1_234_567_890),
            prompt_acceptance_state: false,
            trust_prompt_detected: true,
+            tool_permission_prompt_detected: false,
+            tool_permission_prompt_age_seconds: None,
+            tool_permission_allow_scope: None,
            transport_healthy: true,
            mcp_healthy: false,
            elapsed_seconds: 60,
@@ -1661,6 +1970,9 @@ mod tests {
            prompt_sent_at: None,
            prompt_acceptance_state: false,
            trust_prompt_detected: false,
+            tool_permission_prompt_detected: false,
+            tool_permission_prompt_age_seconds: None,
+            tool_permission_allow_scope: None,
            transport_healthy: false,
            mcp_healthy: true,
            elapsed_seconds: 30,
@@ -1678,6 +1990,9 @@ mod tests {
            prompt_sent_at: None,
            prompt_acceptance_state: false,
            trust_prompt_detected: false,
+            tool_permission_prompt_detected: false,
+            tool_permission_prompt_age_seconds: None,
+            tool_permission_allow_scope: None,
            transport_healthy: true,
            mcp_healthy: true,
            elapsed_seconds: 10,
@@ -1697,6 +2012,9 @@ mod tests {
            prompt_sent_at: None, // No prompt sent yet
            prompt_acceptance_state: false,
            trust_prompt_detected: false,
+            tool_permission_prompt_detected: false,
+            tool_permission_prompt_age_seconds: None,
+            tool_permission_allow_scope: None,
            transport_healthy: true,
            mcp_healthy: false, // MCP unhealthy but transport healthy suggests crash
            elapsed_seconds: 45,
--- a/rust/crates/rusty-claude-cli/src/main.rs
+++ b/rust/crates/rusty-claude-cli/src/main.rs
--- a/rust/crates/rusty-claude-cli/tests/compact_output.rs
+++ b/rust/crates/rusty-claude-cli/tests/compact_output.rs
@@ -172,7 +172,10 @@ stderr:
    );
    let stdout = String::from_utf8(output.stdout).expect("stdout should be utf8");
    let parsed: Value = serde_json::from_str(&stdout).expect("compact json stdout should parse");
-    assert_eq!(parsed["message"], "Mock streaming says hello from the parity harness.");
+    assert_eq!(
+        parsed["message"],
+        "Mock streaming says hello from the parity harness."
+    );
    assert_eq!(parsed["compact"], true);
    assert_eq!(parsed["model"], "claude-sonnet-4-6");
    assert!(parsed["usage"].is_object());
--- a/rust/crates/rusty-claude-cli/tests/output_format_contract.rs
+++ b/rust/crates/rusty-claude-cli/tests/output_format_contract.rs
@@ -388,114 +388,6 @@ fn assert_json_command(current_dir: &Path, args: &[&str]) -> Value {
    assert_json_command_with_env(current_dir, args, &[])
 }

-/// #247 regression helper: run claw expecting a non-zero exit and return
-/// the JSON error envelope parsed from stderr. Asserts exit != 0 and that
-/// the envelope includes `type: "error"` at the very least.
-fn assert_json_error_envelope(current_dir: &Path, args: &[&str]) -> Value {
-    let output = run_claw(current_dir, args, &[]);
-    assert!(
-        !output.status.success(),
-        "command unexpectedly succeeded; stdout:\n{}\nstderr:\n{}",
-        String::from_utf8_lossy(&output.stdout),
-        String::from_utf8_lossy(&output.stderr)
-    );
-    // The JSON envelope is written to stderr for error cases (see main.rs).
-    let envelope: Value = serde_json::from_slice(&output.stderr).unwrap_or_else(|err| {
-        panic!(
-            "stderr should be a JSON error envelope but failed to parse: {err}\nstderr bytes:\n{}",
-            String::from_utf8_lossy(&output.stderr)
-        )
-    });
-    assert_eq!(
-        envelope["type"], "error",
-        "envelope should carry type=error"
-    );
-    envelope
-}
-
-#[test]
-fn prompt_subcommand_without_arg_emits_cli_parse_envelope_with_hint_247() {
-    // #247: `claw prompt` with no argument must classify as `cli_parse`
-    // (not `unknown`) and the JSON envelope must carry the same actionable
-    // `Run claw --help for usage.` hint that text-mode stderr appends.
-    let root = unique_temp_dir("247-prompt-no-arg");
-    fs::create_dir_all(&root).expect("temp dir should exist");
-
-    let envelope = assert_json_error_envelope(&root, &["--output-format", "json", "prompt"]);
-    assert_eq!(
-        envelope["kind"], "cli_parse",
-        "prompt subcommand without arg should classify as cli_parse, envelope: {envelope}"
-    );
-    assert_eq!(
-        envelope["error"], "prompt subcommand requires a prompt string",
-        "short reason should match the raw error, envelope: {envelope}"
-    );
-    assert_eq!(
-        envelope["hint"],
-        "Run `claw --help` for usage.",
-        "JSON envelope must carry the same help-runbook hint as text mode, envelope: {envelope}"
-    );
-}
-
-#[test]
-fn empty_positional_arg_emits_cli_parse_envelope_247() {
-    // #247: `claw ""` must classify as `cli_parse`, not `unknown`. The
-    // message itself embeds a ``run `claw --help`` pointer so the explicit
-    // hint field is allowed to remain null to avoid duplication — what
-    // matters for the typed-error contract is that `kind == cli_parse`.
-    let root = unique_temp_dir("247-empty-arg");
-    fs::create_dir_all(&root).expect("temp dir should exist");
-
-    let envelope = assert_json_error_envelope(&root, &["--output-format", "json", ""]);
-    assert_eq!(
-        envelope["kind"], "cli_parse",
-        "empty-prompt error should classify as cli_parse, envelope: {envelope}"
-    );
-    let short = envelope["error"]
-        .as_str()
-        .expect("error field should be a string");
-    assert!(
-        short.starts_with("empty prompt:"),
-        "short reason should preserve the original empty-prompt message, got: {short}"
-    );
-}
-
-#[test]
-fn whitespace_only_positional_arg_emits_cli_parse_envelope_247() {
-    // #247: same rule for `claw "   "` — any whitespace-only prompt must
-    // flow through the empty-prompt path and classify as `cli_parse`.
-    let root = unique_temp_dir("247-whitespace-arg");
-    fs::create_dir_all(&root).expect("temp dir should exist");
-
-    let envelope = assert_json_error_envelope(&root, &["--output-format", "json", "   "]);
-    assert_eq!(
-        envelope["kind"], "cli_parse",
-        "whitespace-only prompt should classify as cli_parse, envelope: {envelope}"
-    );
-}
-
-#[test]
-fn unrecognized_argument_still_classifies_as_cli_parse_247_regression_guard() {
-    // #247 regression guard: the new empty-prompt / prompt-subcommand
-    // patterns must NOT hijack the existing #77 unrecognized-argument
-    // classification. `claw doctor --foo` must still surface as cli_parse
-    // with the runbook hint present.
-    let root = unique_temp_dir("247-unrecognized-arg");
-    fs::create_dir_all(&root).expect("temp dir should exist");
-
-    let envelope =
-        assert_json_error_envelope(&root, &["--output-format", "json", "doctor", "--foo"]);
-    assert_eq!(
-        envelope["kind"], "cli_parse",
-        "unrecognized-argument must remain cli_parse, envelope: {envelope}"
-    );
-    assert_eq!(
-        envelope["hint"],
-        "Run `claw --help` for usage.",
-        "unrecognized-argument hint should stay intact, envelope: {envelope}"
-    );
-}
-
 fn assert_json_command_with_env(current_dir: &Path, args: &[&str], envs: &[(&str, &str)]) -> Value {
    let output = run_claw(current_dir, args, envs);
    assert!(
--- a/rust/crates/tools/src/lib.rs
+++ b/rust/crates/tools/src/lib.rs
@@ -240,6 +240,13 @@ impl GlobalToolRegistry {
            }
        }

+        if allowed.is_empty() {
+            return Err(format!(
+                "--allowedTools was provided with no usable tool names (got `{}`). Omit the flag to allow all tools.",
+                values.join(" ")
+            ));
+        }
+
        Ok(Some(allowed))
    }

@@ -6883,6 +6890,21 @@ mod tests {
        assert!(empty_permission.contains("unsupported plugin permission: "));
    }

+    #[test]
+    fn allowed_tools_rejects_empty_token_lists() {
+        let registry = GlobalToolRegistry::builtin();
+
+        for raw in ["", ",,", "   "] {
+            let err = registry
+                .normalize_allowed_tools(&[raw.to_string()])
+                .expect_err("empty allow-list input should be rejected");
+            assert!(
+                err.contains("--allowedTools was provided with no usable tool names"),
+                "unexpected error for {raw:?}: {err}"
+            );
+        }
+    }
+
    #[test]
    fn runtime_tools_extend_registry_definitions_permissions_and_search() {
        let registry = GlobalToolRegistry::builtin()
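
Reviewer note: the `allowed.is_empty()` guard added above can be illustrated with a minimal Python sketch. This is not the Rust implementation. The function below is a hypothetical stand-in, and two details are assumptions that the diff does not show: that tokens are separated by commas and/or whitespace, and that an omitted flag maps to `None` (all tools allowed). Only the error text is taken from the diff.

```python
import re


def normalize_allowed_tools(values):
    """Hypothetical Python stand-in for the Rust empty-allow-list check."""
    if not values:
        # Assumption: flag omitted entirely, so every tool stays allowed.
        return None
    allowed = set()
    for value in values:
        # Assumption: tokens are separated by commas and/or whitespace.
        for token in re.split(r"[,\s]+", value):
            if token:
                allowed.add(token)
    if not allowed:
        # Same wording as the message added in the diff.
        raise ValueError(
            "--allowedTools was provided with no usable tool names "
            f"(got `{' '.join(values)}`). Omit the flag to allow all tools."
        )
    return allowed
```

Under these assumptions, the three inputs exercised by the new Rust test (`""`, `",,"`, `"   "`) all yield zero tokens and are rejected, while an input such as `"read, write"` yields a two-element set.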
--- a/scripts/fmt.sh
+++ b/scripts/fmt.sh
@@ -0,0 +1,7 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
+cd "$REPO_ROOT/rust"
+exec cargo fmt "$@"
--- a/src/__init__.py
+++ b/src/__init__.py
@@ -5,16 +5,7 @@ from .parity_audit import ParityAuditResult, run_parity_audit
 from .port_manifest import PortManifest, build_port_manifest
 from .query_engine import QueryEnginePort, TurnResult
 from .runtime import PortRuntime, RuntimeSession
-from .session_store import (
-    SessionDeleteError,
-    SessionNotFoundError,
-    StoredSession,
-    delete_session,
-    list_sessions,
-    load_session,
-    save_session,
-    session_exists,
-)
+from .session_store import StoredSession, load_session, save_session
 from .system_init import build_system_init_message
 from .tools import PORTED_TOOLS, build_tool_backlog

@@ -24,8 +15,6 @@ __all__ = [
    'PortRuntime',
    'QueryEnginePort',
    'RuntimeSession',
-    'SessionDeleteError',
-    'SessionNotFoundError',
    'StoredSession',
    'TurnResult',
    'PORTED_COMMANDS',
@@ -34,10 +23,7 @@ __all__ = [
    'build_port_manifest',
    'build_system_init_message',
    'build_tool_backlog',
-    'delete_session',
-    'list_sessions',
    'load_session',
    'run_parity_audit',
    'save_session',
-    'session_exists',
 ]
--- a/src/main.py
+++ b/src/main.py
@@ -12,48 +12,22 @@ from .port_manifest import build_port_manifest
 from .query_engine import QueryEnginePort
 from .remote_runtime import run_remote_mode, run_ssh_mode, run_teleport_mode
 from .runtime import PortRuntime
-from .session_store import (
-    SessionDeleteError,
-    SessionNotFoundError,
-    delete_session,
-    list_sessions,
-    load_session,
-    session_exists,
-)
+from .session_store import load_session
 from .setup import run_setup
 from .tool_pool import assemble_tool_pool
 from .tools import execute_tool, get_tool, get_tools, render_tool_index


-def wrap_json_envelope(data: dict, command: str, exit_code: int = 0) -> dict:
-    """Wrap command output in canonical JSON envelope per SCHEMAS.md."""
-    from datetime import datetime, timezone
-    now_utc = datetime.now(timezone.utc).isoformat(timespec='seconds').replace('+00:00', 'Z')
-    return {
-        'timestamp': now_utc,
-        'command': command,
-        'exit_code': exit_code,
-        'output_format': 'json',
-        'schema_version': '1.0',
-        **data,
-    }
-
-
 def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description='Python porting workspace for the Claude Code rewrite effort')
-    # #180: Add --version flag to match canonical CLI contract
-    parser.add_argument('--version', action='version', version='claw-code 1.0.0 (Python harness)')
    subparsers = parser.add_subparsers(dest='command', required=True)
    subparsers.add_parser('summary', help='render a Markdown summary of the Python porting workspace')
    subparsers.add_parser('manifest', help='print the current Python workspace manifest')
    subparsers.add_parser('parity-audit', help='compare the Python workspace against the local ignored TypeScript archive when available')
    subparsers.add_parser('setup-report', help='render the startup/prefetch setup report')
-    command_graph_parser = subparsers.add_parser('command-graph', help='show command graph segmentation')
-    command_graph_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
-    tool_pool_parser = subparsers.add_parser('tool-pool', help='show assembled tool pool with default settings')
-    tool_pool_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
-    bootstrap_graph_parser = subparsers.add_parser('bootstrap-graph', help='show the mirrored bootstrap/runtime graph stages')
-    bootstrap_graph_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
+    subparsers.add_parser('command-graph', help='show command graph segmentation')
+    subparsers.add_parser('tool-pool', help='show assembled tool pool with default settings')
+    subparsers.add_parser('bootstrap-graph', help='show the mirrored bootstrap/runtime graph stages')
    list_parser = subparsers.add_parser('subsystems', help='list the current Python modules in the workspace')
    list_parser.add_argument('--limit', type=int, default=32)

@@ -74,104 +48,22 @@ def build_parser() -> argparse.ArgumentParser:
    route_parser = subparsers.add_parser('route', help='route a prompt across mirrored command/tool inventories')
    route_parser.add_argument('prompt')
    route_parser.add_argument('--limit', type=int, default=5)
-    # #168: parity with show-command/show-tool/session-lifecycle CLI family
-    route_parser.add_argument('--output-format', choices=['text', 'json'], default='text')

    bootstrap_parser = subparsers.add_parser('bootstrap', help='build a runtime-style session report from the mirrored inventories')
    bootstrap_parser.add_argument('prompt')
    bootstrap_parser.add_argument('--limit', type=int, default=5)
-    # #168: parity with CLI family
-    bootstrap_parser.add_argument('--output-format', choices=['text', 'json'], default='text')

    loop_parser = subparsers.add_parser('turn-loop', help='run a small stateful turn loop for the mirrored runtime')
    loop_parser.add_argument('prompt')
    loop_parser.add_argument('--limit', type=int, default=5)
    loop_parser.add_argument('--max-turns', type=int, default=3)
    loop_parser.add_argument('--structured-output', action='store_true')
-    loop_parser.add_argument(
-        '--timeout-seconds',
-        type=float,
-        default=None,
-        help='total wall-clock budget across all turns (#161). Default: unbounded.',
-    )
-    loop_parser.add_argument(
-        '--continuation-prompt',
-        default=None,
-        help=(
-            'prompt to submit on turns after the first (#163). Default: None '
-            '(loop stops after turn 0). Replaces the deprecated implicit "[turn N]" '
-            'suffix that used to pollute the transcript.'
-        ),
-    )
-    loop_parser.add_argument(
-        '--output-format',
-        choices=['text', 'json'],
-        default='text',
-        help='output format (#164 Stage B: JSON includes cancel_observed per turn)',
-    )

-    flush_parser = subparsers.add_parser(
-        'flush-transcript',
-        help='persist and flush a temporary session transcript (#160/#166: claw-native session API)',
-    )
+    flush_parser = subparsers.add_parser('flush-transcript', help='persist and flush a temporary session transcript')
    flush_parser.add_argument('prompt')
-    flush_parser.add_argument(
-        '--directory', help='session storage directory (default: .port_sessions)'
-    )
-    flush_parser.add_argument(
-        '--output-format',
-        choices=['text', 'json'],
-        default='text',
-        help='output format',
-    )
-    flush_parser.add_argument(
-        '--session-id',
-        help='deterministic session ID (default: auto-generated UUID)',
-    )

-    load_session_parser = subparsers.add_parser(
-        'load-session',
-        help='load a previously persisted session (#160/#165: claw-native session API)',
-    )
+    load_session_parser = subparsers.add_parser('load-session', help='load a previously persisted session')
    load_session_parser.add_argument('session_id')
-    load_session_parser.add_argument(
-        '--directory', help='session storage directory (default: .port_sessions)'
-    )
-    load_session_parser.add_argument(
-        '--output-format',
-        choices=['text', 'json'],
-        default='text',
-        help='output format',
-    )
-
-    list_sessions_parser = subparsers.add_parser(
-        'list-sessions',
-        help='enumerate stored session IDs (#160: claw-native session API)',
-    )
-    list_sessions_parser.add_argument(
-        '--directory', help='session storage directory (default: .port_sessions)'
-    )
-    list_sessions_parser.add_argument(
-        '--output-format',
-        choices=['text', 'json'],
-        default='text',
-        help='output format',
-    )
-
-    delete_session_parser = subparsers.add_parser(
-        'delete-session',
-        help='delete a persisted session (#160: idempotent, race-safe)',
-    )
-    delete_session_parser.add_argument('session_id')
-    delete_session_parser.add_argument(
-        '--directory', help='session storage directory (default: .port_sessions)'
-    )
-    delete_session_parser.add_argument(
-        '--output-format',
-        choices=['text', 'json'],
-        default='text',
-        help='output format',
-    )

    remote_parser = subparsers.add_parser('remote-mode', help='simulate remote-control runtime branching')
    remote_parser.add_argument('target')
@@ -186,112 +78,22 @@ def build_parser() -> argparse.ArgumentParser:

    show_command = subparsers.add_parser('show-command', help='show one mirrored command entry by exact name')
    show_command.add_argument('name')
-    show_command.add_argument('--output-format', choices=['text', 'json'], default='text')
    show_tool = subparsers.add_parser('show-tool', help='show one mirrored tool entry by exact name')
    show_tool.add_argument('name')
-    show_tool.add_argument('--output-format', choices=['text', 'json'], default='text')

    exec_command_parser = subparsers.add_parser('exec-command', help='execute a mirrored command shim by exact name')
    exec_command_parser.add_argument('name')
    exec_command_parser.add_argument('prompt')
-    # #168: parity with CLI family
-    exec_command_parser.add_argument('--output-format', choices=['text', 'json'], default='text')

    exec_tool_parser = subparsers.add_parser('exec-tool', help='execute a mirrored tool shim by exact name')
    exec_tool_parser.add_argument('name')
    exec_tool_parser.add_argument('payload')
-    # #168: parity with CLI family
-    exec_tool_parser.add_argument('--output-format', choices=['text', 'json'], default='text')
    return parser


-class _ArgparseError(Exception):
-    """#179: internal exception capturing argparse's real error message.
-
-    Subclassed ArgumentParser raises this instead of printing + exiting,
-    so JSON mode can preserve the actual error (e.g. 'the following arguments
-    are required: session_id') in the envelope.
-    """
-    def __init__(self, message: str) -> None:
-        super().__init__(message)
-        self.message = message
-
-
-def _emit_parse_error_envelope(argv: list[str], message: str) -> None:
-    """#178/#179: emit JSON envelope for argparse-level errors when --output-format json is requested.
-
-    Pre-scans argv for --output-format json. If found, prints a parse-error envelope
-    to stdout (per SCHEMAS.md 'error' envelope shape) instead of letting argparse
-    dump help text to stderr. This preserves the JSON contract for claws that can't
-    parse argparse usage messages.
-
-    #179 update: `message` now carries argparse's actual error text, not a generic
-    rejection string. Stderr is fully suppressed in JSON mode.
-    """
-    import json
-    # Extract the attempted command (argv[0] is the first positional)
-    attempted = argv[0] if argv and not argv[0].startswith('-') else '<missing>'
-    envelope = wrap_json_envelope(
-        {
-            'error': {
-                'kind': 'parse',
-                'operation': 'argparse',
-                'target': attempted,
-                'retryable': False,
-                'message': message,
-                'hint': 'run with no arguments to see available subcommands',
-            },
-        },
-        command=attempted,
-        exit_code=1,
-    )
-    print(json.dumps(envelope))
-
-
-def _wants_json_output(argv: list[str]) -> bool:
-    """#178: check if argv contains --output-format json anywhere (for parse-error routing)."""
-    for i, arg in enumerate(argv):
-        if arg == '--output-format' and i + 1 < len(argv) and argv[i + 1] == 'json':
-            return True
-        if arg == '--output-format=json':
-            return True
-    return False
-
-
 def main(argv: list[str] | None = None) -> int:
-    import sys
-    if argv is None:
-        argv = sys.argv[1:]
    parser = build_parser()
-    json_mode = _wants_json_output(argv)
-    # #178/#179: capture argparse errors with real message and emit JSON envelope
-    # when --output-format json is requested. In JSON mode, stderr is silenced
-    # so claws only see the envelope on stdout.
-    if json_mode:
-        # Monkey-patch parser.error to raise instead of print+exit. This preserves
-        # the original error message text (e.g. 'argument X: invalid choice: ...').
-        original_error = parser.error
-        def _json_mode_error(message: str) -> None:
-            raise _ArgparseError(message)
-        parser.error = _json_mode_error  # type: ignore[method-assign]
-        # Also patch all subparsers
-        for action in parser._actions:
-            if hasattr(action, 'choices') and isinstance(action.choices, dict):
-                for subp in action.choices.values():
-                    subp.error = _json_mode_error  # type: ignore[method-assign]
-        try:
-            args = parser.parse_args(argv)
-        except _ArgparseError as err:
-            _emit_parse_error_envelope(argv, err.message)
-            return 1
-        except SystemExit as exc:
-            # Defensive: if argparse exits via some other path (e.g. --help in JSON mode)
-            if exc.code != 0:
-                _emit_parse_error_envelope(argv, 'argparse exited with non-zero code')
-                return 1
-            raise
-    else:
-        args = parser.parse_args(argv)
+    args = parser.parse_args(argv)
    manifest = build_port_manifest()
    if args.command == 'summary':
        print(QueryEnginePort(manifest).render_summary())
@@ -306,44 +108,13 @@ def main(argv: list[str] | None = None) -> int:
        print(run_setup().as_markdown())
        return 0
    if args.command == 'command-graph':
-        graph = build_command_graph()
-        if args.output_format == 'json':
-            import json
-            envelope = {
-                'builtins_count': len(graph.builtins),
-                'plugin_like_count': len(graph.plugin_like),
-                'skill_like_count': len(graph.skill_like),
-                'total_count': len(graph.flattened()),
-                'builtins': [{'name': m.name, 'source_hint': m.source_hint} for m in graph.builtins],
-                'plugin_like': [{'name': m.name, 'source_hint': m.source_hint} for m in graph.plugin_like],
-                'skill_like': [{'name': m.name, 'source_hint': m.source_hint} for m in graph.skill_like],
-            }
-            print(json.dumps(wrap_json_envelope(envelope, args.command)))
-        else:
-            print(graph.as_markdown())
+        print(build_command_graph().as_markdown())
        return 0
    if args.command == 'tool-pool':
-        pool = assemble_tool_pool()
-        if args.output_format == 'json':
-            import json
-            envelope = {
-                'simple_mode': pool.simple_mode,
-                'include_mcp': pool.include_mcp,
-                'tool_count': len(pool.tools),
-                'tools': [{'name': t.name, 'source_hint': t.source_hint} for t in pool.tools],
-            }
-            print(json.dumps(wrap_json_envelope(envelope, args.command)))
-        else:
-            print(pool.as_markdown())
+        print(assemble_tool_pool().as_markdown())
        return 0
    if args.command == 'bootstrap-graph':
-        graph = build_bootstrap_graph()
-        if args.output_format == 'json':
-            import json
-            envelope = {'stages': graph.as_markdown().split('\n'), 'note': 'bootstrap-graph is markdown-only in this version'}
-            print(json.dumps(wrap_json_envelope(envelope, args.command)))
-        else:
-            print(graph.as_markdown())
+        print(build_bootstrap_graph().as_markdown())
        return 0
    if args.command == 'subsystems':
        for subsystem in manifest.top_level_modules[: args.limit]:
@@ -370,25 +141,6 @@ def main(argv: list[str] | None = None) -> int:
        return 0
    if args.command == 'route':
        matches = PortRuntime().route_prompt(args.prompt, limit=args.limit)
-        # #168: JSON envelope for machine parsing
-        if args.output_format == 'json':
-            import json
-            envelope = {
-                'prompt': args.prompt,
-                'limit': args.limit,
-                'match_count': len(matches),
-                'matches': [
-                    {
-                        'kind': m.kind,
-                        'name': m.name,
-                        'score': m.score,
-                        'source_hint': m.source_hint,
-                    }
-                    for m in matches
-                ],
-            }
-            print(json.dumps(wrap_json_envelope(envelope, args.command)))
-            return 0
        if not matches:
            print('No mirrored command/tool matches found.')
            return 0
@@ -396,220 +148,25 @@ def main(argv: list[str] | None = None) -> int:
            print(f'{match.kind}\t{match.name}\t{match.score}\t{match.source_hint}')
        return 0
    if args.command == 'bootstrap':
-        session = PortRuntime().bootstrap_session(args.prompt, limit=args.limit)
-        # #168: JSON envelope for machine parsing
-        if args.output_format == 'json':
-            import json
-            envelope = {
-                'prompt': session.prompt,
-                'limit': args.limit,
-                'setup': {
-                    'python_version': session.setup.python_version,
-                    'implementation': session.setup.implementation,
-                    'platform_name': session.setup.platform_name,
-                    'test_command': session.setup.test_command,
-                },
-                'routed_matches': [
-                    {
-                        'kind': m.kind,
-                        'name': m.name,
-                        'score': m.score,
-                        'source_hint': m.source_hint,
-                    }
-                    for m in session.routed_matches
-                ],
-                'command_execution_messages': list(session.command_execution_messages),
-                'tool_execution_messages': list(session.tool_execution_messages),
-                'turn': {
-                    'prompt': session.turn_result.prompt,
-                    'output': session.turn_result.output,
-                    'stop_reason': session.turn_result.stop_reason,
-                    'cancel_observed': session.turn_result.cancel_observed,
-                },
-                'persisted_session_path': session.persisted_session_path,
-            }
-            print(json.dumps(wrap_json_envelope(envelope, args.command)))
-            return 0
-        print(session.as_markdown())
+        print(PortRuntime().bootstrap_session(args.prompt, limit=args.limit).as_markdown())
        return 0
    if args.command == 'turn-loop':
-        results = PortRuntime().run_turn_loop(
-            args.prompt,
-            limit=args.limit,
-            max_turns=args.max_turns,
-            structured_output=args.structured_output,
-            timeout_seconds=args.timeout_seconds,
-            continuation_prompt=args.continuation_prompt,
-        )
-        # Exit 2 when a timeout terminated the loop so claws can distinguish
-        # 'ran to completion' from 'hit wall-clock budget'.
-        loop_exit_code = 2 if results and results[-1].stop_reason == 'timeout' else 0
-        if args.output_format == 'json':
-            # #164 Stage B + #173: JSON envelope with per-turn cancel_observed
-            # Promotes turn-loop from OPT_OUT to CLAWABLE surface.
-            import json
-            envelope = {
-                'prompt': args.prompt,
-                'max_turns': args.max_turns,
-                'turns_completed': len(results),
-                'timeout_seconds': args.timeout_seconds,
-                'continuation_prompt': args.continuation_prompt,
-                'turns': [
-                    {
-                        'prompt': r.prompt,
-                        'output': r.output,
-                        'stop_reason': r.stop_reason,
-                        'cancel_observed': r.cancel_observed,
-                        'matched_commands': list(r.matched_commands),
-                        'matched_tools': list(r.matched_tools),
-                    }
-                    for r in results
-                ],
-                'final_stop_reason': results[-1].stop_reason if results else None,
-                'final_cancel_observed': results[-1].cancel_observed if results else False,
-            }
-            print(json.dumps(wrap_json_envelope(envelope, args.command, exit_code=loop_exit_code)))
-            return loop_exit_code
+        results = PortRuntime().run_turn_loop(args.prompt, limit=args.limit, max_turns=args.max_turns, structured_output=args.structured_output)
        for idx, result in enumerate(results, start=1):
            print(f'## Turn {idx}')
            print(result.output)
            print(f'stop_reason={result.stop_reason}')
-        return loop_exit_code
+        return 0
    if args.command == 'flush-transcript':
-        from pathlib import Path as _Path
        engine = QueryEnginePort.from_workspace()
-        # #166: allow deterministic session IDs for claw checkpointing/replay.
-        # When unset, the engine's auto-generated UUID is used (backward compat).
-        if args.session_id:
-            engine.session_id = args.session_id
        engine.submit_message(args.prompt)
-        directory = _Path(args.directory) if args.directory else None
-        path = engine.persist_session(directory)
-        if args.output_format == 'json':
-            import json as _json
-            _env = {
-                'session_id': engine.session_id,
-                'path': path,
-                'flushed': engine.transcript_store.flushed,
-                'messages_count': len(engine.mutable_messages),
-                'input_tokens': engine.total_usage.input_tokens,
-                'output_tokens': engine.total_usage.output_tokens,
-            }
-            print(_json.dumps(wrap_json_envelope(_env, args.command)))
-        else:
-            # #166: legacy text output preserved byte-for-byte for backward compat.
-            print(path)
-            print(f'flushed={engine.transcript_store.flushed}')
+        path = engine.persist_session()
+        print(path)
+        print(f'flushed={engine.transcript_store.flushed}')
        return 0
    if args.command == 'load-session':
-        from pathlib import Path as _Path
-        directory = _Path(args.directory) if args.directory else None
-        # #165: catch typed SessionNotFoundError + surface a JSON error envelope
-        # matching the delete-session contract shape. No more raw tracebacks.
-        try:
-            session = load_session(args.session_id, directory)
-        except SessionNotFoundError as exc:
-            if args.output_format == 'json':
-                import json as _json
-                resolved_dir = str(directory) if directory else '.port_sessions'
-                _env = {
-                    'session_id': args.session_id,
-                    'loaded': False,
-                    'error': {
-                        'kind': 'session_not_found',
-                        'message': str(exc),
-                        'directory': resolved_dir,
-                        'retryable': False,
-                    },
-                }
-                print(_json.dumps(wrap_json_envelope(_env, args.command, exit_code=1)))
-            else:
-                print(f'error: {exc}')
-            return 1
-        except (OSError, ValueError) as exc:
-            # Corrupted session file, IO error, JSON decode error — distinct
-            # from 'not found'. Callers may retry here (fs glitch).
-            if args.output_format == 'json':
-                import json as _json
-                resolved_dir = str(directory) if directory else '.port_sessions'
-                _env = {
-                    'session_id': args.session_id,
-                    'loaded': False,
-                    'error': {
-                        'kind': 'session_load_failed',
-                        'message': str(exc),
-                        'directory': resolved_dir,
-                        'retryable': True,
-                    },
-                }
-                print(_json.dumps(wrap_json_envelope(_env, args.command, exit_code=1)))
-            else:
-                print(f'error: {exc}')
-            return 1
-        if args.output_format == 'json':
-            import json as _json
-            _env = {
-                'session_id': session.session_id,
-                'loaded': True,
-                'messages_count': len(session.messages),
-                'input_tokens': session.input_tokens,
-                'output_tokens': session.output_tokens,
-            }
-            print(_json.dumps(wrap_json_envelope(_env, args.command)))
-        else:
-            print(f'{session.session_id}\n{len(session.messages)} messages\nin={session.input_tokens} out={session.output_tokens}')
-        return 0
-    if args.command == 'list-sessions':
-        from pathlib import Path as _Path
-        directory = _Path(args.directory) if args.directory else None
-        ids = list_sessions(directory)
-        if args.output_format == 'json':
-            import json as _json
-            _env = {'sessions': ids, 'count': len(ids)}
-            print(_json.dumps(wrap_json_envelope(_env, args.command)))
-        else:
-            if not ids:
-                print('(no sessions)')
-            else:
-                for sid in ids:
-                    print(sid)
-        return 0
-    if args.command == 'delete-session':
-        from pathlib import Path as _Path
-        directory = _Path(args.directory) if args.directory else None
-        try:
-            deleted = delete_session(args.session_id, directory)
-        except SessionDeleteError as exc:
-            if args.output_format == 'json':
-                import json as _json
-                _env = {
-                    'session_id': args.session_id,
-                    'deleted': False,
-                    'error': {
-                        'kind': 'session_delete_failed',
-                        'message': str(exc),
-                        'retryable': True,
-                    },
-                }
-                print(_json.dumps(wrap_json_envelope(_env, args.command, exit_code=1)))
-            else:
-                print(f'error: {exc}')
-            return 1
-        if args.output_format == 'json':
-            import json as _json
-            _env = {
-                'session_id': args.session_id,
-                'deleted': deleted,
-                'status': 'deleted' if deleted else 'not_found',
-            }
-            print(_json.dumps(wrap_json_envelope(_env, args.command)))
-        else:
-            if deleted:
-                print(f'deleted: {args.session_id}')
-            else:
-                print(f'not found: {args.session_id}')
-        # Exit 0 for both cases — delete_session is idempotent,
-        # not-found is success from a cleanup perspective
+        session = load_session(args.session_id)
+        print(f'{session.session_id}\n{len(session.messages)} messages\nin={session.input_tokens} out={session.output_tokens}')
        return 0
    if args.command == 'remote-mode':
        print(run_remote_mode(args.target).as_text())
@@ -629,123 +186,25 @@ def main(argv: list[str] | None = None) -> int:
    if args.command == 'show-command':
        module = get_command(args.name)
        if module is None:
-            if args.output_format == 'json':
-                import json
-                error_envelope = {
-                    'name': args.name,
-                    'found': False,
-                    'error': {
-                        'kind': 'command_not_found',
-                        'message': f'Unknown command: {args.name}',
-                        'retryable': False,
-                    },
-                }
-                print(json.dumps(wrap_json_envelope(error_envelope, args.command, exit_code=1)))
-            else:
-                print(f'Command not found: {args.name}')
+            print(f'Command not found: {args.name}')
            return 1
-        if args.output_format == 'json':
-            import json
-            output = {
-                'name': module.name,
-                'found': True,
-                'source_hint': module.source_hint,
-                'responsibility': module.responsibility,
-            }
-            print(json.dumps(wrap_json_envelope(output, args.command)))
-        else:
-            print('\n'.join([module.name, module.source_hint, module.responsibility]))
+        print('\n'.join([module.name, module.source_hint, module.responsibility]))
        return 0
    if args.command == 'show-tool':
        module = get_tool(args.name)
        if module is None:
-            if args.output_format == 'json':
-                import json
-                error_envelope = {
-                    'name': args.name,
-                    'found': False,
-                    'error': {
-                        'kind': 'tool_not_found',
-                        'message': f'Unknown tool: {args.name}',
-                        'retryable': False,
-                    },
-                }
-                print(json.dumps(wrap_json_envelope(error_envelope, args.command, exit_code=1)))
-            else:
-                print(f'Tool not found: {args.name}')
+            print(f'Tool not found: {args.name}')
            return 1
-        if args.output_format == 'json':
-            import json
-            output = {
-                'name': module.name,
-                'found': True,
-                'source_hint': module.source_hint,
-                'responsibility': module.responsibility,
-            }
-            print(json.dumps(wrap_json_envelope(output, args.command)))
-        else:
-            print('\n'.join([module.name, module.source_hint, module.responsibility]))
+        print('\n'.join([module.name, module.source_hint, module.responsibility]))
        return 0
    if args.command == 'exec-command':
        result = execute_command(args.name, args.prompt)
-        # #168: JSON envelope with typed not-found error
-        # #181: envelope exit_code must match process exit code
-        exit_code = 0 if result.handled else 1
-        if args.output_format == 'json':
-            import json
-            if not result.handled:
-                envelope = {
-                    'name': args.name,
-                    'prompt': args.prompt,
-                    'handled': False,
-                    'error': {
-                        'kind': 'command_not_found',
-                        'message': result.message,
-                        'retryable': False,
-                    },
-                }
-            else:
-                envelope = {
-                    'name': result.name,
-                    'prompt': result.prompt,
-                    'source_hint': result.source_hint,
-                    'handled': True,
-                    'message': result.message,
-                }
-            print(json.dumps(wrap_json_envelope(envelope, args.command, exit_code=exit_code)))
-        else:
-            print(result.message)
-        return exit_code
+        print(result.message)
+        return 0 if result.handled else 1
    if args.command == 'exec-tool':
        result = execute_tool(args.name, args.payload)
-        # #168: JSON envelope with typed not-found error
-        # #181: envelope exit_code must match process exit code
-        exit_code = 0 if result.handled else 1
-        if args.output_format == 'json':
-            import json
-            if not result.handled:
-                envelope = {
-                    'name': args.name,
-                    'payload': args.payload,
-                    'handled': False,
-                    'error': {
-                        'kind': 'tool_not_found',
-                        'message': result.message,
-                        'retryable': False,
-                    },
-                }
-            else:
-                envelope = {
-                    'name': result.name,
-                    'payload': result.payload,
-                    'source_hint': result.source_hint,
-                    'handled': True,
-                    'message': result.message,
-                }
-            print(json.dumps(wrap_json_envelope(envelope, args.command, exit_code=exit_code)))
-        else:
-            print(result.message)
-        return exit_code
+        print(result.message)
+        return 0 if result.handled else 1
    parser.error(f'unknown command: {args.command}')
    return 2

--- a/src/query_engine.py
+++ b/src/query_engine.py
@@ -1,7 +1,6 @@
 from __future__ import annotations

 import json
-import threading
 from dataclasses import dataclass, field
 from uuid import uuid4

@@ -31,7 +30,6 @@ class TurnResult:
    permission_denials: tuple[PermissionDenial, ...]
    usage: UsageSummary
    stop_reason: str
-    cancel_observed: bool = False


@dataclass
@@ -66,59 +64,7 @@ class QueryEnginePort:
        matched_commands: tuple[str, ...] = (),
        matched_tools: tuple[str, ...] = (),
        denied_tools: tuple[PermissionDenial, ...] = (),
-        cancel_event: threading.Event | None = None,
    ) -> TurnResult:
-        """Submit a prompt and return a TurnResult.
-
-        #164 Stage A: cooperative cancellation via cancel_event.
-
-        The cancel_event argument (added for #164) lets a caller request early
-        termination at a safe point. When set before the pre-mutation commit
-        stage, submit_message returns early with ``stop_reason='cancelled'``
-        and the engine's state (mutable_messages, transcript_store,
-        permission_denials, total_usage) is left **exactly as it was on
-        entry**. This closes the #161 follow-up gap: before this change, a
-        wedged provider thread could finish executing and silently mutate
-        state after the caller had already observed ``stop_reason='timeout'``,
-        giving the session a ghost turn the caller never acknowledged.
-
-        Contract:
-          - cancel_event is None (default) — legacy behaviour, no checks.
-          - cancel_event set **before** budget check — returns 'cancelled'
-            immediately; no output synthesis, no projection, no mutation.
-          - cancel_event set **between** budget check and commit — returns
-            'cancelled' with state intact.
-          - cancel_event set **after** commit — not observable; the turn is
-            already committed and the caller sees 'completed'. Cancellation
-            is a *safe point* mechanism, not preemption. This is the honest
-            limit of cooperative cancellation in Python threading land.
-
-        Stop reason taxonomy after #164 Stage A:
-          - 'completed'            — turn committed, state mutated exactly once
-          - 'max_budget_reached'   — overflow, state unchanged (#162)
-          - 'max_turns_reached'    — capacity exceeded, state unchanged
-          - 'cancelled'            — cancel_event observed, state unchanged
-          - 'timeout'              — synthesised by runtime, not engine (#161)
-
-        Callers that care about deadline-driven cancellation (run_turn_loop)
-        can now request cleanup by setting the event on timeout — the next
-        submit_message on the same engine will observe it at the start and
-        return 'cancelled' without touching state, even if the previous call
-        is still wedged in provider IO.
-        """
-        # #164 Stage A: earliest safe cancellation point. No output synthesis,
-        # no budget projection, no mutation — just an immediate clean return.
-        if cancel_event is not None and cancel_event.is_set():
-            return TurnResult(
-                prompt=prompt,
-                output='',
-                matched_commands=matched_commands,
-                matched_tools=matched_tools,
-                permission_denials=denied_tools,
-                usage=self.total_usage,  # unchanged
-                stop_reason='cancelled',
-            )
-
        if len(self.mutable_messages) >= self.config.max_turns:
            output = f'Max turns reached before processing prompt: {prompt}'
            return TurnResult(
@@ -139,40 +85,9 @@ class QueryEnginePort:
        ]
        output = self._format_output(summary_lines)
        projected_usage = self.total_usage.add_turn(prompt, output)
-
-        # #162: budget check must precede mutation. Previously this block set
-        # stop_reason='max_budget_reached' but still appended the overflow turn
-        # to mutable_messages / transcript_store / permission_denials, corrupting
-        # the session for any caller that persisted it afterwards. The overflow
-        # prompt was effectively committed even though the TurnResult signalled
-        # rejection. Now we early-return with pre-mutation state intact so
-        # callers can safely retry with a smaller prompt or a fresh budget.
+        stop_reason = 'completed'
        if projected_usage.input_tokens + projected_usage.output_tokens > self.config.max_budget_tokens:
-            return TurnResult(
-                prompt=prompt,
-                output=output,
-                matched_commands=matched_commands,
-                matched_tools=matched_tools,
-                permission_denials=denied_tools,
-                usage=self.total_usage,  # unchanged — overflow turn was rejected
-                stop_reason='max_budget_reached',
-            )
-
-        # #164 Stage A: second safe cancellation point. Projection is done
-        # but nothing has been committed yet. If the caller cancelled while
-        # we were building output / computing budget, honour it here — still
-        # no mutation.
-        if cancel_event is not None and cancel_event.is_set():
-            return TurnResult(
-                prompt=prompt,
-                output=output,
-                matched_commands=matched_commands,
-                matched_tools=matched_tools,
-                permission_denials=denied_tools,
-                usage=self.total_usage,  # unchanged
-                stop_reason='cancelled',
-            )
-
+            stop_reason = 'max_budget_reached'
        self.mutable_messages.append(prompt)
        self.transcript_store.append(prompt)
        self.permission_denials.extend(denied_tools)
@@ -185,7 +100,7 @@ class QueryEnginePort:
            matched_tools=matched_tools,
            permission_denials=denied_tools,
            usage=self.total_usage,
-            stop_reason='completed',
+            stop_reason=stop_reason,
        )

    def stream_submit_message(
@@ -222,19 +137,7 @@ class QueryEnginePort:
    def flush_transcript(self) -> None:
        self.transcript_store.flush()

-    def persist_session(self, directory: 'Path | None' = None) -> str:
-        """Flush the transcript and save the session to disk.
-
-        Args:
-            directory: Optional override for the storage directory. When None
-                (default, for backward compat), uses the default location
-                (``.port_sessions`` in CWD). When set, passes through to
-                ``save_session`` which already supports directory overrides.
-
-        #166: added directory parameter to match the session-lifecycle CLI
-        surface established by #160/#165. Claws running out-of-tree can now
-        redirect session creation to a workspace-specific dir without chdir.
-        """
+    def persist_session(self) -> str:
        self.flush_transcript()
        path = save_session(
            StoredSession(
@@ -242,8 +145,7 @@ class QueryEnginePort:
                messages=tuple(self.mutable_messages),
                input_tokens=self.total_usage.input_tokens,
                output_tokens=self.total_usage.output_tokens,
-            ),
-            directory,
+            )
        )
        return str(path)

--- a/src/runtime.py
+++ b/src/runtime.py
@@ -1,14 +1,11 @@
 from __future__ import annotations

-import threading
-import time
-from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeoutError
 from dataclasses import dataclass

 from .commands import PORTED_COMMANDS
 from .context import PortContext, build_port_context, render_context
 from .history import HistoryLog
-from .models import PermissionDenial, PortingModule, UsageSummary
+from .models import PermissionDenial, PortingModule
 from .query_engine import QueryEngineConfig, QueryEnginePort, TurnResult
 from .setup import SetupReport, WorkspaceSetup, run_setup
 from .system_init import build_system_init_message
@@ -154,161 +151,21 @@ class PortRuntime:
            persisted_session_path=persisted_session_path,
        )

-    def run_turn_loop(
-        self,
-        prompt: str,
-        limit: int = 5,
-        max_turns: int = 3,
-        structured_output: bool = False,
-        timeout_seconds: float | None = None,
-        continuation_prompt: str | None = None,
-    ) -> list[TurnResult]:
-        """Run a multi-turn engine loop with optional wall-clock deadline.
-
-        Args:
-            prompt: The initial prompt to submit.
-            limit: Match routing limit.
-            max_turns: Maximum number of turns before stopping.
-            structured_output: Whether to request structured output.
-            timeout_seconds: Total wall-clock budget across all turns. When the
-                budget is exhausted mid-turn, a synthetic TurnResult with
-                ``stop_reason='timeout'`` is appended and the loop exits.
-                ``None`` (default) preserves legacy unbounded behaviour.
-            continuation_prompt: What to send on turns after the first. When
-                ``None`` (default, #163), the loop stops after turn 0 and the
-                caller decides how to continue. When set, the same text is
-                submitted for every turn after the first, giving claws a clean
-                hook for structured follow-ups (e.g. ``"Continue."``, a
-                routing-planner instruction, or a tool-output cue). Previously
-                the loop silently appended ``" [turn N]"`` to the original
-                prompt, polluting the transcript with harness-generated
-                annotation the model had no way to interpret.
-
-        Returns:
-            A list of TurnResult objects. The final entry's ``stop_reason``
-            distinguishes ``'completed'``, ``'max_turns_reached'``,
-            ``'max_budget_reached'``, or ``'timeout'``.
-
-        #161: prior to this change a hung ``engine.submit_message`` call would
-        block the loop indefinitely with no cancellation path, forcing claws to
-        rely on external watchdogs or OS-level kills. Callers can now enforce a
-        deadline and receive a typed timeout signal instead.
-
-        #163: the old ``f'{prompt} [turn {turn + 1}]'`` suffix was never
-        interpreted by the engine or any system prompt. It looked like a real
-        user turn in ``mutable_messages`` and the transcript, making replay and
-        analysis fragile. Removed entirely; callers supply ``continuation_prompt``
-        for meaningful follow-ups or let the loop stop after turn 0.
-        """
+    def run_turn_loop(self, prompt: str, limit: int = 5, max_turns: int = 3, structured_output: bool = False) -> list[TurnResult]:
        engine = QueryEnginePort.from_workspace()
        engine.config = QueryEngineConfig(max_turns=max_turns, structured_output=structured_output)
        matches = self.route_prompt(prompt, limit=limit)
        command_names = tuple(match.name for match in matches if match.kind == 'command')
        tool_names = tuple(match.name for match in matches if match.kind == 'tool')
-        # #159: infer permission denials from the routed matches, not hardcoded empty tuple.
-        # Multi-turn sessions must have the same security posture as bootstrap_session.
-        denied_tools = tuple(self._infer_permission_denials(matches))
        results: list[TurnResult] = []
-        deadline = time.monotonic() + timeout_seconds if timeout_seconds is not None else None
-        # #164 Stage A: shared cancel_event signals cooperative cancellation
-        # across turns. On timeout we set() it so any still-running
-        # submit_message call (or the next one on the same engine) observes
-        # the cancel at a safe checkpoint and returns stop_reason='cancelled'
-        # without mutating state. This closes the window where a wedged
-        # provider thread could commit a ghost turn after the caller saw
-        # 'timeout'.
-        cancel_event = threading.Event() if deadline is not None else None
-
-        # ThreadPoolExecutor is reused across turns so we cancel cleanly on exit.
-        executor = ThreadPoolExecutor(max_workers=1) if deadline is not None else None
-        try:
-            for turn in range(max_turns):
-                # #163: no more f'{prompt} [turn N]' suffix injection.
-                # On turn 0 submit the original prompt.
-                # On turn > 0, submit the caller-supplied continuation prompt;
-                # if the caller did not supply one, stop the loop cleanly instead
-                # of fabricating a fake user turn.
-                if turn == 0:
-                    turn_prompt = prompt
-                elif continuation_prompt is not None:
-                    turn_prompt = continuation_prompt
-                else:
-                    break
-
-                if deadline is None:
-                    # Legacy path: unbounded call, preserves existing behaviour exactly.
-                    # #159: pass inferred denied_tools (no longer hardcoded empty tuple)
-                    # #164: cancel_event is None on this path; submit_message skips
-                    # cancellation checks entirely (legacy zero-overhead behaviour).
-                    result = engine.submit_message(turn_prompt, command_names, tool_names, denied_tools)
-                else:
-                    remaining = deadline - time.monotonic()
-                    if remaining <= 0:
-                        # #164: signal cancel for any in-flight/future submit_message
-                        # calls that share this engine. Safe because nothing has been
-                        # submitted yet this turn.
-                        assert cancel_event is not None
-                        cancel_event.set()
-                        results.append(self._build_timeout_result(
-                            turn_prompt, command_names, tool_names,
-                            cancel_observed=cancel_event.is_set()
-                        ))
-                        break
-                    assert executor is not None
-                    future = executor.submit(
-                        engine.submit_message, turn_prompt, command_names, tool_names,
-                        denied_tools, cancel_event,
-                    )
-                    try:
-                        result = future.result(timeout=remaining)
-                    except FuturesTimeoutError:
-                        # #164 Stage A: explicitly signal cancel to the still-running
-                        # submit_message thread. The next time it hits a checkpoint
-                        # (entry or post-budget), it returns 'cancelled' without
-                        # mutating state instead of committing a ghost turn. This
-                        # upgrades #161's best-effort future.cancel() (which only
-                        # cancels pre-start futures) to cooperative mid-flight cancel.
-                        assert cancel_event is not None
-                        cancel_event.set()
-                        future.cancel()
-                        results.append(self._build_timeout_result(
-                            turn_prompt, command_names, tool_names,
-                            cancel_observed=cancel_event.is_set()
-                        ))
-                        break
-
-                results.append(result)
-                if result.stop_reason != 'completed':
-                    break
-        finally:
-            if executor is not None:
-                # wait=False: don't let a hung thread block loop exit indefinitely.
-                # The thread will be reaped when the interpreter shuts down or when
-                # the engine call eventually returns.
-                executor.shutdown(wait=False)
+        for turn in range(max_turns):
+            turn_prompt = prompt if turn == 0 else f'{prompt} [turn {turn + 1}]'
+            result = engine.submit_message(turn_prompt, command_names, tool_names, ())
+            results.append(result)
+            if result.stop_reason != 'completed':
+                break
        return results

-    @staticmethod
-    def _build_timeout_result(
-        prompt: str,
-        command_names: tuple[str, ...],
-        tool_names: tuple[str, ...],
-        cancel_observed: bool = False,
-    ) -> TurnResult:
-        """Synthesize a TurnResult representing a wall-clock timeout (#161).
-        #164 Stage B: cancel_observed signals cancellation event was set.
-        """
-        return TurnResult(
-            prompt=prompt,
-            output='Wall-clock timeout exceeded before turn completed.',
-            matched_commands=command_names,
-            matched_tools=tool_names,
-            permission_denials=(),
-            usage=UsageSummary(),
-            stop_reason='timeout',
-            cancel_observed=cancel_observed,
-        )
-
    def _infer_permission_denials(self, matches: list[RoutedMatch]) -> list[PermissionDenial]:
        denials: list[PermissionDenial] = []
        for match in matches:
--- a/src/session_store.py
+++ b/src/session_store.py
@@ -26,96 +26,10 @@ def save_session(session: StoredSession, directory: Path | None = None) -> Path:

 def load_session(session_id: str, directory: Path | None = None) -> StoredSession:
    target_dir = directory or DEFAULT_SESSION_DIR
-    try:
-        data = json.loads((target_dir / f'{session_id}.json').read_text())
-    except FileNotFoundError:
-        raise SessionNotFoundError(f'session {session_id!r} not found in {target_dir}') from None
+    data = json.loads((target_dir / f'{session_id}.json').read_text())
    return StoredSession(
        session_id=data['session_id'],
        messages=tuple(data['messages']),
        input_tokens=data['input_tokens'],
        output_tokens=data['output_tokens'],
    )
-
-
-class SessionNotFoundError(KeyError):
-    """Raised when a session does not exist in the store."""
-    pass
-
-
-def list_sessions(directory: Path | None = None) -> list[str]:
-    """List all stored session IDs in the target directory.
-    
-    Args:
-        directory: Target session directory. Defaults to DEFAULT_SESSION_DIR.
-    
-    Returns:
-        Sorted list of session IDs (JSON filenames without .json extension).
-    """
-    target_dir = directory or DEFAULT_SESSION_DIR
-    if not target_dir.exists():
-        return []
-    return sorted(p.stem for p in target_dir.glob('*.json'))
-
-
-def session_exists(session_id: str, directory: Path | None = None) -> bool:
-    """Check if a session exists without raising an error.
-    
-    Args:
-        session_id: The session ID to check.
-        directory: Target session directory. Defaults to DEFAULT_SESSION_DIR.
-    
-    Returns:
-        True if the session file exists, False otherwise.
-    """
-    target_dir = directory or DEFAULT_SESSION_DIR
-    return (target_dir / f'{session_id}.json').exists()
-
-
-class SessionDeleteError(OSError):
-    """Raised when a session file exists but cannot be removed (permission, IO error).
-    
-    Distinct from SessionNotFoundError: this means the session was present but
-    deletion failed mid-operation. Callers can retry or escalate.
-    """
-    pass
-
-
-def delete_session(session_id: str, directory: Path | None = None) -> bool:
-    """Delete a session file from the store.
-    
-    Contract:
-    - **Idempotent**: `delete_session(x)` followed by `delete_session(x)` is safe.
-      Second call returns False (not found), does not raise.
-    - **Race-safe**: Uses `missing_ok=True` on unlink to avoid TOCTOU between
-      exists-check and unlink. Concurrent deletion by another process is
-      treated as a no-op success (returns False for the losing caller).
-    - **Partial-failure surfaced**: If the file exists but cannot be removed
-      (permission denied, filesystem error, directory instead of file), raises
-      `SessionDeleteError` wrapping the underlying OSError. The session store
-      may be in an inconsistent state; caller should retry or escalate.
-    
-    Args:
-        session_id: The session ID to delete.
-        directory: Target session directory. Defaults to DEFAULT_SESSION_DIR.
-    
-    Returns:
-        True if this call deleted the session file.
-        False if the session did not exist (either never existed or was already deleted).
-    
-    Raises:
-        SessionDeleteError: if the session existed but deletion failed.
-    """
-    target_dir = directory or DEFAULT_SESSION_DIR
-    path = target_dir / f'{session_id}.json'
-    try:
-        # Python 3.8+: missing_ok=True avoids TOCTOU race
-        path.unlink(missing_ok=False)
-        return True
-    except FileNotFoundError:
-        # Either never existed or was concurrently deleted — both are no-ops
-        return False
-    except (PermissionError, IsADirectoryError, OSError) as exc:
-        raise SessionDeleteError(
-            f'session {session_id!r} exists in {target_dir} but could not be deleted: {exc}'
-        ) from exc
--- a/tests/test_cancel_observed_field.py
+++ b/tests/test_cancel_observed_field.py
@@ -1,199 +0,0 @@
-"""#164 Stage B — cancel_observed field coverage.
-
-Validates that the TurnResult.cancel_observed field correctly signals
-whether cancellation was observed during turn execution.
-
-Test coverage:
-1. Normal completion: cancel_observed=False (no timeout occurred)
-2. Timeout with cancel signaled: cancel_observed=True
-3. bootstrap JSON output exposes the field
-4. turn-loop JSON output exposes cancel_observed per turn
-5. Safe-to-reuse: after timeout with cancel_observed=True,
-   engine can accept fresh messages without state corruption
-"""
-
-from __future__ import annotations
-
-import json
-import subprocess
-import sys
-from pathlib import Path
-
-import pytest
-
-from src.query_engine import QueryEnginePort, TurnResult
-from src.runtime import PortRuntime
-
-
-CLI = [sys.executable, '-m', 'src.main']
-REPO_ROOT = Path(__file__).resolve().parent.parent
-
-
-class TestCancelObservedField:
-    """TurnResult.cancel_observed correctly signals cancellation observation."""
-
-    def test_default_value_is_false(self) -> None:
-        """New TurnResult defaults to cancel_observed=False (backward compat)."""
-        from src.models import UsageSummary
-        result = TurnResult(
-            prompt='test',
-            output='ok',
-            matched_commands=(),
-            matched_tools=(),
-            permission_denials=(),
-            usage=UsageSummary(),
-            stop_reason='completed',
-        )
-        assert result.cancel_observed is False
-
-    def test_explicit_true_preserved(self) -> None:
-        """cancel_observed=True is preserved through construction."""
-        from src.models import UsageSummary
-        result = TurnResult(
-            prompt='test',
-            output='timed out',
-            matched_commands=(),
-            matched_tools=(),
-            permission_denials=(),
-            usage=UsageSummary(),
-            stop_reason='timeout',
-            cancel_observed=True,
-        )
-        assert result.cancel_observed is True
-
-    def test_normal_completion_cancel_observed_false(self) -> None:
-        """Normal turn completion → cancel_observed=False."""
-        runtime = PortRuntime()
-        results = runtime.run_turn_loop('hello', max_turns=1)
-        assert len(results) >= 1
-        assert results[0].cancel_observed is False
-
-    def test_bootstrap_json_includes_cancel_observed(self) -> None:
-        """bootstrap JSON envelope includes cancel_observed in turn result."""
-        result = subprocess.run(
-            CLI + ['bootstrap', 'hello', '--output-format', 'json'],
-            cwd=REPO_ROOT,
-            capture_output=True,
-            text=True,
-        )
-        assert result.returncode == 0
-        envelope = json.loads(result.stdout)
-        assert 'turn' in envelope
-        assert 'cancel_observed' in envelope['turn'], (
-            f"bootstrap turn must include cancel_observed (SCHEMAS.md contract). "
-            f"Got keys: {list(envelope['turn'].keys())}"
-        )
-        # Normal completion → False
-        assert envelope['turn']['cancel_observed'] is False
-
-    def test_turn_loop_json_per_turn_cancel_observed(self) -> None:
-        """turn-loop JSON envelope includes cancel_observed per turn (#164 Stage B closure)."""
-        result = subprocess.run(
-            CLI + ['turn-loop', 'hello', '--max-turns', '1', '--output-format', 'json'],
-            cwd=REPO_ROOT,
-            capture_output=True,
-            text=True,
-        )
-        assert result.returncode == 0, f"stderr: {result.stderr}"
-        envelope = json.loads(result.stdout)
-        # Common fields from wrap_json_envelope
-        assert envelope['command'] == 'turn-loop'
-        assert envelope['schema_version'] == '1.0'
-        # Turn-loop-specific fields
-        assert 'turns' in envelope
-        assert len(envelope['turns']) >= 1
-        for idx, turn in enumerate(envelope['turns']):
-            assert 'cancel_observed' in turn, (
-                f"Turn {idx} missing cancel_observed: {list(turn.keys())}"
-            )
-        # final_cancel_observed convenience field
-        assert 'final_cancel_observed' in envelope
-        assert isinstance(envelope['final_cancel_observed'], bool)
-
-
-class TestCancelObservedSafeReuseSemantics:
-    """After timeout with cancel_observed=True, engine state is safe to reuse."""
-
-    def test_timeout_result_cancel_observed_true_when_signaled(self) -> None:
-        """#164 Stage B: timeout path passes cancel_event.is_set() to result."""
-        # Force a timeout with max_turns=3 and timeout=0.0001 (instant)
-        runtime = PortRuntime()
-        results = runtime.run_turn_loop(
-            'hello', max_turns=3, timeout_seconds=0.0001,
-            continuation_prompt='keep going',
-        )
-        # Last result should be timeout (pre-start path since timeout is instant)
-        assert results, 'timeout path should still produce a result'
-        last = results[-1]
-        assert last.stop_reason == 'timeout'
-        # cancel_observed=True because the timeout path explicitly sets cancel_event
-        assert last.cancel_observed is True, (
-            f"timeout path must signal cancel_observed=True; got {last.cancel_observed}. "
-            f"stop_reason={last.stop_reason}"
-        )
-
-    def test_engine_messages_not_corrupted_by_timeout(self) -> None:
-        """After timeout with cancel_observed, engine.mutable_messages is consistent.
-
-        #164 Stage B contract: safe-to-reuse means after a timeout-with-cancel,
-        the engine has not committed a ghost turn and can accept fresh input.
-        """
-        engine = QueryEnginePort.from_workspace()
-        # Track initial state
-        initial_message_count = len(engine.mutable_messages)
-
-        # Simulate a direct submit_message call with cancellation
-        import threading
-        cancel_event = threading.Event()
-        cancel_event.set()  # Pre-set: first checkpoint fires
-        result = engine.submit_message(
-            'test', ('cmd1',), ('tool1',),
-            denied_tools=(), cancel_event=cancel_event,
-        )
-
-        # Cancelled turn should not commit mutation
-        assert result.stop_reason == 'cancelled', (
-            f"expected cancelled; got {result.stop_reason}"
-        )
-        # mutable_messages should not have grown
-        assert len(engine.mutable_messages) == initial_message_count, (
-            f"engine.mutable_messages grew after cancelled turn "
-            f"(was {initial_message_count}, now {len(engine.mutable_messages)})"
-        )
-
-        # Engine should accept a fresh message now
-        fresh = engine.submit_message('fresh prompt', ('cmd1',), ('tool1',))
-        assert fresh.stop_reason in ('completed', 'max_budget_reached'), (
-            f"expected engine reusable; got {fresh.stop_reason}"
-        )
-
-
-class TestCancelObservedSchemaCompliance:
-    """SCHEMAS.md contract for cancel_observed field."""
-
-    def test_cancel_observed_is_bool_not_nullable(self) -> None:
-        """cancel_observed is always bool (never null/missing) per SCHEMAS.md."""
-        result = subprocess.run(
-            CLI + ['bootstrap', 'test', '--output-format', 'json'],
-            cwd=REPO_ROOT,
-            capture_output=True,
-            text=True,
-        )
-        envelope = json.loads(result.stdout)
-        cancel_observed = envelope['turn']['cancel_observed']
-        assert isinstance(cancel_observed, bool), (
-            f"cancel_observed must be bool; got {type(cancel_observed)}"
-        )
-
-    def test_turn_loop_envelope_has_final_cancel_observed(self) -> None:
-        """turn-loop JSON exposes final_cancel_observed convenience field."""
-        result = subprocess.run(
-            CLI + ['turn-loop', 'test', '--max-turns', '1', '--output-format', 'json'],
-            cwd=REPO_ROOT,
-            capture_output=True,
-            text=True,
-        )
-        assert result.returncode == 0
-        envelope = json.loads(result.stdout)
-        assert 'final_cancel_observed' in envelope
-        assert isinstance(envelope['final_cancel_observed'], bool)
--- a/tests/test_cli_parity_audit.py
+++ b/tests/test_cli_parity_audit.py
@@ -1,333 +0,0 @@
-"""Cross-surface CLI parity audit (ROADMAP #171).
-
-Prevents future drift of the unified JSON envelope contract across
-claw-code's CLI surface. Instead of requiring humans to notice when
-a new command skips --output-format, this test introspects the parser
-at runtime and verifies every command in the declared clawable-surface
-list supports --output-format {text,json}.
-
-When a new clawable-surface command is added:
-  1. Implement --output-format on the subparser (normal feature work).
-  2. Add the command name to CLAWABLE_SURFACES below.
-  3. This test passes automatically.
-
-When a developer adds a new clawable-surface command but forgets
---output-format, the test fails with a concrete message pointing at
-the missing flag. Claws no longer need to eyeball parity; the contract
-is enforced at test time.
-
-Three classes of commands:
-  - CLAWABLE_SURFACES: MUST accept --output-format (inspect/lifecycle/exec/diagnostic)
-  - OPT_OUT_SURFACES: explicitly exempt (simulation/mode commands, human-first diagnostic)
-  - Any command in parser not listed in either: test FAILS with classification request
-
-This is operationalised parity — a machine-first CLI enforced by a
-machine-first test.
-"""
-
-from __future__ import annotations
-
-import subprocess
-import sys
-from pathlib import Path
-
-import pytest
-
-sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
-
-from src.main import build_parser  # noqa: E402
-
-
-# Commands that MUST accept --output-format {text,json}.
-# These are the machine-first surfaces — session lifecycle, execution,
-# inspect, diagnostic inventory.
-CLAWABLE_SURFACES = frozenset({
-    # Session lifecycle (#160, #165, #166)
-    'list-sessions',
-    'delete-session',
-    'load-session',
-    'flush-transcript',
-    # Inspect (#167)
-    'show-command',
-    'show-tool',
-    # Execution/work-verb (#168)
-    'exec-command',
-    'exec-tool',
-    'route',
-    'bootstrap',
-    # Diagnostic inventory (#169, #170)
-    'command-graph',
-    'tool-pool',
-    'bootstrap-graph',
-    # Turn-loop with JSON output (#164 Stage B, #174)
-    'turn-loop',
-})
-
-# Commands explicitly exempt from --output-format requirement.
-# Rationale must be explicit — either the command is human-first
-# (rich Markdown docs/reports), simulation-only, or has a dedicated
-# JSON mode flag under a different name.
-OPT_OUT_SURFACES = frozenset({
-    # Rich-Markdown report commands (planned future: JSON schema)
-    'summary',            # full workspace summary (Markdown)
-    'manifest',           # workspace manifest (Markdown)
-    'parity-audit',       # TypeScript archive comparison (Markdown)
-    'setup-report',       # startup/prefetch report (Markdown)
-    # List commands with their own query/filter surface (not JSON yet)
-    'subsystems',         # use --limit
-    'commands',           # use --query / --limit / --no-plugin-commands
-    'tools',              # use --query / --limit / --simple-mode
-    # Simulation/debug surfaces (not claw-orchestrated)
-    'remote-mode',
-    'ssh-mode',
-    'teleport-mode',
-    'direct-connect-mode',
-    'deep-link-mode',
-})
-
-
-def _discover_subcommands_and_flags() -> dict[str, frozenset[str]]:
-    """Introspect the argparse tree to discover every subcommand and its flags.
-
-    Returns:
-      {subcommand_name: frozenset of option strings including --output-format
-       if registered}
-    """
-    parser = build_parser()
-    subcommand_flags: dict[str, frozenset[str]] = {}
-    for action in parser._actions:
-        if not hasattr(action, 'choices') or not action.choices:
-            continue
-        if action.dest != 'command':
-            continue
-        for name, subp in action.choices.items():
-            flags: set[str] = set()
-            for a in subp._actions:
-                if a.option_strings:
-                    flags.update(a.option_strings)
-            subcommand_flags[name] = frozenset(flags)
-    return subcommand_flags
-
-
-class TestClawableSurfaceParity:
-    """Every clawable-surface command MUST accept --output-format {text,json}.
-
-    This is the invariant that codifies 'claws can treat the CLI as a
-    unified protocol without special-casing'.
-    """
-
-    def test_all_clawable_surfaces_accept_output_format(self) -> None:
-        """All commands in CLAWABLE_SURFACES must have --output-format registered."""
-        subcommand_flags = _discover_subcommands_and_flags()
-        missing = []
-        for cmd in CLAWABLE_SURFACES:
-            if cmd not in subcommand_flags:
-                missing.append(f'{cmd}: not registered in parser')
-            elif '--output-format' not in subcommand_flags[cmd]:
-                missing.append(f'{cmd}: missing --output-format flag')
-        assert not missing, (
-            'Clawable-surface parity violation. Every command in '
-            'CLAWABLE_SURFACES must accept --output-format. Failures:\n'
-            + '\n'.join(f'  - {m}' for m in missing)
-        )
-
-    @pytest.mark.parametrize('cmd_name', sorted(CLAWABLE_SURFACES))
-    def test_clawable_surface_output_format_choices(self, cmd_name: str) -> None:
-        """Every clawable surface must accept exactly {text, json} choices."""
-        parser = build_parser()
-        for action in parser._actions:
-            if not hasattr(action, 'choices') or not action.choices:
-                continue
-            if action.dest != 'command':
-                continue
-            if cmd_name not in action.choices:
-                continue
-            subp = action.choices[cmd_name]
-            for a in subp._actions:
-                if '--output-format' in a.option_strings:
-                    assert a.choices == ['text', 'json'], (
-                        f'{cmd_name}: --output-format choices are {a.choices}, '
-                        f'expected [text, json]'
-                    )
-                    assert a.default == 'text', (
-                        f'{cmd_name}: --output-format default is {a.default!r}, '
-                        f'expected \'text\' for backward compat'
-                    )
-                    return
-        pytest.fail(f'{cmd_name}: no --output-format flag found')
-
-
-class TestCommandClassificationCoverage:
-    """Every registered subcommand must be classified as either CLAWABLE or OPT_OUT.
-
-    If a new command is added to the parser but forgotten in both sets, this
-    test fails loudly — forcing an explicit classification decision.
-    """
-
-    def test_every_registered_command_is_classified(self) -> None:
-        subcommand_flags = _discover_subcommands_and_flags()
-        all_classified = CLAWABLE_SURFACES | OPT_OUT_SURFACES
-        unclassified = set(subcommand_flags.keys()) - all_classified
-        assert not unclassified, (
-            'Unclassified subcommands detected. Every new command must be '
-            'explicitly added to either CLAWABLE_SURFACES (must accept '
-            '--output-format) or OPT_OUT_SURFACES (explicitly exempt with '
-            'rationale). Unclassified:\n'
-            + '\n'.join(f'  - {cmd}' for cmd in sorted(unclassified))
-        )
-
-    def test_no_command_in_both_sets(self) -> None:
-        """Sanity: a command cannot be both clawable AND opt-out."""
-        overlap = CLAWABLE_SURFACES & OPT_OUT_SURFACES
-        assert not overlap, (
-            f'Classification conflict: commands appear in both sets: {overlap}'
-        )
-
-    def test_all_classified_commands_actually_exist(self) -> None:
-        """No typos — every command in our sets must actually be registered."""
-        subcommand_flags = _discover_subcommands_and_flags()
-        ghosts = (CLAWABLE_SURFACES | OPT_OUT_SURFACES) - set(subcommand_flags.keys())
-        assert not ghosts, (
-            f'Phantom commands in classification sets (not in parser): {ghosts}. '
-            'Update CLAWABLE_SURFACES / OPT_OUT_SURFACES if commands were removed.'
-        )
-
-
-class TestJsonOutputContractEndToEnd:
-    """Verify the contract AT RUNTIME — not just parser-level, but actual execution.
-
-    Each clawable command must, when invoked with --output-format json,
-    produce parseable JSON on stdout (for success cases).
-    """
-
-    # Minimal invocation args for each clawable command (to hit success path)
-    RUNTIME_INVOCATIONS = {
-        'list-sessions': [],
-        # delete-session/load-session: skip (need state setup, covered by dedicated tests)
-        'show-command': ['add-dir'],
-        'show-tool': ['BashTool'],
-        'exec-command': ['add-dir', 'hi'],
-        'exec-tool': ['BashTool', '{}'],
-        'route': ['review'],
-        'bootstrap': ['hello'],
-        'command-graph': [],
-        'tool-pool': [],
-        'bootstrap-graph': [],
-        # flush-transcript: skip (creates files, covered by dedicated tests)
-    }
-
-    @pytest.mark.parametrize('cmd_name,cmd_args', sorted(RUNTIME_INVOCATIONS.items()))
-    def test_command_emits_parseable_json(self, cmd_name: str, cmd_args: list[str]) -> None:
-        """End-to-end: invoking with --output-format json yields valid JSON."""
-        import json
-        result = subprocess.run(
-            [sys.executable, '-m', 'src.main', cmd_name, *cmd_args, '--output-format', 'json'],
-            cwd=Path(__file__).resolve().parent.parent,
-            capture_output=True,
-            text=True,
-        )
-        # Accept exit 0 (success) or 1 (typed not-found) — both must still produce JSON
-        assert result.returncode in (0, 1), (
-            f'{cmd_name}: unexpected exit {result.returncode}\n'
-            f'stderr: {result.stderr}\n'
-            f'stdout: {result.stdout[:200]}'
-        )
-        try:
-            json.loads(result.stdout)
-        except json.JSONDecodeError as e:
-            pytest.fail(
-                f'{cmd_name} {cmd_args} --output-format json did not produce '
-                f'parseable JSON: {e}\nOutput: {result.stdout[:200]}'
-            )
-
-
-class TestOptOutSurfaceRejection:
-    """Cycle #30: OPT_OUT surfaces must REJECT --output-format, not silently accept.
-    
-    OPT_OUT_AUDIT.md classifies 12 surfaces as intentionally exempt from the
-    JSON envelope contract. This test LOCKS that rejection so accidental
-    drift (e.g., a developer adds --output-format to summary without thinking)
-    doesn't silently promote an OPT_OUT surface to CLAWABLE.
-    
-    Relationship to existing tests:
-    - TestClawableSurfaceParity: asserts CLAWABLE surfaces accept it
-    - TestOptOutSurfaceRejection: asserts OPT_OUT surfaces REJECT it
-    
-    Together, these two test classes form a complete parity check:
-    every surface is either IN or OUT, and both cases are explicitly tested.
-    
-    If an OPT_OUT surface is promoted to CLAWABLE intentionally:
-    1. Move it from OPT_OUT_SURFACES to CLAWABLE_SURFACES
-    2. Update OPT_OUT_AUDIT.md with promotion rationale
-    3. Remove from this test's expected rejections
-    4. Both sets of tests continue passing
-    """
-
-    @pytest.mark.parametrize('cmd_name', sorted(OPT_OUT_SURFACES))
-    def test_opt_out_surface_rejects_output_format(self, cmd_name: str) -> None:
-        """OPT_OUT surfaces must NOT accept --output-format flag.
-        
-        Passing --output-format to an OPT_OUT surface should produce an
-        'unrecognized arguments' error from argparse.
-        """
-        result = subprocess.run(
-            [sys.executable, '-m', 'src.main', cmd_name, '--output-format', 'json'],
-            cwd=Path(__file__).resolve().parent.parent,
-            capture_output=True,
-            text=True,
-        )
-        # Should fail — argparse exit 2 in text mode, exit 1 in JSON mode
-        # (both modes normalize to "unrecognized arguments" message)
-        assert result.returncode != 0, (
-            f'{cmd_name} unexpectedly accepted --output-format json. '
-            f'If this is intentional (promotion to CLAWABLE), move from '
-            f'OPT_OUT_SURFACES to CLAWABLE_SURFACES and update OPT_OUT_AUDIT.md. '
-            f'Output: {result.stdout[:200]}\nStderr: {result.stderr[:200]}'
-        )
-        # Verify the error is specifically about --output-format
-        error_text = result.stdout + result.stderr
-        assert '--output-format' in error_text or 'unrecognized' in error_text, (
-            f'{cmd_name} failed but error not about --output-format. '
-            f'Something else is broken:\n'
-            f'stdout: {result.stdout[:300]}\nstderr: {result.stderr[:300]}'
-        )
-
-    def test_opt_out_set_matches_audit_document(self) -> None:
-        """OPT_OUT_SURFACES must match the 12 surfaces listed in OPT_OUT_AUDIT.md.
-        
-        Compares the constant against a hardcoded copy of the audit listing,
-        then checks that OPT_OUT_AUDIT.md mentions every surface by name.
-        """
-        audit_path = Path(__file__).resolve().parent.parent / 'OPT_OUT_AUDIT.md'
-        audit_text = audit_path.read_text()
-        
-        # Expected 12 surfaces per audit doc
-        expected_surfaces = {
-            # Group A: Rich-Markdown Reports (4)
-            'summary', 'manifest', 'parity-audit', 'setup-report',
-            # Group B: List Commands (3)
-            'subsystems', 'commands', 'tools',
-            # Group C: Simulation/Debug (5)
-            'remote-mode', 'ssh-mode', 'teleport-mode',
-            'direct-connect-mode', 'deep-link-mode',
-        }
-        
-        assert OPT_OUT_SURFACES == expected_surfaces, (
-            f'OPT_OUT_SURFACES drift from expected 12 surfaces per audit:\n'
-            f'  Expected: {sorted(expected_surfaces)}\n'
-            f'  Actual:   {sorted(OPT_OUT_SURFACES)}'
-        )
-        
-        # Each surface should be mentioned in audit doc
-        missing_from_audit = [s for s in OPT_OUT_SURFACES if s not in audit_text]
-        assert not missing_from_audit, (
-            f'OPT_OUT surfaces not mentioned in OPT_OUT_AUDIT.md: {missing_from_audit}'
-        )
-
-    def test_opt_out_count_matches_declared(self) -> None:
-        """OPT_OUT_AUDIT.md declares '12 surfaces'. Constant must match."""
-        assert len(OPT_OUT_SURFACES) == 12, (
-            f'OPT_OUT_SURFACES has {len(OPT_OUT_SURFACES)} items, '
-            f'but OPT_OUT_AUDIT.md declares 12 total surfaces. '
-            f'Update either the audit doc or the constant.'
-        )
--- a/tests/test_command_graph_tool_pool_output_format.py
+++ b/tests/test_command_graph_tool_pool_output_format.py
@@ -1,70 +0,0 @@
-"""Tests for --output-format on command-graph and tool-pool (ROADMAP #169).
-
-Diagnostic inventory surfaces now speak the CLI family's JSON contract.
-"""
-
-from __future__ import annotations
-
-import json
-import subprocess
-import sys
-from pathlib import Path
-
-sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
-
-
-def _run(args: list[str]) -> subprocess.CompletedProcess:
-    return subprocess.run(
-        [sys.executable, '-m', 'src.main', *args],
-        cwd=Path(__file__).resolve().parent.parent,
-        capture_output=True,
-        text=True,
-    )
-
-
-class TestCommandGraphOutputFormat:
-    def test_command_graph_json(self) -> None:
-        result = _run(['command-graph', '--output-format', 'json'])
-        assert result.returncode == 0, result.stderr
-
-        envelope = json.loads(result.stdout)
-        assert 'builtins_count' in envelope
-        assert 'plugin_like_count' in envelope
-        assert 'skill_like_count' in envelope
-        assert 'total_count' in envelope
-        assert envelope['total_count'] == (
-            envelope['builtins_count'] + envelope['plugin_like_count'] + envelope['skill_like_count']
-        )
-        assert isinstance(envelope['builtins'], list)
-        if envelope['builtins']:
-            assert set(envelope['builtins'][0].keys()) == {'name', 'source_hint'}
-
-    def test_command_graph_text_backward_compat(self) -> None:
-        result = _run(['command-graph'])
-        assert result.returncode == 0
-        assert '# Command Graph' in result.stdout
-        assert 'Builtins:' in result.stdout
-        # Not JSON
-        assert not result.stdout.strip().startswith('{')
-
-
-class TestToolPoolOutputFormat:
-    def test_tool_pool_json(self) -> None:
-        result = _run(['tool-pool', '--output-format', 'json'])
-        assert result.returncode == 0, result.stderr
-
-        envelope = json.loads(result.stdout)
-        assert 'simple_mode' in envelope
-        assert 'include_mcp' in envelope
-        assert 'tool_count' in envelope
-        assert 'tools' in envelope
-        assert envelope['tool_count'] == len(envelope['tools'])
-        if envelope['tools']:
-            assert set(envelope['tools'][0].keys()) == {'name', 'source_hint'}
-
-    def test_tool_pool_text_backward_compat(self) -> None:
-        result = _run(['tool-pool'])
-        assert result.returncode == 0
-        assert '# Tool Pool' in result.stdout
-        assert 'Simple mode:' in result.stdout
-        assert not result.stdout.strip().startswith('{')
--- a/tests/test_cross_channel_consistency.py
+++ b/tests/test_cross_channel_consistency.py
@@ -1,242 +0,0 @@
-"""Cycle #27 cross-channel consistency audit (post-#181).
-
-After the #181 fix (envelope.exit_code must match process exit), this test
-module systematizes the three-layer protocol invariant framework:
-
-1. Structural compliance: Does the envelope exist? (#178)
-2. Quality compliance: Is stderr silent + message truthful? (#179)
-3. Cross-channel consistency: Do multiple channels agree? (#181 + this)
-
-This file captures cycle #27's proactive invariant audit proving that
-envelope fields match their corresponding reality channels:
-
-- envelope.command ↔ argv dispatch
-- envelope.output_format ↔ --output-format flag
-- envelope.timestamp ↔ actual wall clock
-- envelope.found/handled/deleted ↔ operational truth (no error block mismatch)
-
-All tests passing = no drift detected.
-"""
-
-from __future__ import annotations
-
-import json
-import subprocess
-from datetime import datetime, timezone
-from pathlib import Path
-
-import pytest
-
-import sys
-sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
-
-
-def _run(args: list[str]) -> subprocess.CompletedProcess:
-    """Run claw-code command and capture output."""
-    return subprocess.run(
-        [sys.executable, '-m', 'src.main', *args],
-        cwd=Path(__file__).resolve().parent.parent,
-        capture_output=True,
-        text=True,
-    )
-
-
-class TestCrossChannelConsistency:
-    """Cycle #27: envelope fields must match reality channels.
-    
-    These are distinct from structural/quality tests. A command can
-    emit structurally valid JSON with clean stderr but still lie about
-    its own output_format or exit code (as #181 proved).
-    """
-
-    def test_envelope_command_matches_dispatch(self) -> None:
-        """Envelope.command must equal the dispatched subcommand."""
-        commands_to_test = [
-            'show-command',
-            'show-tool',
-            'list-sessions',
-            'exec-command',
-            'exec-tool',
-            'delete-session',
-        ]
-        failures = []
-        for cmd in commands_to_test:
-            # Dispatch varies by arity
-            if cmd == 'show-command':
-                args = [cmd, 'nonexistent', '--output-format', 'json']
-            elif cmd == 'show-tool':
-                args = [cmd, 'nonexistent', '--output-format', 'json']
-            elif cmd == 'exec-command':
-                args = [cmd, 'unknown', 'test', '--output-format', 'json']
-            elif cmd == 'exec-tool':
-                args = [cmd, 'unknown', '{}', '--output-format', 'json']
-            else:
-                args = [cmd, '--output-format', 'json']
-            
-            result = _run(args)
-            try:
-                envelope = json.loads(result.stdout)
-            except json.JSONDecodeError:
-                failures.append(f'{cmd}: JSON parse error')
-                continue
-            
-            if envelope.get('command') != cmd:
-                failures.append(
-                    f'{cmd}: envelope.command={envelope.get("command")}, '
-                    f'expected {cmd}'
-                )
-        assert not failures, (
-            'Envelope.command must match dispatched subcommand:\n' +
-            '\n'.join(failures)
-        )
-
-    def test_envelope_output_format_matches_flag(self) -> None:
-        """Envelope.output_format must match --output-format flag."""
-        result = _run(['list-sessions', '--output-format', 'json'])
-        envelope = json.loads(result.stdout)
-        assert envelope['output_format'] == 'json', (
-            f'output_format mismatch: flag=json, envelope={envelope["output_format"]}'
-        )
-
-    def test_envelope_timestamp_is_recent(self) -> None:
-        """Envelope.timestamp must be recent (generated at call time)."""
-        result = _run(['list-sessions', '--output-format', 'json'])
-        envelope = json.loads(result.stdout)
-        ts_str = envelope.get('timestamp')
-        assert ts_str, 'no timestamp field'
-        
-        ts = datetime.fromisoformat(ts_str.replace('Z', '+00:00'))
-        now = datetime.now(timezone.utc)
-        delta = abs((now - ts).total_seconds())
-        
-        assert delta < 5, f'timestamp off by {delta}s (should be <5s)'
-
-    def test_envelope_exit_code_matches_process_exit(self) -> None:
-        """Cycle #26/#181: envelope.exit_code == process exit code.
-        
-        This is a critical invariant. Claws that trust the envelope
-        field must get the truth, not a lie.
-        """
-        cases = [
-            (['show-command', 'nonexistent', '--output-format', 'json'], 1),
-            (['show-tool', 'nonexistent', '--output-format', 'json'], 1),
-            (['list-sessions', '--output-format', 'json'], 0),
-            (['delete-session', 'any-id', '--output-format', 'json'], 0),
-        ]
-        failures = []
-        for args, expected_exit in cases:
-            result = _run(args)
-            if result.returncode != expected_exit:
-                failures.append(
-                    f'{args[0]}: process exit {result.returncode}, '
-                    f'expected {expected_exit}'
-                )
-                continue
-            
-            envelope = json.loads(result.stdout)
-            if envelope['exit_code'] != result.returncode:
-                failures.append(
-                    f'{args[0]}: process exit {result.returncode}, '
-                    f'envelope.exit_code {envelope["exit_code"]}'
-                )
-        
-        assert not failures, (
-            'Envelope.exit_code must match process exit:\n' +
-            '\n'.join(failures)
-        )
-
-    def test_envelope_boolean_fields_match_error_presence(self) -> None:
-        """found/handled/deleted fields must correlate with error block.
-        
-        - If field is True, no error block should exist
-        - If field is False + operational error, error block must exist
-        - If field is False + idempotent (delete nonexistent), no error block
-        """
-        cases = [
-            # (args, bool_field, expected_value, expect_error_block)
-            (['show-command', 'nonexistent', '--output-format', 'json'],
-             'found', False, True),
-            (['exec-command', 'unknown', 'test', '--output-format', 'json'],
-             'handled', False, True),
-            (['delete-session', 'any-id', '--output-format', 'json'],
-             'deleted', False, False),  # idempotent, no error
-        ]
-        failures = []
-        for args, field, expected_val, expect_error in cases:
-            result = _run(args)
-            envelope = json.loads(result.stdout)
-            
-            actual_val = envelope.get(field)
-            has_error = 'error' in envelope
-            
-            if actual_val != expected_val:
-                failures.append(
-                    f'{args[0]}: {field}={actual_val}, expected {expected_val}'
-                )
-            if expect_error and not has_error:
-                failures.append(
-                    f'{args[0]}: expected error block, but none present'
-                )
-            elif not expect_error and has_error:
-                failures.append(
-                    f'{args[0]}: unexpected error block present'
-                )
-        
-        assert not failures, (
-            'Boolean fields must correlate with error block:\n' +
-            '\n'.join(failures)
-        )
-
-
-class TestTextVsJsonModeDivergence:
-    """Cycle #29: Document known text-mode vs JSON-mode exit code divergence.
-    
-    ERROR_HANDLING.md specifies the exit code contract applies ONLY when
-    --output-format json is set. Text mode follows argparse defaults (e.g.,
-    exit 2 for parse errors) while JSON mode normalizes to the contract
-    (exit 1 for parse errors).
-    
-    This test class LOCKS the expected divergence so:
-    1. Documentation stays aligned with implementation
-    2. Future changes to text-mode behavior fail here, so they must be intentional
-    3. Claws consuming subprocess output can trust the docs
-    """
-
-    def test_unknown_command_text_mode_exits_2(self) -> None:
-        """Text mode: argparse default exit 2 for unknown subcommand."""
-        result = _run(['nonexistent-cmd'])
-        assert result.returncode == 2, (
-            f'text mode should exit 2 (argparse default), got {result.returncode}'
-        )
-
-    def test_unknown_command_json_mode_exits_1(self) -> None:
-        """JSON mode: normalized exit 1 for parse error (#178)."""
-        result = _run(['nonexistent-cmd', '--output-format', 'json'])
-        assert result.returncode == 1, (
-            f'JSON mode should exit 1 (protocol contract), got {result.returncode}'
-        )
-        envelope = json.loads(result.stdout)
-        assert envelope['error']['kind'] == 'parse'
-
-    def test_missing_required_arg_text_mode_exits_2(self) -> None:
-        """Text mode: argparse default exit 2 for missing required arg."""
-        result = _run(['exec-command'])  # missing name + prompt
-        assert result.returncode == 2, (
-            f'text mode should exit 2, got {result.returncode}'
-        )
-
-    def test_missing_required_arg_json_mode_exits_1(self) -> None:
-        """JSON mode: normalized exit 1 for parse error."""
-        result = _run(['exec-command', '--output-format', 'json'])
-        assert result.returncode == 1, (
-            f'JSON mode should exit 1, got {result.returncode}'
-        )
-
-    def test_success_path_identical_in_both_modes(self) -> None:
-        """Success exit codes are identical in both modes."""
-        text_result = _run(['list-sessions'])
-        json_result = _run(['list-sessions', '--output-format', 'json'])
-        assert text_result.returncode == json_result.returncode == 0, (
-            f'success exit should be 0 in both modes: '
-            f'text={text_result.returncode}, json={json_result.returncode}'
-        )
--- a/tests/test_exec_route_bootstrap_output_format.py
+++ b/tests/test_exec_route_bootstrap_output_format.py
@@ -1,306 +0,0 @@
-"""Tests for --output-format on exec-command/exec-tool/route/bootstrap (ROADMAP #168).
-
-Closes the final JSON-parity gap across the CLI family. After #160/#165/
-#166/#167, the session-lifecycle and inspect CLI commands all spoke JSON;
-this batch extends that contract to the exec, route, and bootstrap
-surfaces — the commands claws actually invoke to DO work, not just inspect
-state.
-
-Verifies:
-- exec-command / exec-tool: JSON envelope with handled + source_hint on
-  success; {name, handled:false, error:{kind,message,retryable}} on
-  not-found
-- route: JSON envelope with match_count + matches list
-- bootstrap: JSON envelope with setup, routed_matches, turn, messages,
-  persisted_session_path
-- All 4 preserve legacy text mode byte-identically
-- Exit codes unchanged (0 success, 1 exec-not-found)
-"""
-
-from __future__ import annotations
-
-import json
-import subprocess
-import sys
-from pathlib import Path
-
-sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
-
-
-def _run(args: list[str]) -> subprocess.CompletedProcess:
-    return subprocess.run(
-        [sys.executable, '-m', 'src.main', *args],
-        cwd=Path(__file__).resolve().parent.parent,
-        capture_output=True,
-        text=True,
-    )
-
-
-class TestExecCommandOutputFormat:
-    def test_exec_command_found_json(self) -> None:
-        result = _run(['exec-command', 'add-dir', 'hello', '--output-format', 'json'])
-        assert result.returncode == 0, result.stderr
-
-        envelope = json.loads(result.stdout)
-        assert envelope['handled'] is True
-        assert envelope['name'] == 'add-dir'
-        assert envelope['prompt'] == 'hello'
-        assert 'source_hint' in envelope
-        assert 'message' in envelope
-        assert 'error' not in envelope
-
-    def test_exec_command_not_found_json(self) -> None:
-        result = _run(['exec-command', 'nonexistent-cmd', 'hi', '--output-format', 'json'])
-        assert result.returncode == 1
-
-        envelope = json.loads(result.stdout)
-        assert envelope['handled'] is False
-        assert envelope['name'] == 'nonexistent-cmd'
-        assert envelope['prompt'] == 'hi'
-        assert envelope['error']['kind'] == 'command_not_found'
-        assert envelope['error']['retryable'] is False
-        assert 'source_hint' not in envelope
-
-    def test_exec_command_text_backward_compat(self) -> None:
-        result = _run(['exec-command', 'add-dir', 'hello'])
-        assert result.returncode == 0
-        # Single line prose (unchanged from pre-#168)
-        assert result.stdout.count('\n') == 1
-        assert 'add-dir' in result.stdout
-
-
-class TestExecToolOutputFormat:
-    def test_exec_tool_found_json(self) -> None:
-        result = _run(['exec-tool', 'BashTool', '{"cmd":"ls"}', '--output-format', 'json'])
-        assert result.returncode == 0, result.stderr
-
-        envelope = json.loads(result.stdout)
-        assert envelope['handled'] is True
-        assert envelope['name'] == 'BashTool'
-        assert envelope['payload'] == '{"cmd":"ls"}'
-        assert 'source_hint' in envelope
-        assert 'error' not in envelope
-
-    def test_exec_tool_not_found_json(self) -> None:
-        result = _run(['exec-tool', 'NotATool', '{}', '--output-format', 'json'])
-        assert result.returncode == 1
-
-        envelope = json.loads(result.stdout)
-        assert envelope['handled'] is False
-        assert envelope['name'] == 'NotATool'
-        assert envelope['error']['kind'] == 'tool_not_found'
-        assert envelope['error']['retryable'] is False
-
-    def test_exec_tool_text_backward_compat(self) -> None:
-        result = _run(['exec-tool', 'BashTool', '{}'])
-        assert result.returncode == 0
-        assert result.stdout.count('\n') == 1
-
-
-class TestRouteOutputFormat:
-    def test_route_json_envelope(self) -> None:
-        result = _run(['route', 'review mcp', '--limit', '3', '--output-format', 'json'])
-        assert result.returncode == 0
-
-        envelope = json.loads(result.stdout)
-        assert envelope['prompt'] == 'review mcp'
-        assert envelope['limit'] == 3
-        assert 'match_count' in envelope
-        assert 'matches' in envelope
-        assert envelope['match_count'] == len(envelope['matches'])
-        # Every match has required keys
-        for m in envelope['matches']:
-            assert set(m.keys()) == {'kind', 'name', 'score', 'source_hint'}
-            assert m['kind'] in ('command', 'tool')
-
-    def test_route_json_no_matches(self) -> None:
-        # Very unusual string should yield zero matches
-        result = _run(['route', 'zzzzzzzzzqqqqq', '--output-format', 'json'])
-        assert result.returncode == 0
-
-        envelope = json.loads(result.stdout)
-        assert envelope['match_count'] == 0
-        assert envelope['matches'] == []
-
-    def test_route_text_backward_compat(self) -> None:
-        """Text mode tab-separated output unchanged from pre-#168."""
-        result = _run(['route', 'review mcp', '--limit', '2'])
-        assert result.returncode == 0
-        # Each non-empty line has exactly 3 tabs (kind\tname\tscore\tsource_hint)
-        for line in result.stdout.strip().split('\n'):
-            if line:
-                assert line.count('\t') == 3
-
-
-class TestBootstrapOutputFormat:
-    def test_bootstrap_json_envelope(self) -> None:
-        result = _run(['bootstrap', 'review MCP', '--limit', '2', '--output-format', 'json'])
-        assert result.returncode == 0, result.stderr
-
-        envelope = json.loads(result.stdout)
-        # Required top-level keys
-        required = {
-            'prompt', 'limit', 'setup', 'routed_matches',
-            'command_execution_messages', 'tool_execution_messages',
-            'turn', 'persisted_session_path',
-        }
-        assert required.issubset(envelope.keys())
-        # Setup sub-envelope
-        assert 'python_version' in envelope['setup']
-        assert 'platform_name' in envelope['setup']
-        # Turn sub-envelope
-        assert 'stop_reason' in envelope['turn']
-        assert 'prompt' in envelope['turn']
-
-    def test_bootstrap_text_is_markdown(self) -> None:
-        """Text mode produces Markdown (unchanged from pre-#168)."""
-        result = _run(['bootstrap', 'hello', '--limit', '2'])
-        assert result.returncode == 0
-        # Markdown headers
-        assert '# Runtime Session' in result.stdout
-        assert '## Setup' in result.stdout
-        assert '## Routed Matches' in result.stdout
-
-
-class TestFamilyWideJsonParity:
-    """After #167 and #168, ALL inspect/exec/route/lifecycle commands
-    support --output-format. Verify the full family is now parity-complete."""
-
-    FAMILY_SURFACES = [
-        # (cmd_args, expected_to_parse_json)
-        (['show-command', 'add-dir'], True),
-        (['show-tool', 'BashTool'], True),
-        (['exec-command', 'add-dir', 'hi'], True),
-        (['exec-tool', 'BashTool', '{}'], True),
-        (['route', 'review'], True),
-        (['bootstrap', 'hello'], True),
-    ]
-
-    def test_all_family_commands_accept_output_format_json(self) -> None:
-        """Every family command accepts --output-format json and emits parseable JSON."""
-        failures = []
-        for args_base, should_parse in self.FAMILY_SURFACES:
-            result = _run([*args_base, '--output-format', 'json'])
-            if result.returncode not in (0, 1):
-                failures.append(f'{args_base}: exit {result.returncode} — {result.stderr}')
-                continue
-            try:
-                json.loads(result.stdout)
-            except json.JSONDecodeError as e:
-                failures.append(f'{args_base}: not parseable JSON ({e}): {result.stdout[:100]}')
-        assert not failures, (
-            'CLI family JSON parity gap:\n' + '\n'.join(failures)
-        )
-
-    def test_all_family_commands_text_mode_unchanged(self) -> None:
-        """Omitting --output-format defaults to text for every family command."""
-        # Sanity: just verify each runs without error in text mode
-        for args_base, _ in self.FAMILY_SURFACES:
-            result = _run(args_base)
-            assert result.returncode in (0, 1), (
-                f'{args_base} failed in text mode: {result.stderr}'
-            )
-            # Output should not be JSON-shaped (no leading {)
-            assert not result.stdout.strip().startswith('{')
-
-
-class TestEnvelopeExitCodeMatchesProcessExit:
-    """#181: Envelope exit_code field must match actual process exit code.
-    
-    Regression test for the protocol violation where exec-command/exec-tool
-    not-found cases returned exit code 1 from the process but emitted
-    envelopes with exit_code: 0 (default wrap_json_envelope). Claws reading
-    the envelope would misclassify failures as successes.
-    
-    Contract (from ERROR_HANDLING.md):
-    - Exit code 0 = success
-    - Exit code 1 = error/not-found
-    - Envelope MUST reflect process exit
-    """
-
-    def test_exec_command_not_found_envelope_exit_matches(self) -> None:
-        """exec-command 'unknown-name' must have exit_code=1 in envelope."""
-        result = _run(['exec-command', 'nonexistent-cmd-name', 'test-prompt', '--output-format', 'json'])
-        assert result.returncode == 1, f'process exit should be 1, got {result.returncode}'
-        envelope = json.loads(result.stdout)
-        assert envelope['exit_code'] == 1, (
-            f'envelope.exit_code mismatch: process=1, envelope={envelope["exit_code"]}'
-        )
-        assert envelope['handled'] is False
-        assert envelope['error']['kind'] == 'command_not_found'
-
-    def test_exec_tool_not_found_envelope_exit_matches(self) -> None:
-        """exec-tool 'unknown-tool' must have exit_code=1 in envelope."""
-        result = _run(['exec-tool', 'nonexistent-tool-name', '{}', '--output-format', 'json'])
-        assert result.returncode == 1, f'process exit should be 1, got {result.returncode}'
-        envelope = json.loads(result.stdout)
-        assert envelope['exit_code'] == 1, (
-            f'envelope.exit_code mismatch: process=1, envelope={envelope["exit_code"]}'
-        )
-        assert envelope['handled'] is False
-        assert envelope['error']['kind'] == 'tool_not_found'
-
-    def test_all_commands_exit_code_invariant(self) -> None:
-        """Audit: for every clawable command, envelope.exit_code == process exit.
-        
-        This is a stronger invariant than 'emits JSON'. Claws dispatching on
-        the envelope's exit_code field must get the truth, not a lie.
-        """
-        # Sample cases known to return non-zero
-        cases = [
-            # command, expected_exit, justification
-            (['show-command', 'nonexistent-abc'], 1, 'not-found inventory lookup'),
-            (['show-tool', 'nonexistent-xyz'], 1, 'not-found inventory lookup'),
-            (['exec-command', 'nonexistent-1', 'test'], 1, 'not-found execution'),
-            (['exec-tool', 'nonexistent-2', '{}'], 1, 'not-found execution'),
-        ]
-        mismatches = []
-        for args, expected_exit, reason in cases:
-            result = _run([*args, '--output-format', 'json'])
-            if result.returncode != expected_exit:
-                mismatches.append(
-                    f'{args}: expected process exit {expected_exit} ({reason}), '
-                    f'got {result.returncode}'
-                )
-                continue
-            try:
-                envelope = json.loads(result.stdout)
-            except json.JSONDecodeError as e:
-                mismatches.append(f'{args}: JSON parse failed: {e}')
-                continue
-            if envelope.get('exit_code') != result.returncode:
-                mismatches.append(
-                    f'{args}: envelope.exit_code={envelope.get("exit_code")} '
-                    f'!= process exit={result.returncode} ({reason})'
-                )
-        assert not mismatches, (
-            'Envelope exit_code must match process exit code:\n' + 
-            '\n'.join(mismatches)
-        )
-
-
-class TestMetadataFlags:
-    """Cycle #28: --version flag implementation (#180 gap closure)."""
-
-    def test_version_flag_returns_version_text(self) -> None:
-        """--version returns version string and exits successfully."""
-        result = _run(['--version'])
-        assert result.returncode == 0
-        assert 'claw-code' in result.stdout
-        assert '1.0.0' in result.stdout
-
-    def test_help_flag_returns_help_text(self) -> None:
-        """--help returns help text and exits successfully."""
-        result = _run(['--help'])
-        assert result.returncode == 0
-        assert 'usage:' in result.stdout
-        assert 'Python porting workspace' in result.stdout
-
-    def test_help_still_works_after_version_added(self) -> None:
-        """Verify -h and --help both work (no regression)."""
-        result_short = _run(['-h'])
-        result_long = _run(['--help'])
-        assert result_short.returncode == 0
-        assert result_long.returncode == 0
-        assert 'usage:' in result_short.stdout
-        assert 'usage:' in result_long.stdout
--- a/tests/test_flush_transcript_cli.py
+++ b/tests/test_flush_transcript_cli.py
@@ -1,206 +0,0 @@
-"""Tests for flush-transcript CLI parity with the #160/#165 lifecycle triplet (ROADMAP #166).
-
-Verifies that session *creation* now accepts the same flag family as session
-management (list/delete/load):
- --directory DIR (alternate storage location)
- --output-format {text,json} (structured output)
- --session-id ID (deterministic IDs for claw checkpointing)
-
-Also verifies backward compat: default text output unchanged byte-for-byte.
-"""
-
-from __future__ import annotations
-
-import json
-import subprocess
-import sys
-from pathlib import Path
-
-import pytest
-
-sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
-
-
-_REPO_ROOT = Path(__file__).resolve().parent.parent
-
-
-def _run_cli(*args: str) -> subprocess.CompletedProcess[str]:
-    return subprocess.run(
-        [sys.executable, '-m', 'src.main', *args],
-        capture_output=True, text=True, cwd=str(_REPO_ROOT),
-    )
-
-
-class TestDirectoryFlag:
-    def test_flush_transcript_writes_to_custom_directory(self, tmp_path: Path) -> None:
-        result = _run_cli(
-            'flush-transcript', 'hello world',
-            '--directory', str(tmp_path),
-        )
-        assert result.returncode == 0, result.stderr
-        # Exactly one session file should exist in the directory
-        files = list(tmp_path.glob('*.json'))
-        assert len(files) == 1
-        # And the legacy text output points to that file
-        assert str(files[0]) in result.stdout
-
-
-class TestSessionIdFlag:
-    def test_explicit_session_id_is_respected(self, tmp_path: Path) -> None:
-        result = _run_cli(
-            'flush-transcript', 'hello',
-            '--directory', str(tmp_path),
-            '--session-id', 'deterministic-id-42',
-        )
-        assert result.returncode == 0, result.stderr
-        expected_path = tmp_path / 'deterministic-id-42.json'
-        assert expected_path.exists(), (
-            f'session file not created at deterministic path: {expected_path}'
-        )
-        # And it should contain the ID we asked for
-        data = json.loads(expected_path.read_text())
-        assert data['session_id'] == 'deterministic-id-42'
-
-    def test_auto_session_id_when_flag_omitted(self, tmp_path: Path) -> None:
-        """Without --session-id, engine still auto-generates a UUID (backward compat)."""
-        result = _run_cli(
-            'flush-transcript', 'hello',
-            '--directory', str(tmp_path),
-        )
-        assert result.returncode == 0
-        files = list(tmp_path.glob('*.json'))
-        assert len(files) == 1
-        # The filename (minus .json) should be a 32-char hex UUID
-        stem = files[0].stem
-        assert len(stem) == 32
-        assert all(c in '0123456789abcdef' for c in stem)
-
-
-class TestOutputFormatFlag:
-    def test_json_mode_emits_structured_envelope(self, tmp_path: Path) -> None:
-        result = _run_cli(
-            'flush-transcript', 'hello',
-            '--directory', str(tmp_path),
-            '--session-id', 'beta',
-            '--output-format', 'json',
-        )
-        assert result.returncode == 0
-        data = json.loads(result.stdout)
-        assert data['session_id'] == 'beta'
-        assert data['flushed'] is True
-        assert data['path'].endswith('beta.json')
-        # messages_count and token counts should be present and typed
-        assert isinstance(data['messages_count'], int)
-        assert isinstance(data['input_tokens'], int)
-        assert isinstance(data['output_tokens'], int)
-
-    def test_text_mode_byte_identical_to_pre_166_output(self, tmp_path: Path) -> None:
-        """Legacy text output must not change — claws may be parsing it."""
-        result = _run_cli(
-            'flush-transcript', 'hello',
-            '--directory', str(tmp_path),
-        )
-        assert result.returncode == 0
-        lines = result.stdout.strip().split('\n')
-        # Line 1: path ending in .json
-        assert lines[0].endswith('.json')
-        # Line 2: exact legacy format
-        assert lines[1] == 'flushed=True'
-
-
-class TestBackwardCompat:
-    def test_no_flags_default_behaviour(self, tmp_path: Path) -> None:
-        """Running with no flags still works (default dir, text mode, auto UUID)."""
-        import os
-        env = os.environ.copy()
-        env['PYTHONPATH'] = str(_REPO_ROOT)
-        result = subprocess.run(
-            [sys.executable, '-m', 'src.main', 'flush-transcript', 'hello'],
-            capture_output=True, text=True, cwd=str(tmp_path), env=env,
-        )
-        assert result.returncode == 0, result.stderr
-        # Default dir is `.port_sessions` in CWD
-        sessions_dir = tmp_path / '.port_sessions'
-        assert sessions_dir.exists()
-        assert len(list(sessions_dir.glob('*.json'))) == 1
-
-
-class TestLifecycleIntegration:
-    """#166's real value: the triplet + creation command are now a coherent family."""
-
-    def test_create_then_list_then_load_then_delete_roundtrip(
-        self, tmp_path: Path,
-    ) -> None:
-        """End-to-end: flush → list → load → delete, all via the same --directory."""
-        # 1. Create
-        create_result = _run_cli(
-            'flush-transcript', 'roundtrip test',
-            '--directory', str(tmp_path),
-            '--session-id', 'rt-session',
-            '--output-format', 'json',
-        )
-        assert create_result.returncode == 0
-        assert json.loads(create_result.stdout)['session_id'] == 'rt-session'
-
-        # 2. List
-        list_result = _run_cli(
-            'list-sessions',
-            '--directory', str(tmp_path),
-            '--output-format', 'json',
-        )
-        assert list_result.returncode == 0
-        list_data = json.loads(list_result.stdout)
-        assert 'rt-session' in list_data['sessions']
-
-        # 3. Load
-        load_result = _run_cli(
-            'load-session', 'rt-session',
-            '--directory', str(tmp_path),
-            '--output-format', 'json',
-        )
-        assert load_result.returncode == 0
-        assert json.loads(load_result.stdout)['loaded'] is True
-
-        # 4. Delete
-        delete_result = _run_cli(
-            'delete-session', 'rt-session',
-            '--directory', str(tmp_path),
-            '--output-format', 'json',
-        )
-        assert delete_result.returncode == 0
-
-        # 5. Verify gone
-        verify_result = _run_cli(
-            'load-session', 'rt-session',
-            '--directory', str(tmp_path),
-            '--output-format', 'json',
-        )
-        assert verify_result.returncode == 1
-        assert json.loads(verify_result.stdout)['error']['kind'] == 'session_not_found'
-
-
-class TestFullFamilyParity:
-    """All four session-lifecycle CLI commands accept the same core flag pair.
-
-    This is the #166 acceptance test: flush-transcript joins the family.
-    """
-
-    @pytest.mark.parametrize(
-        'command',
-        ['list-sessions', 'delete-session', 'load-session', 'flush-transcript'],
-    )
-    def test_all_four_accept_directory_flag(self, command: str) -> None:
-        help_text = _run_cli(command, '--help').stdout
-        assert '--directory' in help_text, (
-            f'{command} missing --directory flag (#166 parity gap)'
-        )
-
-    @pytest.mark.parametrize(
-        'command',
-        ['list-sessions', 'delete-session', 'load-session', 'flush-transcript'],
-    )
-    def test_all_four_accept_output_format_flag(self, command: str) -> None:
-        help_text = _run_cli(command, '--help').stdout
-        assert '--output-format' in help_text, (
-            f'{command} missing --output-format flag (#166 parity gap)'
-        )
--- a/tests/test_json_envelope_field_consistency.py
+++ b/tests/test_json_envelope_field_consistency.py
@@ -1,213 +0,0 @@
-"""JSON envelope field consistency validation (ROADMAP #173 prep).
-
-This test suite validates that clawable-surface commands' JSON output
-follows the contract defined in SCHEMAS.md. Currently, commands emit
-command-specific envelopes without the canonical common fields
-(timestamp, command, exit_code, output_format, schema_version).
-
-This test documents the current gap and validates the consistency
-of what IS there, providing a baseline for #173 (common field wrapping).
-
-Phase 1 (this test): Validate consistency within each command's envelope.
-Phase 2 (future #173): Wrap all 13 commands with canonical common fields.
-"""
-
-from __future__ import annotations
-
-import json
-import subprocess
-import sys
-from pathlib import Path
-from typing import Any
-
-import pytest
-
-sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
-
-from src.main import build_parser  # noqa: E402
-
-
-# Expected fields for each clawable command's JSON envelope.
-# These are the command-specific fields (not including common fields yet).
-# Entries are (command_name, required_fields, optional_fields).
-ENVELOPE_CONTRACTS = {
-    'list-sessions': (
-        {'count', 'sessions'},
-        set(),
-    ),
-    'delete-session': (
-        {'session_id', 'deleted', 'directory'},
-        set(),
-    ),
-    'load-session': (
-        {'session_id', 'loaded', 'directory', 'path'},
-        set(),
-    ),
-    'flush-transcript': (
-        {'session_id', 'path', 'flushed', 'messages_count', 'input_tokens', 'output_tokens'},
-        set(),
-    ),
-    'show-command': (
-        {'name', 'found', 'source_hint', 'responsibility'},
-        set(),
-    ),
-    'show-tool': (
-        {'name', 'found', 'source_hint'},
-        set(),
-    ),
-    'exec-command': (
-        {'name', 'prompt', 'handled', 'message', 'source_hint'},
-        set(),
-    ),
-    'exec-tool': (
-        {'name', 'payload', 'handled', 'message', 'source_hint'},
-        set(),
-    ),
-    'route': (
-        {'prompt', 'limit', 'match_count', 'matches'},
-        set(),
-    ),
-    'bootstrap': (
-        {'prompt', 'setup', 'routed_matches', 'turn', 'persisted_session_path'},
-        set(),
-    ),
-    'command-graph': (
-        {'builtins_count', 'plugin_like_count', 'skill_like_count', 'total_count', 'builtins', 'plugin_like', 'skill_like'},
-        set(),
-    ),
-    'tool-pool': (
-        {'simple_mode', 'include_mcp', 'tool_count', 'tools'},
-        set(),
-    ),
-    'bootstrap-graph': (
-        {'stages', 'note'},
-        set(),
-    ),
-}
-
-
-class TestJsonEnvelopeConsistency:
-    """Validate current command envelopes match their declared contracts.
-
-    This is a consistency check, not a conformance check. Once #173 adds
-    common fields to all commands, these tests will auto-pass the common
-    field assertions and verify command-specific fields stay consistent.
-    """
-
-    @pytest.mark.parametrize('cmd_name,contract', sorted(ENVELOPE_CONTRACTS.items()))
-    def test_command_json_fields_present(self, cmd_name: str, contract: tuple[set[str], set[str]]) -> None:
-        """Command's JSON envelope must include all required fields."""
-        required, optional = contract
-        # Get minimal invocation args for this command
-        test_invocations = {
-            'list-sessions': [],
-            'show-command': ['add-dir'],
-            'show-tool': ['BashTool'],
-            'exec-command': ['add-dir', 'hi'],
-            'exec-tool': ['BashTool', '{}'],
-            'route': ['review'],
-            'bootstrap': ['hello'],
-            'command-graph': [],
-            'tool-pool': [],
-            'bootstrap-graph': [],
-        }
-        
-        if cmd_name not in test_invocations:
-            pytest.skip(f'{cmd_name} requires session setup; skipped')
-        
-        cmd_args = test_invocations[cmd_name]
-        result = subprocess.run(
-            [sys.executable, '-m', 'src.main', cmd_name, *cmd_args, '--output-format', 'json'],
-            cwd=Path(__file__).resolve().parent.parent,
-            capture_output=True,
-            text=True,
-        )
-        
-        if result.returncode not in (0, 1):
-            pytest.fail(f'{cmd_name}: unexpected exit {result.returncode}\nstderr: {result.stderr}')
-        
-        try:
-            envelope = json.loads(result.stdout)
-        except json.JSONDecodeError as e:
-            pytest.fail(f'{cmd_name}: invalid JSON: {e}\nOutput: {result.stdout[:200]}')
-        
-        # Check required fields (command-specific)
-        missing = required - set(envelope.keys())
-        if missing:
-            pytest.fail(
-                f'{cmd_name} envelope missing required fields: {missing}\n'
-                f'Expected: {required}\nGot: {set(envelope.keys())}'
-            )
-        
-        # Check that extra fields are accounted for (warn if unknown)
-        known = required | optional
-        extra = set(envelope.keys()) - known
-        if extra:
-            import warnings  # warn but don't fail: new fields may be added
-            warnings.warn(f'extra fields in {cmd_name}: {extra}', UserWarning)
-
-    def test_envelope_field_value_types(self) -> None:
-        """Smoke test: envelope fields have expected types (bool, int, str, list, dict, null)."""
-        result = subprocess.run(
-            [sys.executable, '-m', 'src.main', 'list-sessions', '--output-format', 'json'],
-            cwd=Path(__file__).resolve().parent.parent,
-            capture_output=True,
-            text=True,
-        )
-        
-        envelope = json.loads(result.stdout)
-        
-        # Spot check a few fields
-        assert isinstance(envelope.get('count'), int), 'count should be int'
-        assert isinstance(envelope.get('sessions'), list), 'sessions should be list'
-
-
-class TestJsonEnvelopeCommonFieldPrep:
-    """Validation stubs for common fields (part of #173 implementation).
-
-    These tests will activate once wrap_json_envelope() is applied to all
-    13 clawable commands. Currently they document the expected contract.
-    """
-
-    def test_all_envelopes_include_timestamp(self) -> None:
-        """Every clawable envelope must include ISO 8601 UTC timestamp."""
-        result = subprocess.run(
-            [sys.executable, '-m', 'src.main', 'command-graph', '--output-format', 'json'],
-            cwd=Path(__file__).resolve().parent.parent,
-            capture_output=True,
-            text=True,
-        )
-        envelope = json.loads(result.stdout)
-        assert 'timestamp' in envelope, 'Missing timestamp field'
-        # Only the trailing 'Z' (UTC designator) is checked, not the full ISO 8601 shape
-        assert envelope['timestamp'].endswith('Z'), f'Timestamp not UTC: {envelope["timestamp"]}'
-
-    def test_all_envelopes_include_command(self) -> None:
-        """Every envelope must echo the command name."""
-        test_cases = [
-            ('list-sessions', []),
-            ('command-graph', []),
-            ('bootstrap', ['hello']),
-        ]
-        for cmd_name, cmd_args in test_cases:
-            result = subprocess.run(
-                [sys.executable, '-m', 'src.main', cmd_name, *cmd_args, '--output-format', 'json'],
-                cwd=Path(__file__).resolve().parent.parent,
-                capture_output=True,
-                text=True,
-            )
-            envelope = json.loads(result.stdout)
-            assert envelope.get('command') == cmd_name, f'{cmd_name} envelope.command mismatch'
-
-    def test_all_envelopes_include_exit_code_and_schema_version(self) -> None:
-        """Every envelope must include exit_code and schema_version."""
-        result = subprocess.run(
-            [sys.executable, '-m', 'src.main', 'tool-pool', '--output-format', 'json'],
-            cwd=Path(__file__).resolve().parent.parent,
-            capture_output=True,
-            text=True,
-        )
-        envelope = json.loads(result.stdout)
-        assert 'exit_code' in envelope, 'Missing exit_code'
-        assert 'schema_version' in envelope, 'Missing schema_version'
-        assert envelope['schema_version'] == '1.0', 'Wrong schema_version'
--- a/tests/test_load_session_cli.py
+++ b/tests/test_load_session_cli.py
@@ -1,183 +0,0 @@
-"""Tests for load-session CLI parity with list-sessions/delete-session (ROADMAP #165).
-
-Verifies the session-lifecycle CLI triplet is now symmetric:
- --directory DIR accepted (alternate storage locations reachable)
- --output-format {text,json} accepted
- Not-found emits typed JSON error envelope, never a Python traceback
- Corrupted session file distinguished from not-found via 'kind'
- Legacy text-mode output unchanged (backward compat)
-"""
-
-from __future__ import annotations
-
-import json
-import subprocess
-import sys
-from pathlib import Path
-
-import pytest
-
-sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
-
-from src.session_store import StoredSession, save_session  # noqa: E402
-
-
-_REPO_ROOT = Path(__file__).resolve().parent.parent
-
-
-def _run_cli(
-    *args: str, cwd: Path | None = None,
-) -> subprocess.CompletedProcess[str]:
-    """Always invoke the CLI with cwd=repo-root so ``python -m src.main``
-    can resolve the ``src`` package, regardless of where the test's
-    tmp_path is.
-    """
-    return subprocess.run(
-        [sys.executable, '-m', 'src.main', *args],
-        capture_output=True,
-        text=True,
-        cwd=str(cwd) if cwd else str(_REPO_ROOT),
-    )
-
-
-def _make_session(session_id: str) -> StoredSession:
-    return StoredSession(
-        session_id=session_id, messages=('hi',), input_tokens=1, output_tokens=2,
-    )
-
-
-class TestDirectoryFlagParity:
-    def test_load_session_accepts_directory_flag(self, tmp_path: Path) -> None:
-        save_session(_make_session('alpha'), tmp_path)
-        result = _run_cli('load-session', 'alpha', '--directory', str(tmp_path))
-        assert result.returncode == 0, result.stderr
-        assert 'alpha' in result.stdout
-
-    def test_load_session_without_directory_uses_cwd_default(
-        self, tmp_path: Path,
-    ) -> None:
-        """When --directory is omitted, fall back to .port_sessions in CWD.
-
-        The subprocess runs with ``cwd=tmp_path``, where ``src/`` is not
-        importable on its own, so ``python -m src.main`` would fail. We put
-        the repo root on PYTHONPATH via env to make ``src`` resolvable.
-        """
-        sessions_dir = tmp_path / '.port_sessions'
-        sessions_dir.mkdir()
-        save_session(_make_session('beta'), sessions_dir)
-        import os
-        env = os.environ.copy()
-        env['PYTHONPATH'] = str(_REPO_ROOT)
-        result = subprocess.run(
-            [sys.executable, '-m', 'src.main', 'load-session', 'beta'],
-            capture_output=True, text=True, cwd=str(tmp_path), env=env,
-        )
-        assert result.returncode == 0, result.stderr
-        assert 'beta' in result.stdout
-
-
-class TestOutputFormatFlagParity:
-    def test_json_mode_on_success(self, tmp_path: Path) -> None:
-        save_session(
-            StoredSession(
-                session_id='gamma', messages=('x', 'y'),
-                input_tokens=5, output_tokens=7,
-            ),
-            tmp_path,
-        )
-        result = _run_cli(
-            'load-session', 'gamma',
-            '--directory', str(tmp_path),
-            '--output-format', 'json',
-        )
-        assert result.returncode == 0
-        data = json.loads(result.stdout)
-        # Verify common envelope fields (SCHEMAS.md contract)
-        assert 'timestamp' in data
-        assert data['command'] == 'load-session'
-        assert data['exit_code'] == 0
-        assert data['schema_version'] == '1.0'
-        # Verify command-specific fields
-        assert data['session_id'] == 'gamma'
-        assert data['loaded'] is True
-        assert data['messages_count'] == 2
-        assert data['input_tokens'] == 5
-        assert data['output_tokens'] == 7
-
-    def test_text_mode_unchanged_on_success(self, tmp_path: Path) -> None:
-        """Legacy text output must be byte-identical for backward compat."""
-        save_session(_make_session('delta'), tmp_path)
-        result = _run_cli('load-session', 'delta', '--directory', str(tmp_path))
-        assert result.returncode == 0
-        lines = result.stdout.strip().split('\n')
-        assert lines == ['delta', '1 messages', 'in=1 out=2']
-
-
-class TestNotFoundTypedError:
-    def test_not_found_json_envelope(self, tmp_path: Path) -> None:
-        """Not-found emits structured JSON, never a Python traceback."""
-        result = _run_cli(
-            'load-session', 'missing',
-            '--directory', str(tmp_path),
-            '--output-format', 'json',
-        )
-        assert result.returncode == 1
-        assert 'Traceback' not in result.stderr, (
-            'regression #165: raw traceback leaked to stderr'
-        )
-        assert 'SessionNotFoundError' not in result.stdout, (
-            'regression #165: internal class name leaked into CLI output'
-        )
-        data = json.loads(result.stdout)
-        assert data['session_id'] == 'missing'
-        assert data['loaded'] is False
-        assert data['error']['kind'] == 'session_not_found'
-        assert data['error']['retryable'] is False
-        # directory field is populated so claws know where we looked
-        assert 'directory' in data['error']
-
-    def test_not_found_text_mode_no_traceback(self, tmp_path: Path) -> None:
-        """Text mode on not-found must not dump a Python stack either."""
-        result = _run_cli(
-            'load-session', 'missing', '--directory', str(tmp_path),
-        )
-        assert result.returncode == 1
-        assert 'Traceback' not in result.stderr
-        assert result.stdout.startswith('error:')
-
-
-class TestLoadFailedDistinctFromNotFound:
-    def test_corrupted_session_file_surfaces_distinct_kind(
-        self, tmp_path: Path,
-    ) -> None:
-        """A corrupted JSON file must emit kind='session_load_failed', not 'session_not_found'."""
-        (tmp_path / 'broken.json').write_text('{ not valid json')
-        result = _run_cli(
-            'load-session', 'broken',
-            '--directory', str(tmp_path),
-            '--output-format', 'json',
-        )
-        assert result.returncode == 1
-        data = json.loads(result.stdout)
-        assert data['error']['kind'] == 'session_load_failed'
-        assert data['error']['retryable'] is True, (
-            'corrupted file is potentially retryable (fs glitch) unlike not-found'
-        )
-
-
-class TestTripletParityConsistency:
-    """All three #160 CLI commands should accept the same flag pair."""
-
-    @pytest.mark.parametrize('command', ['list-sessions', 'delete-session', 'load-session'])
-    def test_all_three_accept_directory_flag(self, command: str) -> None:
-        help_text = _run_cli(command, '--help').stdout
-        assert '--directory' in help_text, (
-            f'{command} missing --directory flag (#165 parity gap)'
-        )
-
-    @pytest.mark.parametrize('command', ['list-sessions', 'delete-session', 'load-session'])
-    def test_all_three_accept_output_format_flag(self, command: str) -> None:
-        help_text = _run_cli(command, '--help').stdout
-        assert '--output-format' in help_text, (
-            f'{command} missing --output-format flag (#165 parity gap)'
-        )
--- a/tests/test_parse_error_envelope.py
+++ b/tests/test_parse_error_envelope.py
@@ -1,239 +0,0 @@
-"""#178 — argparse-level errors emit JSON envelope when --output-format json is requested.
-
-Before #178:
-  $ claw nonexistent --output-format json
-  usage: main.py [-h] {summary,manifest,...} ...
-  main.py: error: argument command: invalid choice: 'nonexistent' (choose from ...)
-  [exit 2, argparse dumps help to stderr, no JSON envelope]
-
-After #178:
-  $ claw nonexistent --output-format json
-  {"timestamp": "...", "command": "nonexistent", "exit_code": 1, ...,
-   "error": {"kind": "parse", "operation": "argparse", ...}}
-  [exit 1, JSON envelope on stdout, matches SCHEMAS.md contract]
-
-Contract:
-- text mode: unchanged (argparse still dumps help to stderr, exit code 2)
-- JSON mode: envelope matches SCHEMAS.md 'error' shape, exit code 1
-- Parse errors use error.kind='parse' (distinct from runtime/session/etc.)
-"""
-
-from __future__ import annotations
-
-import json
-import subprocess
-import sys
-from pathlib import Path
-
-import pytest
-
-CLI = [sys.executable, '-m', 'src.main']
-REPO_ROOT = Path(__file__).resolve().parent.parent
-
-
-class TestParseErrorJsonEnvelope:
-    """Argparse errors emit JSON envelope when --output-format json is requested."""
-
-    def test_unknown_command_json_mode_emits_envelope(self) -> None:
-        """Unknown command + --output-format json → parse-error envelope."""
-        result = subprocess.run(
-            CLI + ['nonexistent-command', '--output-format', 'json'],
-            cwd=REPO_ROOT,
-            capture_output=True,
-            text=True,
-        )
-        assert result.returncode == 1, f"expected exit 1; got {result.returncode}"
-        envelope = json.loads(result.stdout)
-        # Common fields
-        assert envelope['schema_version'] == '1.0'
-        assert envelope['output_format'] == 'json'
-        assert envelope['exit_code'] == 1
-        # Error envelope shape
-        assert envelope['error']['kind'] == 'parse'
-        assert envelope['error']['operation'] == 'argparse'
-        assert envelope['error']['retryable'] is False
-        assert envelope['error']['target'] == 'nonexistent-command'
-        assert 'hint' in envelope['error']
-
-    def test_unknown_command_json_equals_syntax(self) -> None:
-        """--output-format=json syntax also works."""
-        result = subprocess.run(
-            CLI + ['nonexistent-command', '--output-format=json'],
-            cwd=REPO_ROOT,
-            capture_output=True,
-            text=True,
-        )
-        assert result.returncode == 1
-        envelope = json.loads(result.stdout)
-        assert envelope['error']['kind'] == 'parse'
-
-    def test_unknown_command_text_mode_unchanged(self) -> None:
-        """Text mode (default) preserves argparse behavior: help to stderr, exit 2."""
-        result = subprocess.run(
-            CLI + ['nonexistent-command'],
-            cwd=REPO_ROOT,
-            capture_output=True,
-            text=True,
-        )
-        assert result.returncode == 2, f"text mode must preserve argparse exit 2; got {result.returncode}"
-        # stderr should have argparse error (help + error message)
-        assert 'invalid choice' in result.stderr
-        # stdout should be empty (no JSON leaked)
-        assert result.stdout == ''
-
-    def test_invalid_flag_json_mode_emits_envelope(self) -> None:
-        """Invalid flag at top level + --output-format json → envelope."""
-        result = subprocess.run(
-            CLI + ['--invalid-top-level-flag', '--output-format', 'json'],
-            cwd=REPO_ROOT,
-            capture_output=True,
-            text=True,
-        )
-        # argparse might reject before --output-format is parsed; still emit envelope
-        assert result.returncode == 1, f"got {result.returncode}: {result.stderr}"
-        envelope = json.loads(result.stdout)
-        assert envelope['error']['kind'] == 'parse'
-
-    def test_missing_command_no_json_flag_behaves_normally(self) -> None:
-        """No --output-format flag + missing command → normal argparse behavior."""
-        result = subprocess.run(
-            CLI,
-            cwd=REPO_ROOT,
-            capture_output=True,
-            text=True,
-        )
-        # argparse exits 2 when required subcommand is missing
-        assert result.returncode == 2
-        assert 'required' in result.stderr.lower() or 'the following arguments are required' in result.stderr.lower()
-
-    def test_valid_command_unaffected(self) -> None:
-        """Valid commands still work normally (no regression)."""
-        result = subprocess.run(
-            CLI + ['list-sessions', '--output-format', 'json'],
-            cwd=REPO_ROOT,
-            capture_output=True,
-            text=True,
-        )
-        assert result.returncode == 0
-        envelope = json.loads(result.stdout)
-        assert envelope['command'] == 'list-sessions'
-        assert 'sessions' in envelope
-
-    def test_parse_error_envelope_contains_common_fields(self) -> None:
-        """Parse-error envelope must include all common fields per SCHEMAS.md."""
-        result = subprocess.run(
-            CLI + ['bogus', '--output-format', 'json'],
-            cwd=REPO_ROOT,
-            capture_output=True,
-            text=True,
-        )
-        envelope = json.loads(result.stdout)
-        # All common fields required by SCHEMAS.md
-        for field in ('timestamp', 'command', 'exit_code', 'output_format', 'schema_version'):
-            assert field in envelope, f"common field '{field}' missing from parse-error envelope"
-
-
-class TestParseErrorSchemaCompliance:
-    """Parse-error envelope matches SCHEMAS.md error shape."""
-
-    def test_error_kind_is_parse(self) -> None:
-        """error.kind='parse' distinguishes argparse errors from runtime errors."""
-        result = subprocess.run(
-            CLI + ['unknown', '--output-format', 'json'],
-            cwd=REPO_ROOT,
-            capture_output=True,
-            text=True,
-        )
-        envelope = json.loads(result.stdout)
-        assert envelope['error']['kind'] == 'parse'
-
-    def test_error_retryable_false(self) -> None:
-        """Parse errors are never retryable (typo won't magically fix itself)."""
-        result = subprocess.run(
-            CLI + ['unknown', '--output-format', 'json'],
-            cwd=REPO_ROOT,
-            capture_output=True,
-            text=True,
-        )
-        envelope = json.loads(result.stdout)
-        assert envelope['error']['retryable'] is False
-
-
-class TestParseErrorStderrHygiene:
-    """#179: JSON mode must fully suppress argparse stderr output.
-
-    Before #179: stderr leaked argparse usage + error text even when --output-format json.
-    After #179: stderr is silent; envelope carries the real error message verbatim.
-    """
-
-    def test_json_mode_stderr_is_silent_on_unknown_command(self) -> None:
-        """Unknown command in JSON mode: stderr empty."""
-        result = subprocess.run(
-            CLI + ['nonexistent-cmd', '--output-format', 'json'],
-            cwd=REPO_ROOT,
-            capture_output=True,
-            text=True,
-        )
-        assert result.stderr == '', (
-            f"JSON mode stderr must be empty; got:\n{result.stderr!r}"
-        )
-
-    def test_json_mode_stderr_is_silent_on_missing_arg(self) -> None:
-        """Missing required arg in JSON mode: stderr empty (no argparse usage leak)."""
-        result = subprocess.run(
-            CLI + ['load-session', '--output-format', 'json'],
-            cwd=REPO_ROOT,
-            capture_output=True,
-            text=True,
-        )
-        assert result.stderr == '', (
-            f"JSON mode stderr must be empty on missing arg; got:\n{result.stderr!r}"
-        )
-
-    def test_json_mode_envelope_carries_real_argparse_message(self) -> None:
-        """#179: envelope.error.message contains argparse's actual text, not generic rejection."""
-        result = subprocess.run(
-            CLI + ['load-session', '--output-format', 'json'],
-            cwd=REPO_ROOT,
-            capture_output=True,
-            text=True,
-        )
-        envelope = json.loads(result.stdout)
-        # Real argparse message: 'the following arguments are required: session_id'
-        msg = envelope['error']['message']
-        assert 'session_id' in msg, (
-            f"envelope.error.message must carry real argparse text mentioning missing arg; got: {msg!r}"
-        )
-        assert 'required' in msg.lower(), (
-            f"envelope.error.message must indicate what is required; got: {msg!r}"
-        )
-
-    def test_json_mode_envelope_carries_invalid_choice_details(self) -> None:
-        """#179: unknown command envelope includes valid-choice list from argparse."""
-        result = subprocess.run(
-            CLI + ['typo-command', '--output-format', 'json'],
-            cwd=REPO_ROOT,
-            capture_output=True,
-            text=True,
-        )
-        envelope = json.loads(result.stdout)
-        msg = envelope['error']['message']
-        assert 'invalid choice' in msg.lower(), (
-            f"envelope must mention 'invalid choice'; got: {msg!r}"
-        )
-        # Should include at least one valid command name for discoverability
-        assert 'bootstrap' in msg or 'summary' in msg, (
-            f"envelope must include valid choices for discoverability; got: {msg!r}"
-        )
-
-    def test_text_mode_stderr_preserved_on_unknown_command(self) -> None:
-        """Text mode: argparse stderr behavior unchanged (backward compat)."""
-        result = subprocess.run(
-            CLI + ['nonexistent-cmd'],
-            cwd=REPO_ROOT,
-            capture_output=True,
-            text=True,
-        )
-        # Text mode still dumps argparse help to stderr
-        assert 'invalid choice' in result.stderr
-        assert result.returncode == 2
--- a/tests/test_porting_workspace.py
+++ b/tests/test_porting_workspace.py
@@ -173,105 +173,6 @@ class PortingWorkspaceTests(unittest.TestCase):
        self.assertIn(session_id, result.stdout)
        self.assertIn('messages', result.stdout)

-    def test_list_sessions_cli_runs(self) -> None:
-        """#160: list-sessions CLI enumerates stored sessions in text + json."""
-        import json
-        import tempfile
-        from src.session_store import StoredSession, save_session
-
-        with tempfile.TemporaryDirectory() as tmp:
-            tmp_path = Path(tmp)
-            for sid in ['alpha', 'bravo']:
-                save_session(
-                    StoredSession(session_id=sid, messages=('hi',), input_tokens=1, output_tokens=2),
-                    tmp_path,
-                )
-            # text mode
-            text_result = subprocess.run(
-                [sys.executable, '-m', 'src.main', 'list-sessions', '--directory', str(tmp_path)],
-                check=True, capture_output=True, text=True,
-            )
-            self.assertIn('alpha', text_result.stdout)
-            self.assertIn('bravo', text_result.stdout)
-            # json mode
-            json_result = subprocess.run(
-                [sys.executable, '-m', 'src.main', 'list-sessions',
-                 '--directory', str(tmp_path), '--output-format', 'json'],
-                check=True, capture_output=True, text=True,
-            )
-            data = json.loads(json_result.stdout)
-            # Verify common envelope fields (SCHEMAS.md contract)
-            self.assertIn('timestamp', data)
-            self.assertEqual(data['command'], 'list-sessions')
-            self.assertEqual(data['schema_version'], '1.0')
-            # Verify command-specific fields
-            self.assertEqual(data['sessions'], ['alpha', 'bravo'])
-            self.assertEqual(data['count'], 2)
-
-    def test_delete_session_cli_idempotent(self) -> None:
-        """#160: delete-session CLI is idempotent (not-found is exit 0, status=not_found)."""
-        import json
-        import tempfile
-        from src.session_store import StoredSession, save_session
-
-        with tempfile.TemporaryDirectory() as tmp:
-            tmp_path = Path(tmp)
-            save_session(
-                StoredSession(session_id='once', messages=('hi',), input_tokens=1, output_tokens=2),
-                tmp_path,
-            )
-            # first delete: success
-            first = subprocess.run(
-                [sys.executable, '-m', 'src.main', 'delete-session', 'once',
-                 '--directory', str(tmp_path), '--output-format', 'json'],
-                capture_output=True, text=True,
-            )
-            self.assertEqual(first.returncode, 0)
-            envelope_first = json.loads(first.stdout)
-            # Verify common envelope fields (SCHEMAS.md contract)
-            self.assertIn('timestamp', envelope_first)
-            self.assertEqual(envelope_first['command'], 'delete-session')
-            self.assertEqual(envelope_first['exit_code'], 0)
-            self.assertEqual(envelope_first['schema_version'], '1.0')
-            # Verify command-specific fields
-            self.assertEqual(envelope_first['session_id'], 'once')
-            self.assertEqual(envelope_first['deleted'], True)
-            self.assertEqual(envelope_first['status'], 'deleted')
-            # second delete: idempotent, still exit 0
-            second = subprocess.run(
-                [sys.executable, '-m', 'src.main', 'delete-session', 'once',
-                 '--directory', str(tmp_path), '--output-format', 'json'],
-                capture_output=True, text=True,
-            )
-            self.assertEqual(second.returncode, 0)
-            envelope_second = json.loads(second.stdout)
-            self.assertEqual(envelope_second['session_id'], 'once')
-            self.assertEqual(envelope_second['deleted'], False)
-            self.assertEqual(envelope_second['status'], 'not_found')
-
-    def test_delete_session_cli_partial_failure_exit_1(self) -> None:
-        """#160: partial-failure (permission error) surfaces as exit 1 + typed JSON error."""
-        import json
-        import tempfile
-
-        with tempfile.TemporaryDirectory() as tmp:
-            tmp_path = Path(tmp)
-            bad = tmp_path / 'locked.json'
-            bad.mkdir()
-            try:
-                result = subprocess.run(
-                    [sys.executable, '-m', 'src.main', 'delete-session', 'locked',
-                     '--directory', str(tmp_path), '--output-format', 'json'],
-                    capture_output=True, text=True,
-                )
-                self.assertEqual(result.returncode, 1)
-                data = json.loads(result.stdout)
-                self.assertFalse(data['deleted'])
-                self.assertEqual(data['error']['kind'], 'session_delete_failed')
-                self.assertTrue(data['error']['retryable'])
-            finally:
-                bad.rmdir()
-
    def test_tool_permission_filtering_cli_runs(self) -> None:
        result = subprocess.run(
            [sys.executable, '-m', 'src.main', 'tools', '--limit', '10', '--deny-prefix', 'mcp'],
--- a/tests/test_run_turn_loop_cancellation.py
+++ b/tests/test_run_turn_loop_cancellation.py
@@ -1,156 +0,0 @@
-"""Tests for run_turn_loop timeout triggering cooperative cancel (ROADMAP #164 Stage A).
-
-End-to-end integration: when the wall-clock timeout fires in run_turn_loop,
-the runtime must signal the cancel_event so any in-flight submit_message
-thread sees it at its next safe checkpoint and returns without mutating
-state.
-
-This closes the gap filed in #164: #161's timeout bounded caller wait but
-did not prevent ghost turns.
-"""
-
-from __future__ import annotations
-
-import sys
-import threading
-import time
-from pathlib import Path
-from unittest.mock import patch
-
-sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
-
-from src.models import UsageSummary  # noqa: E402
-from src.query_engine import TurnResult  # noqa: E402
-from src.runtime import PortRuntime  # noqa: E402
-
-
-def _completed(prompt: str) -> TurnResult:
-    return TurnResult(
-        prompt=prompt,
-        output='ok',
-        matched_commands=(),
-        matched_tools=(),
-        permission_denials=(),
-        usage=UsageSummary(),
-        stop_reason='completed',
-    )
-
-
-class TestTimeoutPropagatesCancelEvent:
-    def test_runtime_passes_cancel_event_to_submit_message(self) -> None:
-        """submit_message receives a cancel_event when a deadline is in play."""
-        runtime = PortRuntime()
-        captured_event: list[threading.Event | None] = []
-
-        def _capture(prompt, commands, tools, denials, cancel_event=None):
-            captured_event.append(cancel_event)
-            return _completed(prompt)
-
-        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
-            engine = mock_factory.return_value
-            engine.submit_message.side_effect = _capture
-
-            runtime.run_turn_loop(
-                'hello', max_turns=1, timeout_seconds=5.0,
-            )
-
-            # Runtime passed a real Event object, not None
-            assert len(captured_event) == 1
-            assert isinstance(captured_event[0], threading.Event)
-
-    def test_legacy_no_timeout_does_not_pass_cancel_event(self) -> None:
-        """Without timeout_seconds, the cancel_event is None (legacy behaviour)."""
-        runtime = PortRuntime()
-        captured_kwargs: list[dict] = []
-
-        def _capture(prompt, commands, tools, denials):
-            # Legacy call signature: no cancel_event kwarg
-            captured_kwargs.append({'prompt': prompt})
-            return _completed(prompt)
-
-        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
-            engine = mock_factory.return_value
-            engine.submit_message.side_effect = _capture
-
-            runtime.run_turn_loop('hello', max_turns=1)
-
-            # Legacy path didn't pass cancel_event at all
-            assert len(captured_kwargs) == 1
-
-    def test_timeout_sets_cancel_event_before_returning(self) -> None:
-        """When timeout fires mid-call, the event is set and the still-running
-        thread would see 'cancelled' if it checks before returning."""
-        runtime = PortRuntime()
-        observed_events_at_checkpoint: list[bool] = []
-        release = threading.Event()  # test-side release so the thread doesn't leak forever
-
-        def _slow_submit(prompt, commands, tools, denials, cancel_event=None):
-            # Simulate provider work: block until either cancel or a test-side release.
-            # If cancel fires, check if the event is observably set.
-            start = time.monotonic()
-            while time.monotonic() - start < 2.0:
-                if cancel_event is not None and cancel_event.is_set():
-                    observed_events_at_checkpoint.append(True)
-                    return TurnResult(
-                        prompt=prompt, output='',
-                        matched_commands=(), matched_tools=(),
-                        permission_denials=(), usage=UsageSummary(),
-                        stop_reason='cancelled',
-                    )
-                if release.is_set():
-                    break
-                time.sleep(0.05)
-            return _completed(prompt)
-
-        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
-            engine = mock_factory.return_value
-            engine.submit_message.side_effect = _slow_submit
-
-            # Tight deadline: 0.2s, submit will be mid-loop when timeout fires
-            start = time.monotonic()
-            results = runtime.run_turn_loop(
-                'hello', max_turns=1, timeout_seconds=0.2,
-            )
-            elapsed = time.monotonic() - start
-            release.set()  # let the background thread exit cleanly
-
-            # Runtime returned a timeout TurnResult to the caller
-            assert results[-1].stop_reason == 'timeout'
-            # And it happened within a reasonable window of the deadline
-            assert elapsed < 1.5, f'runtime did not honour deadline: {elapsed:.2f}s'
-
-            # Give the background thread a moment to observe the cancel.
-            # We don't assert on it directly (thread-level observability is
-            # timing-dependent), but the contract is: the event IS set, so any
-            # cooperative checkpoint will see it.
-            time.sleep(0.3)
-
-
-class TestCancelEventSharedAcrossTurns:
-    """Event is created once per run_turn_loop invocation and shared across turns."""
-
-    def test_same_event_threaded_to_every_submit_message(self) -> None:
-        runtime = PortRuntime()
-        captured_events: list[threading.Event] = []
-
-        def _capture(prompt, commands, tools, denials, cancel_event=None):
-            if cancel_event is not None:
-                captured_events.append(cancel_event)
-            return _completed(prompt)
-
-        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
-            engine = mock_factory.return_value
-            engine.submit_message.side_effect = _capture
-
-            runtime.run_turn_loop(
-                'hello', max_turns=3, timeout_seconds=5.0,
-                continuation_prompt='continue',
-            )
-
-            # All 3 turns received the same event object (same identity)
-            assert len(captured_events) == 3
-            assert all(e is captured_events[0] for e in captured_events), (
-                'runtime must share one cancel_event across turns, not create '
-                'a new one per turn \u2014 otherwise a late-arriving cancel on turn '
-                'N-1 cannot affect turn N'
-            )
--- a/tests/test_run_turn_loop_continuation.py
+++ b/tests/test_run_turn_loop_continuation.py
@@ -1,161 +0,0 @@
-"""Tests for run_turn_loop continuation contract (ROADMAP #163).
-
-The deprecated ``f'{prompt} [turn N]'`` suffix injection is gone. Verifies:
-- No ``[turn N]`` string ever lands in a submitted prompt
-- Default (``continuation_prompt=None``) stops the loop after turn 0
-- Explicit ``continuation_prompt`` is submitted verbatim on subsequent turns
-- The first turn always gets the original prompt, not the continuation
-"""
-
-from __future__ import annotations
-
-import subprocess
-import sys
-from pathlib import Path
-from unittest.mock import patch
-
-sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
-
-from src.models import UsageSummary  # noqa: E402
-from src.query_engine import TurnResult  # noqa: E402
-from src.runtime import PortRuntime  # noqa: E402
-
-
-def _completed_result(prompt: str) -> TurnResult:
-    return TurnResult(
-        prompt=prompt,
-        output='ok',
-        matched_commands=(),
-        matched_tools=(),
-        permission_denials=(),
-        usage=UsageSummary(),
-        stop_reason='completed',
-    )
-
-
-class TestNoTurnSuffixInjection:
-    """Core acceptance: no prompt submitted to the engine ever contains '[turn N]'."""
-
-    def test_default_path_submits_original_prompt_only(self) -> None:
-        runtime = PortRuntime()
-        submitted: list[str] = []
-
-        def _capture(prompt, commands, tools, denials):
-            submitted.append(prompt)
-            return _completed_result(prompt)
-
-        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
-            engine = mock_factory.return_value
-            engine.submit_message.side_effect = _capture
-
-            runtime.run_turn_loop('investigate this bug', max_turns=3)
-
-            # Without continuation_prompt, only turn 0 should run
-            assert submitted == ['investigate this bug']
-            # And no '[turn N]' suffix anywhere
-            for p in submitted:
-                assert '[turn' not in p, f'found [turn suffix in submitted prompt: {p!r}'
-
-    def test_with_continuation_prompt_no_turn_suffix(self) -> None:
-        runtime = PortRuntime()
-        submitted: list[str] = []
-
-        def _capture(prompt, commands, tools, denials):
-            submitted.append(prompt)
-            return _completed_result(prompt)
-
-        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
-            engine = mock_factory.return_value
-            engine.submit_message.side_effect = _capture
-
-            runtime.run_turn_loop(
-                'investigate this bug',
-                max_turns=3,
-                continuation_prompt='Continue.',
-            )
-
-            # Turn 0 = original, turns 1-2 = continuation, verbatim
-            assert submitted == ['investigate this bug', 'Continue.', 'Continue.']
-            # No harness-injected suffix anywhere
-            for p in submitted:
-                assert '[turn' not in p
-                assert not p.endswith(']')
-
-
-class TestContinuationDefaultStopsAfterTurnZero:
-    def test_default_continuation_returns_one_result(self) -> None:
-        runtime = PortRuntime()
-        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
-            engine = mock_factory.return_value
-            engine.submit_message.side_effect = lambda p, *_: _completed_result(p)
-
-            results = runtime.run_turn_loop('x', max_turns=5)
-            assert len(results) == 1
-            assert results[0].prompt == 'x'
-
-    def test_default_continuation_does_not_call_engine_twice(self) -> None:
-        runtime = PortRuntime()
-        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
-            engine = mock_factory.return_value
-            engine.submit_message.side_effect = lambda p, *_: _completed_result(p)
-
-            runtime.run_turn_loop('x', max_turns=10)
-            # Exactly one submit_message call despite max_turns=10
-            assert engine.submit_message.call_count == 1
-
-
-class TestExplicitContinuationBehaviour:
-    def test_first_turn_always_uses_original_prompt(self) -> None:
-        runtime = PortRuntime()
-        captured: list[str] = []
-
-        def _capture(prompt, *_):
-            captured.append(prompt)
-            return _completed_result(prompt)
-
-        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
-            engine = mock_factory.return_value
-            engine.submit_message.side_effect = _capture
-
-            runtime.run_turn_loop(
-                'original task', max_turns=2, continuation_prompt='keep going'
-            )
-
-            assert captured[0] == 'original task'
-            assert captured[1] == 'keep going'
-
-    def test_continuation_respects_max_turns(self) -> None:
-        runtime = PortRuntime()
-        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
-            engine = mock_factory.return_value
-            engine.submit_message.side_effect = lambda p, *_: _completed_result(p)
-
-            runtime.run_turn_loop('x', max_turns=3, continuation_prompt='go')
-            assert engine.submit_message.call_count == 3
-
-
-class TestCLIContinuationFlag:
-    def test_cli_default_runs_one_turn(self) -> None:
-        """Without --continuation-prompt, CLI should emit exactly '## Turn 1'."""
-        result = subprocess.run(
-            [sys.executable, '-m', 'src.main', 'turn-loop', 'review MCP tool',
-             '--max-turns', '3', '--structured-output'],
-            check=True, capture_output=True, text=True,
-        )
-        assert '## Turn 1' in result.stdout
-        assert '## Turn 2' not in result.stdout
-        assert '[turn' not in result.stdout
-
-    def test_cli_with_continuation_runs_multiple_turns(self) -> None:
-        """With --continuation-prompt, CLI should run up to max_turns."""
-        result = subprocess.run(
-            [sys.executable, '-m', 'src.main', 'turn-loop', 'review MCP tool',
-             '--max-turns', '2', '--structured-output',
-             '--continuation-prompt', 'continue'],
-            check=True, capture_output=True, text=True,
-        )
-        assert '## Turn 1' in result.stdout
-        assert '## Turn 2' in result.stdout
-        # The continuation text is visible (it's submitted as the turn prompt)
-        # but no harness-injected [turn N] suffix
-        assert '[turn' not in result.stdout
--- a/tests/test_run_turn_loop_permissions.py
+++ b/tests/test_run_turn_loop_permissions.py
@@ -1,95 +0,0 @@
-"""Tests for run_turn_loop permission denials parity (ROADMAP #159).
-
-Verifies that multi-turn sessions have the same security posture as
-single-turn bootstrap_session: denied_tools are inferred from matches
-and threaded through every turn, not hardcoded empty.
-"""
-
-from __future__ import annotations
-
-import sys
-from pathlib import Path
-
-sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
-
-from src.runtime import PortRuntime  # noqa: E402
-
-
-class TestPermissionDenialsInTurnLoop:
-    """#159: permission denials must be non-empty in run_turn_loop,
-    matching what bootstrap_session produces for the same prompt.
-    """
-
-    def test_turn_loop_surfaces_permission_denials_like_bootstrap(self) -> None:
-        """Symmetry check: turn_loop and bootstrap_session infer the same denials."""
-        runtime = PortRuntime()
-        prompt = 'run bash ls'
-
-        # Single-turn via bootstrap
-        bootstrap_result = runtime.bootstrap_session(prompt)
-        bootstrap_denials = bootstrap_result.turn_result.permission_denials
-
-        # Multi-turn via run_turn_loop (single turn, no continuation)
-        loop_results = runtime.run_turn_loop(prompt, max_turns=1)
-        loop_denials = loop_results[0].permission_denials
-
-        # Both should infer denials for bash-family tools
-        assert len(bootstrap_denials) > 0, (
-            'bootstrap_session should deny bash-family tools'
-        )
-        assert len(loop_denials) > 0, (
-            f'#159 regression: run_turn_loop returned empty denials; '
-            f'expected {len(bootstrap_denials)} like bootstrap_session'
-        )
-
-        # The denial kinds should match (both deny the same tools)
-        bootstrap_denied_names = {d.tool_name for d in bootstrap_denials}
-        loop_denied_names = {d.tool_name for d in loop_denials}
-        assert bootstrap_denied_names == loop_denied_names, (
-            f'asymmetric denials: bootstrap denied {bootstrap_denied_names}, '
-            f'loop denied {loop_denied_names}'
-        )
-
-    def test_turn_loop_with_continuation_preserves_denials(self) -> None:
-        """Denials are inferred once at loop start, then passed to every turn."""
-        runtime = PortRuntime()
-        from unittest.mock import patch
-
-        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
-            from src.models import UsageSummary
-            from src.query_engine import TurnResult
-
-            engine = mock_factory.return_value
-            submitted_denials: list[tuple] = []
-
-            def _capture(prompt, commands, tools, denials):
-                submitted_denials.append(denials)
-                return TurnResult(
-                    prompt=prompt,
-                    output='ok',
-                    matched_commands=(),
-                    matched_tools=(),
-                    permission_denials=denials,  # echo back the denials
-                    usage=UsageSummary(),
-                    stop_reason='completed',
-                )
-
-            engine.submit_message.side_effect = _capture
-
-            loop_results = runtime.run_turn_loop(
-                'run bash rm', max_turns=2, continuation_prompt='continue'
-            )
-
-            # Both turn 0 and turn 1 should have received the same denials
-            assert len(submitted_denials) == 2
-            assert submitted_denials[0] == submitted_denials[1], (
-                'denials should be consistent across all turns'
-            )
-            # And they should be non-empty (bash is destructive)
-            assert len(submitted_denials[0]) > 0, (
-                'turn-loop denials were empty — #159 regression'
-            )
-
-            # Turn results should reflect the denials that were passed
-            for result in loop_results:
-                assert len(result.permission_denials) > 0
--- a/tests/test_run_turn_loop_timeout.py
+++ b/tests/test_run_turn_loop_timeout.py
@@ -1,179 +0,0 @@
-"""Tests for run_turn_loop wall-clock timeout (ROADMAP #161).
-
-Covers:
-- timeout_seconds=None preserves legacy unbounded behaviour
-- timeout_seconds=X aborts a hung turn and emits stop_reason='timeout'
-- Timeout budget is total wall-clock across all turns, not per-turn
-- Already-exhausted budget short-circuits before the first turn runs
-- Legacy path still runs without a ThreadPoolExecutor in the way
-"""
-
-from __future__ import annotations
-
-import sys
-import time
-from pathlib import Path
-from unittest.mock import patch
-
-import pytest
-
-sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
-
-from src.models import UsageSummary  # noqa: E402
-from src.query_engine import TurnResult  # noqa: E402
-from src.runtime import PortRuntime  # noqa: E402
-
-
-def _completed_result(prompt: str) -> TurnResult:
-    return TurnResult(
-        prompt=prompt,
-        output='ok',
-        matched_commands=(),
-        matched_tools=(),
-        permission_denials=(),
-        usage=UsageSummary(),
-        stop_reason='completed',
-    )
-
-
-class TestLegacyUnboundedBehaviour:
-    def test_no_timeout_preserves_existing_behaviour(self) -> None:
-        """timeout_seconds=None must not change legacy path at all."""
-        results = PortRuntime().run_turn_loop('review MCP tool', max_turns=2)
-        assert len(results) >= 1
-        for r in results:
-            assert r.stop_reason in {'completed', 'max_turns_reached', 'max_budget_reached'}
-            assert r.stop_reason != 'timeout'
-
-
-class TestTimeoutAbortsHungTurn:
-    def test_hung_submit_message_times_out(self) -> None:
-        """A stalled submit_message must be aborted and emit stop_reason='timeout'."""
-        runtime = PortRuntime()
-
-        # #164 Stage A: runtime now passes cancel_event as a 5th positional
-        # arg on the timeout path, so mocks must accept it (even if they ignore it).
-        def _hang(prompt, commands, tools, denials, cancel_event=None):
-            time.sleep(5.0)  # would block the loop
-            return _completed_result(prompt)
-
-        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
-            engine = mock_factory.return_value
-            engine.config = None  # attribute-assigned in run_turn_loop
-            engine.submit_message.side_effect = _hang
-
-            start = time.monotonic()
-            results = runtime.run_turn_loop(
-                'review MCP tool', max_turns=3, timeout_seconds=0.3
-            )
-            elapsed = time.monotonic() - start
-
-            # Must exit well under the 5s hang
-            assert elapsed < 1.5, f'run_turn_loop did not honor timeout: {elapsed:.2f}s'
-            assert len(results) == 1
-            assert results[-1].stop_reason == 'timeout'
-
-
-class TestTimeoutBudgetIsTotal:
-    def test_budget_is_cumulative_across_turns(self) -> None:
-        """timeout_seconds is total wall-clock across all turns, not per-turn.
-
-        #163 interaction: multi-turn behaviour now requires an explicit
-        ``continuation_prompt``; otherwise the loop stops after turn 0 and
-        the cumulative-budget contract is trivially satisfied. We supply one
-        here so the test actually exercises the cross-turn deadline.
-        """
-        runtime = PortRuntime()
-        call_count = {'n': 0}
-
-        def _slow(prompt, commands, tools, denials, cancel_event=None):
-            call_count['n'] += 1
-            time.sleep(0.4)  # each turn burns 0.4s
-            return _completed_result(prompt)
-
-        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
-            engine = mock_factory.return_value
-            engine.submit_message.side_effect = _slow
-
-            start = time.monotonic()
-            # 0.6s budget, 0.4s per turn. First turn completes (~0.4s),
-            # second turn times out before finishing.
-            results = runtime.run_turn_loop(
-                'review MCP tool',
-                max_turns=5,
-                timeout_seconds=0.6,
-                continuation_prompt='continue',
-            )
-            elapsed = time.monotonic() - start
-
-            # Should exit at around 0.6s, not 2.0s (5 turns * 0.4s)
-            assert elapsed < 1.5, f'cumulative budget not honored: {elapsed:.2f}s'
-            # Last result should be the timeout
-            assert results[-1].stop_reason == 'timeout'
-
-
-class TestExhaustedBudget:
-    def test_zero_timeout_short_circuits_first_turn(self) -> None:
-        """timeout_seconds=0 emits timeout before the first submit_message call."""
-        runtime = PortRuntime()
-
-        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
-            engine = mock_factory.return_value
-            # submit_message should never be called when budget is already 0
-            engine.submit_message.side_effect = AssertionError(
-                'submit_message should not run when budget is exhausted'
-            )
-
-            results = runtime.run_turn_loop(
-                'review MCP tool', max_turns=3, timeout_seconds=0.0
-            )
-
-            assert len(results) == 1
-            assert results[0].stop_reason == 'timeout'
-
-
-class TestTimeoutResultShape:
-    def test_timeout_result_has_correct_prompt_and_matches(self) -> None:
-        """Synthetic TurnResult on timeout must carry the turn's prompt + routed matches."""
-        runtime = PortRuntime()
-
-        def _hang(prompt, commands, tools, denials, cancel_event=None):
-            time.sleep(5.0)
-            return _completed_result(prompt)
-
-        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
-            engine = mock_factory.return_value
-            engine.submit_message.side_effect = _hang
-
-            results = runtime.run_turn_loop(
-                'review MCP tool', max_turns=2, timeout_seconds=0.2
-            )
-
-            timeout_result = results[-1]
-            assert timeout_result.stop_reason == 'timeout'
-            assert timeout_result.prompt == 'review MCP tool'
-            # matched_commands / matched_tools should still be populated from routing,
-            # so downstream transcripts don't lose the routing context.
-            # These may be empty tuples depending on routing; they must be tuples.
-            assert isinstance(timeout_result.matched_commands, tuple)
-            assert isinstance(timeout_result.matched_tools, tuple)
-            assert isinstance(timeout_result.usage, UsageSummary)
-
-
-class TestNegativeTimeoutTreatedAsExhausted:
-    def test_negative_timeout_short_circuits(self) -> None:
-        """A negative budget should behave identically to exhausted."""
-        runtime = PortRuntime()
-
-        with patch('src.runtime.QueryEnginePort.from_workspace') as mock_factory:
-            engine = mock_factory.return_value
-            engine.submit_message.side_effect = AssertionError(
-                'submit_message should not run when budget is negative'
-            )
-
-            results = runtime.run_turn_loop(
-                'review MCP tool', max_turns=3, timeout_seconds=-1.0
-            )
-
-            assert len(results) == 1
-            assert results[0].stop_reason == 'timeout'
--- a/tests/test_session_store.py
+++ b/tests/test_session_store.py
@@ -1,173 +0,0 @@
-"""Tests for session_store CRUD surface (ROADMAP #160).
-
-Covers:
-- list_sessions enumeration
-- session_exists boolean check
-- delete_session idempotency + race-safety + partial-failure contract
-- SessionNotFoundError typing (KeyError subclass)
-- SessionDeleteError typing (OSError subclass)
-"""
-
-from __future__ import annotations
-
-import sys
-from pathlib import Path
-
-import pytest
-
-sys.path.insert(0, str(Path(__file__).resolve().parent.parent / 'src'))
-
-from session_store import (  # noqa: E402
-    StoredSession,
-    SessionDeleteError,
-    SessionNotFoundError,
-    delete_session,
-    list_sessions,
-    load_session,
-    save_session,
-    session_exists,
-)
-
-
-def _make_session(session_id: str) -> StoredSession:
-    return StoredSession(
-        session_id=session_id,
-        messages=('hello',),
-        input_tokens=1,
-        output_tokens=2,
-    )
-
-
-class TestListSessions:
-    def test_empty_directory_returns_empty_list(self, tmp_path: Path) -> None:
-        assert list_sessions(tmp_path) == []
-
-    def test_nonexistent_directory_returns_empty_list(self, tmp_path: Path) -> None:
-        missing = tmp_path / 'never-created'
-        assert list_sessions(missing) == []
-
-    def test_lists_saved_sessions_sorted(self, tmp_path: Path) -> None:
-        save_session(_make_session('charlie'), tmp_path)
-        save_session(_make_session('alpha'), tmp_path)
-        save_session(_make_session('bravo'), tmp_path)
-        assert list_sessions(tmp_path) == ['alpha', 'bravo', 'charlie']
-
-    def test_ignores_non_json_files(self, tmp_path: Path) -> None:
-        save_session(_make_session('real'), tmp_path)
-        (tmp_path / 'notes.txt').write_text('ignore me')
-        (tmp_path / 'data.yaml').write_text('ignore me too')
-        assert list_sessions(tmp_path) == ['real']
-
-
-class TestSessionExists:
-    def test_returns_true_for_saved_session(self, tmp_path: Path) -> None:
-        save_session(_make_session('present'), tmp_path)
-        assert session_exists('present', tmp_path) is True
-
-    def test_returns_false_for_missing_session(self, tmp_path: Path) -> None:
-        assert session_exists('absent', tmp_path) is False
-
-    def test_returns_false_for_nonexistent_directory(self, tmp_path: Path) -> None:
-        missing = tmp_path / 'never-created'
-        assert session_exists('anything', missing) is False
-
-
-class TestLoadSession:
-    def test_raises_typed_error_on_missing(self, tmp_path: Path) -> None:
-        with pytest.raises(SessionNotFoundError) as exc_info:
-            load_session('nonexistent', tmp_path)
-        assert 'nonexistent' in str(exc_info.value)
-
-    def test_not_found_error_is_keyerror_subclass(self, tmp_path: Path) -> None:
-        """Orchestrators catching KeyError should still work."""
-        with pytest.raises(KeyError):
-            load_session('nonexistent', tmp_path)
-
-    def test_not_found_error_is_not_filenotfounderror(self, tmp_path: Path) -> None:
-        """Callers can distinguish 'not found' from IO errors."""
-        with pytest.raises(SessionNotFoundError):
-            load_session('nonexistent', tmp_path)
-        # Specifically, it should NOT match bare FileNotFoundError alone
-        # (SessionNotFoundError inherits from KeyError, not FileNotFoundError)
-        assert not issubclass(SessionNotFoundError, FileNotFoundError)
-
-
-class TestDeleteSessionIdempotency:
-    """Contract: delete_session(x) followed by delete_session(x) must be safe."""
-
-    def test_first_delete_returns_true(self, tmp_path: Path) -> None:
-        save_session(_make_session('to-delete'), tmp_path)
-        assert delete_session('to-delete', tmp_path) is True
-
-    def test_second_delete_returns_false_no_raise(self, tmp_path: Path) -> None:
-        """Idempotency: deleting an already-deleted session is a no-op."""
-        save_session(_make_session('once'), tmp_path)
-        delete_session('once', tmp_path)
-        # Second call must not raise
-        assert delete_session('once', tmp_path) is False
-
-    def test_delete_nonexistent_returns_false_no_raise(self, tmp_path: Path) -> None:
-        """Never-existed session is treated identically to already-deleted."""
-        assert delete_session('never-existed', tmp_path) is False
-
-    def test_delete_removes_only_target(self, tmp_path: Path) -> None:
-        save_session(_make_session('keep'), tmp_path)
-        save_session(_make_session('remove'), tmp_path)
-        delete_session('remove', tmp_path)
-        assert list_sessions(tmp_path) == ['keep']
-
-
-class TestDeleteSessionPartialFailure:
-    """Contract: file exists but cannot be removed -> SessionDeleteError."""
-
-    def test_partial_failure_raises_session_delete_error(self, tmp_path: Path) -> None:
-        """If a directory exists where a session file should be, unlink fails."""
-        bad_path = tmp_path / 'locked.json'
-        bad_path.mkdir()
-        try:
-            with pytest.raises(SessionDeleteError) as exc_info:
-                delete_session('locked', tmp_path)
-            # Underlying cause should be wrapped
-            assert exc_info.value.__cause__ is not None
-            assert isinstance(exc_info.value.__cause__, OSError)
-        finally:
-            bad_path.rmdir()
-
-    def test_delete_error_is_oserror_subclass(self, tmp_path: Path) -> None:
-        """Callers catching OSError should still work for retries."""
-        bad_path = tmp_path / 'locked.json'
-        bad_path.mkdir()
-        try:
-            with pytest.raises(OSError):
-                delete_session('locked', tmp_path)
-        finally:
-            bad_path.rmdir()
-
-
-class TestRaceSafety:
-    """Contract: delete_session must be race-safe between exists-check and unlink."""
-
-    def test_concurrent_deletion_returns_false_not_raises(
-        self, tmp_path: Path, monkeypatch
-    ) -> None:
-        """If another process deletes between exists-check and unlink, return False."""
-        save_session(_make_session('racy'), tmp_path)
-        # Simulate: file disappears right before unlink (concurrent deletion)
-        path = tmp_path / 'racy.json'
-        path.unlink()
-        # Now delete_session should return False, not raise
-        assert delete_session('racy', tmp_path) is False
-
-
-class TestRoundtrip:
-    def test_save_list_load_delete_cycle(self, tmp_path: Path) -> None:
-        session = _make_session('lifecycle')
-        save_session(session, tmp_path)
-        assert 'lifecycle' in list_sessions(tmp_path)
-        assert session_exists('lifecycle', tmp_path)
-        loaded = load_session('lifecycle', tmp_path)
-        assert loaded.session_id == 'lifecycle'
-        assert loaded.messages == ('hello',)
-        assert delete_session('lifecycle', tmp_path) is True
-        assert not session_exists('lifecycle', tmp_path)
-        assert list_sessions(tmp_path) == []
--- a/tests/test_show_command_tool_output_format.py
+++ b/tests/test_show_command_tool_output_format.py
@@ -1,203 +0,0 @@
-"""Tests for --output-format flag on show-command and show-tool (ROADMAP #167).
-
-Verifies parity with session-lifecycle CLI family (#160/#165/#166):
-- show-command and show-tool now accept --output-format {text,json}
-- Found case returns success with JSON envelope: {name, found: true, source_hint, responsibility}
-- Not-found case returns typed error envelope: {name, found: false, error: {kind, message, retryable}}
-- Legacy text output (default) unchanged for backward compat
-- Exit code 0 on success, 1 on not-found (matching load-session contract)
-"""
-
-from __future__ import annotations
-
-import json
-import subprocess
-import sys
-from pathlib import Path
-
-sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
-
-
-class TestShowCommandOutputFormat:
-    """show-command --output-format {text,json} parity with session-lifecycle family."""
-
-    def test_show_command_found_json(self) -> None:
-        """show-command with found entry returns JSON envelope."""
-        result = subprocess.run(
-            [sys.executable, '-m', 'src.main', 'show-command', 'add-dir', '--output-format', 'json'],
-            cwd=Path(__file__).resolve().parent.parent,
-            capture_output=True,
-            text=True,
-        )
-        assert result.returncode == 0, f'Expected exit 0, got {result.returncode}: {result.stderr}'
-
-        envelope = json.loads(result.stdout)
-        assert envelope['found'] is True
-        assert envelope['name'] == 'add-dir'
-        assert 'source_hint' in envelope
-        assert 'responsibility' in envelope
-        # No error field when found
-        assert 'error' not in envelope
-
-    def test_show_command_not_found_json(self) -> None:
-        """show-command with missing entry returns typed error envelope."""
-        result = subprocess.run(
-            [sys.executable, '-m', 'src.main', 'show-command', 'nonexistent-cmd', '--output-format', 'json'],
-            cwd=Path(__file__).resolve().parent.parent,
-            capture_output=True,
-            text=True,
-        )
-        assert result.returncode == 1, f'Expected exit 1 on not-found, got {result.returncode}'
-
-        envelope = json.loads(result.stdout)
-        assert envelope['found'] is False
-        assert envelope['name'] == 'nonexistent-cmd'
-        assert envelope['error']['kind'] == 'command_not_found'
-        assert envelope['error']['retryable'] is False
-        # No source_hint/responsibility when not found
-        assert 'source_hint' not in envelope or envelope.get('source_hint') is None
-        assert 'responsibility' not in envelope or envelope.get('responsibility') is None
-
-    def test_show_command_text_mode_backward_compat(self) -> None:
-        """show-command text mode (default) is unchanged from pre-#167."""
-        result = subprocess.run(
-            [sys.executable, '-m', 'src.main', 'show-command', 'add-dir'],
-            cwd=Path(__file__).resolve().parent.parent,
-            capture_output=True,
-            text=True,
-        )
-        assert result.returncode == 0
-
-        # Text output is newline-separated (name, source_hint, responsibility)
-        lines = result.stdout.strip().split('\n')
-        assert len(lines) == 3
-        assert lines[0] == 'add-dir'
-        assert 'commands/add-dir/add-dir.tsx' in lines[1]
-
-    def test_show_command_text_mode_not_found(self) -> None:
-        """show-command text mode on not-found returns prose error."""
-        result = subprocess.run(
-            [sys.executable, '-m', 'src.main', 'show-command', 'missing'],
-            cwd=Path(__file__).resolve().parent.parent,
-            capture_output=True,
-            text=True,
-        )
-        assert result.returncode == 1
-        assert 'not found' in result.stdout.lower()
-        assert 'missing' in result.stdout
-
-    def test_show_command_default_is_text(self) -> None:
-        """Omitting --output-format defaults to text."""
-        result_implicit = subprocess.run(
-            [sys.executable, '-m', 'src.main', 'show-command', 'add-dir'],
-            cwd=Path(__file__).resolve().parent.parent,
-            capture_output=True,
-            text=True,
-        )
-        result_explicit = subprocess.run(
-            [sys.executable, '-m', 'src.main', 'show-command', 'add-dir', '--output-format', 'text'],
-            cwd=Path(__file__).resolve().parent.parent,
-            capture_output=True,
-            text=True,
-        )
-        assert result_implicit.stdout == result_explicit.stdout
-
-
-class TestShowToolOutputFormat:
-    """show-tool --output-format {text,json} parity with session-lifecycle family."""
-
-    def test_show_tool_found_json(self) -> None:
-        """show-tool with found entry returns JSON envelope."""
-        result = subprocess.run(
-            [sys.executable, '-m', 'src.main', 'show-tool', 'BashTool', '--output-format', 'json'],
-            cwd=Path(__file__).resolve().parent.parent,
-            capture_output=True,
-            text=True,
-        )
-        assert result.returncode == 0, f'Expected exit 0, got {result.returncode}: {result.stderr}'
-
-        envelope = json.loads(result.stdout)
-        assert envelope['found'] is True
-        assert envelope['name'] == 'BashTool'
-        assert 'source_hint' in envelope
-        assert 'responsibility' in envelope
-        assert 'error' not in envelope
-
-    def test_show_tool_not_found_json(self) -> None:
-        """show-tool with missing entry returns typed error envelope."""
-        result = subprocess.run(
-            [sys.executable, '-m', 'src.main', 'show-tool', 'NotARealTool', '--output-format', 'json'],
-            cwd=Path(__file__).resolve().parent.parent,
-            capture_output=True,
-            text=True,
-        )
-        assert result.returncode == 1, f'Expected exit 1 on not-found, got {result.returncode}'
-
-        envelope = json.loads(result.stdout)
-        assert envelope['found'] is False
-        assert envelope['name'] == 'NotARealTool'
-        assert envelope['error']['kind'] == 'tool_not_found'
-        assert envelope['error']['retryable'] is False
-
-    def test_show_tool_text_mode_backward_compat(self) -> None:
-        """show-tool text mode (default) is unchanged from pre-#167."""
-        result = subprocess.run(
-            [sys.executable, '-m', 'src.main', 'show-tool', 'BashTool'],
-            cwd=Path(__file__).resolve().parent.parent,
-            capture_output=True,
-            text=True,
-        )
-        assert result.returncode == 0
-
-        lines = result.stdout.strip().split('\n')
-        assert len(lines) == 3
-        assert lines[0] == 'BashTool'
-        assert 'tools/BashTool/BashTool.tsx' in lines[1]
-
-
-class TestShowCommandToolFormatParity:
-    """Verify symmetry between show-command and show-tool formats."""
-
-    def test_both_accept_output_format_flag(self) -> None:
-        """Both commands accept the same --output-format choices."""
-        # Just ensure both fail with invalid choice (they accept text/json)
-        result_cmd = subprocess.run(
-            [sys.executable, '-m', 'src.main', 'show-command', 'add-dir', '--output-format', 'invalid'],
-            cwd=Path(__file__).resolve().parent.parent,
-            capture_output=True,
-            text=True,
-        )
-        result_tool = subprocess.run(
-            [sys.executable, '-m', 'src.main', 'show-tool', 'BashTool', '--output-format', 'invalid'],
-            cwd=Path(__file__).resolve().parent.parent,
-            capture_output=True,
-            text=True,
-        )
-        # Both should fail with argument parser error
-        assert result_cmd.returncode != 0
-        assert result_tool.returncode != 0
-        assert 'invalid choice' in result_cmd.stderr
-        assert 'invalid choice' in result_tool.stderr
-
-    def test_json_envelope_shape_consistency(self) -> None:
-        """Both commands return consistent JSON envelope shape."""
-        cmd_result = subprocess.run(
-            [sys.executable, '-m', 'src.main', 'show-command', 'add-dir', '--output-format', 'json'],
-            cwd=Path(__file__).resolve().parent.parent,
-            capture_output=True,
-            text=True,
-        )
-        tool_result = subprocess.run(
-            [sys.executable, '-m', 'src.main', 'show-tool', 'BashTool', '--output-format', 'json'],
-            cwd=Path(__file__).resolve().parent.parent,
-            capture_output=True,
-            text=True,
-        )
-
-        cmd_envelope = json.loads(cmd_result.stdout)
-        tool_envelope = json.loads(tool_result.stdout)
-
-        # Same top-level keys for found=true case
-        assert set(cmd_envelope.keys()) == set(tool_envelope.keys())
-        assert cmd_envelope['found'] is True
-        assert tool_envelope['found'] is True
--- a/tests/test_submit_message_budget.py
+++ b/tests/test_submit_message_budget.py
@@ -1,167 +0,0 @@
-"""Tests for submit_message budget-overflow atomicity (ROADMAP #162).
-
-Covers:
-- Budget overflow returns stop_reason='max_budget_reached' without mutating session
-- mutable_messages, transcript_store, permission_denials, total_usage all unchanged
-- Session persisted after overflow does not contain the overflow turn
-- Engine remains usable after overflow: subsequent in-budget call succeeds
-- Normal (non-overflow) path still commits state as before
-"""
-
-from __future__ import annotations
-
-import sys
-from pathlib import Path
-
-import pytest
-
-sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
-
-from src.models import PermissionDenial, UsageSummary  # noqa: E402
-from src.port_manifest import build_port_manifest  # noqa: E402
-from src.query_engine import QueryEngineConfig, QueryEnginePort  # noqa: E402
-from src.session_store import StoredSession, load_session, save_session  # noqa: E402
-
-
-def _make_engine(max_budget_tokens: int = 10) -> QueryEnginePort:
-    engine = QueryEnginePort(manifest=build_port_manifest())
-    engine.config = QueryEngineConfig(max_budget_tokens=max_budget_tokens)
-    return engine
-
-
-class TestBudgetOverflowDoesNotMutate:
-    """The core #162 contract: overflow must leave session state untouched."""
-
-    def test_mutable_messages_unchanged_on_overflow(self) -> None:
-        engine = _make_engine(max_budget_tokens=10)
-        pre_count = len(engine.mutable_messages)
-        overflow_prompt = ' '.join(['word'] * 50)
-        result = engine.submit_message(overflow_prompt)
-        assert result.stop_reason == 'max_budget_reached'
-        assert len(engine.mutable_messages) == pre_count
-
-    def test_transcript_unchanged_on_overflow(self) -> None:
-        engine = _make_engine(max_budget_tokens=10)
-        pre_count = len(engine.transcript_store.entries)
-        overflow_prompt = ' '.join(['word'] * 50)
-        result = engine.submit_message(overflow_prompt)
-        assert result.stop_reason == 'max_budget_reached'
-        assert len(engine.transcript_store.entries) == pre_count
-
-    def test_permission_denials_unchanged_on_overflow(self) -> None:
-        engine = _make_engine(max_budget_tokens=10)
-        pre_count = len(engine.permission_denials)
-        denials = (PermissionDenial(tool_name='bash', reason='gated in test'),)
-        overflow_prompt = ' '.join(['word'] * 50)
-        result = engine.submit_message(overflow_prompt, denied_tools=denials)
-        assert result.stop_reason == 'max_budget_reached'
-        assert len(engine.permission_denials) == pre_count
-
-    def test_total_usage_unchanged_on_overflow(self) -> None:
-        engine = _make_engine(max_budget_tokens=10)
-        pre_usage = engine.total_usage
-        overflow_prompt = ' '.join(['word'] * 50)
-        result = engine.submit_message(overflow_prompt)
-        assert result.stop_reason == 'max_budget_reached'
-        assert engine.total_usage == pre_usage
-
-    def test_turn_result_reports_pre_mutation_usage(self) -> None:
-        """The TurnResult.usage must reflect session state as-if overflow never happened."""
-        engine = _make_engine(max_budget_tokens=10)
-        pre_usage = engine.total_usage
-        overflow_prompt = ' '.join(['word'] * 50)
-        result = engine.submit_message(overflow_prompt)
-        assert result.stop_reason == 'max_budget_reached'
-        assert result.usage == pre_usage
-
-
-class TestOverflowPersistence:
-    """Session persisted after overflow must not contain the overflow turn."""
-
-    def test_persisted_session_empty_when_first_turn_overflows(
-        self, tmp_path: Path, monkeypatch
-    ) -> None:
-        """When the very first call overflows, persisted session has zero messages."""
-        monkeypatch.chdir(tmp_path)
-        engine = _make_engine(max_budget_tokens=10)
-        overflow_prompt = ' '.join(['word'] * 50)
-        result = engine.submit_message(overflow_prompt)
-        assert result.stop_reason == 'max_budget_reached'
-
-        path_str = engine.persist_session()
-        path = Path(path_str)
-        assert path.exists()
-        loaded = load_session(path.stem, path.parent)
-        assert loaded.messages == (), (
-            f'overflow turn poisoned session: {loaded.messages!r}'
-        )
-
-    def test_persisted_session_retains_only_successful_turns(
-        self, tmp_path: Path, monkeypatch
-    ) -> None:
-        """A successful turn followed by an overflow persists only the successful turn."""
-        monkeypatch.chdir(tmp_path)
-        # Budget large enough for one short turn but not a second big one.
-        # Token counting is whitespace-split (see UsageSummary.add_turn),
-        # so overflow prompts must contain many whitespace-separated words.
-        engine = QueryEnginePort(manifest=build_port_manifest())
-        engine.config = QueryEngineConfig(max_budget_tokens=50)
-
-        ok = engine.submit_message('short')
-        assert ok.stop_reason == 'completed'
-        assert 'short' in engine.mutable_messages
-
-        # 500 whitespace-separated tokens — definitely over a 50-token budget
-        overflow_prompt = ' '.join(['word'] * 500)
-        overflow = engine.submit_message(overflow_prompt)
-        assert overflow.stop_reason == 'max_budget_reached'
-
-        path = Path(engine.persist_session())
-        loaded = load_session(path.stem, path.parent)
-        assert loaded.messages == ('short',), (
-            f'expected only the successful turn, got {loaded.messages!r}'
-        )
-
-
-class TestEngineUsableAfterOverflow:
-    """After overflow, engine must still be usable — overflow is rejection, not corruption."""
-
-    def test_subsequent_in_budget_call_succeeds(self) -> None:
-        """After an overflow rejection, raising the budget and retrying works."""
-        engine = _make_engine(max_budget_tokens=10)
-        overflow_prompt = ' '.join(['word'] * 100)
-        overflow = engine.submit_message(overflow_prompt)
-        assert overflow.stop_reason == 'max_budget_reached'
-
-        # Raise the budget and retry — the engine should be in a clean state
-        engine.config = QueryEngineConfig(max_budget_tokens=10_000)
-        ok = engine.submit_message('short retry')
-        assert ok.stop_reason == 'completed'
-        assert 'short retry' in engine.mutable_messages
-        # The overflow prompt should never have been recorded
-        assert overflow_prompt not in engine.mutable_messages
-
-    def test_multiple_overflow_calls_remain_idempotent(self) -> None:
-        """Repeated overflow calls must not accumulate hidden state."""
-        engine = _make_engine(max_budget_tokens=10)
-        overflow_prompt = ' '.join(['word'] * 50)
-        for _ in range(5):
-            result = engine.submit_message(overflow_prompt)
-            assert result.stop_reason == 'max_budget_reached'
-        assert len(engine.mutable_messages) == 0
-        assert len(engine.transcript_store.entries) == 0
-        assert engine.total_usage == UsageSummary()
-
-
-class TestNormalPathStillCommits:
-    """Regression guard: non-overflow path must still mutate state as before."""
-
-    def test_in_budget_turn_commits_all_state(self) -> None:
-        engine = QueryEnginePort(manifest=build_port_manifest())
-        engine.config = QueryEngineConfig(max_budget_tokens=10_000)
-        result = engine.submit_message('review MCP tool')
-        assert result.stop_reason == 'completed'
-        assert len(engine.mutable_messages) == 1
-        assert len(engine.transcript_store.entries) == 1
-        assert engine.total_usage.input_tokens > 0
-        assert engine.total_usage.output_tokens > 0
--- a/tests/test_submit_message_cancellation.py
+++ b/tests/test_submit_message_cancellation.py
@@ -1,220 +0,0 @@
-"""Tests for cooperative cancellation in submit_message (ROADMAP #164 Stage A).
-
-Verifies that cancel_event enables safe early termination:
-- Event set before call => immediate return with stop_reason='cancelled'
-- Event set between budget check and commit => still 'cancelled', no mutation
-- Event set after commit => not observable (honest cooperative limit)
-- Legacy callers (cancel_event=None) see zero behaviour change
-- State is untouched on cancellation: mutable_messages, transcript_store,
-  permission_denials, total_usage all preserved
-
-This closes the #161 follow-up gap filed as #164: wedged provider threads
-can no longer silently commit ghost turns after the caller observed a
-timeout.
-"""
-
-from __future__ import annotations
-
-import sys
-import threading
-from pathlib import Path
-
-sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
-
-from src.models import PermissionDenial  # noqa: E402
-from src.port_manifest import build_port_manifest  # noqa: E402
-from src.query_engine import QueryEngineConfig, QueryEnginePort, TurnResult  # noqa: E402
-
-
-def _fresh_engine(**config_overrides) -> QueryEnginePort:
-    config = QueryEngineConfig(**config_overrides) if config_overrides else QueryEngineConfig()
-    return QueryEnginePort(manifest=build_port_manifest(), config=config)
-
-
-class TestCancellationBeforeCall:
-    """Event set before submit_message is invoked => immediate 'cancelled'."""
-
-    def test_pre_set_event_returns_cancelled_immediately(self) -> None:
-        engine = _fresh_engine()
-        event = threading.Event()
-        event.set()
-
-        result = engine.submit_message('hello', cancel_event=event)
-
-        assert result.stop_reason == 'cancelled'
-        assert result.prompt == 'hello'
-        # Output is empty on pre-budget cancel (no synthesis)
-        assert result.output == ''
-
-    def test_pre_set_event_preserves_mutable_messages(self) -> None:
-        engine = _fresh_engine()
-        event = threading.Event()
-        event.set()
-
-        engine.submit_message('ghost turn', cancel_event=event)
-
-        assert engine.mutable_messages == [], (
-            'cancelled turn must not appear in mutable_messages'
-        )
-
-    def test_pre_set_event_preserves_transcript_store(self) -> None:
-        engine = _fresh_engine()
-        event = threading.Event()
-        event.set()
-
-        engine.submit_message('ghost turn', cancel_event=event)
-
-        assert engine.transcript_store.entries == [], (
-            'cancelled turn must not appear in transcript_store'
-        )
-
-    def test_pre_set_event_preserves_usage_counters(self) -> None:
-        engine = _fresh_engine()
-        initial_usage = engine.total_usage
-        event = threading.Event()
-        event.set()
-
-        engine.submit_message('expensive prompt ' * 100, cancel_event=event)
-
-        assert engine.total_usage == initial_usage, (
-            'cancelled turn must not increment token counters'
-        )
-
-    def test_pre_set_event_preserves_permission_denials(self) -> None:
-        engine = _fresh_engine()
-        event = threading.Event()
-        event.set()
-
-        denials = (PermissionDenial(tool_name='BashTool', reason='destructive'),)
-        engine.submit_message('run bash ls', denied_tools=denials, cancel_event=event)
-
-        assert engine.permission_denials == [], (
-            'cancelled turn must not extend permission_denials'
-        )
-
-
-class TestCancellationAfterBudgetCheck:
-    """Event set between budget projection and commit => 'cancelled', state intact.
-
-    This simulates the realistic racy case: engine starts computing output,
-    caller hits deadline, sets event. Engine observes at post-budget checkpoint
-    and returns cleanly.
-    """
-
-    def test_post_budget_cancel_returns_cancelled(self) -> None:
-        engine = _fresh_engine()
-        event = threading.Event()
-
-        # Patch: set the event after projection but before mutation. We do this
-        # by wrapping _format_output (called mid-submit) to set the event.
-        original_format = engine._format_output
-
-        def _set_then_format(*args, **kwargs):
-            result = original_format(*args, **kwargs)
-            event.set()  # trigger cancel right after output is built
-            return result
-
-        engine._format_output = _set_then_format  # type: ignore[method-assign]
-
-        result = engine.submit_message('hello', cancel_event=event)
-
-        assert result.stop_reason == 'cancelled'
-        # Output IS built here (we're past the pre-budget checkpoint), so it's
-        # not empty. The contract is about *state*, not output synthesis.
-        assert result.output != ''
-        # Critical: state still unchanged
-        assert engine.mutable_messages == []
-        assert engine.transcript_store.entries == []
-
-
-class TestCancellationAfterCommit:
-    """Event set after commit is not observable — honest cooperative limit."""
-
-    def test_post_commit_cancel_is_not_observable(self) -> None:
-        engine = _fresh_engine()
-        event = threading.Event()
-
-        # Event only set *after* submit_message returns. The first call has
-        # already committed before the event is set.
-        result = engine.submit_message('hello', cancel_event=event)
-        event.set()  # too late
-
-        assert result.stop_reason == 'completed', (
-            'cancel set after commit must not retroactively invalidate the turn'
-        )
-        assert engine.mutable_messages == ['hello']
-
-    def test_next_call_observes_cancel(self) -> None:
-        """The cancel_event persists — the next call on the same engine sees it."""
-        engine = _fresh_engine()
-        event = threading.Event()
-
-        engine.submit_message('first', cancel_event=event)
-        assert engine.mutable_messages == ['first']
-
-        event.set()
-        # Next call observes the cancel at entry
-        result = engine.submit_message('second', cancel_event=event)
-
-        assert result.stop_reason == 'cancelled'
-        # 'second' must NOT have been committed
-        assert engine.mutable_messages == ['first']
-
-
-class TestLegacyCallersUnchanged:
-    """cancel_event=None (default) => zero behaviour change from pre-#164."""
-
-    def test_no_event_submits_normally(self) -> None:
-        engine = _fresh_engine()
-        result = engine.submit_message('hello')
-
-        assert result.stop_reason == 'completed'
-        assert engine.mutable_messages == ['hello']
-
-    def test_no_event_with_budget_overflow_still_rejects_atomically(self) -> None:
-        """#162 atomicity contract survives when cancel_event is absent."""
-        engine = _fresh_engine(max_budget_tokens=1)
-        words = ' '.join(['word'] * 100)
-
-        result = engine.submit_message(words)  # no cancel_event
-
-        assert result.stop_reason == 'max_budget_reached'
-        assert engine.mutable_messages == []
-
-    def test_no_event_respects_max_turns(self) -> None:
-        """max_turns_reached contract survives when cancel_event is absent."""
-        engine = _fresh_engine(max_turns=1)
-        engine.submit_message('first')
-        result = engine.submit_message('second')  # no cancel_event
-
-        assert result.stop_reason == 'max_turns_reached'
-        assert engine.mutable_messages == ['first']
-
-
-class TestCancellationVsOtherStopReasons:
-    """cancel_event has a defined precedence relative to budget/turns."""
-
-    def test_cancel_precedes_max_turns_check(self) -> None:
-        """If cancel is set when capacity is also full, cancel wins (clearer signal)."""
-        engine = _fresh_engine(max_turns=0)  # immediately full
-        event = threading.Event()
-        event.set()
-
-        result = engine.submit_message('hello', cancel_event=event)
-
-        # cancel_event check is the very first thing in submit_message,
-        # so it fires before the max_turns check even sees capacity
-        assert result.stop_reason == 'cancelled'
-
-    def test_cancel_does_not_override_commit(self) -> None:
-        """Completed turn with late cancel still reports 'completed' — the
-        turn already succeeded; we don't lie about it."""
-        engine = _fresh_engine()
-        event = threading.Event()
-
-        # Event gets set after the mutation is done — submit_message doesn't
-        # re-check after commit
-        result = engine.submit_message('hello', cancel_event=event)
-        event.set()
-
-        assert result.stop_reason == 'completed'
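
Reviewer note: the removed tests above pin down a two-checkpoint cooperative cancellation contract (check at entry, check again after output is built, never re-check after commit). The following is a minimal sketch of that control flow, not the deleted `QueryEnginePort` implementation. `SketchEngine`, `SketchResult`, the word-count token estimate, and the default limits are all hypothetical names and values chosen for illustration; only `stop_reason` strings, `mutable_messages`, `_format_output`, and the `cancel_event` parameter are taken from the tests.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass, field


@dataclass
class SketchResult:
    """Hypothetical stand-in for TurnResult (only the fields the tests read)."""
    prompt: str
    output: str
    stop_reason: str


@dataclass
class SketchEngine:
    """Hypothetical engine illustrating the checkpoint ordering only."""
    max_turns: int = 8               # assumed default
    max_budget_tokens: int = 2000    # assumed default
    mutable_messages: list[str] = field(default_factory=list)
    used_tokens: int = 0

    def _format_output(self, prompt: str) -> str:
        return f'echo: {prompt}'

    def submit_message(
        self, prompt: str, cancel_event: threading.Event | None = None
    ) -> SketchResult:
        # Checkpoint 1: cancellation is checked before any other limit,
        # so it takes precedence over max_turns.
        if cancel_event is not None and cancel_event.is_set():
            return SketchResult(prompt, '', 'cancelled')
        if len(self.mutable_messages) >= self.max_turns:
            return SketchResult(prompt, '', 'max_turns_reached')

        # Projection phase: compute everything without touching state.
        # Word count is an assumed stand-in for a real token estimate.
        projected = self.used_tokens + len(prompt.split())
        output = self._format_output(prompt)
        if projected > self.max_budget_tokens:
            return SketchResult(prompt, output, 'max_budget_reached')

        # Checkpoint 2: last chance to observe a cancel before mutation.
        if cancel_event is not None and cancel_event.is_set():
            return SketchResult(prompt, output, 'cancelled')

        # Commit phase: all mutation happens here, after both checkpoints.
        self.mutable_messages.append(prompt)
        self.used_tokens = projected
        return SketchResult(prompt, output, 'completed')


engine = SketchEngine()
event = threading.Event()
first = engine.submit_message('first', cancel_event=event)
event.set()  # set after commit: the first turn stays 'completed'
second = engine.submit_message('second', cancel_event=event)
print(first.stop_reason, second.stop_reason, engine.mutable_messages)
```

Keeping every mutation after the second checkpoint is what lets a cancelled or over-budget turn leave the engine untouched. A cancel that arrives after the commit phase is simply not seen until the next call, which matches the "honest cooperative limit" the tests describe.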