docs(roadmap): add branch lock module normalization gap

2026-05-22 13:46:44 +00:00 · 2026-05-21 23:00:47 +00:00
parent f45b651e18
commit 3d877d78f3
1 changed files with 2 additions and 0 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -6675,3 +6675,5 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
 559. **Auto-compaction can refuse to compact very large short sessions because `compact_session` still enforces the default message-count gate even after real provider usage crosses the input-token threshold** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 22:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@9ef521b` and binary built from source SHA `25d663d`. Active tmux sessions at probe time: `gajae-issue-313-omx-launch-resilience-receipt`, `omx-pr-2447-ralplan-consensus-final-review`. Code inspection: `ConversationRuntime::maybe_auto_compact` at `runtime/src/conversation.rs:559-572` triggers from actual cumulative provider `input_tokens`, then calls `compact_session` with only `max_estimated_tokens: 0` overridden. But `compact_session` first calls `should_compact`, and `should_compact` at `runtime/src/compact.rs:41-50` still requires `compactable.len() > config.preserve_recent_messages` before considering token budget. Because `CompactionConfig::default().preserve_recent_messages` is 4, a session with 1-4 extremely large messages and real provider usage above `auto_compaction_input_tokens_threshold` returns `removed_message_count == 0`; `maybe_auto_compact` then silently returns `None`. The existing auto-compaction regression at `conversation.rs:1520-1572` seeds enough turns so the message-count predicate passes, but it does not cover a short huge transcript that crosses the real usage threshold. 
**Required fix shape:** (a) distinguish manual estimated-token compaction from auto-compaction-after-real-usage; (b) when actual provider usage crosses the threshold, allow compaction even if message count is <= default preserved tail, while preserving at least the latest user/assistant boundary safely; (c) add a regression with one or two huge messages plus `AssistantEvent::Usage { input_tokens: 120_000 }` proving auto-compaction emits `AutoCompactionEvent`; (d) make the skip reason observable when auto-compaction threshold is crossed but no messages are removed (`too_few_messages`, `tool_boundary`, `empty_prefix`, etc.); (e) ensure tool-use/tool-result boundary protection still wins over unsafe compaction. **Why this matters:** provider-reported usage is the trusted signal that the context window is hot. If auto-compaction ignores that signal for short but huge sessions, users hit context exhaustion with no auto-compaction event and no explanation. Source: gaebal-gajae dogfood response to Clawhip message `1507140966774079521` on 2026-05-21.

 560. **Auto-compaction observability reports only removed message count, not token pressure or before/after token estimates, so operators cannot tell whether compaction actually relieved context risk** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 22:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@a8c67b0` and binary built from source SHA `25d663d`. Active tmux sessions at probe time: `gajae-issue-314-omx-launch-resilience-review`, `gajae-pr-314-final-review`, `omx-pr-2447-ralplan-consensus-review3`. Code inspection: `AutoCompactionEvent` at `runtime/src/conversation.rs:123-127` contains only `removed_message_count`; `maybe_auto_compact` at `conversation.rs:559-581` has access to cumulative provider usage and the pre/post session but drops all token-pressure context. The CLI notice at `rusty-claude-cli/src/main.rs:3560-3562` prints only `[auto-compacted: removed N messages]`, and JSON output at `main.rs:5111-5114` exposes only `removed_messages` plus that same notice. The parity harness assertion at `mock_parity_harness.rs:691-712` only checks that the `auto_compaction` key exists and usage input tokens are large; it does not require any before/after estimate, threshold, trigger usage, or reduction metadata. **Required fix shape:** (a) extend `AutoCompactionEvent` with `trigger_input_tokens`, `threshold_input_tokens`, `estimated_tokens_before`, `estimated_tokens_after`, `removed_message_count`, and a `reason/trigger` enum; (b) compute before/after estimates around `compact_session` and include them in CLI text and JSON; (c) add tests proving JSON contains these fields for auto-compaction and that the CLI notice includes enough context to debug risk relief; (d) include a skipped/degraded event shape when threshold crossed but no messages removed, tying into #559; (e) update parity harness to assert semantic values, not merely key presence. 
**Why this matters:** auto-compaction is a context-safety action. Without token-pressure and reduction telemetry, a user sees “removed 2 messages” but cannot tell if the session went from 120k to 5k tokens, 120k to 119k, or failed to relieve the risk at all. Source: gaebal-gajae dogfood response to Clawhip message `1507148512159203338` on 2026-05-21.
+
+561. **Branch-lock collision detection treats module strings literally, so equivalent paths with `./`, duplicate slashes, or trailing slashes evade same-branch overlap detection** — dogfooded 2026-05-21 from the `#clawcode-building-in-public` 23:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@f45b651` and binary built from source SHA `25d663d`. Active tmux session at probe time: `gajae-issue-316-inflight-review-superseded-merge`. Code inspection: `detect_branch_lock_collisions` in `runtime/src/branch_lock.rs:23-49` delegates to `overlapping_modules`, which compares raw `modules: Vec<String>` entries via `modules_overlap` at `branch_lock.rs:65-69`. That check accepts only exact equality or a raw prefix match against `format!("{right}/")` / `format!("{left}/")`. No normalization is applied. As a result, two lanes on the same branch with semantically identical module scopes, such as `runtime/mcp` vs `./runtime/mcp`, `runtime/mcp` vs `runtime/mcp/`, `runtime//mcp` vs `runtime/mcp`, or `crates/runtime/../runtime/mcp` vs `crates/runtime/mcp`, are not detected as collisions, even though they target the same files. The existing tests cover only the exact same module, a nested raw-prefix module, and different branches; they do not exercise path normalization or malformed-but-common module inputs. This is distinct from Jobdori's #562 empty-modules whole-branch gap: even non-empty module locks can bypass collision checks when strings differ syntactically. 
**Required fix shape:** (a) normalize module scopes before comparison by trimming whitespace, removing leading `./`, collapsing repeated slashes, dropping trailing slashes, and resolving `.`/`..` components without escaping repo root; (b) store/report both normalized collision scope and raw input modules for auditability; (c) reject or explicitly mark invalid module paths that normalize outside the repo; (d) add tests for `./runtime/mcp`, `runtime/mcp/`, `runtime//mcp`, nested normalized paths, and invalid `../` escape attempts; (e) keep deterministic sorted/deduped collision output after normalization. **Why this matters:** branch locks are a coordination safety rail. If two agents can claim the same branch/module by spelling the path differently, lock receipts give false confidence and concurrent lanes can still overwrite or review-stomp each other. Source: gaebal-gajae dogfood response to Clawhip message `1507156065941192705` on 2026-05-21.
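
The normalization in fix shape (a) and the escape rejection in (c) could look roughly like the sketch below. It is illustrative only and not the code in `runtime/src/branch_lock.rs`: `normalize_module` and `normalized_overlap` are hypothetical names, `modules_overlap` is re-stated from the description above rather than copied from the source, and treating a leading `/` or an empty result as described in the comments is an assumption.

```rust
/// Hypothetical helper: reduce a raw module scope to a canonical
/// repo-relative form. Returns `None` when the path escapes the repo
/// root via `..` or is empty after normalization (assumption: an empty
/// scope is handled separately, not treated as a valid module).
fn normalize_module(raw: &str) -> Option<String> {
    let mut parts: Vec<&str> = Vec::new();
    for component in raw.trim().split('/') {
        match component {
            // Empty components come from leading, trailing, or repeated
            // slashes; `.` is a no-op. Both are dropped.
            "" | "." => {}
            ".." => {
                // Popping past the root means the path escapes the repo,
                // so the whole scope is rejected.
                parts.pop()?;
            }
            other => parts.push(other),
        }
    }
    if parts.is_empty() {
        None
    } else {
        Some(parts.join("/"))
    }
}

/// Overlap check as described in the entry above: exact equality, or one
/// scope being a `/`-delimited prefix of the other.
fn modules_overlap(left: &str, right: &str) -> bool {
    left == right
        || left.starts_with(&format!("{right}/"))
        || right.starts_with(&format!("{left}/"))
}

/// Hypothetical wrapper: compare scopes only after normalization.
/// Invalid scopes report no overlap here; a real implementation would
/// more likely reject or flag them before this point.
fn normalized_overlap(left: &str, right: &str) -> bool {
    match (normalize_module(left), normalize_module(right)) {
        (Some(l), Some(r)) => modules_overlap(&l, &r),
        _ => false,
    }
}
```

Under this sketch, `normalized_overlap("runtime/mcp", "./runtime/mcp/")` reports a collision, while `normalize_module("../escape")` yields `None` so the caller can reject or mark the scope. Keeping the raw input alongside the normalized scope, as fix shape (b) asks, is left to the caller.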