docs(roadmap): add git availability hang

This commit is contained in:
Yeachan-Heo
2026-05-20 12:01:19 +00:00
parent 4d185f0dee
commit 916bf5f24d

View File

@@ -6541,3 +6541,5 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
492. **`system-prompt --output-format json` and `status --output-format json` can both zero-byte hang on `GitContext`'s unbounded branch/log/staged-file probes, so prompt provenance and local diagnostics share the same startup deadlock surface** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 11:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@56555a3` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Fixture: clean isolated git repo plus fake `git` shim that sleeps 20s only for `rev-parse --abbrev-ref HEAD`, `log --oneline`, and `diff --cached --name-only` (the `GitContext::detect` branch/recent-commits/staged-files probes) while delegating all other git commands to `/usr/bin/git`. Results: `claw system-prompt --output-format json` timed out after 6s with `stdout=0`/`stderr=0`; `claw status --output-format json` also timed out with zero output. Earlier controls showed shims delaying unrelated `fetch`/`ls-remote` do not affect these commands. **Required fix shape:** (a) route `GitContext::detect` probes through the same timeout-aware git helper as #490/#491; (b) make `system-prompt` emit a bounded degraded prompt-provenance envelope when git context times out (`git_context:{status:"timeout", branch:null, recent_commits_truncated:true}`) instead of hanging; (c) make `status` omit/degrade `GitContext` fields independently from boot-preflight metadata; (d) add fake-git shim regressions for each GitContext probe (`rev-parse --abbrev-ref`, `log --oneline`, `diff --cached --name-only`) across both `system-prompt` and `status`. **Why this matters:** `system-prompt` is the primary prompt-misdelivery/provenance debugger. If the prompt renderer itself blocks on git context before emitting JSON, users cannot inspect what prompt would have been delivered when startup is slow or git is locked. Source: gaebal-gajae dogfood response to Clawhip message `1506612484520542258` on 2026-05-20.
493. **`status` and `doctor` block on `tmux --version` availability checks with no timeout, so a wedged tmux binary/socket makes the local health surfaces zero-byte hang even though prompt rendering still works** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 11:30 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@2bf6924` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Fixture: clean isolated git repo plus fake `tmux` shim at the front of `PATH` that sleeps 20s for every invocation; `git` was normal. Results: `claw status --output-format json` timed out after 6s with `stdout=0`/`stderr=0`; `claw doctor --output-format json` also timed out with zero output; control `claw system-prompt --output-format json` exited 0 with prompt JSON under the same fake-tmux environment. Code path: `build_boot_preflight_snapshot` populates `required_binaries` via `command_available("tmux")`, which runs `tmux --version` through blocking `.output()` and no deadline. **Required fix shape:** (a) route `command_available` checks through a small timeout helper; (b) mark slow binary probes as `{name:"tmux", available:null, timeout:true}` instead of blocking the entire status/doctor response; (c) avoid invoking `tmux` at all when `$TMUX` is absent if the field is only informational, or cache the availability probe; (d) add fake-binary shim regressions for slow `tmux` and slow `git --version` proving status/doctor stay bounded. **Why this matters:** status/doctor are the recovery surfaces for tmux/session lifecycle breakage. If a broken `tmux` itself prevents the health command from returning JSON, orchestrators lose the exact diagnostic path needed to explain session/pane failures. Source: gaebal-gajae dogfood response to Clawhip message `1506620033886060665` on 2026-05-20.
494. **`status`/`doctor` also block on `git --version` binary-availability checks with no timeout, while `diff` and `system-prompt` still work, so an unhealthy git binary can kill the health surfaces before they report degraded tool availability** — dogfooded 2026-05-20 from the `#clawcode-building-in-public` 12:00 UTC nudge on `/home/bellman/Workspace/claw-code-pr2967` with branch/origin `docs/roadmap-workdir-provenance@4d185f0` and binary `./rust/target/debug/claw` built from source SHA `25d663d`. Fixture: clean isolated git repo plus fake `git` shim at the front of `PATH` that sleeps 20s only for `git --version` and delegates all other git commands to `/usr/bin/git`. Results: `claw status --output-format json` timed out after 6s with `stdout=0`/`stderr=0`; `claw doctor --output-format json` timed out with zero output. Positive controls under the same environment: `claw diff --output-format json` returned `{"kind":"diff","result":"clean"...}` and `claw system-prompt --output-format json` returned prompt JSON, proving ordinary git operations were fine and the hang is specifically `command_available("git")` in boot preflight `required_binaries`. **Required fix shape:** (a) same timeout-aware binary probe as #493, but cover `git` and `claw` too; (b) represent slow probes as timeout/degraded availability instead of blocking status/doctor; (c) prefer `which`/PATH existence plus optional bounded version string over mandatory `--version`; (d) add fake-binary regressions for slow `git --version`, slow `tmux --version`, and slow current-exe/version probe. **Why this matters:** status/doctor should explain missing or broken binaries. If the binary availability probe itself hangs, health checks fail before they can say `git` is unhealthy, forcing operators back to shell debugging. Source: gaebal-gajae dogfood response to Clawhip message `1506627582756651018` on 2026-05-20.