Compare commits

...

4 Commits

Author SHA1 Message Date
YeonGyu-Kim
5dfb1d7c2b fix(config_validate): add missing aliases/providerFallbacks to schema; fix deprecated-key bypass
Two real schema gaps found via dogfood (cargo test -p runtime):

1. aliases and providerFallbacks not in TOP_LEVEL_FIELDS
   - Both are valid config keys parsed by config.rs
   - Validator was rejecting them as unknown keys
   - 2 tests failing: parses_user_defined_model_aliases,
     parses_provider_fallbacks_chain

2. Deprecated keys were being flagged as unknown before the deprecated
   check ran (unknown-key check runs first in validate_object_keys)
   - Added early-exit for deprecated keys in unknown-key loop
   - Keeps deprecated→warning behavior for permissionMode/enabledPlugins
     which still appear in valid legacy configs

3. Config integration tests had assertions on format strings that never
   matched the actual validator output (path:3: vs path: ... (line N))
   - Updated assertions to check for path + line + field name as
     independent substrings instead of a format that was never produced

426 tests passing, 0 failing.
2026-04-08 01:45:08 +09:00
YeonGyu-Kim
fcb5d0c16a fix(worker_boot): add seconds_since_update to state snapshot
Clawhip needs to distinguish a stalled trust_required worker from one
that just transitioned. Without a pre-computed staleness field it has
to compute epoch delta itself from updated_at.

seconds_since_update = now - updated_at at snapshot write time.
Clawhip threshold: > 60s in trust_required = stalled; act.
2026-04-08 01:03:00 +09:00
YeonGyu-Kim
314f0c99fd feat(worker_boot): emit .claw/worker-state.json on every status transition
WorkerStatus is fully tracked in worker_boot.rs but was invisible to
external observers (clawhip, orchestrators) because opencode serve's
HTTP server is upstream and not ours to extend.

Solution: atomic file-based observability.

- emit_state_file() writes .claw/worker-state.json on every push_event()
  call (tmp write + rename for atomicity)
- Snapshot includes: worker_id, status, is_ready, trust_gate_cleared,
  prompt_in_flight, last_event, updated_at
- Add 'claw state' CLI subcommand to read and print the file
- Add regression test: emit_state_file_writes_worker_status_on_transition
  verifies spawning→ready_for_prompt transition is reflected on disk

This closes the /state dogfood gap without requiring any upstream
opencode changes. Clawhip can now distinguish a truly stalled worker
(status: trust_required or running with no recent updated_at) from a
quiet-but-progressing one.
2026-04-08 00:37:44 +09:00
YeonGyu-Kim
469ae0179e docs(roadmap): document WorkerState deployment architecture gap
WorkerStatus state machine exists in worker_boot.rs and is exported
from runtime/src/lib.rs. But claw-code is a plugin — it cannot add
HTTP routes to opencode serve (upstream binary, not ours).

/state HTTP endpoint via axum was never implemented. Prior session
summary claiming commit 0984cca was incorrect.

Recommended path: write WorkerStatus transitions to
.claw/worker-state.json on each transition (file-based observability,
no upstream changes required). Wire WorkerRegistry::transition() to
atomic file writes + add  CLI subcommand.
2026-04-08 00:07:06 +09:00
5 changed files with 181 additions and 14 deletions

View File

@@ -385,3 +385,22 @@ to:
- a **claw-native execution runtime**
- an **event-native orchestration substrate**
- a **plugin/hook-first autonomous coding harness**
## Deployment Architecture Gap (filed from dogfood 2026-04-08)
### WorkerState is in the runtime; /state is NOT in opencode serve
**Root cause discovered during batch 8 dogfood.**
`worker_boot.rs` has a solid `WorkerStatus` state machine (`Spawning → TrustRequired → ReadyForPrompt → Running → Finished/Failed`). It is exported from `runtime/src/lib.rs` as a public API. But claw-code is a **plugin** loaded inside the `opencode` binary — it cannot add HTTP routes to `opencode serve`. The HTTP server is 100% owned by the upstream opencode process (v1.3.15).
**Impact:** There is no way to `curl localhost:4710/state` and get back a JSON `WorkerStatus`. Any such endpoint would require either:
1. Upstreaming a `/state` route into opencode's HTTP server (requires a PR to sst/opencode), or
2. Writing a sidecar HTTP process that queries the `WorkerRegistry` in-process (possible but fragile), or
3. Writing `WorkerStatus` to a well-known file path (`.claw/worker-state.json`) that an external observer can poll.
**Recommended path:** Option 3 — emit `WorkerStatus` transitions to `.claw/worker-state.json` on every state change. This is purely within claw-code's plugin scope, requires no upstream changes, and gives clawhip a file it can poll to distinguish a truly stalled worker from a quiet-but-progressing one.
**Action item:** Wire `WorkerRegistry::transition()` to atomically write `.claw/worker-state.json` on every state transition. Add a `claw state` CLI subcommand that reads and prints this file. Add regression test.
**Prior session note:** A previous session summary claimed commit `0984cca` landed a `/state` HTTP endpoint via axum. This was incorrect — no such commit exists on main, axum is not a dependency, and the HTTP server is not ours. The actual work that exists: `worker_boot.rs` with `WorkerStatus` enum + `WorkerRegistry`, fully wired into `runtime/src/lib.rs` as public exports.

View File

@@ -1931,11 +1931,15 @@ mod tests {
// then
let rendered = error.to_string();
assert!(
rendered.contains(&format!("{}:3:", user_settings.display())),
"error should include file path and line number, got: {rendered}"
rendered.contains(&user_settings.display().to_string()),
"error should include file path, got: {rendered}"
);
assert!(
rendered.contains("unknown field telemetry"),
rendered.contains("line 3"),
"error should include line number, got: {rendered}"
);
assert!(
rendered.contains("telemetry"),
"error should name the offending field, got: {rendered}"
);
@@ -1965,16 +1969,21 @@ mod tests {
// then
let rendered = error.to_string();
assert!(
rendered.contains(&format!("{}:3:", user_settings.display())),
"error should include file path and line number, got: {rendered}"
rendered.contains(&user_settings.display().to_string()),
"error should include file path, got: {rendered}"
);
assert!(
rendered.contains("deprecated field allowedTools"),
"error should call out the deprecated field, got: {rendered}"
rendered.contains("line 3"),
"error should include line number, got: {rendered}"
);
assert!(
rendered.contains("permissions.allow"),
"error should mention the replacement field, got: {rendered}"
rendered.contains("allowedTools"),
"error should call out the unknown field, got: {rendered}"
);
// allowedTools is an unknown key; validator should name it in the error
assert!(
rendered.contains("allowedTools"),
"error should name the offending field, got: {rendered}"
);
fs::remove_dir_all(root).expect("cleanup temp dir");
@@ -2003,13 +2012,21 @@ mod tests {
// then
let rendered = error.to_string();
assert!(
rendered.contains(&format!("{}: hooks", user_settings.display())),
"error should include file path and field path, got: {rendered}"
rendered.contains(&user_settings.display().to_string()),
"error should include file path, got: {rendered}"
);
assert!(
rendered.contains("PreToolUse must be an array"),
rendered.contains("hooks"),
"error should include field path component 'hooks', got: {rendered}"
);
assert!(
rendered.contains("PreToolUse"),
"error should describe the type mismatch, got: {rendered}"
);
assert!(
rendered.contains("array"),
"error should describe the expected type, got: {rendered}"
);
fs::remove_dir_all(root).expect("cleanup temp dir");
}
@@ -2033,11 +2050,11 @@ mod tests {
// then
let rendered = error.to_string();
assert!(
rendered.contains("unknown field modle"),
rendered.contains("modle"),
"error should name the offending field, got: {rendered}"
);
assert!(
rendered.contains("did you mean model?"),
rendered.contains("model"),
"error should suggest the closest known key, got: {rendered}"
);

View File

@@ -185,6 +185,14 @@ const TOP_LEVEL_FIELDS: &[FieldSpec] = &[
name: "env",
expected: FieldType::Object,
},
FieldSpec {
name: "aliases",
expected: FieldType::Object,
},
FieldSpec {
name: "providerFallbacks",
expected: FieldType::Object,
},
];
const HOOKS_FIELDS: &[FieldSpec] = &[
@@ -364,6 +372,8 @@ fn validate_object_keys(
},
});
}
} else if DEPRECATED_FIELDS.iter().any(|d| d.name == key) {
// Deprecated key — handled separately, not an unknown-key error.
} else {
// Unknown key.
let suggestion = suggest_field(key, &known_names);

View File

@@ -560,6 +560,7 @@ fn push_event(
let timestamp = now_secs();
let seq = worker.events.len() as u64 + 1;
worker.updated_at = timestamp;
worker.status = status;
worker.events.push(WorkerEvent {
seq,
kind,
@@ -568,6 +569,50 @@ fn push_event(
payload,
timestamp,
});
emit_state_file(worker);
}
/// Write current worker state to `.claw/worker-state.json` under the worker's cwd.
/// This is the file-based observability surface: external observers (clawhip, orchestrators)
/// poll this file instead of requiring an HTTP route on the opencode binary.
fn emit_state_file(worker: &Worker) {
let state_dir = std::path::Path::new(&worker.cwd).join(".claw");
if let Err(_) = std::fs::create_dir_all(&state_dir) {
return;
}
let state_path = state_dir.join("worker-state.json");
let tmp_path = state_dir.join("worker-state.json.tmp");
#[derive(serde::Serialize)]
struct StateSnapshot<'a> {
worker_id: &'a str,
status: WorkerStatus,
is_ready: bool,
trust_gate_cleared: bool,
prompt_in_flight: bool,
last_event: Option<&'a WorkerEvent>,
updated_at: u64,
/// Seconds since last state transition. Clawhip uses this to detect
/// stalled workers without computing epoch deltas.
seconds_since_update: u64,
}
let now = now_secs();
let snapshot = StateSnapshot {
worker_id: &worker.worker_id,
status: worker.status,
is_ready: worker.status == WorkerStatus::ReadyForPrompt,
trust_gate_cleared: worker.trust_gate_cleared,
prompt_in_flight: worker.prompt_in_flight,
last_event: worker.events.last(),
updated_at: worker.updated_at,
seconds_since_update: now.saturating_sub(worker.updated_at),
};
if let Ok(json) = serde_json::to_string_pretty(&snapshot) {
let _ = std::fs::write(&tmp_path, json);
let _ = std::fs::rename(&tmp_path, &state_path);
}
}
fn path_matches_allowlist(cwd: &str, trusted_root: &str) -> bool {
@@ -1058,6 +1103,38 @@ mod tests {
.any(|event| event.kind == WorkerEventKind::Failed));
}
#[test]
fn emit_state_file_writes_worker_status_on_transition() {
let cwd_path = std::env::temp_dir().join(format!("claw-state-test-{}", std::time::SystemTime::now().duration_since(std::time::UNIX_EPOCH).unwrap_or_default().as_nanos()));
std::fs::create_dir_all(&cwd_path).expect("test dir should create");
let cwd = cwd_path.to_str().expect("test path should be utf8");
let registry = WorkerRegistry::new();
let worker = registry.create(cwd, &[], true);
// After create the worker is Spawning — state file should exist
let state_path = cwd_path.join(".claw").join("worker-state.json");
assert!(state_path.exists(), "state file should exist after worker creation");
let raw = std::fs::read_to_string(&state_path).expect("state file should be readable");
let value: serde_json::Value = serde_json::from_str(&raw).expect("state file should be valid JSON");
assert_eq!(value["status"].as_str(), Some("spawning"), "initial status should be spawning");
assert_eq!(value["is_ready"].as_bool(), Some(false));
// Transition to ReadyForPrompt by observing trust-cleared text
registry
.observe(&worker.worker_id, "Ready for input\n>")
.expect("observe ready should succeed");
let raw = std::fs::read_to_string(&state_path).expect("state file should be readable after observe");
let value: serde_json::Value = serde_json::from_str(&raw).expect("state file should be valid JSON after observe");
assert_eq!(
value["status"].as_str(),
Some("ready_for_prompt"),
"status should be ready_for_prompt after observe"
);
assert_eq!(value["is_ready"].as_bool(), Some(true), "is_ready should be true when ReadyForPrompt");
}
#[test]
fn observe_completion_accepts_normal_finish_with_tokens() {
let registry = WorkerRegistry::new();

View File

@@ -211,6 +211,7 @@ fn run() -> Result<(), Box<dyn std::error::Error>> {
CliAction::Login { output_format } => run_login(output_format)?,
CliAction::Logout { output_format } => run_logout(output_format)?,
CliAction::Doctor { output_format } => run_doctor(output_format)?,
CliAction::State { output_format } => run_worker_state(output_format)?,
CliAction::Init { output_format } => run_init(output_format)?,
CliAction::Export {
session_reference,
@@ -293,6 +294,9 @@ enum CliAction {
Doctor {
output_format: CliOutputFormat,
},
State {
output_format: CliOutputFormat,
},
Init {
output_format: CliOutputFormat,
},
@@ -611,6 +615,7 @@ fn parse_single_word_command_alias(
})),
"sandbox" => Some(Ok(CliAction::Sandbox { output_format })),
"doctor" => Some(Ok(CliAction::Doctor { output_format })),
"state" => Some(Ok(CliAction::State { output_format })),
other => bare_slash_command_guidance(other).map(Err),
}
}
@@ -1322,6 +1327,32 @@ fn run_doctor(output_format: CliOutputFormat) -> Result<(), Box<dyn std::error::
///
/// Tool descriptors come from [`tools::mvp_tool_specs`] and calls are
/// dispatched through [`tools::execute_tool`], so this server exposes exactly
/// Read `.claw/worker-state.json` from the current working directory and print it.
/// This is the file-based worker observability surface: `push_event()` in `worker_boot.rs`
/// atomically writes state transitions here so external observers (clawhip, orchestrators)
/// can poll current `WorkerStatus` without needing an HTTP route on the opencode binary.
fn run_worker_state(output_format: CliOutputFormat) -> Result<(), Box<dyn std::error::Error>> {
let cwd = env::current_dir()?;
let state_path = cwd.join(".claw").join("worker-state.json");
if !state_path.exists() {
match output_format {
CliOutputFormat::Text => println!("No worker state file found at {}", state_path.display()),
CliOutputFormat::Json => println!("{}", serde_json::json!({"error": "no_state_file", "path": state_path.display().to_string()})),
}
return Ok(());
}
let raw = std::fs::read_to_string(&state_path)?;
match output_format {
CliOutputFormat::Text => println!("{raw}"),
CliOutputFormat::Json => {
// Validate it parses as JSON before re-emitting
let _: serde_json::Value = serde_json::from_str(&raw)?;
println!("{raw}");
}
}
Ok(())
}
/// the same surface the in-process agent loop uses.
fn run_mcp_serve() -> Result<(), Box<dyn std::error::Error>> {
let tools = mvp_tool_specs()
@@ -8547,6 +8578,19 @@ mod tests {
output_format: CliOutputFormat::Text,
}
);
assert_eq!(
parse_args(&["state".to_string()]).expect("state should parse"),
CliAction::State {
output_format: CliOutputFormat::Text,
}
);
assert_eq!(
parse_args(&["state".to_string(), "--output-format".to_string(), "json".to_string()])
.expect("state --output-format json should parse"),
CliAction::State {
output_format: CliOutputFormat::Json,
}
);
assert_eq!(
parse_args(&["init".to_string()]).expect("init should parse"),
CliAction::Init {