Commit Graph

54 Commits

Author SHA1 Message Date
bellman
be8112f5f5 feat: add native Ollama provider support via OLLAMA_HOST env var
- OLLAMA_HOST takes priority over OPENAI_BASE_URL for local Ollama instances
- No API key required; placeholder token used for Authorization header
- Model names like 'qwen3:8b' bypass strict provider/model syntax validation
- detect_provider_kind() checks OLLAMA_HOST first in routing cascade
- ProviderClient dispatch uses from_ollama_env() when OLLAMA_HOST is set
- Updated USAGE.md and docs with OLLAMA_HOST as preferred env var
- Added OLLAMA_CONFIG constant and from_ollama_env() to openai_compat
- Added test_ollama_host_bypasses_model_validation unit test
- Supersedes PR #3213 (which had a duplicate if-let bug in mod.rs)
2026-06-05 12:12:56 +09:00
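The routing priority described in this commit can be sketched roughly as follows. This is not the code from the commit: `Endpoint`, `resolve_endpoint`, the `"ollama"` placeholder value, and the `/v1` suffix are assumptions for illustration. Only the precedence (OLLAMA_HOST before OPENAI_BASE_URL) and the no-API-key behaviour come from the message above. Environment values are passed in as arguments so the function stays pure.

```rust
/// Where a request is sent and which bearer token accompanies it.
/// (Hypothetical type, not taken from the repository.)
#[derive(Debug, PartialEq)]
struct Endpoint {
    base_url: String,
    bearer_token: String,
}

/// Assumed placeholder value; the commit only says a placeholder token is used.
const OLLAMA_PLACEHOLDER_TOKEN: &str = "ollama";

/// Resolve the endpoint from already-read environment values.
/// A non-empty OLLAMA_HOST wins over OPENAI_BASE_URL and needs no API key.
fn resolve_endpoint(
    ollama_host: Option<&str>,
    openai_base_url: Option<&str>,
    openai_api_key: Option<&str>,
) -> Result<Endpoint, String> {
    if let Some(host) = ollama_host.map(str::trim).filter(|h| !h.is_empty()) {
        return Ok(Endpoint {
            // Assumption: the OpenAI-compatible routes live under /v1.
            base_url: format!("{}/v1", host.trim_end_matches('/')),
            bearer_token: OLLAMA_PLACEHOLDER_TOKEN.to_string(),
        });
    }
    // Fall back to the generic OpenAI-compatible path, which does need a key.
    let base = openai_base_url.unwrap_or("https://api.openai.com/v1");
    let key = openai_api_key.ok_or_else(|| "missing OPENAI_API_KEY".to_string())?;
    Ok(Endpoint {
        base_url: base.to_string(),
        bearer_token: key.to_string(),
    })
}
```

A caller would typically feed it `std::env::var("OLLAMA_HOST").ok().as_deref()` and the equivalent lookups for the other two variables.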
TheArchitectit
9e50cb6e20 Merge remote-tracking branch 'upstream/main' into worktree-api-timeout-retry-v2
# Conflicts:
#	rust/crates/runtime/src/config.rs
#	rust/crates/runtime/src/lib.rs
2026-06-04 09:17:43 -05:00
bellman
bcc5bfde9c fix: route local OpenAI-compatible models 2026-06-03 23:16:46 +09:00
bellman
54d785d0c0 fix: preserve DeepSeek V4 thinking history 2026-06-03 21:53:54 +09:00
TheArchitectit
04bc5f5788 feat: API timeout config, Retry-After header, configurable retry, and 400 transient retry
Cherry-picked from PR #2816 onto current upstream/main, resolving
conflicts from PR #3015's merge (which added retry_after to ApiError
but some construction sites were missing it).

Commits preserved:
- ade85398: API timeout config, Retry-After header, configurable retry
  - TimeoutConfig in HTTP client builder (connect 30s, request 5min)
  - CLAW_API_CONNECT_TIMEOUT and CLAW_API_REQUEST_TIMEOUT env vars
  - Retry-After header parsing on 429 responses
  - ApiTimeoutConfig in runtime config (settings.json)
- 8a883430: retry 400 responses with transient gateway error bodies
  - Detects known gateway phrases in 400 response bodies
  - Marks them as retryable instead of hard-failing
- ed91a61e: add 'no parseable body' to CONTEXT_WINDOW_ERROR_MARKERS
  - Some providers return 400 with 'no parseable body' for oversized
    requests instead of a proper context_length_exceeded error

Commits skipped (already in upstream via PR #3015):
- 453ab642: optional id field (already merged)
- baa8d1ba: HTML detection in streaming (already merged)
- 33d2f789: JSON error detection in streaming (already merged)

8 files changed, 299 insertions, 80 deletions
2026-06-02 15:35:29 -05:00
TheArchitectit
414a1aca4f fix: retry 400 responses with transient gateway error bodies
Some providers/proxies return HTTP 400 with bodies like "no parseable
body" or "connection reset" during transient network blips. These are
not real bad requests — they're gateway errors wearing a 400 mask.
Detect known gateway error phrases in 400 response bodies and mark
them as retryable so the existing exponential backoff handles them.
2026-06-02 15:30:41 -05:00
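A minimal sketch of the classification this commit describes, assuming a case-insensitive substring match. The two marker phrases are the ones quoted in the message; the full list used in the repository is not shown there, and the set of always-retryable status codes below is an assumption.

```rust
/// Phrases treated as transient gateway noise inside a 400 body.
/// Only these two are quoted in the commit message.
const TRANSIENT_GATEWAY_MARKERS: &[&str] = &["no parseable body", "connection reset"];

/// Decide whether a failed response should go through the retry loop.
fn is_retryable(status: u16, body: &str) -> bool {
    match status {
        // Assumption: the usual rate-limit and server-side codes are retryable.
        429 | 500 | 502 | 503 | 504 => true,
        // A 400 is retried only when its body looks like a gateway error.
        400 => {
            let body = body.to_ascii_lowercase();
            TRANSIENT_GATEWAY_MARKERS.iter().any(|m| body.contains(*m))
        }
        _ => false,
    }
}
```

Keeping the check on the body text means a real bad request (for example an unknown model name) still fails immediately.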
TheArchitectit
d8c57ed317 feat: API timeout config, Retry-After header support, and configurable retry
- Add TimeoutConfig to HTTP client builder with connect_timeout (30s)
  and request_timeout (5min) defaults, configurable via
  CLAW_API_CONNECT_TIMEOUT and CLAW_API_REQUEST_TIMEOUT env vars
- Add with_timeout() builder to both AnthropicClient and
  OpenAiCompatClient for per-client timeout configuration
- Parse Retry-After header on 429 responses and use it to override
  exponential backoff delay when present
- Add ApiTimeoutConfig to runtime config with apiTimeout settings
  in ~/.claw/settings.json (connectTimeout, requestTimeout, maxRetries)
- Add retry_after field to ApiError::Api for propagating rate limit
  backoff hints through the retry pipeline
2026-06-02 15:30:22 -05:00
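The interaction between Retry-After and exponential backoff might look like the sketch below. The function names, the 500 ms base delay, and the 30 s cap are hypothetical. Only the delta-seconds form of the header is parsed here; an HTTP-date value falls back to the computed backoff.

```rust
use std::time::Duration;

/// Exponential backoff: base * 2^attempt, capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}

/// Parse a Retry-After header given as a whole number of seconds.
fn parse_retry_after(header: Option<&str>) -> Option<Duration> {
    header?.trim().parse::<u64>().ok().map(Duration::from_secs)
}

/// Pick the delay before the next attempt. A parseable Retry-After value
/// overrides the computed backoff, as the commit describes for 429 responses.
fn next_delay(attempt: u32, retry_after: Option<&str>) -> Duration {
    let base = Duration::from_millis(500); // hypothetical default
    let max = Duration::from_secs(30); // hypothetical cap
    parse_retry_after(retry_after).unwrap_or_else(|| backoff_delay(attempt, base, max))
}
```

Whether the server-provided delay should also be capped is a design choice the commit message does not state; this sketch honours it as-is.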
Yeachan-Heo
779cf1c234 test(api): fill thinking in stream chunk fixtures 2026-05-25 12:49:36 +09:00
YeonGyu-Kim
495e7a015c fix: remove stale retry_after field, Team variant, config_load_error_kind, denied_tools initializer errors
- Remove retry_after: None from ApiError::Api structs in openai_compat.rs (field was removed)
- Remove SlashCommand::Team parse arm (variant was removed from enum)
- Add config_load_error_kind: None to doctor path StatusContext initializer
- Add Thinking arm to all ContentBlock match blocks in trident.rs
- Remove cargo fmt drift across commands, config, compact, tools, trident
2026-05-25 12:01:09 +09:00
YeonGyu-Kim
3364dc4bee chore: fix conflict markers and cargo fmt drift in main (commands, openai_compat, trident, config, tools) 2026-05-25 11:51:44 +09:00
TheArchitectit
7149bbc3d9 fix: streaming robustness — OpenAI parsing, error detection, reasoning content
Improves SSE parsing with:
- raw JSON error detection
- HTML response detection (for misconfigured endpoints)
- thinking/reasoning content from provider-specific delta fields
- #[serde(default)] on streaming types for lenient deserialization
- compact session boundary guard
- /team slash command

Adds install.sh convenience script.
2026-05-25 11:22:47 +09:00
Ajinkya Kardile
b071fac2cf feat: add native Gemini support to openai_compat provider
Adds early return in wire_model_for_base_url for Gemini/Gemma/XAI/Kimi/Grok model prefixes to ensure the provider prefix is preserved correctly when routing through the OpenAI-compatible provider path.
2026-05-25 11:21:37 +09:00
bellman
04c2abb412 Stabilize final gate before release checkpoint
Resolve the G012 evidence gate by fixing permission-mode regressions, platform-sensitive tests, and the clippy surface that blocked an all-targets verification run.

Constraint: G012 final gate required docs, board, full workspace tests, and clippy -D warnings evidence before checkpointing.

Rejected: documenting the worker-2 gate failure as an accepted gap | the failing tests and lints were locally reproducible and fixable.

Confidence: high

Scope-risk: moderate

Directive: Preserve read-only permission requirements for read/glob/grep tools; write/edit remain workspace-write or danger-full-access when outside the workspace.

Tested: python3 .github/scripts/check_doc_source_of_truth.py; python3 .github/scripts/check_release_readiness.py; python3 scripts/validate_cc2_board.py --board .omx/cc2/board.json; python3 .omx/cc2/validate_issue_parity_intake.py .omx/cc2/issue-parity-intake.json; cargo fmt --manifest-path rust/Cargo.toml --all -- --check; cargo check --manifest-path rust/Cargo.toml --workspace; cargo test --manifest-path rust/Cargo.toml --workspace -- --nocapture; cargo clippy --manifest-path rust/Cargo.toml --workspace --all-targets -- -D warnings

Not-tested: live network provider smoke tests and remote PR/issue mutations.
2026-05-15 13:34:57 +09:00
bellman
ea95bf2576 omx(team): auto-checkpoint worker-3 [unknown] 2026-05-15 10:30:16 +09:00
bellman
dec8efa5c8 omx(team): auto-checkpoint worker-1 [1] 2026-05-15 10:30:09 +09:00
bellman
bc32639ce3 omx(team): auto-checkpoint worker-1 [1] 2026-05-15 10:30:03 +09:00
bellman
a212c662e5 omx(team): auto-checkpoint worker-1 [1] 2026-05-15 10:30:00 +09:00
bellman
2cac66cd38 Stabilize provider compatibility integration verification
Keep integrated G008 provider changes formatted and compile-ready so worker follow-up commits can merge against a clean leader baseline.

Constraint: G008 provider verification must pass before ultragoal checkpointing.
Confidence: high
Scope-risk: narrow
Directive: Keep provider compatibility follow-ups rebased on this formatted baseline before retrying failed cherry-picks.
Tested: cargo test --manifest-path rust/Cargo.toml -p api providers:: -- --nocapture; cargo test --manifest-path rust/Cargo.toml -p api --test openai_compat_integration -- --nocapture --test-threads=1
Not-tested: full workspace clippy; known pre-existing runtime policy_engine LaneContext clippy warning remains outside this change.

Co-authored-by: OmX <omx@oh-my-codex.dev>
2026-05-15 10:28:50 +09:00
bellman
685f078204 omx(team): auto-checkpoint worker-1 [1] 2026-05-15 10:23:37 +09:00
bellman
a6ca5c489b omx(team): auto-checkpoint worker-4 [unknown] 2026-05-15 10:21:28 +09:00
bellman
29029bfc14 omx(team): auto-checkpoint worker-1 [1] 2026-05-15 10:21:18 +09:00
YeonGyu-Kim
75c08bc982 fix: REPL display, /compact panic, identity leak, DeepSeek reasoning, thinking blocks
Five interrelated fixes from parallel Hephaestus sessions:

1. fix(repl): display assistant text after spinner (#2981, #2982, #2937)
   - Added final_assistant_text() call after run_turn spinner completes
   - REPL now shows response text like run_prompt_json does

2. fix(compact): handle Thinking content blocks (#2985)
   - Added ContentBlock::Thinking variant throughout compact summarizer
   - Prevents panic when /compact encounters thinking blocks

3. fix(prompt): provider-aware model identity (#2822)
   - New ModelFamilyIdentity enum (Claude vs Generic)
   - Non-Anthropic models no longer say 'I am Claude'
   - model_family_identity_for() detects provider and sets identity

4. fix(openai): preserve DeepSeek reasoning_content (#2821)
   - Stream parser now captures reasoning_content from OpenAI-compat
   - Emits ThinkingDelta/SignatureDelta events for reasoning models
   - Thinking blocks included in conversation history for re-send

5. feat(runtime): Thinking block support across codebase
   - AssistantEvent::Thinking variant in conversation.rs
   - ContentBlock::Thinking in session serialization
   - Thinking-aware compact summarization
   - Tests for thinking block ordering and content

Closes #2981, #2982, #2937, #2985, #2822, #2821
2026-05-06 15:32:34 +09:00
Yeachan-Heo
74ea754d29 Restore Rust formatting compliance
Run rustfmt from the Rust workspace so CI format checks pass without changing behavior.

Constraint: Scope is formatting-only across tracked Rust files

Confidence: high

Scope-risk: narrow

Tested: cd rust && cargo fmt --check

Tested: git diff --check
2026-04-28 09:19:16 +00:00
Yeachan-Heo
d037f9faa8 Fix strip_routing_prefix to handle kimi provider prefix (US-023)
Add "kimi" to the strip_routing_prefix matches so that models like
"kimi/kimi-k2.5" have their prefix stripped before sending to the
DashScope API (consistent with qwen/openai/xai/grok handling).

Also add unit test strip_routing_prefix_strips_kimi_provider_prefix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 19:50:15 +00:00
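As an illustration of the prefix handling, here is a sketch that strips a leading provider segment only when it is one of the prefixes named in the message. The real function's signature and matching rules may differ; the slice-based lookup is an assumption.

```rust
/// Provider prefixes the commit message names as being stripped.
const ROUTING_PREFIXES: &[&str] = &["openai", "xai", "grok", "qwen", "kimi"];

/// Remove a known routing prefix ("kimi/kimi-k2.5" -> "kimi-k2.5").
/// Model names with an unknown prefix, or none at all, are returned unchanged.
fn strip_routing_prefix(model: &str) -> &str {
    match model.split_once('/') {
        Some((prefix, rest)) if ROUTING_PREFIXES.contains(&prefix) => rest,
        _ => model,
    }
}
```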
Yeachan-Heo
4cb1db9faa Implement US-022: Enhanced error context for API failures
Add structured error context to API failures:
- Request ID tracking across retries with full context in error messages
- Provider-specific error code mapping with actionable suggestions
- Suggested user actions for common error types (401, 403, 413, 429, 500, 502-504)
- Added suggested_action field to ApiError::Api variant
- Updated enrich_bearer_auth_error to preserve suggested_action

Files changed:
- rust/crates/api/src/error.rs: Add suggested_action field, update Display
- rust/crates/api/src/providers/openai_compat.rs: Add suggested_action_for_status()
- rust/crates/api/src/providers/anthropic.rs: Update error handling

All tests pass, clippy clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 19:15:00 +00:00
Yeachan-Heo
5e65b33042 US-021: Add request body size pre-flight check for OpenAI-compatible provider 2026-04-16 17:41:57 +00:00
Yeachan-Heo
87b982ece5 US-011: Performance optimization for API request serialization
Added criterion benchmarks and optimized flatten_tool_result_content:
- Added criterion dev-dependency and request_building benchmark suite
- Optimized flatten_tool_result_content to pre-allocate capacity and avoid
  intermediate Vec construction (was collecting to Vec then joining)
- Made key functions public for benchmarking: translate_message,
  build_chat_completion_request, flatten_tool_result_content,
  is_reasoning_model, model_rejects_is_error_field

Benchmark results:
- flatten_tool_result_content/single_text: ~17ns
- translate_message/text_only: ~200ns
- build_chat_completion_request/10 messages: ~16.4µs
- is_reasoning_model detection: ~26-42ns

All 119 unit tests and 29 integration tests pass.
cargo clippy passes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 11:11:45 +00:00
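The optimisation described here (size the output once, then append, instead of collecting into a Vec and joining) can be sketched as below. `ToolResultPart` and the newline separator are assumptions for illustration; the repository's content type is not shown in the message.

```rust
/// Hypothetical stand-in for a tool result content block.
enum ToolResultPart {
    Text(String),
    Json(String),
}

/// Borrow the textual payload of a part.
fn part_text(part: &ToolResultPart) -> &str {
    match part {
        ToolResultPart::Text(s) | ToolResultPart::Json(s) => s.as_str(),
    }
}

/// Join all parts with '\n', allocating the output buffer exactly once.
fn flatten_tool_result_content(parts: &[ToolResultPart]) -> String {
    // First pass: total text length plus one separator between parts.
    let total: usize = parts.iter().map(|p| part_text(p).len()).sum::<usize>()
        + parts.len().saturating_sub(1);
    let mut out = String::with_capacity(total);
    // Second pass: append directly, with no intermediate Vec<String>.
    for (i, part) in parts.iter().enumerate() {
        if i > 0 {
            out.push('\n');
        }
        out.push_str(part_text(part));
    }
    out
}
```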
Yeachan-Heo
3e4e1585b5 US-009: Add comprehensive unit tests for kimi model compatibility fix
Added 4 unit tests to verify is_error field handling for kimi models:
- model_rejects_is_error_field_detects_kimi_models: Detects kimi-k2.5, kimi-k1.5, dashscope/kimi-k2.5 (case insensitive)
- translate_message_includes_is_error_for_non_kimi_models: Verifies gpt-4o, grok-3, claude include is_error
- translate_message_excludes_is_error_for_kimi_models: Verifies kimi models exclude is_error (prevents 400 Bad Request)
- build_chat_completion_request_kimi_vs_non_kimi_tool_results: Full integration test for request building

All 119 unit tests and 29 integration tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 10:54:48 +00:00
Yeachan-Heo
124e8661ed Remove the deprecated Claude subscription login path and restore a green Rust workspace
ROADMAP #37 was still open even though several earlier backlog items were
already closed. This change removes the local login/logout surface, stops
startup auth resolution from treating saved OAuth credentials as a supported
path, and updates diagnostics/help to point users at ANTHROPIC_API_KEY or
ANTHROPIC_AUTH_TOKEN only.

While proving the change with the user-requested workspace gates, clippy
surfaced additional pre-existing warning failures across the Rust workspace.
Those were cleaned up in-place so the required `cargo fmt`, `cargo clippy
--workspace --all-targets -- -D warnings`, and `cargo test --workspace`
sequence now passes end to end.

Constraint: User explicitly required full-workspace fmt/clippy/test before commit/push
Constraint: Existing dirty leader worktree had to be stashed before attempted OMX team worktree launch
Rejected: Keep login/logout but hide them from help | left unsupported auth flow and saved OAuth fallback intact
Rejected: Stop after ROADMAP #37 targeted tests | did not satisfy required full-workspace verification gate
Confidence: medium
Scope-risk: moderate
Reversibility: clean
Directive: Do not reintroduce saved OAuth as a silent Anthropic startup fallback without an explicit supported auth policy
Tested: cargo fmt --all --check; cargo clippy --workspace --all-targets -- -D warnings; cargo test --workspace
Not-tested: Remote push effects beyond origin/main update
2026-04-11 17:24:44 +00:00
YeonGyu-Kim
6ae8850d45 fix(api): silence dead_code warning and remove duplicated #[test] attr
- Add #[allow(dead_code)] on test-only Delta struct (content field
  used for deserialization but not read in assertion)
- Remove duplicated #[test] attribute on
  assistant_message_without_tool_calls_omits_tool_calls_field

Zero warnings in cargo test --workspace.
2026-04-10 07:33:22 +09:00
YeonGyu-Kim
a3d0c9e5e7 fix(api): sanitize orphaned tool messages at request-building layer
Adds sanitize_tool_message_pairing() called from build_chat_completion_request()
after translate_message() runs. Drops any role:"tool" message whose
immediately-preceding non-tool message is role:"assistant" but has no
tool_calls entry matching the tool_call_id.

This is the second layer of the tool-pairing invariant defense:
- 6e301c8: compaction boundary fix (producer layer)
- this commit: request-builder sanitizer (sender layer)

Together these close the 400-error loop for resumed/compacted multi-turn
tool sessions on OpenAI-compatible backends.

Sanitization only fires when preceding message is role:assistant (not
user/system) to avoid dropping valid translation artifacts from mixed
user-message content blocks.

Regression tests: sanitize_drops_orphaned_tool_messages covers valid pair,
orphaned tool (no tool_calls in preceding assistant), mismatched id, and
two tool results both referencing the same assistant turn.

116 api + 159 CLI + 431 runtime tests pass. Fmt clean.
2026-04-10 01:35:00 +09:00
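The pairing rule in this commit can be sketched with a simplified message type. `Msg` is hypothetical; the actual sanitizer operates on the translated request payload. The rule implemented is the one stated above: a tool message is dropped only when the nearest preceding non-tool message is an assistant turn that has no matching tool call id.

```rust
/// Simplified chat message (hypothetical; not the repository's type).
#[derive(Debug, Clone, PartialEq)]
enum Msg {
    User(String),
    Assistant { text: String, tool_call_ids: Vec<String> },
    Tool { tool_call_id: String, content: String },
}

/// Drop orphaned tool messages while leaving every other message in place.
fn sanitize_tool_message_pairing(messages: Vec<Msg>) -> Vec<Msg> {
    let mut out: Vec<Msg> = Vec::with_capacity(messages.len());
    for msg in messages {
        let keep = match &msg {
            Msg::Tool { tool_call_id, .. } => {
                // Nearest preceding non-tool message. Only tool messages are
                // ever dropped, so looking at `out` matches the original order.
                let prev = out.iter().rev().find(|m| !matches!(m, Msg::Tool { .. }));
                match prev {
                    Some(Msg::Assistant { tool_call_ids, .. }) => {
                        tool_call_ids.contains(tool_call_id)
                    }
                    // Preceded by a user message (or nothing): leave untouched.
                    _ => true,
                }
            }
            _ => true,
        };
        if keep {
            out.push(msg);
        }
    }
    out
}
```

Two tool results that reference the same assistant turn both survive, because the lookup skips over earlier tool messages.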
YeonGyu-Kim
ed42f8f298 fix(api): surface provider error in SSE stream frames (companion to ff416ff)
Same fix as ff416ff but for the streaming path. Some backends embed an
error JSON object in an SSE data: frame:

  data: {"error":{"message":"context too long","code":400}}

parse_sse_frame() was attempting to deserialize this as ChatCompletionChunk
and failing with 'missing field' / 'invalid type', hiding the actual
backend error message.

Fix: check for an 'error' key before full chunk deserialization, same as
the non-streaming path in ff416ff. Symmetric pair:
- ff416ff: non-streaming path (response body)
- this:    streaming path (SSE data: frame)

115 api + 159 CLI tests pass. Fmt clean.
2026-04-09 23:03:33 +09:00
YeonGyu-Kim
ff416ff3e7 fix(api): surface provider error body before attempting completion parse
When a local/proxy OpenAI-compatible backend returns an error object:
  {"error":{"message":"...","type":"...","code":...}}

claw was trying to deserialize it as a ChatCompletionResponse and
failing with the cryptic 'failed to parse OpenAI response: missing
field id', completely hiding the actual backend error message.

Fix: before full deserialization, check if the parsed JSON has an
'error' key and promote it directly to ApiError::Api so the user
sees the real error (e.g. 'The number of tokens to keep from the
initial prompt is greater than the context length').

Source: devilayu in #claw-code 2026-04-09 — local LM Studio context
limit error was invisible; user saw 'missing field id' instead.
159 CLI + 115 api tests pass. Fmt clean.
2026-04-09 22:33:07 +09:00
YeonGyu-Kim
6ac7d8cd46 fix(api): omit tool_calls field from assistant messages when empty
When serializing a multi-turn conversation for the OpenAI-compatible path,
assistant messages with no tool calls were always emitting 'tool_calls: []'.
Some providers reject requests where a prior assistant turn carries an
explicit empty tool_calls array (400 on subsequent turns after a plain
text assistant response).

Fix: only include 'tool_calls' in the serialized assistant message when
the vec is non-empty. Empty case omits the field entirely.

This is a companion fix to fd7aade (null tool_calls in stream delta).
The two bugs are symmetric: fd7aade handled inbound null -> empty vec;
this handles outbound empty vec -> field omitted.

Two regression tests added:
- assistant_message_without_tool_calls_omits_tool_calls_field
- assistant_message_with_tool_calls_includes_tool_calls_field

115 api tests pass. Fmt clean.

Source: gaebal-gajae repro 2026-04-09 (400 on multi-turn, companion to
null tool_calls stream-delta fix).
2026-04-09 22:06:25 +09:00
YeonGyu-Kim
fd7aade5b5 fix(api): tolerate null tool_calls in OpenAI-compat stream delta chunks
Some OpenAI-compatible providers emit 'tool_calls: null' in streaming
delta chunks instead of omitting the field or using an empty array:

  "delta": {"content":"","function_call":null,"tool_calls":null}

serde's #[serde(default)] only handles absent keys — an explicit null
value still fails deserialization with:
  'invalid type: null, expected a sequence'

Fix: replace #[serde(default)] with a custom deserializer helper
deserialize_null_as_empty_vec() that maps null -> Vec::default(),
keeping the existing absent-key default behaviour.

Regression test added: delta_with_null_tool_calls_deserializes_as_empty_vec
uses the exact provider response shape from gaebal-gajae's repro (2026-04-09).

112 api lib tests pass. Fmt clean.

Companion to gaebal-gajae's local 448cf2c — independently reproduced
and landed on main.
2026-04-09 21:39:52 +09:00
YeonGyu-Kim
eb044f0a02 fix(api): emit max_completion_tokens for gpt-5* on OpenAI-compat path — closes ROADMAP #35
gpt-5.x models reject requests with max_tokens and require max_completion_tokens.
Detect wire model starting with 'gpt-5' and switch the JSON key accordingly.
Older models (gpt-4o etc.) continue to receive max_tokens unchanged.

Two regression tests added:
- gpt5_uses_max_completion_tokens_not_max_tokens
- non_gpt5_uses_max_tokens

140 api tests pass, cargo fmt clean.
2026-04-09 09:33:45 +09:00
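A small sketch of the key switch described in this commit. The helper names are hypothetical, and the JSON fragment is built by hand only to keep the example dependency-free.

```rust
/// Choose the JSON key for the output-token limit based on the wire model.
fn max_tokens_key(wire_model: &str) -> &'static str {
    if wire_model.starts_with("gpt-5") {
        "max_completion_tokens"
    } else {
        "max_tokens"
    }
}

/// Render just the token-limit member of the request body.
fn token_limit_fragment(wire_model: &str, limit: u32) -> String {
    format!("\"{}\":{}", max_tokens_key(wire_model), limit)
}
```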
Jobdori
e4c3871882 feat(api): add reasoning_effort field to MessageRequest and OpenAI-compat path
Users of OpenAI-compatible reasoning models (o4-mini, o3, deepseek-r1,
etc.) had no way to control reasoning effort — the field was missing from
MessageRequest and never emitted in the request body.

Changes:
- Add `reasoning_effort: Option<String>` to `MessageRequest` in types.rs
  - Annotated with skip_serializing_if = "Option::is_none" for clean JSON
  - Accepted values: "low", "medium", "high" (passed through verbatim)
- In `build_chat_completion_request`, emit `"reasoning_effort"` when set
- Two unit tests:
  - `reasoning_effort_is_included_when_set`: o4-mini + "high" → field present
  - `reasoning_effort_omitted_when_not_set`: gpt-4o, no field → absent

Existing callers use `..Default::default()` and are unaffected.
One struct-literal test that listed all fields explicitly updated with
`reasoning_effort: None`.

The CLI flag to expose this to users is a follow-up (ROADMAP #34 partial).
This commit lands the foundational API-layer plumbing needed for that.

Partial ROADMAP #34.
2026-04-09 04:02:59 +09:00
Jobdori
beb09df4b8 style(api): cargo fmt fix on normalize_object_schema test assertions 2026-04-09 03:43:59 +09:00
Jobdori
e7e0fd2dbf fix(api): strict object schema for OpenAI /responses endpoint
OpenAI /responses validates tool function schemas strictly:
- object types must have "properties" (at minimum {})
- "additionalProperties": false is required

/chat/completions is lenient and accepts schemas without these fields,
but /responses rejects them with "object schema missing properties" /
"invalid_function_parameters".

Add normalize_object_schema() which recursively walks the JSON Schema
tree and fills in missing "properties"/{} and "additionalProperties":false
on every object-type node. Existing values are not overwritten.

Call it in openai_tool_definition() before building the request payload
so both /chat/completions and /responses receive strict-validator-safe
schemas.

Add unit tests covering:
- bare object schema gets both fields injected
- nested object schemas are normalised recursively
- existing additionalProperties is not overwritten

Fixes the live repro where gpt-5.4 via OpenAI compat accepted connection
and routing but rejected every tool call with schema validation errors.

Closes ROADMAP #33.
2026-04-09 03:03:43 +09:00
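The recursive walk can be illustrated with a tiny JSON enum so the sketch needs no external crates. `Json` and `obj` are stand-ins introduced here; the repository presumably works on a real JSON value type. The behaviour shown is the one stated in the message: fill in missing "properties" and "additionalProperties" on object-typed nodes without overwriting existing values.

```rust
use std::collections::BTreeMap;

/// Minimal JSON value, just enough for schema trees (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Bool(bool),
    Str(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

/// Convenience constructor for object nodes.
fn obj(pairs: Vec<(&str, Json)>) -> Json {
    Json::Object(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Add "properties": {} and "additionalProperties": false to every node whose
/// "type" is "object", keeping any value that is already present.
fn normalize_object_schema(schema: &mut Json) {
    match schema {
        Json::Object(map) => {
            let is_object = matches!(map.get("type"), Some(Json::Str(t)) if t == "object");
            if is_object {
                map.entry("properties".to_string())
                    .or_insert_with(|| Json::Object(BTreeMap::new()));
                map.entry("additionalProperties".to_string())
                    .or_insert(Json::Bool(false));
            }
            // Recurse into every child, including nested property schemas.
            for child in map.values_mut() {
                normalize_object_schema(child);
            }
        }
        Json::Array(items) => {
            for item in items {
                normalize_object_schema(item);
            }
        }
        _ => {}
    }
}
```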
Jobdori
adcea6bceb fix(api): route DashScope models to dashscope config, not openai
ProviderClient::from_model_with_anthropic_auth was dispatching every
ProviderKind::OpenAi match to OpenAiCompatConfig::openai(), which reads
OPENAI_API_KEY and points at api.openai.com. But DashScope models
(qwen-plus, qwen/qwen3-coder, etc.) also return ProviderKind::OpenAi
from detect_provider_kind because DashScope speaks the OpenAI wire
format. The metadata layer correctly identifies them as needing
DASHSCOPE_API_KEY and the DashScope compatible-mode endpoint, but that
metadata was being ignored at dispatch time.

Result: users running `claw --model qwen-plus` with DASHSCOPE_API_KEY
set would get a "missing OPENAI_API_KEY" error instead of being routed
to DashScope.

Fix: consult providers::metadata_for_model in the OpenAi dispatch arm
and pick dashscope() vs openai() based on metadata.auth_env.

Adds a regression test asserting ProviderClient::from_model("qwen-plus")
builds with the DashScope base URL. Exposes a pub base_url() accessor
on OpenAiCompatClient so the test can verify the routing.

Authored by droid (Kimi K2.5 Turbo) via acpx, cleaned up by Jobdori
(removed unsafe blocks unnecessary under edition 2021, imported
ProviderClient from super, adopted EnvVarGuard pattern from
providers/mod.rs tests).

Co-Authored-By: Droid <noreply@factory.ai>
2026-04-08 18:04:37 +09:00
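The dispatch fix amounts to consulting the model metadata inside the OpenAi arm. The sketch below is a simplified stand-in: the types, the `claude` prefix rule, and the string return values are assumptions, while the qwen prefixes and the DASHSCOPE_API_KEY / OPENAI_API_KEY distinction come from the commit messages.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProviderKind {
    Anthropic,
    OpenAi,
}

/// Routing metadata for a model name (simplified).
#[derive(Debug, PartialEq)]
struct ProviderMetadata {
    kind: ProviderKind,
    auth_env: &'static str,
}

/// DashScope models reuse ProviderKind::OpenAi but carry their own auth_env.
fn metadata_for_model(model: &str) -> ProviderMetadata {
    if model.starts_with("qwen/") || model.starts_with("qwen-") {
        ProviderMetadata { kind: ProviderKind::OpenAi, auth_env: "DASHSCOPE_API_KEY" }
    } else if model.starts_with("claude") {
        // Assumed rule, for illustration only.
        ProviderMetadata { kind: ProviderKind::Anthropic, auth_env: "ANTHROPIC_API_KEY" }
    } else {
        ProviderMetadata { kind: ProviderKind::OpenAi, auth_env: "OPENAI_API_KEY" }
    }
}

/// Pick which OpenAI-compatible config to build; None means "not this path".
fn openai_compat_config_for(model: &str) -> Option<&'static str> {
    let meta = metadata_for_model(model);
    match meta.kind {
        ProviderKind::Anthropic => None,
        // Branching on kind alone would always pick "openai" here; the fix
        // is to look at auth_env as well.
        ProviderKind::OpenAi => Some(if meta.auth_env == "DASHSCOPE_API_KEY" {
            "dashscope"
        } else {
            "openai"
        }),
    }
}
```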
YeonGyu-Kim
3ac97e635e feat(api): add qwen/ prefix routing for Alibaba DashScope provider
Users in Discord #clawcode-get-help (web3g) asked for Qwen 3.6 Plus via
native Alibaba DashScope API instead of OpenRouter, which has stricter
rate limits. This commit adds first-class routing for qwen/ and bare
qwen- prefixed model names.

Changes:
- DEFAULT_DASHSCOPE_BASE_URL constant: /compatible-mode/v1 endpoint
- OpenAiCompatConfig::dashscope() factory mirroring openai()/xai()
- DASHSCOPE_ENV_VARS + credential_env_vars() wiring
- metadata_for_model: qwen/ and qwen- prefix routes to DashScope with
  auth_env=DASHSCOPE_API_KEY, reuses ProviderKind::OpenAi because
  DashScope speaks the OpenAI REST shape
- is_reasoning_model: detect qwen-qwq, qwq-*, and *-thinking variants
  so tuning params (temperature, top_p, etc.) get stripped before
  payload assembly (same pattern as o1/o3/grok-3-mini)

Tests added:
- providers::tests::qwen_prefix_routes_to_dashscope_not_anthropic
- openai_compat::tests::qwen_reasoning_variants_are_detected

89 api lib tests passing, 0 failing. cargo fmt --check: clean.

Closes the user-reported gap: 'use Qwen 3.6 Plus via Alibaba API
directly, not OpenRouter' without needing OPENAI_BASE_URL override
or unsetting ANTHROPIC_API_KEY.
2026-04-08 14:06:26 +09:00
YeonGyu-Kim
c7b3296ef6 style: cargo fmt — fix CI formatting failures
Pre-existing formatting issues in anthropic.rs surfaced by CI cargo fmt check.
No functional changes.
2026-04-08 11:21:13 +09:00
YeonGyu-Kim
b513d6e462 fix(api): sanitize tuning params for reasoning models (o1/o3/grok-3-mini)
Reasoning models reject temperature, top_p, frequency_penalty, and
presence_penalty with 400 errors. Instead of letting these flow through
and returning cryptic provider errors, strip them silently at the
request-builder boundary.

is_reasoning_model() classifies: o1*, o3*, o4*, grok-3-mini.
stop sequences are preserved (safe for all providers).

Tests added:
- reasoning_model_strips_tuning_params: o1-mini strips all 4 params, keeps stop
- grok_3_mini_is_reasoning_model: classification coverage for grok-3-mini, o1,
  o3-mini, and negative cases (gpt-4o, grok-3, claude)

85 api lib tests passing, 0 failing.
2026-04-08 07:32:47 +09:00
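A sketch of the stripping step under the classification given above (o1*, o3*, o4*, grok-3-mini). `Tuning` and `sanitize_tuning` are hypothetical names; in the repository the parameters live on MessageRequest and the stripping happens inside the request builder.

```rust
/// The five tuning parameters, all optional (hypothetical grouping).
#[derive(Debug, Default, Clone, PartialEq)]
struct Tuning {
    temperature: Option<f64>,
    top_p: Option<f64>,
    frequency_penalty: Option<f64>,
    presence_penalty: Option<f64>,
    stop: Option<Vec<String>>,
}

/// Prefix-based classification as listed in the commit message.
fn is_reasoning_model(model: &str) -> bool {
    ["o1", "o3", "o4", "grok-3-mini"]
        .iter()
        .any(|prefix| model.starts_with(*prefix))
}

/// For reasoning models, drop the four sampling parameters and keep `stop`.
fn sanitize_tuning(model: &str, tuning: Tuning) -> Tuning {
    if !is_reasoning_model(model) {
        return tuning;
    }
    Tuning { stop: tuning.stop, ..Tuning::default() }
}
```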
YeonGyu-Kim
c667d47c70 feat(api): add tuning params (temperature, top_p, penalties, stop) to MessageRequest
MessageRequest was missing standard OpenAI-compatible generation tuning
parameters. Callers had no way to control temperature, top_p,
frequency_penalty, presence_penalty, or stop sequences.

Changes:
- Added 5 optional fields to MessageRequest (all Option, None by default)
- Wired into build_chat_completion_request: only included in payload when set
- All existing construction sites updated with ..Default::default()
- MessageRequest now derives Default for ergonomic partial construction

Tests added:
- tuning_params_included_in_payload_when_set: all 5 params flow into JSON
- tuning_params_omitted_from_payload_when_none: absent params stay absent

83 api lib tests passing, 0 failing.
cargo check --workspace: 0 warnings.
2026-04-08 07:07:33 +09:00
YeonGyu-Kim
5bcbc86a2b feat: b5-slash-help — batch 5 upstream parity 2026-04-07 14:51:27 +09:00
YeonGyu-Kim
6a6c5acb02 feat: b5-reasoning-guard — batch 5 upstream parity 2026-04-07 14:51:27 +09:00
YeonGyu-Kim
f982f24926 fix(api): Windows env hint + .env file loading fallback
When API key missing on Windows, hint about setx. Load .env from CWD
as fallback with simple key=value parser.
2026-04-07 14:22:41 +09:00
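A "simple key=value parser" with a fallback lookup could look like the following. The handling of comments, blank lines, and surrounding quotes is an assumption; the commit message does not say which of these the real parser supports.

```rust
use std::collections::HashMap;

/// Parse KEY=VALUE lines; blank lines, '#' comments, and lines without '='
/// are skipped. One pair of matching quotes around the value is removed.
fn parse_dotenv(contents: &str) -> HashMap<String, String> {
    let mut vars = HashMap::new();
    for line in contents.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let Some((key, value)) = line.split_once('=') else { continue };
        let key = key.trim();
        if key.is_empty() {
            continue;
        }
        let value = value.trim();
        let value = value
            .strip_prefix('"')
            .and_then(|v| v.strip_suffix('"'))
            .or_else(|| value.strip_prefix('\'').and_then(|v| v.strip_suffix('\'')))
            .unwrap_or(value);
        vars.insert(key.to_string(), value.to_string());
    }
    vars
}

/// The process environment wins; the parsed .env is only a fallback.
fn lookup_with_fallback(
    name: &str,
    process_env: Option<String>,
    dotenv: &HashMap<String, String>,
) -> Option<String> {
    process_env
        .filter(|v| !v.is_empty())
        .or_else(|| dotenv.get(name).cloned())
}
```

In use, `process_env` would come from `std::env::var(name).ok()` and `contents` from reading `.env` in the current working directory.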
YeonGyu-Kim
2a642871ad fix(api): enrich JSON parse errors with response body, provider, and model
Raw 'json_error: no field X' now includes truncated response body,
provider name, and model ID for debugging context.
2026-04-07 14:22:05 +09:00
Yeachan-Heo
d94d792a48 Expose actionable ids for opaque provider failures
Issue #22 was triggered by generic upstream fatal wrappers that only surfaced 'Something went wrong', which left repeated Jobdori-style failures opaque in the CLI. Capture provider request ids on error responses, classify the known generic wrapper as provider_internal, and prefix the user-visible runtime error with the failure class plus session/trace identifiers so operators can correlate the failure quickly.

Constraint: Keep the fix small and user-safe without redesigning the broader runtime error taxonomy
Constraint: Preserve existing non-generic error text unless the wrapper is the known opaque fatal surface
Rejected: Broadly rewriting every runtime error into classified envelopes | unnecessary scope expansion for issue #22
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If more opaque wrappers appear, extend the marker list and classification helper rather than reintroducing raw wrapper text alone
Tested: cargo test -p api detects_generic_fatal_wrapper_and_classifies_it_as_provider_internal -- --nocapture; cargo test -p api retries_exhausted_preserves_nested_request_id_and_failure_class -- --nocapture; cargo test -p rusty-claude-cli opaque_provider_wrapper_surfaces_failure_class_session_and_trace -- --nocapture; cargo test -p rusty-claude-cli retry_exhaustion_preserves_internal_failure_class_for_generic_provider_wrapper -- --nocapture; cargo test --workspace
Not-tested: Live upstream reproduction of the Jobdori failure against a real provider session
2026-04-06 00:30:28 +00:00
Yeachan-Heo
fa72cd665e Block oversized requests before providers hard-fail
The runtime already tracked rough token estimates for compaction, but provider-bound
requests still relied on naive model output limits and could be sent upstream even
when the selected model could not fit the estimated prompt plus requested output.

This adds a small model token/context registry in the API layer, estimates request
size from the serialized prompt payload, and fails locally with a dedicated
context-window error before Anthropic or xAI calls are made. Focused integration
coverage asserts the preflight fires before any HTTP request leaves the process.

Constraint: Keep the first pass minimal and reusable across both Anthropic and OpenAI-compatible providers
Rejected: Auto-compact-and-retry in the same patch | broader control-flow change than the requested minimal preflight
Confidence: medium
Scope-risk: narrow
Reversibility: clean
Directive: Expand the model registry before enabling preflight for additional providers or aliases
Tested: cargo build -p api -p tools -p rusty-claude-cli; cargo test -p api
Not-tested: End-to-end CLI auto-compaction or retry behavior after a local context_window_blocked failure
2026-04-05 16:39:58 +00:00
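The preflight idea (estimate the prompt size, add the requested output, compare against the model's context window, and fail locally) might be sketched like this. The registry entries, the bytes-divided-by-four estimate, and all names are illustrative assumptions; the commit message does not give the actual heuristic or limits.

```rust
/// Error returned when a request cannot fit (hypothetical shape).
#[derive(Debug, PartialEq)]
struct ContextWindowBlocked {
    estimated_input: u64,
    requested_output: u64,
    context_window: u64,
}

/// Illustrative registry; model names and numbers are made up.
fn context_window_for(model: &str) -> Option<u64> {
    match model {
        "example-8k" => Some(8_192),
        "example-128k" => Some(131_072),
        _ => None,
    }
}

/// Rough estimate: about four bytes of serialized payload per token.
fn estimate_tokens(serialized_payload: &str) -> u64 {
    (serialized_payload.len() as u64 + 3) / 4
}

/// Fail before any HTTP call when prompt + output cannot fit the window.
/// Models missing from the registry are passed through unchecked.
fn preflight(
    model: &str,
    serialized_payload: &str,
    requested_output: u64,
) -> Result<(), ContextWindowBlocked> {
    let Some(context_window) = context_window_for(model) else {
        return Ok(());
    };
    let estimated_input = estimate_tokens(serialized_payload);
    if estimated_input.saturating_add(requested_output) > context_window {
        return Err(ContextWindowBlocked { estimated_input, requested_output, context_window });
    }
    Ok(())
}
```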