diff --git a/USAGE.md b/USAGE.md index d7149aa5..5e9a41cf 100644 --- a/USAGE.md +++ b/USAGE.md @@ -306,9 +306,7 @@ Reasoning variants (`qwen-qwq-*`, `qwq-*`, `*-thinking`) automatically strip `te The OpenAI-compatible backend also serves as the gateway for **OpenRouter**, **Ollama**, and any other service that speaks the OpenAI `/v1/chat/completions` wire format — just point `OPENAI_BASE_URL` at the service. -**Model-name prefix routing:** If a model name starts with `openai/`, `gpt-`, `qwen/`, `qwen-`, `kimi/`, or `kimi-`, the provider is selected by the prefix regardless of which env vars are set. This prevents accidental misrouting to Anthropic when multiple credentials exist in the environment. Kimi and Qwen prefixes route to DashScope compatible mode; `openai/` is stripped before the Chat Completions request is sent. - -**Token and cost accounting:** Anthropic usage fields are recorded directly, including cache create/read tokens. OpenAI-compatible responses normalize `prompt_tokens` / `completion_tokens` into the same internal usage fields; when a provider reports `prompt_tokens_details.cached_tokens`, cached prompt tokens are counted as cache reads and subtracted from uncached input tokens so totals are not double-counted. `/status`, `/cost`, and `/usage` expose cumulative token totals and an estimated cost; unknown third-party prices use the built-in estimated-default pricing marker. +**Model-name prefix routing:** If a model name starts with `openai/`, `gpt-`, `qwen/`, `qwen-`, `kimi/`, or `kimi-`, the provider is selected by the prefix regardless of which env vars are set. This prevents accidental misrouting to Anthropic when multiple credentials exist in the environment. For the default OpenAI API, `openai/` is a routing prefix and is stripped before the request hits the wire. 
For a custom `OPENAI_BASE_URL`, slash-containing OpenAI-compatible slugs (for example OpenRouter-style `openai/gpt-4.1-mini`) are preserved so the gateway receives the model ID it expects. ### Tested models and aliases @@ -326,7 +324,7 @@ These are the models registered in the built-in alias table with known token lim | `gpt-4.1` / `gpt-4.1-mini` / `gpt-4.1-nano` | same | OpenAI-compatible | 32 768 | 1 047 576 | | `gpt-5.4` / `gpt-5.4-mini` / `gpt-5.4-nano` | same | OpenAI-compatible | 128 000 | 1 000 000 / 400 000 | -Any model name that does not match an alias is passed through verbatim. This is how you use OpenRouter model slugs (`openai/gpt-4.1-mini`), Ollama tags (`llama3.2`), or full Anthropic model IDs (`claude-sonnet-4-20250514`). +Any model name that does not match an alias is passed through verbatim after provider routing is resolved. This is how you use OpenRouter model slugs (`openai/gpt-4.1-mini` with a custom `OPENAI_BASE_URL`), Ollama tags (`llama3.2`), or full Anthropic model IDs (`claude-sonnet-4-20250514`). ### User-defined aliases @@ -349,9 +347,17 @@ Local project settings override user-level settings. Aliases resolve through the 1. If the resolved model name starts with `claude` → Anthropic. 2. If it starts with `grok` → xAI. 3. If it starts with `openai/` or `gpt-` → OpenAI-compatible. -4. If it starts with `qwen/`, `qwen-`, `kimi/`, or `kimi-` → DashScope compatible mode. -5. If `OPENAI_BASE_URL` is set with `OPENAI_API_KEY`, route unprefixed custom/local model names to OpenAI-compatible. -6. Otherwise, `claw` checks which credential is set: Anthropic first, then OpenAI, then xAI; if nothing matches, it defaults to Anthropic. +4. If it starts with `qwen/`, `qwen-`, `kimi/`, or `kimi-` → DashScope-compatible OpenAI wire format. +5. If `OPENAI_BASE_URL` and `OPENAI_API_KEY` are set, unknown model names route to the OpenAI-compatible client for local/gateway servers. +6. 
Otherwise, `claw` checks which credential is set: Anthropic first, then OpenAI, then xAI. If only `OPENAI_BASE_URL` is set (no API key), it still routes to the OpenAI-compatible client so authless local servers work. +7. If nothing matches, it defaults to Anthropic. + + +### Provider diagnostics and custom OpenAI-compatible parameters + +The API layer exposes a provider diagnostics snapshot via `api::provider_diagnostics_for_model(model)`. It reports the resolved provider, auth/base-url environment variables, default base URL, whether the provider uses the OpenAI-compatible wire format, whether reasoning tuning parameters are stripped, whether DeepSeek V4 reasoning history is preserved, proxy support, extra-body support, and whether slash-containing model IDs are preserved for custom OpenAI-compatible gateways. + +For gateway features that are not first-class request fields yet, `MessageRequest::extra_body` passes through provider-specific JSON parameters such as `web_search_options` or `parallel_tool_calls`. Core protocol fields (`model`, `messages`, `stream`, `tools`, `tool_choice`, `max_tokens`, and `max_completion_tokens`) are protected and cannot be overridden through `extra_body`. 
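The seven-step routing order added above can be sketched as a pure function. This is an illustrative sketch only, not the actual `claw` resolver (which also consults the alias table); the `Provider` enum, the `route` function name, and the env-flag booleans are hypothetical stand-ins:

```rust
// Sketch of the documented routing order -- NOT the real implementation.
#[derive(Debug, PartialEq)]
enum Provider {
    Anthropic,
    Xai,
    OpenAiCompatible,
    DashScopeCompatible,
}

fn route(
    model: &str,
    has_openai_base_url: bool,
    has_openai_key: bool,
    has_anthropic_key: bool,
    has_xai_key: bool,
) -> Provider {
    let m = model.to_ascii_lowercase();
    // Steps 1-4: model-name prefixes win over environment variables.
    if m.starts_with("claude") {
        return Provider::Anthropic;
    }
    if m.starts_with("grok") {
        return Provider::Xai;
    }
    if m.starts_with("openai/") || m.starts_with("gpt-") {
        return Provider::OpenAiCompatible;
    }
    if ["qwen/", "qwen-", "kimi/", "kimi-"].iter().any(|p| m.starts_with(*p)) {
        return Provider::DashScopeCompatible;
    }
    // Step 5: a configured custom gateway takes unknown model names.
    if has_openai_base_url && has_openai_key {
        return Provider::OpenAiCompatible;
    }
    // Step 6: fall back to whichever credential is set; a bare
    // OPENAI_BASE_URL still routes to OpenAI-compatible (authless server).
    if has_anthropic_key {
        return Provider::Anthropic;
    }
    if has_openai_key || has_openai_base_url {
        return Provider::OpenAiCompatible;
    }
    if has_xai_key {
        return Provider::Xai;
    }
    // Step 7: default.
    Provider::Anthropic
}
```

The key property to notice is that prefix routing (steps 1-4) runs before any credential check, which is what prevents misrouting when multiple API keys coexist in the environment.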
## FAQ diff --git a/docs/MODEL_COMPATIBILITY.md b/docs/MODEL_COMPATIBILITY.md index 90b101dd..f2f53acf 100644 --- a/docs/MODEL_COMPATIBILITY.md +++ b/docs/MODEL_COMPATIBILITY.md @@ -9,8 +9,8 @@ This document describes model-specific handling in the OpenAI-compatible provide - [Kimi Models (is_error Exclusion)](#kimi-models-is_error-exclusion) - [Reasoning Models (Tuning Parameter Stripping)](#reasoning-models-tuning-parameter-stripping) - [GPT-5 (max_completion_tokens)](#gpt-5-max_completion_tokens) - - [Qwen and Kimi Models (DashScope Routing)](#qwen-and-kimi-models-dashscope-routing) - - [OpenAI-Compatible Usage Accounting](#openai-compatible-usage-accounting) + - [Qwen Models (DashScope Routing)](#qwen-models-dashscope-routing) + - [Custom Gateway Slugs and Extra Body Parameters](#custom-gateway-slugs-and-extra-body-parameters) - [Implementation Details](#implementation-details) - [Adding New Models](#adding-new-models) - [Testing](#testing) @@ -23,6 +23,8 @@ The `openai_compat.rs` provider translates Claude Code's internal message format - Sampling parameters (temperature, top_p, etc.) - Token limit fields (`max_tokens` vs `max_completion_tokens`) - Base URL routing +- Provider-specific extra body parameters (`web_search_options`, `parallel_tool_calls`, local-server switches, etc.) +- Provider diagnostics for status/doctor-style surfaces ## Model-Specific Handling @@ -141,13 +143,17 @@ pub const DEFAULT_DASHSCOPE_BASE_URL: &str = "https://dashscope.aliyuncs.com/com --- -### OpenAI-Compatible Usage Accounting +### Custom Gateway Slugs and Extra Body Parameters -**Affected providers:** OpenAI-compatible, xAI, DashScope, OpenRouter/Ollama/local gateways that return Chat Completions `usage`. +**Affected models:** Slash-containing model IDs routed through the OpenAI-compatible provider, especially custom gateways configured with `OPENAI_BASE_URL` such as OpenRouter, local routers, or other `/v1/chat/completions` services. 
-**Behavior:** `prompt_tokens` and `completion_tokens` are normalized into the shared `Usage` shape used by Anthropic. If a provider includes `prompt_tokens_details.cached_tokens`, cached prompt tokens are recorded as `cache_read_input_tokens` and subtracted from uncached `input_tokens`, preserving an accurate `total_tokens()` without double-counting. Streaming OpenAI responses request `stream_options.include_usage` where supported so final usage chunks feed the same accounting path. +**Behavior:** +- The default OpenAI API treats `openai/` as a routing prefix and sends the bare model name on the wire. +- Custom OpenAI-compatible base URLs preserve slash-containing slugs such as `openai/gpt-4.1-mini` so the gateway receives the exact model ID it expects. +- `MessageRequest::extra_body` passes through custom request JSON after core fields are populated. This supports provider-specific options such as `web_search_options` and `parallel_tool_calls`. +- Protected core fields (`model`, `messages`, `stream`, `tools`, `tool_choice`, `max_tokens`, `max_completion_tokens`) cannot be overridden through `extra_body`. -**Status/cost surfaces:** `/status`, `/cost`, and JSON output expose cumulative input/output/cache/total token fields plus an estimated cost marker. Unknown third-party model pricing uses the `estimated-default` pricing label rather than pretending provider-specific prices are known. +**Testing:** See `custom_openai_gateway_preserves_slash_model_ids_and_extra_body_params` in `openai_compat_integration.rs` and `extra_body_params_are_passed_through_without_overriding_core_fields` in `openai_compat.rs`. 
## Implementation Details @@ -164,8 +170,8 @@ rust/crates/api/src/providers/openai_compat.rs | `model_rejects_is_error_field()` | Detects models that don't support `is_error` in tool results | | `is_reasoning_model()` | Detects reasoning models that need tuning param stripping | | `translate_message()` | Converts internal messages to OpenAI format (applies `is_error` logic) | -| `build_chat_completion_request()` | Constructs full request payload (applies all model-specific logic) | -| `OpenAiUsage::normalized()` | Maps Chat Completions usage and cached prompt token details into shared token/cost accounting fields | +| `build_chat_completion_request()` | Constructs full request payload (applies all model-specific logic and safe `extra_body` passthrough) | +| `provider_diagnostics_for_model()` | Produces provider/status diagnostics including auth/base-url vars, reasoning behavior, proxy support, extra-body support, and slash-model preservation | ### Provider Prefix Handling @@ -178,7 +184,7 @@ let canonical = model.to_ascii_lowercase() .unwrap_or(model); ``` -This ensures consistent detection regardless of whether models are referenced with or without provider prefixes. +This ensures consistent detection regardless of whether models are referenced with or without provider prefixes. Wire-model handling is more specific: known routing prefixes are stripped for provider-native defaults, while custom OpenAI-compatible base URLs preserve slash-containing gateway slugs. ## Adding New Models @@ -196,11 +202,15 @@ When adding support for new models: - Does it require `max_completion_tokens` instead of `max_tokens`? - Update the `max_tokens_key` logic -4. **Add tests** +4. **Check custom gateway behavior** + - Should slash-containing IDs be preserved for custom `OPENAI_BASE_URL` gateways? + - Does the feature belong in a typed request field or `extra_body` passthrough? + +5. 
**Add tests** - Unit test for detection function - Integration test in `build_chat_completion_request` -5. **Update this documentation** +6. **Update this documentation** - Add the model to the affected lists - Document any special behavior @@ -217,6 +227,8 @@ cargo test --package api model_rejects_is_error_field cargo test --package api reasoning_model cargo test --package api gpt5 cargo test --package api qwen +cargo test --package api custom_openai_gateway_preserves_slash_model_ids_and_extra_body_params +cargo test --package api provider_diagnostics_explain_openai_compatible_capabilities ``` ### Test Files @@ -244,6 +256,6 @@ fn my_new_model_is_detected() { --- -*Last updated: 2026-04-16* +*Last updated: 2026-05-15* For questions or updates, see the implementation in `rust/crates/api/src/providers/openai_compat.rs`.
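The "protected core fields" rule for `extra_body` described in both files reduces to a simple merge policy: user-supplied keys are copied into the request body unless they collide with a core field. A minimal sketch under stated assumptions — a string map stands in for the real JSON body, and `merge_extra_body` is a hypothetical name (the actual passthrough lives inside `build_chat_completion_request()`):

```rust
use std::collections::BTreeMap;

// Core Chat Completions fields that extra_body must never override,
// per the documented protected-field list.
const PROTECTED_FIELDS: [&str; 7] = [
    "model", "messages", "stream", "tools",
    "tool_choice", "max_tokens", "max_completion_tokens",
];

// Sketch: copy extra_body keys into the body, silently skipping any
// attempt to clobber a protected core field.
fn merge_extra_body(
    body: &mut BTreeMap<String, String>,
    extra: &BTreeMap<String, String>,
) {
    for (key, value) in extra {
        if PROTECTED_FIELDS.contains(&key.as_str()) {
            continue; // protected: the typed request field wins
        }
        body.insert(key.clone(), value.clone());
    }
}
```

Applying this to the documented example: an `extra_body` of `{"model": "other", "parallel_tool_calls": false}` would leave `model` untouched while passing `parallel_tool_calls` through to the gateway.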