fix: route local OpenAI-compatible models

2026-06-04 11:36:44 +00:00 · 2026-06-03 23:16:46 +09:00
parent 9522674c87
commit bcc5bfde9c
7 changed files with 264 additions and 40 deletions
--- a/docs/MODEL_COMPATIBILITY.md
+++ b/docs/MODEL_COMPATIBILITY.md
@@ -148,12 +148,12 @@ pub const DEFAULT_DASHSCOPE_BASE_URL: &str = "https://dashscope.aliyuncs.com/com
 **Affected models:** Slash-containing model IDs routed through the OpenAI-compatible provider, especially custom gateways configured with `OPENAI_BASE_URL` such as OpenRouter, local routers, or other `/v1/chat/completions` services.

 **Behavior:**
- The default OpenAI API treats `openai/` as a routing prefix and sends the bare model name on the wire.
- Custom OpenAI-compatible base URLs preserve slash-containing slugs such as `openai/gpt-4.1-mini` so the gateway receives the exact model ID it expects.
+- The default OpenAI API and local/private OpenAI-compatible base URLs treat `openai/` as a routing prefix and send the bare model name on the wire.
+- Non-local custom OpenAI-compatible base URLs preserve slash-containing slugs such as `openai/gpt-4.1-mini` so gateways like OpenRouter receive the exact model ID they expect. Local slash-containing model IDs can use `local/`, which strips only that escape-hatch prefix and sends the remainder verbatim.
 - `MessageRequest::extra_body` passes through custom request JSON after core fields are populated. This supports provider-specific options such as `web_search_options` and `parallel_tool_calls`.
 - Protected core fields (`model`, `messages`, `stream`, `tools`, `tool_choice`, `max_tokens`, `max_completion_tokens`) cannot be overridden through `extra_body`.

-**Testing:** See `custom_openai_gateway_preserves_slash_model_ids_and_extra_body_params` in `openai_compat_integration.rs` and `extra_body_params_are_passed_through_without_overriding_core_fields` in `openai_compat.rs`.
+**Testing:** See `custom_openai_gateway_preserves_slash_model_ids_and_extra_body_params` in `openai_compat_integration.rs`, `wire_model_strips_openai_prefix_for_default_and_local_preserves_custom_gateways`, `local_routing_prefix_strips_only_escape_hatch`, and `extra_body_params_are_passed_through_without_overriding_core_fields` in `openai_compat.rs`.

 ## Implementation Details

--- a/docs/local-openai-compatible-providers.md
+++ b/docs/local-openai-compatible-providers.md
@@ -13,7 +13,7 @@ If you need the most polished daily-driver experience for a specific non-Claude

 ## OpenAI-compatible routing basics

-Set `OPENAI_BASE_URL` to the server’s `/v1` endpoint and set `OPENAI_API_KEY` to either the required token or a harmless placeholder for local servers that expect an Authorization header. The model name must match what the server exposes.
+Set `OPENAI_BASE_URL` to the server’s `/v1` endpoint and set `OPENAI_API_KEY` to either the required token or a harmless placeholder for local servers that expect an Authorization header. Authless local/private OpenAI-compatible servers can leave `OPENAI_API_KEY` unset. The model name must match what the server exposes.

 ```bash
 export OPENAI_BASE_URL="http://127.0.0.1:11434/v1"
@@ -24,8 +24,8 @@ claw --model "qwen3:latest" prompt "Reply exactly HELLO_WORLD_123"
 Routing notes:

 - Use the `openai/` prefix for OpenAI-compatible gateways when you need prefix routing to win over ambient Anthropic credentials, for example `--model "openai/gpt-4.1-mini"` with OpenRouter.
- For local servers, prefer the exact model ID reported by the server (`qwen3:latest`, `llama3.2`, `Qwen/Qwen2.5-Coder-7B-Instruct`, etc.). If your local gateway exposes slash-containing IDs, use that exact slug.
- If you have multiple provider keys in your environment, remove unrelated keys while smoke-testing a local route or choose a model prefix that unambiguously selects the intended provider.
+- For local servers, prefer the exact model ID reported by the server (`qwen3:latest`, `llama3.2`, etc.). If your local gateway exposes slash-containing IDs, prefix the exact slug with `local/` so Claw routes through OpenAI-compatible transport while sending the rest verbatim, for example `--model "local/Qwen/Qwen2.5-Coder-7B-Instruct"`.
+- If you have multiple provider keys in your environment, `OPENAI_BASE_URL` plus local-looking tags such as `llama3.2` or `qwen2.5-coder:7b` selects the local OpenAI-compatible route; use `local/` for slash-containing local IDs.
 - Tool workflows need model/server support for OpenAI-compatible tool calls. Plain prompt smoke tests can pass even when slash/tool workflows still fail because the server returns an incompatible tool-call shape.

 ## Raw `/v1/chat/completions` smoke test
@@ -58,11 +58,11 @@ In another shell:

 ```bash
 export OPENAI_BASE_URL="http://127.0.0.1:11434/v1"
-export OPENAI_API_KEY="local-dev-token"
+unset OPENAI_API_KEY
 claw --model "qwen3:latest" prompt "Reply exactly HELLO_WORLD_123"
 ```

-If Ollama is running without auth and your build accepts authless local OpenAI-compatible servers, `unset OPENAI_API_KEY` is also acceptable. Use a placeholder token rather than a real cloud API key for local testing.
+If Ollama is running without auth, `unset OPENAI_API_KEY` is acceptable. Use a placeholder token rather than a real cloud API key if your local server requires an Authorization header.

 ## llama.cpp server