fix: route local OpenAI-compatible models

2026-06-04 19:46:46 +00:00 · 2026-06-03 23:16:46 +09:00
parent 9522674c87
commit bcc5bfde9c
7 changed files with 264 additions and 40 deletions
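For orientation, the provider-selection order that this commit documents in USAGE.md (steps 1 to 7 of the routing list) can be sketched as a pure function. This is a minimal illustration under stated assumptions, not the code in `providers/mod.rs`: `Provider`, `Env`, `looks_local`, and `route` are hypothetical names, and the environment is passed in as plain booleans instead of being read from the process.

```rust
/// Hypothetical provider labels for this sketch (the real crate uses `ProviderKind`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Provider {
    Anthropic,
    Xai,
    OpenAiCompat,
    DashScope,
}

/// Hypothetical snapshot of which environment variables are set.
#[derive(Debug, Clone, Copy, Default)]
struct Env {
    openai_base_url: bool,
    anthropic_key: bool,
    openai_key: bool,
    xai_key: bool,
}

/// Mirrors the heuristic in the diff: a `:` or `.` marks a local-looking tag.
fn looks_local(model: &str) -> bool {
    model.contains(':') || model.contains('.')
}

/// Assumed reading of the documented routing order; prefixes win over credentials.
fn route(model: &str, env: Env) -> Provider {
    // Steps 1-2: vendor prefixes.
    if model.starts_with("claude") {
        return Provider::Anthropic;
    }
    if model.starts_with("grok") {
        return Provider::Xai;
    }
    // Step 3: explicit OpenAI-compatible prefixes, including the `local/` escape hatch.
    if ["openai/", "local/", "gpt-"].iter().any(|p| model.starts_with(*p)) {
        return Provider::OpenAiCompat;
    }
    // Step 4: DashScope-compatible prefixes.
    if ["qwen/", "qwen-", "kimi/", "kimi-"].iter().any(|p| model.starts_with(*p)) {
        return Provider::DashScope;
    }
    // Step 5: a configured base URL plus a local-looking tag beats ambient credentials.
    if env.openai_base_url && looks_local(model) {
        return Provider::OpenAiCompat;
    }
    // Step 6: credential sniffing, then the base-URL-only case for authless servers.
    if env.anthropic_key {
        Provider::Anthropic
    } else if env.openai_key {
        Provider::OpenAiCompat
    } else if env.xai_key {
        Provider::Xai
    } else if env.openai_base_url {
        Provider::OpenAiCompat
    } else {
        // Step 7: default.
        Provider::Anthropic
    }
}
```

Under these assumptions, `qwen2.5-coder:7b` skips the `qwen-` prefix rule (it has no hyphen after `qwen`) and is routed by step 5 when `OPENAI_BASE_URL` is set, even if an Anthropic key is also present.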
--- a/USAGE.md
+++ b/USAGE.md
@@ -298,6 +298,18 @@ cd rust
 ./target/debug/claw --model "llama3.2" prompt "summarize this repository in one sentence"
 ```

+For Ollama tags with punctuation (for example `qwen2.5-coder:7b`), `OPENAI_BASE_URL` selects the local OpenAI-compatible route even when `OPENAI_API_KEY` is unset:
+
+```bash
+export OPENAI_BASE_URL="http://127.0.0.1:11434/v1"
+unset OPENAI_API_KEY
+
+cd rust
+./target/debug/claw --model "qwen2.5-coder:7b" prompt "reply with ready"
+```
+
+If the local server exposes a slash-containing model ID, prefix it with `local/` so Claw selects the OpenAI-compatible transport while sending the remainder verbatim on the wire: `--model "local/Qwen/Qwen3.6-27B-FP8"`.
+
 ### OpenRouter

 ```bash
@@ -340,7 +352,7 @@ Reasoning variants (`qwen-qwq-*`, `qwq-*`, `*-thinking`) automatically strip `te

 The OpenAI-compatible backend also serves as the gateway for **OpenRouter**, **Ollama**, and any other service that speaks the OpenAI `/v1/chat/completions` wire format — just point `OPENAI_BASE_URL` at the service.

-**Model-name prefix routing:** If a model name starts with `openai/`, `gpt-`, `qwen/`, `qwen-`, `kimi/`, or `kimi-`, the provider is selected by the prefix regardless of which env vars are set. This prevents accidental misrouting to Anthropic when multiple credentials exist in the environment. For the default OpenAI API, `openai/` is a routing prefix and is stripped before the request hits the wire. For a custom `OPENAI_BASE_URL`, slash-containing OpenAI-compatible slugs (for example OpenRouter-style `openai/gpt-4.1-mini`) are preserved so the gateway receives the model ID it expects.
+**Model-name prefix routing:** If a model name starts with `openai/`, `local/`, `gpt-`, `qwen/`, `qwen-`, `kimi/`, or `kimi-`, the provider is selected by the prefix regardless of which env vars are set. This prevents accidental misrouting to Anthropic when multiple credentials exist in the environment. For the default OpenAI API and local/private OpenAI-compatible endpoints, `openai/` is a routing prefix and is stripped before the request hits the wire. For non-local custom `OPENAI_BASE_URL` gateways, slash-containing OpenAI-compatible slugs (for example OpenRouter-style `openai/gpt-4.1-mini`) are preserved so the gateway receives the model ID it expects. The `local/` prefix is an explicit escape hatch for local slash-containing model IDs: it is stripped while the rest of the model ID is sent verbatim.

 ### Tested models and aliases

@@ -360,7 +372,7 @@ These are the models registered in the built-in alias table with known token lim
 | `gpt-4.1` / `gpt-4.1-mini` / `gpt-4.1-nano` | same | OpenAI-compatible | 32 768 | 1 047 576 |
 | `gpt-5.4` / `gpt-5.4-mini` / `gpt-5.4-nano` | same | OpenAI-compatible | 128 000 | 1 000 000 / 400 000 |

-Any model name that does not match an alias is passed through verbatim after provider routing is resolved. This is how you use OpenRouter model slugs (`openai/gpt-4.1-mini` with a custom `OPENAI_BASE_URL`), Ollama tags (`llama3.2`), or full Anthropic model IDs (`claude-sonnet-4-20250514`).
+Any model name that does not match an alias is passed through verbatim after provider routing is resolved. This is how you use OpenRouter model slugs (`openai/gpt-4.1-mini` with a custom `OPENAI_BASE_URL`), Ollama tags (`llama3.2` or `qwen2.5-coder:7b`), slash-containing local IDs (`local/Qwen/Qwen3.6-27B-FP8`), or full Anthropic model IDs (`claude-sonnet-4-20250514`).

 ### User-defined aliases

@@ -382,9 +394,9 @@ Local project settings override user-level settings. Aliases resolve through the

 1. If the resolved model name starts with `claude` → Anthropic.
 2. If it starts with `grok` → xAI.
-3. If it starts with `openai/` or `gpt-` → OpenAI-compatible.
+3. If it starts with `openai/`, `local/`, or `gpt-` → OpenAI-compatible.
 4. If it starts with `qwen/`, `qwen-`, `kimi/`, or `kimi-` → DashScope-compatible OpenAI wire format.
-5. If `OPENAI_BASE_URL` and `OPENAI_API_KEY` are set, unknown model names route to the OpenAI-compatible client for local/gateway servers.
+5. If `OPENAI_BASE_URL` is set, unknown model names that look like local server tags (names containing `:` or `.`, such as `llama3.2` or `qwen2.5-coder:7b`) route to the OpenAI-compatible client for local/gateway servers.
 6. Otherwise, `claw` checks which credential is set: Anthropic first, then OpenAI, then xAI. If only `OPENAI_BASE_URL` is set, it still routes to OpenAI-compatible for authless local servers.
 7. If nothing matches, it defaults to Anthropic.

--- a/docs/MODEL_COMPATIBILITY.md
+++ b/docs/MODEL_COMPATIBILITY.md
@@ -148,12 +148,12 @@ pub const DEFAULT_DASHSCOPE_BASE_URL: &str = "https://dashscope.aliyuncs.com/com
 **Affected models:** Slash-containing model IDs routed through the OpenAI-compatible provider, especially custom gateways configured with `OPENAI_BASE_URL` such as OpenRouter, local routers, or other `/v1/chat/completions` services.

 **Behavior:**
- The default OpenAI API treats `openai/` as a routing prefix and sends the bare model name on the wire.
- Custom OpenAI-compatible base URLs preserve slash-containing slugs such as `openai/gpt-4.1-mini` so the gateway receives the exact model ID it expects.
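+- The default OpenAI API and local/private OpenAI-compatible base URLs (`localhost`, `::1`, loopback, or private-range IPv4 hosts such as `10.x`, `172.16`-`172.31`, and `192.168.x`) treat `openai/` as a routing prefix and send the bare model name on the wire.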
+- Non-local custom OpenAI-compatible base URLs preserve slash-containing slugs such as `openai/gpt-4.1-mini` so gateways like OpenRouter receive the exact model ID they expect. Local slash-containing model IDs can use `local/`, which strips only that escape-hatch prefix and sends the remainder verbatim.
 - `MessageRequest::extra_body` passes through custom request JSON after core fields are populated. This supports provider-specific options such as `web_search_options` and `parallel_tool_calls`.
 - Protected core fields (`model`, `messages`, `stream`, `tools`, `tool_choice`, `max_tokens`, `max_completion_tokens`) cannot be overridden through `extra_body`.

-**Testing:** See `custom_openai_gateway_preserves_slash_model_ids_and_extra_body_params` in `openai_compat_integration.rs` and `extra_body_params_are_passed_through_without_overriding_core_fields` in `openai_compat.rs`.
+**Testing:** See `custom_openai_gateway_preserves_slash_model_ids_and_extra_body_params` in `openai_compat_integration.rs`, and `wire_model_strips_openai_prefix_for_default_and_local_preserves_custom_gateways`, `local_routing_prefix_strips_only_escape_hatch`, and `extra_body_params_are_passed_through_without_overriding_core_fields` in `openai_compat.rs`.

 ## Implementation Details

--- a/docs/local-openai-compatible-providers.md
+++ b/docs/local-openai-compatible-providers.md
@@ -13,7 +13,7 @@ If you need the most polished daily-driver experience for a specific non-Claude

 ## OpenAI-compatible routing basics

-Set `OPENAI_BASE_URL` to the server’s `/v1` endpoint and set `OPENAI_API_KEY` to either the required token or a harmless placeholder for local servers that expect an Authorization header. The model name must match what the server exposes.
+Set `OPENAI_BASE_URL` to the server’s `/v1` endpoint and set `OPENAI_API_KEY` to either the required token or a harmless placeholder for local servers that expect an Authorization header. Authless local/private OpenAI-compatible servers can leave `OPENAI_API_KEY` unset. The model name must match what the server exposes.

 ```bash
 export OPENAI_BASE_URL="http://127.0.0.1:11434/v1"
@@ -24,8 +24,8 @@ claw --model "qwen3:latest" prompt "Reply exactly HELLO_WORLD_123"
 Routing notes:

 - Use the `openai/` prefix for OpenAI-compatible gateways when you need prefix routing to win over ambient Anthropic credentials, for example `--model "openai/gpt-4.1-mini"` with OpenRouter.
- For local servers, prefer the exact model ID reported by the server (`qwen3:latest`, `llama3.2`, `Qwen/Qwen2.5-Coder-7B-Instruct`, etc.). If your local gateway exposes slash-containing IDs, use that exact slug.
- If you have multiple provider keys in your environment, remove unrelated keys while smoke-testing a local route or choose a model prefix that unambiguously selects the intended provider.
+- For local servers, prefer the exact model ID reported by the server (`qwen3:latest`, `llama3.2`, etc.). If your local gateway exposes slash-containing IDs, prefix the exact slug with `local/` so Claw routes through OpenAI-compatible transport while sending the rest verbatim, for example `--model "local/Qwen/Qwen2.5-Coder-7B-Instruct"`.
+- If you have multiple provider keys in your environment, setting `OPENAI_BASE_URL` and passing a local-looking tag (one containing `:` or `.`, such as `llama3.2` or `qwen2.5-coder:7b`) selects the local OpenAI-compatible route; use `local/` for slash-containing local IDs.
 - Tool workflows need model/server support for OpenAI-compatible tool calls. Plain prompt smoke tests can pass even when slash/tool workflows still fail because the server returns an incompatible tool-call shape.

 ## Raw `/v1/chat/completions` smoke test
@@ -58,11 +58,11 @@ In another shell:

 ```bash
 export OPENAI_BASE_URL="http://127.0.0.1:11434/v1"
-export OPENAI_API_KEY="local-dev-token"
+unset OPENAI_API_KEY
 claw --model "qwen3:latest" prompt "Reply exactly HELLO_WORLD_123"
 ```

-If Ollama is running without auth and your build accepts authless local OpenAI-compatible servers, `unset OPENAI_API_KEY` is also acceptable. Use a placeholder token rather than a real cloud API key for local testing.
+If Ollama is running without auth, `unset OPENAI_API_KEY` is acceptable. Use a placeholder token rather than a real cloud API key if your local server requires an Authorization header.

 ## llama.cpp server

--- a/rust/crates/api/src/client.rs
+++ b/rust/crates/api/src/client.rs
@@ -235,4 +235,22 @@ mod tests {
            other => panic!("Expected ProviderClient::OpenAi for qwen-plus, got: {other:?}"),
        }
    }
+
+    #[test]
+    fn local_openai_base_url_routes_authless_ollama_models() {
+        let _lock = env_lock();
+        let _base_url = EnvVarGuard::set("OPENAI_BASE_URL", Some("http://127.0.0.1:11434/v1"));
+        let _openai_key = EnvVarGuard::set("OPENAI_API_KEY", None);
+        let _anthropic_key = EnvVarGuard::set("ANTHROPIC_API_KEY", Some("test-anthropic-key"));
+        let _anthropic_token = EnvVarGuard::set("ANTHROPIC_AUTH_TOKEN", None);
+
+        let client = ProviderClient::from_model("qwen2.5-coder:7b")
+            .expect("local model should route to OpenAI-compatible client without auth");
+        match client {
+            ProviderClient::OpenAi(openai_client) => {
+                assert_eq!(openai_client.base_url(), "http://127.0.0.1:11434/v1")
+            }
+            other => panic!("Expected ProviderClient::OpenAi for local model, got: {other:?}"),
+        }
+    }
 }
--- a/rust/crates/api/src/providers/mod.rs
+++ b/rust/crates/api/src/providers/mod.rs
@@ -262,6 +262,14 @@ pub fn metadata_for_model(model: &str) -> Option<ProviderMetadata> {
            default_base_url: openai_compat::DEFAULT_OPENAI_BASE_URL,
        });
    }
+    if canonical.starts_with("local/") {
+        return Some(ProviderMetadata {
+            provider: ProviderKind::OpenAi,
+            auth_env: "OPENAI_API_KEY",
+            base_url_env: "OPENAI_BASE_URL",
+            default_base_url: openai_compat::DEFAULT_OPENAI_BASE_URL,
+        });
+    }
    // Alibaba DashScope compatible-mode endpoint. Routes qwen/* and bare
    // qwen-* model names (qwen-max, qwen-plus, qwen-turbo, qwen-qwq, etc.)
    // to the OpenAI-compat client pointed at DashScope's /compatible-mode/v1.
@@ -337,17 +345,21 @@ pub fn provider_diagnostics_for_model(model: &str) -> ProviderDiagnostics {
    }
 }

+fn looks_like_local_openai_model(model: &str) -> bool {
+    model.contains(':') || model.contains('.')
+}
+
 #[must_use]
 pub fn detect_provider_kind(model: &str) -> ProviderKind {
-    if let Some(metadata) = metadata_for_model(model) {
+    let resolved_model = resolve_model_alias(model);
+    if let Some(metadata) = metadata_for_model(&resolved_model) {
        return metadata.provider;
    }
-    // When OPENAI_BASE_URL is set, the user explicitly configured an
-    // OpenAI-compatible endpoint. Prefer it over the Anthropic fallback
-    // even when the model name has no recognized prefix — this is the
-    // common case for local providers (Ollama, LM Studio, vLLM, etc.)
-    // where model names like "qwen2.5-coder:7b" don't match any prefix.
-    if std::env::var_os("OPENAI_BASE_URL").is_some() && openai_compat::has_api_key("OPENAI_API_KEY")
+    // When OPENAI_BASE_URL is set and the unknown model name looks like a
+    // local server tag (for example `llama3.2` or `qwen2.5-coder:7b`), prefer
+    // the OpenAI-compatible endpoint over ambient Anthropic credentials.
+    if std::env::var_os("OPENAI_BASE_URL").is_some()
+        && looks_like_local_openai_model(&resolved_model)
    {
        return ProviderKind::OpenAi;
    }
@@ -1042,6 +1054,18 @@ mod tests {
        assert_eq!(kind2, ProviderKind::OpenAi);
    }

+    #[test]
+    fn local_prefix_routes_to_openai_not_anthropic() {
+        let meta = super::metadata_for_model("local/Qwen/Qwen3.6-27B-FP8")
+            .expect("local/ prefix must resolve to OpenAI-compatible metadata");
+        assert_eq!(meta.provider, ProviderKind::OpenAi);
+        assert_eq!(meta.auth_env, "OPENAI_API_KEY");
+        assert_eq!(meta.base_url_env, "OPENAI_BASE_URL");
+
+        let kind = detect_provider_kind("local/Qwen/Qwen3.6-27B-FP8");
+        assert_eq!(kind, ProviderKind::OpenAi);
+    }
+
    #[test]
    fn qwen_prefix_routes_to_dashscope_not_anthropic() {
        // User request from Discord #clawcode-get-help: web3g wants to use
--- a/rust/crates/api/src/providers/openai_compat.rs
+++ b/rust/crates/api/src/providers/openai_compat.rs
@@ -1,5 +1,6 @@
 use std::borrow::Cow;
 use std::collections::{BTreeMap, VecDeque};
+use std::net::Ipv4Addr;
 use std::sync::atomic::{AtomicU64, Ordering};
 use std::time::{Duration, SystemTime, UNIX_EPOCH};

@@ -131,13 +132,22 @@ impl OpenAiCompatClient {
    }

    pub fn from_env(config: OpenAiCompatConfig) -> Result<Self, ApiError> {
-        let Some(api_key) = read_env_non_empty(config.api_key_env)? else {
-            return Err(ApiError::missing_credentials(
-                config.provider_name,
-                config.credential_env_vars(),
-            ));
+        let base_url = read_base_url(config);
+        let api_key = match read_env_non_empty(config.api_key_env)? {
+            Some(api_key) => api_key,
+            None if config.provider_name == "OpenAI"
+                && is_local_openai_compatible_base_url(&base_url) =>
+            {
+                "local-dev-token".to_string()
+            }
+            None => {
+                return Err(ApiError::missing_credentials(
+                    config.provider_name,
+                    config.credential_env_vars(),
+                ));
+            }
        };
-        Ok(Self::new(api_key, config))
+        Ok(Self::new(api_key, config).with_base_url(base_url))
    }

    #[must_use]
@@ -915,14 +925,18 @@ pub fn model_requires_reasoning_content_in_history(model: &str) -> bool {

 /// Strip routing prefix (e.g., "openai/gpt-4" → "gpt-4") for the wire.
 /// The prefix is used only to select transport; the backend expects the
-/// bare model id.
+/// bare model id. Use `local/` to force OpenAI-compatible routing while
+/// preserving any slashes that follow the prefix.
 #[allow(dead_code)]
 fn strip_routing_prefix(model: &str) -> &str {
    if let Some(pos) = model.find('/') {
        let prefix = &model[..pos];
        // Only strip if the prefix before "/" is a known routing prefix,
        // not if "/" appears in the middle of the model name for other reasons.
-        if matches!(prefix, "openai" | "xai" | "grok" | "qwen" | "kimi") {
+        if matches!(
+            prefix,
+            "openai" | "xai" | "grok" | "qwen" | "kimi" | "local"
+        ) {
            &model[pos + 1..]
        } else {
            model
@@ -932,6 +946,44 @@ fn strip_routing_prefix(model: &str) -> &str {
    }
 }

+fn normalize_base_url_for_model_routing(url: &str) -> &str {
+    let trimmed = url.trim_end_matches('/');
+    trimmed
+        .strip_suffix("/chat/completions")
+        .map(|value| value.trim_end_matches('/'))
+        .unwrap_or(trimmed)
+}
+
+fn url_host(url: &str) -> &str {
+    let after_scheme = url.split_once("://").map_or(url, |(_, rest)| rest);
+    let authority = after_scheme.split(['/', '?', '#']).next().unwrap_or("");
+    let host_port = authority
+        .rsplit_once('@')
+        .map_or(authority, |(_, host_port)| host_port);
+    if host_port.starts_with('[') {
+        return host_port
+            .split(']')
+            .next()
+            .unwrap_or("")
+            .trim_start_matches('[');
+    }
+    host_port.split(':').next().unwrap_or("")
+}
+
+fn is_local_openai_compatible_base_url(url: &str) -> bool {
+    let host = url_host(url.trim());
+    if host.eq_ignore_ascii_case("localhost") || host == "::1" {
+        return true;
+    }
+    let Ok(address) = host.parse::<Ipv4Addr>() else {
+        return false;
+    };
+    let [first, second, ..] = address.octets();
+    matches!(first, 10 | 127)
+        || (first == 192 && second == 168)
+        || (first == 172 && (16..=31).contains(&second))
+}
+
 fn wire_model_for_base_url<'a>(
    model: &'a str,
    config: OpenAiCompatConfig,
@@ -944,26 +996,22 @@ fn wire_model_for_base_url<'a>(
    let lowered_prefix = prefix.to_ascii_lowercase();

    if lowered_prefix == "openai" {
-        let trimmed_base_url = base_url.trim_end_matches('/');
-        let default_openai = DEFAULT_OPENAI_BASE_URL.trim_end_matches('/');
-        if matches!(
-            lowered_prefix.as_str(),
-            "xai" | "grok" | "kimi" | "gemini" | "gemma"
-        ) {
+        let normalized_base_url = normalize_base_url_for_model_routing(base_url);
+        let default_base_url = normalize_base_url_for_model_routing(config.default_base_url);
+        if normalized_base_url.eq_ignore_ascii_case(default_base_url)
+            || is_local_openai_compatible_base_url(base_url)
+        {
            return Cow::Borrowed(&model[pos + 1..]);
        }
-        if config.provider_name == "OpenAI" && trimmed_base_url != default_openai {
-            // Only preserve the full slug if it's NOT a model we want to strip
-            if !model.contains("gemini") && !model.contains("gemma") {
-                return Cow::Borrowed(model);
-            }
-        }
-        return Cow::Borrowed(&model[pos + 1..]);
+        return Cow::Borrowed(model);
    }

    if matches!(lowered_prefix.as_str(), "xai" | "grok" | "qwen" | "kimi") {
        return Cow::Borrowed(&model[pos + 1..]);
    }
+    if lowered_prefix == "local" {
+        return Cow::Borrowed(&model[pos + 1..]);
+    }

    Cow::Borrowed(model)
 }
@@ -1708,6 +1756,7 @@ mod tests {
        ToolChoice, ToolDefinition, ToolResultContentBlock,
    };
    use serde_json::json;
+    use std::borrow::Cow;
    use std::collections::BTreeMap;
    use std::sync::{Mutex, OnceLock};

@@ -2147,6 +2196,28 @@ mod tests {
        ));
    }

+    #[test]
+    fn local_openai_base_url_does_not_require_api_key() {
+        let _lock = env_lock();
+        let original_base_url = std::env::var_os("OPENAI_BASE_URL");
+        let original_api_key = std::env::var_os("OPENAI_API_KEY");
+        std::env::set_var("OPENAI_BASE_URL", "http://127.0.0.1:11434/v1");
+        std::env::remove_var("OPENAI_API_KEY");
+
+        let client = OpenAiCompatClient::from_env(OpenAiCompatConfig::openai())
+            .expect("local OpenAI-compatible endpoint should not require an API key");
+        assert_eq!(client.base_url(), "http://127.0.0.1:11434/v1");
+
+        match original_base_url {
+            Some(value) => std::env::set_var("OPENAI_BASE_URL", value),
+            None => std::env::remove_var("OPENAI_BASE_URL"),
+        }
+        match original_api_key {
+            Some(value) => std::env::set_var("OPENAI_API_KEY", value),
+            None => std::env::remove_var("OPENAI_API_KEY"),
+        }
+    }
+
    #[test]
    fn endpoint_builder_accepts_base_urls_and_full_endpoints() {
        assert_eq!(
@@ -2762,6 +2833,66 @@ mod tests {
        }
    }

+    #[test]
+    fn wire_model_strips_openai_prefix_for_default_and_local_preserves_custom_gateways() {
+        assert_eq!(
+            super::wire_model_for_base_url(
+                "openai/gpt-4o",
+                OpenAiCompatConfig::openai(),
+                super::DEFAULT_OPENAI_BASE_URL,
+            ),
+            Cow::Borrowed("gpt-4o")
+        );
+        assert_eq!(
+            super::wire_model_for_base_url(
+                "openai/qwen2.5-coder:7b",
+                OpenAiCompatConfig::openai(),
+                "http://127.0.0.1:11434/v1",
+            ),
+            Cow::Borrowed("qwen2.5-coder:7b")
+        );
+        assert_eq!(
+            super::wire_model_for_base_url(
+                "openai/llama3.2",
+                OpenAiCompatConfig::openai(),
+                "http://localhost:11434/v1/chat/completions",
+            ),
+            Cow::Borrowed("llama3.2")
+        );
+        assert_eq!(
+            super::wire_model_for_base_url(
+                "openai/gpt-4.1-mini",
+                OpenAiCompatConfig::openai(),
+                "https://openrouter.ai/api/v1",
+            ),
+            Cow::Borrowed("openai/gpt-4.1-mini")
+        );
+        assert_eq!(
+            super::wire_model_for_base_url(
+                "openai/gpt-4.1-mini",
+                OpenAiCompatConfig::openai(),
+                "https://not-localhost.example.com/v1",
+            ),
+            Cow::Borrowed("openai/gpt-4.1-mini")
+        );
+    }
+
+    #[test]
+    fn local_routing_prefix_strips_only_escape_hatch() {
+        assert_eq!(
+            super::strip_routing_prefix("local/Qwen/Qwen3.6-27B-FP8"),
+            "Qwen/Qwen3.6-27B-FP8"
+        );
+        assert_eq!(
+            super::wire_model_for_base_url(
+                "local/Qwen/Qwen3.6-27B-FP8",
+                OpenAiCompatConfig::openai(),
+                "http://127.0.0.1:8000/v1",
+            ),
+            Cow::Borrowed("Qwen/Qwen3.6-27B-FP8")
+        );
+    }
+
    #[test]
    fn check_request_body_size_allows_large_requests_for_openai() {
        // Create a request that exceeds DashScope's limit but is under OpenAI's 100MB limit
--- a/rust/crates/rusty-claude-cli/src/main.rs
+++ b/rust/crates/rusty-claude-cli/src/main.rs
@@ -2098,6 +2098,9 @@ fn validate_model_syntax(model: &str) -> Result<(), String> {
    if is_bare_provider_model(trimmed) {
        return Ok(());
    }
+    if is_local_openai_model_syntax(trimmed) {
+        return Ok(());
+    }
    // Check provider/model format: provider_id/model_id
    let parts: Vec<&str> = trimmed.split('/').collect();
    if parts.len() != 2 || parts[0].is_empty() || parts[1].is_empty() {
@@ -2128,6 +2131,13 @@ fn is_bare_provider_model(model: &str) -> bool {
    model.starts_with("claude-") || model.starts_with("gpt-")
 }

+fn is_local_openai_model_syntax(model: &str) -> bool {
+    if let Some(rest) = model.strip_prefix("local/") {
+        return !rest.is_empty() && rest.split('/').all(|segment| !segment.is_empty());
+    }
+    std::env::var_os("OPENAI_BASE_URL").is_some() && (model.contains(':') || model.contains('.'))
+}
+
 fn config_alias_for_current_dir(alias: &str) -> Option<String> {
    if alias.is_empty() {
        return None;
@@ -13577,6 +13587,35 @@ mod tests {
            !err_garbage.contains("Did you mean"),
            "Unrelated model errors should not get a hint: {err_garbage}"
        );
+
+        let original_openai_base_url = std::env::var_os("OPENAI_BASE_URL");
+        std::env::set_var("OPENAI_BASE_URL", "http://127.0.0.1:11434/v1");
+        match parse_args(&[
+            "prompt".to_string(),
+            "test".to_string(),
+            "--model".to_string(),
+            "qwen2.5-coder:7b".to_string(),
+        ])
+        .expect("Ollama-style tag should parse when OPENAI_BASE_URL is set")
+        {
+            CliAction::Prompt { model, .. } => assert_eq!(model, "qwen2.5-coder:7b"),
+            other => panic!("expected CliAction::Prompt, got: {other:?}"),
+        }
+        match parse_args(&[
+            "prompt".to_string(),
+            "test".to_string(),
+            "--model".to_string(),
+            "local/Qwen/Qwen3.6-27B-FP8".to_string(),
+        ])
+        .expect("local/ slash-containing model should parse")
+        {
+            CliAction::Prompt { model, .. } => assert_eq!(model, "local/Qwen/Qwen3.6-27B-FP8"),
+            other => panic!("expected CliAction::Prompt, got: {other:?}"),
+        }
+        match original_openai_base_url {
+            Some(value) => std::env::set_var("OPENAI_BASE_URL", value),
+            None => std::env::remove_var("OPENAI_BASE_URL"),
+        }
    }

    #[test]