Tibo2403 · Tibo2403 · Jun 27, 2026 · Jun 27, 2026 · Jun 27, 2026 · Jun 27, 2026
@@ -8,7 +8,9 @@ New-Item -ItemType Directory -Force -Path $target | Out-Null
 $files = @(
     'litellm-cost-routing.yaml',
     'codex_key_session_web.py',
-    'Start-CodexKeySessionWeb.ps1'
+    'Start-CodexKeySessionWeb.ps1',
+    'Start-CodexQwenOllama.ps1',
+    'Test-CodexLiteLLMDispatch.ps1'
 )
 
 foreach ($file in $files) {

@@ -58,17 +58,20 @@ dispatches those aliases across OpenAI and Gemini while keeping API keys in
 environment variables. When `HF_TOKEN` is available, it can also route Hugging
 Face and multi-provider tasks through the `codex-hf-cheap` and `codex-hf-fast`
 LiteLLM aliases, or launch an optional `cost-routing-hf` Codex profile that
-points directly at the Hugging Face router. `codex-routing-policy.yaml` keeps
-the default provider rules and fallback order editable without changing Python
-code.
+points directly at the Hugging Face router. `codex-qwen-local` is available as
+a local Ollama fallback through `Qwen/Qwen2.5-Coder-7B-Instruct-GGUF`.
+`codex-routing-policy.yaml` keeps the default provider rules and fallback order
+editable without changing Python code.
 
 See [`README_Codex_Cost_Routing.md`](README_Codex_Cost_Routing.md) for setup,
 activation, LiteLLM configuration, and usage instructions.
 
 To enter OpenAI, Gemini, or Hugging Face keys through a local page for one
 session, run `Start-CodexKeySessionWeb.ps1` and open
 `http://127.0.0.1:8787/`. Keys are kept in memory for the LiteLLM subprocess
-and are not written to disk.
+and are not written to disk. Use `Test-CodexLiteLLMDispatch.ps1` to verify the
+local proxy aliases, or add `-Call -Model codex-hf-cheap` after entering a
+provider key to make one minimal dispatch request.
 
 ## LLM Review Tools
 

@@ -10,21 +10,52 @@ applies budgets, and selects one of these LiteLLM aliases:
 - `codex-default` for normal coding work
 - `codex-long` for long-context reads, log review, and synthesis
 - `codex-deep` for difficult debugging, security, and architecture decisions
+- `codex-no-openai` for Gemini + local Qwen routing when OpenAI quota is low
+  or exhausted
 - `codex-cheap` and `codex-strong` as backward-compatible aliases
 - `codex-hf-cheap` for simple Hugging Face / open-model tasks when `HF_TOKEN`
   is set
 - `codex-hf-fast` for larger Hugging Face / multi-provider tasks when
   `HF_TOKEN` is set
 
-OpenAI and Gemini are both configured through LiteLLM model groups. The normal
-default keeps most code-generation traffic on OpenAI while letting Gemini absorb
-long-context and lower-risk work. This reduces token saturation without sending
-high-stakes changes blindly to the cheapest model.
+OpenAI, Gemini, and local Qwen are configured through LiteLLM model groups. The
+normal default now balances OpenAI with Gemini relief and keeps Qwen as a local
+zero-cost fallback. This reduces token saturation without sending high-stakes
+changes blindly to the cheapest model.
 
 API keys are never committed or written to a configuration file. `OPENAI_API_KEY`
 is required for the default profile; `GEMINI_API_KEY` is optional but recommended
 to activate the OpenAI/Gemini dispatching path.
 
+## OpenAI Quota Saver
+
+When OpenAI quota is low or exhausted, use the `codex-no-openai` alias. It routes
+through Gemini first and local Qwen second, without OpenAI entries in the model
+group:
+
+```powershell
+codex --model codex-no-openai
+```
+
+For one-shot wrapper calls, either force the provider:
+
+```powershell
+python .\scripts\python\codex_cost_router.py run --dry-run `
+  --provider no-openai `
+  "Refactor this Python API without using OpenAI quota"
+```
+
+or set a temporary session mode:
+
+```powershell
+$env:CODEX_ROUTER_OPENAI_MODE = 'avoid'
+python .\scripts\python\codex_cost_router.py run --dry-run `
+  "Refactor this Python API without using OpenAI quota"
+```
+
+For a durable default, set `avoid_openai: true` in
+`codex-routing-policy.yaml`.
+
 ## Hugging Face Integration
 
 Hugging Face can be used in two optional places.
@@ -34,7 +65,7 @@ provider pool. The local config still includes two optional aliases:
 
 ```yaml
 codex-hf-cheap -> huggingface/groq/openai/gpt-oss-120b
-codex-hf-fast  -> huggingface/together/deepseek-ai/DeepSeek-R1
+codex-hf-fast  -> huggingface/together/openai/gpt-oss-120b
 ```
 
 Set `HF_TOKEN` in the shell before starting the router. A fine-grained token
@@ -56,6 +87,35 @@ python .\scripts\python\codex_cost_router.py run --dry-run `
 `--provider auto` routes Hugging Face or multi-provider prompts to the HF aliases
 only when `HF_TOKEN` is present. Otherwise it keeps the OpenAI-backed aliases.
 
+LiteLLM also uses `HUGGINGFACE_API_KEY` while resolving some Inference Provider
+mappings. The local web session exports the submitted `HF_TOKEN` under both
+names for the LiteLLM subprocess. If you start LiteLLM manually, set both names
+to the same token:
+
+```powershell
+$env:HF_TOKEN = 'hf_...'
+$env:HUGGINGFACE_API_KEY = $env:HF_TOKEN
+```
+
+## Local Ollama Qwen Fallback
+
+The local LiteLLM config includes `codex-qwen-local` as a final fallback for
+the main Codex aliases. It uses Ollama's OpenAI-compatible endpoint with the
+lighter Qwen2.5 Coder 7B GGUF model:
+
+```powershell
+.\scripts\python\Start-CodexQwenOllama.ps1
+```
+
+The script starts Ollama if needed and pulls:
+
+```text
+hf.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF:latest
+```
+
+LiteLLM then reaches it through `http://127.0.0.1:11434/v1`. No provider API key
+is required for this local fallback.
+
 Second, Hugging Face can be added as an optional Codex-facing layer. Running
 `enable` now installs two managed profiles:
 
@@ -100,6 +160,7 @@ Default policy:
 default_provider: auto
 default_codex_provider: litellm
 open_models_only: false
+avoid_openai: false
 max_cost_usd: 0.0
 
 task_provider_rules:
@@ -155,9 +216,11 @@ If you prefer entering keys in a local page for one work session, start:
 ```
 
 Then open `http://127.0.0.1:8787/`, paste `OPENAI_API_KEY`,
-`GEMINI_API_KEY`, or `HF_TOKEN`, and submit the form. The page starts the
-LiteLLM proxy on `http://127.0.0.1:4000/v1` with those keys only in the proxy
-process environment. The keys are not written to disk and the web server
+`GEMINI_API_KEY`, `HF_TOKEN`, or optional custom Qwen endpoint fields, and
+submit the form. For the default local Qwen/Ollama fallback, run
+`Start-CodexQwenOllama.ps1`; no Qwen API key is needed. The page starts the
+LiteLLM proxy on `http://127.0.0.1:4000/v1` with submitted values only in the
+proxy process environment. The keys are not written to disk and the web server
 suppresses request logging.
 
 To launch the optional Hugging Face-facing profile instead of the local LiteLLM
@@ -183,6 +246,23 @@ python .\scripts\python\codex_cost_router.py doctor
 If a browser opened on `http://localhost:4000/health` shows `Unauthorized`,
 that is expected: the local proxy is protected by `LITELLM_API_KEY`.
 
+Validate the local proxy aliases without making a paid/model call:
+
+```powershell
+.\scripts\python\Test-CodexLiteLLMDispatch.ps1
+```
+
+Run a real minimal provider call after entering the relevant key in the local
+web page:
+
+```powershell
+.\scripts\python\Test-CodexLiteLLMDispatch.ps1 -Model codex-hf-cheap -Call
+.\scripts\python\Test-CodexLiteLLMDispatch.ps1 -Model codex-qwen-local -Call
+.\scripts\python\Test-CodexLiteLLMDispatch.ps1 -Model codex-default -Call
+```
+
+The test prints a compact JSON result and never prints provider tokens.
+
 ## Optimized One-Shot Requests
 
 Use the Python wrapper when prompt cleanup and dynamic model routing are needed:
@@ -225,6 +305,7 @@ Prompts and API keys are not logged.
 - `codex_cost_router.py`: prompt optimization and one-shot routing.
 - `codex_key_session_web.py`: local-only web form for session keys.
 - `Start-CodexKeySessionWeb.ps1`: PowerShell launcher for the local key page.
+- `Test-CodexLiteLLMDispatch.ps1`: local proxy alias and optional call test.
 - `codex-routing-policy.yaml`: editable routing policy and fallback order.
 - `litellm-cost-routing.yaml`: local LiteLLM OSS OpenAI/Gemini model groups,
   context-window fallbacks, cooldowns, and compatibility aliases.

@@ -0,0 +1,55 @@
+[CmdletBinding()]
+param(
+    [string]$Model = "hf.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF:latest",
+    [switch]$SkipPull
+)
+
+$ErrorActionPreference = "Stop"
+
+$ollama = Get-Command ollama -ErrorAction SilentlyContinue
+if (-not $ollama) {
+    throw "Ollama is not installed or not available in PATH."
+}
+
+try {
+    Invoke-RestMethod -Uri "http://127.0.0.1:11434/api/tags" -TimeoutSec 3 | Out-Null
+}
+catch {
+    Start-Process -WindowStyle Hidden -FilePath $ollama.Source -ArgumentList @("serve")
+    $ready = $false
+    for ($i = 0; $i -lt 40; $i++) {
+        Start-Sleep -Milliseconds 500
+        try {
+            Invoke-RestMethod -Uri "http://127.0.0.1:11434/api/tags" -TimeoutSec 2 | Out-Null
+            $ready = $true
+            break
+        }
+        catch {
+            $ready = $false
+        }
+    }
+    if (-not $ready) {
+        throw "Ollama did not become ready on http://127.0.0.1:11434."
+    }
+}
+
+$tags = Invoke-RestMethod -Uri "http://127.0.0.1:11434/api/tags" -TimeoutSec 10
+$modelNames = @($tags.models | ForEach-Object { $_.name })
+if (($modelNames -notcontains $Model) -and (-not $SkipPull)) {
+    & $ollama.Source pull $Model
+    if ($LASTEXITCODE -ne 0) {
+        throw "ollama pull failed for $Model"
+    }
+}
+
+$tags = Invoke-RestMethod -Uri "http://127.0.0.1:11434/api/tags" -TimeoutSec 10
+$modelNames = @($tags.models | ForEach-Object { $_.name })
+if ($modelNames -notcontains $Model) {
+    throw "$Model is not installed. Run without -SkipPull to download it."
+}
+
+[pscustomobject]@{
+    ok = $true
+    model = $Model
+    api_base = "http://127.0.0.1:11434/v1"
+} | ConvertTo-Json
@@ -0,0 +1,95 @@
+[CmdletBinding()]
+param(
+    [string]$BaseUrl = "http://127.0.0.1:4000/v1",
+    [string]$ApiKey = "sk-local-codex",
+    [string]$Model = "codex-default",
+    [switch]$Call,
+    [int]$TimeoutSec = 90
+)
+
+$ErrorActionPreference = "Stop"
+
+$headers = @{
+    "Authorization" = "Bearer $ApiKey"
+    "Content-Type" = "application/json"
+}
+
+function ConvertTo-ShortError {
+    param([object]$ErrorRecord)
+
+    $message = $ErrorRecord.Exception.Message
+    if ($ErrorRecord.ErrorDetails -and $ErrorRecord.ErrorDetails.Message) {
+        $message = $ErrorRecord.ErrorDetails.Message
+    }
+    if ($message.Length -gt 900) {
+        return $message.Substring(0, 900) + "..."
+    }
+    return $message
+}
+
+$models = Invoke-RestMethod -Uri "$BaseUrl/models" -Headers $headers -Method Get -TimeoutSec 10
+$modelIds = @($models.data | ForEach-Object { $_.id })
+$requiredAliases = @(
+    "codex-light",
+    "codex-default",
+    "codex-long",
+    "codex-deep",
+    "codex-qwen-local",
+    "codex-hf-cheap",
+    "codex-hf-fast"
+)
+$missingAliases = @($requiredAliases | Where-Object { $modelIds -notcontains $_ })
+
+$health = $null
+try {
+    $healthUrl = $BaseUrl -replace "/v1$", ""
+    $health = Invoke-RestMethod -Uri "$healthUrl/health" -Headers $headers -Method Get -TimeoutSec $TimeoutSec
+}
+catch {
+    $health = [pscustomobject]@{
+        healthy_count = $null
+        unhealthy_count = $null
+        health_error = ConvertTo-ShortError $_
+    }
+}
+
+$callResult = $null
+if ($Call) {
+    $body = @{
+        model = $Model
+        messages = @(
+            @{
+                role = "user"
+                content = "Reply with exactly: dispatch ok"
+            }
+        )
+        max_tokens = 16
+        temperature = 0
+    } | ConvertTo-Json -Depth 6
+
+    try {
+        $response = Invoke-RestMethod -Uri "$BaseUrl/chat/completions" -Headers $headers -Method Post -Body $body -TimeoutSec $TimeoutSec
+        $callResult = [pscustomobject]@{
+            ok = $true
+            model = $response.model
+            content = $response.choices[0].message.content
+        }
+    }
+    catch {
+        $callResult = [pscustomobject]@{
+            ok = $false
+            error = ConvertTo-ShortError $_
+        }
+    }
+}
+
+[pscustomobject]@{
+    ok = ($missingAliases.Count -eq 0 -and (-not $Call -or ($callResult -and $callResult.ok)))
+    base_url = $BaseUrl
+    aliases_present = @($requiredAliases | Where-Object { $modelIds -contains $_ })
+    aliases_missing = $missingAliases
+    healthy_count = $health.healthy_count
+    unhealthy_count = $health.unhealthy_count
+    health_error = $health.health_error
+    call = $callResult
+} | ConvertTo-Json -Depth 6
@@ -1,9 +1,12 @@
 # Codex cost-routing policy.
 # CLI options still have priority, then environment variables, then this file.
+# Provider choices: auto, openai, gemini, huggingface, qwen.
+# qwen uses a self-hosted OpenAI-compatible endpoint via QWEN_API_BASE.
 
 default_provider: auto
 default_codex_provider: litellm
 open_models_only: false
+avoid_openai: false
 max_cost_usd: 0.0
 
 task_provider_rules: