4 changes: 3 additions & 1 deletion scripts/python/Install-CodexLocalLiteLLMAssets.ps1
@@ -8,7 +8,9 @@ New-Item -ItemType Directory -Force -Path $target | Out-Null
 $files = @(
     'litellm-cost-routing.yaml',
     'codex_key_session_web.py',
-    'Start-CodexKeySessionWeb.ps1'
+    'Start-CodexKeySessionWeb.ps1',
+    'Start-CodexQwenOllama.ps1',
+    'Test-CodexLiteLLMDispatch.ps1'
 )

foreach ($file in $files) {
11 changes: 7 additions & 4 deletions scripts/python/README.md
@@ -58,17 +58,20 @@ dispatches those aliases across OpenAI and Gemini while keeping API keys in
environment variables. When `HF_TOKEN` is available, it can also route Hugging
Face and multi-provider tasks through the `codex-hf-cheap` and `codex-hf-fast`
LiteLLM aliases, or launch an optional `cost-routing-hf` Codex profile that
-points directly at the Hugging Face router. `codex-routing-policy.yaml` keeps
-the default provider rules and fallback order editable without changing Python
-code.
+points directly at the Hugging Face router. `codex-qwen-local` is available as
+a local Ollama fallback through `Qwen/Qwen2.5-Coder-7B-Instruct-GGUF`.
+`codex-routing-policy.yaml` keeps the default provider rules and fallback order
+editable without changing Python code.

See [`README_Codex_Cost_Routing.md`](README_Codex_Cost_Routing.md) for setup,
activation, LiteLLM configuration, and usage instructions.

To enter OpenAI, Gemini, or Hugging Face keys through a local page for one
session, run `Start-CodexKeySessionWeb.ps1` and open
`http://127.0.0.1:8787/`. Keys are kept in memory for the LiteLLM subprocess
-and are not written to disk.
+and are not written to disk. Use `Test-CodexLiteLLMDispatch.ps1` to verify the
+local proxy aliases, or add `-Call -Model codex-hf-cheap` after entering a
+provider key to make one minimal dispatch request.

## LLM Review Tools

97 changes: 89 additions & 8 deletions scripts/python/README_Codex_Cost_Routing.md
@@ -10,21 +10,52 @@ applies budgets, and selects one of these LiteLLM aliases:
- `codex-default` for normal coding work
- `codex-long` for long-context reads, log review, and synthesis
- `codex-deep` for difficult debugging, security, and architecture decisions
- `codex-no-openai` for Gemini + local Qwen routing when OpenAI quota is low
or exhausted
- `codex-cheap` and `codex-strong` as backward-compatible aliases
- `codex-hf-cheap` for simple Hugging Face / open-model tasks when `HF_TOKEN`
is set
- `codex-hf-fast` for larger Hugging Face / multi-provider tasks when
`HF_TOKEN` is set

-OpenAI and Gemini are both configured through LiteLLM model groups. The normal
-default keeps most code-generation traffic on OpenAI while letting Gemini absorb
-long-context and lower-risk work. This reduces token saturation without sending
-high-stakes changes blindly to the cheapest model.
+OpenAI, Gemini, and local Qwen are configured through LiteLLM model groups. The
+normal default now balances OpenAI with Gemini relief and keeps Qwen as a local
+zero-cost fallback. This reduces token saturation without sending high-stakes
+changes blindly to the cheapest model.

API keys are never committed or written to a configuration file. `OPENAI_API_KEY`
is required for the default profile; `GEMINI_API_KEY` is optional but recommended
to activate the OpenAI/Gemini dispatching path.

## OpenAI Quota Saver

When OpenAI quota is low or exhausted, use the `codex-no-openai` alias. It routes
through Gemini first and local Qwen second, without OpenAI entries in the model
group:

```powershell
codex --model codex-no-openai
```

For one-shot wrapper calls, either force the provider:

```powershell
python .\scripts\python\codex_cost_router.py run --dry-run `
--provider no-openai `
"Refactor this Python API without using OpenAI quota"
```

or set a temporary session mode:

```powershell
$env:CODEX_ROUTER_OPENAI_MODE = 'avoid'
python .\scripts\python\codex_cost_router.py run --dry-run `
"Refactor this Python API without using OpenAI quota"
```

For a durable default, set `avoid_openai: true` in
`codex-routing-policy.yaml`.

## Hugging Face Integration

Hugging Face can be used in two optional places.
@@ -34,7 +65,7 @@ provider pool. The local config still includes two optional aliases:

```yaml
codex-hf-cheap -> huggingface/groq/openai/gpt-oss-120b
-codex-hf-fast -> huggingface/together/deepseek-ai/DeepSeek-R1
+codex-hf-fast -> huggingface/together/openai/gpt-oss-120b
```

Set `HF_TOKEN` in the shell before starting the router. A fine-grained token
@@ -56,6 +87,35 @@ python .\scripts\python\codex_cost_router.py run --dry-run `
`--provider auto` routes Hugging Face or multi-provider prompts to the HF aliases
only when `HF_TOKEN` is present. Otherwise it keeps the OpenAI-backed aliases.

LiteLLM also uses `HUGGINGFACE_API_KEY` while resolving some Inference Provider
mappings. The local web session exports the submitted `HF_TOKEN` under both
names for the LiteLLM subprocess. If you start LiteLLM manually, set both names
to the same token:

```powershell
$env:HF_TOKEN = 'hf_...'
$env:HUGGINGFACE_API_KEY = $env:HF_TOKEN
```
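
For a Python launcher, the same mirroring can be sketched as below. This is only an illustration of the idea, not the code in `codex_key_session_web.py`; the function name `build_litellm_env` and its parameters are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def build_litellm_env(
    hf_token: Optional[str],
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: copy the environment and mirror the HF token."""
    # Work on a copy so the parent process environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    if hf_token:
        # Export the same token under both names for the LiteLLM subprocess.
        env["HF_TOKEN"] = hf_token
        env["HUGGINGFACE_API_KEY"] = hf_token
    return env
```

The returned dictionary can be passed as the `env=` argument of `subprocess.Popen`, which keeps the token in the child process environment only.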

## Local Ollama Qwen Fallback

The local LiteLLM config includes `codex-qwen-local` as a final fallback for
the main Codex aliases. It uses Ollama's OpenAI-compatible endpoint with the
lighter Qwen2.5 Coder 7B GGUF model:

```powershell
.\scripts\python\Start-CodexQwenOllama.ps1
```

The script starts Ollama if needed and pulls:

```text
hf.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF:latest
```

LiteLLM then reaches it through `http://127.0.0.1:11434/v1`. No provider API key
is required for this local fallback.
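
The readiness check performed by `Start-CodexQwenOllama.ps1` can be approximated in Python as follows. This is a hedged sketch that assumes Ollama's `/api/tags` response has the shape `{"models": [{"name": ...}]}`, which is what the PowerShell script reads; the function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List

OLLAMA_BASE = "http://127.0.0.1:11434"
DEFAULT_MODEL = "hf.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF:latest"


def installed_models(tags: Dict[str, Any]) -> List[str]:
    """Extract model names from an /api/tags payload (hypothetical helper)."""
    models = tags.get("models") or []
    return [m["name"] for m in models if m.get("name")]


def model_is_installed(tags: Dict[str, Any], model: str = DEFAULT_MODEL) -> bool:
    """Return True when the exact model tag is already present locally."""
    return model in installed_models(tags)


def fetch_tags(base: str = OLLAMA_BASE, timeout: float = 10.0) -> Dict[str, Any]:
    """Query the local Ollama server; raises if it is not reachable."""
    with urllib.request.urlopen(f"{base}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

Calling `model_is_installed(fetch_tags())` before starting the proxy gives an early signal that the fallback model still needs to be pulled.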

Second, Hugging Face can be added as an optional Codex-facing layer. Running
`enable` now installs two managed profiles:

@@ -100,6 +160,7 @@ Default policy:
default_provider: auto
default_codex_provider: litellm
open_models_only: false
avoid_openai: false
max_cost_usd: 0.0

task_provider_rules:
@@ -155,9 +216,11 @@ If you prefer entering keys in a local page for one work session, start:
```

Then open `http://127.0.0.1:8787/`, paste `OPENAI_API_KEY`,
-`GEMINI_API_KEY`, or `HF_TOKEN`, and submit the form. The page starts the
-LiteLLM proxy on `http://127.0.0.1:4000/v1` with those keys only in the proxy
-process environment. The keys are not written to disk and the web server
+`GEMINI_API_KEY`, `HF_TOKEN`, or optional custom Qwen endpoint fields, and
+submit the form. For the default local Qwen/Ollama fallback, run
+`Start-CodexQwenOllama.ps1`; no Qwen API key is needed. The page starts the
+LiteLLM proxy on `http://127.0.0.1:4000/v1` with submitted values only in the
+proxy process environment. The keys are not written to disk and the web server
suppresses request logging.

To launch the optional Hugging Face-facing profile instead of the local LiteLLM
@@ -183,6 +246,23 @@ python .\scripts\python\codex_cost_router.py doctor
If a browser opened on `http://localhost:4000/health` shows `Unauthorized`,
that is expected: the local proxy is protected by `LITELLM_API_KEY`.

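Validate the local proxy aliases without sending a chat completion request (the
script also queries the proxy's `/health` endpoint, which in LiteLLM may probe
the configured models on its own):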

```powershell
.\scripts\python\Test-CodexLiteLLMDispatch.ps1
```

Run a real minimal provider call after entering the relevant key in the local
web page:

```powershell
.\scripts\python\Test-CodexLiteLLMDispatch.ps1 -Model codex-hf-cheap -Call
.\scripts\python\Test-CodexLiteLLMDispatch.ps1 -Model codex-qwen-local -Call
.\scripts\python\Test-CodexLiteLLMDispatch.ps1 -Model codex-default -Call
```

The test prints a compact JSON result and never prints provider tokens.
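
The pass/fail rule behind that JSON result can be sketched in Python. This mirrors the logic of `Test-CodexLiteLLMDispatch.ps1` (every required alias must be present, and the optional call must succeed when requested); the function name `summarize_dispatch` is hypothetical and the health fields are left out for brevity.

```python
from typing import Any, Dict, Iterable, Optional

# Alias list copied from Test-CodexLiteLLMDispatch.ps1.
REQUIRED_ALIASES = [
    "codex-light",
    "codex-default",
    "codex-long",
    "codex-deep",
    "codex-qwen-local",
    "codex-hf-cheap",
    "codex-hf-fast",
]


def summarize_dispatch(
    model_ids: Iterable[str],
    call_requested: bool,
    call_result: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build the compact result for one test run."""
    ids = set(model_ids)
    present = [a for a in REQUIRED_ALIASES if a in ids]
    missing = [a for a in REQUIRED_ALIASES if a not in ids]
    # Without -Call only the alias check counts; with -Call the call must be ok.
    call_ok = (not call_requested) or bool(call_result and call_result.get("ok"))
    return {
        "ok": (not missing) and call_ok,
        "aliases_present": present,
        "aliases_missing": missing,
        "call": call_result,
    }
```

Passing the result to `json.dumps` yields output comparable to the script's; no provider token is part of the structure.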

## Optimized One-Shot Requests

Use the Python wrapper when prompt cleanup and dynamic model routing are needed:
@@ -225,6 +305,7 @@ Prompts and API keys are not logged.
- `codex_cost_router.py`: prompt optimization and one-shot routing.
- `codex_key_session_web.py`: local-only web form for session keys.
- `Start-CodexKeySessionWeb.ps1`: PowerShell launcher for the local key page.
- `Test-CodexLiteLLMDispatch.ps1`: local proxy alias and optional call test.
- `codex-routing-policy.yaml`: editable routing policy and fallback order.
- `litellm-cost-routing.yaml`: local LiteLLM OSS OpenAI/Gemini model groups,
context-window fallbacks, cooldowns, and compatibility aliases.
55 changes: 55 additions & 0 deletions scripts/python/Start-CodexQwenOllama.ps1
@@ -0,0 +1,55 @@
[CmdletBinding()]
param(
    [string]$Model = "hf.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF:latest",
    [switch]$SkipPull
)

$ErrorActionPreference = "Stop"

$ollama = Get-Command ollama -ErrorAction SilentlyContinue
if (-not $ollama) {
    throw "Ollama is not installed or not available in PATH."
}

try {
    Invoke-RestMethod -Uri "http://127.0.0.1:11434/api/tags" -TimeoutSec 3 | Out-Null
}
catch {
    Start-Process -WindowStyle Hidden -FilePath $ollama.Source -ArgumentList @("serve")
    $ready = $false
    for ($i = 0; $i -lt 40; $i++) {
        Start-Sleep -Milliseconds 500
        try {
            Invoke-RestMethod -Uri "http://127.0.0.1:11434/api/tags" -TimeoutSec 2 | Out-Null
            $ready = $true
            break
        }
        catch {
            $ready = $false
        }
    }
    if (-not $ready) {
        throw "Ollama did not become ready on http://127.0.0.1:11434."
    }
}

$tags = Invoke-RestMethod -Uri "http://127.0.0.1:11434/api/tags" -TimeoutSec 10
$modelNames = @($tags.models | ForEach-Object { $_.name })
if (($modelNames -notcontains $Model) -and (-not $SkipPull)) {
    & $ollama.Source pull $Model
    if ($LASTEXITCODE -ne 0) {
        throw "ollama pull failed for $Model"
    }
}

$tags = Invoke-RestMethod -Uri "http://127.0.0.1:11434/api/tags" -TimeoutSec 10
$modelNames = @($tags.models | ForEach-Object { $_.name })
if ($modelNames -notcontains $Model) {
    throw "$Model is not installed. Run without -SkipPull to download it."
}

[pscustomobject]@{
    ok       = $true
    model    = $Model
    api_base = "http://127.0.0.1:11434/v1"
} | ConvertTo-Json
95 changes: 95 additions & 0 deletions scripts/python/Test-CodexLiteLLMDispatch.ps1
@@ -0,0 +1,95 @@
[CmdletBinding()]
param(
    [string]$BaseUrl = "http://127.0.0.1:4000/v1",
    [string]$ApiKey = "sk-local-codex",
    [string]$Model = "codex-default",
    [switch]$Call,
    [int]$TimeoutSec = 90
)

$ErrorActionPreference = "Stop"

$headers = @{
    "Authorization" = "Bearer $ApiKey"
    "Content-Type"  = "application/json"
}

function ConvertTo-ShortError {
    param([object]$ErrorRecord)

    $message = $ErrorRecord.Exception.Message
    if ($ErrorRecord.ErrorDetails -and $ErrorRecord.ErrorDetails.Message) {
        $message = $ErrorRecord.ErrorDetails.Message
    }
    if ($message.Length -gt 900) {
        return $message.Substring(0, 900) + "..."
    }
    return $message
}

$models = Invoke-RestMethod -Uri "$BaseUrl/models" -Headers $headers -Method Get -TimeoutSec 10
$modelIds = @($models.data | ForEach-Object { $_.id })
$requiredAliases = @(
    "codex-light",
    "codex-default",
    "codex-long",
    "codex-deep",
    "codex-qwen-local",
    "codex-hf-cheap",
    "codex-hf-fast"
)
$missingAliases = @($requiredAliases | Where-Object { $modelIds -notcontains $_ })

$health = $null
try {
    $healthUrl = $BaseUrl -replace "/v1$", ""
    $health = Invoke-RestMethod -Uri "$healthUrl/health" -Headers $headers -Method Get -TimeoutSec $TimeoutSec
}
catch {
    $health = [pscustomobject]@{
        healthy_count   = $null
        unhealthy_count = $null
        health_error    = ConvertTo-ShortError $_
    }
}

$callResult = $null
if ($Call) {
    $body = @{
        model       = $Model
        messages    = @(
            @{
                role    = "user"
                content = "Reply with exactly: dispatch ok"
            }
        )
        max_tokens  = 16
        temperature = 0
    } | ConvertTo-Json -Depth 6

    try {
        $response = Invoke-RestMethod -Uri "$BaseUrl/chat/completions" -Headers $headers -Method Post -Body $body -TimeoutSec $TimeoutSec
        $callResult = [pscustomobject]@{
            ok      = $true
            model   = $response.model
            content = $response.choices[0].message.content
        }
    }
    catch {
        $callResult = [pscustomobject]@{
            ok    = $false
            error = ConvertTo-ShortError $_
        }
    }
}

[pscustomobject]@{
    ok              = ($missingAliases.Count -eq 0 -and (-not $Call -or ($callResult -and $callResult.ok)))
    base_url        = $BaseUrl
    aliases_present = @($requiredAliases | Where-Object { $modelIds -contains $_ })
    aliases_missing = $missingAliases
    healthy_count   = $health.healthy_count
    unhealthy_count = $health.unhealthy_count
    health_error    = $health.health_error
    call            = $callResult
} | ConvertTo-Json -Depth 6
3 changes: 3 additions & 0 deletions scripts/python/codex-routing-policy.yaml
@@ -1,9 +1,12 @@
# Codex cost-routing policy.
# CLI options still have priority, then environment variables, then this file.
# Provider choices: auto, openai, gemini, huggingface, qwen.
# qwen uses a self-hosted OpenAI-compatible endpoint via QWEN_API_BASE.

default_provider: auto
default_codex_provider: litellm
open_models_only: false
avoid_openai: false
max_cost_usd: 0.0

task_provider_rules: