Skip to content

feat: add opt-in model router catalog picker filter#1

Open
BenKalsky wants to merge 2 commits into
mainfrom
ben/model-router-picker-catalog-opt-in-20260506
Open

feat: add opt-in model router catalog picker filter#1
BenKalsky wants to merge 2 commits into
mainfrom
ben/model-router-picker-catalog-opt-in-20260506

Conversation

@BenKalsky
Copy link
Copy Markdown
Member

Summary

Adds an opt-in /model picker catalog filter for Hermes, driven by the shared model-router catalog.

When HERMES_MODEL_ROUTER_CATALOG points to a shared catalog JSON, Hermes filters picker rows to the curated model-router catalog. Without the env var, behavior remains unchanged.

Safety

  • Opt-in only via HERMES_MODEL_ROUTER_CATALOG
  • No default behavior change
  • No live routing/enforcement change
  • Keeps provider switch callback slugs compatible with Hermes runtime

Verification

venv/bin/python -m pytest \
  tests/hermes_cli/test_model_router_picker_catalog.py \
  tests/hermes_cli/test_user_providers_model_switch.py \
  -q -o 'addopts='

Result: 32 passed

Additional smoke with real shared catalog:

  • With HERMES_MODEL_ROUTER_CATALOG=/home/gidon/shared-agents/model-router/catalog.json: catalog_rows=9
  • Without env var: catalog_rows=0

@BenKalsky BenKalsky force-pushed the ben/model-router-picker-catalog-opt-in-20260506 branch from e12e8d6 to d06f7ae Compare May 6, 2026 16:34
@chatgpt-codex-connector
Copy link
Copy Markdown

💡 Codex Review

https://github.com/Digitizers/hermes-agent/blob/e12e8d6feaa219be1984ee2faa509e0e1338676a/hermes_cli/model_switch.py#L1041-L1042
P2 Badge Map common custom Ollama rows to the catalog

When HERMES_MODEL_ROUTER_CATALOG is set and the user has the common local Ollama custom providers named like Ollama — GLM 5.1, section 4 groups them as slug custom:ollama with display name Ollama (see existing grouping tests). This mapping only recognizes custom:ollama-local / Ollama Local, so those rows do not match the catalog's ollama entry and _align_picker_results_with_model_router_catalog() treats them as unmatched user-defined providers, leaving the uncurated local model list visible in the picker despite the opt-in catalog filter.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@BenKalsky
Copy link
Copy Markdown
Member Author

Status update after follow-up push:

  • Added AUTHOR_MAP entry for ben@digitizer.co.il -> BenKalsky in scripts/release.py.
  • Local attribution check now passes against digitizers/main...HEAD.
  • Local focused picker tests pass:
    venv/bin/python -m pytest tests/hermes_cli/test_model_router_picker_catalog.py tests/hermes_cli/test_user_providers_model_switch.py -q -o addopts=
    Result: 32 passed.
  • git diff --check passes.
  • CI rerun: attribution, supply-chain, e2e, and both nix jobs pass.

Remaining red check: full test job still fails with 33 unrelated upstream-suite failures across Bedrock/context, cron prompt, Discord/Telegram gateway tests, update tests, browser chromium checks, delegate tests, dockerfile tests, env sandbox tests, etc. The PR diff is limited to:

  • hermes_cli/model_switch.py
  • tests/hermes_cli/test_model_router_picker_catalog.py
  • scripts/release.py

I do not recommend bundling broad upstream test-suite fixes into this focused picker PR.

BenKalsky pushed a commit that referenced this pull request May 22, 2026
Three issues flagged by the Copilot review on this PR:

1. Double JSON emit on stage failure (Copilot #1, NousResearch#2). When -Stage <name>
   ran a worker that threw, Invoke-Stage's finally emitted a JSON result
   frame AND the entry-point catch emitted a second error frame --
   producing two concatenated JSON objects on stdout and breaking the
   one-line-per-invocation contract that drivers parse against. Same
   issue applied to -Json mode on a full install (every stage's finally
   plus a final error frame missing duration_ms/skipped).

   Fix: Invoke-Stage's finally now sets $script:_StageEmittedErrorFrame
   when it emits a failure frame; the entry-point catch checks the flag
   and skips its own emit, still exit 1.

2. $prevEAP uninitialized on early try-block throw (Copilot NousResearch#3). In
   Install-Uv, Test-Python, Test-Node's winget fallback,
   _Run-NpmInstall, and the playwright block, '$prevEAP =
   $ErrorActionPreference' lived as the first statement INSIDE the
   try. If anything between 'try {' and that line threw (Write-Info on
   an unusual host, the npx-finding loop, etc.), the catch's
   'if ($prevEAP) { ... }' restore was a no-op and EAP could remain
   relaxed.

   Fix: hoist '$prevEAP = $ErrorActionPreference' to the line
   immediately before 'try {' in all five sites. Catch's restore is
   now always meaningful regardless of where in the try the throw
   originated.

No change to Invoke-Stage's success path or to the four lint-clean EAP
sites (Test-Node was the only winget-related catch). All 19 metadata
smoke tests still pass.
BenKalsky pushed a commit that referenced this pull request May 22, 2026
Four findings from Copilot's review on PR NousResearch#22891, all in the AX
elements-array cap added by 22fa1ed:

1. The truncation note ("response truncated to N of M elements") was
   appended unconditionally — including in the som/vision multimodal
   path, whose response carries a screenshot rather than an `elements`
   array. The note described a payload field that wasn't present.
   Moved the note into the AX-text branch where the array actually
   appears.

2. `_format_elements(cap.elements)` ran on the full untrimmed list with
   its own `max_lines=40` cap, so a caller passing `max_elements=10`
   would see summary lines referencing `NousResearch#11..NousResearch#40` even though the JSON
   `elements` array only held #1..NousResearch#10. Format on `visible_elements`
   instead so the summary indices always exist in the response.

3. `_coerce_max_elements` enforced a lower bound but no upper bound,
   so `max_elements=10_000_000` silently disabled the safeguard and
   reintroduced the original context-blow-up. Added a hard cap
   (`_MAX_ALLOWED_MAX_ELEMENTS = 1000`) that clamps oversized values.

4. The schema string said "Default 100" but the property carried no
   `default` field, and claimed `max_elements` had no effect on som/
   vision while the image-missing fallback path can still return an
   elements array. Added `"default": 100`, `"maximum": 1000`, and
   clarified the fallback-path wording.

Each finding gets a regression test:

- test_capture_ax_clamps_oversized_max_elements_to_hard_cap
- test_capture_ax_summary_indices_match_returned_elements
- test_capture_multimodal_summary_omits_truncation_note
- test_schema_max_elements_documents_default_and_upper_bound

Verified with `pytest tests/tools/test_computer_use.py` (53 passed,
including the 5 new cases). Confirmed each new test fails on the
pre-fix code path before applying the production change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant