feat(inference): gate local Ollama models by memory-layer context-window minimum#2122
Conversation
The memory tree embedder (bge-m3) is requested with num_ctx=8192 and the summariser caps output to fit that 8192-token embed ceiling. A local model whose native context is below this floor silently truncates chunks/summaries and corrupts recall. - Expose EMBED_NUM_CTX as pub(crate) so it is the single source of truth - model_requirements.rs: MIN_CONTEXT_TOKENS (re-exports EMBED_NUM_CTX, cannot drift) + evaluate_context -> Ok / BelowMinimum / Unknown - ollama.rs: /api/show client (OllamaShowRequest/Response) + context_length_from_model_info (architecture-agnostic, with fallback) - Unit tests for parsing + eligibility classification + drift guard Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diagnostics() now fetches each installed model's native context window via one bounded concurrent /api/show call and attaches a per-model eligibility verdict. Active chat/embedding models below the floor are surfaced as issues so they are rejected rather than silently used. - ollama_admin.rs: fetch_model_context helper + enriched installed_models (context_length + eligibility), context_requirement, per-expected eligibility, rejecting issues for sub-floor active models - inference/schemas.rs: documented the new diagnostics output fields - about_app/catalog.rs: capability local_ai.model_context_check - TEST-COVERAGE-MATRIX.md: row 3.1.5 - Integration test (axum mock): diagnostics_gates_models_by_context_window Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surface the context-window gate in Settings > Local AI Model. Each installed model shows a context badge: green (accepted), red + flagged row (rejected, below the memory-layer minimum), or grey (unknown / could not be determined, not a hard rejection). - localAi.ts: ModelContextEligibility + InstalledModelInfo types; context_requirement + per-expected eligibility on LocalAiDiagnostics - ModelStatusSection.tsx: ContextEligibilityBadge + rejection styling - 4 Vitest cases (accept / reject+flag / unknown / older-core no-badge) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughDefines a memory-layer minimum (8192), extracts native context windows from Ollama /api/show, evaluates per-model eligibility, includes eligibility and min_context_tokens in diagnostics, and renders eligibility badges and rejection styling in the frontend. ChangesLocal Model Context-Window Eligibility Gating
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
app/src/components/settings/panels/local-model/ModelStatusSection.tsx (1)
11-48: ⚡ Quick winConsider internationalizing the badge text.
The badge labels (
"ctx ✓","below X min","ctx unknown") are hardcoded in English, while the rest of the component uses theuseT()i18n hook. For consistency and to support non-English locales, consider extracting these strings to translation keys.♻️ Example refactor to use i18n
const ContextEligibilityBadge = ({ eligibility, }: { eligibility: ModelContextEligibility | null | undefined; }) => { + const { t } = useT(); if (!eligibility) return null; const fmt = (n: number) => n.toLocaleString(); if (eligibility.status === 'ok') { return ( <span className="shrink-0 rounded-full bg-green-100 dark:bg-green-500/15 px-2 py-0.5 text-[10px] font-medium text-green-700 dark:text-green-300" - title={`Context window ${fmt(eligibility.context_length)} tokens — meets the memory-layer minimum`}> - {fmt(eligibility.context_length)} ctx ✓ + title={t('settings.localModel.contextEligibility.okTooltip', { count: fmt(eligibility.context_length) })}> + {t('settings.localModel.contextEligibility.okLabel', { count: fmt(eligibility.context_length) })} </span> ); } // Similar for below_minimum and unknown cases...🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/components/settings/panels/local-model/ModelStatusSection.tsx` around lines 11 - 48, The badge texts in ContextEligibilityBadge are hardcoded in English; update ContextEligibilityBadge to use the existing i18n hook (call useT() inside the component) and replace the literal strings ("ctx ✓", "below X min", "ctx unknown" and the title text snippets) with translation keys (e.g., "modelStatus.ctxOk", "modelStatus.belowMin", "modelStatus.unknown", and corresponding title keys), passing formatted values (fmt(eligibility.context_length), fmt(eligibility.required)) as interpolation params to t(); keep existing formatting helpers (fmt) and preserve the status-based JSX paths in ContextEligibilityBadge so only the displayed text and titles are swapped to t(...) calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/inference/local/ollama.rs`:
- Around line 198-200: The current fallback scans info.iter() and picks the
first key ending with ".context_length", which can pick an unrelated low value;
instead, collect all entries whose key ends_with(".context_length"), parse each
via as_u64, and take the maximum numeric value (if any) as the fallback context
length so unrelated low entries won't undercut valid models; update the chain
that currently uses info.iter().find(...).and_then(|(_, value)| as_u64(value))
to a filter_map + max approach and ensure the result remains an Option<u64>
(symbols: info.iter(), as_u64, ".context_length").
---
Nitpick comments:
In `@app/src/components/settings/panels/local-model/ModelStatusSection.tsx`:
- Around line 11-48: The badge texts in ContextEligibilityBadge are hardcoded in
English; update ContextEligibilityBadge to use the existing i18n hook (call
useT() inside the component) and replace the literal strings ("ctx ✓", "below X
min", "ctx unknown" and the title text snippets) with translation keys (e.g.,
"modelStatus.ctxOk", "modelStatus.belowMin", "modelStatus.unknown", and
corresponding title keys), passing formatted values
(fmt(eligibility.context_length), fmt(eligibility.required)) as interpolation
params to t(); keep existing formatting helpers (fmt) and preserve the
status-based JSX paths in ContextEligibilityBadge so only the displayed text and
titles are swapped to t(...) calls.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 261891ef-c701-4f68-976c-e80598fa631e
📒 Files selected for processing (12)
app/src/components/settings/panels/local-model/ModelStatusSection.test.tsxapp/src/components/settings/panels/local-model/ModelStatusSection.tsxapp/src/utils/tauriCommands/localAi.tsdocs/TEST-COVERAGE-MATRIX.mdsrc/openhuman/about_app/catalog.rssrc/openhuman/inference/local/mod.rssrc/openhuman/inference/local/model_requirements.rssrc/openhuman/inference/local/ollama.rssrc/openhuman/inference/local/service/ollama_admin.rssrc/openhuman/inference/local/service/ollama_admin_tests.rssrc/openhuman/inference/schemas.rssrc/openhuman/memory/tree/score/embed/ollama.rs
… key absent Without `general.architecture`, the previous fallback took the first `.context_length` key found in `model_info`. Multimodal models can carry a low secondary value (e.g. `clip.context_length: 77`) that appears before the primary architecture's key, causing the model to be incorrectly flagged as below-minimum. Switch to `.filter_map().max()` so the largest reported context window is used when the architecture-scoped lookup fails. Adds a regression test covering this exact scenario (clip=77, llama=32768 → returns 32768). Addresses CodeRabbit review comment (major). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sanil-23
left a comment
There was a problem hiding this comment.
Nitpick acknowledged (i18n badge text): The labels are hardcoded English. Deferring i18n to a follow-up: the badge text includes dynamic numeric values (, ) that cannot be stored as plain translation keys in the current system (which takes a single key, returns a plain string with no interpolation). Adding these would require either extending with param interpolation or manually doing calls, plus adding keys to all 10+ locale files — meaningful scope for a separate PR. The diagnostic panel itself is an advanced/power-user surface with low localization priority.
sanil-23
left a comment
There was a problem hiding this comment.
Nitpick acknowledged (i18n badge text): The ContextEligibilityBadge labels are hardcoded English. Deferring i18n to a follow-up: the badge text includes dynamic numeric values (context_length, required) that cannot be stored as plain translation keys in the current useT() system (which takes a single key and returns a plain string with no interpolation support). Properly internationalizing these would require either extending useT() with param interpolation or manually doing .replace('{count}', ...) calls, plus adding keys to all 10+ locale files - meaningful scope for a separate PR. The diagnostics panel is an advanced power-user surface with low localization priority relative to main chat flows.
graycyrus
left a comment
There was a problem hiding this comment.
Walkthrough
Solid feature PR — gates local Ollama models by the memory layer's minimum context window (8192 tokens, derived from the embedder's own EMBED_NUM_CTX). The single-source-of-truth approach is well-designed: the acceptance floor literally re-exports the embedder constant, so it can't drift. Good test coverage across Rust (10 tests) and frontend (4 Vitest cases), including a drift-guard test and the CodeRabbit-suggested .max() regression test.
Change Summary
| File | Change type | Description |
|---|---|---|
model_requirements.rs |
New | MIN_CONTEXT_TOKENS, ContextEligibility enum, evaluate_context() + 5 unit tests |
ollama.rs |
Modified | OllamaShowRequest/Response, context_length_from_model_info() + 6 unit tests |
ollama_admin.rs |
Modified | fetch_model_context(), diagnostics integration with per-model eligibility |
ollama_admin_tests.rs |
Modified | Integration test with mock /api/show |
mod.rs |
Modified | Declares + re-exports model_requirements module |
embed/ollama.rs |
Modified | EMBED_NUM_CTX visibility → pub(crate) |
schemas.rs |
Modified | Enriched diagnostics output description |
catalog.rs |
Modified | New capability entry local_ai.model_context_check |
ModelStatusSection.tsx |
Modified | ContextEligibilityBadge component + model card redesign |
ModelStatusSection.test.tsx |
Modified | 4 new test cases |
localAi.ts |
Modified | New types ModelContextEligibility, InstalledModelInfo, extended LocalAiDiagnostics |
TEST-COVERAGE-MATRIX.md |
Modified | Added row 3.1.5 |
Per-file Analysis
model_requirements.rs — Clean, well-documented module. Drift-guard test is a nice touch. Pure logic with no side effects, so the absence of logging is appropriate.
ollama.rs — context_length_from_model_info resolution order (architecture-prefixed key → fallback max) is sound. The as_u64 helper correctly handles integer/float encodings. Tests are thorough.
ollama_admin.rs — fetch_model_context has proper error handling (.inspect_err + .ok()?) and debug logging with [local_ai:ollama_admin] prefix. The concurrent join_all for /api/show calls is the right call — bounded by installed model count.
Frontend — ContextEligibilityBadge is well-structured with null-safe guard. All new TS types are optional/additive for backward compatibility with older cores. Tests cover all three badge states + the legacy (no eligibility) path.
No critical or major issues found. Two minor notes below.
…ent capabilities field (tinyhumansai#2122) - `src/openhuman/inference/local/mod.rs`: narrow `pub mod model_requirements` to `pub(crate) mod model_requirements`, consistent with sibling modules `lm_studio`, `install_piper`, `install_whisper` which are already `pub(crate)`. The re-exports on line 40 are also `pub(crate) use`, so the module itself does not need to be `pub`. - `src/openhuman/inference/local/ollama.rs`: add an inline comment above `capabilities` explaining why the field is retained despite `#[allow(dead_code)]` — it documents the Ollama wire shape, not yet consumed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@graycyrus ready for re-review 🙏 Both your review comments are addressed and resolved (commit
Status: all CI green (22/22, only the always-skipped macOS/Windows e2e remaining), 0 unresolved threads, MERGEABLE, no conflicts. Blocked only on an approving review. Thanks! |
graycyrus
left a comment
There was a problem hiding this comment.
Looks good, nice work!
Empty commit to kick a fresh CI run; fork contributors cannot re-run upstream workflow runs via the API. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ction + ollama_admin) Resolved two conflicts from 4 upstream commits: - ModelStatusSection.tsx: keep both OllamaConnectionTestResult (this branch) and ModelContextEligibility (upstream, from tinyhumansai#2122 memory-layer gate) - ollama_admin.rs: merge both import sets — our validate_ollama_url + ollama_base_url_from_config with upstream OllamaShowRequest + OllamaShowResponse
…dow minimum (tinyhumansai#2122) Co-authored-by: Cyrus Gray <cyrus@tinyhumans.ai>
Summary
num_ctx=8192; summariser output is capped at 5000 to fit that embed ceiling). Sub-floor models silently truncate chunks/summaries and corrupt recall.EMBED_NUM_CTX(nowpub(crate)), so the acceptance gate can never drift from what the pipeline actually requests./api/showclient resolves each model's realcontext_length(architecture-agnostic).inference.diagnosticsnow reports per-modelcontext_length+ aneligibilityverdict, acontext_requirement, and rejectingissuesfor sub-floor active models.Problem
The memory tree's embedder over-requests
num_ctx=8192and the whole summary-budget logic exists to keep embed input ≤ 8192. Nothing validated that a user-selected local model actually supports that context — a short-context model (e.g. a 2048-ctx embedder) was accepted and then silently truncated input at embed time, corrupting recall with no signal to the user./api/tags(the only Ollama metadata fetched) does not expose context length, so the gate needs/api/show.Solution
memory::tree::score::embed::ollama::EMBED_NUM_CTXmadepub(crate);inference::local::model_requirements::MIN_CONTEXT_TOKENSre-exports it (single source of truth, drift-guarded by a test).evaluate_context(Option<u64>)→Ok/BelowMinimum/Unknown.Unknown(/api/showunreachable or metadata absent) is not a hard rejection — only a conclusive sub-floor report is.OllamaAdmin::fetch_model_contextissues one bounded (5s)/api/showper installed model, concurrently, only on the diagnostics path.eligibilityis absent).Submission Checklist
/api/showparse incl. missing/float, axum-mock diagnostics gate) + 4 Vitest cases (accept / reject+flag / unknown / older-core). Fullpnpm test:coverage/pnpm test:rustmerge deferred to CI (see Validation Run).3.1.5indocs/TEST-COVERAGE-MATRIX.md## Related/api/show; no new crates/deps; tests use an in-process axum mockCloses #NNN)Impact
/api/showround-trip per installed model, only when diagnostics runs, fetched concurrently with a 5s per-call timeout. No effect on hot paths.issue(ok:false) and a red badge instead of being silently used.Related
3.1.5 Model Context-Window Requirement Gatelocal_ai.model_context_checkAI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
feat/ollama-context-gate(offupstream/main)b8af174cd18620e5881a2d29a4e4cafcee1bfb37Validation Run
pnpm --filter openhuman-app format:check— Prettier run scoped to the 3 changed app files (clean); repo-wide skipped (Windows CRLF churn rewrites ~600 unrelated files — established--no-verifypattern)pnpm typecheck— passedcargo test --lib -- model_requirements:: context_length_ diagnostics_gates_models_by_context_window(10 passed); VitestModelStatusSection.test.tsx(18 passed)cargo fmt+cargo check --manifest-path Cargo.toml— clean, no new warnings on changed filesapp/src-taurinot touched (changes are core Rust + app TS); pre-pushpnpm rust:checkskipped via--no-verifyper established Windows patternValidation Blocked
command:N/Aerror:N/Aimpact:N/ABehavior Changes
Parity Contract
/api/tagslisting unchanged; older cores (noeligibility) render exactly as before (no badge);Unknowndoes not block/api/showfailure →Unknown(soft), never a hard reject; LM Studio diagnostics path untouchedDuplicate / Superseded PR Handling
Summary by CodeRabbit
New Features
Tests
Documentation
Chores