feat(daemon): add llama.cpp as default fallback runtime provider #499
NicholaiVogel merged 29 commits into main
Conversation
Replace Ollama as the default extraction, embedding, and synthesis provider with llama.cpp, which exposes an OpenAI-compatible /v1/chat/completions endpoint. Ollama remains available as a fallback.

- Add createLlamaCppProvider using /v1/chat/completions (OpenAI-compatible)
- Add llama.cpp embedding provider using /v1/embeddings
- Add llama.cpp model discovery in registry (GET /v1/models)
- Default extraction model: qwen3.5:4b (via llama.cpp, port 8080)
- Dynamic fallback chain uses extractionFallbackProvider config
- Summary worker synthesis supports llama-cpp with proper fallbacks
- Dashboard embedding and extraction provider dropdowns include llama-cpp
- SDK EmbeddingStatusResponse type updated
- docs/PIPELINE.md provider section updated

Co-authored-by: Aaf2tbz <averyfelts@aol.com>
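The OpenAI-compatible wire format at the heart of this change can be sketched roughly as follows. The request/response shapes are the standard `/v1/chat/completions` ones that llama.cpp serves; the function names, and the handling of `num_ctx` when a max context size is configured, are illustrative assumptions based on the PR description, not the actual `createLlamaCppProvider` code.

```typescript
// Hypothetical sketch of the request/response shapes an OpenAI-compatible
// llama.cpp provider would use. Names here are illustrative.
interface ChatMessage { role: "system" | "user" | "assistant"; content: string }

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  // Per the PR description, num_ctx is only sent when maxContextTokens
  // is configured (a llama.cpp-specific extra, not part of the OpenAI spec).
  num_ctx?: number;
}

function buildChatRequest(model: string, prompt: string, maxContextTokens?: number): ChatRequest {
  const req: ChatRequest = {
    model,
    messages: [{ role: "user", content: prompt }],
  };
  if (maxContextTokens !== undefined) req.num_ctx = maxContextTokens;
  return req;
}

// An OpenAI-compatible response carries text at choices[0].message.content.
function extractChatText(resp: unknown): string | null {
  if (typeof resp !== "object" || resp === null) return null;
  const choices = (resp as { choices?: Array<{ message?: { content?: unknown } }> }).choices;
  const content = choices?.[0]?.message?.content;
  return typeof content === "string" ? content : null;
}
```

This is also why the Ollama provider could not be reused: Ollama's `/api/generate` takes a flat `prompt` string and returns `{ response: "..." }`, a different shape at every level.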
PR-Reviewer-Ant
left a comment
Automated review by pr-reviewer | model: claude-sonnet-4-6 | commit: c3aec754
[Automated Review — pr-reviewer bot] PR #499 adds llama.cpp as the default runtime provider. The CLI, dashboard, and type changes are internally consistent. Three issues are worth discussing before merge: an unexplained regeneration of a generated bundle file with no corresponding source change, a doc inconsistency in PIPELINE.md, and an implicit behavior change for users who relied on the previous Ollama default without explicitly configuring it.
Confidence: Medium [missing_cross_module_context] - The diff omits the core daemon files described in the PR summary (provider.ts containing createLlamaCppProvider(), the updated DEFAULT_PIPELINE_V2 constant, checkEmbeddingProvider changes, discoverLlamaCppModels(), and summary-worker fallback edits). The CLI, dashboard, and type-level changes in the diff are consistent and correct, but the primary implementation surface cannot be verified. Confidence is medium because the visible changes are low-risk and the stated design decisions are sound, but correctness of the new provider code is unverifiable from this diff.
…rd, revert bundle
PR-Reviewer-Ant
left a comment
Automated review by pr-reviewer | model: claude-sonnet-4-6 | commit: 8df15c4a
[Automated Review — pr-reviewer bot] PR #499 adds llama.cpp as the default extraction, embedding, and synthesis provider across CLI, daemon, dashboard, SDK, and docs. After 14 prior review rounds, the major correctness and usability issues have been addressed. No blocking issues found in the current diff.
Confidence: Medium [sufficient_diff_evidence, missing_cross_module_context] - The CLI, dashboard, and setup-shared changes are fully visible and internally consistent. The daemon-side createLlamaCppProvider, DEFAULT_PIPELINE_V2, fallback chain, and checkEmbeddingProvider changes are described in the PR but not present in the diff — correctness of those paths (e.g. getEmbeddingDimensions mapping for llama-cpp model names, actual /v1/chat/completions request shape) cannot be verified from the current diff alone.
…setup Both llama-cpp and ollama now offer the same model picker (nomic-embed-text, all-minilm, mxbai-embed-large). getEmbeddingDimensions already covers all three model names.
PR-Reviewer-Ant
left a comment
Automated review by pr-reviewer | model: claude-sonnet-4-6 | commit: 1c6dd95f
[Automated Review — pr-reviewer bot] PR #499 adds llama.cpp as the default runtime provider across daemon, CLI, and dashboard. After 14 review rounds the implementation is well-shaped; no blocking bugs found. Two warnings are flagged below.
Confidence: Medium [sufficient_diff_evidence, missing_cross_module_context] - The CLI configure/setup paths and dashboard changes are fully visible in the diff and appear correct. The daemon-side provider routing and fallback chain are referenced in the PR description (R10: 7 new tests, 119→126 passing) but the actual test file contents are not in the diff, so correctness of edge cases in nativeFallbackProvider routing cannot be independently verified from this context.
…providers Surfaces unhandled provider at config time rather than silently writing defaults.
PR-Reviewer-Ant
left a comment
Automated review by pr-reviewer | model: claude-sonnet-4-6 | commit: fad49bd6
[Automated Bot Review — pr-reviewer] PR adds llama.cpp as the default extraction and embedding provider with Ollama as a configured fallback. The implementation aligns with the stated goals across daemon, CLI, dashboard, SDK, and docs. One convention violation (as type casts) persists in the added code despite the PR description claiming removal in R12.
Confidence: High [sufficient_diff_evidence] - The residual as casts in queryLlamaCppModels (setup-providers.ts) are directly visible in the added lines of the diff. The PR description states R12 'Removed residual as cast in queryLlamaCppModels — replaced with flatMap narrowing,' but the diff still contains (raw.data as unknown[]), (item as Record<string, unknown>).id, and (item as { id: string }).id.trim(). No cross-module context is needed to verify this discrepancy.
Use explicit type annotation on flatMap callback parameter instead of
raw.data as unknown[]. Single remaining as is on { id?: unknown } which
is a narrowing step, not a bypass.
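The cast-free parsing pattern discussed above can be sketched like this. It follows the spirit of the flatMap narrowing the review describes (a single `{ id?: unknown }` narrowing step rather than a blanket `as unknown[]` bypass), but the function name and exact shape are assumptions, not the actual `queryLlamaCppModels` code.

```typescript
// Illustrative sketch: parse model ids from a GET /v1/models payload
// without blanket type assertions. Each `as` below is a one-property
// narrowing step on an unknown value, not a type-system bypass.
function parseModelIds(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null) return [];
  const data = (raw as { data?: unknown }).data;
  if (!Array.isArray(data)) return [];
  return data.flatMap((item: unknown) => {
    if (typeof item !== "object" || item === null) return [];
    const id = (item as { id?: unknown }).id;
    // flatMap lets malformed entries vanish by returning [].
    return typeof id === "string" && id.trim() !== "" ? [id.trim()] : [];
  });
}
```

The flatMap shape is what makes this clean: invalid entries map to `[]` and disappear, so no intermediate array of nullable values or follow-up filter-with-cast is needed.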
PR-Reviewer-Ant
left a comment
Automated review by pr-reviewer | model: claude-sonnet-4-6 | commit: 490d0b31
Harness output could not be parsed as structured JSON.
PR-Reviewer-Ant
left a comment
Automated review by pr-reviewer | model: gpt-5.3-codex | commit: 490d0b31
[Automated Review] I found one non-blocking but important review concern and one metadata note for UI verification.
Confidence: Medium [sufficient_diff_evidence, targeted_context_included, missing_cross_module_context] - The diff clearly shows a very large replacement of the generated EXTENSION_BUNDLE string in packages/connector-oh-my-pi/src/extension-bundle.ts, but the corresponding source-level change is not present in this PR context. That makes correctness verification of those runtime changes impossible from the reviewed diff alone.
The oh-my-pi extension bundle was inadvertently regenerated with non-deterministic variable renaming during build, producing thousands of lines of opaque diff unrelated to the llama.cpp provider migration. Reverted to match main branch version.
PR-Reviewer-Ant
left a comment
Automated review by pr-reviewer | model: gpt-5.3-codex | commit: 2589c14d
[Automated Review by pr-reviewer] I found a blocking scope/safety issue: the PR includes a large regenerated runtime bundle that diverges from the stated PR scope and claimed cleanup.
Confidence: Medium [sufficient_diff_evidence, targeted_context_included, missing_cross_module_context] - The diff clearly includes a massive rewrite of packages/connector-opencode/src/plugin-bundle.ts while the PR description centers on llama.cpp provider migration and claims unrelated bundle churn was reverted. Because this is an opaque generated runtime blob, it is not practically reviewable here for behavioral equivalence and introduces avoidable merge risk unless explicitly intended and validated.
Same class of issue as R15 extension-bundle.ts — the opencode plugin-bundle was regenerated with non-deterministic minification during build, producing opaque diff unrelated to the llama.cpp provider migration scope. Restored to main branch version.
PR-Reviewer-Ant
left a comment
Automated review by pr-reviewer | model: gpt-5.3-codex | commit: 7b49a159
[Automated Review by pr-reviewer] The PR mostly implements the llama.cpp default-provider migration as described, but the current diff still contains a large unrelated generated bundle change that contradicts the PR’s stated scope/review history and should be removed before merge.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - The diff includes a massive rewrite of packages/connector-pi/src/extension-bundle.ts while the PR description states unrelated regenerated bundle changes were reverted (R15/R16). This is directly observable in the changed file and is materially outside the described llama.cpp migration scope.
Third bundle file with non-deterministic minification drift during build. Same class of issue as R15 (connector-oh-my-pi) and R16 (connector-opencode). Reverted to main branch version. All bundle files now match main.
PR-Reviewer-Ant
left a comment
Automated review by pr-reviewer | model: gpt-5.3-codex | commit: 2c6a5415
[Automated Review by pr-reviewer] I found two blocking correctness regressions: llama.cpp native-fallback parsing is incompatible with the OpenAI response shape, and legacy embedding.provider: local compatibility appears broken despite the PR claiming no breaking config changes.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - Both issues are directly visible in changed code paths. In embedding-fetch.ts, llama.cpp fallback branches parse { embedding: [...] } while the same PR documents/uses OpenAI-compatible /v1/embeddings ({ data: [{ embedding }] }). Also, packages/core/src/types.ts removes local while loadMemoryConfig still blindly casts provider strings; fetchEmbedding has no local branch, creating a concrete migration/runtime mismatch for existing configs.
…vider compat
1. Native->llama.cpp fallback parsed { embedding: [...] } but llama.cpp
   /v1/embeddings returns the OpenAI shape { data: [{ embedding: [...] }] }.
   Fixed both the pre-check and error-path fallbacks to use
   data.data?.[0]?.embedding. Updated test mocks.
2. Legacy embeddings.provider: 'local' now maps to 'native' during
   config loading for backward compatibility.
Co-authored-by: Aaf2tbz <averyfelts@aol.com>
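The shape mismatch fixed above can be made concrete with a small parser sketch. The two response shapes are real (Ollama's `/api/embeddings` returns `{ embedding: [...] }`, OpenAI-compatible `/v1/embeddings` returns `{ data: [{ embedding: [...] }] }`); the function name is illustrative, not the actual embedding-fetch.ts code.

```typescript
// Minimal sketch of the parsing fix: read the OpenAI-compatible
// { data: [{ embedding: [...] }] } shape that llama.cpp /v1/embeddings
// returns, rather than Ollama's flat { embedding: [...] } shape.
function parseEmbedding(resp: unknown): number[] | null {
  const data = (resp as { data?: Array<{ embedding?: unknown }> } | null)?.data;
  const embedding = data?.[0]?.embedding;
  if (Array.isArray(embedding) && embedding.every((v) => typeof v === "number")) {
    return embedding as number[];
  }
  // A flat { embedding: [...] } body (Ollama shape) deliberately fails here,
  // which is exactly the bug the original code had in reverse.
  return null;
}
```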
PR-Reviewer-Ant
left a comment
Automated review by pr-reviewer | model: gpt-5.3-codex | commit: 72636cae
[Automated Review by pr-reviewer] Found a backward-compatibility regression that conflicts with the PR’s “no breaking changes” claim.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - The diff explicitly removes "local" from a public exported union in @signet/core while runtime code still maps legacy "local" values for compatibility. This creates a compile-time API break for TypeScript consumers even though the PR claims existing configs are non-breaking.
… unions
R18 added runtime mapping for legacy 'local' -> 'native' but removed 'local' from the exported TypeScript type unions, causing compile errors for TS consumers. Restored 'local' to both PipelineConfig.embedding.provider and AgentConfig.embeddings.provider.
Co-authored-by: Aaf2tbz <averyfelts@aol.com>
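The compatibility pattern described here (keep the deprecated value in the public union, normalize it at load time) can be sketched as follows. The union members and function name are assumed for illustration and are not the actual @signet/core definitions.

```typescript
// Sketch of the deprecated-alias pattern: "local" stays in the exported
// union so existing TS consumers still compile, while config loading
// normalizes it to "native" so runtime code never sees "local".
type EmbeddingProvider = "native" | "llama-cpp" | "ollama" | "local"; // "local" is a deprecated alias

function normalizeEmbeddingProvider(p: EmbeddingProvider): Exclude<EmbeddingProvider, "local"> {
  // TypeScript narrows the else branch to exclude "local" automatically.
  return p === "local" ? "native" : p;
}
```

The key point is that removing a member from an exported union is a compile-time break even when a runtime mapping exists; keeping the alias in the type and collapsing it at one load-time choke point gives both compatibility guarantees at once.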
PR-Reviewer-Ant
left a comment
Automated review by pr-reviewer | model: gpt-5.3-codex | commit: 750ecafb
[Automated Review by pr-reviewer] I found one blocking correctness issue and one non-blocking detection accuracy issue in the llama.cpp fallback flow.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - The regression is directly visible in the changed fallback branch in packages/daemon/src/routes/utils.ts: llama.cpp fallback marks status as available but does not set the runtime fallback provider, while the ollama branch does. This creates a concrete mismatch between reported health and actual embedding routing.
…pty model list
1. checkEmbeddingProvider llama.cpp fallback set status.available=true
but didn't call setNativeFallbackProvider('llama-cpp'), so embeddings
could route to a stale ollama fallback. Added the missing call.
2. queryLlamaCppModels() returned available:true with empty model list
when server was reachable but had no models loaded. Now returns
available:false with descriptive error.
Co-authored-by: Aaf2tbz <averyfelts@aol.com>
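The second fix above (reachable server, empty model list) reduces to a small decision table, sketched here. The probe shape and function name are illustrative assumptions, not the actual queryLlamaCppModels implementation.

```typescript
// Sketch of the empty-model-list fix: a reachable llama.cpp server with no
// models loaded must report available: false, not available: true.
interface ProviderProbe {
  available: boolean;
  models: string[];
  error?: string;
}

function probeResult(serverReachable: boolean, models: string[]): ProviderProbe {
  if (!serverReachable) {
    return { available: false, models: [], error: "llama.cpp server unreachable" };
  }
  if (models.length === 0) {
    // Before the fix this case returned available: true with an empty list,
    // so downstream code treated a model-less server as usable.
    return { available: false, models: [], error: "llama.cpp server is running but has no models loaded" };
  }
  return { available: true, models };
}
```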
PR-Reviewer-Ant
left a comment
Automated review by pr-reviewer | model: gpt-5.3-codex | commit: 03ebbc0e
[Automated Review by pr-reviewer] I found a fallback-behavior regression that can ignore an explicit fallbackProvider: none configuration.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - Both synthesis fallback paths now compute fallback as llama-cpp else ollama, with no branch for none. This is directly visible in the changed lines and conflicts with the stated dynamic fallback behavior based on configured fallback provider.
Synthesis fallback in daemon.ts and summary-worker.ts always resolved to llama-cpp or ollama, ignoring fallbackProvider: 'none'. Now when extractionFallbackProvider is 'none', synthesis fallback becomes null and all fallback assignment sites (opencode, anthropic, openrouter, claude-code, codex) set effectiveSynthesisProvider to 'none' with a log message instead of forcing a local provider. Co-authored-by: Aaf2tbz <averyfelts@aol.com>
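The fix above boils down to honoring an explicit opt-out in the fallback resolution. A minimal sketch, with names assumed for illustration (not the actual daemon.ts/summary-worker.ts code):

```typescript
// Sketch of the "none" fix: an explicitly configured fallback of "none"
// resolves to null (no synthesis fallback) instead of being coerced into
// a local provider as the old `llama-cpp else ollama` logic did.
type FallbackChoice = "llama-cpp" | "ollama" | "none";

function resolveSynthesisFallback(configured: FallbackChoice): "llama-cpp" | "ollama" | null {
  if (configured === "none") return null; // honor the explicit opt-out
  return configured;
}
```

Callers then branch on `null`: assignment sites set the effective synthesis provider to `"none"` and log, rather than silently substituting a local runtime the user asked not to use.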
PR-Reviewer-Ant
left a comment
Automated review by pr-reviewer | model: gpt-5.3-codex | commit: fcc7308c
[Automated Review by pr-reviewer] The PR largely delivers the described llama.cpp default-provider migration, but I found one blocking fallback-routing bug and one compatibility gap.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - The blocking issue is directly visible in the changed fallback logic in routes/utils.ts and is behaviorally provable from the new control flow: native fallback is pinned to llama.cpp based only on /v1/models HTTP 200, without validating embedding readiness, which can suppress a working Ollama fallback path. The compatibility issue is directly visible in cli.ts where legacy provider alias handling diverges from daemon-side mapping.
NicholaiVogel
left a comment
i like the direction here overall. if we want llama.cpp as the default local runtime, this feels much more in line with the architecture than some of the other provider-side work we’ve been looking at. it’s threaded through the right places, and the general shape makes sense to me.
i do think there’s one thing i’d want fixed before merge though.
right now the native embedding fallback path to llama.cpp is still hardcoded to nomic-embed-text, while the cli/setup flow now lets people configure llama.cpp embeddings with nomic-embed-text, all-minilm, or mxbai-embed-large.
so we end up with a mismatch where:
- setup says those llama.cpp models are supported
- native fallback can decide "llama.cpp is available"
- but the actual fallback request still only tries nomic-embed-text

that means if someone only has one of the other supported llama.cpp embedding models loaded, the system can look healthy but the native -> llama.cpp fallback path can still fail.
i think we should make that fallback behavior line up with the supported llama.cpp embedding model story before this lands. either:
- track and use a real fallback embedding model for llama.cpp, or
- only treat llama.cpp fallback as available when the specific model the fallback will use is actually present
so overall i’m supportive of this, but i’d like to tighten that fallback path first.
Native→llama.cpp fallback now only declares available when a supported embedding model (nomic-embed-text, all-minilm, mxbai-embed-large) is actually loaded. Health check uses findLlamaCppEmbeddingModel() instead of just probing /v1/models reachability. Fallback state tracks the discovered model alongside the provider.
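The model-aware availability check described above can be sketched like this. The constant and function names follow the PR description (`LLAMACPP_FALLBACK_EMBEDDING_MODELS`, `findLlamaCppEmbeddingModel`), but the bodies, and in particular the loose tag matching, are assumptions for illustration.

```typescript
// Sketch of the model-aware fallback check: only declare the llama.cpp
// fallback available when one of the supported embedding models is
// actually loaded, and record which one so the fallback request uses it.
const LLAMACPP_FALLBACK_EMBEDDING_MODELS = ["nomic-embed-text", "all-minilm", "mxbai-embed-large"];

// loadedModels would come from GET /v1/models on the llama.cpp server.
function findLlamaCppEmbeddingModel(loadedModels: string[]): string | null {
  for (const supported of LLAMACPP_FALLBACK_EMBEDDING_MODELS) {
    // Assumed loose match so tagged names like "all-minilm:l6-v2" qualify.
    const hit = loadedModels.find((m) => m === supported || m.startsWith(supported + ":"));
    if (hit) return hit;
  }
  return null; // no supported model loaded => fallback not available
}
```

Returning the concrete loaded model (rather than a boolean) is what closes the gap the reviewer flagged: the health check and the actual fallback request now agree on which model will be used.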
PR-Reviewer-Ant
left a comment
Automated review by pr-reviewer | model: gpt-5.3-codex | commit: 32ebe67a
[Automated Review by pr-reviewer] I found a description-to-implementation mismatch that should be clarified before merge.
Confidence: High [sufficient_diff_evidence, targeted_context_included] - The PR description claims llama.cpp becomes the default embedding provider, but the changed setup/configure UI still presents and recommends native first and does not establish llama.cpp as the default path in those flows. This divergence is directly visible in the modified setup/configure/dashboard files.
Summary

Add llama.cpp as a first-class pipeline provider and make it the default fallback for extraction, embedding, and synthesis when the configured primary provider (typically `native`) is unavailable. The setup wizard still recommends `native` as the primary embedding provider; llama.cpp replaces Ollama as the automatic fallback chain for local inference.

Scope: Extraction and synthesis pipeline providers + native embedding fallback. The setup/configure UI flow is intentionally unchanged — it still presents `native` as the recommended primary choice and llama.cpp as the fallback when native is unreachable.

llama.cpp exposes an OpenAI-compatible `/v1/chat/completions` endpoint, which is fundamentally different from Ollama's native `/api/generate` API. This required a new `createLlamaCppProvider()` rather than reusing the Ollama provider.

Changes
Core (`@signet/core`)
- `PipelineProviderChoice`: added `"llama-cpp"` with default model `qwen3.5:4b`
- `PipelineExtractionConfig.provider`: added `"llama-cpp"` to union
- `PipelineExtractionConfig.fallbackProvider`: added `"llama-cpp"` to union
- `PipelineSynthesisConfig.provider`: added `"llama-cpp"` to union

Daemon (`@signet/daemon`)
- `createLlamaCppProvider()` — new provider using `/v1/chat/completions` (OpenAI-compatible), no auth required, sends `num_ctx` when `maxContextTokens` configured
- `DEFAULT_PIPELINE_V2` — extraction provider defaults to `"llama-cpp"`, model defaults to `"qwen3.5:4b"`, fallbackProvider defaults to `"llama-cpp"`
- Embedding provider: added `"llama-cpp"` with `/v1/embeddings` endpoint (OpenAI-compatible)
- `discoverLlamaCppModels()` using `GET /v1/models`, known models seeded with `qwen3.5:4b`, `qwen3:8b`, `llama-3.1-8b`
- Fallback chain uses `extractionFallbackProvider` config instead of hardcoding `"ollama"`
- `ProviderRuntimeResolution` — `RuntimeProviderName` and `RuntimeSynthesisProviderName` types include `"llama-cpp"`
- `checkEmbeddingProvider` — added llama-cpp health check branch using `GET /v1/models`

CLI (`@signet/cli`)
- `EmbeddingProviderChoice` and `ExtractionProviderChoice` types updated
- `DETECTED_EXTRACTION_PROVIDER_ORDER`: llama-cpp is first priority
- Setup and configure flows handle `"llama-cpp"`
- `FreshSetupConfig` deduplicated — uses canonical `EmbeddingProviderChoice`/`ExtractionProviderChoice` types instead of inline unions
- `preflightOllamaEmbedding` renamed to `preflightLocalEmbedding` (now handles llama-cpp + ollama + native)

Dashboard
- Extraction provider dropdown includes `"llama-cpp"`, model presets seeded with `qwen3.5:4b`
- Embedding provider dropdown includes `"llama-cpp"`, base URL field shown for llama-cpp, default `http://localhost:8080`

SDK (`@signet/sdk`)
- `EmbeddingStatusResponse.provider` type updated to include `"llama-cpp"`

Docs
- `docs/PIPELINE.md` provider section updated with `provider: llama-cpp` and `model: qwen3.5:4b`

Design decisions
- llama.cpp uses `/v1/chat/completions` (OpenAI-compatible) while Ollama uses `/api/generate` (proprietary). The APIs are incompatible at the wire level.
- `qwen3.5:4b`: updated from `qwen3:4b` — newer model with better reasoning.
- Dynamic fallback chain reads the `fallbackProvider` config value. Existing configs with `fallbackProvider: ollama` still fall back to ollama.
- Existing configs with `provider: ollama` keep Ollama. Only the default (no provider set) changes.

Type
- `feat` — new user-facing feature (bumps minor)

sqmd review rounds
- `provider: ollama`, setup wizard ordered Codex before llama.cpp, `extension-bundle.ts` had non-deterministic re-minification. Fixed all three.
- `qwen3:4b` (updated to `qwen3.5:4b`), `preflightOllamaEmbedding` name implies Ollama-only (renamed to `preflightLocalEmbedding`), `FreshSetupConfig` duplicated type unions inline (deduplicated using canonical types from setup-shared.ts). Fixed all three.
- `preflightLocalEmbedding` body never probed llama.cpp (added `queryLlamaCppModels()` probe), `DETECTED_EXTRACTION_PROVIDER_ORDER` had no health check for llama.cpp (added `hasLlamaCppServer()` with live probe), configure wizard fell through to OpenAI model prompt for llama-cpp/native (changed `else` to `else if (provider === "openai")`), `nativeFallbackToOllama` boolean caused incorrect routing after llama.cpp fallback (replaced with discriminated `nativeFallbackProvider`), `DEFAULT_LLAMACPP_BASE_URL` used but not imported in `routes/utils.ts` (added import). Fixed all five. All 119 daemon tests pass.
- `setNativeFallbackProvider` import in daemon.ts (removed), no explicit llama-cpp handler in setup.ts embedding provider branch (added), `preflightLocalEmbedding` silently overrides user ollama choice with llama.cpp (added log message), 2s probe timeout stall on setup startup (reduced from 5s to 2s). Fixed all four.
- `preflightLocalEmbedding` silently overrode user explicit ollama choice with llama.cpp when server detected on port 8080 (removed llama.cpp probe from ollama preflight path — llama-cpp has its own explicit handler), `EXTRACTION_SAFETY_TEXT` still referenced `qwen3:4b` (updated to `qwen3.5:4b`). Fixed both.
- `nomic-embed-text` (llama-cpp default) instead of `nomic-embed-text-v1.5` (native default used everywhere else). Split the branch so native explicitly sets the correct model.
- Renamed `preflightLocalEmbedding` back to `preflightOllamaEmbedding` (function only handles Ollama after R5). Added `hasLlamaCppServer()` probe to llama-cpp embedding branch so users are warned if no server is running.
- Cached `hasLlamaCppServer()` probe result to avoid duplicate 2s timeout — was called once for extraction detection and again for embedding health check.
- `resolveSynthesisProvider` fallback sentinel change is intentional.
- Replaced `as` type assertion in `queryLlamaCppModels` with narrowing guard (AGENTS.md convention). Reordered configure embedding choices: native first (zero-dep). Added 7 regression tests for `nativeFallbackProvider` discriminated union routing (119→126 tests passing).
- Removed residual `as` cast in `queryLlamaCppModels` — replaced with flatMap narrowing.

Checklist
- Spec alignment validated (`INDEX.md` + `dependencies.yaml`)
- Agent scoping verified on all new/changed data queries
- Input/config validation and bounds checks added
- Error handling and fallback paths tested (no silent swallow)
- Security checks applied to admin/mutation endpoints
- Docs updated for API/spec/status changes
- Regression tests added for each bug fix
- Lint/typecheck/tests pass locally
R15: Reverted unrelated extension-bundle.ts minification changes. The oh-my-pi bundle was inadvertently regenerated with non-deterministic variable renaming during build, producing opaque diff unrelated to the llama.cpp provider migration scope. Restored to main branch version.
R16: Bot flagged regenerated plugin-bundle.ts (connector-opencode) — same class of issue as R15. Opaque minified blob diverged from main due to build non-determinism, not intentional change. Reverted to main branch version.
R17: Bot flagged regenerated extension-bundle.ts (connector-pi) — same class of issue as R15/R16. Third bundle file with non-deterministic minification drift. Reverted to main branch version. All bundle files now match main.
R18: Two correctness fixes: (1) Native→llama.cpp fallback response parsing used `{ embedding: [...] }` but llama.cpp `/v1/embeddings` returns the OpenAI shape `{ data: [{ embedding: [...] }] }`. Fixed both the pre-check fallback and the error-path fallback to use `data.data?.[0]?.embedding`. Updated test mocks to match. (2) Legacy `embeddings.provider: "local"` is now mapped to `"native"` during config loading for backward compatibility — existing configs with `provider: local` won't break.

R19: Bot flagged `"local"` removed from `@signet/core` exported type unions, causing a compile-time break for TS consumers even though runtime mapping was added. Restored `"local"` to both `PipelineConfig.embedding.provider` and `AgentConfig.embeddings.provider` unions in `packages/core/src/types.ts`. Marked as deprecated alias — runtime maps to `"native"`.

R20: Two fixes: (1) `checkEmbeddingProvider` llama.cpp fallback set `status.available = true` but didn't call `setNativeFallbackProvider("llama-cpp")`, so embeddings could still route to a stale ollama fallback. Added the missing call. (2) `queryLlamaCppModels()` returned `available: true` with an empty model list — server reachable but no models loaded. Now returns `available: false` with a descriptive error when the model list is empty.

R21: Bot flagged synthesis fallback paths ignoring `fallbackProvider: "none"`. Both `daemon.ts` startup and `summary-worker.ts` computed fallback as `llama-cpp else ollama` with no `none` branch. Fixed both: when `extractionFallbackProvider === "none"`, synthesis fallback is `null` instead of forcing a local provider. All synthesis fallback assignments (opencode, anthropic, openrouter, claude-code, codex) now check for null and set `"none"` with a log message instead of falling back.

R22: Human review: native→llama.cpp fallback hardcodes `nomic-embed-text` but setup wizard supports `nomic-embed-text`, `all-minilm`, `mxbai-embed-large`. Fixed by making fallback model-aware: (1) Added `LLAMACPP_FALLBACK_EMBEDDING_MODELS` constant and `findLlamaCppEmbeddingModel()` in `embedding-fetch.ts` — probes `/v1/models` and returns first match from the supported list. (2) Added `nativeFallbackModel` state alongside `nativeFallbackProvider`, updated `setNativeFallbackProvider()` to accept an optional model parameter. (3) Pre-cached and error-path fallbacks now use the discovered model. (4) Health check in `routes/utils.ts` now uses `findLlamaCppEmbeddingModel()` instead of just checking `/v1/models` is reachable — only declares llama.cpp available when a supported model is actually loaded. Removed now-unused `DEFAULT_LLAMACPP_BASE_URL` import from `routes/utils.ts`. Updated test mock for the model-discovery-aware fallback path. 11 tests passing.

R23: Bot review: PR description overclaimed "Replace Ollama as the default extraction, embedding, and synthesis provider" — setup wizard still recommends `native` as primary. Clarified PR summary to accurately scope: llama.cpp is the default fallback for extraction/synthesis and native embedding fallback when the primary provider is unavailable. Setup/configure UI intentionally unchanged — still presents `native` as recommended primary. No code changes; description-only fix.