Skip to content

fix: keep MCP server alive when optional semantic dependency is missing (0.28.1)#524

Merged
mohanagy merged 4 commits into
mainfrom
fix/semantic-optional-dep-resilience
Jun 10, 2026
Merged

fix: keep MCP server alive when optional semantic dependency is missing (0.28.1)#524
mohanagy merged 4 commits into
mainfrom
fix/semantic-optional-dep-resilience

Conversation

@mohanagy

@mohanagy mohanagy commented Jun 10, 2026

Copy link
Copy Markdown
Owner

Root cause

The retrieve tool's semantic/rerank path returned retrieval.then(...) with no .catch (src/runtime/stdio/tools.ts). When the optional @huggingface/transformers peer dependency is not installed — which is the default state of every npx-launched or globally installed madar, since npm never auto-installs optional peers — loadPipeline rejected, the rejection sailed past handleStdioRequest's synchronous try/catch (the promise is returned, not awaited there) and hit the unprotected await in serveGraphStdio's request loop, killing the entire MCP server process mid-call. Agent clients (Claude Code) rendered the dead server as an infinite spinner.

Compounding gaps: the tool schema advertised rerank/semantic unconditionally, so agents would occasionally pass rerank: true on machines that cannot serve it; a project-local npm install @huggingface/transformers was invisible to npx-launched servers because bare import resolution starts at the npx cache, not the project; and a failed load stayed cached in pipelineCache, so even a corrected environment kept replaying the original rejection.

Behavior before / after

Scenario Before After
retrieve with rerank: true, package missing Unhandled rejection kills the server; client spins forever JSON-RPC result with isError: true and a readable install hint; server keeps serving the same connection
Any handler rejection in the stdio loop Process exit JSON-RPC error response, loop continues
npm install @huggingface/transformers in the project No effect for npx/global installs Resolved from the project root (graph dir); rerank works
Retry after installing the package Replays the cached rejection until restart Failed loads are evicted; retry succeeds in-place
tools/list on a machine without the package Advertises semantic/rerank params anyway Omits semantic, semantic_model, rerank, rerank_model
Stalled first-use model download Blocks the serial request loop forever Bounded by MADAR_MODEL_LOAD_TIMEOUT_MS (default 120s)
madar doctor No mention of semantic capability Reports semantic/rerank: available/unavailable (...); never affects healthy

Verification

Clean install without @huggingface/transformers (replayed against the built server, repo has the package absent):

  • retrieve without semantic flags → normal result payload.
  • retrieve with rerank: true followed by a second plain retrieve on the same stdio connection → first returns isError: true with the install hint, second returns a normal payload, process stays alive. Before the fix this exact sequence terminated the server after the first request (reproduced on macOS and Windows with @lubab/madar@0.27.7).

Project-local install (stub @huggingface/transformers in <project>/node_modules):

  • tools/call retrieve with rerank: true through the stdio handler resolves the project-local package via the graph-path-derived root and returns reranked results — the npx-cache execution scenario.
  • Direct unit coverage for projectRoot resolution and for cache eviction (fail without the package, install it, retry succeeds without restart).
  • tools/list includes the semantic fields with the package present and omits them without it.

Tests / typecheck:

  • New tests/unit/stdio-semantic-resilience.test.ts (8 tests) + updated tests/unit/stdio-semantic.test.ts schema tests.
  • Full suite: 189 files, 2548 passed, 1 skipped, 0 failures.
  • tsc --noEmit: clean.

Scope

Hotfix scoped to semantic/rerank optional-dependency resilience only. Includes the 0.28.1 version bump, package-lock update, and changelog entry. No README claim or benchmark doc changes.

Summary by CodeRabbit

  • Bug Fixes

    • Server no longer crashes when the optional semantic/rerank runtime is missing
    • Failed semantic model loads no longer poison the pipeline cache
    • Retrieve errors are returned cleanly without terminating the server loop
  • New Features

    • Retrieve tool schema hides semantic/rerank fields when capability is unavailable
    • Project-local installs enable semantic/rerank resolution (supports npx/global scenarios)
    • madar doctor reports semantic/rerank availability
    • Bounded semantic model load timeout (default 120s)
  • Documentation

    • Release notes, README and registry updated for v0.28.1

mohanagy added 2 commits June 10, 2026 20:10
A retrieve call with rerank/semantic enabled on a machine without the
optional @huggingface/transformers package rejected without a handler,
killing the whole stdio MCP server mid-call — agents saw an infinite
spinner instead of an error.

- catch retrieve rejections and return an MCP isError tool result so
  agents can read the failure and retry without rerank
- wrap the stdio serve loop in try/catch so no handler rejection can
  tear down the server
- resolve @huggingface/transformers from the project root (graph dir)
  as a fallback, so a project-local install works under npx-launched
  or globally installed madar
- evict failed pipeline loads from pipelineCache so a retry after
  installing the package succeeds instead of replaying the rejection
- add a load timeout (MADAR_MODEL_LOAD_TIMEOUT_MS, default 120s) so a
  stalled model download cannot block the serial request loop forever
- hide semantic/rerank fields from the retrieve tool schema when the
  package is not resolvable, so agents never request the capability
- report semantic/rerank availability in madar doctor without
  affecting overall health
- update the install hint to instructions that work for npx installs
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: a7284926-02b8-4dee-8baa-5391bcd7d4ef

📥 Commits

Reviewing files that changed from the base of the PR and between 652a6ea and 7f70382.

📒 Files selected for processing (1)
  • src/runtime/semantic.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/runtime/semantic.ts

📝 Walkthrough

Walkthrough

This PR makes semantic and reranking capabilities optional and resilient by resolving @huggingface/transformers from project-local installations, implementing timeout-bounded pipeline loading with failure recovery, conditionally gating MCP tool schemas based on availability, hardening the stdio server against missing dependencies, and reporting semantic status in doctor diagnostics.

Changes

Optional Semantic/Rerank Feature

Layer / File(s) Summary
Semantic runtime with project-root resolution and timeout
src/runtime/semantic.ts, src/runtime/retrieve.ts
Core semantic pipeline loader now resolves transformers from either madar's dependencies or project-local installs under projectRoot, caches in-flight promises keyed by task/model/root, wraps loads with MADAR_MODEL_LOAD_TIMEOUT_MS (default 120s), removes failed cache entries, and improves missing-dependency detection via isSemanticRuntimeAvailable.
MCP tool schema conditional on semantic availability
src/runtime/stdio/definitions.ts, src/runtime/stdio-server.ts
activeMcpTools accepts ActiveMcpToolsOptions to conditionally filter semantic/rerank fields from the retrieve tool schema; tools/list detects semantic availability and gates advertised fields based on whether transformers can be resolved.
Error handling and tool result formatting
src/runtime/stdio/tools.ts, src/runtime/stdio-server.ts
Adds errorToolResult helper for standardized MCP error payloads, passes projectRoot through retrieve invocation, catches retrieve rejections to prevent unhandled crashes, and wraps stdio request loop in try/catch to convert handler failures into JSON-RPC server error responses.
Doctor reporting of semantic availability
src/infrastructure/doctor.ts
buildDoctorReport detects semantic availability via isSemanticRuntimeAvailable and includes semantic field (informational, not affecting healthy status); runDoctorCommand outputs semantic availability line.
Comprehensive test coverage
tests/unit/stdio-semantic-resilience.test.ts, tests/unit/stdio-semantic.test.ts
New resilience suite tests missing transformers handling, error recovery across requests, project-local installs, cache non-poisoning, schema gating, and doctor reporting; existing tests updated to cover both available and unavailable scenarios.
Version and changelog
package.json, CHANGELOG.md, README.md, docs/mcp-registry/server.json
Bumps version to 0.28.1 and documents fixes for missing dependency crashes, project-local resolution, cache poisoning prevention, schema gating, doctor reporting, and model load timeout.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • mohanagy/madar#506: Related changes to README/version assertions in tests/unit/why-madar-doc.test.ts.
  • mohanagy/madar#438: Registry/manifest/version alignment changes that interact with the bumped package and registry JSON.

Poem

🐰 A transformers package, once required outright,
Now hops to the project when installed just right,
With timeouts and caches we load with care,
Errors turned gentle, the stdio stays fair,
Schema gates open only when it's there.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 3.03% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely describes the main fix: keeping the MCP server alive when the optional semantic dependency is missing, with the version bump (0.28.1) clearly noted.
Description check ✅ Passed The description provides comprehensive context: root cause analysis, detailed before/after behavior table, verification steps with testing results, scope clarification, and all required testing checklist items are addressed.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/semantic-optional-dep-resilience

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/runtime/stdio/tools.ts (1)

1333-1356: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Incorrect project root resolution prevents semantic runtime from finding project-local packages.

Line 1346 computes projectRoot as dirname(resolve(graphPath)), which resolves to the out directory (e.g., /project/out), not the project root where project-local node_modules/@huggingface/transformers would be installed.

This inconsistency with the pr_impact tool (lines 1250-1251) breaks project-local semantic package resolution:

// pr_impact (correct):
const graphDir = dirname(validateGraphPath(graphPath))
const projectRoot = dirname(graphDir)

// retrieve (incorrect):
projectRoot: dirname(resolve(graphPath))  // yields out/, not project root
🔧 Proposed fix
       const retrieveLevelTyped = retrieveLevelOverride === null ? null : (retrieveLevelOverride as 0 | 1 | 2 | 3 | 4 | 5)
+      const projectRoot = dirname(dirname(resolve(graphPath)))
       const retrieval = retrieveSemantic || retrieveRerank ? retrieveContextAsync(graph, {
         question,
         budget: retrieveBudget,
         ...(strictBenchmarkOverrides ? { taskKind: strictBenchmarkOverrides.taskKind } : {}),
         ...(strictBenchmarkOverrides ? { runtimeProofProfile: strictBenchmarkOverrides.runtimeProofProfile } : {}),
         ...(retrieveCommunity !== null ? { community: retrieveCommunity } : {}),
         ...(retrieveFileType ? { fileType: retrieveFileType } : {}),
         ...(retrieveSemantic ? { semantic: true } : {}),
         ...(retrieveSemanticModel ? { semanticModel: retrieveSemanticModel } : {}),
         ...(retrieveRerank ? { rerank: true } : {}),
         ...(retrieveRerankModel ? { rerankerModel: retrieveRerankModel } : {}),
         ...(retrieveLevelTyped !== null ? { retrievalLevel: retrieveLevelTyped } : {}),
         ...(effectiveRetrieveStrategy ? { retrievalStrategy: effectiveRetrieveStrategy } : {}),
-        projectRoot: dirname(resolve(graphPath)),
+        projectRoot,
       }) : Promise.resolve(retrieveContext(graph, {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/stdio/tools.ts` around lines 1333 - 1356, The projectRoot is
computed incorrectly for semantic retrieval (currently using
dirname(resolve(graphPath))), causing resolution to point at the build/output
folder; update the projectRoot computation used in the
retrieveContextAsync/retrieveContext calls to mirror pr_impact: first derive
graphDir from validateGraphPath(graphPath) (via
dirname(validateGraphPath(graphPath))) and then set projectRoot to
dirname(graphDir), and use that projectRoot in both the retrieveContextAsync and
retrieveContext call sites so project-local node_modules (e.g.,
`@huggingface/transformers`) can be resolved correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/runtime/semantic.ts`:
- Around line 115-125: The cache key in loadPipeline currently omits projectRoot
causing cross-project reuse; update the cacheKey computation in loadPipeline to
incorporate a normalized project root (e.g., use projectRoot ?? process.cwd()
and normalize/resolve it) so that pipelineCache keys are unique per project;
ensure you reference the same normalized root when checking and storing in
pipelineCache and keep importTransformersModule(projectRoot ?? process.cwd())
unchanged.

In `@src/runtime/stdio-server.ts`:
- Around line 639-643: The code calls
isSemanticRuntimeAvailable(dirname(graphPath)) which uses the compiled/out
directory rather than the project root; instead compute the project root from
the validated graph path and pass that to isSemanticRuntimeAvailable (use
validateGraphPath(graphPath) then take dirname twice to get project root), so
replace the dirname(graphPath) argument with the derived projectRoot; keep the
existing variables activeMcpTools(profile, { semanticAvailable }) and function
isSemanticRuntimeAvailable to locate the check.

---

Outside diff comments:
In `@src/runtime/stdio/tools.ts`:
- Around line 1333-1356: The projectRoot is computed incorrectly for semantic
retrieval (currently using dirname(resolve(graphPath))), causing resolution to
point at the build/output folder; update the projectRoot computation used in the
retrieveContextAsync/retrieveContext calls to mirror pr_impact: first derive
graphDir from validateGraphPath(graphPath) (via
dirname(validateGraphPath(graphPath))) and then set projectRoot to
dirname(graphDir), and use that projectRoot in both the retrieveContextAsync and
retrieveContext call sites so project-local node_modules (e.g.,
`@huggingface/transformers`) can be resolved correctly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 92c3a2d2-ee66-49a8-9067-9833a1db1884

📥 Commits

Reviewing files that changed from the base of the PR and between 8d6e607 and da857a4.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (10)
  • CHANGELOG.md
  • package.json
  • src/infrastructure/doctor.ts
  • src/runtime/retrieve.ts
  • src/runtime/semantic.ts
  • src/runtime/stdio-server.ts
  • src/runtime/stdio/definitions.ts
  • src/runtime/stdio/tools.ts
  • tests/unit/stdio-semantic-resilience.test.ts
  • tests/unit/stdio-semantic.test.ts

Comment thread src/runtime/semantic.ts
Comment thread src/runtime/stdio-server.ts
mohanagy added 2 commits June 10, 2026 20:33
- bump docs/mcp-registry/server.json manifest and package entry versions
- point the README What's New section at the 0.28.1 changelog entry and
  note the hotfix; 0.28.0 benchmark claims unchanged
- derive the README version assertion in why-madar-doc.test.ts from
  package.json so version bumps cannot break it again
Addresses CodeRabbit review: without the root in the key, a pipeline
resolved for one project root could be reused for a different root in
the same process, bypassing per-project resolution.
@mohanagy mohanagy merged commit 4e3897d into main Jun 10, 2026
7 checks passed
@mohanagy mohanagy deleted the fix/semantic-optional-dep-resilience branch June 10, 2026 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant