
fix(provider-swift): map defective chat-template render to 422, not cascading 500 (#242) #248

Open
shwniscool wants to merge 2 commits into Layr-Labs:master from shwniscool:fix/242-swift-jinja-template-render-4xx


@shwniscool shwniscool commented May 29, 2026

What

Fixes the 500 "upper filter requires string" error thrown by provider-swift whenever a request carrying tool definitions is routed to a model whose chat_template.jinja is not portable to swift-jinja, specifically mlx-community/gemma-4-26b-a4b-it-8bit (#242).

Why

The model's chat_template.jinja renders X | upper against an Undefined/None value. CPython's jinja2 (used by oMLX and other backends) is permissive and propagates Undefined/"" through the filter, so the template "works" everywhere else. swift-jinja is stricter and raises "upper filter requires string".
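The difference between the two behaviours can be illustrated with a toy model. This is only a sketch: `TemplateValue`, `FilterError`, `strictUpper`, and `permissiveUpper` are hypothetical names invented here, and neither function is the actual implementation in swift-jinja or jinja2.

```swift
// Toy stand-in for a template value; not swift-jinja's real value type.
enum TemplateValue {
    case undefined
    case string(String)
}

// Hypothetical error type used only in this sketch.
struct FilterError: Error, Equatable {
    let message: String
}

// Strict behaviour, in the spirit of what the PR describes for swift-jinja:
// a non-string input is rejected.
func strictUpper(_ value: TemplateValue) throws -> String {
    guard case .string(let s) = value else {
        throw FilterError(message: "upper filter requires string")
    }
    return s.uppercased()
}

// Permissive behaviour, in the spirit of what the PR describes for CPython
// jinja2: an undefined input is treated as the empty string.
func permissiveUpper(_ value: TemplateValue) -> String {
    if case .string(let s) = value { return s.uppercased() }
    return ""
}
```

With an undefined input, `permissiveUpper` returns an empty string while `strictUpper` throws, which is why the same template renders on one backend and fails on the other.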

The render happens inside MultiModelBatchSchedulerEngine.streamChatCompletion (and applyTemplate), which rethrew the raw error verbatim. The error then fell through mapInferenceErrorToStatus to a generic 500. The coordinator reads a 500 as a provider fault and reroutes the request, so a deterministic, request-shaped failure turns into a cascading "model load failed" across every provider serving the model, impacting all agent harnesses that send tools.
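The cascade follows from how a status code is interpreted. The coordinator's code is not part of this PR, so the following is a sketch under assumptions: `RoutingDecision` and `routingDecision(forStatus:)` are hypothetical, and generalising from the two statuses mentioned here (500 and 422) to whole ranges is a guess, not the coordinator's documented policy.

```swift
// Hypothetical outcome type; the real coordinator logic is not shown in this PR.
enum RoutingDecision: Equatable {
    case deliver                   // success: pass the response through
    case failRequest               // the request itself is at fault: do not retry elsewhere
    case rerouteToAnotherProvider  // the provider looks faulty: try the next one
}

// Assumed policy: 4xx is a request problem, anything else that is not a
// success is treated as a provider problem.
func routingDecision(forStatus status: Int) -> RoutingDecision {
    switch status {
    case 200..<300: return .deliver
    case 400..<500: return .failRequest
    default: return .rerouteToAnotherProvider
    }
}
```

Under this policy a 500 for a deterministic template failure is retried on every provider that serves the model, and each retry fails the same way, whereas a 422 ends the request at the first provider.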

What this PR changes (provider-side defensive guard — the "should" item in the issue)

  • New typed error MultiModelBatchSchedulerEngineError.templateRenderingFailed(String).
  • Defensive try/catch around applyChatTemplate in both streamChatCompletion and applyTemplate: any non-typed render failure is wrapped into .templateRenderingFailed (the verbatim underlying message is preserved for operator debugging); already-typed engine errors pass through unchanged.
  • Status mapping .templateRenderingFailed → 422 in ProviderLoop+ErrorMapping. A 422 fails the request cleanly without marking the provider faulty or triggering a reroute.
  • Tests (MultiModelBatchSchedulerEngineTests): the 422 mapping, verbatim-message preservation, and a tokenizer-driven applyTemplate test that sends a tool definition and asserts the swift-jinja failure surfaces as .templateRenderingFailed → 422 (no live model required).
  • CHANGELOG entry under Provider (Swift) → Bug Fixes.
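The shape of the guard listed above can be sketched as follows. This is not the code in the diff: `EngineError` stands in for MultiModelBatchSchedulerEngineError with only two cases modelled, `RenderError`, `renderGuarded`, and `status(for:)` are hypothetical names, and the 429 for `queueFull` is a placeholder assumption. Only the 422 for `templateRenderingFailed` and the 500 fallback come from the PR description.

```swift
// Stand-in for MultiModelBatchSchedulerEngineError; only two cases are modelled.
enum EngineError: Error, Equatable {
    case queueFull
    case templateRenderingFailed(String)
}

// Stand-in for an untyped error coming out of the template renderer.
struct RenderError: Error, CustomStringConvertible {
    let description: String
}

// Wraps any non-typed failure from `render` into .templateRenderingFailed,
// keeping the underlying message; already-typed engine errors pass through.
func renderGuarded(_ render: () throws -> String) throws -> String {
    do {
        return try render()
    } catch let error as EngineError {
        throw error
    } catch {
        throw EngineError.templateRenderingFailed(String(describing: error))
    }
}

// Hypothetical status mapping. Only the 422 is taken from the PR; the 429 is
// a placeholder so the switch is exhaustive.
func status(for error: Error) -> Int {
    guard let engineError = error as? EngineError else { return 500 }
    switch engineError {
    case .templateRenderingFailed: return 422
    case .queueFull: return 429
    }
}
```

An untyped renderer failure passed through `renderGuarded` comes out as `.templateRenderingFailed` carrying the original message, and `status(for:)` maps it to 422 instead of falling through to 500.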

Remaining items from the issue (NOT in this PR — they are publish/infra, not repo code)

These require artifacts/access outside this repo and the actual re-vended template:

  • Patch chat_template.jinja — replace X | upper with (X | default('')) | upper throughout. The template is not in this repo; it ships inside the model snapshot on R2.
  • Re-vend the patched template from a new R2 prefix.
  • Bump aggregate_sha256 in the coordinator catalog — this lives in the model_versions Postgres table (r2_prefix + aggregate_sha256), populated at publish time, not in a static repo file. It must be set to the real digest produced by the re-vend.
  • (Optional) Upstream the template patch to mlx-community/gemma-4-26b-a4b-it-8bit.
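Why the proposed template patch is safe under a strict filter can be shown with a small model. This is a sketch only: `upper` and `withDefault` are hypothetical Swift functions that mimic the Jinja filters of the same purpose, with `nil` standing in for an undefined template value. They are not swift-jinja APIs.

```swift
// Hypothetical error type used only in this sketch.
struct FilterError: Error {
    let message: String
}

// Strict `upper`: rejects a missing value, as the PR describes for swift-jinja.
func upper(_ value: String?) throws -> String {
    guard let value = value else {
        throw FilterError(message: "upper filter requires string")
    }
    return value.uppercased()
}

// Models `default('')`: substitute a fallback when the value is missing.
func withDefault(_ value: String?, _ fallback: String) -> String {
    return value ?? fallback
}
```

Composing the two as `upper(withDefault(x, ""))` mirrors `(X | default('')) | upper`: the strict filter always receives a string, so a missing value renders as an empty string instead of raising.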

Once the patched template is re-vended, the 422 guard here remains valuable as defense-in-depth against any future non-portable template.

Verification note

provider-swift is a macOS/MLX Swift package and its dependencies (MLXLMServer, mlx-swift-lm) do not build in this Linux CI sandbox, so I could not run swift test here. The new tests are written against the verified upstream type signatures (ApplyTemplateRequest, OpenAITool, MLXLMCommon.Tokenizer) and mirror existing test patterns; please confirm they pass in the macOS test job.

Resolves #242 (provider-side guard).



…ascading 500 (Layr-Labs#242)

When a model's chat_template.jinja throws while rendering a request that
carries tool definitions (e.g. mlx-community/gemma-4-26b-a4b-it-8bit's
`X | upper` on an Undefined value, which CPython jinja2 tolerates but
swift-jinja rejects with "upper filter requires string"),
MultiModelBatchSchedulerEngine rethrew the raw error. It fell through
mapInferenceErrorToStatus to a generic 500, which the coordinator reads
as a provider fault and reroutes -- cascading into "model load failed"
across every provider that serves the model.

Add a typed MultiModelBatchSchedulerEngineError.templateRenderingFailed
case and wrap applyChatTemplate failures in both streamChatCompletion
and applyTemplate. Map it to 422 (unprocessable) so the request fails
cleanly and the provider stays healthy. Add unit + tokenizer-driven
tests covering the tool-definition render path.

Note: the underlying template defect, R2 re-vend, and coordinator
aggregate_sha256 bump are publish/infra steps tracked separately in the
issue -- this PR is the provider-side defensive guard.

vercel Bot commented May 29, 2026

Someone is attempting to deploy a commit to the EigenLabs Team on Vercel.

A member of the Team first needs to authorize it.

Contributor

@hankbobtheresearchoor hankbobtheresearchoor left a comment


Review Summary

Clean, well-scoped PR. The root cause is clear (swift-jinja is stricter than CPython jinja2), the fix is defensive in the right place (engine level, before the error hits the coordinator), and the 422 mapping is semantically correct (the request is unprocessable given this model's template, not a transient provider fault).

Build + test results (macOS, M3 Ultra):

  • swift build --product darkbloom -c debug — clean (1674 steps)
  • All 3 new unit tests pass (status mapping, message preservation, applyTemplate integration)
  • All 3 existing error-map tests still pass
  • Full ProviderCore test suite passes

One minor observation below; nothing blocking.

LGTM 🚢

      messages: messages, tools: toolSpecs, additionalContext: nil
  )
- } catch {
+ } catch let error as MultiModelBatchSchedulerEngineError {
Contributor


🔵 Observation (non-blocking): This catch let error as MultiModelBatchSchedulerEngineError catches every typed engine error, not just template-related ones: queueFull, tokenBudgetExhausted, requestRejected, etc. Today that's harmless because applyChatTemplate won't throw capacity/congestion errors (those come from the scheduler, not the tokenizer). But if a future maintainer adds a new typed engine error that can be thrown from the tokenization path (e.g. a tokenizer-warming timeout), it would silently pass through here and get the wrong status code downstream. Consider narrowing this catch to only .templateRenderingFailed (the one case that can actually originate from the tokenizer), or adding a comment noting the trust boundary between tokenizer errors and scheduler errors.

Author


Good call — narrowed in ece91e6. The pass-through now matches only .templateRenderingFailed (so its message isn't double-wrapped), and everything else out of the render block is wrapped as .templateRenderingFailed. Added a comment at both call sites documenting the tokenizer↔scheduler trust boundary you flagged, so a future typed error on the tokenizer path can't silently slip out with the wrong status.
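The narrowed pass-through described in this reply might look roughly like the sketch below. It is not the code from ece91e6: `EngineError` is a two-case stand-in for MultiModelBatchSchedulerEngineError and `renderGuardedNarrow` is a hypothetical name.

```swift
// Stand-in for MultiModelBatchSchedulerEngineError; only two cases are modelled.
enum EngineError: Error, Equatable {
    case queueFull
    case templateRenderingFailed(String)
}

func renderGuardedNarrow(_ render: () throws -> String) throws -> String {
    do {
        return try render()
    } catch EngineError.templateRenderingFailed(let message) {
        // Already the right type: rethrow unchanged so the message is not wrapped twice.
        throw EngineError.templateRenderingFailed(message)
    } catch {
        // Trust boundary: anything else escaping the render block, including other
        // typed engine errors, is treated as a template rendering failure.
        throw EngineError.templateRenderingFailed(String(describing: error))
    }
}
```

The design choice is that the render block is assumed to fail only for template reasons, so a typed error that unexpectedly surfaces there is reclassified rather than trusted.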

…ngFailed (review Layr-Labs#248)

Address hankbob's review note on Layr-Labs#248: the broad
`catch as MultiModelBatchSchedulerEngineError` passed through every typed
engine error, so a future typed error thrown from the tokenizer path could
silently slip out with the wrong status. Narrow the pass-through to only
.templateRenderingFailed (avoids double-wrapping its message) and wrap
everything else from the render block as .templateRenderingFailed. Document
the tokenizer<->scheduler trust boundary at both call sites.

Development

Successfully merging this pull request may close these issues.

[bug] mlx-community/gemma-4-26b-a4b-it-8bit chat_template is not portable to swift-jinja
