fix(provider-swift): map defective chat-template render to 422, not cascading 500 (#242)#248
Conversation
…ascading 500 (Layr-Labs#242) When a model's chat_template.jinja throws while rendering a request that carries tool definitions (e.g. mlx-community/gemma-4-26b-a4b-it-8bit's `X | upper` on an Undefined value, which CPython jinja2 tolerates but swift-jinja rejects with "upper filter requires string"), MultiModelBatchSchedulerEngine rethrew the raw error. It fell through mapInferenceErrorToStatus to a generic 500, which the coordinator reads as a provider fault and reroutes -- cascading into "model load failed" across every provider that serves the model. Add a typed MultiModelBatchSchedulerEngineError.templateRenderingFailed case and wrap applyChatTemplate failures in both streamChatCompletion and applyTemplate. Map it to 422 (unprocessable) so the request fails cleanly and the provider stays healthy. Add unit + tokenizer-driven tests covering the tool-definition render path. Note: the underlying template defect, R2 re-vend, and coordinator aggregate_sha256 bump are publish/infra steps tracked separately in the issue -- this PR is the provider-side defensive guard.
|
Someone is attempting to deploy a commit to the EigenLabs Team on Vercel. A member of the Team first needs to authorize it. |
hankbobtheresearchoor
left a comment
There was a problem hiding this comment.
Review Summary
Clean, well-scoped PR. The root cause is clear (swift-jinja is stricter than CPython jinja2), the fix is defensive in the right place (engine level, before the error hits the coordinator), and the 422 mapping is semantically correct (the request is unprocessable given this model's template, not a transient provider fault).
Build + test results (macOS, M3 Ultra):
swift build --product darkbloom -c debug— clean (1674 steps)- All 3 new unit tests pass (status mapping, message preservation, applyTemplate integration)
- All 3 existing error-map tests still pass
- Full ProviderCore test suite passes
One inoffensive observation below — nothing blocking.
LGTM 🚢
| messages: messages, tools: toolSpecs, additionalContext: nil | ||
| ) | ||
| } catch { | ||
| } catch let error as MultiModelBatchSchedulerEngineError { |
There was a problem hiding this comment.
🔵 Observation (non-blocking): This catch let error as MultiModelBatchSchedulerEngineError catches every typed engine error, not just template-related ones — queueFull, tokenBudgetExhausted, requestRejected, etc. Today that's harmless because applyChatTemplate won't throw capacity/congestion errors (those come from the scheduler, not the tokenizer). But a future maintainer adding a new typed engine error that could be thrown from tokenization path — e.g. a tokenizer-warming-timeout — would have it silently pass through here and get the wrong status code downstream. Consider narrowing this catch to only .templateRenderingFailed (the one case that can actually originate from the tokenizer), or adding a comment noting the trust boundary between tokenizer errors and scheduler errors.
There was a problem hiding this comment.
Good call — narrowed in ece91e6. The pass-through now matches only .templateRenderingFailed (so its message isn't double-wrapped), and everything else out of the render block is wrapped as .templateRenderingFailed. Added a comment at both call sites documenting the tokenizer↔scheduler trust boundary you flagged, so a future typed error on the tokenizer path can't silently slip out with the wrong status.
…ngFailed (review Layr-Labs#248) Address hankbob's review note on Layr-Labs#248: the broad `catch as MultiModelBatchSchedulerEngineError` passed through every typed engine error, so a future typed error thrown from the tokenizer path could silently slip out with the wrong status. Narrow the pass-through to only .templateRenderingFailed (avoids double-wrapping its message) and wrap everything else from the render block as .templateRenderingFailed. Document the tokenizer<->scheduler trust boundary at both call sites.
What
Fixes the
500 upper filter requires stringthrown byprovider-swiftwhenever a request carrying tool definitions is routed to a model whosechat_template.jinjais not portable to swift-jinja — specificallymlx-community/gemma-4-26b-a4b-it-8bit(#242).Why
The model's
chat_template.jinjarendersX | upperagainst anUndefined/Nonevalue. CPython's jinja2 (used by oMLX and other backends) is permissive and propagatesUndefined/""through the filter, so the template "works" everywhere else. swift-jinja is stricter and raisesupper filter requires string.The render happens inside
MultiModelBatchSchedulerEngine.streamChatCompletion(andapplyTemplate), which rethrew the raw error verbatim. It then fell throughmapInferenceErrorToStatusto a generic 500. The coordinator reads a 500 as a provider fault and reroutes the request — so a deterministic, request-shaped failure turns into a cascadingmodel load failedacross every provider serving the model, impacting all agent harnesses that send tools.What this PR changes (provider-side defensive guard — the "should" item in the issue)
MultiModelBatchSchedulerEngineError.templateRenderingFailed(String).try/catcharoundapplyChatTemplatein bothstreamChatCompletionandapplyTemplate: any non-typed render failure is wrapped into.templateRenderingFailed(the verbatim underlying message is preserved for operator debugging); already-typed engine errors pass through unchanged..templateRenderingFailed → 422inProviderLoop+ErrorMapping. A 422 fails the request cleanly without marking the provider faulty or triggering a reroute.MultiModelBatchSchedulerEngineTests): the 422 mapping, verbatim-message preservation, and a tokenizer-drivenapplyTemplatetest that sends a tool definition and asserts the swift-jinja failure surfaces as.templateRenderingFailed→ 422 (no live model required).Remaining items from the issue (NOT in this PR — they are publish/infra, not repo code)
These require artifacts/access outside this repo and the actual re-vended template:
chat_template.jinja— replaceX | upperwith(X | default('')) | upperthroughout. The template is not in this repo; it ships inside the model snapshot on R2.aggregate_sha256in the coordinator catalog — this lives in themodel_versionsPostgres table (r2_prefix+aggregate_sha256), populated at publish time, not in a static repo file. It must be set to the real digest produced by the re-vend.mlx-community/gemma-4-26b-a4b-it-8bit.Once the patched template is re-vended, the 422 guard here remains valuable as defense-in-depth against any future non-portable template.
Verification note
provider-swiftis a macOS/MLX Swift package and its dependencies (MLXLMServer,mlx-swift-lm) do not build in this Linux CI sandbox, so I could not runswift testhere. The new tests are written against the verified upstream type signatures (ApplyTemplateRequest,OpenAITool,MLXLMCommon.Tokenizer) and mirror existing test patterns; please confirm they pass in the macOS test job.Resolves #242 (provider-side guard).
Need help on this PR? Tag
@codesmithwith what you need. Autofix is disabled.