This document covers the major breaking upgrade paths.
No public API break, but Android arm64 native packaging defaults changed.
- Shorthand config such as `android-arm64: [vulkan]` is still supported.
- If no CPU policy is set, Android arm64 now defaults to `cpu_profile: full` (all CPU variants).
- If you want smaller baseline-only packaging, set `cpu_profile: compact` explicitly.
- `cpu_variants: [...]` (when provided) overrides `cpu_profile`.
Example (preserve compact baseline-style packaging):
```yaml
hooks:
  user_defines:
    llamadart:
      llamadart_native_backends:
        platforms:
          android-arm64:
            backends: [vulkan]
            cpu_profile: compact
```

The legacy custom handler/override registry APIs were removed:
- `ChatTemplateEngine.registerHandler(...)`
- `ChatTemplateEngine.unregisterHandler(...)`
- `ChatTemplateEngine.clearCustomHandlers(...)`
- `ChatTemplateEngine.registerTemplateOverride(...)`
- `ChatTemplateEngine.unregisterTemplateOverride(...)`
- `ChatTemplateEngine.clearTemplateOverrides(...)`
Legacy per-call handler routing fields were also removed:
- render param: `customHandlerId`
- parse param: `handlerId`
Template render/parse paths no longer silently downgrade to content-only fallback when a handler/parser fails. Failures are now surfaced to the caller.
Audit call sites that previously relied on silent fallback behavior and handle exceptions explicitly.
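A minimal sketch of explicit failure handling, assuming `session` and `messages` are already set up; the catch-all `on Exception` clause is an illustrative assumption, since this guide does not specify which exception types the render/parse paths throw:

```dart
// Sketch: handle surfaced template failures explicitly instead of
// relying on the removed silent content-only fallback.
try {
  await for (final chunk in session.create(messages)) {
    stdout.write(chunk.choices.first.delta.content ?? '');
  }
} on Exception catch (e) {
  // Previously this path could silently downgrade; now the failure
  // reaches the caller and must be handled (or rethrown) here.
  stderr.writeln('template render/parse failed: $e');
  rethrow;
}
```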
- Old pattern (string-in, string-out helpers): `session.chat(...)`, `session.chatText(...)`
- New pattern: `session.create(List<LlamaContentPart> ...)`, which streams `LlamaCompletionChunk`
Example migration:
```dart
// Before
await for (final token in session.chat('Hello')) {
  stdout.write(token);
}

// After
await for (final chunk in session.create([LlamaTextContent('Hello')])) {
  stdout.write(chunk.choices.first.delta.content ?? '');
}
```

- `LlamaChatMessage.text(...)` -> `LlamaChatMessage.fromText(...)`
- `LlamaChatMessage.multimodal(...)` -> `LlamaChatMessage.withContent(...)`
Example migration:
```dart
// Before
LlamaChatMessage.text(role: LlamaChatRole.user, content: 'Hi');

// After
LlamaChatMessage.fromText(role: LlamaChatRole.user, text: 'Hi');
```

- Removed: `ModelParams(logLevel: ...)`
- Use engine-level controls instead:
  - `await engine.setDartLogLevel(...)`
  - `await engine.setNativeLogLevel(...)`
  - or `await engine.setLogLevel(...)` to set both
Example migration:
```dart
// Before
await engine.loadModel(path, modelParams: ModelParams(logLevel: LlamaLogLevel.info));

// After
await engine.setNativeLogLevel(LlamaLogLevel.info);
await engine.loadModel(path);
```

- `loadModel(...)` now throws if a model is already loaded.
- Call `await engine.unloadModel()` (or `dispose()`) before loading another model.
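Swapping models therefore becomes an explicit unload-then-load sequence (a sketch; `otherPath` is a placeholder for your model path):

```dart
// loadModel(...) now throws if a model is already loaded,
// so unload the current model first when switching models.
await engine.unloadModel();
await engine.loadModel(otherPath);
```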
The package root (`package:llamadart/llamadart.dart`) no longer exports some previous internals. In particular:

- `ToolRegistry`
- `LlamaTokenizer`
- `ChatTemplateProcessor`

Use `LlamaEngine`, `ChatSession`, `ToolDefinition`, and the template APIs as the supported surface.
If you maintain your own `LlamaBackend` implementation, update it to match the current interface:

- Add `getVramInfo()`.
- Update the `applyChatTemplate(...)` signature/return type (string-based prompt rendering input/output).
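The shape of the change might look roughly like the sketch below. The parameter and return types here (`VramInfo`, the `applyChatTemplate` parameters) are illustrative assumptions, not the real llamadart signatures; consult the actual `LlamaBackend` interface:

```dart
// Illustrative sketch only: signatures are assumed, not authoritative.
class MyBackend implements LlamaBackend {
  @override
  Future<VramInfo> getVramInfo() async {
    // Newly required: report VRAM availability for this backend.
    throw UnimplementedError();
  }

  @override
  Future<String> applyChatTemplate(String template, List<LlamaChatMessage> messages) async {
    // Updated: string-based prompt rendering input/output.
    throw UnimplementedError();
  }
}
```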
Template/render/parse behavior is now strict llama.cpp parity:

- `customTemplate` remains supported for per-call template overrides.
- Legacy `customHandlerId` / parse `handlerId` routing was removed.
- `ChatTemplateEngine.registerHandler(...)` and `ChatTemplateEngine.registerTemplateOverride(...)` were removed.
- Render/parse paths no longer silently downgrade to content-only fallback when a handler/parser fails; failures are surfaced to the caller.
- Replace old `ChatSession` chat helpers with `create(...)` streaming.
- Rename `LlamaChatMessage` named constructors.
- Remove `ModelParams.logLevel` usage.
- Audit imports that depended on removed root exports.
- For custom backends, implement the latest `LlamaBackend` interface.