Releases: Flemma-Dev/flemma.nvim
v0.10.0
The gist…
Flemma v0.10.0 opens the door to external tooling. MCP support via MCPorter lets you connect any Model Context Protocol server — Linear, Slack, GitHub, databases — and use their tools natively inside your .chat buffers. The read tool now handles binary files: hand the model an image or PDF and it sends it natively to providers that support multimodal input (Anthropic, OpenAI, Vertex), with text fallbacks for the rest. Three features that were previously experimental have graduated to stable — LSP hover/go-to-definition, and the find, grep, and ls exploration tools are now enabled by default. On the reliability side, a new tool output overflow system catches runaway output before it floods the context window, saving the full content to a temp file the model can read on demand. Smaller additions: @~/path file references for home-directory paths, a User-Agent header on all API requests for debugging, and an auto-refreshing status buffer.
New Feature: MCP Support via MCPorter
Flemma now supports the Model Context Protocol through MCPorter, a standalone CLI that handles server discovery, OAuth, and transport. Install MCPorter, configure your servers (or let it auto-import from Claude Code, Cursor, and VS Code), then enable in Flemma:
```lua
require("flemma").setup({
  tools = {
    mcporter = {
      enabled = true,
      include = { "linear:*", "slack:*" },
    },
  },
})
```

Flemma discovers servers at startup, fetches their tool schemas with concurrency-controlled fanout, and registers each as a native tool definition — the model sees them alongside bash, read, and your other tools. Include/exclude glob patterns control which tools are enabled. The status buffer shows discovery progress and auto-refreshes when loading completes.
See docs/mcp.md for setup, configuration, and troubleshooting.
New Feature: Binary Content in Tool Results
The read tool now detects binary files — images, PDFs, and other non-text content — and sends them natively to the model instead of dumping raw bytes. Providers that support multimodal input (Anthropic, OpenAI Responses, Vertex) send images as image content blocks and PDFs as document blocks. Providers that don't (OpenAI Chat, Moonshot) fall back to a text placeholder with a diagnostic warning.
This means you can point the model at a screenshot or a PDF in your message (./diagram.png, ~/Documents/spec.pdf — no @ needed) and it will actually read the file, without you converting or encoding anything.
New Feature: LSP and Exploration Tools Graduate to Stable
The in-process LSP server and the three exploration tools (find, grep, ls) have graduated from experimental.
- LSP is now configured via top-level lsp = { enabled = true } (previously experimental = { lsp = true }).
- find, grep, ls are enabled by default — no configuration needed.
Breaking change: The experimental config section is now empty and strict. Passing any key to it (e.g., experimental = { lsp = true }) will produce a validation error. Move LSP config to the top-level lsp key.
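The migration is a one-line move in your setup call — a before/after sketch using only the keys named above:

```lua
-- Before (pre-v0.10.0) — this now fails validation:
-- require("flemma").setup({ experimental = { lsp = true } })

-- After (v0.10.0):
require("flemma").setup({
  lsp = { enabled = true },
})
```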
Polish and Bug Fixes
Tool output overflow handling. When bash or MCP tool results exceed 2000 lines or 50KB, the full output is now saved to a temp file and the model receives truncated content with a pointer to the full file. This prevents runaway commands from flooding the context window. The overflow path format is configurable via tools.truncate.output_path_format.
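As a sketch, the overflow destination might be configured like this — the key path comes from the notes above, but the placeholder syntax of the format string is an assumption:

```lua
require("flemma").setup({
  tools = {
    truncate = {
      -- Hypothetical format value; check the docs for the real placeholders.
      output_path_format = "/tmp/flemma-tool-output-%s.txt",
    },
  },
})
```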
@~/path file references. You can now use @~/Documents/notes.txt alongside the existing @./ and @../ syntax. The tilde is expanded at evaluation time, keeping .chat files portable across machines.
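In a .chat buffer, a home-directory reference might look like this (the file path is illustrative):

```
@You: Summarize the key points in @~/Documents/notes.txt
```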
User-Agent header. All API requests now include User-Agent: flemma.nvim/X.Y.Z Neovim/A.B.C, useful for debugging request issues with providers.
Tool name encoding. Internal tool names now use colon as separator (e.g., mcporter:slack:channels_list), encoded to double underscore on the wire for LLM API compatibility.
Status buffer auto-refresh. The status buffer now auto-refreshes when async tool sources (like MCPorter) finish loading, replacing "loading" with a "finished" confirmation.
Tool preview preserved during execution. The virtual line preview (e.g., bash: print Hello — $ sleep 5 && echo Hello) now stays visible while a tool is executing, not just while pending approval.
Autopilot throttled tool fix. Fixed autopilot skipping auto-approved tools when non-auto-approved tools coexisted in the same response.
Vertex AI HTTP 417 fix. Suppressed cURL's default Expect: 100-continue header, which was causing HTTP 417 errors on Vertex AI.
Minor Changes
- 72eeb7a: Added binary content support in tool results. The read tool now detects binary files (images, PDFs) and emits file references instead of raw bytes. Providers that support mixed content (Anthropic, OpenAI Responses, Vertex) send images and PDFs natively; providers that don't (OpenAI Chat, Moonshot) fall back to text placeholders with a diagnostic warning.
- 65f80df: Added mcporter tool integration: dynamically discovers MCP servers and registers their tools as Flemma tool definitions. Configure via tools.mcporter with include/exclude glob patterns. Disabled by default.
- 5ddd354: Added mime.detect(filepath) as the single public entry point for MIME detection — tries extension-based lookup first, falls back to the file command. Added mime.is_binary(mime_type) for classifying MIME types as binary vs textual. The previous get_mime_type() and get_mime_by_extension() methods are now internal.
- f921664: Promoted LSP and exploration tools (find, grep, ls) out of experimental. LSP is now configured via lsp = { enabled = true } (top-level). The three exploration tools are enabled by default. The experimental config section is now empty and strict — any keys passed to it will produce a validation error.
- 5ddd354: Added @~/path file reference syntax for home-directory relative paths, alongside the existing @./ and @../. The ~ is expanded at evaluation time, keeping .chat files portable across machines.
- ad7227e: Use colon as internal tool name separator, with wire encoding to double underscore for LLM APIs
- e3f6e0e: Added shared tool output overflow handling: when bash or MCP tool results exceed 2000 lines or 50KB, the full output is saved to a configurable temp file and the model receives truncated content with instructions to read the full output. The overflow path format is configurable via tools.truncate.output_path_format.
- 1e20943: Added User-Agent: flemma.nvim/X.Y.Z Neovim/A.B.C header to all API requests, backed by a version module that is automatically kept in sync with releases via CI
Patch Changes
- e86eafe: Fixed autopilot skipping throttled auto-approved tools when pending (non-auto-approved) tools coexist in the same response
- 2cdde26: Fixed HTTP 417 errors from Vertex AI caused by cURL's default Expect: 100-continue header
- 0de4dd0: Status buffer now auto-refreshes when async tool sources finish loading, replacing the "loading" indicator with a "finished" confirmation
- e698820: Fixed tool preview disappearing during execution. The virtual line preview (e.g., bash: print Hello — $ sleep 5 && echo Hello) now remains visible while a tool is executing, not just while pending approval.
v0.9.0
The gist…
Flemma v0.9.0 brings conversation structure to the surface. Turn indicators draw box-drawing arcs in the gutter marking where each request/response cycle begins and ends, with distinct styles for complete turns, mid-tool-call turns, and active streaming — the kind of thing you didn't know you needed until a 200-message conversation suddenly becomes navigable. A new Moonshot AI (Kimi) provider gives you access to the Kimi K2.5 model family with thinking, tool calling, and 256K context, built atop a reusable Chat Completions base class that makes adding future OpenAI-compatible providers dramatically easier. Presets have been unified into a single system that can switch provider, model, parameters, and tool approval in one command — :Flemma switch $explore can now mean "use GPT-4o-mini with full tool access." On the day-to-day side: <Space> now toggles the entire message fold (not individual sub-folds), the modeline parser handles quoted values and comma-separated lists, templates gain os.date()/os.time() and a proper print() function, and temperature is no longer forced to 0.7 — reasoning models that reject explicit temperature finally just work.
🧩 New Feature: Turn Indicators
Turn indicators draw visual boundaries in the gutter showing where each conversation turn begins and ends. A "turn" is one complete request/response cycle — starting from your @You: message, through any tool use exchanges, to the final @Assistant: response.
Three visual states communicate turn progress at a glance:
- Complete turns (╭│╰) — rounded arcs for a finished exchange
- Incomplete turns (╭┊└) — dotted lines when the assistant is mid-tool-call, waiting for results
- Streaming — the indicator extends in real time as the response arrives
When the ruler is enabled and right padding is configured, the top arc connects seamlessly to the ruler line for a polished visual join.
Configure via turns.enabled, turns.padding (an integer or { left, right } table), and turns.hl (highlight group, defaults to FlemmaTurn → FlemmaRuler).
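A minimal sketch using the three keys listed above — the values shown are illustrative, not confirmed defaults:

```lua
require("flemma").setup({
  turns = {
    enabled = true,
    padding = { left = 0, right = 1 }, -- or a single integer
    hl = "FlemmaTurn",                 -- falls back to FlemmaRuler
  },
})
```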
Breaking change: The signs feature has been removed. Replace any existing signs configuration with the new turns config.
See docs/ui.md for configuration and highlight customization.
🌙 New Feature: Moonshot AI (Kimi) Provider
Flemma now supports Moonshot AI as a first-class provider, giving access to the Kimi model family — including the flagship kimi-k2.5 with optional thinking, tool calling, multimodal input, and 256K context. Set provider = "moonshot" and export MOONSHOT_API_KEY and you're running.
Models range from kimi-k2.5 (thinking-optional, multimodal) to dedicated reasoning models (kimi-k2-thinking, kimi-k2-thinking-turbo) and legacy moonshot-v1-* endpoints. All support tool calling; K2.5 and newer models include prompt caching with no separate write fee.
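Getting started is two steps — export the key, then point Flemma at the provider. A sketch using the flagship model named above:

```lua
-- Assumes MOONSHOT_API_KEY is exported in your shell environment.
require("flemma").setup({
  provider = "moonshot",
  model = "kimi-k2.5",
})
```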
Under the hood, Moonshot ships on a new Chat Completions base class that implements the OpenAI-compatible wire format as a reusable layer. Future providers that speak Chat Completions (Groq, DeepSeek, Ollama, and others) can now be added with roughly a third of the previous boilerplate.
See docs/configuration.md for provider setup and model options.
🔄 New Feature: Unified Presets
Provider presets and tool approval presets were previously two separate systems. They're now a single top-level presets table where each preset can carry provider, model, parameters, and auto_approve — enabling composite presets that switch everything in one :Flemma switch call:
```lua
presets = {
  ["$explore"] = {
    provider = "openai",
    model = "gpt-4o-mini",
    auto_approve = { "read", "write", "edit", "bash", "find", "grep" },
  },
}
```

Two built-in presets ship: $standard (approves read, write, edit, find, grep, ls) and $readonly (approves read, find, grep, ls).
Breaking change: config.tools.presets has moved to top-level presets. The built-in $default has been renamed to $standard.
See docs/configuration.md for preset formats and docs/tools.md for approval mechanics.
Polish and Bug Fixes
Temperature is now optional. Flemma previously sent temperature: 0.7 on every request, which caused reasoning-native models (gpt-5-mini, o-series) to reject requests outright. Temperature is now omitted unless explicitly set, letting each API use its own default. If you relied on the implicit 0.7, add temperature = 0.7 to your setup config.
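To restore the old behavior per-buffer, a frontmatter override (mirroring the flemma.opt style shown in the v0.8.0 notes) might look like:

```lua
-- In the Lua frontmatter of a .chat buffer:
flemma.opt.temperature = 0.7
```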
<Space> now toggles the entire message fold instead of the fold under the cursor — nested folds (thinking, tool use) close along the way so the message reopens cleanly. Use za for the previous per-fold behavior. The modeline parser gained quote-aware tokenization with type coercion, escape sequences, and comma-separated lists, making :Flemma switch arguments more expressive (e.g., tags=python,rust,"C++"). Templates can now use os.date(), os.time(), and friends in expressions, and {% print("text") %} emits directly into template output instead of stdout. Template expressions also no longer break on }} inside Lua string literals and comments.
On the provider side: Anthropic PDF blocks now include the document title so Claude can see filenames, and a content block reordering fix resolves API rejections when text appeared after tool_use blocks. A crash when provider requests completed while the command-line window (q:) was open has been fixed, along with :Flemma switch ignoring key= syntax for clearing parameters.
Model definitions have been updated with gpt-5.4-mini, gpt-5.4-nano, and gpt-5.4-2026-03-05; context windows for claude-opus-4-6 and claude-sonnet-4-6 now reflect 1M; o4-mini cache pricing has been corrected; and several retired models have been removed. Internally, the monolithic models.lua has been split into per-provider modules, and pricing.high_cost_threshold is now configurable (default $30/M output).
Minor Changes
- 1d9b496: Auto-generate EmmyLua config types from the schema DSL via make types
- 568f684: Added Moonshot AI (Kimi) provider with support for kimi-k2.5 thinking, tool calling, and all Kimi/Moonshot models. Introduced a reusable Chat Completions base class (openai_chat.lua) for OpenAI-compatible APIs.
- f4714f9: Temperature is now optional with no default. Previously Flemma always sent temperature: 0.7 to provider APIs, which caused reasoning-native models (gpt-5-mini, o-series) to reject requests entirely. Temperature is now omitted unless explicitly set by the user, letting each API use its own default (typically 1.0). If you previously relied on the implicit 0.7 default for less random responses, add temperature = 0.7 to your setup config or chat frontmatter. Note: temperature is no longer silently stripped when set alongside reasoning/thinking. If you explicitly set both, the API will reject the request — correct this by removing the temperature setting.
- c5aac07: Split monolithic models.lua into per-provider data modules under lua/flemma/models/, allowing providers to declare their own model data via metadata.models. Added pricing.high_cost_threshold config option (default 30) replacing the hardcoded constant.
- 3aa501b: Removed the signs feature and replaced it with a turns config schema (turns.enabled, turns.padding, turns.hl) and a FlemmaTurn highlight group linked to FlemmaRuler.
- 2bb0d2a: Expose os.date, os.time, os.clock, and os.difftime in the template sandbox, enabling date/time formatting in expressions (e.g., {{ os.date("%B %d, %Y") }}). Dangerous os.* functions (execute, exit, getenv, remove, etc.) remain excluded.
- 6278037: Extended the modeline parser with quote-aware tokenization, type coercion for positional arguments, single and double quote support with backslash escaping, comma-separated list values, and empty value handling (key= → nil, key="" → empty string).
- 0371511: <Space> now toggles the entire message fold instead of the fold under the cursor. Nested folds (thinking, tool use/result) are closed along the way so the message reopens cleanly. Frontmatter folds are also toggled when the cursor is outside any message. Use za for the previous per-fold toggle behavior.
- d7cea2e: Added turn detection and statuscolumn rendering module for visual turn boundaries in the gutter
- fcf28d7: Template expressions now handle }} and %} inside Lua string literals, comments, and table constructors without breaking. Previously, {{ "email={{ customer.email }}" }} would crash because the parser matched the first }} it found regardless of context.
- 0ba2eba: Added print() support in template code blocks — {% print("text") %} now emits directly into the template output instead of going to stdout. Arguments are concatenated with no separators and no trailing newline, giving full whitespace control to the template author.
- ccd9646: Unified presets: config.tools.presets merged into top-level presets. Presets can now carry provider, model, parameters, and auto_approve fields — enabling composite presets like $explore that switch both model and tool approval in one :Flemma switch call. Built-in $default renamed to $standard (approves read, write, edit, find, grep, ls); $readonly updated to include find, grep, ls. Read-only tools (find, grep, ls) are now approved via the $standard preset instead of the sandbox auto-approval path. Schema validates the preset key $ prefix at finalize via new MapNode deferred key validation. :Flemma status now shows an (R) icon for runtime-sourced tool approvals.
Patch Changes
- 8b4b516: Send document title metadata on Anthropic PDF blocks so Claude can see the filename
- 5d...
v0.8.0
The gist…
Flemma v0.8.0 is primarily an infrastructure release. Flemma's configuration had grown into a tangle of mutable sources — setup, runtime switches, frontmatter, provider-internal parameter merges — each managed separately, overriding each other without clear precedence. It was becoming genuinely hard to reason about what the user actually intended. This release replaces all of that with a layered copy-on-write store where each source (defaults → setup → runtime → frontmatter) is an immutable layer with explicit priority. Nothing is mutated in place, so it's always clear which layer a value came from and why it won. The practical result: flemma.opt is now on par with the full configuration — anything expressible in setup() can be overridden per-buffer through frontmatter, including MongoDB-style list operators ($set, $append, $remove, $prepend) in JSON. Providers have been completely decoupled from global state — there's no longer a single shared instance. Each provider is constructed fresh for its request, scoped to the buffer's resolved config, so per-buffer provider and model overrides in frontmatter just work. :Flemma status was rebuilt to surface all of this — layer source indicators show exactly why Flemma is making the decisions it is. Tool previews now use structured label/detail formatting, and the secrets system emits resolver diagnostics when a key can't be found. Smaller additions: .chat buffers auto-prepend @You: on open, devicons integration, lualine format override, and "did you mean?" for unknown commands.
🧩 Enhancement: Per-Buffer Configuration
flemma.opt previously supported only a subset of what setup() could express. With the new layered store behind it, that gap is closed — anything you can configure globally can now be overridden per-buffer through frontmatter, with clear precedence over every other source.
In Lua frontmatter, the flemma.opt proxy gives you direct access:
```lua
flemma.opt.provider = "openai"
flemma.opt.thinking = "medium"
flemma.opt.tools:append("bash")
flemma.opt.tools:remove("write")
```

In JSON frontmatter, MongoDB-style operators control list mutations precisely:
```json
{
"flemma": {
"tools": {
"$append": ["bash", "grep"],
"$remove": "write",
"auto_approve": { "$append": ["grep"] }
}
}
}
```

Only options you touch are written — everything else falls through to your global config. Frontmatter is evaluated passively on every edit, so your lualine component and :Flemma status reflect changes as you type. Errors during editing are silently preserved until you send.
Tool name typos are caught with "did you mean?" suggestions (e.g., bahs → "Did you mean 'bash'?").
See docs/templates.md for the full frontmatter reference and docs/configuration.md for config aliases and the layer model.
🔄 Enhancement: Request-Scoped Providers
Previously, a single global provider instance served every buffer in the Neovim session. Runtime switches affected everything, and there was no reliable way to pin a specific provider or model to a specific chat. That's gone — providers are now constructed per-request from the buffer's resolved config and discarded when the request completes. Set flemma.opt.provider = "openai" or flemma.opt.model = "o3" in frontmatter and it stays locked to that buffer regardless of what you do elsewhere. No runtime overrides leak in, no other buffer's :Flemma switch affects it.
This also lays groundwork for sub-agent workflows in future releases, where each agent buffer will need its own pinned provider and model.
🔍 Improvement: Redesigned :Flemma status
Because the layered store tracks where every value comes from, :Flemma status can now show you exactly why Flemma is making the decisions it is. The window uses a box-drawing tree layout with extmark-based highlighting, and every config value shows a layer source indicator (🆂 setup, 🆁 runtime, 🅵 frontmatter) — so if a frontmatter override is winning over your setup, you'll see it at a glance.
Thinking budget resolution is shown inline (e.g., "minimal → low"), and frontmatter diagnostics (parse errors, validation failures) appear directly in the status view.
✨ Improvement: Structured Tool Previews
Tool fold text now separates the LLM's stated intent from the raw parameters. A folded tool call might read:
Finding Python files — glob: "**/*.py"
The label (intent) and detail (parameters) use distinct highlight groups — FlemmaToolLabel (italic) and FlemmaToolDetail (dimmed, defaults to Comment) — so you can scan a long conversation and immediately see what each tool call was doing without unfolding it.
Custom tools can return { label = "...", detail = "..." } from their format_preview function to take advantage of this. Plain string returns still work as before.
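For a custom tool, the structured return might look like this sketch — the tool and its parameters are hypothetical; only the { label, detail } shape comes from the notes:

```lua
-- Hypothetical format_preview for a custom "count_lines" tool.
local function format_preview(params)
  return {
    label = "Counting lines",         -- the LLM's stated intent
    detail = "file: " .. params.file, -- raw parameter detail
  }
end
```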
See docs/ui.md for highlight group customization.
🔐 Improvement: Secrets Resolver Diagnostics
When an API key can't be found, Flemma now tells you exactly why. Each secret resolver (environment variables, macOS Keychain, gcloud, secret-tool) emits structured diagnostics explaining what it checked and what went wrong. The result is a single notification listing all attempted resolvers and their failure reasons, instead of a generic "key not found" message.
The gcloud binary path is also now configurable via secrets.gcloud.path for non-standard installations.
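A sketch of the new option — the binary path shown is just an example for a Homebrew install:

```lua
require("flemma").setup({
  secrets = {
    gcloud = { path = "/opt/homebrew/bin/gcloud" },
  },
})
```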
Minor Changes
- 4ca6f8e: Added editing.auto_prompt option (default true) that prepends @You: to empty .chat buffers on open, giving new users a clear starting point.
- d8a1187: Replaced the configuration system with a layered, schema-backed copy-on-write store.

  The new system introduces a schema DSL for declarative config shape definition, a four-layer store (DEFAULTS, SETUP, RUNTIME, FRONTMATTER) with separate scalar (top-down first-set-wins) and list (bottom-up accumulation) resolution, read/write proxy metatables for ergonomic access, and a DISCOVER callback pattern that lets tool, provider, and sandbox modules register their own config schemas at load time without coupling the schema definition to heavy modules.

  All configuration access now goes through a single public facade (require("flemma.config")). The legacy flat merge (vim.tbl_deep_extend in config.lua), the global config cache (state.get_config/state.set_config), and the per-buffer opt overlay (buffer/opt.lua) have all been removed. Frontmatter evaluation writes directly to the FRONTMATTER layer of the store, and flemma.opt is now a write proxy into that layer.

  Providers are now request-scoped — constructed inline per send_to_provider() call with per-buffer parameters, captured in closures, and GC'd after the request completes. The global mutable provider instance, the parameter override diffing machinery, and config_manager.lua have been dissolved into core.lua (orchestration) and provider/normalize.lua (pure parameter normalization functions).

  The approval system is unified into a single config resolver that reads the resolved tools.auto_approve from the layer store, replacing the previous two-resolver pattern (config + frontmatter at separate priorities). Preset deny lists have been removed — an auto-approve policy that denies is a contradiction. :Flemma status now shows right-aligned layer source indicators (D/S/R/F) on provider, model, parameter, and tool lines, and a verbose view with per-layer ops and a schema-walked resolved config tree.

  Test coverage includes 9 new config test suites (store, proxy, schema, definition, alias, list ops, DISCOVER, lens, integration) alongside migration of ~30 existing test files to the new facade.
- 1cda981: Add deferred semantic validation to config schema nodes. Tool names in frontmatter and setup config are now validated against the tool registry at finalize time, with "did you mean?" suggestions for typos.
- fb5f241: Added devicons integration that auto-registers a .chat file icon with nvim-web-devicons (or other compatible devicons plugins). Enabled by default — configure via integrations.devicons.enabled and integrations.devicons.icon.
- 3fcb594: Fold previews now show tool labels (the LLM's stated intent) prominently, with raw technical detail visually subordinate. Tool format_preview functions can now return { label?, detail? } instead of a plain string, where detail may be a string[] (joined with double-space upstream for uniform display). Built-in tools (bash, read, write, edit, grep, find, ls) have been updated to use the structured return. String-returning format_preview functions are fully backward-compatible. New highlight groups FlemmaToolLabel (italic) and FlemmaToolDetail (default: Comment) style the two pieces independently. Label and detail are separated by an em-dash (—) in both folds and tool preview virtual lines.
- 2c7661e: JSON frontmatter now supports MongoDB-style operators ($set, $append, $remove, $prepend) for config writes via the flemma key
- 4248502: The lualine component now accepts a format option directly in the section config, which takes precedence over statusline.format in the Flemma config: { "flemma", format = "#{provider}:#{model}" }
- 8d5b6a6: Passively evaluate frontmatter on InsertLeave, TextChanged, and BufEnter so integrations like lualine see up-to-date config values without waiting for a request send. On error, the last successful frontmatter parse is preserved. Refactored config.finalize() to return validation failures as data instead of accepting a reporter callback, making codeblock parsers pure data functi...
v0.7.0
The gist…
Flemma v0.7.0 is the extensibility release. Your system prompts are now a full-blown template engine — {% if %}, {% for %}, parameterized includes, the works. A new personality system lets tools describe themselves so the LLM gets a tailored, project-aware system prompt out of the box. Credentials are managed by a new secrets module that resolves API keys from environment variables, macOS Keychain, GNOME Keyring, or gcloud CLI with zero configuration. An experimental LSP server brings hover inspection and go-to-definition to .chat buffers, and three new exploration tools (grep, find, ls) give the LLM the ability to navigate your codebase. For plugin authors, a hooks module emits lifecycle events at every stage of a request, and a preprocessor pipeline enables custom AST transforms. Day-to-day UX improves with a redesigned progress indicator (floating, phase-aware, always visible), a cursor engine that prevents focus-stealing during agent loops, and tmux-style statusline format strings for full control over your lualine component. On the reliability side: the parser no longer breaks on role markers inside fenced code blocks, AST parsing during streaming is now incremental (O(new content) instead of O(total)), and the provider layer shed ~370 lines of duplicated code.
🧩 New Feature: Template Engine
System and user messages now support {% lua code %} blocks for full control flow — conditionals, loops, variable assignment — alongside the existing {{ expression }} syntax. Whitespace trimming ({%- -%}, {{- -}}) keeps your output clean. Includes are now parameterized: {{ include('persona.md', { style = "brief" }) }} passes variables into the included file, where they're available as top-level identifiers. Included files support full template syntax at any nesting depth.
The template environment is extensible via templating.modules in your setup config — register custom populator modules that add globals to the Lua sandbox. Two built-in populators ship: stdlib (the standard library you already know) and iterators (providing values() and each() for concise array iteration with loop metadata like index, first, last).
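Putting the pieces together, a system-prompt template might read like the sketch below. The {% ... do %} / {% end %} block delimiters are inferred from the "{% lua code %}" description and the iterators populator named above — treat the exact block syntax as an assumption:

```
{%- local styles = { "concise", "friendly" } -%}
Guidelines:
{% for style in values(styles) do %}
- Be {{ style }}.
{% end %}
{{ include('persona.md', { style = "brief" }) }}
```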
Breaking change: Binary include mode now uses symbol keys ([symbols.BINARY], [symbols.MIME]) instead of the reserved strings "binary" and "mime", so those names are free to use as template variables.
See docs/templates.md for the full syntax reference, examples, and error behavior.
🎭 New Feature: Personality System
Personalities generate dynamic, tool-aware system prompts. Include one with {{ include('urn:flemma:personality:coding-assistant') }} in your @System: message and Flemma assembles a complete prompt that lists every enabled tool with descriptions, collects behavioral guidelines contributed by each tool, adds environment context (cwd, current file, git branch, date/time), and appends auto-discovered project files like CLAUDE.md, AGENTS.md, or .cursorrules.
Tool definitions contribute personality-scoped content via a new personalities field — snippets, guidelines, or any custom part names. The system is open: create your own personality module by implementing a single render(opts) function and registering it.
See docs/personalities.md for usage and authoring details.
🔐 New Feature: Secrets Module
Providers no longer manage their own credential lookup. They declare what they need (kind + service) and the secrets module resolves it through a chain of platform-aware resolvers tried in priority order:
- Environment variables — convention-based (ANTHROPIC_API_KEY) with alias support
- GNOME Keyring (Linux) — via secret-tool
- macOS Keychain — via security
- gcloud CLI — derives access tokens, with or without a service account
Results are cached with TTL awareness — configurable freshness scaling lets short-lived tokens (like gcloud's 1-hour access tokens) refresh before expiry. You can register custom resolvers (Vault, 1Password, team-specific stores) at runtime. Existing keyring entries stored under the previous scheme are still found via legacy fallback.
See the "Credential Resolution" section in docs/extending.md.
🔎 New Feature: Experimental LSP Server
Flemma now ships an in-process LSP server that attaches to every .chat buffer. Hover (K) returns structured information for every buffer position: expressions show their parsed AST, tool use/result blocks show IDs and metadata, thinking blocks show full untruncated content, role markers show message summaries, and frontmatter shows the parsed configuration. Go-to-definition (gd) navigates between tool use and tool result siblings, jumps to @./file references, and resolves {{ include() }} expressions to their target files.
Enabled by default when vim.lsp is available. Disable with experimental = { lsp = false } in setup.
See the experimental section in docs/configuration.md.
🗺️ New Feature: Exploration Tools
Three new tools give the LLM the ability to search and navigate your codebase: grep (content search with ripgrep/grep fallback), find (file discovery with fd/git-ls-files/find fallback), and ls (directory listing with depth control). All three respect the sandbox, auto-detect the best available backend, and truncate output to prevent context overflow.
Gated behind experimental = { tools = true } in setup. When sandbox auto-approval is active (the default), these tools run without manual confirmation.
See docs/tools.md for configuration options and backend details.
🪝 New Feature: Hooks Module
Flemma now emits User autocmds at key lifecycle points, enabling external plugins and custom integrations:
- FlemmaRequestSending / FlemmaRequestFinished (with status: completed, cancelled, or errored)
- FlemmaToolExecuting / FlemmaToolFinished (with tool name, ID, and status)
- FlemmaBootComplete (when async tool sources finish loading)
The built-in bufferline.nvim integration is the first consumer — it shows a busy icon on .chat tabs while requests or tools are in-flight. The hooks module is the foundation for a growing plugin ecosystem.
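From your own config or plugin, subscribing is a standard User autocmd — the event names come from the list above, while the payload shape beyond the event name is an assumption:

```lua
-- Notify whenever a Flemma request starts or finishes.
vim.api.nvim_create_autocmd("User", {
  pattern = { "FlemmaRequestSending", "FlemmaRequestFinished" },
  callback = function(args)
    -- args.match holds the name of the event that fired.
    vim.notify(("flemma event: %s"):format(args.match))
  end,
})
```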
See the "Hooks & Events" section in docs/extending.md and docs/integrations.md for the bufferline setup.
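The lifecycle events above can be consumed with a standard User autocmd. A minimal sketch — only the event names come from the release notes; the shape of the payload (reading a status from `ev.data`) is an assumption:

```lua
-- Sketch: react when a Flemma request finishes.
vim.api.nvim_create_autocmd("User", {
  pattern = "FlemmaRequestFinished",
  callback = function(ev)
    -- The notes say the event carries a status (completed, cancelled,
    -- or errored); reading it from ev.data is an assumption.
    local status = type(ev.data) == "table" and ev.data.status or "unknown"
    vim.notify("Flemma request finished: " .. status)
  end,
})
```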
✨ New Feature: Progress Indicator & Cursor Engine
The streaming indicator has been completely redesigned. A persistent floating window shows character count, elapsed time, and a phase-specific animation throughout the full request lifecycle — including tool input buffering, which previously showed no progress for OpenAI and Vertex providers. The float appears at the bottom of the chat window when the progress extmark scrolls off-screen, with the spinner icon placed in the sign column to match the notification bar layout. Configurable via progress.highlight and progress.zindex.
Alongside this, a new cursor engine centralizes all cursor movement with focus-stealing prevention: system-initiated moves (tool results, response completion, autopilot transitions) are deferred until the user is idle, so the cursor no longer jumps away while you're reading or editing during an agent loop.
Minor Changes
- d36de50: Added `ast:diff` command for side-by-side comparison of raw and rewritten ASTs, with syntax highlighting, folding, and cursor-aware scrolling. LSP hover now uses the same tree dump format for consistent AST inspection.
- ba903a8: Add booting indicator for async tool sources: `#{booting}` lualine variable, `FlemmaBootComplete` autocmd, and ⏳ indicator in `:Flemma status`
- 464a909: Added optional bufferline.nvim integration that shows a busy icon on `.chat` tabs while a request is in-flight. Configure with `get_element_icon = require("flemma.integrations.bufferline").get_element_icon` in your bufferline setup. Custom icons supported via `get_element_icon({ icon = "+" })`.
- 235b8e1: Added centralized cursor engine with focus-stealing prevention. System-initiated cursor moves (tool results, response completion, autopilot) are now deferred until user idle, preventing cursor hijacking during agent loops. User-initiated moves (send, navigation) execute immediately.
- 0c6e6cb: Added experimental in-process LSP server for chat buffers with hover and goto-definition support. Enable with `experimental = { lsp = true }` in setup. Every buffer position returns a hover result: segments (expressions, thinking blocks, tool use/result, text) show structured dumps, role markers show message summaries with segment breakdowns, and frontmatter shows language and code. Goto-definition (`gd`, `<C-]>`, etc.) on `@./file` references and `{{ include() }}` expressions jumps to the referenced file, reusing the navigation module's path resolution.
- 92bd667: Added three exploration tools for LLM-powered codebase navigation: `grep` (content search with rg/grep fallback, --json match counting, per-line truncation), `find` (file discovery with fd/git-ls-files/find fallback, recursive patterns, configurable excludes), and `ls` (directory listing with depth control). All tools use existing truncation, sink, and sandbox infrastructure. Executor cwd resolution generalized from bash-specific to per-tool.
- cf30657: Added file drift detection: warns when `@./file` references change between requests, helping identify cache breaks and potential LLM confusion from stale conversation context
- 393e18d: Added `<Space>` keymap to toggle folds in `.chat` buffers. Configurable via `keymaps.normal.fold_toggle`; automatically skipped when the key conflicts with `mapleader`.
- 749c1c7: Added hooks module for external plugin integration. Fl...
v0.6.0
The gist…
Flemma v0.6.0 is a big visual and structural release. Tool blocks now fold independently — Tool Use and Tool Result each get their own fold at level 2, and completed tools auto-fold after execution so your buffer stays clean. Fold text is syntax-highlighted per segment, showing tool names, input previews, and line counts in distinct colors. Role markers moved to their own line with the ruler integrated directly into the @Role: line (─ Assistant ─────…), giving conversations a cleaner visual rhythm; old-format files are auto-migrated on load. The notification bar was rewritten with a priority-based layout engine, compact Unicode symbols (Σ, #N, ↑↓), WCAG-contrast color tiers, and a gutter icon that frees up column space. A new diagnostics mode (diagnostics = { enabled = true }) lets contributors debug prompt caching by comparing consecutive API requests and warning when the prefix diverges — complete with byte-level diff view via :Flemma diagnostics:open. Model metadata enrichment adds per-model thinking budgets and cache pricing so thinking parameters are silently clamped to valid ranges instead of hitting API errors, and :Flemma status now displays context window, pricing, and thinking budget info. Under the hood, a deterministic JSON encoder sorts request keys for better prompt cache hit rates, and range extmarks replaced per-line highlights, dropping Neovim API calls from ~500 to ~20 per update. On the reliability side: fold auto-close race conditions that left blocks unfolded during streaming are fixed, API errors (non-SSE responses, HTML error pages) are now properly surfaced instead of silently swallowed, and UTF-8 content no longer breaks in fold previews.
Minor Changes
- 6546355: Aligned all registry modules to a consistent API contract: every registry now exposes register(), unregister(), get(), get_all(), has(), clear(), and count(). Extracted shared name validation into a new flemma.registry utility module. Renamed tools registry define() to register() (define() kept as deprecated alias).
- dea4561: Notification bar background is now a blend of Normal bg (base), StatusLine bg (30%), and DiffChange fg (20%), producing a subtly tinted bar that's easier to read against the editor background
- 568fb63: Compact notification bar format: token arrows now follow numbers (129↑ 117↓), session request count is merged into the Σ label (Σ3), and the bar automatically uses relaxed double-spacing when width allows
- bb15c08: Restore CursorLine visibility on line-highlighted chat buffer lines. Blended overlay highlights preserve role-specific backgrounds while showing the cursor line, with smart toggling via OptionSet and a fg-only thinking fold preview group.
- 9459e97: Add deterministic key-ordered JSON encoder for prompt caching. API request bodies now serialize with sorted keys and provider-specific trailing keys (messages, tools) placed last, maximizing prefix-based cache hits across all providers.
- 9c0f873: Added diagnostics mode for debugging prompt caching issues. When enabled via `diagnostics = { enabled = true }`, Flemma compares consecutive API requests per buffer and warns when the prefix diverges (breaking caching). Includes byte-level analysis, structural change detection, and a side-by-side diff view (`:Flemma diagnostics:open`).
- a6618bd: Notification bar now derives all colors from DiffChange with three foreground tiers (primary, secondary, muted) and WCAG contrast enforcement on semantic cache colors. Added `^contrast` operator to highlight expressions and extracted color utilities into `flemma.utilities.color` for reuse.
- bae5026: Extracted folding logic into dedicated `ui/folding` module with registry-based fold rules, O(1) cached fold map, and configurable `auto_close` per fold type (thinking, tool_use, tool_result, frontmatter)
- c56f356: Added independent folding for Tool Use and Tool Result blocks at fold level 2. Completed and terminal tool blocks auto-fold after execution, reducing visual noise. In-flight tools (pending, approved, executing) remain visible. Fold summaries reuse the same preview format as pending tool extmarks.
- 77cb82b: Added per-segment syntax highlighting to fold text lines. Fold lines now return `{text, hl_group}` tuples so each part (icon, title, tool name, preview, line count) uses its own highlight group. New config keys: `tool_icon`, `tool_name`, `fold_preview`, `fold_meta`. Renamed `tool_use` to `tool_use_title` and `tool_result` to `tool_result_title` for 1:1 correspondence with highlight groups. Added shared `roles.lua` utility for centralised role name mapping.
- 0fc8bea: Merged ruler into role marker lines: `@Role:` now renders as `─ Role ─────...` with the ruler extending to the window edge, replacing the separate virtual line above each message
- 078a3a2: Enriched model metadata matrix with per-model thinking budgets, cache pricing, and cache minimum thresholds. Thinking parameters are now silently clamped to model-specific bounds instead of hitting runtime API errors. Cache percentage indicator is suppressed when input tokens are below the model's minimum cacheable threshold. Session pricing now uses per-model absolute cache costs where available, with provider-level multipliers as fallback.
- b46f3ea: Rewrite notification bar with a priority-based layout engine and gutter icon. The 💬 prefix now renders in the gutter when space allows, freeing 3 columns for content. Renamed all `FlemmaNotify_` highlight groups to `FlemmaNotifications_` for consistency.
- 5d646e1: Added configurable `notifications.highlight` and `notifications.border` options, and fixed notification misalignment when async plugins (git-signs, LSP) change gutter width after positioning
- fe71464: Line highlights now use per-message range extmarks instead of per-line extmarks, reducing API calls from ~500 to ~20 per update. New lines created by pressing Return in insert mode are highlighted immediately via Neovim's gravity system instead of waiting for CursorHoldI.
- 652e9f6: Reprioritized notification bar segments: session cost and request input tokens now survive truncation at narrow widths. Replaced word labels with compact Unicode symbols (Σ for session totals, #N for request count, bare percentage for cache).
- 0c6e898: Role markers (`@System:`, `@You:`, `@Assistant:`) now occupy their own line in `.chat` buffers. Old-format files are automatically migrated on load, and a new `:Flemma format` command is available for manual migration. Insert-mode colon auto-newline moves the cursor to a new content line after completing a role marker.
- 29ba841: `:Flemma status` now shows model metadata (context window, pricing, thinking budget range) in the Provider section for known models. Verbose mode includes a full Model Info dump. Syntax highlighting updated with model version suffixes, dollar amounts, and token count suffixes.
- 46e6b25: Move shared utility modules to `flemma.utilities.*` namespace and introduce `flemma.utilities.buffer` for common buffer manipulation patterns
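The diagnostics mode from 9c0f873 above can be switched on with a snippet like:

```lua
require("flemma").setup({
  diagnostics = { enabled = true },  -- warn when consecutive request prefixes diverge
})
-- Inspect a reported cache break with :Flemma diagnostics:open
```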
Patch Changes
- b109b62: Cancel both Space and Enter after role marker auto-newline to prevent unwanted blank lines from muscle memory
- acc51d0: Fixed spurious "A request is already in progress" warning during autopilot tool execution loops with sync tools
- a870175: Fixed CursorLine overlay flashing on every keystroke when blink-cmp completion menu is open
- 5de6e77: Fixed spurious "Cache break detected" diagnostics warning when switching between providers
- b60a533: Fix diagnostics false positive when messages grow between turns. Cache-break warnings now only fire for actual prefix-breaking changes (tools, config, system prompt), not for normal message appends at the document tail.
- 9cc706d: Fixed fold auto-close race condition where thinking blocks and tool blocks would remain unfolded ~10% of the time due to silent foldclose failures being permanently marked as successful. Also fixed folds not being applied when returning to a chat buffer after switching tabs during streaming.
- 5c87b26: Fixed fold_completed_blocks firing redundantly on every cursor movement, spamming the debug log
- 6bf2ed9: Fixed tool fold previews falling back to generic key=value format for tools registered via `config.tools.modules` (e.g. extras) by ensuring lazy modules are loaded before registry lookup
- a57a6dc: Fixed preview truncation (fold text, tool indicators) using byte length instead of display width, which caused incorrect truncation and potential UTF-8 splitting with multibyte content (CJK, accented characters, Unicode symbols)
- e098341: Fixed notification bar icon flickering during scrolling by replacing the 💬 emoji prefix with ℹ (U+2139), which renders reliably across terminal emulators
- 720ddab: Fixed extra space in notification bar caused by stale item width alignment from dismissed notifications
- 8686997: Fixed role_style attributes (e.g., underline) bleeding into ruler characters on role marker lines
- 84442f0: Fixed self-closing thinking tags (`<thinking .../>`) creating unclosed folds that swallowed subsequent buffer content
- 300525a: Fixed missing warning when pressing `<C-]>` while a request is already in progress — the keypress was silently ignored instead of showing the "Use `<C-c>` to cancel" message
- f59d94f: Fixed silent failure when API returns non-SSE error responses (plain JSON, HTML error pages, or plain text). Errors are now properly surfaced via vim.notify instead of being silently swallowed.
- 46da4a0: Fixed thinking blocks not auto-folding after the first response in a session
- 932dc68: Tool block folds now absorb trailing blank lines when the next adjacent tool block is also foldable, producing a cleaner collapsed view without vertical gaps between folded blocks
- 9386d8f: Notification recall now derives segments from ses...
v0.5.0
The gist…
Flemma v0.5.0 is smarter about the things you shouldn't have to think about. Sandboxed bash commands are now auto-approved — if your sandbox backend is available, tool calls run without prompts, so agentic workflows feel seamless out of the box. Cancelling mid-stream is finally clean: hit <C-c> and orphaned tool calls resolve themselves instead of leaving you stuck in approval limbo. max_tokens defaults to "50%" of the model's output limit and percentages are a first-class config value, so you get sensible defaults without knowing every model's context window. Anthropic prompt caching switches to the auto-caching API, eliminating fragile edge cases when the conversation tail lands on an unusual message shape. Usage notifications have been redesigned with a compact two-column layout that foregrounds cost and cache hit rate (color-coded green/yellow), and rate-limit errors now surface retry-after timing and remaining quota straight from the API headers. Under the hood, a new per-buffer write queue eliminates the E565 textlock crashes that occurred when visual-mode plugins collided with streaming callbacks.
Minor Changes
- 2350bd7: Added automatic handling of aborted responses: when a user cancels (`<C-c>`) mid-stream after tool_use blocks, orphaned tool calls are now automatically resolved with error results instead of triggering the approval flow. The abort marker (`<!-- flemma:aborted: message -->`) is preserved for the LLM on the last text-only assistant message so it can continue contextually.
- 5c3aee7: Added max_input_tokens and max_output_tokens to all model definitions, enabling future context window awareness and cost prediction features
- 681ebbf: Added `flemma.sink` module — a buffer-backed data accumulator that replaces in-memory string/table accumulators across the codebase. Sinks handle line framing, write batching, and lifecycle management behind an opaque API. Migrated cURL streaming, bash tool output, provider response buffering, thinking accumulation, and tool input accumulation to use sinks.
- 2d24104: Use Anthropic's auto-caching API for the conversation tail breakpoint, replacing manual last-user-message walking with a more robust top-level cache_control field
- 9aff386: Redesigned usage notifications with compact dotted-leader layout, cache hit percentage with conditional color highlighting, and arrow-based token display
- c574d43: Show rate limit details (retry-after, remaining quota headers) in error notifications when API returns HTTP 429, with a fallback "Try again in a moment" hint when headers are unavailable
- ee19164: Auto-approve bash tool when sandbox is enabled and a backend is available. A new resolver at priority 25 approves bash calls when sandboxing is active, so sandboxed sessions run without manual approval prompts by default. Users can opt out via `tools.auto_approve_sandboxed = false` in config, or by excluding bash from auto-approval in frontmatter (`auto_approve:remove("bash")`).
- 8758bdd: Smart max_tokens: default is now "50%" (half the model's max output), percentage strings are resolved automatically, and integers exceeding the model limit are clamped with a warning. `:Flemma status` shows the resolved value alongside the percentage.
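A sketch of the percentage-based max_tokens from 8758bdd; placing it under a `parameters` table is an assumption, as the notes only document the value itself:

```lua
require("flemma").setup({
  -- "parameters" as the placement is an assumption.
  parameters = {
    max_tokens = "50%",  -- resolved against the model's maximum output tokens
  },
})
```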
Patch Changes
- 1991273: Fixed auto_write not consistently writing the buffer after tool execution, denied/rejected tool processing, and `:Flemma import`
- 8058909: Fixed bwrap sandbox breaking nix commands on NixOS by using `--symlink` instead of `--ro-bind` for `/run/current-system` and `/run/booted-system`, preserving their symlink nature so nix can detect store paths correctly
- e4afad6: Fixed role marker highlights losing foreground color when the base highlight group only defines background, and fixed spinner background not inheriting line highlight colors
- b767a0d: Fixed pending tool blocks with user-provided content being silently discarded. When a user pastes output into a `flemma:tool status=pending` block and presses `<C-]>`, the content is now accepted as the tool result and sent to the provider instead of being replaced by a synthetic error.
- 80eb9fc: Fixed E565 textlock errors when visual-mode plugins (e.g., targets.vim) hold textlock while streaming responses complete. All async buffer modifications now go through a per-buffer FIFO write queue that retries on textlock.
- 0c333ef: Added FlemmaSinkCreated and FlemmaSinkDestroyed user autocmd events for observing sink lifecycle
- 2d24104: Fixed non-deterministic tool ordering in Vertex provider that was causing implicit cache misses on every request
v0.4.0
The gist…
Flemma v0.4.0 turns the plugin into a proper extensible platform. Tool approval presets ($default, $readonly) let you go from zero config to a working agent loop — tools like read, write, and edit auto-approve out of the box while bash stays gated behind manual confirmation, and you can override any of it per-buffer via frontmatter. Third-party extensions are now first-class: drop a Lua module path (e.g., "3rd.tools.todos") into your config for providers, tools, approval resolvers, or sandbox backends — no require() boilerplate needed. Rich fold previews show what's hiding inside collapsed messages (tool names, commands, results) so you can skim long conversations at a glance. :Flemma status gained a full Approval section that shows you exactly which tools are auto-approved, denied, or awaiting manual confirmation, with frontmatter overrides clearly marked. Tools now resolve relative paths against the .chat file's directory instead of Neovim's cwd, matching how @./file references and {{ include() }} already work. On the reliability side, frontmatter evaluation was optimized from 2N+2 executions per dispatch down to exactly one, and the bwrap sandbox on NixOS no longer hides system packages. This release also removes all remaining Claudius-era compatibility shims — if you haven't migrated your config from require("claudius") to require("flemma") yet, now's the time.
Minor Changes
- ffe72b3: `tools.auto_approve` now accepts a `string[]` of module paths (and mixed module paths + tool names). Internal approval resolver names use `urn:flemma:approval:*` convention; module-sourced resolvers are addressable by their module path directly.
- fae1e16: Added dynamic module resolution for third-party extensions. Lua module paths (dot-notation strings like "3rd.tools.todos") can now be used in config.provider, config.tools.modules, config.tools.auto_approve, config.sandbox.backend, and flemma.opt.tools to reference third-party modules without explicit require() calls. Modules are validated at setup time and lazily loaded on first use.
- 3cf9fe3: Refactor tool definitions to use ExecutionContext SDK — tools now code against `ctx.path`, `ctx.sandbox`, `ctx.truncate`, and `ctx:get_config()` instead of requiring internal Flemma modules directly
- 75e34c8: Moved calculator and calculator_async tools from built-in definitions to lua/extras (dev-only); production builds no longer ship calculator tools
- 974eac1: Auto-approve policy now expands $-prefixed preset references, allowing `auto_approve = { "$default", "$readonly" }` to union approve/deny lists from the preset registry. Config-level resolvers defer to frontmatter when it sets auto_approve, enabling per-buffer override of global presets.
- ef6a932: Removed all backwards-compatibility layers from the Claudius-to-Flemma migration. This is a breaking change for users who still rely on any of the following:
  - Removed: `require("claudius")` module fallback. The `lua/claudius/` shim that forwarded to `require("flemma")` has been deleted. Update your config to `require("flemma")`.
  - Removed: legacy `:Flemma*` commands. The individual commands `:FlemmaSend`, `:FlemmaCancel`, `:FlemmaImport`, `:FlemmaSendAndInsert`, `:FlemmaSwitch`, `:FlemmaNextMessage`, `:FlemmaPrevMessage`, `:FlemmaEnableLogging`, `:FlemmaDisableLogging`, `:FlemmaOpenLog`, and `:FlemmaRecallNotification` have been removed. Use the unified `:Flemma <subcommand>` tree instead (e.g., `:Flemma send`, `:Flemma cancel`, `:Flemma message:next`).
  - Removed: `"claude"` provider alias. Configs specifying `provider = "claude"` will no longer resolve to `"anthropic"`. Update your configuration to use `"anthropic"` directly.
  - Removed: `reasoning_format` config field. The deprecated `reasoning_format` type annotation (alias for `thinking_format`) has been removed from `flemma.config.Statusline`.
  - Removed: `resolve_all_awaiting_execution()` internal API. This backwards-compatibility wrapper in `flemma.tools.context` has been removed. Use `resolve_all_tool_blocks()` and filter for the `"pending"` status group instead.
- 50eea2b: Rich fold text previews for message blocks. Folded `@Assistant` messages now show tool use previews (e.g. `bash: $ free -h | bash: $ cat /proc/meminfo (+1 tool)`), and folded `@You` messages show tool result previews with resolved tool names (e.g. `calculator_async: 4 | calculator_async: 8`). Expression segments are included in fold previews, consecutive text segments are merged, and runs of whitespace are collapsed to keep previews compact.
- 5b637d2: Added an Approval section to `:Flemma status` showing auto-approve, deny, and require-approval classification per tool with preset expansion. Frontmatter overrides are marked with ✲ on individual items across Tools, Approval, Parameters, and Autopilot sections, with a conditional legend at the bottom.
- cd97ff5: Added tool approval presets for zero-config agent loops. Flemma now ships with `$readonly` and `$default` presets. The default `auto_approve` is `{ "$default" }`, which auto-approves `read`, `write`, and `edit` while keeping `bash` gated behind manual approval. Users can define custom presets in `tools.presets` and reference them in `auto_approve`. Frontmatter supports `flemma.opt.tools.auto_approve:remove("$default")` and `:remove("read")` for per-buffer overrides.
- 0617d2c: Changed tool execute function signature from `(input, callback, ctx)` to `(input, ctx, callback?)` — sync tools no longer need a placeholder `_` argument, and callback-last ordering matches Node.js conventions
- 5de4f32: Tools now resolve relative paths against the .chat buffer's directory (`__dirname`) instead of Neovim's working directory, matching the behavior of `@./file` references and `{{ include() }}` expressions. The `tools.bash.cwd` config defaults to `"$FLEMMA_BUFFER_PATH"` (set to `nil` to restore the previous cwd behavior).
- ff794c4: Added tool approval presets configuration field and wired preset registry into plugin initialization with `{ "$default" }` as the default auto_approve policy
Patch Changes
- 5035b41: Fixed `flemma.opt.tools.auto_approve:append()` failing when auto_approve was not explicitly assigned first in frontmatter
- 4062653: Fixed bwrap sandbox hiding NixOS system packages by re-binding `/run/current-system` read-only after the `/run` tmpfs mount
- 93b79e8: Frontmatter is now evaluated exactly once per dispatch cycle instead of 2N+2 times (where N = number of tool calls), reducing redundant sandbox executions and preventing potential side-effects from repeated evaluation.
- ec0072b: Updated model definitions with latest pricing and availability data from all three providers.
  - Anthropic: Removed retired Claude Sonnet 3.7 and Claude Haiku 3.5 models (retired Feb 19, 2026). Updated Claude Haiku 3 deprecation comment to reflect April 2026 retirement date.
  - Vertex AI: Added Gemini 3.1 Pro Preview (`gemini-3.1-pro-preview`). Removed superseded preview-dated aliases `gemini-2.5-flash-preview-09-2025` and `gemini-2.5-flash-lite-preview-09-2025`.
  - OpenAI: No changes — all existing models and pricing confirmed current against official documentation.
v0.3.0
The gist…
Tool approval gets a major UX upgrade — pending tool calls now show inline virtual-line previews so you can see exactly what you're approving or rejecting without unfolding anything, and every built-in tool (bash, read, edit, write, calculator) ships a tailored preview formatter. A new :Flemma status command gives you a one-glance dashboard of your runtime state: provider, model, merged parameters, autopilot, sandbox, and enabled tools. Config gets simpler too: model = "$preset-name" lets you point your default at an existing preset instead of duplicating provider/model/parameters at the top level. Under the hood, tool execution has been unified into a single three-phase algorithm with explicit status semantics (pending → approved/denied/rejected), replacing the old split between autopilot and manual flows — the result is more predictable behavior and cleaner .chat files. Model defaults have been refreshed: Claude Sonnet 4.6 is now the default Anthropic model and o3-pro has been added to the OpenAI roster. On the stability side, the bash tool no longer chokes on heredoc commands, JSON null values from LLM responses no longer crash tool definitions, and cross-provider parameter merges correctly preserve provider-specific keys like project_id when switching via presets.
Minor Changes
- e5a9b6f: Added `:Flemma status` command that displays comprehensive runtime status (provider, model, merged parameters, autopilot state, sandbox state, enabled tools) in a read-only scratch buffer. Use `:Flemma status verbose` for a full config dump. `:Flemma autopilot:status` and `:Flemma sandbox:status` now open the same status view with cursor positioned at the relevant section.
- 9fc147c: Tool definitions can now provide an optional `format_preview` function for custom preview text in tool status blocks. All built-in tools (calculator, bash, read, edit, write) include tailored previews showing the most relevant input at a glance.
- 6f8b455: Added support for `model = "$preset-name"` in config to use a preset as the startup default, avoiding duplication of provider/model/parameters at the top level
- f20492f: Added virtual line previews inside tool status blocks showing a compact summary of the tool call, so users can see what they are approving or rejecting
- 9bd2785: Unified tool execution into a three-phase advance algorithm with explicit status semantics (`flemma:tool status=pending|approved|rejected|denied`), replacing the old `flemma:pending` marker and separate autopilot/manual flows
- 299702f: Added Claude Sonnet 4.6 as the new default Anthropic model, removed retired chatgpt-4o-latest, added o3-pro snapshot, and updated Gemini 2.0 retirement dates
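Pointing the startup default at a preset (6f8b455) looks like this; the preset name is a placeholder, and the notes don't show how the preset itself is defined:

```lua
require("flemma").setup({
  -- "$my-preset" is a hypothetical preset name; the preset supplies
  -- provider, model, and parameters so they aren't duplicated here.
  model = "$my-preset",
})
```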
Patch Changes
- 6a5cb12: Fixed Sonnet 4.6 to use adaptive thinking instead of deprecated budget_tokens, clamped `max` effort to `high` on non-Opus models, and added budget_tokens < max_tokens guard for budget-based models
- e4933aa: Preview text for tool blocks and folded messages now sizes dynamically to the editor width instead of using a fixed 72-character limit
- 41c130b: Fixed bash tool failing with heredoc commands by replacing `{ cmd; } 2>&1` group wrapping with an `exec 2>&1` prefix
- 1ca55b2: Fixed cross-provider parameter merge bug where provider-specific config keys (e.g., `project_id`) were silently dropped when switching providers via presets
project_id) were silently dropped when switching providers via presets - e4ddd0b: Fixed JSON null values decoding as vim.NIL (truthy userdata) instead of Lua nil, causing crashes in tool definitions when LLMs send null for optional parameters like offset, limit, timeout, and delay
- f88449f: Fixed thinking preview counter disappearing when models emit whitespace-only text before thinking blocks (e.g. Opus 4.6 with adaptive thinking)
- 0af66ea: Moved session reset API from `require("flemma.state").reset_session()` to `require("flemma.session").get():reset()`
v0.2.0
The gist…
Flemma v0.2.0 is the first semver release and a major step up from the initial preview. The headline feature is autopilot – an autonomous agent loop that executes approved tool calls, feeds results back to the model, and repeats until the task is done or a tool needs manual approval. Tool execution is now sandboxed by default on Linux: shell commands run inside a read-only rootfs with write access limited to your project directory, keeping your system safe from runaway agents. A new unified thinking parameter lets you set thinking effort once (e.g., thinking = "high") and have it work across Anthropic, OpenAI, and Vertex AI, with five levels from minimal to max. The approval registry gives fine-grained control over which tools auto-approve, configurable globally, per-buffer via frontmatter, or through custom plugin resolvers. On the provider side, this release adds Gemini 3 support with native thinking levels, adaptive thinking for Claude Opus 4.6+, proactive OAuth2 token refresh for Vertex AI, and proper error surfacing for safety-filtered responses and stream errors across all providers.
Minor Changes
- 7cccfc6: Adopted semantic versioning (semver) and changesets for automated version management and changelog generation. The project transitions from the previous CalVer (`vYY.MM-N`) scheme to standard semver, starting at `0.1.0`.
- c22dd05: Added Anthropic stop reason handling (max_tokens warns, refusal/sensitive surface as errors) and adaptive thinking for Opus 4.6+ models (auto-detected, sends effort level instead of deprecated budget_tokens)
- 4471a07: Added autopilot: an autonomous tool execution loop that transforms Flemma into a fully autonomous agent. After each LLM response containing tool calls, autopilot executes approved tools, collects results, and re-sends the conversation automatically – repeating until the model stops calling tools or a tool requires manual approval. Includes per-buffer frontmatter override (`flemma.opt.tools.autopilot`), runtime toggle commands (`:Flemma autopilot:enable/disable/status`), configurable turn limits, conflict detection for user-edited pending blocks, and full cancellation safety via Ctrl-C.
- 05809d5: Added `minimal` and `max` thinking levels, expanding from 3 to 5 gradations (minimal | low | medium | high | max). Budget values for `low` (1024 → 2048) and `high` (32768 → 16384) were adjusted to align with upstream defaults and make room for the new levels. Each provider maps the canonical levels to its API: Anthropic maps `minimal` → `low` and passes `max` on Opus 4.6; OpenAI maps `max` → `xhigh` for GPT-5.2+; Vertex maps `minimal` → `MINIMAL` (Flash) or `LOW` (Pro) and clamps `max` to `HIGH`.
- 907b787: Added filesystem sandboxing for tool execution. Shell commands now run inside a read-only rootfs with write access limited to configurable paths (project directory, .chat file directory, /tmp by default). Enabled by default with auto-detection of available backends; silently degrades on platforms without one. Includes Bubblewrap backend (Linux), pluggable backend registry for custom/future backends, per-buffer overrides via frontmatter, runtime toggle via `:Flemma sandbox:enable/disable/status`, and comprehensive documentation.
- 76c635e: Added Gemini 3 model support: uses `thinkingLevel` enum (LOW/MEDIUM/HIGH) instead of numeric `thinkingBudget` for gemini-3-pro and gemini-3-flash models
- e6b53e2: Added approval resolver registry and per-buffer approval via frontmatter. Tool approval is now driven by a priority-based chain of named resolvers – global config, per-buffer frontmatter (`flemma.opt.tools.auto_approve`), and custom plugin resolvers are all evaluated in order. Consolidated tool documentation into `docs/tools.md`.
- 629dfda: Sandbox enforcement for write and edit tools – both now check `sandbox.is_path_writable()` before modifying files and refuse operations outside `rw_paths`
- dcaa5be: Add unified `thinking` parameter that works across all providers – set `thinking = "high"` once instead of provider-specific `thinking_budget` or `reasoning`. The default is `"high"` so all providers use maximum thinking out of the box. Provider-specific parameters still take priority when set. Also promotes `cache_retention` to a general parameter, consolidates `output_has_thoughts` into the capabilities registry, clamps sub-minimum thinking budgets instead of disabling, and supports `flemma.opt.thinking` in frontmatter for provider-agnostic overrides.
- 93f4b68: Added proactive token refresh and reactive auth-error recovery for Vertex AI provider, eliminating the need to manually run `:Flemma switch` when OAuth2 tokens expire
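A sketch of the unified thinking parameter from dcaa5be; only the value and the five levels come from the notes, and placing it under a `parameters` table is an assumption:

```lua
require("flemma").setup({
  -- "parameters" as the placement is an assumption.
  parameters = {
    thinking = "high",  -- one of: minimal | low | medium | high | max
  },
})
```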
Patch Changes
- c22dd05: Fixed OpenAI top-level stream error events being silently discarded; they now properly surface as errors
- a59da49: Fixed tool completion indicators being prematurely dismissed during concurrent execution and autopilot
- 784fe5a: Fixed Vertex AI safety-filtered responses silently appearing as successful completions; SAFETY, RECITATION, and other error finish reasons now properly surface as errors
- 5b6b5af: Fixed Vertex AI thinking signature retention during streaming; empty or non-string `thoughtSignature` chunks no longer overwrite a valid cached signature
- 784fe5a: Fixed Vertex AI tool response format to use `output` key instead of `result`, matching the Google SDK convention
- 7bf8d64: Fixed Vertex AI tool declarations rejecting nullable types by switching to `parametersJsonSchema` on v1beta1 API
- 9995605: Flash a brief "● Pending" indicator on tool result headers awaiting user approval