
Releases: Flemma-Dev/flemma.nvim

v0.10.0

09 Apr 10:47
fef0e5c


The gist…

Flemma v0.10.0 opens the door to external tooling. MCP support via MCPorter lets you connect any Model Context Protocol server — Linear, Slack, GitHub, databases — and use their tools natively inside your .chat buffers. The read tool now handles binary files: hand the model an image or PDF and it sends it natively to providers that support multimodal input (Anthropic, OpenAI, Vertex), with text fallbacks for the rest. Features that were previously experimental have graduated to stable: LSP hover/go-to-definition is now configured at the top level, and the find, grep, and ls exploration tools are enabled by default. On the reliability side, a new tool output overflow system catches runaway output before it floods the context window, saving the full content to a temp file the model can read on demand. Smaller additions: @~/path file references for home-directory paths, a User-Agent header on all API requests for debugging, and an auto-refreshing status buffer.

New Feature: MCP Support via MCPorter

Flemma now supports the Model Context Protocol through MCPorter, a standalone CLI that handles server discovery, OAuth, and transport. Install MCPorter, configure your servers (or let it auto-import from Claude Code, Cursor, and VS Code), then enable it in Flemma:

```lua
require("flemma").setup({
  tools = {
    mcporter = {
      enabled = true,
      include = { "linear:*", "slack:*" },
    },
  },
})
```

Flemma discovers servers at startup, fetches their tool schemas with concurrency-controlled fanout, and registers each as a native tool definition — the model sees them alongside bash, read, and your other tools. Include/exclude glob patterns control which tools are enabled. The status buffer shows discovery progress and auto-refreshes when loading completes.

See docs/mcp.md for setup, configuration, and troubleshooting.
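The include/exclude globs compose, so you can pull in a whole server's tools while dropping individual noisy ones. A sketch, assuming the exclude key mirrors include (the exact pattern semantics are in docs/mcp.md):

```lua
require("flemma").setup({
  tools = {
    mcporter = {
      enabled = true,
      -- Globs match server:tool names discovered at startup
      include = { "linear:*", "github:*" },
      -- Hypothetical pattern: suppress one family of tools
      exclude = { "github:search_*" },
    },
  },
})
```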

New Feature: Binary Content in Tool Results

The read tool now detects binary files — images, PDFs, and other non-text content — and sends them natively to the model instead of dumping raw bytes. Providers that support multimodal input (Anthropic, OpenAI Responses, Vertex) send images as image content blocks and PDFs as document blocks. Providers that don't (OpenAI Chat, Moonshot) fall back to a text placeholder with a diagnostic warning.

This means you can simply mention a screenshot or a PDF in your message (./diagram.png, ~/Documents/spec.pdf, no @ needed), ask the model to look at it, and it will actually read the file, without you converting or encoding anything.

New Feature: LSP and Exploration Tools Graduate to Stable

The in-process LSP server and the three exploration tools (find, grep, ls) have graduated from experimental.

  • LSP is now configured via top-level lsp = { enabled = true } (previously experimental = { lsp = true }).
  • find, grep, ls are enabled by default — no configuration needed.

Breaking change: The experimental config section is now empty and strict. Passing any key to it (e.g., experimental = { lsp = true }) will produce a validation error. Move LSP config to the top-level lsp key.
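A minimal before/after sketch of the migration:

```lua
-- Before (v0.9.x; now produces a validation error):
require("flemma").setup({
  experimental = { lsp = true },
})

-- After (v0.10.0):
require("flemma").setup({
  lsp = { enabled = true },
})
```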

Polish and Bug Fixes

Tool output overflow handling. When bash or MCP tool results exceed 2000 lines or 50KB, the full output is now saved to a temp file and the model receives truncated content with a pointer to the full file. This prevents runaway commands from flooding the context window. The overflow path format is configurable via tools.truncate.output_path_format.
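If the default temp-file location doesn't suit you, the overflow path can be pointed elsewhere. A sketch, assuming a printf-style format string (the actual placeholder syntax is documented with the tools config):

```lua
require("flemma").setup({
  tools = {
    truncate = {
      -- Hypothetical format; where overflowed tool output is written
      output_path_format = "/tmp/flemma-overflow-%s.txt",
    },
  },
})
```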

@~/path file references. You can now use @~/Documents/notes.txt alongside the existing @./ and @../ syntax. The tilde is expanded at evaluation time, keeping .chat files portable across machines.

User-Agent header. All API requests now include User-Agent: flemma.nvim/X.Y.Z Neovim/A.B.C, useful for debugging request issues with providers.

Tool name encoding. Internal tool names now use colon as separator (e.g., mcporter:slack:channels_list), encoded to double underscore on the wire for LLM API compatibility.

Status buffer auto-refresh. The status buffer now auto-refreshes when async tool sources (like MCPorter) finish loading, replacing "loading" with a "finished" confirmation.

Tool preview preserved during execution. The virtual line preview (e.g., bash: print Hello — $ sleep 5 && echo Hello) now stays visible while a tool is executing, not just while pending approval.

Autopilot throttled tool fix. Fixed autopilot skipping auto-approved tools when non-auto-approved tools coexisted in the same response.

Vertex AI HTTP 417 fix. Suppressed cURL's default Expect: 100-continue header, which was causing HTTP 417 errors on Vertex AI.

Minor Changes

  • 72eeb7a: Added binary content support in tool results. The read tool now detects binary files (images, PDFs) and emits file references instead of raw bytes. Providers that support mixed content (Anthropic, OpenAI Responses, Vertex) send images and PDFs natively; providers that don't (OpenAI Chat, Moonshot) fall back to text placeholders with a diagnostic warning.
  • 65f80df: Added mcporter tool integration: dynamically discovers MCP servers and registers their tools as Flemma tool definitions. Configure via tools.mcporter with include/exclude glob patterns. Disabled by default.
  • 5ddd354: Added mime.detect(filepath) as the single public entry point for MIME detection — tries extension-based lookup first, falls back to the file command. Added mime.is_binary(mime_type) for classifying MIME types as binary vs textual. The previous get_mime_type() and get_mime_by_extension() methods are now internal.
  • f921664: Promoted LSP and exploration tools (find, grep, ls) out of experimental. LSP is now configured via lsp = { enabled = true } (top-level). The three exploration tools are enabled by default. The experimental config section is now empty and strict — any keys passed to it will produce a validation error.
  • 5ddd354: Added @~/path file reference syntax for home-directory relative paths, alongside the existing @./ and @../. The ~ is expanded at evaluation time, keeping .chat files portable across machines.
  • ad7227e: Use colon as internal tool name separator with wire encoding to double underscore for LLM APIs
  • e3f6e0e: Added shared tool output overflow handling: when bash or MCP tool results exceed 2000 lines or 50KB, the full output is saved to a configurable temp file and the model receives truncated content with instructions to read the full output. The overflow path format is configurable via tools.truncate.output_path_format.
  • 1e20943: Added User-Agent: flemma.nvim/X.Y.Z Neovim/A.B.C header to all API requests, backed by a version module that is automatically kept in sync with releases via CI

Patch Changes

  • e86eafe: Fixed autopilot skipping throttled auto-approved tools when pending (non-auto-approved) tools coexist in the same response
  • 2cdde26: Fixed HTTP 417 errors from Vertex AI caused by cURL's default Expect: 100-continue header
  • 0de4dd0: Status buffer now auto-refreshes when async tool sources finish loading, replacing the "loading" indicator with a "finished" confirmation
  • e698820: Fixed tool preview disappearing during execution. The virtual line preview (e.g., bash: print Hello — $ sleep 5 && echo Hello) now remains visible while a tool is executing, not just while pending approval.

v0.9.0

31 Mar 21:02
ed53988


The gist…

Flemma v0.9.0 brings conversation structure to the surface. Turn indicators draw box-drawing arcs in the gutter marking where each request/response cycle begins and ends, with distinct styles for complete turns, mid-tool-call turns, and active streaming — the kind of thing you didn't know you needed until a 200-message conversation suddenly becomes navigable. A new Moonshot AI (Kimi) provider gives you access to the Kimi K2.5 model family with thinking, tool calling, and 256K context, built atop a reusable Chat Completions base class that makes adding future OpenAI-compatible providers dramatically easier. Presets have been unified into a single system that can switch provider, model, parameters, and tool approval in one command — :Flemma switch $explore can now mean "use GPT-4o-mini with full tool access." On the day-to-day side: <Space> now toggles the entire message fold (not individual sub-folds), the modeline parser handles quoted values and comma-separated lists, templates gain os.date()/os.time() and a proper print() function, and temperature is no longer forced to 0.7 — reasoning models that reject explicit temperature finally just work.

🧩 New Feature: Turn Indicators

Turn indicators draw visual boundaries in the gutter showing where each conversation turn begins and ends. A "turn" is one complete request/response cycle — starting from your @You: message, through any tool use exchanges, to the final @Assistant: response.

Three visual states communicate turn progress at a glance:

  • Complete turns (╭│╰) — rounded arcs for a finished exchange
  • Incomplete turns (╭┊└) — dotted lines when the assistant is mid-tool-call, waiting for results
  • Streaming — the indicator extends in real-time as the response arrives

When the ruler is enabled and right padding is configured, the top arc connects seamlessly to the ruler line for a polished visual join.

Configure via turns.enabled, turns.padding (an integer or { left, right } table), and turns.hl (the FlemmaTurn highlight group, linked to FlemmaRuler by default).

Breaking change: The signs feature has been removed. Replace any existing signs configuration with the new turns config.
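Putting the keys above together, a sketch of what replaces a former signs block (padding values are illustrative):

```lua
require("flemma").setup({
  turns = {
    enabled = true,
    padding = { 0, 1 },  -- { left, right }; right padding joins the ruler arc
    hl = "FlemmaTurn",   -- linked to FlemmaRuler by default
  },
})
```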

See docs/ui.md for configuration and highlight customization.

🌙 New Feature: Moonshot AI (Kimi) Provider

Flemma now supports Moonshot AI as a first-class provider, giving access to the Kimi model family — including the flagship kimi-k2.5 with optional thinking, tool calling, multimodal input, and 256K context. Set provider = "moonshot", export MOONSHOT_API_KEY, and you're running.

Models range from kimi-k2.5 (thinking-optional, multimodal) to dedicated reasoning models (kimi-k2-thinking, kimi-k2-thinking-turbo) and legacy moonshot-v1-* endpoints. All support tool calling; K2.5 and newer models include prompt caching with no separate write fee.

Under the hood, Moonshot ships on a new Chat Completions base class that implements the OpenAI-compatible wire format as a reusable layer. Future providers that speak Chat Completions (Groq, DeepSeek, Ollama, and others) can now be added with roughly a third of the previous boilerplate.

See docs/configuration.md for provider setup and model options.
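With MOONSHOT_API_KEY exported, the minimal setup is just the provider and a model name from the list above:

```lua
require("flemma").setup({
  provider = "moonshot",
  model = "kimi-k2.5",
})
```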

🔄 New Feature: Unified Presets

Provider presets and tool approval presets were previously two separate systems. They're now a single top-level presets table where each preset can carry provider, model, parameters, and auto_approve — enabling composite presets that switch everything in one :Flemma switch call:

```lua
presets = {
  ["$explore"] = {
    provider = "openai",
    model = "gpt-4o-mini",
    auto_approve = { "read", "write", "edit", "bash", "find", "grep" },
  },
}
```

Two built-in presets ship: $standard (approves read, write, edit, find, grep, ls) and $readonly (approves read, find, grep, ls).

Breaking change: config.tools.presets has moved to top-level presets. The built-in $default has been renamed to $standard.

See docs/configuration.md for preset formats and docs/tools.md for approval mechanics.

Polish and Bug Fixes

Temperature is now optional. Flemma previously sent temperature: 0.7 on every request, which caused reasoning-native models (gpt-5-mini, o-series) to reject requests outright. Temperature is now omitted unless explicitly set, letting each API use its own default. If you relied on the implicit 0.7, add temperature = 0.7 to your setup config.

<Space> now toggles the entire message fold instead of the fold under the cursor — nested folds (thinking, tool use) close along the way so the message reopens cleanly. Use za for previous per-fold behavior. The modeline parser gained quote-aware tokenization with type coercion, escape sequences, and comma-separated lists, making :Flemma switch arguments more expressive (e.g., tags=python,rust,"C++"). Templates can now use os.date(), os.time(), and friends in expressions, and {% print("text") %} emits directly into template output instead of stdout. Template expressions also no longer break on }} inside Lua string literals and comments.
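The new template additions from the paragraph above can be combined in a system prompt; a sketch (remember that {% print(...) %} emits into the template output, and os.date("%w") returns the weekday as a string):

```
Today is {{ os.date("%B %d, %Y") }}.
{% if os.date("%w") == "1" then print("It is Monday; keep answers brief.") end %}
```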

On the provider side: Anthropic PDF blocks now include the document title so Claude can see filenames, and a content block reordering fix resolves API rejections when text appeared after tool_use blocks. A crash when provider requests completed while the command-line window (q:) was open has been fixed, along with :Flemma switch ignoring key= syntax for clearing parameters.

Model definitions have been updated with gpt-5.4-mini, gpt-5.4-nano, and gpt-5.4-2026-03-05; context windows for claude-opus-4-6 and claude-sonnet-4-6 now reflect 1M; o4-mini cache pricing has been corrected; and several retired models have been removed. Internally, the monolithic models.lua has been split into per-provider modules, and pricing.high_cost_threshold is now configurable (default $30/M output).

Minor Changes

  • 1d9b496: Auto-generate EmmyLua config types from the schema DSL via make types

  • 568f684: Added Moonshot AI (Kimi) provider with support for kimi-k2.5 thinking, tool calling, and all Kimi/Moonshot models. Introduced a reusable Chat Completions base class (openai_chat.lua) for OpenAI-compatible APIs.

  • f4714f9: Temperature is now optional with no default. Previously Flemma always sent temperature: 0.7 to provider APIs, which caused reasoning-native models (gpt-5-mini, o-series) to reject requests entirely. Temperature is now omitted unless explicitly set by the user, letting each API use its own default (typically 1.0).

    If you previously relied on the implicit 0.7 default for less random responses, add temperature = 0.7 to your setup config or chat frontmatter.

    Note: temperature is no longer silently stripped when set alongside reasoning/thinking. If you explicitly set both, the API will reject the request — correct this by removing the temperature setting.

  • c5aac07: Split monolithic models.lua into per-provider data modules under lua/flemma/models/, allowing providers to declare their own model data via metadata.models. Added pricing.high_cost_threshold config option (default 30) replacing the hardcoded constant.

  • 3aa501b: Removed the signs feature and replaced it with a turns config schema (turns.enabled, turns.padding, turns.hl) and a FlemmaTurn highlight group linked to FlemmaRuler.

  • 2bb0d2a: Expose os.date, os.time, os.clock, and os.difftime in the template sandbox, enabling date/time formatting in expressions (e.g., {{ os.date("%B %d, %Y") }}). Dangerous os.* functions (execute, exit, getenv, remove, etc.) remain excluded.

  • 6278037: Extended the modeline parser with quote-aware tokenization, type coercion for positional arguments, single and double quote support with backslash escaping, comma-separated list values, and empty value handling (key= → nil, key="" → empty string).

  • 0371511: <Space> now toggles the entire message fold instead of the fold under the cursor. Nested folds (thinking, tool use/result) are closed along the way so the message reopens cleanly. Frontmatter folds are also toggled when the cursor is outside any message. Use za for the previous per-fold toggle behavior.

  • d7cea2e: Added turn detection and statuscolumn rendering module for visual turn boundaries in the gutter

  • fcf28d7: Template expressions now handle }} and %} inside Lua string literals, comments, and table constructors without breaking. Previously, {{ "email={{ customer.email }}" }} would crash because the parser matched the first }} it found regardless of context.

  • 0ba2eba: Added print() support in template code blocks — {% print("text") %} now emits directly into the template output instead of going to stdout. Arguments are concatenated with no separators and no trailing newline, giving full whitespace control to the template author.

  • ccd9646: Unified presets: config.tools.presets merged into top-level presets. Presets can now carry provider, model, parameters, and auto_approve fields — enabling composite presets like $explore that switch both model and tool approval in one :Flemma switch call. Built-in $default renamed to $standard (approves read, write, edit, find, grep, ls); $readonly updated to include find, grep, ls. Read-only tools (find, grep, ls) are now approved via the $standard preset instead of the sandbox auto-approval path. Schema validates preset key $ prefix at finalize via new MapNode deferred key validation. :Flemma status now shows (R) icon for runtime-sourced tool approvals.

Patch Changes

  • 8b4b516: Send document title metadata on Anthropic PDF blocks so Claude can see the filename
  • 5d...

v0.8.0

23 Mar 16:24
1fb8bb0


The gist…

Flemma v0.8.0 is primarily an infrastructure release. Flemma's configuration had grown into a tangle of mutable sources — setup, runtime switches, frontmatter, provider-internal parameter merges — each managed separately, overriding each other without clear precedence. It was becoming genuinely hard to reason about what the user actually intended. This release replaces all of that with a layered copy-on-write store where each source (defaults → setup → runtime → frontmatter) is an immutable layer with explicit priority. Nothing is mutated in place, so it's always clear which layer a value came from and why it won. The practical result: flemma.opt is now on par with the full configuration — anything expressible in setup() can be overridden per-buffer through frontmatter, including MongoDB-style list operators ($set, $append, $remove, $prepend) in JSON. Providers have been completely decoupled from global state — there's no longer a single shared instance. Each provider is constructed fresh for its request, scoped to the buffer's resolved config, so per-buffer provider and model overrides in frontmatter just work. :Flemma status was rebuilt to surface all of this — layer source indicators show exactly why Flemma is making the decisions it is. Tool previews now use structured label/detail formatting, and the secrets system emits resolver diagnostics when a key can't be found. Smaller additions: .chat buffers auto-prepend @You: on open, devicons integration, lualine format override, and "did you mean?" for unknown commands.

🧩 Enhancement: Per-Buffer Configuration

flemma.opt previously supported only a subset of what setup() could express. With the new layered store behind it, that gap is closed — anything you can configure globally can now be overridden per-buffer through frontmatter, with clear precedence over every other source.

In Lua frontmatter, the flemma.opt proxy gives you direct access:

```lua
flemma.opt.provider = "openai"
flemma.opt.thinking = "medium"
flemma.opt.tools:append("bash")
flemma.opt.tools:remove("write")
```

In JSON frontmatter, MongoDB-style operators control list mutations precisely:

```json
{
  "flemma": {
    "tools": {
      "$append": ["bash", "grep"],
      "$remove": "write",
      "auto_approve": { "$append": ["grep"] }
    }
  }
}
```

Only options you touch are written — everything else falls through to your global config. Frontmatter is evaluated passively on every edit, so your lualine component and :Flemma status reflect changes as you type. If the frontmatter fails to parse while you're editing, the last successful parse is silently kept until you send.

Tool name typos are caught with "did you mean?" suggestions (e.g., bahs → "Did you mean 'bash'?").

See docs/templates.md for the full frontmatter reference and docs/configuration.md for config aliases and the layer model.

🔄 Enhancement: Request-Scoped Providers

Previously, a single global provider instance served every buffer in the Neovim session. Runtime switches affected everything, and there was no reliable way to pin a specific provider or model to a specific chat. That's gone — providers are now constructed per-request from the buffer's resolved config and discarded when the request completes. Set flemma.opt.provider = "openai" or flemma.opt.model = "o3" in frontmatter and it stays locked to that buffer regardless of what you do elsewhere. No runtime overrides leak in, no other buffer's :Flemma switch affects it.

This also lays groundwork for sub-agent workflows in future releases, where each agent buffer will need its own pinned provider and model.

🔍 Improvement: Redesigned :Flemma status

Because the layered store tracks where every value comes from, :Flemma status can now show you exactly why Flemma is making the decisions it is. The window uses a box-drawing tree layout with extmark-based highlighting, and every config value shows a layer source indicator (🆂 setup, 🆁 runtime, 🅵 frontmatter) — so if a frontmatter override is winning over your setup, you'll see it at a glance.

Thinking budget resolution is shown inline (e.g., "minimal → low"), and frontmatter diagnostics (parse errors, validation failures) appear directly in the status view.

✨ Improvement: Structured Tool Previews

Tool fold text now separates the LLM's stated intent from the raw parameters. A folded tool call might read:

Finding Python files — glob: "**/*.py"

The label (intent) and detail (parameters) use distinct highlight groups — FlemmaToolLabel (italic) and FlemmaToolDetail (dimmed, defaults to Comment) — so you can scan a long conversation and immediately see what each tool call was doing without unfolding it.

Custom tools can return { label = "...", detail = "..." } from their format_preview function to take advantage of this. Plain string returns still work as before.

See docs/ui.md for highlight group customization.
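A custom tool can opt into the structured preview by returning the table form; a sketch (the tool and its parameters are hypothetical, only the { label, detail } return shape is from the description above):

```lua
-- Inside a custom tool definition
format_preview = function(params)
  return {
    label = "Counting lines",          -- stated intent, styled FlemmaToolLabel
    detail = "wc -l " .. params.path,  -- raw parameters, styled FlemmaToolDetail
  }
end
```

Returning a plain string keeps the old behavior, so existing tools need no changes.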

🔐 Improvement: Secrets Resolver Diagnostics

When an API key can't be found, Flemma now tells you exactly why. Each secret resolver (environment variables, macOS Keychain, gcloud, secret-tool) emits structured diagnostics explaining what it checked and what went wrong. The result is a single notification listing all attempted resolvers and their failure reasons, instead of a generic "key not found" message.

The gcloud binary path is also now configurable via secrets.gcloud.path for non-standard installations.
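For a non-standard install, a sketch (the example path is illustrative):

```lua
require("flemma").setup({
  secrets = {
    gcloud = {
      path = "/opt/google-cloud-sdk/bin/gcloud",
    },
  },
})
```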

Minor Changes

  • 4ca6f8e: Added editing.auto_prompt option (default true) that prepends @You: to empty .chat buffers on open, giving new users a clear starting point.

  • d8a1187: Replaced the configuration system with a layered, schema-backed copy-on-write store.

    The new system introduces a schema DSL for declarative config shape definition, a four-layer store (DEFAULTS, SETUP, RUNTIME, FRONTMATTER) with separate scalar (top-down first-set-wins) and list (bottom-up accumulation) resolution, read/write proxy metatables for ergonomic access, and a DISCOVER callback pattern that lets tool, provider, and sandbox modules register their own config schemas at load time without coupling the schema definition to heavy modules.

    All configuration access now goes through a single public facade (require("flemma.config")). The legacy flat merge (vim.tbl_deep_extend in config.lua), the global config cache (state.get_config / state.set_config), and the per-buffer opt overlay (buffer/opt.lua) have all been removed. Frontmatter evaluation writes directly to the FRONTMATTER layer of the store, and flemma.opt is now a write proxy into that layer.

    Providers are now request-scoped — constructed inline per send_to_provider() call with per-buffer parameters, captured in closures, and GC'd after the request completes. The global mutable provider instance, the parameter override diffing machinery, and config_manager.lua have been dissolved into core.lua (orchestration) and provider/normalize.lua (pure parameter normalization functions).

    The approval system is unified into a single config resolver that reads the resolved tools.auto_approve from the layer store, replacing the previous two-resolver pattern (config + frontmatter at separate priorities). Preset deny lists have been removed — an auto-approve policy that denies is a contradiction.

    :Flemma status now shows right-aligned layer source indicators (D/S/R/F) on provider, model, parameter, and tool lines, and a verbose view with per-layer ops and a schema-walked resolved config tree.

    Test coverage includes 9 new config test suites (store, proxy, schema, definition, alias, list ops, DISCOVER, lens, integration) alongside migration of ~30 existing test files to the new facade.

  • 1cda981: Add deferred semantic validation to config schema nodes. Tool names in frontmatter and setup config are now validated against the tool registry at finalize time, with "did you mean?" suggestions for typos.

  • fb5f241: Added devicons integration that auto-registers a .chat file icon with nvim-web-devicons (or other compatible devicons plugins). Enabled by default — configure via integrations.devicons.enabled and integrations.devicons.icon.

  • 3fcb594: Fold previews now show tool labels (the LLM's stated intent) prominently, with raw technical detail visually subordinate.

Tool format_preview functions can now return { label?, detail? } instead of a plain string, where detail may be a string[] (joined with double-space upstream for uniform display). Built-in tools (bash, read, write, edit, grep, find, ls) have been updated to use the structured return. String-returning format_preview functions are fully backward-compatible. New highlight groups FlemmaToolLabel (italic) and FlemmaToolDetail (default: Comment) style the two pieces independently. Label and detail are separated by an em-dash (—) in both folds and tool preview virtual lines.

  • 2c7661e: JSON frontmatter now supports MongoDB-style operators ($set, $append, $remove, $prepend) for config writes via the flemma key

  • 4248502: The lualine component now accepts a format option directly in the section config, which takes precedence over statusline.format in the Flemma config:

    { "flemma", format = "#{provider}:#{model}" }
  • 8d5b6a6: Passively evaluate frontmatter on InsertLeave, TextChanged, and BufEnter so integrations like lualine see up-to-date config values without waiting for a request send. On error, the last successful frontmatter parse is preserved.

    Refactored config.finalize() to return validation failures as data instead of accepting a reporter callback, making codeblock parsers pure data functi...


v0.7.0

16 Mar 08:12
1af711e


The gist…

Flemma v0.7.0 is the extensibility release. Your system prompts are now a full-blown template engine — {% if %}, {% for %}, parameterized includes, the works. A new personality system lets tools describe themselves so the LLM gets a tailored, project-aware system prompt out of the box. Credentials are managed by a new secrets module that resolves API keys from environment variables, macOS Keychain, GNOME Keyring, or gcloud CLI with zero configuration. An experimental LSP server brings hover inspection and go-to-definition to .chat buffers, and three new exploration tools (grep, find, ls) give the LLM the ability to navigate your codebase. For plugin authors, a hooks module emits lifecycle events at every stage of a request, and a preprocessor pipeline enables custom AST transforms. Day-to-day UX improves with a redesigned progress indicator (floating, phase-aware, always visible), a cursor engine that prevents focus-stealing during agent loops, and tmux-style statusline format strings for full control over your lualine component. On the reliability side: the parser no longer breaks on role markers inside fenced code blocks, AST parsing during streaming is now incremental (O(new content) instead of O(total)), and the provider layer shed ~370 lines of duplicated code.

🧩 New Feature: Template Engine

System and user messages now support {% lua code %} blocks for full control flow — conditionals, loops, variable assignment — alongside the existing {{ expression }} syntax. Whitespace trimming ({%- -%}, {{- -}}) keeps your output clean. Includes are now parameterized: {{ include('persona.md', { style = "brief" }) }} passes variables into the included file, where they're available as top-level identifiers. Included files support full template syntax at any nesting depth.

The template environment is extensible via templating.modules in your setup config — register custom populator modules that add globals to the Lua sandbox. Two built-in populators ship: stdlib (the standard library you already know) and iterators (providing values() and each() for concise array iteration with loop metadata like index, first, last).

Breaking change: Binary include mode now uses symbol keys ([symbols.BINARY], [symbols.MIME]) instead of the reserved strings "binary" and "mime", so those names are free to use as template variables.

See docs/templates.md for the full syntax reference, examples, and error behavior.
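A sketch exercising the syntax described above (the file name, variables, and loop body are hypothetical; docs/templates.md has the authoritative grammar):

```
{%- local langs = { "Lua", "Rust" } -%}
{% for _, lang in ipairs(langs) do %}
- Prefer idiomatic {{ lang }} examples.
{% end %}
{{ include('persona.md', { style = "brief" }) }}
```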

🎭 New Feature: Personality System

Personalities generate dynamic, tool-aware system prompts. Include one with {{ include('urn:flemma:personality:coding-assistant') }} in your @System: message and Flemma assembles a complete prompt that lists every enabled tool with descriptions, collects behavioral guidelines contributed by each tool, adds environment context (cwd, current file, git branch, date/time), and appends auto-discovered project files like CLAUDE.md, AGENTS.md, or .cursorrules.

Tool definitions contribute personality-scoped content via a new personalities field — snippets, guidelines, or any custom part names. The system is open: create your own personality module by implementing a single render(opts) function and registering it.

See docs/personalities.md for usage and authoring details.
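A minimal custom personality might look like the following sketch; only the render(opts) contract is stated above, so the module shape and the contents of opts are assumptions (registration details are in docs/personalities.md):

```lua
-- Hypothetical personality module
local M = {}

function M.render(opts)
  -- opts presumably carries the enabled tools and environment context
  return "You are a terse reviewer. Answer in bullet points."
end

return M
```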

🔐 New Feature: Secrets Module

Providers no longer manage their own credential lookup. They declare what they need (kind + service) and the secrets module resolves it through a chain of platform-aware resolvers tried in priority order:

  1. Environment variables — convention-based (ANTHROPIC_API_KEY) with alias support
  2. GNOME Keyring (Linux) — via secret-tool
  3. macOS Keychain — via security
  4. gcloud CLI — derives access tokens, with or without a service account

Results are cached with TTL awareness — configurable freshness scaling lets short-lived tokens (like gcloud's 1-hour access tokens) refresh before expiry. You can register custom resolvers (Vault, 1Password, team-specific stores) at runtime. Existing keyring entries stored under the previous scheme are still found via legacy fallback.

See the "Credential Resolution" section in docs/extending.md.

🔎 New Feature: Experimental LSP Server

Flemma now ships an in-process LSP server that attaches to every .chat buffer. Hover (K) returns structured information for every buffer position: expressions show their parsed AST, tool use/result blocks show IDs and metadata, thinking blocks show full untruncated content, role markers show message summaries, and frontmatter shows the parsed configuration. Go-to-definition (gd) navigates between tool use and tool result siblings, jumps to @./file references, and resolves {{ include() }} expressions to their target files.

Enabled by default when vim.lsp is available. Disable with experimental = { lsp = false } in setup.

See the experimental section in docs/configuration.md.

🗺️ New Feature: Exploration Tools

Three new tools give the LLM the ability to search and navigate your codebase: grep (content search with ripgrep/grep fallback), find (file discovery with fd/git-ls-files/find fallback), and ls (directory listing with depth control). All three respect the sandbox, auto-detect the best available backend, and truncate output to prevent context overflow.

Gated behind experimental = { tools = true } in setup. When sandbox auto-approval is active (the default), these tools run without manual confirmation.

See docs/tools.md for configuration options and backend details.

🪝 New Feature: Hooks Module

Flemma now emits User autocmds at key lifecycle points, enabling external plugins and custom integrations:

  • FlemmaRequestSending / FlemmaRequestFinished (with status: completed, cancelled, or errored)
  • FlemmaToolExecuting / FlemmaToolFinished (with tool name, ID, and status)
  • FlemmaBootComplete (when async tool sources finish loading)

The built-in bufferline.nvim integration is the first consumer — it shows a busy icon on .chat tabs while requests or tools are in-flight. The hooks module is the foundation for a growing plugin ecosystem.
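A minimal consumer can subscribe with a standard User autocmd. This sketch only relies on the event names listed above and makes no assumption about the payload shape:

```lua
vim.api.nvim_create_autocmd("User", {
  pattern = { "FlemmaRequestSending", "FlemmaRequestFinished" },
  callback = function(ev)
    -- ev.match is the name of the event that fired
    vim.notify(("flemma: %s"):format(ev.match), vim.log.levels.DEBUG)
  end,
})
```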

See the "Hooks & Events" section in docs/extending.md and docs/integrations.md for the bufferline setup.

✨ New Feature: Progress Indicator & Cursor Engine

The streaming indicator has been completely redesigned. A persistent floating window shows character count, elapsed time, and a phase-specific animation throughout the full request lifecycle — including tool input buffering, which previously showed no progress for OpenAI and Vertex providers. The float appears at the bottom of the chat window when the progress extmark scrolls off-screen, with the spinner icon placed in the sign column to match the notification bar layout. Configurable via progress.highlight and progress.zindex.

Alongside this, a new cursor engine centralizes all cursor movement with focus-stealing prevention: system-initiated moves (tool results, response completion, autopilot transitions) are deferred until the user is idle, so the cursor no longer jumps away while you're reading or editing during an agent loop.
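The two configuration keys could be set like this (the values shown are illustrative, not defaults):

```lua
require("flemma").setup({
  progress = {
    highlight = "DiagnosticInfo", -- highlight group for the floating indicator
    zindex = 60,                  -- stacking order of the floating window
  },
})
```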


Minor Changes

  • d36de50: Added ast:diff command for side-by-side comparison of raw and rewritten ASTs, with syntax highlighting, folding, and cursor-aware scrolling. LSP hover now uses the same tree dump format for consistent AST inspection.

  • ba903a8: Added a booting indicator for async tool sources: a #{booting} lualine variable, a FlemmaBootComplete autocmd, and a ⏳ indicator in :Flemma status

  • 464a909: Added optional bufferline.nvim integration that shows a busy icon on .chat tabs while a request is in-flight. Configure with get_element_icon = require("flemma.integrations.bufferline").get_element_icon in your bufferline setup. Custom icons supported via get_element_icon({ icon = "+" }).

  • 235b8e1: Added centralized cursor engine with focus-stealing prevention. System-initiated cursor moves (tool results, response completion, autopilot) are now deferred until user idle, preventing cursor hijacking during agent loops. User-initiated moves (send, navigation) execute immediately.

  • 0c6e6cb: Added experimental in-process LSP server for chat buffers with hover and goto-definition support. Enable with experimental = { lsp = true } in setup. Every buffer position returns a hover result: segments (expressions, thinking blocks, tool use/result, text) show structured dumps, role markers show message summaries with segment breakdowns, and frontmatter shows language and code. Goto-definition (gd, <C-]>, etc.) on @./file references and {{ include() }} expressions jumps to the referenced file, reusing the navigation module's path resolution.

  • 92bd667: Added three exploration tools for LLM-powered codebase navigation: grep (content search with rg/grep fallback, --json match counting, per-line truncation), find (file discovery with fd/git-ls-files/find fallback, recursive patterns, configurable excludes), and ls (directory listing with depth control). All tools use existing truncation, sink, and sandbox infrastructure. Executor cwd resolution generalized from bash-specific to per-tool.

  • cf30657: Added file drift detection: warns when @./file references change between requests, helping identify cache breaks and potential LLM confusion from stale conversation context

  • 393e18d: Added <Space> keymap to toggle folds in .chat buffers. Configurable via keymaps.normal.fold_toggle; automatically skipped when the key conflicts with mapleader.

  • 749c1c7: Added hooks module for external plugin integration. Fl...

Read more

v0.6.0

08 Mar 13:13
5069d6d


The gist…

Flemma v0.6.0 is a big visual and structural release. Tool blocks now fold independently — Tool Use and Tool Result each get their own fold at level 2, and completed tools auto-fold after execution so your buffer stays clean. Fold text is syntax-highlighted per segment, showing tool names, input previews, and line counts in distinct colors. Role markers moved to their own line with the ruler integrated directly into the @Role: line (─ Assistant ─────…), giving conversations a cleaner visual rhythm; old-format files are auto-migrated on load. The notification bar was rewritten with a priority-based layout engine, compact Unicode symbols (Σ, #N, ↑↓), WCAG-contrast color tiers, and a gutter icon that frees up column space. A new diagnostics mode (diagnostics = { enabled = true }) lets contributors debug prompt caching by comparing consecutive API requests and warning when the prefix diverges — complete with byte-level diff view via :Flemma diagnostics:open. Model metadata enrichment adds per-model thinking budgets and cache pricing so thinking parameters are silently clamped to valid ranges instead of hitting API errors, and :Flemma status now displays context window, pricing, and thinking budget info. Under the hood, a deterministic JSON encoder sorts request keys for better prompt cache hit rates, and range extmarks replaced per-line highlights, dropping Neovim API calls from ~500 to ~20 per update. On the reliability side: fold auto-close race conditions that left blocks unfolded during streaming are fixed, API errors (non-SSE responses, HTML error pages) are now properly surfaced instead of silently swallowed, and UTF-8 content no longer breaks in fold previews.

Minor Changes

  • 6546355: Aligned all registry modules to a consistent API contract: every registry now exposes register(), unregister(), get(), get_all(), has(), clear(), and count(). Extracted shared name validation into a new flemma.registry utility module. Renamed tools registry define() to register() (define() kept as deprecated alias).
  • dea4561: Notification bar background is now a blend of Normal bg (base), StatusLine bg (30%), and DiffChange fg (20%), producing a subtly tinted bar that's easier to read against the editor background
  • 568fb63: Compact notification bar format: token arrows now follow numbers (129↑ 117↓), session request count is merged into the Σ label (Σ3), and the bar automatically uses relaxed double-spacing when width allows
  • bb15c08: Restored CursorLine visibility on line-highlighted chat buffer lines. Blended overlay highlights preserve role-specific backgrounds while showing the cursor line, with smart toggling via OptionSet and a fg-only thinking fold preview group.
  • 9459e97: Added a deterministic key-ordered JSON encoder for prompt caching. API request bodies now serialize with sorted keys and provider-specific trailing keys (messages, tools) placed last, maximizing prefix-based cache hits across all providers.
  • 9c0f873: Added diagnostics mode for debugging prompt caching issues. When enabled via diagnostics = { enabled = true }, Flemma compares consecutive API requests per buffer and warns when the prefix diverges (breaking caching). Includes byte-level analysis, structural change detection, and a side-by-side diff view (:Flemma diagnostics:open).
  • a6618bd: Notification bar now derives all colors from DiffChange with three foreground tiers (primary, secondary, muted) and WCAG contrast enforcement on semantic cache colors. Added ^ contrast operator to highlight expressions and extracted color utilities into flemma.utilities.color for reuse.
  • bae5026: Extracted folding logic into dedicated ui/folding module with registry-based fold rules, O(1) cached fold map, and configurable auto_close per fold type (thinking, tool_use, tool_result, frontmatter)
  • c56f356: Added independent folding for Tool Use and Tool Result blocks at fold level 2. Completed and terminal tool blocks auto-fold after execution, reducing visual noise. In-flight tools (pending, approved, executing) remain visible. Fold summaries reuse the same preview format as pending tool extmarks.
  • 77cb82b: Added per-segment syntax highlighting to fold text lines. Fold lines now return {text, hl_group} tuples so each part (icon, title, tool name, preview, line count) uses its own highlight group. New config keys: tool_icon, tool_name, fold_preview, fold_meta. Renamed tool_use to tool_use_title and tool_result to tool_result_title for 1:1 correspondence with highlight groups. Added shared roles.lua utility for centralised role name mapping.
  • 0fc8bea: Merged ruler into role marker lines: @Role: now renders as ─ Role ─────... with the ruler extending to the window edge, replacing the separate virtual line above each message
  • 078a3a2: Enriched model metadata matrix with per-model thinking budgets, cache pricing, and cache minimum thresholds. Thinking parameters are now silently clamped to model-specific bounds instead of hitting runtime API errors. Cache percentage indicator is suppressed when input tokens are below the model's minimum cacheable threshold. Session pricing now uses per-model absolute cache costs where available, with provider-level multipliers as fallback.
  • b46f3ea: Rewrote the notification bar with a priority-based layout engine and gutter icon. The 💬 prefix now renders in the gutter when space allows, freeing 3 columns for content. Renamed all FlemmaNotify_ highlight groups to FlemmaNotifications_ for consistency.
  • 5d646e1: Added configurable notifications.highlight and notifications.border options, and fixed notification misalignment when async plugins (git-signs, LSP) change gutter width after positioning
  • fe71464: Line highlights now use per-message range extmarks instead of per-line extmarks, reducing API calls from ~500 to ~20 per update. New lines created by pressing Return in insert mode are highlighted immediately via Neovim's gravity system instead of waiting for CursorHoldI.
  • 652e9f6: Reprioritized notification bar segments: session cost and request input tokens now survive truncation at narrow widths. Replaced word labels with compact Unicode symbols (Σ for session totals, #N for request count, bare percentage for cache).
  • 0c6e898: Role markers (@System:, @You:, @Assistant:) now occupy their own line in .chat buffers. Old-format files are automatically migrated on load, and a new :Flemma format command is available for manual migration. Insert-mode colon auto-newline moves the cursor to a new content line after completing a role marker.
  • 29ba841: :Flemma status now shows model metadata (context window, pricing, thinking budget range) in the Provider section for known models. Verbose mode includes a full Model Info dump. Syntax highlighting updated with model version suffixes, dollar amounts, and token count suffixes.
  • 46e6b25: Moved shared utility modules to the flemma.utilities.* namespace and introduced flemma.utilities.buffer for common buffer manipulation patterns

Patch Changes

  • b109b62: Cancel both Space and Enter after role marker auto-newline to prevent unwanted blank lines from muscle memory
  • acc51d0: Fixed spurious "A request is already in progress" warning during autopilot tool execution loops with sync tools
  • a870175: Fixed CursorLine overlay flashing on every keystroke when blink-cmp completion menu is open
  • 5de6e77: Fixed spurious "Cache break detected" diagnostics warning when switching between providers
  • b60a533: Fixed a diagnostics false positive when messages grow between turns. Cache-break warnings now only fire for actual prefix-breaking changes (tools, config, system prompt), not for normal message appends at the document tail.
  • 9cc706d: Fixed fold auto-close race condition where thinking blocks and tool blocks would remain unfolded ~10% of the time due to silent foldclose failures being permanently marked as successful. Also fixed folds not being applied when returning to a chat buffer after switching tabs during streaming.
  • 5c87b26: Fixed fold_completed_blocks firing redundantly on every cursor movement, spamming the debug log
  • 6bf2ed9: Fixed tool fold previews falling back to generic key=value format for tools registered via config.tools.modules (e.g. extras) by ensuring lazy modules are loaded before registry lookup
  • a57a6dc: Fixed preview truncation (fold text, tool indicators) using byte length instead of display width, which caused incorrect truncation and potential UTF-8 splitting with multibyte content (CJK, accented characters, Unicode symbols)
  • e098341: Fixed notification bar icon flickering during scrolling by replacing the 💬 emoji prefix with ℹ (U+2139), which renders reliably across terminal emulators
  • 720ddab: Fixed extra space in notification bar caused by stale item width alignment from dismissed notifications
  • 8686997: Fixed role_style attributes (e.g., underline) bleeding into ruler characters on role marker lines
  • 84442f0: Fixed self-closing thinking tags (<thinking .../>) creating unclosed folds that swallowed subsequent buffer content
  • 300525a: Fixed missing warning when pressing <C-]> while a request is already in progress — the keypress was silently ignored instead of showing the "Use <C-c> to cancel" message
  • f59d94f: Fixed silent failure when API returns non-SSE error responses (plain JSON, HTML error pages, or plain text). Errors are now properly surfaced via vim.notify instead of being silently swallowed.
  • 46da4a0: Fixed thinking blocks not auto-folding after the first response in a session
  • 932dc68: Tool block folds now absorb trailing blank lines when the next adjacent tool block is also foldable, producing a cleaner collapsed view without vertical gaps between folded blocks
  • 9386d8f: Notification recall now derives segments from ses...
Read more

v0.5.0

27 Feb 09:36
f2aee7b


The gist…

Flemma v0.5.0 is smarter about the things you shouldn't have to think about. Sandboxed bash commands are now auto-approved — if your sandbox backend is available, tool calls run without prompts, so agentic workflows feel seamless out of the box. Cancelling mid-stream is finally clean: hit <C-c> and orphaned tool calls resolve themselves instead of leaving you stuck in approval limbo. max_tokens defaults to "50%" of the model's output limit and percentages are a first-class config value, so you get sensible defaults without knowing every model's context window. Anthropic prompt caching switches to the auto-caching API, eliminating fragile edge cases when the conversation tail lands on an unusual message shape. Usage notifications have been redesigned with a compact two-column layout that foregrounds cost and cache hit rate (color-coded green/yellow), and rate-limit errors now surface retry-after timing and remaining quota straight from the API headers. Under the hood, a new per-buffer write queue eliminates the E565 textlock crashes that occurred when visual-mode plugins collided with streaming callbacks.
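A sketch of the new knobs in setup — the exact nesting of `max_tokens` under a `parameters` table is an assumption based on these notes:

```lua
require("flemma").setup({
  parameters = {
    max_tokens = "50%", -- percentage strings resolve against the model's output limit
  },
  tools = {
    auto_approve_sandboxed = false, -- opt back into manual approval for sandboxed bash
  },
})
```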

Minor Changes

  • 2350bd7: Added automatic handling of aborted responses: when a user cancels (<C-c>) mid-stream after tool_use blocks, orphaned tool calls are now automatically resolved with error results instead of triggering the approval flow. The abort marker (<!-- flemma:aborted: message -->) is preserved for the LLM on the last text-only assistant message so it can continue contextually.
  • 5c3aee7: Added max_input_tokens and max_output_tokens to all model definitions, enabling future context window awareness and cost prediction features
  • 681ebbf: Added flemma.sink module — a buffer-backed data accumulator that replaces in-memory string/table accumulators across the codebase. Sinks handle line framing, write batching, and lifecycle management behind an opaque API. Migrated cURL streaming, bash tool output, provider response buffering, thinking accumulation, and tool input accumulation to use sinks.
  • 2d24104: Switched to Anthropic's auto-caching API for the conversation tail breakpoint, replacing manual last-user-message walking with the more robust top-level cache_control field
  • 9aff386: Redesigned usage notifications with compact dotted-leader layout, cache hit percentage with conditional color highlighting, and arrow-based token display
  • c574d43: Added rate limit details (retry-after, remaining quota headers) to error notifications when the API returns HTTP 429, with a fallback "Try again in a moment" hint when headers are unavailable
  • ee19164: Auto-approve bash tool when sandbox is enabled and a backend is available. A new resolver at priority 25 approves bash calls when sandboxing is active, so sandboxed sessions run without manual approval prompts by default. Users can opt out via tools.auto_approve_sandboxed = false in config, or by excluding bash from auto-approval in frontmatter (auto_approve:remove("bash")).
  • 8758bdd: Smart max_tokens: default is now "50%" (half the model's max output), percentage strings are resolved automatically, and integers exceeding the model limit are clamped with a warning. :Flemma status shows the resolved value alongside the percentage.

Patch Changes

  • 1991273: Fixed auto_write not consistently writing the buffer after tool execution, denied/rejected tool processing, and :Flemma import
  • 8058909: Fixed bwrap sandbox breaking nix commands on NixOS by using --symlink instead of --ro-bind for /run/current-system and /run/booted-system, preserving their symlink nature so nix can detect store paths correctly
  • e4afad6: Fixed role marker highlights losing foreground color when the base highlight group only defines background, and fixed spinner background not inheriting line highlight colors
  • b767a0d: Fixed pending tool blocks with user-provided content being silently discarded. When a user pastes output into a flemma:tool status=pending block and presses <C-]>, the content is now accepted as the tool result and sent to the provider instead of being replaced by a synthetic error.
  • 80eb9fc: Fixed E565 textlock errors when visual-mode plugins (e.g., targets.vim) hold textlock while streaming responses complete. All async buffer modifications now go through a per-buffer FIFO write queue that retries on textlock.
  • 0c333ef: Added FlemmaSinkCreated and FlemmaSinkDestroyed user autocmd events for observing sink lifecycle
  • 2d24104: Fixed non-deterministic tool ordering in Vertex provider that was causing implicit cache misses on every request

v0.4.0

23 Feb 12:56
92a21bf


The gist…

Flemma v0.4.0 turns the plugin into a proper extensible platform. Tool approval presets ($default, $readonly) let you go from zero config to a working agent loop — tools like read, write, and edit auto-approve out of the box while bash stays gated behind manual confirmation, and you can override any of it per-buffer via frontmatter. Third-party extensions are now first-class: drop a Lua module path (e.g., "3rd.tools.todos") into your config for providers, tools, approval resolvers, or sandbox backends — no require() boilerplate needed. Rich fold previews show what's hiding inside collapsed messages (tool names, commands, results) so you can skim long conversations at a glance. :Flemma status gained a full Approval section that shows you exactly which tools are auto-approved, denied, or awaiting manual confirmation, with frontmatter overrides clearly marked. Tools now resolve relative paths against the .chat file's directory instead of Neovim's cwd, matching how @./file references and {{ include() }} already work. On the reliability side, frontmatter evaluation was optimized from 2N+2 executions per dispatch down to exactly one, and the bwrap sandbox on NixOS no longer hides system packages. This release also removes all remaining Claudius-era compatibility shims — if you haven't migrated your config from require("claudius") to require("flemma") yet, now's the time.
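Putting the preset and module-path features together, a config might look like this sketch (the `3rd.tools.todos` module is the example path from these notes, not a shipped module):

```lua
require("flemma").setup({
  provider = "anthropic",
  tools = {
    modules = { "3rd.tools.todos" },            -- third-party tool by Lua module path
    auto_approve = { "$default", "$readonly" }, -- union of both presets' approve/deny lists
  },
})
```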

Minor Changes

  • ffe72b3: tools.auto_approve now accepts a string[] of module paths (and mixed module paths + tool names). Internal approval resolver names use urn:flemma:approval:* convention; module-sourced resolvers are addressable by their module path directly.

  • fae1e16: Added dynamic module resolution for third-party extensions. Lua module paths (dot-notation strings like "3rd.tools.todos") can now be used in config.provider, config.tools.modules, config.tools.auto_approve, config.sandbox.backend, and flemma.opt.tools to reference third-party modules without explicit require() calls. Modules are validated at setup time and lazily loaded on first use.

  • 3cf9fe3: Refactored tool definitions to use the ExecutionContext SDK — tools now code against ctx.path, ctx.sandbox, ctx.truncate, and ctx:get_config() instead of requiring internal Flemma modules directly

  • 75e34c8: Moved calculator and calculator_async tools from built-in definitions to lua/extras (dev-only); production builds no longer ship calculator tools

  • 974eac1: Auto-approve policy now expands $-prefixed preset references, allowing auto_approve = { "$default", "$readonly" } to union approve/deny lists from the preset registry. Config-level resolvers defer to frontmatter when it sets auto_approve, enabling per-buffer override of global presets.

  • ef6a932: Removed all backwards-compatibility layers from the Claudius-to-Flemma migration. This is a breaking change for users who still rely on any of the following:

    Removed: require("claudius") module fallback. The lua/claudius/ shim that forwarded to require("flemma") has been deleted. Update your config to require("flemma").

    Removed: legacy :Flemma* commands. The individual commands :FlemmaSend, :FlemmaCancel, :FlemmaImport, :FlemmaSendAndInsert, :FlemmaSwitch, :FlemmaNextMessage, :FlemmaPrevMessage, :FlemmaEnableLogging, :FlemmaDisableLogging, :FlemmaOpenLog, and :FlemmaRecallNotification have been removed. Use the unified :Flemma <subcommand> tree instead (e.g., :Flemma send, :Flemma cancel, :Flemma message:next).

    Removed: "claude" provider alias. Configs specifying provider = "claude" will no longer resolve to "anthropic". Update your configuration to use "anthropic" directly.

    Removed: reasoning_format config field. The deprecated reasoning_format type annotation (alias for thinking_format) has been removed from flemma.config.Statusline.

    Removed: resolve_all_awaiting_execution() internal API. This backwards-compatibility wrapper in flemma.tools.context has been removed. Use resolve_all_tool_blocks() and filter for the "pending" status group instead.

  • 50eea2b: Rich fold text previews for message blocks. Folded @Assistant messages now show tool use previews (e.g. bash: $ free -h | bash: $ cat /proc/meminfo (+1 tool)), and folded @You messages show tool result previews with resolved tool names (e.g. calculator_async: 4 | calculator_async: 8). Expression segments are included in fold previews, consecutive text segments are merged, and runs of whitespace are collapsed to keep previews compact.

  • 5b637d2: Added an Approval section to :Flemma status showing auto-approve, deny, and require-approval classification per tool with preset expansion. Frontmatter overrides are marked with ✲ on individual items across Tools, Approval, Parameters, and Autopilot sections, with a conditional legend at the bottom.

  • cd97ff5: Added tool approval presets for zero-config agent loops. Flemma now ships with $readonly and $default presets. The default auto_approve is { "$default" }, which auto-approves read, write, and edit while keeping bash gated behind manual approval. Users can define custom presets in tools.presets and reference them in auto_approve. Frontmatter supports flemma.opt.tools.auto_approve:remove("$default") and :remove("read") for per-buffer overrides.

  • 0617d2c: Changed tool execute function signature from (input, callback, ctx) to (input, ctx, callback?) — sync tools no longer need a placeholder _ argument, and callback-last ordering matches Node.js conventions

  • 5de4f32: Tools now resolve relative paths against the .chat buffer's directory (__dirname) instead of Neovim's working directory, matching the behavior of @./file references and {{ include() }} expressions. The tools.bash.cwd config defaults to "$FLEMMA_BUFFER_PATH" (set to nil to restore the previous cwd behavior).

  • ff794c4: Added tool approval presets configuration field and wired preset registry into plugin initialization with { "$default" } as the default auto_approve policy

Patch Changes

  • 5035b41: Fixed flemma.opt.tools.auto_approve:append() failing when auto_approve was not explicitly assigned first in frontmatter

  • 4062653: Fixed bwrap sandbox hiding NixOS system packages by re-binding /run/current-system read-only after the /run tmpfs mount

  • 93b79e8: Frontmatter is now evaluated exactly once per dispatch cycle instead of 2N+2 times (where N = number of tool calls), reducing redundant sandbox executions and preventing potential side-effects from repeated evaluation.

  • ec0072b: Updated model definitions with latest pricing and availability data from all three providers.

    Anthropic: Removed retired Claude Sonnet 3.7 and Claude Haiku 3.5 models (retired Feb 19, 2026). Updated Claude Haiku 3 deprecation comment to reflect April 2026 retirement date.

    Vertex AI: Added Gemini 3.1 Pro Preview (gemini-3.1-pro-preview). Removed superseded preview-dated aliases gemini-2.5-flash-preview-09-2025 and gemini-2.5-flash-lite-preview-09-2025.

    OpenAI: No changes — all existing models and pricing confirmed current against official documentation.

v0.3.0

18 Feb 09:14
826635d


The gist…

Tool approval gets a major UX upgrade — pending tool calls now show inline virtual-line previews so you can see exactly what you're approving or rejecting without unfolding anything, and every built-in tool (bash, read, edit, write, calculator) ships a tailored preview formatter. A new :Flemma status command gives you a one-glance dashboard of your runtime state: provider, model, merged parameters, autopilot, sandbox, and enabled tools. Config gets simpler too: model = "$preset-name" lets you point your default at an existing preset instead of duplicating provider/model/parameters at the top level. Under the hood, tool execution has been unified into a single three-phase algorithm with explicit status semantics (pending → approved/denied/rejected), replacing the old split between autopilot and manual flows — the result is more predictable behavior and cleaner .chat files. Model defaults have been refreshed: Claude Sonnet 4.6 is now the default Anthropic model and o3-pro has been added to the OpenAI roster. On the stability side, the bash tool no longer chokes on heredoc commands, JSON null values from LLM responses no longer crash tool definitions, and cross-provider parameter merges correctly preserve provider-specific keys like project_id when switching via presets.
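The preset-as-default shorthand looks like this; `work` is a hypothetical preset name defined elsewhere in your config:

```lua
require("flemma").setup({
  model = "$work", -- reuse the preset's provider/model/parameters as the startup default
})
```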

Minor Changes

  • e5a9b6f: Added :Flemma status command that displays comprehensive runtime status (provider, model, merged parameters, autopilot state, sandbox state, enabled tools) in a read-only scratch buffer. Use :Flemma status verbose for full config dump. :Flemma autopilot:status and :Flemma sandbox:status now open the same status view with cursor positioned at the relevant section.
  • 9fc147c: Tool definitions can now provide an optional format_preview function for custom preview text in tool status blocks. All built-in tools (calculator, bash, read, edit, write) include tailored previews showing the most relevant input at a glance.
  • 6f8b455: Added support for model = "$preset-name" in config to use a preset as the startup default, avoiding duplication of provider/model/parameters at the top level
  • f20492f: Added virtual line previews inside tool status blocks showing a compact summary of the tool call, so users can see what they are approving or rejecting
  • 9bd2785: Unified tool execution into a three-phase advance algorithm with explicit status semantics (flemma:tool status=pending|approved|rejected|denied), replacing the old flemma:pending marker and separate autopilot/manual flows
  • 299702f: Added Claude Sonnet 4.6 as the new default Anthropic model, removed retired chatgpt-4o-latest, added o3-pro snapshot, and updated Gemini 2.0 retirement dates

Patch Changes

  • 6a5cb12: Fixed Sonnet 4.6 to use adaptive thinking instead of deprecated budget_tokens, clamped max effort to high on non-Opus models, and added budget_tokens < max_tokens guard for budget-based models
  • e4933aa: Preview text for tool blocks and folded messages now sizes dynamically to the editor width instead of using a fixed 72-character limit
  • 41c130b: Fixed bash tool failing with heredoc commands by replacing { cmd; } 2>&1 group wrapping with exec 2>&1 prefix
  • 1ca55b2: Fixed cross-provider parameter merge bug where provider-specific config keys (e.g., project_id) were silently dropped when switching providers via presets
  • e4ddd0b: Fixed JSON null values decoding as vim.NIL (truthy userdata) instead of Lua nil, causing crashes in tool definitions when LLMs send null for optional parameters like offset, limit, timeout, and delay
  • f88449f: Fixed thinking preview counter disappearing when models emit whitespace-only text before thinking blocks (e.g. Opus 4.6 with adaptive thinking)
  • 0af66ea: Moved session reset API from require("flemma.state").reset_session() to require("flemma.session").get():reset()

v0.2.0

16 Feb 07:51
c3cd0bd


The gist…

Flemma v0.2.0 is the first semver release and a major step up from the initial preview. The headline feature is autopilot – an autonomous agent loop that executes approved tool calls, feeds results back to the model, and repeats until the task is done or a tool needs manual approval. Tool execution is now sandboxed by default on Linux: shell commands run inside a read-only rootfs with write access limited to your project directory, keeping your system safe from runaway agents. A new unified thinking parameter lets you set thinking effort once (e.g., thinking = "high") and have it work across Anthropic, OpenAI, and Vertex AI, with five levels from minimal to max. The approval registry gives fine-grained control over which tools auto-approve, configurable globally, per-buffer via frontmatter, or through custom plugin resolvers. On the provider side, this release adds Gemini 3 support with native thinking levels, adaptive thinking for Claude Opus 4.6+, proactive OAuth2 token refresh for Vertex AI, and proper error surfacing for safety-filtered responses and stream errors across all providers.
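The unified thinking parameter, as a sketch — assuming it sits at the top level of setup, mirroring the frontmatter `flemma.opt.thinking` override:

```lua
require("flemma").setup({
  thinking = "high", -- one of: minimal | low | medium | high | max
})
```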

Minor Changes

  • 7cccfc6: Adopted semantic versioning (semver) and changesets for automated version management and changelog generation. The project transitions from the previous CalVer (vYY.MM-N) scheme to standard semver, starting at 0.1.0.
  • c22dd05: Added Anthropic stop reason handling (max_tokens warns, refusal/sensitive surface as errors) and adaptive thinking for Opus 4.6+ models (auto-detected, sends effort level instead of deprecated budget_tokens)
  • 4471a07: Added autopilot: an autonomous tool execution loop that transforms Flemma into a fully autonomous agent. After each LLM response containing tool calls, autopilot executes approved tools, collects results, and re-sends the conversation automatically – repeating until the model stops calling tools or a tool requires manual approval. Includes per-buffer frontmatter override (flemma.opt.tools.autopilot), runtime toggle commands (:Flemma autopilot:enable/disable/status), configurable turn limits, conflict detection for user-edited pending blocks, and full cancellation safety via Ctrl-C.
  • 05809d5: Added minimal and max thinking levels, expanding from 3 to 5 gradations (minimal | low | medium | high | max). Budget values for low (1024 → 2048) and high (32768 → 16384) were adjusted to align with upstream defaults and make room for the new levels. Each provider maps the canonical levels to its API: Anthropic maps minimal → low and passes max on Opus 4.6; OpenAI maps max → xhigh for GPT-5.2+; Vertex maps minimal → MINIMAL (Flash) or LOW (Pro) and clamps max to HIGH.
  • 907b787: Added filesystem sandboxing for tool execution. Shell commands now run inside a read-only rootfs with write access limited to configurable paths (project directory, .chat file directory, /tmp by default). Enabled by default with auto-detection of available backends; silently degrades on platforms without one. Includes Bubblewrap backend (Linux), pluggable backend registry for custom/future backends, per-buffer overrides via frontmatter, runtime toggle via :Flemma sandbox:enable/disable/status, and comprehensive documentation.
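A minimal configuration sketch for the sandbox, assuming a `sandbox` table in `setup()`; of the key names below, only `rw_paths` is confirmed by these notes, the rest are illustrative:

```lua
require("flemma").setup({
  sandbox = {
    enabled = true,  -- on by default where a backend (e.g. Bubblewrap on Linux) is detected
    -- assumption: rw_paths extends the default write-allowed set
    -- (project directory, .chat file directory, /tmp)
    rw_paths = { vim.fn.getcwd(), "/tmp" },
  },
})
```

Runtime toggling is available regardless via `:Flemma sandbox:enable/disable/status`.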
  • 76c635e: Added Gemini 3 model support: uses the thinkingLevel enum (LOW/MEDIUM/HIGH) instead of a numeric thinkingBudget for gemini-3-pro and gemini-3-flash models.
  • e6b53e2: Added approval resolver registry and per-buffer approval via frontmatter. Tool approval is now driven by a priority-based chain of named resolvers – global config, per-buffer frontmatter (flemma.opt.tools.auto_approve), and custom plugin resolvers are all evaluated in order. Consolidated tool documentation into docs/tools.md.
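Per-buffer approval uses the `flemma.opt.tools.auto_approve` frontmatter key; a hypothetical global equivalent in `setup()` might look like the sketch below. The value shape is an illustrative assumption — see docs/tools.md for the actual schema:

```lua
require("flemma").setup({
  tools = {
    -- assumption: a list of tool names to auto-approve; real schema in docs/tools.md
    auto_approve = { "read", "grep", "ls" },
  },
})
```

Custom plugin resolvers sit in the same priority-based chain, so a plugin can override both the global config and frontmatter when registered with higher priority.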
  • 629dfda: Added sandbox enforcement for the write and edit tools – both now check sandbox.is_path_writable() before modifying files and refuse operations outside rw_paths.
  • dcaa5be: Added a unified thinking parameter that works across all providers – set thinking = "high" once instead of provider-specific thinking_budget or reasoning. The default is "high", so all providers use maximum thinking out of the box. Provider-specific parameters still take priority when set. Also promotes cache_retention to a general parameter, consolidates output_has_thoughts into the capabilities registry, clamps sub-minimum thinking budgets instead of disabling thinking, and supports flemma.opt.thinking in frontmatter for provider-agnostic overrides.
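Setting the unified level once covers every provider. The top-level placement of `thinking` in `setup()` is an assumption inferred from the frontmatter key `flemma.opt.thinking`:

```lua
require("flemma").setup({
  -- one of "minimal" | "low" | "medium" | "high" | "max"; default is "high"
  thinking = "medium",
})
```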
  • 93f4b68: Added proactive token refresh and reactive auth-error recovery for the Vertex AI provider, eliminating the need to manually run :Flemma switch when OAuth2 tokens expire.

Patch Changes

  • c22dd05: Fixed OpenAI top-level stream error events being silently discarded; they now properly surface as errors
  • a59da49: Fixed tool completion indicators being prematurely dismissed during concurrent execution and autopilot
  • 784fe5a: Fixed Vertex AI safety-filtered responses silently appearing as successful completions; SAFETY, RECITATION, and other error finish reasons now properly surface as errors
  • 5b6b5af: Fixed Vertex AI thinking signature retention during streaming; empty or non-string thoughtSignature chunks no longer overwrite a valid cached signature
  • 784fe5a: Fixed Vertex AI tool response format to use output key instead of result, matching the Google SDK convention
  • 7bf8d64: Fixed Vertex AI tool declarations rejecting nullable types by switching to parametersJsonSchema on v1beta1 API
  • 9995605: Added a brief "● Pending" flash indicator on tool result headers awaiting user approval