
Releases: SomeOddCodeGuy/WilmerAI

v0.62.2

18 Apr 20:14
8d5c76c


This patch release carries forward the full v0.62 "Tool Calling" notes and the v0.62.1 concurrency fix; see those releases below. New in this release:

Bug Fixes

  1. Dependabot Bumps — Dependabot bumped a couple of dependency versions to address CVEs.

v0.62.1 - Tool Calling

13 Apr 01:03
9ce5b98


This patch release carries forward the full v0.62 "Tool Calling" notes; see v0.62 below. New in this release:

Bug Fixes

  1. Concurrency Limiting — Fixed an issue where the concurrency limiter was blocking GET endpoints from responding, so models wouldn't load in front ends while another call was in flight. Now only the POST endpoints, which actually hit the LLM APIs, are limited.

v0.62 - Tool Calling

12 Apr 21:25
7dc535d


Major New Features

  1. End-to-End Tool Call Passthrough — Full tool calling support across all LLM handlers (Claude, OpenAI, Ollama) with both streaming and non-streaming paths. Frontend API handlers extract tools and tool_choice from incoming requests, thread them through the workflow pipeline, and forward them to backend LLM handlers. Backend handlers parse tool call data from LLM responses and return it through the response pipeline to the frontend client. A new allowTools boolean on workflow node configs (default false) gates which nodes forward tools, so memory nodes, summarizers, and categorizers silently suppress tool calls during internal processing. OpenAI format is used as the internal standard; Claude and Ollama handlers convert to/from their native formats. Streaming tool call chunks bypass all text processing (prefix stripping, think-block removal, group-chat reconstruction) and are emitted directly as SSE. A node-config sketch follows this list.

  2. DelimitedChunker Workflow Node — New node type that splits content on a delimiter and returns the first N (head) or last N (tail) chunks, rejoined with the same delimiter. Useful for trimming logs, CSV rows, or section-separated documents. Configurable via content, delimiter, mode ("head"/"tail"), and count properties. Supports variable substitution in both content and delimiter fields. An example node config follows this list.

  3. Conversation Variable Formatting Controls — Two new formatting options for chat_user_prompt_* workflow variables. Node-level addUserAssistantTags (boolean) prepends User: / Assistant: / System: role prefixes to each message in conversation variable strings. User-level separateConversationInVariables (boolean) with conversationSeparationDelimiter (string) replaces the default newline between messages with a custom delimiter. Sketches follow this list.

  4. Node-Level Image Controls — Standard nodes can now control image passthrough via acceptImages (boolean, preserves images on conversation messages sent to the LLM) and maxImagesToSend (integer, limits total images sent keeping the most recent; 0 = no limit). Images are trimmed oldest-first. An example follows this list.

  5. /v1/chat/completions Versioned Route — Added /v1/chat/completions as the primary versioned route for the OpenAI-compatible API. The existing /chat/completions is kept as an alias for backward compatibility.
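
For item 1, a hedged sketch of opting a node into tool forwarding; allowTools is the flag described above, while the other node fields are illustrative:

```json
{
  "title": "Final responder",
  "type": "Standard",
  "endpointName": "MyEndpoint",
  "allowTools": true
}
```

Memory, summarizer, and categorizer nodes simply leave the flag at its default of false, so tool calls never leak into internal processing.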
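
For item 2, a hypothetical DelimitedChunker node that keeps the last 50 newline-separated entries of a prior agent output (the node title and the {agent1Output} variable are illustrative):

```json
{
  "title": "Trim log to recent entries",
  "type": "DelimitedChunker",
  "content": "{agent1Output}",
  "delimiter": "\n",
  "mode": "tail",
  "count": 50
}
```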
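
For item 3, hedged sketches of where each control lives (surrounding fields and values are illustrative). On a workflow node:

```json
{
  "type": "Standard",
  "addUserAssistantTags": true
}
```

And in the user's settings file:

```json
{
  "separateConversationInVariables": true,
  "conversationSeparationDelimiter": "\n----\n"
}
```

With both enabled, each message in a chat_user_prompt_* variable gets a User:/Assistant:/System: prefix, and messages are joined with the custom delimiter instead of a newline.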
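
For item 4, a node that accepts images but forwards only the three most recent (other fields illustrative):

```json
{
  "type": "Standard",
  "acceptImages": true,
  "maxImagesToSend": 3
}
```

Setting maxImagesToSend to 0 leaves the count unlimited; trimming always drops the oldest images first.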

Bug Fixes

  1. Curly Brace Escaping in Agent Outputs/Inputs — Fixed str.format() crashes when agent outputs, agent inputs, or enriched tool call text contain literal curly braces (e.g., JSON from tool calls or files loaded by GetCustomFile). Uses a two-pass sentinel-escaping system: literal braces in variable values are replaced with sentinel tokens before formatting, then restored afterward. A minimal sketch follows this list.

  2. Category Matching With Underscores — Fixed _match_category failing to match category keys containing underscores (e.g., NEW_INSTRUCTION). The existing code stripped punctuation (including underscores) from the LLM output but compared against raw keys. Both sides are now normalized before comparison. A sketch follows this list.
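
For fix 1, a minimal sketch of the two-pass sentinel approach (helper and sentinel names are illustrative, not WilmerAI's actual code):

```python
# Sentinels chosen to be vanishingly unlikely in real LLM or tool output.
_OPEN = "\x00LBRACE\x00"
_CLOSE = "\x00RBRACE\x00"

def safe_format(template: str, **variables: str) -> str:
    """Format a prompt template whose variable values may contain
    literal curly braces (e.g. JSON from a tool call)."""
    # Pass 1: hide braces inside the values so str.format() ignores them.
    escaped = {
        name: value.replace("{", _OPEN).replace("}", _CLOSE)
        for name, value in variables.items()
    }
    formatted = template.format(**escaped)
    # Pass 2: restore the literal braces.
    return formatted.replace(_OPEN, "{").replace(_CLOSE, "}")

# The JSON braces survive instead of crashing str.format():
print(safe_format("Tool result: {agent1Output}", agent1Output='{"temp": 21}'))
```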
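
For fix 2, a sketch of normalizing both sides before comparing (function body illustrative):

```python
import string

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)  # includes "_"

def _normalize(text: str) -> str:
    return text.translate(_PUNCT_TABLE).strip().lower()

def match_category(llm_output: str, category_keys: list[str]) -> str | None:
    normalized_output = _normalize(llm_output)
    for key in category_keys:
        # Previously only the LLM output was normalized, so a key like
        # "NEW_INSTRUCTION" (-> "newinstruction") could never match.
        if _normalize(key) == normalized_output:
            return key
    return None
```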

Code Quality

  1. Numeric Config Field Resolution — Replaced ad-hoc maxResponseSizeInTokens variable resolution with a table-driven _resolve_numeric_config_fields() method that handles all ~30 integer config fields and 1 float field in a single pass at the start of node processing.
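
A sketch of the table-driven idea; the field table is abbreviated and the float field's name is a placeholder:

```python
_INT_FIELDS = (
    "maxResponseSizeInTokens",
    "nMessagesToIncludeInVariable",
    "maxImagesToSend",
    # ... ~30 integer fields in the real table
)
_FLOAT_FIELDS = ("someFloatField",)  # one float field; name illustrative

def _resolve_numeric_config_fields(config, resolve_variable):
    """One pass at the start of node processing: any numeric field whose
    value is still a string (i.e. a variable reference) gets resolved
    and cast to its proper type."""
    for fields, cast in ((_INT_FIELDS, int), (_FLOAT_FIELDS, float)):
        for field in fields:
            value = config.get(field)
            if isinstance(value, str):
                config[field] = cast(resolve_variable(value))
    return config
```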

v0.61 - Dependabot update

05 Apr 15:32
56230a6


What's Changed

Full Changelog: v0.6...v0.61

v0.6 - Multi-user improvements, more memory and consistency improvements, and lots of bug fixes

29 Mar 22:57
1df4274


v0.6 - March 2026

Major New Features

  1. ContextCompactor Workflow Node — New node type that summarizes conversation messages into two rolling summaries (Old + Oldest) using token-based windowing. Separate from the memory system; designed for recency-aware conversation compaction. Uses XML-style tags and configurable via a settings file.

  2. Automatic Memory Condensation — Optional condensation layer for file-based memories. After enough new memories accumulate (configurable threshold), the oldest batch is LLM-summarized into a single condensed entry, reducing file bloat over long conversations.

  3. Per-Message Image Association — Major refactor replacing synthetic {"role": "images"} messages with a per-message "images" key. Images now stay associated with their originating message from ingestion through to LLM dispatch. Includes OpenAI multimodal content parsing on ingestion. A before-and-after sketch follows this list.

  4. Claude API Image Support — Full image support for the Claude handler. Supports base64, data URIs, and HTTP URLs. Uses PIL/Pillow for format detection (optional, falls back to JPEG). Images placed before text per Anthropic recommendation. A content-block sketch follows this list.

  5. Per-User Encryption — When an API key is provided via Authorization: Bearer, files are stored in isolated per-key directories. Optional Fernet encryption (AES-128-CBC + HMAC-SHA256, PBKDF2 key derivation) can be enabled per user. Transparent plaintext-to-encrypted migration. Includes a re-keying script. A key-derivation sketch follows this list.

  6. Multi-User Support — A single WilmerAI instance can now serve multiple users via repeated --User flags. Full per-user isolation: per-user config reads, request-scoped user identification, per-user log directories, aggregated models/tags endpoints.

  7. WSGI Concurrency Limiting Middleware — New --concurrency (default: 1) and --concurrency-timeout (default: 900s) CLI flags on all entry points. Requests exceeding the limit queue until a slot opens or timeout (503). Implemented at the WSGI layer so the semaphore holds across streaming responses. A middleware sketch follows this list.
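
For item 3, a hedged before-and-after sketch of the message shape (the old synthetic format's exact content key is an assumption):

Before:

```json
[
  {"role": "user", "content": "What is in this picture?"},
  {"role": "images", "content": ["<base64 image>"]}
]
```

After:

```json
[
  {"role": "user", "content": "What is in this picture?", "images": ["<base64 image>"]}
]
```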
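
For item 4, the Claude handler ultimately emits Anthropic-style content blocks, with the image block placed before the text block per the recommendation noted above:

```json
{
  "role": "user",
  "content": [
    {
      "type": "image",
      "source": {"type": "base64", "media_type": "image/jpeg", "data": "<base64 image>"}
    },
    {"type": "text", "text": "What is in this picture?"}
  ]
}
```

When Pillow isn't available for format detection, media_type falls back to image/jpeg.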
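
For item 5, a minimal sketch of deriving a per-user Fernet key from the bearer token with PBKDF2; the salt handling, iteration count, and function name are assumptions, not WilmerAI's actual code:

```python
import base64

from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

def fernet_for_user(api_key: str, salt: bytes) -> Fernet:
    """Derive a per-user Fernet key from the Authorization: Bearer token.
    Fernet is AES-128-CBC plus HMAC-SHA256 under the hood, matching the
    release note."""
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32,
                     salt=salt, iterations=600_000)
    return Fernet(base64.urlsafe_b64encode(kdf.derive(api_key.encode("utf-8"))))

f = fernet_for_user("sk-example-key", salt=b"per-user-16-byte")
token = f.encrypt(b'{"memories": []}')           # encrypt on write
assert f.decrypt(token) == b'{"memories": []}'   # decrypt on read
```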
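
For item 7, a minimal sketch of the WSGI-layer pattern (class and wrapper names are illustrative); holding the semaphore inside a wrapper around the response iterable is what keeps the slot occupied for the whole streaming response. The POST-only check reflects the v0.62.2 follow-up fix:

```python
import threading

class ConcurrencyLimitMiddleware:
    """Cap in-flight requests at the WSGI layer."""

    def __init__(self, app, limit=1, timeout=900.0):
        self.app = app
        self.semaphore = threading.BoundedSemaphore(limit)
        self.timeout = timeout

    def __call__(self, environ, start_response):
        # Only gate POSTs (the routes that hit LLM APIs), so GET
        # endpoints like /v1/models keep responding during a call.
        if environ.get("REQUEST_METHOD") != "POST":
            return self.app(environ, start_response)
        if not self.semaphore.acquire(timeout=self.timeout):
            start_response("503 Service Unavailable",
                           [("Content-Type", "text/plain")])
            return [b"Timed out waiting for a request slot."]
        try:
            body = self.app(environ, start_response)
        except BaseException:
            self.semaphore.release()
            raise
        return _ReleaseOnClose(body, self.semaphore)

class _ReleaseOnClose:
    """WSGI servers call close() once the response, streaming or not,
    has been fully sent or aborted; release the slot then."""

    def __init__(self, body, semaphore):
        self._body = body
        self._semaphore = semaphore
        self._released = False

    def __iter__(self):
        return iter(self._body)

    def close(self):
        if hasattr(self._body, "close"):
            self._body.close()
        if not self._released:
            self._released = True
            self._semaphore.release()
```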

Bug Fixes

  1. SillyTavern Streaming Hang — Fixed streaming hang when using SillyTavern as a front end.

  2. Open WebUI Streaming Error — Restored JSON heartbeat format (was changed to bare newline, causing JSONDecodeError in Open WebUI's NDJSON parser).

  3. Memory Generation Stalling — Fixed memory generation never triggering after the first run due to empty-message hash collision when front-end injects Author's Note with only a [DiscussionId] tag.

  4. GetCurrentMemoryFromFile Returning Wrong Data — Was sharing a code path with GetCurrentSummaryFromFile and returning rolling chat summary instead of memory chunks. Now correctly returns memory chunks.

  5. Image Lookback Default Regression — Restored default lookback window from 5 back to 10 (was silently halved).

  6. Multi-Word Prefix Detection in Streaming — Fixed StreamingResponseHandler failing to strip multi-word response prefixes (e.g., "AI: ").

  7. Data URI Stripping Before LLM Dispatch — Hardened image key stripping to cover all image formats when llm_takes_images is False.

Hardening and Security

  1. Dependency Pinning — All dependencies pinned to exact versions (==) to mitigate supply chain attacks. Updated several packages including requests (CVE fix), cryptography (reverted to 46.0.5, pre-supply-chain-attack window).

  2. Thread Safety — Per-discussion locks in timestamp service, context compactor, and RAG tool. Thread-safe globals via threading.local(). Lock dictionaries capped at 500 with LRU eviction. Atomic file writes (temp + rename). A sketch of the capped lock dictionary follows this list.

  3. Sensitive Logging / Prompt Redaction — New sensitive_logging_utils module. All log statements exposing user content converted to redactable versions. Redaction activates when encryption is enabled or redactLogOutput: true.

  4. JSON Parsing Hardening — Incoming API handlers now use get_json(force=True, silent=True) returning 400 instead of unhandled 500 on invalid JSON. An example handler follows this list.

  5. Configurable Categorization Retries — Removed hardcoded 4-retry loop; now configurable via maxCategorizationAttempts (default: 1).
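
For item 2, a minimal sketch of a capped, LRU-evicting lock dictionary (names and eviction details are assumptions):

```python
import threading
from collections import OrderedDict

_MAX_LOCKS = 500
_locks: "OrderedDict[str, threading.Lock]" = OrderedDict()
_locks_guard = threading.Lock()

def get_discussion_lock(discussion_id: str) -> threading.Lock:
    """Return the lock for a discussion, evicting the least recently
    used entries once the dictionary grows past the cap."""
    with _locks_guard:
        lock = _locks.get(discussion_id)
        if lock is None:
            lock = threading.Lock()
            _locks[discussion_id] = lock
        else:
            _locks.move_to_end(discussion_id)
        while len(_locks) > _MAX_LOCKS:
            _locks.popitem(last=False)  # drop the oldest entry
        return lock
```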
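
And for item 4, the hardened ingestion pattern as a minimal Flask sketch (route body illustrative):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.post("/v1/chat/completions")
def chat_completions():
    # force=True ignores the Content-Type header; silent=True makes
    # malformed JSON return None instead of raising, so the client
    # gets a clean 400 rather than an unhandled 500.
    data = request.get_json(force=True, silent=True)
    if data is None:
        return jsonify({"error": "Request body was not valid JSON"}), 400
    # ... hand the parsed request off to the workflow engine here
    return jsonify({"ok": True})
```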

Code Quality

  1. Optimized variable generation — Conversation-slice variables only computed when referenced in the prompt.

  2. Lazy-load time_context_summary — Skips file I/O when the variable isn't referenced.

v0.5 - Better message variables for prompts, some new nodes, and memory fixes

09 Feb 03:26
4771775


Summary

NOTE: This introduces new variables to help deprecate variables like "chat_user_prompt_last_twenty". I'm not getting rid of those, for backwards compatibility purposes, but going forward we don't need them as much.

New Workflow Nodes

  • JsonExtractor node: extracts fields from JSON in LLM responses without an additional LLM call
  • TagTextExtractor node: extracts content between XML/HTML-style tags without an additional LLM call

Configurable Prompt Variables

  • nMessagesToIncludeInVariable: node property to control how many messages are included in chat/templated prompt variables
  • estimatedTokensToIncludeInVariable: token-budget-based message selection, accumulates recent messages up to a token limit
  • minMessagesInVariable + maxEstimatedTokensInVariable: combo mode pulling a minimum message count then filling up to a token budget
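
Hedged node sketches of the three selection modes (values illustrative):

```json
[
  {"nMessagesToIncludeInVariable": 10},
  {"estimatedTokensToIncludeInVariable": 2000},
  {"minMessagesInVariable": 4, "maxEstimatedTokensInVariable": 2000}
]
```

The first takes a fixed message count, the second fills a token budget with the most recent messages, and the combo form guarantees a minimum count before filling the remaining budget.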

Token Estimation

  • Recalibrated rough_estimate_token_length word ratio (1.538 -> 1.35 tokens/word)
  • Added configurable safety_margin parameter (default 1.10)
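
A minimal sketch of the recalibrated estimator (the real function's signature and rounding may differ):

```python
def rough_estimate_token_length(text: str, safety_margin: float = 1.10) -> int:
    # ~1.35 tokens per word (recalibrated from 1.538), padded by the
    # configurable safety margin so budgets err on the generous side.
    return int(len(text.split()) * 1.35 * safety_margin)

# e.g. 100 words -> int(100 * 1.35 * 1.10) = 148 estimated tokens
```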

Memory System Fixes

  • Fixed file_exists check that was permanently disabling message-threshold triggers for new conversations
  • Fixed off-by-one in trigger comparisons (> to >=)
  • Added HTTP session cleanup via close() to prevent keep-alive connections from blocking llama.cpp slots
  • Split timeouts into (connect, read) tuples
  • Added diagnostic logging for memory trigger decisions

Code Quality

  • Narrowed bare except clauses to except Exception in cancellation paths
  • Added prompt-aware info logging for configurable variable slicing

Example Workflow Configs

  • Updated all example workflow JSON files to use new configurable variable syntax

v0.4.1 - Small hotfix for memories

05 Jan 03:53
a437d1e


What's Changed

  • Corrected an issue with the memory system caused by the recent change removing the image-specific handlers. by @SomeOddCodeGuy in #82

v0.4 - Workflow collections, bug fixes, test UI, and some simplification

04 Jan 21:26
f9f2a6e


What's Changed

  • Fix oldest message chunk being silently discarded in memory generation
  • Fix incorrect new message count causing duplicate processing of memorized messages
  • Fix pytest.ini test path case sensitivity

Features:

  • Add shared workflow collections and workflow selection via API model field (/v1/models and /api/tags endpoints)
  • Add workflow node execution summary logging with timing info
  • Add workflowConfigsSubDirectoryOverride for shared workflow folders
  • Add sharedWorkflowsSubDirectoryOverride for custom shared folder names
  • Add {Discussion_Id} and {YYYY_MM_DD} variables for file paths
  • Add variable substitution support for maxResponseSizeInTokens
  • Add web-based setup wizard (setup_wizard_web.py) (this is a WIP and may be temporary/replaced)
  • Add vector memory resumability with per-chunk hash logging

Refactors:

  • Consolidated image handlers into standard handlers (removed ~700 lines)
  • Standardize preset/workflow naming convention (hyphenated)
  • Archive legacy workflows to _archive subdirectories
  • Add pre-configured shared workflow folders

Simplification:

  • Updated preset names to match endpoint names. Now it makes more sense, as you can more easily use presets to make sure each endpoint gets the appropriate settings.
  • The _example_general_workflow is the one-stop shop for example productivity workflows, and thanks to the custom workflow system it's easy to spin more off. You can just drop new folders into _shared within workflows and suddenly have new workflows available to you as models. I'll make a video about this later.
  • Dropped the image-specific handlers. Finally. Those were something I did early on, and I just kept putting off dealing with them, but they always annoyed me. Regular handlers have the image frameworks added in, if they support it.

Tests:

  • Update tests for corrected memory hash behavior
  • Added tests for new workflow override features

v0.3.1

07 Dec 23:14
ac447fc


What's Changed

Full Changelog: v0.3.0...v0.3.1

v0.3.0 - API swapped, Claude Support added, other fixes

13 Oct 02:50
8b4963b


  • Added support for the Claude llm_api
  • Replaced the exposed Flask development server with Eventlet on macOS/Linux and Waitress on Windows
  • Fixed the unit tests not running in Windows properly
  • Corrected two places where a trailing slash (at the end of the llm_api URL or the ConfigDirectory folder name) caused a break
  • Added a first attempt at proper cancellation, where pressing "stop" in Open WebUI or other front ends will appropriately end a workflow and cascade down to the LLM
    • Some LLM APIs work with this, some don't. This should appropriately kill Wilmer and its workflows, but an LLM API in the middle of processing a prompt may not be compelled to stop.
  • Added the ability to replace Endpoints and Presets with variables
    • Limited to hardcoded variables at top of workflow, or agentXInputs from parent workflows