A simulation game for testing and validating the tools_at_end feature in the qvac LLM addon. It simulates the KV cache behavior of dynamic tool placement without requiring a model — pure Python, no dependencies.
The tools_at_end feature places tool definitions at the end of prompts and trims them from the KV cache after each completion. This prevents stale tool token accumulation across multi-turn conversations.
The game ports the C++ cache logic from `qvac-lib-infer-llamacpp-llm` into Python:

- `DynamicToolsState` (tool boundary tracking)
- `removeLastNTokens` (cache trimming)
- `applyContextDiscard` (context sliding during generation)
- `CacheManager` save/load (session persistence)
- BOS tokens, generation prompt tokens, Qwen3 reasoning injection
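The core trim mechanic can be sketched in a few lines of Python. This is a minimal illustration, not the real qvac API: the names mirror the C++ identifiers above, but the shapes (a list as the KV cache, a dataclass for state) are assumptions made for clarity.

```python
from dataclasses import dataclass

@dataclass
class DynamicToolsState:
    n_past: int = 0                # tokens currently in the KV cache
    n_past_before_tools: int = -1  # first tool token; -1 means no tools present

    def mark_tool_boundary(self):
        # Call just before appending tool-definition tokens.
        self.n_past_before_tools = self.n_past

    def remove_last_n_tokens(self, cache, n):
        # Trim the last n tokens (tool defs + generated text) from the cache.
        del cache[len(cache) - n:]
        self.n_past -= n

cache = list(range(10))              # 10 conversation tokens
state = DynamicToolsState(n_past=10)
state.mark_tool_boundary()           # tools will start at position 10
cache += ["tool"] * 4                # 4 tool-definition tokens appended
state.n_past += 4
# After generation, trim everything from the tool boundary onward:
state.remove_last_n_tokens(cache, state.n_past - state.n_past_before_tools)
print(len(cache), state.n_past)  # 10 10
```

The key invariant is that trimming always restores `n_past` to the value recorded at the tool boundary, so no tool tokens survive into the next turn.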
```
python3 dynamic-tools-game.py
```
```
python3 dynamic-tools-game.py test
```

Runs 21 test scenarios with 124 assertions covering:
- Single/multi-turn with changing tools
- Session save/reload/switch
- Context sliding with tools
- Qwen3 reasoning token injection
- Stop with EOT token
- Stateless (no-session) prompts
- BOS and generation prompt token handling
- Sliding that would eat into tool tokens (regression)
```
python3 dynamic-tools-game.py play
```

Configure context parameters at startup, then play turn by turn:
```
=== Dynamic Tools KV Cache Game ===
Configure context parameters (press Enter for defaults):
  Context size (n_ctx) [2048]:
  Tokens to discard on slide (n_discarded) [256]:
  BOS tokens [1]:
  Gen prompt tokens [3]:
```
Actions (single-letter shortcuts):

- `u` - add user message
- `a` - add assistant message
- `s` - add system message
- `t` - add tool response (result from tool execution)
- `T` - add Tool definitions (weather/search/email/calc/custom)
- `g` - generate (eval + generate + trim)
- `S` - save session
- `l` - load/switch session
- `d` - show detailed cache contents
- `r` - reset state
- `q` - quit
Add multiple tools at once with first-letter shortcuts:
```
Tools: w s e    # adds weather + search + email
```
```
# Turn 1: user asks question
s 10    # system message, 10 tokens
u 15    # user message, 15 tokens
T w s   # add weather + search tools
g 20    # generate 20 tokens -> trims tools + generated

# Turn 2: tool response (model called a tool)
t 30    # tool response, 30 tokens
T w s   # same tools (model might call another)
g 15    # generate -> trim

# Turn 3: final answer (no tool call), then new user question
a 10    # assistant's final answer
u 20    # new user question
T e c   # new tools for new question
g 25    # generate -> trim
```
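Turn 1 of the session above can be traced by hand. This sketch ignores BOS and generation-prompt tokens, and the 40-token cost for the two tool definitions is made up for illustration:

```python
# Tracing Turn 1: s 10, u 15, T w s, g 20.
n_past = 10 + 15        # system (10) + user (15) conversation tokens
before_tools = n_past   # `T w s` marks the trim boundary at 25
n_past += 40            # tool-definition tokens appended at the end (illustrative size)
n_past += 20            # g 20: generated tokens
n_past = before_tools   # trim removes tools + generation together
print(n_past)           # 25 conversation tokens remain in the cache
```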
After each generation, the game:
- Shows a color-coded KV cache diagram
- Reports `nPast`, `firstMsgTokens`, `nPastBeforeTools`, `nSlides`
- Validates no stale tool tokens leaked into the cache
- Tracks your score (turns without violations)
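The staleness check can be sketched as a whitelist over cache spans. The `(kind, count)` span representation here is an assumption made for illustration, not the game's internal format:

```python
def validate_no_stale_tools(cache_spans):
    # After trimming, only conversation and bookkeeping tokens may remain;
    # any surviving tool-definition span is a violation.
    allowed = {"system", "user", "assistant", "tool_response", "bos", "gen_prompt"}
    return all(kind in allowed for kind, _ in cache_spans)

clean = validate_no_stale_tools([("system", 10), ("user", 15)])        # True
leaked = validate_no_stale_tools([("user", 15), ("tool_defs", 40)])    # False
print(clean, leaked)
```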
```
python3 dynamic-tools-game.py auto
```

Runs an infinite agentic loop built on first principles:
- Tools are always at the end, always trimmed after generation
- After tool call: add tool response + same tools
- After final answer: add assistant response + user message + new tools
Randomly samples token counts, tool sets, and tool-call vs final-answer decisions. Validates after every turn that the cache contains only expected tokens (system, user, assistant, tool responses). Runs until context overflow or a bug is found.
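The shape of that loop can be sketched as follows. The function name and token ranges are illustrative, not the game's real API; the point is the per-turn invariant that the cache holds only conversation tokens after each trim:

```python
import random

def auto_loop(seed=42, turns=20):
    rng = random.Random(seed)  # seeded for reproducibility, as in auto mode
    n_past = 0                 # tokens in the simulated KV cache
    conversation = 0           # tokens that should survive every trim
    for _ in range(turns):
        msg = rng.randint(5, 40)          # user message or tool response
        n_past += msg
        conversation += msg
        before_tools = n_past             # tool boundary recorded here
        n_past += rng.randint(20, 80)     # tool-definition tokens
        n_past += rng.randint(5, 30)      # generated tokens
        n_past = before_tools             # trim tools + generation
        assert n_past == conversation     # invariant: no stale tool tokens
    return n_past

final = auto_loop()
```

A real run also handles context sliding and overflow, which this sketch omits.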
```
=== Dynamic Tools — Infinite Agentic Loop ===
Configure (press Enter for defaults):
  Context size (n_ctx) [2048]:
  Tokens to discard on slide (n_discarded) [256]:
  Random seed (0=random) [0]: 42
```
The simulation found two real bugs in the C++ implementation:
- Context sliding during generation doesn't adjust `nPastBeforeTools`: when sliding shifts tokens down, the tool trim boundary becomes stale, leaving tool tokens in the cache after trim.
- Sliding can eat into tool tokens: when `nDiscarded` exceeds the number of conversation tokens between `firstMsgTokens` and the tool boundary, the discard removes tool tokens instead of just conversation tokens.
Both bugs have been fixed in the game and in the C++ code (`TextLlmContext.cpp`, `MtmdLlmContext.cpp`).
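Both fixes can be expressed in one small function. This is a simplified Python sketch of the logic described above, not the actual C++ from `TextLlmContext.cpp`: the discard is clamped so it never reaches tool tokens (bug 2), and the tool boundary shifts down with the rest of the cache (bug 1).

```python
def apply_context_discard(n_past, n_past_before_tools, first_msg_tokens, n_discarded):
    if n_past_before_tools >= 0:
        # Bug 2 fix: only conversation tokens between firstMsgTokens and the
        # tool boundary may be discarded; clamp so tools are never eaten.
        n_discarded = min(n_discarded, n_past_before_tools - first_msg_tokens)
    n_past -= n_discarded
    if n_past_before_tools >= 0:
        # Bug 1 fix: the trim boundary moves down with the slid tokens,
        # so the post-generation trim still lands exactly on the tools.
        n_past_before_tools -= n_discarded
    return n_past, n_past_before_tools

# 100 tokens in cache, tools start at 80, first message ends at 10:
# a requested discard of 256 is clamped to 70, and the boundary follows.
with_tools = apply_context_discard(100, 80, 10, 256)   # (30, 10)
# With no tools present (-1), the discard applies unclamped.
no_tools = apply_context_discard(100, -1, 10, 30)      # (70, -1)
print(with_tools, no_tools)
```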