olyasir/cache_game

Dynamic Tools KV Cache Game

A simulation game for testing and validating the tools_at_end feature in the qvac LLM addon. It simulates the KV cache behavior of dynamic tool placement without requiring a model: it is pure Python with no dependencies.

What it simulates

The tools_at_end feature places tool definitions at the end of prompts and trims them from the KV cache after each completion. This prevents stale tool token accumulation across multi-turn conversations.

The game ports the C++ cache logic from qvac-lib-infer-llamacpp-llm into Python:

  • DynamicToolsState (tool boundary tracking)
  • removeLastNTokens (cache trimming)
  • applyContextDiscard (context sliding during generation)
  • CacheManager save/load (session persistence)
  • BOS tokens, generation prompt tokens, Qwen3 reasoning injection
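The boundary tracking and trimming described above can be illustrated with a toy cache. This is a minimal sketch, not the game's actual code: `ToyCache`, its method names, and the per-token labels are hypothetical, and BOS, generation prompt, and reasoning tokens are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCache:
    """Hypothetical stand-in for the KV cache: one label per cached token."""
    tokens: list = field(default_factory=list)
    n_past_before_tools: int = -1  # -1 means no tool definitions are appended

    @property
    def n_past(self):
        return len(self.tokens)

    def append(self, kind, count):
        self.tokens.extend([kind] * count)

    def append_tools(self, count):
        # Record the boundary before the tool definitions go in.
        self.n_past_before_tools = self.n_past
        self.append("tool_def", count)

    def remove_last_n_tokens(self, n):
        n = min(n, self.n_past)
        if n:  # guard: del tokens[-0:] would clear the whole list
            del self.tokens[-n:]

    def trim_after_generation(self):
        # Drop everything from the tool boundary onward (tools + generated).
        if self.n_past_before_tools >= 0:
            self.remove_last_n_tokens(self.n_past - self.n_past_before_tools)
            self.n_past_before_tools = -1


cache = ToyCache()
cache.append("system", 10)
cache.append("user", 15)
cache.append_tools(40)          # assumed size of the tool definitions
cache.append("generated", 20)
cache.trim_after_generation()
print(cache.n_past)  # 25
```

After the trim only the 25 system and user tokens remain, so no tool tokens carry over into the next turn.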

Three modes

Test mode (automated validation)

python3 dynamic-tools-game.py
python3 dynamic-tools-game.py test

Runs 21 test scenarios with 124 assertions covering:

  • Single/multi-turn with changing tools
  • Session save/reload/switch
  • Context sliding with tools
  • Qwen3 reasoning token injection
  • Stop with EOT token
  • Stateless (no-session) prompts
  • BOS and generation prompt token handling
  • Sliding that would eat into tool tokens (regression)

Play mode (interactive)

python3 dynamic-tools-game.py play

Configure context parameters at startup, then play turn by turn:

=== Dynamic Tools KV Cache Game ===
Configure context parameters (press Enter for defaults):

  Context size (n_ctx) [2048]:
  Tokens to discard on slide (n_discarded) [256]:
  BOS tokens [1]:
  Gen prompt tokens [3]:

Actions (single letter shortcuts):

  • u: add user message
  • a: add assistant message
  • s: add system message
  • t: add tool response (result from tool execution)
  • T: add Tool definitions (weather/search/email/calc/custom)
  • g: generate (eval + generate + trim)
  • S: save session
  • l: load/switch session
  • d: show detailed cache contents
  • r: reset state
  • q: quit

Add multiple tools at once with first-letter shortcuts:

Tools: w s e    # adds weather + search + email

Typical agentic flow

# Turn 1: user asks question
s 10    # system message, 10 tokens
u 15    # user message, 15 tokens
T w s   # add weather + search tools
g 20    # generate 20 tokens -> trims tools + generated

# Turn 2: tool response (model called a tool)
t 30    # tool response, 30 tokens
T w s   # same tools (model might call another)
g 15    # generate -> trim

# Turn 3: final answer (no tool call), then new user question
a 10    # assistant's final answer
u 20    # new user question
T e c   # new tools for new question
g 25    # generate -> trim
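To see what the trim buys, the three turns above can be traced by hand. The sketch below is a simplified illustration: it ignores BOS and generation prompt tokens, and the tool sizes in `TOOL_TOKENS` are made-up values, not the ones the game uses.

```python
# Hypothetical tool definition sizes in tokens; the real game defines its own.
TOOL_TOKENS = {"w": 40, "s": 35, "e": 30, "c": 25}


def run_turn(n_past, messages, tools, n_generated):
    """Return (peak cache size during the turn, cache size after the trim)."""
    kept = n_past + sum(messages)  # conversation tokens survive the trim
    peak = kept + sum(TOOL_TOKENS[t] for t in tools) + n_generated
    return peak, kept


turns = [
    ([10, 15], "ws", 20),  # turn 1: s 10, u 15, T w s, g 20
    ([30], "ws", 15),      # turn 2: t 30, T w s, g 15
    ([10, 20], "ec", 25),  # turn 3: a 10, u 20, T e c, g 25
]

n_past = 0
history = []
for messages, tools, n_gen in turns:
    peak, n_past = run_turn(n_past, messages, tools, n_gen)
    history.append((peak, n_past))
    print(f"peak={peak} after_trim={n_past}")
# peak=120 after_trim=25
# peak=145 after_trim=55
# peak=165 after_trim=85
```

Under these assumptions the cache grows only by the message tokens of each turn (25, 55, 85), while tool definitions and generated tokens are counted once at the peak and then dropped.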

After each generation, the game:

  • Shows a color-coded KV cache diagram
  • Reports nPast, firstMsgTokens, nPastBeforeTools, nSlides
  • Validates no stale tool tokens leaked into the cache
  • Tracks your score (turns without violations)
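The validation and scoring steps above might look roughly like the following. This is a sketch under assumptions: the token labels, the `ALLOWED` set, and `find_violations` are hypothetical names chosen for illustration.

```python
# Token kinds assumed to be legitimate after a trim (labels are hypothetical).
ALLOWED = {"bos", "system", "user", "assistant", "tool_response"}


def find_violations(cache_tokens):
    """Return (index, kind) for every token that should not survive a trim."""
    return [(i, k) for i, k in enumerate(cache_tokens) if k not in ALLOWED]


clean = ["bos"] + ["system"] * 3 + ["user"] * 2
leaky = clean + ["tool_def"] * 2  # simulates stale tool tokens left behind

score = 0
for cache in (clean, leaky):
    if not find_violations(cache):
        score += 1  # one point per turn without violations

print(score, find_violations(leaky))
# 1 [(6, 'tool_def'), (7, 'tool_def')]
```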

Auto mode (infinite stress test)

python3 dynamic-tools-game.py auto

Runs an infinite agentic loop built on three principles:

  1. Tools are always at the end, always trimmed after generation
  2. After tool call: add tool response + same tools
  3. After final answer: add assistant response + user message + new tools

Randomly samples token counts, tool sets, and tool-call vs final-answer decisions. Validates after every turn that the cache contains only expected tokens (system, user, assistant, tool responses). Runs until context overflow or a bug is found.
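A compact version of such a loop is sketched below. It is an illustration only: the token ranges, the 50/50 tool-call probability, the `max_turns` safety limit, and all names are assumptions, and context sliding is left out so the loop simply stops at overflow.

```python
import random


def auto_loop(n_ctx=2048, seed=42, max_turns=1000):
    """Run the agentic loop until overflow, a violation, or the turn limit."""
    rng = random.Random(seed)  # seeded so a failing run can be replayed
    tool_names = ["weather", "search", "email", "calc"]
    cache = ["system"] * rng.randint(5, 20) + ["user"] * rng.randint(5, 30)
    tools = rng.sample(tool_names, rng.randint(1, 3))

    for turn in range(1, max_turns + 1):
        n_tools = sum(rng.randint(20, 60) for _ in tools)
        n_gen = rng.randint(5, 40)
        if len(cache) + n_tools + n_gen > n_ctx:
            return turn - 1, "context overflow"

        # Principle 1: tools go at the end and are trimmed after generation.
        boundary = len(cache)
        cache += ["tool_def"] * n_tools + ["generated"] * n_gen
        del cache[boundary:]
        if any(k in ("tool_def", "generated") for k in cache):
            return turn, "bug"

        if rng.random() < 0.5:
            # Principle 2: tool call, so add the tool response, keep the tools.
            cache += ["tool_response"] * rng.randint(10, 50)
        else:
            # Principle 3: final answer, new user message, new tool set.
            cache += ["assistant"] * rng.randint(5, 30)
            cache += ["user"] * rng.randint(5, 30)
            tools = rng.sample(tool_names, rng.randint(1, 3))

    return max_turns, "turn limit"


turns, reason = auto_loop()
print(turns, reason)
```

Passing the same seed reproduces the same sequence of turns, which is what makes a randomly found violation debuggable.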

=== Dynamic Tools — Infinite Agentic Loop ===
Configure (press Enter for defaults):

  Context size (n_ctx) [2048]:
  Tokens to discard on slide (n_discarded) [256]:
  Random seed (0=random) [0]: 42

Bugs found by this game

The simulation found two real bugs in the C++ implementation:

  1. Context sliding during generation doesn't adjust nPastBeforeTools — when sliding shifts tokens down, the tool trim boundary becomes stale, leaving tool tokens in the cache after trim.

  2. Sliding can eat into tool tokens — when nDiscarded exceeds the number of conversation tokens between firstMsgTokens and the tool boundary, the discard removes tool tokens instead of just conversation tokens.

Both bugs have been fixed in the game and in the C++ code (TextLlmContext.cpp, MtmdLlmContext.cpp).
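One way to express both fixes in a single slide step is shown below. This is a hedged sketch of the idea, not the code from the game or from the C++ sources: the function name, its signature, and the clamping strategy are assumptions.

```python
def slide(tokens, first_msg_tokens, n_past_before_tools, n_discarded):
    """Discard conversation tokens that follow the protected prefix.

    Returns (new_tokens, new_n_past_before_tools).
    """
    # Bug 2: never discard past the tool boundary, so clamp the discard to
    # the conversation tokens between the prefix and the tools.
    discardable = max(0, n_past_before_tools - first_msg_tokens)
    n = min(n_discarded, discardable)
    new_tokens = tokens[:first_msg_tokens] + tokens[first_msg_tokens + n:]
    # Bug 1: the boundary moves down together with the shifted tokens.
    return new_tokens, n_past_before_tools - n


tokens = ["system"] * 4 + ["user"] * 6 + ["tool_def"] * 5 + ["generated"] * 3
tokens, boundary = slide(
    tokens, first_msg_tokens=4, n_past_before_tools=10, n_discarded=8
)
trimmed = tokens[:boundary]  # the trim after generation cuts at the boundary
print(boundary, trimmed)
# 4 ['system', 'system', 'system', 'system']
```

Here 8 tokens were requested but only the 6 user tokens are discardable, so all 5 tool tokens survive the slide, and the updated boundary of 4 lets the later trim remove every tool and generated token.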
