
Non-deterministic output at temperature=0 — possible memory corruption #62

@unamedkr

Description

Sending the identical prompt twice to the same server with temperature=0.0 produces different outputs. The second response often contains corrupted text (Cyrillic characters, garbled tokens), suggesting uninitialized memory or state corruption between requests.

Steps to Reproduce

# Start server
./build-metal/quant-server SmolLM2-1.7B-Instruct-Q8_0.gguf -p 8080

# First request
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is 2+2?"}],"temperature":0.0,"max_tokens":30}'
# Response 1: "2+2 is equal to 4." (coherent)

# Second request (identical)
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is 2+2?"}],"temperature":0.0,"max_tokens":30}'
# Response 2: "2+2 = 4\nОтвет: 4" (Cyrillic corruption)
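
To make the comparison repeatable, the two manual requests above can be wrapped in a small script. This is a minimal sketch, not part of quant.cpp: `check_determinism` and `send_request` are hypothetical helper names, and extracting `.choices[0].message.content` with `jq` assumes the server returns an OpenAI-style chat completion body. Comparing only the message text avoids false mismatches from per-request fields such as an id or timestamp, if the server emits any.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of quant.cpp.

# Run a command N times and check that every output equals the first one.
# Returns 0 if all outputs match, 1 on the first mismatch, 2 if the command fails.
check_determinism() {
  local runs first current i
  runs="$1"
  shift
  first="$("$@")" || return 2
  i=2
  while [ "$i" -le "$runs" ]; do
    current="$("$@")" || return 2
    if [ "$current" != "$first" ]; then
      printf 'run %s differs from run 1\n' "$i" >&2
      printf '%s\n' '--- run 1 ---' "$first" "--- run $i ---" "$current" >&2
      return 1
    fi
    i=$((i + 1))
  done
  printf 'all %s runs identical\n' "$runs"
}

# Send the request from the report and print only the generated text.
# Assumption: the response body follows the OpenAI chat completion shape.
send_request() {
  local body
  body="$(curl -sS --fail http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages":[{"role":"user","content":"What is 2+2?"}],"temperature":0.0,"max_tokens":30}')" || return 2
  printf '%s' "$body" | jq -r '.choices[0].message.content'
}
```

With the server from the first step running, `check_determinism 5 send_request` prints `all 5 runs identical` on success, or reports the first diverging run on stderr and returns 1.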

Expected Behavior

With temperature=0.0 (greedy decoding), identical inputs must produce identical outputs every time.

Impact

  • Severity: P0 — Breaks reproducibility, a fundamental requirement for testing and production
  • Suggests memory corruption or uninitialized state in the KV cache between requests
  • May be related to the KV cache reuse feature (chat-mode optimization)

Root Cause Hypothesis

The KV cache from the previous request may not be fully cleared or reset before the next request is processed. If the cache reuse logic incorrectly detects a "match", or leaves stale entries in place, the attention computation reads values that do not belong to the current request, and the output can differ from that of a fresh run.

Suggested Investigation

  1. Check if the KV cache is properly reset between unrelated requests
  2. Verify that memset/zero-initialization happens on all state buffers
  3. Test with KV cache reuse disabled to isolate the issue
  4. Run under AddressSanitizer (-fsanitize=address) to detect memory issues
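
Step 4 can be sketched with standard CMake variables. This assumes the project's CMakeLists.txt does not override `CMAKE_C_FLAGS` or `CMAKE_CXX_FLAGS`; the `build-asan` directory, the helper function names, and the resulting binary path are choices made for this sketch, while `-DTQ_BUILD_METAL=ON` and the server arguments are taken from the report. AddressSanitizer instruments CPU code only and reports out-of-bounds and use-after-free accesses. It does not flag reads of uninitialized memory, and MemorySanitizer is not available on macOS, so a clean ASan run would not rule out the uninitialized-state hypothesis.

```shell
# Sketch of an AddressSanitizer build using standard CMake variables.
ASAN_FLAGS="-fsanitize=address -fno-omit-frame-pointer -g"

# Configure and build into a separate directory.
# "build-asan" is an arbitrary name chosen for this sketch.
build_asan() {
  cmake -S . -B build-asan \
    -DCMAKE_BUILD_TYPE=Debug \
    -DTQ_BUILD_METAL=ON \
    -DCMAKE_C_FLAGS="$ASAN_FLAGS" \
    -DCMAKE_CXX_FLAGS="$ASAN_FLAGS" \
    -DCMAKE_EXE_LINKER_FLAGS="-fsanitize=address" &&
    cmake --build build-asan
}

# Start the instrumented server with the same model and port as in the report.
# Assumption: the binary lands at build-asan/quant-server.
run_asan_server() {
  ./build-asan/quant-server SmolLM2-1.7B-Instruct-Q8_0.gguf -p 8080
}
```

Running `build_asan` from the repository root, then `run_asan_server`, and repeating the two requests from the reproduction steps should surface any heap error ASan can see as a report on the server's stderr.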

Environment

  • quant.cpp: latest main (49c6605)
  • Model: SmolLM2-1.7B-Instruct-Q8_0.gguf
  • Build: cmake -DTQ_BUILD_METAL=ON
  • OS: macOS 15 (Apple M3)

Reported by ClawTeam Claw-5 (Researcher persona)

Metadata

Labels

bug (Something isn't working)
