
Phi-3 support not propagated from quant.h to libturboquant (quant-server broken) #67

@unamedkr

Description

PR #65 added full Phi-3/Phi-3.5 support to quant.h (single-header), but the changes were not propagated to the split source files used by libturboquant and quant-server. As a result:

  • tools/phi3_infer_test.c (uses quant.h) → works perfectly
  • quant-server (uses libturboquant) → still broken

Evidence

quant.h (single-header):

tq_load_gguf: loaded 32 layers (32 self_attn)   ← correct

Output: "Gravity is a fundamental force that attracts two bodies towards each other..."

quant-server (libturboquant):

tq_load_gguf: loaded 32 layers (0 self_attn)    ← still broken

Output: garbage tokens
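The `(0 self_attn)` count fits a loader that only recognizes separate Q/K/V projection tensors. Such a loader would never classify Phi-3's fused layers as attention. Below is a minimal sketch of the detection that needs porting. It assumes llama.cpp-style GGUF tensor names: `blk.N.attn_q.weight` for separate projections and `blk.N.attn_qkv.weight` for the fused Phi-3 projection. `classify_attn`, `has_tensor`, and the flat name list are hypothetical and are not libturboquant's actual API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-layer attention layout, as seen by the loader. */
typedef enum { ATTN_NONE = 0, ATTN_SEPARATE_QKV, ATTN_FUSED_QKV } attn_kind;

/* Returns 1 if `name` is present in the list of GGUF tensor names. */
static int has_tensor(const char *const *names, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(names[i], name) == 0) return 1;
    return 0;
}

/* Classify layer `layer`, checking for the fused Phi-3 tensor first.
 * Tensor names assume llama.cpp's GGUF naming convention (an assumption). */
static attn_kind classify_attn(const char *const *names, size_t n, int layer) {
    char buf[64];
    assert(layer >= 0);
    snprintf(buf, sizeof buf, "blk.%d.attn_qkv.weight", layer);
    if (has_tensor(names, n, buf)) return ATTN_FUSED_QKV;
    snprintf(buf, sizeof buf, "blk.%d.attn_q.weight", layer);
    if (has_tensor(names, n, buf)) return ATTN_SEPARATE_QKV;
    return ATTN_NONE; /* a Phi-3 layer lands here if only attn_q is checked */
}
```

The check order is deliberate. A loader that looks only for `attn_q` would report zero attention layers for every Phi-3 block, which matches the log line above.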

Files that need Phi-3 changes ported

The following changes from quant.h need to be mirrored in the split sources:

| Feature (already in quant.h)  | Needs porting to            |
|-------------------------------|-----------------------------|
| Fused attn_qkv detection      | src/engine/tq_gguf.c        |
| Fused ffn_up_gate detection   | src/engine/tq_gguf.c        |
| LongRoPE factor loading       | src/engine/tq_gguf.c        |
| Fused QKV matmul + split      | src/engine/tq_transformer.c |
| Fused gate\|\|up FFN          | src/engine/tq_transformer.c |
| NeoX-style RoPE rotation      | src/engine/tq_transformer.c |
| Phi-3 BOS token handling      | src/engine/tq_generate.c    |
| Layer dispatch for gguf_w_qkv | src/engine/tq_transformer.c |

Impact

  • quantcpp serve phi3.5:mini starts the server, but inference produces garbage tokens
  • Users who follow the README to serve Phi-3.5 will get broken output
  • The Python Model class loads libturboquant via ctypes, so it is affected as well

Workaround

Compile a shared library from quant.h directly and use a Python wrapper server:

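# Build a shared library directly from the single header.
# Uses a bash/zsh here-string; the output is .dylib on macOS (use .so on Linux).
cc -O2 -shared -fPIC -o libquant_phi3.dylib -x c - -lm -lpthread <<< '#define QUANT_IMPLEMENTATION
#include "quant.h"'
# Start the Python wrapper server on port 8080.
python3 phi35_server.py 8080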

This workaround is functional (tested: 8 tok/s, coherent output, streaming works).

Suggested Fix

Sync the Phi-3 changes from quant.h into the split source tree. Consider adding a CI check that validates that quant.h and src/engine/*.c produce identical inference output for all supported architectures.
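Two builds compiled from different source trees may differ in the last few bits of floating-point output. A parity check is therefore probably more robust if it compares per-step logits within a tolerance and requires the same greedy token, which keeps the generated text identical. Below is a minimal sketch of such a comparison. How CI obtains logits from each build is left open, and `logits_match` is a hypothetical helper.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Index of the largest logit, i.e. the greedy next token. */
static size_t argmax(const float *v, size_t n) {
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (v[i] > v[best]) best = i;
    return best;
}

/* Returns 1 if both builds pick the same greedy token and every logit
 * agrees within `tol` (absolute). Written as !(diff <= tol) so that a NaN
 * in either vector counts as a mismatch. */
static int logits_match(const float *a, const float *b, size_t n, float tol) {
    if (argmax(a, n) != argmax(b, n)) return 0;
    for (size_t i = 0; i < n; i++)
        if (!(fabsf(a[i] - b[i]) <= tol)) return 0;
    return 1;
}
```

Running this check on each decode step of a short fixed prompt, once per supported architecture, would likely flag a regression like this one at the first generated token.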

Environment


Reported by ClawTeam — verified via Claw-4 (Optimizer) retest

Metadata

Labels: bug (Something isn't working)
