Misc. bug: Repeated "No parser definition detected" messages in console output #20310

@EverchangerL

Name and Version

llama-server; version: 1 (23fbfcb); built with MSVC 19.50.35725.0 for x64

Operating systems

Windows

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server.exe --model Crimson-Constellation-12B-IQ3_XXS.gguf --host 127.0.0.1 --port 5001 --ctx-size 16384 --gpu-layers 99 --cache-type-k q4_0 --cache-type-v q4_0 --no-webui --no-jinja --parallel 1 --kv-unified --backend-sampling --direct-io

Problem description & steps to reproduce

I am using llama-server with SillyTavern as the frontend via the Text Completion API. Tool calling is not used in this setup. After a recent commit, the console started printing the following message continuously: "No parser definition detected, assuming pure content parser."

The message appears once per generated token, and the copies run together with no newline between them (see the log below), so the console is flooded.
This does not appear to affect text generation or model output, but it clutters the console logs significantly.
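
Until this is fixed, one possible stopgap is to filter the notice out of captured console output. The following is only a minimal sketch: `strip_parser_notice` is a hypothetical helper, not part of llama.cpp, and it assumes the notice text is exactly what appears in the log in this report. The `\s*` in the pattern is there only to tolerate the "purecontent" variant that shows up where the pasted console output was line-wrapped.

```python
import re

# Assumption: the notice text is copied verbatim from the log in this report.
# \s* tolerates the wrap damage seen in pasted console output ("purecontent").
NOTICE = re.compile(
    r"No parser definition detected, assuming pure\s*content parser\."
)


def strip_parser_notice(text: str) -> str:
    """Return `text` with every copy of the notice removed."""
    return NOTICE.sub("", text)


sample = (
    "No parser definition detected, assuming pure content parser."
    "srv  log_server_r: done request: POST /completion 127.0.0.1 200"
)
print(strip_parser_notice(sample))
```

This only cleans up text that has already been captured (for example, output redirected to a file); it does not change what llama-server itself prints.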

First Bad Commit

I'm not sure, but it is probably related to #18675.

Relevant log output

ggml_cuda_init: found 1 CUDA devices (Total VRAM: 6143 MiB):
  Device 0: NVIDIA GeForce RTX 2060, compute capability 7.5, VMM: yes, VRAM: 6143 MiB (5105 MiB free)
build: 1 (23fbfcb) with MSVC 19.50.35725.0 for x64
system info: n_threads = 4, n_threads_batch = 4, total_threads = 8

system_info: n_threads = 4 (n_threads_batch = 4) / 8 | CUDA : ARCHS = 750 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |

init: using 7 threads for HTTP server
Web UI is disabled
start: binding port with default address family
main: loading model
srv    load_model: loading model 'Crimson-Constellation-12B-IQ3_XXS.gguf'
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
llama_params_fit_impl: projected to use 5457 MiB of device memory vs. 5105 MiB of free device memory
llama_params_fit_impl: cannot meet free memory target of 1024 MiB, need to reduce device memory by 1376 MiB
llama_params_fit_impl: context size set by user to 16384 -> no change
llama_params_fit: failed to fit params to free device memory: n_gpu_layers already set by user to 99, abort
llama_params_fit: fitting params to free memory took 0.49 seconds
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 2060) (0000:01:00.0) - 5105 MiB free
llama_model_loader: direct I/O is enabled, disabling mmap
llama_model_loader: loaded meta data with 36 key-value pairs and 363 tensors from Crimson-Constellation-12B-IQ3_XXS.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Crimson Constellation 12B
llama_model_loader: - kv   3:                           general.basename str              = Crimson-Constellation
llama_model_loader: - kv   4:                         general.size_label str              = 12B
llama_model_loader: - kv   5:                          llama.block_count u32              = 40
llama_model_loader: - kv   6:                       llama.context_length u32              = 131072
llama_model_loader: - kv   7:                     llama.embedding_length u32              = 5120
llama_model_loader: - kv   8:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   9:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv  10:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  11:                       llama.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  12:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  13:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  14:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  15:                           llama.vocab_size u32              = 131075
llama_model_loader: - kv  16:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  17:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  18:                         tokenizer.ggml.pre str              = tekken
llama_model_loader: - kv  19:                      tokenizer.ggml.tokens arr[str,131075]  = ["<unk>", "<s>", "</s>", "[INST]", "[...
llama_model_loader: - kv  20:                  tokenizer.ggml.token_type arr[i32,131075]  = [3, 3, 3, 3,3, 3, 3, 3, 3, 3, 3, 3, ...
llama_model_loader: - kv  21:                      tokenizer.ggml.merges arr[str,269443]  = ["─а ─а", "─а t", "e r", "i n", "─а ─...
llama_model_loader: - kv  22:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  23:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  24:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  25:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  26:               tokenizer.ggml.add_sep_token bool             = false
llama_model_loader: - kv  27:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  28:                    tokenizer.chat_template str              = {{ bos_token}}{% if messages[0]['rol...
llama_model_loader: - kv  29:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  30:               general.quantization_version u32              = 2
llama_model_loader: - kv  31:                          general.file_type u32              = 23
llama_model_loader: - kv  32:                      quantize.imatrix.file str              = F:\LLM\1\imatrix.gguf
llama_model_loader: - kv  33:                   quantize.imatrix.dataset str              = F:\LLM\Stuff\Soft\_imatrix.hybrid.txt
llama_model_loader: - kv  34:             quantize.imatrix.entries_count u32              = 280
llama_model_loader: - kv  35:              quantize.imatrix.chunks_count u32              = 14
llama_model_loader: - type  f32:   81 tensors
llama_model_loader: - type q4_K:   40 tensors
llama_model_loader: - type q5_K:    1 tensors
llama_model_loader: - type iq3_xxs:  120 tensors
llama_model_loader: - type iq3_s:   41 tensors
llama_model_loader: - type iq2_s:   80 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = IQ3_XXS - 3.0625 bpw
print_info: file size   = 4.60 GiB (3.23 BPW)
load: 0 unused tokens
load: printing all EOG tokens:
load:   - 2 ('</s>')
load:   - 131073 ('<|im_end|>')
load: special tokens cache size = 1003
load: token to piece cache size = 0.8499 MB
print_info: arch                  = llama
print_info: vocab_only            = 0
print_info: no_alloc              = 0
print_info: n_ctx_train           = 131072
print_info: n_embd                = 5120
print_info: n_embd_inp            = 5120
print_info: n_layer               = 40
print_info: n_head                = 32
print_info: n_head_kv             = 8
print_info: n_rot                 = 128
print_info: n_swa                 = 0
print_info: is_swa_any            = 0
print_info: n_embd_head_k         = 128
print_info: n_embd_head_v         = 128
print_info: n_gqa                 = 4
print_info: n_embd_k_gqa          = 1024
print_info: n_embd_v_gqa          = 1024
print_info: f_norm_eps            = 0.0e+00
print_info: f_norm_rms_eps        = 1.0e-05
print_info: f_clamp_kqv           = 0.0e+00
print_info: f_max_alibi_bias      = 0.0e+00
print_info: f_logit_scale         = 0.0e+00
print_info: f_attn_scale          = 0.0e+00
print_info: n_ff                  = 14336
print_info: n_expert              = 0
print_info: n_expert_used         = 0
print_info: n_expert_groups       = 0
print_info: n_group_used          = 0
print_info: causal attn           = 1
print_info: pooling type          = 0
print_info: rope type             = 0
print_info: rope scaling          = linear
print_info: freq_base_train       = 1000000.0
print_info: freq_scale_train      = 1
print_info: n_ctx_orig_yarn       = 131072
print_info: rope_yarn_log_mul     = 0.0000
print_info: rope_finetuned        = unknown
print_info: model type            = 13B
print_info: model params          = 12.25 B
print_info: general.name          = Crimson Constellation 12B
print_info: vocab type            = BPE
print_info: n_vocab               = 131075
print_info: n_merges              = 269443
print_info: BOS token             = 1 '<s>'
print_info: EOS token             = 2 '</s>'
print_info: EOT token             = 131073 '<|im_end|>'
print_info: UNK token             = 0 '<unk>'
print_info: LF token              = 1010 '─К'
print_info: EOG token             = 2 '</s>'
print_info: EOG token             = 131073 '<|im_end|>'
print_info: max token length      = 150
load_tensors: loading model tensors, this can take a while... (mmap = false, direct_io = true)
load_tensors: offloading output layer to GPU
load_tensors: offloading 39 repeating layers to GPU
load_tensors: offloaded 41/41 layers to GPU
load_tensors:          CPU model buffer size =   275.01 MiB
load_tensors:        CUDA0 model buffer size =  4433.78 MiB
.......................................................................................
common_init_result: added </s> logit bias = -inf
common_init_result: added <|im_end|> logit bias = -inf
llama_context: constructing llama_context
llama_context: setting backend sampler for seq_id 0 (n = 10)
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 16384
llama_context: n_ctx_seq     = 16384
llama_context: n_batch       = 2048
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = auto
llama_context: kv_unified    = true
llama_context: freq_base     = 1000000.0
llama_context: freq_scale    = 1
llama_context: n_ctx_seq (16384) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_context:  CUDA_Host  output buffer size =     2.00 MiB
llama_kv_cache:      CUDA0 KV buffer size =   720.00 MiB
llama_kv_cache: size =  720.00 MiB ( 16384 cells,  40 layers,  1/1 seqs), K (q4_0):  360.00 MiB, V (q4_0):  360.00 MiB
sched_reserve: reserving ...
sched_reserve: Flash Attention was auto, set to enabled
sched_reserve:      CUDA0 compute buffer size =   564.52 MiB
sched_reserve:  CUDA_Host compute buffer size =    52.01 MiB
sched_reserve: graph nodes  = 1291
sched_reserve: graph splits = 2
sched_reserve: reserve took 13.04 ms, sched copies = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
srv    load_model: initializing slots, n_slots = 1
no implementations specified for speculative decoding
slot   load_model: id  0 | task -1 | speculative decoding context not initialized
slot   load_model: id  0 | task -1 | new slot, n_ctx = 16384
srv    load_model: prompt cache is enabled, size limit: 8192 MiB
srv    load_model: use `--cache-ram 0` to disable the prompt cache
srv    load_model: for more info see https://github.com/ggml-org/llama.cpp/pull/16391
init: chat template, example_format: '[INST] You are a helpful assistant
Hello [/INST]Hi there</s>[INST] How are you? [/INST]'
srv          init: init: chat template, thinking = 0
main: model loaded
main: server is listening on http://127.0.0.1:5001
main: starting the main loop...
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /tokenize 127.0.0.1 200
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> +logit-bias -> ?penalties -> ?dry -> top-n-sigma -> +top-k -> ?typical -> ?top-p -> +min-p -> ?xtc -> +temp-ext -> adaptive-p
slot launch_slot_: id  0 | task 0 | processing task, is_child = 0
slot update_slots: id  0 | task 0 | new prompt, n_ctx_slot = 16384, n_keep = 0, task.n_tokens = 1558
slot update_slots: id  0 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
slot init_sampler: id  0 | task 0 | init sampler, took 0.34 ms, tokens: text = 1558, total = 1558
slot update_slots: id  0 | task 0 | prompt processing done, n_tokens = 1558, batch.n_tokens = 1558
sched_reserve: reserving ...
sched_reserve:      CUDA0 compute buffer size =   564.52 MiB
sched_reserve:  CUDA_Host compute buffer size =    52.01 MiB
sched_reserve: graph nodes  = 1254
sched_reserve: graph splits = 2
sched_reserve: reserve took 62.02 ms, sched copies = 1
No parser definition detected, assuming pure content parser.srv  log_server_r: done request: POST /completion 127.0.0.1 200
No parser definition detected, assuming pure content parser.No parser definition detected, assuming pure content parser.No parser definition detected, assuming pure content parser.[... same message repeated back to back for the remainder of the generation ...]No parser definition detected, assuming pure content parser.slot print_timing: id  0 | task 0 |
prompt eval time =    2657.51 ms /  1558 tokens (    1.71 ms per token,   586.26 tokens per second)
       eval time =   15991.44 ms /   200 tokens (   79.96 ms per token,    12.51 tokens per second)
      total time =   18648.95 ms /  1758 tokens
No parser definition detected, assuming pure content parser.slot      release: id  0 | task 0 | stop processing: n_tokens = 1757, truncated = 0
srv  update_slots: all slots are idle
No parser definition detected, assuming pure content parser.
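
To check the "once per generated token" observation against a captured log, the copies of the notice can be counted and compared with the token count on the `eval time` line. This is a rough sketch under stated assumptions: `count_notices` and `generated_tokens` are hypothetical helpers, the log is assumed to be available as a single string, and the `eval time` line is assumed to have the layout shown in the log above.

```python
import re

# Assumption: notice text as it appears in the log above; \s* tolerates wrap damage.
NOTICE = re.compile(
    r"No parser definition detected, assuming pure\s*content parser\."
)
# Matches the generation line only; the lookbehind skips "prompt eval time".
EVAL = re.compile(r"(?<!prompt )eval time\s*=\s*[\d.]+ ms /\s*(\d+) tokens")


def count_notices(log: str) -> int:
    """Number of copies of the notice found in the log text."""
    return len(NOTICE.findall(log))


def generated_tokens(log: str):
    """Token count from the 'eval time' line, or None if it is absent."""
    match = EVAL.search(log)
    return int(match.group(1)) if match else None


# Tiny synthetic log for illustration (three notices, not a real capture).
demo_log = (
    "No parser definition detected, assuming pure content parser." * 3
    + "slot print_timing: id  0 | task 0 |\n"
    + "prompt eval time =    2657.51 ms /  1558 tokens\n"
    + "       eval time =   15991.44 ms /   200 tokens\n"
)
print(count_notices(demo_log), generated_tokens(demo_log))
```

On a real capture, a notice count close to the generated token count would be consistent with the message being emitted on every token rather than once per request.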
