In the LLM inference example, the encoding speed is abnormally slow. On v10:
[Info] encoding length: 28, decoding length: 183, encoding speed: 33.8989 tokens/s, decoding speed: 31.8915 tokens/s
context length: 211/8192 tokens
On v8, however (although v8's output is not quite correct):
[Info] encoding length: 27, decoding length: 222, encoding speed: 52.5056 tokens/s, decoding speed: 23.3753 tokens/s
context length: 249/8192 tokens
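To make the comparison concrete, here is a minimal sketch that parses the two `[Info]` lines above and compares the encoding and decoding speeds. The `parse_info` helper and its regular expression are hypothetical, written only against the log format shown here, and are not part of the library.

```python
import re

# Hypothetical pattern, derived only from the [Info] lines quoted above.
LOG_PATTERN = re.compile(
    r"encoding length: (\d+), decoding length: (\d+), "
    r"encoding speed: ([\d.]+) tokens/s, decoding speed: ([\d.]+) tokens/s"
)


def parse_info(line):
    """Extract lengths and speeds from one [Info] log line."""
    m = LOG_PATTERN.search(line)
    if m is None:
        raise ValueError("unrecognized log line")
    enc_len, dec_len = int(m.group(1)), int(m.group(2))
    enc_speed, dec_speed = float(m.group(3)), float(m.group(4))
    return {
        "enc_len": enc_len,
        "dec_len": dec_len,
        "enc_speed": enc_speed,
        "dec_speed": dec_speed,
        # Wall-clock time implied by length / speed, in seconds.
        "enc_time_s": enc_len / enc_speed,
        "dec_time_s": dec_len / dec_speed,
    }


v10 = parse_info(
    "[Info] encoding length: 28, decoding length: 183, "
    "encoding speed: 33.8989 tokens/s, decoding speed: 31.8915 tokens/s"
)
v8 = parse_info(
    "[Info] encoding length: 27, decoding length: 222, "
    "encoding speed: 52.5056 tokens/s, decoding speed: 23.3753 tokens/s"
)

# Ratios below 1.0 mean v10 is slower than v8 for that phase.
enc_ratio = v10["enc_speed"] / v8["enc_speed"]
dec_ratio = v10["dec_speed"] / v8["dec_speed"]

print(f"encode: v10/v8 = {enc_ratio:.3f}")
print(f"decode: v10/v8 = {dec_ratio:.3f}")
```

With the numbers above, v10 encodes at roughly two thirds of v8's speed while decoding faster. Since both runs encode fewer than 30 tokens, a single measurement may be noisy, so repeating the run with a longer prompt would give a more reliable comparison.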