Memory usage and slowness question #16

With a CPU build, I see this issue both here and in the llama.cpp version.


GGML is built for edge AI, that is, for resource-constrained devices. But both from the CLI and from Python, even for a short text of about 64 tokens, the code runs **very slowly** when the available RAM is 10 GB or less, yet runs really fast with more RAM, around 15-20 GB. I used 6 threads. With fewer threads, things slow down again.

Something is wrong. Have you encountered this?
