Open
Description
The latest version of https://github.com/saharNooby/rwkv.cpp has a new quantization format (breaking change?) and GPU offload (!!!)
Since these are potentially breaking changes, this is going to be a v2 update.
- update to the newer version, which has a breaking change to the model format (it might still be backwards compatible)
- (to confirm, if not backwards compatible) create a new set of ".bin" files for the new version
- change the API to support GPU offload (the new version takes a param for how many layers you want to offload to the GPU)
- for input inference, switch to the new batch mode API (10x faster)
- (stretch goal) change to an async API
- support the world model / world tokenizer (we can detect this using the token count)
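
A rough sketch of how the tokenizer detection and the GPU offload param from the list above could fit together. Everything here is hypothetical: `LoadOptions`, `detect_tokenizer` and `clamp_offload_layers` are illustrative names, not part of rwkv.cpp or of this binding, and the vocab sizes (50277 for the older GPT-NeoX 20B tokenizer models, 65536 for world models) are assumptions that should be checked against the actual model files.

```python
from dataclasses import dataclass

# Assumed vocabulary sizes; verify against the model files actually in use.
NEOX_VOCAB_SIZE = 50277   # older models using the GPT-NeoX 20B tokenizer
WORLD_VOCAB_SIZE = 65536  # world models using the world tokenizer


@dataclass
class LoadOptions:
    """Hypothetical v2 load options; field names are illustrative only."""
    model_path: str
    gpu_offload_layers: int = 0  # 0 keeps every layer on the CPU


def detect_tokenizer(vocab_size: int) -> str:
    """Pick a tokenizer name from the model's token count."""
    if vocab_size == WORLD_VOCAB_SIZE:
        return "world"
    if vocab_size == NEOX_VOCAB_SIZE:
        return "neox"
    # Fail loudly rather than guess for an unknown token count.
    raise ValueError(f"unrecognized vocab size: {vocab_size}")


def clamp_offload_layers(requested: int, model_layers: int) -> int:
    """Keep the requested GPU layer count within [0, model_layers]."""
    return max(0, min(requested, model_layers))
```

Raising on an unknown token count is a deliberate choice in this sketch: silently falling back to one tokenizer would produce garbage output that is hard to trace back to the wrong tokenizer.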