V2 update plan #2

@PicoCreator

The latest version of https://github.com/saharNooby/rwkv.cpp has a new quantization format (breaking change?) and GPU offload (!!!)
Since these are potentially breaking changes, it's going to be a v2 update.

  • update to the newer version, which has a breaking change in the model format (might be backwards compatible)
  • (to confirm, if not backwards compatible) create a new set of ".bin" files for the new version
  • make changes to the API to add support for GPU offload (it's a param in the new version: how many layers you want to offload to the GPU)
  • for input inference, update to the new batch mode API (10x faster)
  • (stretch goal) change to an async API
  • support for the world model / world tokenizer (we can detect this using the token count)
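
A rough sketch of two of the items above: picking the tokenizer from the model's token count, and exposing the GPU offload layer count as an option. It is written in Python only for illustration. Every name here (`detect_tokenizer`, `ModelOptions`, `resolve_gpu_layers`) is hypothetical and not part of rwkv.cpp or this project. The vocabulary sizes are an assumption (50277 for the older 20B-tokenizer models, 65536 for world models) and should be checked against the actual model files.

```python
from dataclasses import dataclass

# Assumed vocabulary sizes; verify against the real model headers.
PILE_VOCAB_SIZE = 50277   # older models using the 20B tokenizer (assumption)
WORLD_VOCAB_SIZE = 65536  # world models using the world tokenizer (assumption)


def detect_tokenizer(n_vocab: int) -> str:
    """Guess which tokenizer a model needs from its token count."""
    if n_vocab == WORLD_VOCAB_SIZE:
        return "world"
    if n_vocab == PILE_VOCAB_SIZE:
        return "20B"
    # Fail loudly instead of silently picking the wrong tokenizer.
    raise ValueError(f"Unrecognized token count: {n_vocab}")


@dataclass
class ModelOptions:
    """Hypothetical v2 options object; field names are illustrative only."""
    path: str
    gpu_offload_layers: int = 0  # 0 keeps every layer on the CPU


def resolve_gpu_layers(requested: int, n_layers: int) -> int:
    """Clamp the requested offload count to the range [0, n_layers]."""
    return max(0, min(requested, n_layers))
```

Defaulting `gpu_offload_layers` to 0 would keep existing CPU-only callers working unchanged, and clamping means a caller can pass a large number to mean "offload everything" without knowing the layer count up front.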
