V2 update plan #2

@PicoCreator

The latest version of https://github.com/saharNooby/rwkv.cpp has a new quantization format (breaking change?) and GPU offload (!!!)
Since these are potentially breaking changes, it's going to be a v2 update.

  • update to the newer version, which has a breaking change to the model format (might be backwards compatible)
  • (to confirm, if not backwards compatible) create a new set of ".bin" files for the new version
  • change the API to add support for GPU offload (it's a param in the new version: the number of layers to offload to the GPU)
  • for input inference, update to the new batch mode API (10x faster)
  • (stretch goal) change to an async API
  • support for the world model / world tokenizer (we can detect this using the token count)
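
A rough sketch of three of the items above, in Python purely for illustration: picking the tokenizer from the token count, clamping the GPU offload layer param, and splitting prompt tokens into batches for a batch mode call. Every name here (`detect_tokenizer`, `clamp_gpu_layers`, `chunk_tokens`) is hypothetical and not part of rwkv.cpp or this project. The vocabulary sizes (50277 for the older GPT-NeoX 20B tokenizer, 65536 for the world tokenizer) are assumptions that should be checked against the actual model files.

```python
from typing import Iterator, List, Sequence

# Assumed vocabulary sizes (verify against the model headers before relying on them):
# 50277 for models using the GPT-NeoX 20B tokenizer, 65536 for world models.
NEOX_VOCAB_SIZE = 50277
WORLD_VOCAB_SIZE = 65536


def detect_tokenizer(n_vocab: int) -> str:
    """Pick a tokenizer name from the model's token count (hypothetical helper)."""
    if n_vocab == WORLD_VOCAB_SIZE:
        return "world"
    if n_vocab == NEOX_VOCAB_SIZE:
        return "neox-20b"
    # Fail loudly rather than guess when the token count is unfamiliar.
    raise ValueError(f"unrecognized vocabulary size: {n_vocab}")


def clamp_gpu_layers(requested: int, n_layer: int) -> int:
    """Clamp the requested number of GPU-offloaded layers to the range [0, n_layer]."""
    return max(0, min(requested, n_layer))


def chunk_tokens(tokens: Sequence[int], batch_size: int) -> Iterator[List[int]]:
    """Split prompt tokens into fixed-size batches; the last batch may be shorter."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(tokens), batch_size):
        yield list(tokens[start:start + batch_size])
```

Each batch from `chunk_tokens` would be passed to whatever batch eval call the new version exposes, carrying the model state from one batch to the next. Raising an error on an unknown token count is a design choice here, so that a new model family does not silently get the wrong tokenizer.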
