CPU/RAM Offload CPU+GPU inference #10557

Description

@ro-mak

Is your feature request related to a problem? Please describe.
Problem: there is no way to use all available memory (GPU VRAM + system RAM) at the same time.

Describe the solution you'd like
Hi, as far as I know, a model currently runs either entirely on the GPU or entirely on the CPU. Is there a plan to add mixed CPU+GPU inference? It would be especially useful for MoE models, for example.
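To make the request concrete, here is a minimal, hypothetical sketch of what mixed placement could look like. It assumes a model is just an ordered list of layers with known weight sizes and fills VRAM first, spilling the remaining layers to system RAM. `Layer`, `plan_offload`, and the budget values are illustrative names, not part of this project's API, and the sketch ignores activations, KV cache, and the cost of moving data between devices.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    # Hypothetical layer description: only a name and its weight size.
    name: str
    size_bytes: int


def plan_offload(layers, vram_budget, ram_budget):
    """Assign each layer to 'gpu' or 'cpu', trying VRAM first.

    Illustrative sketch only: real runtimes also need memory for
    activations, KV cache and framework overhead, which this ignores.
    """
    placement = {}
    vram_used = 0
    ram_used = 0
    for layer in layers:
        if vram_used + layer.size_bytes <= vram_budget:
            # Layer still fits in the remaining VRAM budget.
            placement[layer.name] = "gpu"
            vram_used += layer.size_bytes
        elif ram_used + layer.size_bytes <= ram_budget:
            # VRAM is full for this layer, so offload it to system RAM.
            placement[layer.name] = "cpu"
            ram_used += layer.size_bytes
        else:
            raise MemoryError(f"{layer.name} fits in neither VRAM nor RAM")
    return placement


GiB = 1024 ** 3
layers = [Layer(f"layer{i}", 2 * GiB) for i in range(6)]
plan = plan_offload(layers, vram_budget=8 * GiB, ram_budget=16 * GiB)
print(plan)
```

With an 8 GiB VRAM budget, the first four 2 GiB layers land on the GPU and the last two are offloaded to the CPU, so the model as a whole uses both memory pools instead of being limited to one.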

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests

    Issue actions