PagedAttention is a widely used method for LLM serving. It splits the KV cache of a request into multiple blocks, and each block contains multiple slots (tokens). I suspect PagedAttention might hinder StreamingLLM, since we cannot evict individual slots within a block. So I want to know: how does SwiftInfer integrate with PagedAttention?
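To make the concern concrete, here is a small illustrative Python sketch (not taken from vLLM or SwiftInfer; the block size, sink count, and window length are made-up values). It shows that StreamingLLM's keep set (attention sinks + recent window) rarely aligns with block boundaries, so some blocks end up only partially kept and their slots cannot simply be freed:

```python
from collections import defaultdict

# Hypothetical values for illustration only.
BLOCK_SIZE = 16   # slots (tokens) per KV-cache block

def block_table(num_tokens: int, block_size: int = BLOCK_SIZE):
    """Map each token position to (block_id, slot_in_block), PagedAttention-style."""
    return [(t // block_size, t % block_size) for t in range(num_tokens)]

def streaming_llm_keep(num_tokens: int, num_sinks: int = 4, window: int = 28):
    """Token positions StreamingLLM would keep: the first few sinks + a recent window."""
    sinks = list(range(min(num_sinks, num_tokens)))
    recent = list(range(max(num_sinks, num_tokens - window), num_tokens))
    return set(sinks + recent)

if __name__ == "__main__":
    n = 100
    table = block_table(n)
    keep = streaming_llm_keep(n)

    # Count how many kept slots fall into each block.
    kept_per_block = defaultdict(int)
    for t in keep:
        blk, _ = table[t]
        kept_per_block[blk] += 1

    total_blocks = (n + BLOCK_SIZE - 1) // BLOCK_SIZE
    for blk in range(total_blocks):
        tokens_in_block = min(BLOCK_SIZE, n - blk * BLOCK_SIZE)
        k = kept_per_block.get(blk, 0)
        if k == 0:
            status = "free (whole block can be released)"
        elif k == tokens_in_block:
            status = "fully kept"
        else:
            # Partially kept blocks are the problem: the evicted slots sit
            # inside a block, so the block can't be freed without copying
            # or compacting the kept slots.
            status = f"partial ({k}/{tokens_in_block} slots kept)"
        print(f"block {blk}: {status}")
```

Running this, the block holding the sink tokens and the block at the start of the recent window come out "partial", which is exactly the case where per-slot eviction inside a block is not possible without extra copying or wasted memory.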
ref: [arXiv:2309.06180] Efficient Memory Management for Large Language Model Serving with PagedAttention