PagedAttention is a widely used method for LLM serving. It splits the KV cache of a request into multiple blocks, and each block contains multiple slots (tokens). I suspect PagedAttention might hinder StreamingLLM, since we cannot evict individual slots within a block. So I want to know: how does SwiftInfer integrate with PagedAttention?
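To make the concern concrete, here is a small illustrative Python sketch (not taken from vLLM or SwiftInfer; the block size, sink count, and window length are made-up values). It shows that StreamingLLM's keep set (attention sinks + recent window) rarely aligns with block boundaries, so some blocks end up only partially kept and their slots cannot simply be freed:

```python
from collections import defaultdict

# Hypothetical values for illustration only.
BLOCK_SIZE = 16   # slots (tokens) per KV-cache block

def block_table(num_tokens: int, block_size: int = BLOCK_SIZE):
    """Map each token position to (block_id, slot_in_block), PagedAttention-style."""
    return [(t // block_size, t % block_size) for t in range(num_tokens)]

def streaming_llm_keep(num_tokens: int, num_sinks: int = 4, window: int = 28):
    """Token positions StreamingLLM would keep: the first few sinks + a recent window."""
    sinks = list(range(min(num_sinks, num_tokens)))
    recent = list(range(max(num_sinks, num_tokens - window), num_tokens))
    return set(sinks + recent)

if __name__ == "__main__":
    n = 100
    table = block_table(n)
    keep = streaming_llm_keep(n)

    # Count how many kept slots fall into each block.
    kept_per_block = defaultdict(int)
    for t in keep:
        blk, _ = table[t]
        kept_per_block[blk] += 1

    total_blocks = (n + BLOCK_SIZE - 1) // BLOCK_SIZE
    for blk in range(total_blocks):
        tokens_in_block = min(BLOCK_SIZE, n - blk * BLOCK_SIZE)
        k = kept_per_block.get(blk, 0)
        if k == 0:
            status = "free (whole block can be released)"
        elif k == tokens_in_block:
            status = "fully kept"
        else:
            # Partially kept blocks are the problem: the evicted slots sit
            # inside a block, so the block can't be freed without copying
            # or compacting the kept slots.
            status = f"partial ({k}/{tokens_in_block} slots kept)"
        print(f"block {blk}: {status}")
```

Running this, the block holding the sink tokens and the block at the start of the recent window come out "partial", which is exactly the case where per-slot eviction inside a block is not possible without extra copying or wasted memory.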
ref: [arXiv:2309.06180] Efficient Memory Management for Large Language Model Serving with PagedAttention