
Adding the support of CCL to the Prefilling of Disaggregated Serving #825

Open
vjanfaza wants to merge 1 commit into quic:main from vjanfaza:CCL-Main

Conversation

@vjanfaza
Contributor

@vjanfaza vjanfaza commented Mar 5, 2026

This PR adds CCL support to the prefill stage of Disaggregated Serving. Currently, CCL is supported only during the decode stage of Disaggregated Serving, which results in very high TTFT for larger context lengths (CL). With this change, we can compile the model with the largest CL and still get good TTFT for smaller PL by using the corresponding CCL value instead of the full CL.
These changes are limited to the prefill path of Disaggregated Serving and do not affect other applications.
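
The idea of "using the related CCL value instead of CL" can be illustrated with a small sketch. This is not the code from this PR: the function name `select_prefill_ccl`, its parameters, and the bucket values are hypothetical, and it assumes PL means the prompt length and that the CCL values form a list of bucket sizes no larger than CL. Under those assumptions, prefill picks the smallest CCL bucket that covers the prompt and falls back to the full CL only when no bucket is large enough.

```python
from bisect import bisect_left


def select_prefill_ccl(prompt_len, ccl_buckets, ctx_len):
    """Pick the smallest CCL bucket that covers the prompt.

    Hypothetical helper for illustration only; the names and the
    selection rule are assumptions, not the PR's implementation.
    """
    if prompt_len < 1 or prompt_len > ctx_len:
        raise ValueError("prompt_len must be in the range [1, ctx_len]")

    # Ignore duplicates and any bucket larger than the compiled CL.
    buckets = sorted({b for b in ccl_buckets if 0 < b <= ctx_len})

    # Index of the first bucket that is >= prompt_len.
    i = bisect_left(buckets, prompt_len)

    # Fall back to the full context length if no bucket is large enough.
    return buckets[i] if i < len(buckets) else ctx_len


if __name__ == "__main__":
    # Example bucket sizes chosen only for illustration.
    print(select_prefill_ccl(1500, [2048, 4096, 8192], 32768))
```

With a rule like this, a short prompt is processed against a small bucket rather than the full CL, which is the effect on TTFT the description refers to.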

@quic-rishinr
Contributor

@vjanfaza please rebase the PR

Signed-off-by: Vahid Janfaza <vjanfaza@qti.qualcomm.com>