[BugFix] Refine the preparation of cpu and storage cache #5777
Jiang-Jia-Jun merged 8 commits into PaddlePaddle:develop
Pull request overview
This PR refactors the cache preparation logic to unify the handling of CPU and storage cache within the prefix_cache_manager module. The main motivation is to consolidate scattered cache preparation code and eliminate duplication between CPU and storage cache handling.
Key Changes:
- Integrated storage cache matching and preparation logic into the `request_match_blocks` method in prefix_cache_manager
- Removed the separate `get_storage_cached_blocks` method from resource_manager_v1 (though calls to it remain in some code paths outside this diff)
- Split the `gpu_cpu_cache_prepare_time` metric into separate `cpu_cache_prepare_time` and `storage_cache_prepare_time` metrics for better observability
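The consolidation and metric split described above can be illustrated with a minimal sketch. It is not the FastDeploy implementation: `PrepareMetrics`, `match_blocks_by_tier`, `prepare_cache`, and the loader callbacks are hypothetical names, and the per-block GPU, CPU, storage lookup order is an assumption. Only the two metric field names come from this PR.

```python
import time
from dataclasses import dataclass


@dataclass
class PrepareMetrics:
    """Hypothetical container; only the two field names come from the PR."""

    cpu_cache_prepare_time: float = 0.0
    storage_cache_prepare_time: float = 0.0


def match_blocks_by_tier(block_hashes, gpu_blocks, cpu_blocks, storage_blocks):
    """Walk the prefix in order and record which tier holds each block.

    Matching stops at the first block that no tier holds, because only a
    contiguous prefix can be reused. The tier order is an assumption.
    """
    matched = {"gpu": [], "cpu": [], "storage": []}
    for block_hash in block_hashes:
        if block_hash in gpu_blocks:
            matched["gpu"].append(block_hash)
        elif block_hash in cpu_blocks:
            matched["cpu"].append(block_hash)
        elif block_hash in storage_blocks:
            matched["storage"].append(block_hash)
        else:
            break
    return matched


def prepare_cache(matched, load_from_cpu, load_from_storage):
    """Load CPU and storage hits in one place, timing each tier separately."""
    metrics = PrepareMetrics()

    start = time.perf_counter()
    if matched["cpu"]:
        load_from_cpu(matched["cpu"])
    metrics.cpu_cache_prepare_time = time.perf_counter() - start

    start = time.perf_counter()
    if matched["storage"]:
        load_from_storage(matched["storage"])
    metrics.storage_cache_prepare_time = time.perf_counter() - start

    return metrics
```

Keeping both tiers in a single preparation path means one caller owns block matching and timing, and the two timers make it possible to tell a slow host-memory swap-in from a slow storage fetch.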
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| fastdeploy/engine/sched/resource_manager_v1.py | Removed duplicate storage cache logic and unused import; updated get_prefix_cached_blocks to handle new consolidated metrics |
| fastdeploy/cache_manager/prefix_cache_manager.py | Integrated storage cache matching into request_match_blocks; added storage cache preparation logic with proper block allocation and recycling; updated return value of issue_prefetch_storage_task to return count instead of list |
| fastdeploy/engine/request.py | Renamed metric field from gpu_cpu_cache_prepare_time to cpu_cache_prepare_time to better reflect its purpose |
| fastdeploy/cache_manager/cache_metrics.py | Added storage cache token tracking throughout metrics calculation and reporting |
| fastdeploy/cache_manager/cache_transfer_manager.py | Improved debug logging by showing counts instead of full lists for better performance |
| benchmarks/benchmark_serving.py | Updated to use new cpu_cache_prepare_time metric name |
| benchmarks/backend_request_func.py | Updated to use new cpu_cache_prepare_time metric name |
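The logging change listed for cache_transfer_manager.py can be sketched roughly as follows. This is an illustration under assumptions, not the code from the diff: the logger name, `summarize_blocks`, and `log_transfer` are hypothetical, and only the idea of logging counts rather than full lists comes from the table.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("cache_transfer_sketch")


def summarize_blocks(task_id, block_ids):
    """Build a short summary that reports the block count, not the full list."""
    return f"task {task_id}: {len(block_ids)} blocks"


def log_transfer(task_id, block_ids):
    """Log a transfer at debug level without serializing every block id.

    The %-style arguments are only formatted if debug logging is enabled.
    """
    logger.debug("transfer task %s: %d blocks", task_id, len(block_ids))
```

Formatting a list of thousands of block ids on every transfer costs time and floods the log, while the count is usually enough to spot a mismatch.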
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## develop #5777 +/- ##
==========================================
Coverage ? 66.71%
==========================================
Files ? 347
Lines ? 44354
Branches ? 6810
==========================================
Hits ? 29591
Misses ? 12578
Partials ? 2185
…e#5777)
* Refine the preparation of cpu and storage cache
* fix error
* fix error
* up
* fix
* up docs
* fix unittest
* remove debug info
Motivation
Unify the preparation of the CPU and storage caches.
Modifications
The prefix-cache-manager module.
Usage or Command
Unchanged.
Accuracy Tests
Unit tests will be added after the CI image is updated.
Checklist
- [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- `pre-commit` before commit.
- `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.