[REFACTOR] Replace in-tree cache_mem with CacheSeek integration#4
yJader wants to merge 8 commits into
Replace the in-tree telefuser/cache_mem cache with cacheseek as the cross-request cache middleware.
- service (container/task_service/api_server): build and drive (CacheService, TeleFuserCacheAdapter); per request build_query -> lookup -> apply_resume -> on_response -> save
- lingbot_world_fast: world_kv hooks (on_runtime_created / on_chunk_finalized) + decode-only fast path for exact-prefix KV reuse; enable rolling KV window (local_attn_size=7, sink_size=3)
- remove legacy telefuser/cache_mem + service/cache/cache_factory / cache_service and the cache_mem unit tests
- pin torch==2.7.0 + torchvision==0.22.0
- docs: update latent_cache (en/zh)
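To make the rolling KV window setting (`local_attn_size=7`, `sink_size=3`) concrete, here is a minimal sketch of which cache positions would stay resident. It assumes, as in common attention-sink designs, that `local_attn_size` is the total window size and that the first `sink_size` positions are pinned. The commit message does not state the eviction rule or the unit (frames or chunks), so the actual behavior in `lingbot_world_fast` may differ; `retained_positions` is a hypothetical helper.

```python
def retained_positions(
    num_positions: int, local_attn_size: int = 7, sink_size: int = 3
) -> list[int]:
    """Return the positions kept in the KV cache under the ASSUMED policy:
    pin the first `sink_size` positions, then keep the most recent ones
    until the total window holds `local_attn_size` entries."""
    if sink_size > local_attn_size:
        raise ValueError("sink_size must not exceed local_attn_size")
    if num_positions <= local_attn_size:
        # Nothing has been evicted yet.
        return list(range(num_positions))
    sink = list(range(sink_size))
    recent_count = local_attn_size - sink_size
    recent = list(range(num_positions - recent_count, num_positions))
    return sink + recent
```

Under these assumptions, a sequence of 10 positions keeps the 3 sink positions plus the 4 most recent ones.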
…arch-v2) arch-v2 retired cacheseek.core; CacheConfig is now exported from the top-level `cacheseek` package. The cache and nocache wan22 T2V service entry points still imported arch-v1's cacheseek.core.config, which crashed the cacheseek approximate-reuse e2e with ModuleNotFoundError at service startup.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pull request overview
This PR refactors TeleFuser’s latent cache feature by removing the in-tree cache_mem implementation and replacing it with an optional CacheSeek-backed integration. The integration is wired through the service container / task service, exposed via CLI flags, and documented (EN/ZH), while preserving “cache disabled by default” behavior.
Changes:
- Replace TeleFuser-local latent cache wiring with CacheSeek (cache_service, cache_adapter) lifecycle hooks (lookup/resume/save) and fail-fast startup when enabled but CacheSeek is missing.
- Add CLI/server config plumbing for --enable-latent-cache and --cache-mode, plus unit tests covering lazy import and failure semantics.
- Update Wan2.2 service examples and LingBot World Fast runtime hooks for CacheSeek reuse, and refresh latent-cache docs (EN/ZH).
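The per-request lifecycle named in this PR (build_query -> lookup -> apply_resume -> on_response -> save) can be sketched roughly as follows. This is an illustration only: `FakeCacheAdapter` and `handle_request` are hypothetical, the method signatures are assumptions, and whether the real adapter calls `apply_resume` on a miss is not stated in the PR.

```python
from types import SimpleNamespace


class FakeCacheAdapter:
    """Hypothetical stand-in for the CacheSeek adapter; signatures are assumed."""

    def __init__(self):
        self.store = {}
        self.calls = []

    def build_query(self, request):
        self.calls.append("build_query")
        return request.prompt

    def lookup(self, query):
        self.calls.append("lookup")
        return self.store.get(query)

    def apply_resume(self, request, hit):
        self.calls.append("apply_resume")
        request.resume_from = hit

    def on_response(self, request, response):
        self.calls.append("on_response")

    def save(self, query, response):
        self.calls.append("save")
        self.store[query] = response


def handle_request(adapter, request, generate):
    """Drive one request through the cache lifecycle, or bypass it when
    no adapter is configured (latent cache disabled)."""
    if adapter is None:
        return generate(request)
    query = adapter.build_query(request)
    hit = adapter.lookup(query)
    if hit is not None:
        adapter.apply_resume(request, hit)  # assumed: only applied on a hit
    response = generate(request)
    adapter.on_response(request, response)
    adapter.save(query, response)
    return response
```

With `adapter=None` the request goes straight to generation, which mirrors the "cache disabled by default" behavior described in the overview.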
Reviewed changes
Copilot reviewed 47 out of 53 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tools/viewer/weight_viewer.py | Minor formatting cleanup in number formatting. |
| tools/deploy/show_stat.py | Minor formatting/quoting cleanup in output strings. |
| tools/deploy/docker_monitor.py | Minor formatting cleanup in output strings. |
| tests/unit/service/test_latent_cache_task_service.py | New test validating CacheSeek lifecycle calls from MediaGenerationService. |
| tests/unit/service/test_latent_cache_cli.py | New CLI/container tests for lazy import and fail-fast behavior. |
| tests/unit/pipelines/wan_video/test_service_examples.py | New tests ensuring service examples import without CacheSeek present. |
| tests/unit/cache_mem/test_types_and_config.py | Removed legacy cache_mem unit tests. |
| tests/unit/cache_mem/test_storage.py | Removed legacy cache_mem unit tests. |
| tests/unit/cache_mem/test_metadata.py | Removed legacy cache_mem unit tests. |
| tests/unit/cache_mem/test_concurrency.py | Removed legacy cache_mem concurrency tests. |
| tests/unit/cache_mem/__init__.py | Removed legacy cache_mem test package init. |
| telefuser/service/main.py | Thread latent-cache flags into server config at startup. |
| telefuser/service/core/task_service.py | Switch cache flow to CacheSeek adapter (build_query/lookup/apply_resume/on_response/save). |
| telefuser/service/core/container.py | Lazy-import CacheSeek factory; fail fast on missing/failed init; store adapter in container. |
| telefuser/service/core/config.py | Add cache_mode to ServerConfig for service-level override plumbing. |
| telefuser/service/cache/cache_service.py | Removed legacy TeleFuser cache service implementation. |
| telefuser/service/cache/cache_factory.py | Removed legacy TeleFuser cache factory implementation. |
| telefuser/service/cache/__init__.py | Mark legacy cache namespace as deprecated (no longer a facade). |
| telefuser/service/api/api_server.py | Forward cache_adapter into API service initialization. |
| telefuser/pipelines/lingbot_world_fast/session.py | Add optional world_kv_binding + runtime state for cached-latent fast-forward. |
| telefuser/pipelines/lingbot_world_fast/pipeline.py | Add world-KV fast-forward hook points and decode-only cached chunk path. |
| telefuser/entrypoints/cli/main.py | Add --enable-latent-cache and --cache-mode options and forward into run_server. |
| telefuser/cache_mem/vector_store/qdrant.py | Removed legacy cache_mem vector store code. |
| telefuser/cache_mem/vector_store/interfaces.py | Removed legacy cache_mem vector store code. |
| telefuser/cache_mem/vector_store/faiss.py | Removed legacy cache_mem vector store code. |
| telefuser/cache_mem/vector_store/__init__.py | Removed legacy cache_mem vector store exports. |
| telefuser/cache_mem/strategies.py | Removed legacy cache_mem strategy implementation/registry. |
| telefuser/cache_mem/storage/memory.py | Removed legacy cache_mem storage backend. |
| telefuser/cache_mem/storage/local_file.py | Removed legacy cache_mem storage backend. |
| telefuser/cache_mem/storage/interfaces.py | Removed legacy cache_mem storage interfaces. |
| telefuser/cache_mem/storage/fluxon.py | Removed legacy cache_mem storage stub. |
| telefuser/cache_mem/storage/__init__.py | Removed legacy cache_mem storage exports. |
| telefuser/cache_mem/state/interfaces.py | Removed legacy cache_mem state interfaces. |
| telefuser/cache_mem/src/models/qwen3_vl_reranker.py | Removed legacy cache_mem model code. |
| telefuser/cache_mem/src/models/qwen3_vl_embedding.py | Removed legacy cache_mem model code. |
| telefuser/cache_mem/metadata.py | Removed legacy cache_mem metadata manager. |
| telefuser/cache_mem/log_monitor.py | Removed legacy cache_mem log sink utilities. |
| telefuser/cache_mem/latent_cache.py | Removed legacy cache_mem LatentCache facade. |
| telefuser/cache_mem/encoding/interfaces.py | Removed legacy cache_mem encoder interfaces. |
| telefuser/cache_mem/encoders.py | Removed legacy cache_mem encoder wiring. |
| telefuser/cache_mem/connection.py | Removed legacy cache_mem connection manager. |
| telefuser/cache_mem/config.py | Removed legacy cache_mem config types. |
| telefuser/cache_mem/cache_types.py | Removed legacy cache_mem cache result/types. |
| telefuser/cache_mem/__init__.py | Removed legacy cache_mem package facade. |
| pyproject.toml | Pin torch/torchvision and update cache extra description to reflect CacheSeek usage. |
| examples/wan_video/wan22_14b_text_to_video_service.py | Update service example docs/config for CacheSeek-based lifecycle. |
| examples/wan_video/wan22_14b_text_to_video_service_nocache.py | New “no-cache” Wan2.2 service example variant. |
| docs/zh/latent_cache.md | Rewrite doc to describe CacheSeek integration and updated service flow. |
| docs/en/latent_cache.md | Rewrite doc to describe CacheSeek integration and updated service flow. |
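The lazy-import and fail-fast behavior described for `telefuser/service/core/container.py` can be illustrated with a small sketch. The names here are assumptions, not the PR's code: `load_cache_backend` is hypothetical, the `enable_latent_cache` field name is guessed from the CLI flag, and only `cache_mode` on `ServerConfig` is confirmed by the table above.

```python
import importlib
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ServerConfig:
    enable_latent_cache: bool = False  # assumed field name, mirrors the CLI flag
    cache_mode: Optional[str] = None   # listed in the PR; valid values not shown


def load_cache_backend(config: ServerConfig, module_name: str = "cacheseek") -> Optional[Any]:
    """Import the cache backend only when the feature is enabled.

    Returns None when latent cache is disabled, so the default path never
    touches the optional dependency. Raises at startup if the feature is
    enabled but the module cannot be imported.
    """
    if not config.enable_latent_cache:
        return None  # no import attempted
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass
        raise RuntimeError(
            f"latent cache is enabled but {module_name!r} could not be imported"
        ) from exc
```

Keeping the import inside the function is what lets the service examples import cleanly without CacheSeek installed.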
def _build_cache_task_request(task_data: dict) -> SimpleNamespace:
    """Build a minimal task_request stub for the cache layer.

    Splatting ``task_data`` directly would crash because ``TaskRequest`` is
    ``extra="allow"`` and may contain keys that are not valid Python
    identifiers. The cache layer only reads ``task_id`` / ``task`` /
    ``prompt`` via ``getattr``, so we whitelist those.
    """
    """Build a minimal task_request stub for the cache layer."""
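The whitelisting approach mentioned in the docstring could look roughly like the sketch below. The function body is not shown in this review thread, so this is an assumption: `build_stub` and `CACHE_FIELDS` are hypothetical names, and defaulting missing fields to `None` is a guess.

```python
from types import SimpleNamespace

# Fields the docstring says the cache layer reads via getattr.
CACHE_FIELDS = ("task_id", "task", "prompt")


def build_stub(task_data: dict) -> SimpleNamespace:
    """Copy only the whitelisted fields into a stub object.

    Extra keys in task_data are ignored, so arbitrary user-supplied keys
    never become attributes of the stub.
    """
    return SimpleNamespace(**{name: task_data.get(name) for name in CACHE_FIELDS})
```

For example, `build_stub({"task_id": "t1", "prompt": "a cat", "extra-key": 1})` yields a stub with `task_id`, `task`, and `prompt` only.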
This commit (93be55f) fixes the CI failure where CPU test jobs imported GPU-only test modules during pytest collection, which triggered CUDA driver initialization. PTAL.
# Pin the torch stack to the validated combo (2.7.0 + cu126). Without this a
# fresh `pip install -e .` resolves to the latest torch (2.12 / cuda 13) via
# torchvision, which the H100 deployment + cacheseek repro were NOT validated on.
"torch==2.7.0",
"torchvision==0.22.0",
pyproject.toml: the torch pin snuck in from the base commit — let's drop it (restore torchvision from main); keep the cache-extra removal.
pip install --force-reinstall --no-deps \
    torch==2.7.0 torchvision==0.22.0 \
CI torch pin (test.yml + run_ci_tests.sh): same source as the pyproject.toml pin, drop it here too so CI and the manifest don't disagree.
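One way to catch the kind of disagreement this comment warns about is a small consistency check between the pins in the manifest and the pins used in CI. The sketch below is purely illustrative: `parse_pins` and `conflicting_pins` are hypothetical helpers, they only understand exact `name==version` pins, and nothing like this is part of the PR.

```python
def parse_pins(requirements: list[str]) -> dict[str, str]:
    """Extract exact pins (name==version) from requirement strings.
    Unpinned or range-constrained entries are ignored."""
    pins = {}
    for req in requirements:
        name, sep, version = req.partition("==")
        if sep:
            pins[name.strip().lower()] = version.strip()
    return pins


def conflicting_pins(manifest: list[str], ci: list[str]) -> dict[str, tuple[str, str]]:
    """Return packages pinned in both places to different versions,
    mapped to (manifest_version, ci_version)."""
    m, c = parse_pins(manifest), parse_pins(ci)
    return {name: (m[name], c[name]) for name in m.keys() & c.keys() if m[name] != c[name]}
```

An empty result means the two sources agree on every package they both pin.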
Co-authored-by: @yx0716
Description
This PR replaces TeleFuser's in-tree latent cache implementation with an optional CacheSeek integration path. It wires CacheSeek into the service container, task service, CLI flags, Wan2.2 service examples, and LingBot World Fast world-KV hooks while keeping latent cache disabled unless explicitly requested.
Motivation
Cross-request latent/KV reuse is now owned by CacheSeek instead of TeleFuser-local `cache_mem` code. TeleFuser should depend on CacheSeek only when the feature is enabled, fail clearly when CacheSeek is missing, and keep the default import/runtime path lightweight when latent cache is disabled.
Type of Change
Changes Made
- … `telefuser/cache_mem` implementation and related unit tests.
- … `world_kv_binding` runtime hooks so CacheSeek exact-prefix reuse can fast-forward cached chunks.
- … https://github.com/Tele-AI/CacheSeek.
Testing
- … (`pytest tests/`)
Test commands:
Additional validation notes:
- … `lookup_hit skip_step=1` and `save_stored`.
- … `all_pass=true`.
- … `world_kv: fast-forward 1 chunks (decode-only)`.
Checklist
- … (`ruff`)
- … (`pre-commit run --all-files`)
- … (`pytest tests/`)
- … `[TYPE] Brief description`
Related Issues
N/A
Additional Notes
This PR is scoped to the TeleFuser-side CacheSeek adaptation. It does not include CacheSeek repository changes. The LingBot e2e command above exercises CacheSeek as an external dependency to verify that the TeleFuser `world_kv_binding` hooks are usable end to end.
GPU Architecture Support
No kernel-specific code was changed. Real e2e validation ran on NVIDIA H100.
Performance Impact
No kernel-level performance change is intended. CacheSeek reuse can reduce repeated work when enabled. The LingBot exact-prefix smoke e2e showed functional reuse:
7.456s, 1.519s, 1.494s, 4.628s
These are smoke e2e timings on H100 and should not be treated as a formal benchmark.
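The exact-prefix reuse exercised by this smoke test can be pictured with a short sketch: chunks whose keys match a cached run from the start are fast-forwarded (decode-only), and generation resumes at the first mismatch. The chunk keys and `split_by_cached_prefix` are hypothetical; how CacheSeek actually derives and compares keys is not described in this PR.

```python
def split_by_cached_prefix(
    requested: list[str], cached: list[str]
) -> tuple[list[str], list[str]]:
    """Split requested chunk keys into (reusable prefix, remainder).

    The prefix is the longest run of leading chunks that match the cached
    sequence exactly; everything after the first mismatch must be generated.
    """
    matched = 0
    for want, have in zip(requested, cached):
        if want != have:
            break
        matched += 1
    return requested[:matched], requested[matched:]
```

A later chunk that happens to match the cache is still regenerated once an earlier chunk differs, which is what distinguishes exact-prefix reuse from approximate reuse.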