[REFACTOR] Replace in-tree cache_mem with CacheSeek integration by yJader · Pull Request #4 · Tele-AI/TeleFuser

yJader · 2026-07-01T13:53:58Z

Co-authored-by: @yx0716

Description

This PR replaces TeleFuser's in-tree latent cache implementation with an optional CacheSeek integration path. It wires CacheSeek into the service container, task service, CLI flags, Wan2.2 service examples, and LingBot World Fast world-KV hooks while keeping latent cache disabled unless explicitly requested.

Motivation

Cross-request latent/KV reuse is now owned by CacheSeek instead of TeleFuser-local cache_mem code. TeleFuser should depend on CacheSeek only when the feature is enabled, fail clearly when CacheSeek is missing, and keep the default import/runtime path lightweight when latent cache is disabled.

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Performance improvement
Code refactoring
Documentation update
Other (please describe):

Changes Made

Removed the in-tree telefuser/cache_mem implementation and related unit tests.
Added lazy CacheSeek service initialization in TeleFuser service container and task service code.
Added/updated latent cache CLI and server config plumbing, including direct failure when latent cache is enabled but CacheSeek is unavailable.
Updated Wan2.2 T2V service examples for CacheSeek-backed latent cache and added a nocache service example.
Added LingBot World Fast world_kv_binding runtime hooks so CacheSeek exact-prefix reuse can fast-forward cached chunks.
Updated English and Chinese latent cache docs with CacheSeek usage and the CacheSeek GitHub link: https://github.com/Tele-AI/CacheSeek.
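
The lazy-initialization and fail-fast behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `load_cache_backend` and `LatentCacheUnavailableError` are hypothetical names, and the real container may import a specific CacheSeek factory rather than the top-level module.

```python
import importlib
from types import ModuleType
from typing import Optional


class LatentCacheUnavailableError(RuntimeError):
    """Hypothetical error: latent cache was requested but the backend is missing."""


def load_cache_backend(enabled: bool, module_name: str = "cacheseek") -> Optional[ModuleType]:
    """Import the cache backend only when the feature is enabled."""
    if not enabled:
        # Default path: no import is attempted, so startup stays lightweight.
        return None
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Fail at startup with a clear message instead of failing per request.
        raise LatentCacheUnavailableError(
            f"latent cache is enabled but '{module_name}' could not be imported; "
            "install it or start the service without latent cache"
        ) from exc
```

With the feature disabled the function returns `None` without touching the import system, which is the property the service-example import tests rely on.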

Testing

Unit tests pass (pytest tests/)
Manual testing performed
Benchmarks added/updated (if applicable)

Test commands:

# TeleFuser targeted latent-cache/service tests.
python -m pytest \
  tests/unit/service/test_latent_cache_cli.py \
  tests/unit/service/test_latent_cache_task_service.py \
  tests/unit/pipelines/wan_video/test_service_examples.py \
  tests/unit/pipelines/wan_video/test_latent_data_utils.py -q
# Result: 15 passed

# Real Wan2.2 service e2e without latent cache.
# Model: Wan2.2-T2V-A14B
# Config: num_inference_steps=2, num_frames=5, resolution=480p, parallelism=1.
# Result: completed; non-empty mp4 generated.

# Real Wan2.2 service e2e with CacheSeek enabled.
# Model: Wan2.2-T2V-A14B
# Cache mode: read_write
# Result: task_status=completed; non-empty mp4 generated.
# Cache evidence: audit log contains lookup_hit skip_step=1 and save_stored.

# Real LingBot World Fast exact-prefix e2e through TeleFuser world_kv hooks.
# Note: uses CacheSeek as an external dependency; this PR only includes TeleFuser-side hooks.
export CUDA_VISIBLE_DEVICES=0,1
export LINGBOT_WORLD_CHECKPOINT_DIR=<lingbot-world-fast-checkpoint-root>
export WORLDKV_REPO_ROOTS=<telefuser-repo>:<cacheseek-repo>
export PYTHONPATH=<telefuser-repo>:<cacheseek-repo>:${PYTHONPATH:-}
cd <cacheseek-repo>
python \
  examples/exact_prefix_reuse/e2e_telefuser_lingbot.py \
  --frame-num 13 \
  --prefix-chunks 1 \
  --out-dir <output-dir> \
  --image-path <lingbot-example-image> \
  --action-path <lingbot-example-action-dir> \
  --aux-device cuda:1 \
  --no-save-videos
# Result: all_pass=true; fast_forward_k A=0, B=1, C=1, D=0.

Additional validation notes:

Wan2.2 CacheSeek audit log contained lookup_hit skip_step=1 and save_stored.
LingBot e2e manifest reported all_pass=true.
LingBot e2e log reported world_kv: fast-forward 1 chunks (decode-only).
GPUs were checked after e2e runs and had no remaining compute processes.
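
For readers unfamiliar with exact-prefix reuse, the idea behind the `fast_forward_k` values can be sketched as counting how many leading chunks of a request already exist in the cache. The function and key names below are hypothetical and only illustrate the concept; how CacheSeek actually keys and matches chunks is not shown in this PR.

```python
from typing import Sequence, Set


def fast_forward_chunks(chunk_keys: Sequence[str], cached: Set[str]) -> int:
    """Return how many leading chunks can be served from cache.

    Counting stops at the first miss, because a later chunk is only reusable
    when every chunk before it matched exactly.
    """
    k = 0
    for key in chunk_keys:
        if key not in cached:
            break
        k += 1
    return k


# Illustrative keys only: one cached prefix chunk, then a cold request.
cached_keys = {"chunk-0"}
warm = fast_forward_chunks(["chunk-0", "chunk-1"], cached_keys)
cold = fast_forward_chunks(["other-0", "chunk-0"], cached_keys)
```

Here `warm` is 1 and `cold` is 0: a cached chunk that appears after a miss does not count, since the prefix no longer matches.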

Checklist

Code follows the project's coding standards (ruff)
Pre-commit hooks pass (pre-commit run --all-files)
All tests pass (pytest tests/)
New tests added for new functionality
Documentation updated (README, CLAUDE.md, docstrings)
Commit messages are clear and descriptive
PR title follows the convention: [TYPE] Brief description

Related Issues

N/A

Additional Notes

This PR is scoped to the TeleFuser-side CacheSeek adaptation. It does not include CacheSeek repository changes. The LingBot e2e command above exercises CacheSeek as an external dependency to verify that the TeleFuser world_kv_binding hooks are usable end to end.

GPU Architecture Support

SM80 (Ampere, Ada Lovelace)
SM90 (Hopper H100)
SM100+ (Blackwell)

No kernel-specific code was changed. Real e2e validation ran on NVIDIA H100.

Performance Impact

No kernel-level performance change is intended. CacheSeek reuse can reduce repeated work when enabled. The LingBot exact-prefix smoke e2e showed functional reuse:

A cold run: 7.456s
B full hit: 1.519s
C prefix hit: 1.494s
D cold fork reference: 4.628s

These are smoke e2e timings on H100 and should not be treated as a formal benchmark.
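
As a quick reading aid, the ratios implied by the timings above can be computed directly. Pairing C with D (prefix hit against the cold fork reference) is an assumption about how the runs are meant to be compared; the PR does not state it explicitly.

```python
# Smoke e2e timings copied from the list above, in seconds.
timings_s = {
    "A cold run": 7.456,
    "B full hit": 1.519,
    "C prefix hit": 1.494,
    "D cold fork reference": 4.628,
}


def speedup(baseline_s: float, cached_s: float) -> float:
    """Ratio of the uncached wall time to the cached wall time."""
    return baseline_s / cached_s


full_hit_speedup = speedup(timings_s["A cold run"], timings_s["B full hit"])
# Assumed pairing: D is treated as the cold baseline for C.
prefix_hit_speedup = speedup(timings_s["D cold fork reference"], timings_s["C prefix hit"])

print(f"full hit vs cold: {full_hit_speedup:.2f}x")
print(f"prefix hit vs cold fork: {prefix_hit_speedup:.2f}x")
```

This gives roughly 4.9x for the full hit and 3.1x for the prefix hit, subject to the same caveat that these are single smoke runs.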

Replace the in-tree telefuser/cache_mem cache with cacheseek as the cross-request cache middleware.

- service (container/task_service/api_server): build and drive (CacheService, TeleFuserCacheAdapter); per request build_query -> lookup -> apply_resume -> on_response -> save
- lingbot_world_fast: world_kv hooks (on_runtime_created / on_chunk_finalized) + decode-only fast path for exact-prefix KV reuse; enable rolling KV window (local_attn_size=7, sink_size=3)
- remove legacy telefuser/cache_mem + service/cache/cache_factory|cache_service and the cache_mem unit tests
- pin torch==2.7.0 + torchvision==0.22.0
- docs: update latent_cache (en/zh)
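
The per-request sequence named in this commit message (build_query -> lookup -> apply_resume -> on_response -> save) can be pictured with a toy adapter. Only the five method names come from the commit message; the signatures, the `InMemoryAdapter` class, and `run_with_cache` are assumptions made for illustration and are not the CacheSeek or TeleFuser API.

```python
from typing import Any, Callable, Dict, List, Optional


class InMemoryAdapter:
    """Toy stand-in for a cache adapter; signatures are assumed, not CacheSeek's."""

    def __init__(self) -> None:
        self.store: Dict[str, Any] = {}
        self.calls: List[str] = []

    def build_query(self, task: Dict[str, Any]) -> str:
        self.calls.append("build_query")
        return task["prompt"]

    def lookup(self, query: str) -> Optional[Any]:
        self.calls.append("lookup")
        return self.store.get(query)

    def apply_resume(self, task: Dict[str, Any], hit: Any) -> Dict[str, Any]:
        self.calls.append("apply_resume")
        return {**task, "resume_from": hit}

    def on_response(self, query: str, response: Dict[str, Any]) -> Any:
        self.calls.append("on_response")
        return response["latent"]

    def save(self, query: str, payload: Any) -> None:
        self.calls.append("save")
        self.store[query] = payload


def run_with_cache(
    task: Dict[str, Any],
    adapter: Optional[InMemoryAdapter],
    generate: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> Dict[str, Any]:
    """Drive one request through the lifecycle; skip it entirely when disabled."""
    if adapter is None:
        return generate(task)
    query = adapter.build_query(task)
    hit = adapter.lookup(query)
    if hit is not None:
        task = adapter.apply_resume(task, hit)
    response = generate(task)
    adapter.save(query, adapter.on_response(query, response))
    return response


def fake_generate(task: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder for the real pipeline call.
    return {"latent": "latent:" + task["prompt"], "resumed": "resume_from" in task}


adapter = InMemoryAdapter()
first = run_with_cache({"prompt": "a cat"}, adapter, fake_generate)
second = run_with_cache({"prompt": "a cat"}, adapter, fake_generate)
```

In this sketch the first request misses and stores a payload, and the second request with the same prompt takes the `apply_resume` branch. Whether the real service also saves after a hit depends on the cache mode and is not modeled here.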

…arch-v2) arch-v2 retired cacheseek.core; CacheConfig is now exported from the top-level `cacheseek` package. The cache and nocache wan22 T2V service entry points still imported arch-v1's cacheseek.core.config, which made the cacheseek approximate-reuse e2e crash with ModuleNotFoundError during service startup. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR refactors TeleFuser’s latent cache feature by removing the in-tree cache_mem implementation and replacing it with an optional CacheSeek-backed integration. The integration is wired through the service container / task service, exposed via CLI flags, and documented (EN/ZH), while preserving “cache disabled by default” behavior.

Changes:

Replace TeleFuser-local latent cache wiring with CacheSeek (cache_service, cache_adapter) lifecycle hooks (lookup/resume/save) and fail-fast startup when enabled but CacheSeek is missing.
Add CLI/server config plumbing for --enable-latent-cache and --cache-mode, plus unit tests covering lazy import and failure semantics.
Update Wan2.2 service examples and LingBot World Fast runtime hooks for CacheSeek reuse, and refresh latent-cache docs (EN/ZH).
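
The flag plumbing summarized above might look roughly like the sketch below. The flag names `--enable-latent-cache` and `--cache-mode` and the `read_write` value come from this PR; the use of `argparse`, the `ServerConfigSketch` dataclass, and the defaults are assumptions, since the actual CLI framework and `ServerConfig` fields are not shown here.

```python
import argparse
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServerConfigSketch:
    """Hypothetical subset of the server config touched by the cache flags."""

    enable_latent_cache: bool = False
    cache_mode: Optional[str] = None


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="serve-sketch")
    # Off by default, so the cache backend is never required implicitly.
    parser.add_argument("--enable-latent-cache", action="store_true")
    # Left as a free-form string; the set of valid modes is owned by the backend.
    parser.add_argument("--cache-mode", default=None)
    return parser


def config_from_argv(argv: List[str]) -> ServerConfigSketch:
    args = build_parser().parse_args(argv)
    return ServerConfigSketch(
        enable_latent_cache=args.enable_latent_cache,
        cache_mode=args.cache_mode,
    )


default_cfg = config_from_argv([])
cached_cfg = config_from_argv(["--enable-latent-cache", "--cache-mode", "read_write"])
```

Keeping `cache_mode` as `None` when the flag is absent lets the service distinguish "no override" from an explicit mode.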

Reviewed changes

Copilot reviewed 47 out of 53 changed files in this pull request and generated 2 comments.

File	Description
tools/viewer/weight_viewer.py	Minor formatting cleanup in number formatting.
tools/deploy/show_stat.py	Minor formatting/quoting cleanup in output strings.
tools/deploy/docker_monitor.py	Minor formatting cleanup in output strings.
tests/unit/service/test_latent_cache_task_service.py	New test validating CacheSeek lifecycle calls from `MediaGenerationService`.
tests/unit/service/test_latent_cache_cli.py	New CLI/container tests for lazy import and fail-fast behavior.
tests/unit/pipelines/wan_video/test_service_examples.py	New tests ensuring service examples import without CacheSeek present.
tests/unit/cache_mem/test_types_and_config.py	Removed legacy `cache_mem` unit tests.
tests/unit/cache_mem/test_storage.py	Removed legacy `cache_mem` unit tests.
tests/unit/cache_mem/test_metadata.py	Removed legacy `cache_mem` unit tests.
tests/unit/cache_mem/test_concurrency.py	Removed legacy `cache_mem` concurrency tests.
tests/unit/cache_mem/__init__.py	Removed legacy `cache_mem` test package init.
telefuser/service/main.py	Thread latent-cache flags into server config at startup.
telefuser/service/core/task_service.py	Switch cache flow to CacheSeek adapter (`build_query/lookup/apply_resume/on_response/save`).
telefuser/service/core/container.py	Lazy-import CacheSeek factory; fail fast on missing/failed init; store adapter in container.
telefuser/service/core/config.py	Add `cache_mode` to `ServerConfig` for service-level override plumbing.
telefuser/service/cache/cache_service.py	Removed legacy TeleFuser cache service implementation.
telefuser/service/cache/cache_factory.py	Removed legacy TeleFuser cache factory implementation.
telefuser/service/cache/__init__.py	Mark legacy cache namespace as deprecated (no longer a facade).
telefuser/service/api/api_server.py	Forward `cache_adapter` into API service initialization.
telefuser/pipelines/lingbot_world_fast/session.py	Add optional `world_kv_binding` + runtime state for cached-latent fast-forward.
telefuser/pipelines/lingbot_world_fast/pipeline.py	Add world-KV fast-forward hook points and decode-only cached chunk path.
telefuser/entrypoints/cli/main.py	Add `--enable-latent-cache` and `--cache-mode` options and forward into `run_server`.
telefuser/cache_mem/vector_store/qdrant.py	Removed legacy `cache_mem` vector store code.
telefuser/cache_mem/vector_store/interfaces.py	Removed legacy `cache_mem` vector store code.
telefuser/cache_mem/vector_store/faiss.py	Removed legacy `cache_mem` vector store code.
telefuser/cache_mem/vector_store/__init__.py	Removed legacy `cache_mem` vector store exports.
telefuser/cache_mem/strategies.py	Removed legacy `cache_mem` strategy implementation/registry.
telefuser/cache_mem/storage/memory.py	Removed legacy `cache_mem` storage backend.
telefuser/cache_mem/storage/local_file.py	Removed legacy `cache_mem` storage backend.
telefuser/cache_mem/storage/interfaces.py	Removed legacy `cache_mem` storage interfaces.
telefuser/cache_mem/storage/fluxon.py	Removed legacy `cache_mem` storage stub.
telefuser/cache_mem/storage/__init__.py	Removed legacy `cache_mem` storage exports.
telefuser/cache_mem/state/interfaces.py	Removed legacy `cache_mem` state interfaces.
telefuser/cache_mem/src/models/qwen3_vl_reranker.py	Removed legacy `cache_mem` model code.
telefuser/cache_mem/src/models/qwen3_vl_embedding.py	Removed legacy `cache_mem` model code.
telefuser/cache_mem/metadata.py	Removed legacy `cache_mem` metadata manager.
telefuser/cache_mem/log_monitor.py	Removed legacy `cache_mem` log sink utilities.
telefuser/cache_mem/latent_cache.py	Removed legacy `cache_mem` `LatentCache` facade.
telefuser/cache_mem/encoding/interfaces.py	Removed legacy `cache_mem` encoder interfaces.
telefuser/cache_mem/encoders.py	Removed legacy `cache_mem` encoder wiring.
telefuser/cache_mem/connection.py	Removed legacy `cache_mem` connection manager.
telefuser/cache_mem/config.py	Removed legacy `cache_mem` config types.
telefuser/cache_mem/cache_types.py	Removed legacy `cache_mem` cache result/types.
telefuser/cache_mem/__init__.py	Removed legacy `cache_mem` package facade.
pyproject.toml	Pin torch/torchvision and update cache extra description to reflect CacheSeek usage.
examples/wan_video/wan22_14b_text_to_video_service.py	Update service example docs/config for CacheSeek-based lifecycle.
examples/wan_video/wan22_14b_text_to_video_service_nocache.py	New “no-cache” Wan2.2 service example variant.
docs/zh/latent_cache.md	Rewrite doc to describe CacheSeek integration and updated service flow.
docs/en/latent_cache.md	Rewrite doc to describe CacheSeek integration and updated service flow.

 def _build_cache_task_request(task_data: dict) -> SimpleNamespace:
-    """Build a minimal task_request stub for the cache layer.
-
-    Splatting ``task_data`` directly would crash because ``TaskRequest`` is
-    ``extra="allow"`` and may contain keys that are not valid Python
-    identifiers. The cache layer only reads ``task_id`` / ``task`` /
-    ``prompt`` via ``getattr``, so we whitelist those.
-    """
+    """Build a minimal task_request stub for the cache layer."""


yJader · 2026-07-02T09:49:20Z

This commit (93be55f) fixes the CI failure where CPU test jobs imported GPU-only test modules during pytest collection, which triggered CUDA driver initialization. PTAL.
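
One common way to keep CPU jobs from importing GPU-only test modules at collection time is pytest's `collect_ignore_glob` in a `conftest.py`. The sketch below is only an illustration of that general technique, not the change made in 93be55f: the `RUN_GPU_TESTS` variable and the `gpu/*` pattern are hypothetical.

```python
# conftest.py (sketch)
import os
from typing import Mapping


def gpu_tests_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical opt-in switch: GPU tests are collected only when set to '1'."""
    return environ.get("RUN_GPU_TESTS", "0") == "1"


# pytest reads this module-level list and skips matching paths during
# collection, so the ignored modules are never imported on CPU runners.
collect_ignore_glob = [] if gpu_tests_enabled() else ["gpu/*"]
```

Because the ignored files are never imported, any module-level CUDA initialization in them cannot run in the CPU job.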

yx0716 · 2026-07-03T02:41:27Z

+    # Pin the torch stack to the validated combo (2.7.0 + cu126). Without this a
+    # fresh `pip install -e .` resolves to the latest torch (2.12 / cuda 13) via
+    # torchvision, which the H100 deployment + cacheseek repro were NOT validated on.
+    "torch==2.7.0",
+    "torchvision==0.22.0",


pyproject.toml: the torch pin snuck in from the base commit — let's drop it (restore torchvision from main); keep the cache-extra removal.

yx0716 · 2026-07-03T02:56:36Z

+          pip install --force-reinstall --no-deps \
+            torch==2.7.0 torchvision==0.22.0 \


CI torch pin (test.yml + run_ci_tests.sh): same source as the pyproject.toml pin, drop it here too so CI and the manifest don't disagree.

yx0716 and others added 5 commits June 18, 2026 15:50

fix: prepare cacheseek integration for upstream

26bb897

docs: update latent cache documentation for CacheSeek integration

252e6f1

style: apply ruff format

38379f7

yJader marked this pull request as ready for review July 1, 2026 14:03

lzx1413 requested a review from Copilot July 2, 2026 02:22

Copilot started reviewing on behalf of lzx1413 July 2, 2026 02:23

Copilot AI reviewed Jul 2, 2026

yJader and others added 3 commits July 2, 2026 07:55

refactor: split cacheseek dependency install path

83b255e

Delete scripts/verify_cacheseek_dependency_split.sh

f796759

ci: avoid importing GPU-only tests in CPU jobs

93be55f

yx0716 reviewed Jul 3, 2026

yJader wants to merge 8 commits into Tele-AI:main from yJader:refactor/cacheseek-adapt

3 participants
