Anchor Forcing: Anchor Memory and Tri-Region RoPE for Interactive Streaming Video Diffusion


📖 TL;DR: Anchor Forcing enables prompt switches to introduce new subjects and actions while preserving context, motion quality, and temporal coherence; prior methods often degrade over time and miss newly specified interactions.

📢 News

  • [2026-03-18] 🎉 We have officially released the code for public use!

✅ ToDo List for Anchor Forcing Release

  • Release the code
  • Release the inference pipeline
  • Release the training files
  • Release the model weights

🔧 Installation

We tested this repo on the following setup:

  • Nvidia GPU with at least 40 GB memory (A100 tested).
  • Linux operating system.
  • 64 GB RAM.

Other hardware setups may also work but have not been tested.
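Before running anything heavy, you can check whether your GPU meets the memory requirement above. This is a hedged sketch, not part of the repo: the 40 GB threshold comes from the list above, and the `meets_requirements` helper name is illustrative.

```python
# Hedged sketch: check the GPU memory requirement from the README.
# The 40 GB threshold mirrors the hardware list above; this helper is
# illustrative and not part of the released codebase.
MIN_VRAM_GB = 40

def meets_requirements(vram_bytes: int, min_gb: int = MIN_VRAM_GB) -> bool:
    """Return True if the GPU reports at least `min_gb` GiB of memory."""
    return vram_bytes >= min_gb * 1024**3

# On a CUDA machine you could feed it the real value:
#   import torch
#   total = torch.cuda.get_device_properties(0).total_memory
#   print(meets_requirements(total))
```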

Environment

Create a conda environment and install dependencies:

git clone https://github.com/vivoCameraResearch/Anchor-Forcing.git
cd Anchor-Forcing
conda create -n af python=3.10 -y
conda activate af
pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
pip install flash-attn --no-build-isolation

# Alternatively, install flash-attention manually from a prebuilt wheel.
# Recommended version: 2.7.4.post1
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
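After installation, a quick sanity check can catch version drift before it surfaces as a confusing runtime error. This is a hedged sketch: the pinned versions mirror the pip commands above, and `check_versions` is an illustrative helper, not a script shipped with the repo.

```python
# Hedged sketch: compare installed package versions against the versions
# pinned by the pip commands above. Local build tags like "+cu124" are
# ignored when comparing. Illustrative only; not part of this repository.
PINNED = {"torch": "2.6.0", "torchvision": "0.21.0"}

def check_versions(installed: dict) -> list:
    """Return a list of human-readable mismatches; empty means OK."""
    problems = []
    for pkg, want in PINNED.items():
        have = installed.get(pkg)
        if have is None:
            problems.append(f"{pkg} not installed")
        elif have.split("+")[0] != want:
            problems.append(f"{pkg} {have} does not match pinned {want}")
    return problems

# Example usage in a live environment:
#   import torch, torchvision
#   print(check_versions({"torch": torch.__version__,
#                         "torchvision": torchvision.__version__}))
```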

⏬ Demo Inference

Download Wan2.1-T2V-1.3B

huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir wan_models/Wan2.1-T2V-1.3B

Download checkpoints

TODO: The checkpoints are currently under internal review.

Single Prompt Video Generation

bash inference/inference.sh

Interactive Long Video Generation

python inference/interactive_inference.py
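Interactive long-video generation switches prompts mid-stream while the model keeps generating. As a minimal sketch of the idea (not the repo's actual API), a prompt schedule can be resolved per generated chunk; `prompt_for_chunk` and the `schedule` format here are hypothetical:

```python
# Hedged sketch: map streamed chunk indices to the currently active prompt.
# `schedule` is a list of (start_chunk, prompt) pairs. This helper and its
# names are illustrative and not part of the released inference pipeline.
def prompt_for_chunk(schedule, chunk_idx):
    """Return the prompt whose start_chunk is the latest one <= chunk_idx."""
    active = None
    for start, prompt in sorted(schedule):
        if start <= chunk_idx:
            active = prompt
        else:
            break
    return active

schedule = [(0, "a cat walks in"), (8, "the cat jumps onto a table")]
# chunks 0-7 resolve to the first prompt, chunk 8 onward to the second
```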

Training

Download checkpoints

Please follow Self-Forcing to download the text prompts and the ODE-initialized checkpoint.

Download Wan2.1-T2V-14B as the teacher model.

huggingface-cli download Wan-AI/Wan2.1-T2V-14B --local-dir wan_models/Wan2.1-T2V-14B

Step 1: Self-Forcing Initialization for Short Window and Frame Sink

Please follow LongLive for this step.

Step 2: Streaming Long Tuning

bash train.sh

Hints

This repository only provides the training code for Step 2. By default we follow LongLive's training method for Step 1, so you can train Step 2 directly from LongLive's checkpoints.

📜 Acknowledgement

This codebase builds on LongLive; thanks for open-sourcing it! We also acknowledge the following great open-source projects:

  • MemFlow: We followed its interactive video benchmark.
  • Self-Forcing: We followed its VBench prompts and checkpoints.

🌏 Citation

@article{yang2026anchor,
  title={Anchor Forcing: Anchor Memory and Tri-Region RoPE for Interactive Streaming Video Diffusion},
  author={Yang, Yang and Zhang, Tianyi and Huang, Wei and Chen, Jinwei and Wu, Boxi and He, Xiaofei and Cai, Deng and Li, Bo and Jiang, Peng-Tao},
  journal={arXiv preprint arXiv:2603.13405},
  year={2026}
}

📧 Contact

If you have any questions or improvement suggestions, please email Yang Yang (yangyang98@zju.edu.cn) or open an issue.
