
pp_size doesn't work due to assert #18

@Zhaojp-Frank

Description

Hi there,
Wonderful work, and thanks for sharing!

I tried to run tokasaurus with the Qwen2.5 32B model. tp_size works well, but pp_size fails.
Hardware: Hopper GPUs with NVLink, pp_size = 2, and GPU memory is sufficient.

Error 1:
Starting 7 processes: ['model_worker_pp0', 'model_worker_pp1', 'model_worker_pp2', 'model_worker_pp3', 'fanout_worker', 'manager', 'server']
Running in the main process: server
2025-10-09 09:14:22 | INFO | server | Starting web server
W1009 09:14:27.610000 297511 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W1009 09:14:27.610000 297511 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
[... the same TORCH_CUDA_ARCH_LIST warning pair repeats for each worker process ...]
2025-10-09 09:14:29 | INFO | model_worker_pp1 | Pipeline worker 1 started!
2025-10-09 09:14:30 | INFO | fanout_worker | Fanout worker started!
2025-10-09 09:14:30 | INFO | model_worker_pp3 | Pipeline worker 3 started!
2025-10-09 09:14:30 | INFO | model_worker_pp0 | Pipeline worker 0 started!
2025-10-09 09:14:30 | INFO | model_worker_pp2 | Pipeline worker 2 started!
2025-10-09 09:14:30 | INFO | manager | Manager started
2025-10-09 09:14:31 | INFO | model_worker_pp2 | Creating model on device cuda:2 with dtype torch.bfloat16
Building layers 32 to 48
Loading from safetensors
Loading safetensors files: 0%| | 0/14 [00:00<?, ?it/s]
2025-10-09 09:14:31 | INFO | model_worker_pp1 | Creating model on device cuda:1 with dtype torch.bfloat16
2025-10-09 09:14:31 | INFO | model_worker_pp3 | Creating model on device cuda:3 with dtype torch.bfloat16
Building layers 16 to 32
2025-10-09 09:14:31 | INFO | model_worker_pp0 | Creating model on device cuda:0 with dtype torch.bfloat16
Building layers 48 to 64
Building layers 0 to 16
Loading from safetensors
Loading safetensors files: 0%| | 0/14 [00:00<?, ?it/s]
Loading from safetensors
Loading safetensors files: 0%| | 0/14 [00:00<?, ?it/s]
Loading from safetensors
Loading safetensors files: 100%|██████████| 14/14 [00:05<00:00, 2.65it/s]
Loading safetensors files: 100%|██████████| 14/14 [00:04<00:00, 2.95it/s]
Loading safetensors files: 100%|██████████| 14/14 [00:04<00:00, 2.97it/s]
2025-10-09 09:14:36 | INFO | model_worker_pp1 | Created model
2025-10-09 09:14:36 | INFO | model_worker_pp2 | Created model
Capturing cudagraphs for model_worker_pp1: 0%| | 0/8 [00:00<?, ?it/s]
2025-10-09 09:14:36,832 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
2025-10-09 09:14:36,834 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
2025-10-09 09:14:36 | INFO | model_worker_pp3 | Created model
Capturing cudagraphs for model_worker_pp3: 0%| | 0/8 [00:00<?, ?it/s]
2025-10-09 09:14:36,869 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
2025-10-09 09:14:36,872 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
2025-10-09 09:14:36,889 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
2025-10-09 09:14:36,955 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
Loading safetensors files: 100%|██████████| 14/14 [00:04<00:00, 2.90it/s]
2025-10-09 09:14:37 | INFO | model_worker_pp0 | Created model
Capturing cudagraphs for model_worker_pp0: 0%| | 0/8 [00:00<?, ?it/s]
2025-10-09 09:14:37,075 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
2025-10-09 09:14:37,093 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
2025-10-09 09:14:37,169 - INFO - flashinfer.jit: Loading JIT ops: sampling
2025-10-09 09:14:37,185 - INFO - flashinfer.jit: Finished loading JIT ops: sampling
Capturing cudagraphs for model_worker_pp1: 100%|██████████| 8/8 [00:02<00:00, 3.54it/s]
Warmup loop for model_worker_pp1: 0%| | 0/84 [00:00<?, ?it/s]
2025-10-09 09:14:39,117 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
2025-10-09 09:14:39,140 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
2025-10-09 09:14:39,149 - INFO - flashinfer.jit: Loading JIT ops: page
Capturing cudagraphs for model_worker_pp0: 88%|████████▎ | 7/8 [00:02<00:00, 3.69it/s]
2025-10-09 09:14:39,163 - INFO - flashinfer.jit: Finished loading JIT ops: page
Capturing cudagraphs for model_worker_pp2: 100%|██████████| 8/8 [00:02<00:00, 3.41it/s]
Warmup loop for model_worker_pp2: 0%| | 0/84 [00:00<?, ?it/s]
2025-10-09 09:14:39,213 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
Capturing cudagraphs for model_worker_pp3: 100%|██████████| 8/8 [00:02<00:00, 3.38it/s]
2025-10-09 09:14:39,229 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
2025-10-09 09:14:39,238 - INFO - flashinfer.jit: Loading JIT ops: page
2025-10-09 09:14:39,251 - INFO - flashinfer.jit: Finished loading JIT ops: page
Warmup loop for model_worker_pp3: 0%| | 0/84 [00:00<?, ?it/s]
2025-10-09 09:14:39,267 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
2025-10-09 09:14:39,282 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
2025-10-09 09:14:39,291 - INFO - flashinfer.jit: Loading JIT ops: page
2025-10-09 09:14:39,304 - INFO - flashinfer.jit: Finished loading JIT ops: page
Capturing cudagraphs for model_worker_pp0: 100%|██████████| 8/8 [00:02<00:00, 3.45it/s]
Warmup loop for model_worker_pp0: 0%| | 0/84 [00:00<?, ?it/s]
2025-10-09 09:14:39,425 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
Warmup loop for model_worker_pp1: 1%|▏ | 1/84 [00:00<00:26, 3.13it/s]
2025-10-09 09:14:39,441 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
2025-10-09 09:14:39,451 - INFO - flashinfer.jit: Loading JIT ops: page
2025-10-09 09:14:39,463 - INFO - flashinfer.jit: Finished loading JIT ops: page
Warmup loop for model_worker_pp1: 27%|██▊
Traceback (most recent call last):
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/utils.py", line 502, in wrapper
return func(*args, **kwargs)
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/model/pipeline_worker.py", line 278, in pipeline_worker_model_loop
setup_and_run_loop(
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/model/utils.py", line 896, in setup_and_run_loop
run_warmup_batches(
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/model/utils.py", line 483, in run_warmup_batches
run_overlapped_loop(
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/model/utils.py", line 296, in run_overlapped_loop
run_model(run_work)
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/model/pipeline_worker.py", line 213, in run_model
output_batch_state = model_runner.run(
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/model/utils.py", line 849, in run
return self.graphs[graph_index].run(input_batch_state, non_blocking)
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/model/utils.py", line 618, in run
assert self.output_batch_state.outputs is not None
AssertionError
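My guess at what the assert is guarding: in an overlapped pipeline loop, each graph run reads the previous run's outputs, so the very first warmup run finds output_batch_state.outputs still at its initial None. A minimal model of that pattern (all names here are mine, not from the tokasaurus codebase):

```python
# Hypothetical sketch of the overlapped-loop hazard, not actual tokasaurus code.
class GraphState:
    def __init__(self):
        self.outputs = None  # nothing has been produced yet


class Graph:
    def __init__(self):
        self.output_batch_state = GraphState()

    def run(self, batch):
        # Reads the *previous* run's outputs before producing this run's.
        prev = self.output_batch_state.outputs
        self.output_batch_state.outputs = f"result({batch})"
        return prev


g = Graph()
print(g.run("warmup-0"))  # None -> the state the assert rejects
print(g.run("warmup-1"))  # result(warmup-0)
```

So it looks like with pp_size > 1 the first warmup batch reaches run() before any outputs have been produced, while the tp_size schedule never hits that state.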

Error 2: if I comment out the assertion, it then fails on the next replace() call:

input_batch_state.outputs = replace(self.output_batch_state.outputs)

File "/opt/conda/envs/tokasaurus/lib/python3.10/dataclasses.py", line 1424, in replace
raise TypeError("replace() should be called on dataclass instances")
TypeError: replace() should be called on dataclass instances
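Both errors point at the same root cause: self.output_batch_state.outputs is None. The assert catches it directly, and once the assert is removed, dataclasses.replace() trips over the same None, since replace() only accepts dataclass instances. A quick standalone check (BatchOutputs is a made-up stand-in for the real outputs dataclass):

```python
from dataclasses import dataclass, replace


@dataclass
class BatchOutputs:
    # made-up stand-in for the real outputs dataclass
    tokens: list


# Normal case: replace() with no field changes shallow-copies the instance.
ok = replace(BatchOutputs(tokens=[1, 2, 3]))
print(ok.tokens)  # [1, 2, 3]

# Failure case: outputs was never populated and is still None.
try:
    replace(None)
except TypeError as e:
    print(e)  # on Python 3.10: replace() should be called on dataclass instances
```

So commenting out the assert just moves the failure one line down; the real question is why outputs is never populated when running with pp_size > 1.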
