Hi there,
Wonderful work, and thanks for sharing!
I tried running tokasaurus with the Qwen2.5 32B model. Running with tp_size works well, but pp_size fails.
Hardware: Hopper GPUs with NVLink, pp_size = 2, and sufficient GPU memory.
Error 1:
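(Side note: the repeated TORCH_CUDA_ARCH_LIST warnings in the logs below appear unrelated to the failure; they can be silenced by pinning the arch list before launch. A minimal sketch, assuming Hopper, i.e. compute capability 9.0:)

```python
import os

# Hopper GPUs are compute capability 9.0 (sm_90); restricting the arch list
# stops torch.utils.cpp_extension from compiling for every visible arch and
# silences the repeated warnings.
os.environ["TORCH_CUDA_ARCH_LIST"] = "9.0"
```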
Starting 7 processes: ['model_worker_pp0', 'model_worker_pp1', 'model_worker_pp2', 'model_worker_pp3', 'fanout_worker', 'manager', 'server']
Running in the main process: server
2025-10-09 09:14:22 | INFO | server | Starting web server
W1009 09:14:27.610000 297511 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W1009 09:14:27.610000 297511 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
W1009 09:14:28.191000 297514 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W1009 09:14:28.191000 297514 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
W1009 09:14:28.300000 297513 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W1009 09:14:28.300000 297513 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
W1009 09:14:28.433000 297510 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W1009 09:14:28.433000 297510 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
W1009 09:14:28.454000 297512 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W1009 09:14:28.454000 297512 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
W1009 09:14:28.946000 297515 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W1009 09:14:28.946000 297515 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-10-09 09:14:29 | INFO | model_worker_pp1 | Pipeline worker 1 started!
2025-10-09 09:14:30 | INFO | fanout_worker | Fanout worker started!
2025-10-09 09:14:30 | INFO | model_worker_pp3 | Pipeline worker 3 started!
2025-10-09 09:14:30 | INFO | model_worker_pp0 | Pipeline worker 0 started!
2025-10-09 09:14:30 | INFO | model_worker_pp2 | Pipeline worker 2 started!
2025-10-09 09:14:30 | INFO | manager | Manager started
2025-10-09 09:14:31 | INFO | model_worker_pp2 | Creating model on device cuda:2 with dtype torch.bfloat16
Building layers 32 to 48
Loading from safetensors
Loading safetensors files: 0%| | 0/14 [00:00<?, ?it/s]2025-10-09 09:14:31 | INFO | model_worker_pp1 | Creating model on device cuda:1 with dtype torch.bfloat16
2025-10-09 09:14:31 | INFO | model_worker_pp3 | Creating model on device cuda:3 with dtype torch.bfloat16
Building layers 16 to 32
2025-10-09 09:14:31 | INFO | model_worker_pp0 | Creating model on device cuda:0 with dtype torch.bfloat16
Building layers 48 to 64
Building layers 0 to 16
Loading from safetensors
Loading safetensors files: 0%| | 0/14 [00:00<?, ?it/s]Loading from safetensors
Loading safetensors files: 0%| | 0/14 [00:00<?, ?it/s]Loading from safetensors
Loading safetensors files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:05<00:00, 2.65it/s]
Loading safetensors files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:04<00:00, 2.95it/s]
Loading safetensors files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:04<00:00, 2.97it/s]
2025-10-09 09:14:36 | INFO | model_worker_pp1 | Created model
2025-10-09 09:14:36 | INFO | model_worker_pp2 | Created model
Capturing cudagraphs for model_worker_pp1: 0%| | 0/8 [00:00<?, ?it/s]2025-10-09 09:14:36,832 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
[rank2]:W1009 09:14:36.832000 297512 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank2]:W1009 09:14:36.832000 297512 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-10-09 09:14:36,834 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
[rank1]:W1009 09:14:36.835000 297511 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank1]:W1009 09:14:36.835000 297511 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-10-09 09:14:36 | INFO | model_worker_pp3 | Created model
Capturing cudagraphs for model_worker_pp3: 0%| | 0/8 [00:00<?, ?it/s][rank2]:W1009 09:14:36.860000 297512 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank2]:W1009 09:14:36.860000 297512 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-10-09 09:14:36,869 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
2025-10-09 09:14:36,872 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
[rank3]:W1009 09:14:36.872000 297513 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank3]:W1009 09:14:36.872000 297513 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
[rank3]:W1009 09:14:36.880000 297513 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank3]:W1009 09:14:36.880000 297513 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-10-09 09:14:36,889 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
[rank1]:W1009 09:14:36.946000 297511 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank1]:W1009 09:14:36.946000 297511 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-10-09 09:14:36,955 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
Loading safetensors files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:04<00:00, 2.90it/s]
2025-10-09 09:14:37 | INFO | model_worker_pp0 | Created model
Capturing cudagraphs for model_worker_pp0: 0%| | 0/8 [00:00<?, ?it/s]2025-10-09 09:14:37,075 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
[rank0]:W1009 09:14:37.075000 297510 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank0]:W1009 09:14:37.075000 297510 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
[rank0]:W1009 09:14:37.083000 297510 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank0]:W1009 09:14:37.083000 297510 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-10-09 09:14:37,093 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
2025-10-09 09:14:37,169 - INFO - flashinfer.jit: Loading JIT ops: sampling
[rank3]:W1009 09:14:37.169000 297513 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank3]:W1009 09:14:37.169000 297513 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
[rank3]:W1009 09:14:37.176000 297513 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank3]:W1009 09:14:37.176000 297513 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-10-09 09:14:37,185 - INFO - flashinfer.jit: Finished loading JIT ops: sampling
Capturing cudagraphs for model_worker_pp1: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:02<00:00, 3.54it/s]
Warmup loop for model_worker_pp1: 0%| | 0/84 [00:00<?, ?it/s]2025-10-09 09:14:39,117 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
[rank1]:W1009 09:14:39.117000 297511 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank1]:W1009 09:14:39.117000 297511 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
[rank1]:W1009 09:14:39.125000 297511 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank1]:W1009 09:14:39.125000 297511 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-10-09 09:14:39,140 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
2025-10-09 09:14:39,149 - INFO - flashinfer.jit: Loading JIT ops: page
[rank1]:W1009 09:14:39.150000 297511 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank1]:W1009 09:14:39.150000 297511 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
[rank1]:W1009 09:14:39.156000 297511 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank1]:W1009 09:14:39.156000 297511 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
Capturing cudagraphs for model_worker_pp0: 88%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 7/8 [00:02<00:00, 3.69it/s]2025-10-09 09:14:39,163 - INFO - flashinfer.jit: Finished loading JIT ops: page
Capturing cudagraphs for model_worker_pp2: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:02<00:00, 3.41it/s]
Warmup loop for model_worker_pp2: 0%| | 0/84 [00:00<?, ?it/s]2025-10-09 09:14:39,213 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
[rank2]:W1009 09:14:39.213000 297512 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank2]:W1009 09:14:39.213000 297512 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
[rank2]:W1009 09:14:39.220000 297512 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank2]:W1009 09:14:39.220000 297512 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
Capturing cudagraphs for model_worker_pp3: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:02<00:00, 3.38it/s]
2025-10-09 09:14:39,229 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
2025-10-09 09:14:39,238 - INFO - flashinfer.jit: Loading JIT ops: page
[rank2]:W1009 09:14:39.238000 297512 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank2]:W1009 09:14:39.238000 297512 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
[rank2]:W1009 09:14:39.244000 297512 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank2]:W1009 09:14:39.244000 297512 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-10-09 09:14:39,251 - INFO - flashinfer.jit: Finished loading JIT ops: page
Warmup loop for model_worker_pp3: 0%| | 0/84 [00:00<?, ?it/s]2025-10-09 09:14:39,267 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
[rank3]:W1009 09:14:39.267000 297513 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank3]:W1009 09:14:39.267000 297513 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
[rank3]:W1009 09:14:39.274000 297513 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank3]:W1009 09:14:39.274000 297513 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-10-09 09:14:39,282 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
2025-10-09 09:14:39,291 - INFO - flashinfer.jit: Loading JIT ops: page
[rank3]:W1009 09:14:39.291000 297513 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank3]:W1009 09:14:39.291000 297513 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
[rank3]:W1009 09:14:39.297000 297513 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank3]:W1009 09:14:39.297000 297513 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-10-09 09:14:39,304 - INFO - flashinfer.jit: Finished loading JIT ops: page
Capturing cudagraphs for model_worker_pp0: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:02<00:00, 3.45it/s]
Warmup loop for model_worker_pp0: 0%| | 0/84 [00:00<?, ?it/s]2025-10-09 09:14:39,425 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
[rank0]:W1009 09:14:39.425000 297510 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank0]:W1009 09:14:39.425000 297510 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
Warmup loop for model_worker_pp1: 1%|█▊ | 1/84 [00:00<00:26, 3.13it/s][rank0]:W1009 09:14:39.432000 297510 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank0]:W1009 09:14:39.432000 297510 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-10-09 09:14:39,441 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
2025-10-09 09:14:39,451 - INFO - flashinfer.jit: Loading JIT ops: page
[rank0]:W1009 09:14:39.451000 297510 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank0]:W1009 09:14:39.451000 297510 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
[rank0]:W1009 09:14:39.457000 297510 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank0]:W1009 09:14:39.457000 297510 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-10-09 09:14:39,463 - INFO - flashinfer.jit: Finished loading JIT ops: page
Warmup loop for model_worker_pp1: 27%|████████████████████████████████████████▊
Traceback (most recent call last):
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/utils.py", line 502, in wrapper
return func(*args, **kwargs)
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/model/pipeline_worker.py", line 278, in pipeline_worker_model_loop
setup_and_run_loop(
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/model/utils.py", line 896, in setup_and_run_loop
run_warmup_batches(
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/model/utils.py", line 483, in run_warmup_batches
run_overlapped_loop(
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/model/utils.py", line 296, in run_overlapped_loop
run_model(run_work)
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/model/pipeline_worker.py", line 213, in run_model
output_batch_state = model_runner.run(
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/model/utils.py", line 849, in run
return self.graphs[graph_index].run(input_batch_state, non_blocking)
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/model/utils.py", line 618, in run
assert self.output_batch_state.outputs is not None
AssertionError
Error 2: if I comment out the assertion, it then fails on the next replace() call:
input_batch_state.outputs = replace(self.output_batch_state.outputs)
File "/opt/conda/envs/tokasaurus/lib/python3.10/dataclasses.py", line 1424, in replace
raise TypeError("replace() should be called on dataclass instances")
TypeError: replace() should be called on dataclass instances
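For what it's worth, Error 2 looks like a downstream symptom of the same root cause as Error 1: `self.output_batch_state.outputs` is still `None` when the next stage tries to copy it, and `dataclasses.replace` rejects anything that is not a dataclass instance. A minimal sketch (the `Outputs` class here is hypothetical, just to reproduce the exception):

```python
from dataclasses import dataclass, replace


@dataclass
class Outputs:
    logits: object = None


# Normal case: replace() shallow-copies a dataclass instance.
copied = replace(Outputs())

# Failure case: outputs was never populated, so it is still None.
try:
    replace(None)
except TypeError as e:
    # Same message as in the traceback above:
    # replace() should be called on dataclass instances
    print(e)
```

So commenting out the assertion only moves the crash later; the real question is why the pipeline worker's output batch state has no outputs at that point in the warmup loop.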
Hi, there,
wonderful work and thanks for sharing.
I tried to run toka with Qwen2.5 32b model, tp_size works well, but pp_size failed.
Hardware: Hopper GPUs with NVLink. pp_size = 2. GPU memory is enough.
Error1:
Starting 7 processes: ['model_worker_pp0', 'model_worker_pp1', 'model_worker_pp2', 'model_worker_pp3', 'fanout_worker', 'manager', 'server']
Running in the main process: server
2025-10-09 09:14:22 | INFO | server | Starting web server
W1009 09:14:27.610000 297511 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W1009 09:14:27.610000 297511 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
W1009 09:14:28.191000 297514 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W1009 09:14:28.191000 297514 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
W1009 09:14:28.300000 297513 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W1009 09:14:28.300000 297513 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
W1009 09:14:28.433000 297510 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W1009 09:14:28.433000 297510 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
W1009 09:14:28.454000 297512 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W1009 09:14:28.454000 297512 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
W1009 09:14:28.946000 297515 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W1009 09:14:28.946000 297515 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-10-09 09:14:29 | INFO | model_worker_pp1 | Pipeline worker 1 started!
2025-10-09 09:14:30 | INFO | fanout_worker | Fanout worker started!
2025-10-09 09:14:30 | INFO | model_worker_pp3 | Pipeline worker 3 started!
2025-10-09 09:14:30 | INFO | model_worker_pp0 | Pipeline worker 0 started!
2025-10-09 09:14:30 | INFO | model_worker_pp2 | Pipeline worker 2 started!
2025-10-09 09:14:30 | INFO | manager | Manager started
2025-10-09 09:14:31 | INFO | model_worker_pp2 | Creating model on device cuda:2 with dtype torch.bfloat16
Building layers 32 to 48
Loading from safetensors
Loading safetensors files: 0%| | 0/14 [00:00<?, ?it/s]2025-10-09 09:14:31 | INFO | model_worker_pp1 | Creating model on device cuda:1 with dtype torch.bfloat16
2025-10-09 09:14:31 | INFO | model_worker_pp3 | Creating model on device cuda:3 with dtype torch.bfloat16
Building layers 16 to 32
2025-10-09 09:14:31 | INFO | model_worker_pp0 | Creating model on device cuda:0 with dtype torch.bfloat16
Building layers 48 to 64
Building layers 0 to 16
Loading from safetensors
Loading safetensors files: 0%| | 0/14 [00:00<?, ?it/s]Loading from safetensors
Loading safetensors files: 0%| | 0/14 [00:00<?, ?it/s]Loading from safetensors
Loading safetensors files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:05<00:00, 2.65it/s]
Loading safetensors files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:04<00:00, 2.95it/s]
Loading safetensors files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:04<00:00, 2.97it/s]
2025-10-09 09:14:36 | INFO | model_worker_pp1 | Created model
2025-10-09 09:14:36 | INFO | model_worker_pp2 | Created model
Capturing cudagraphs for model_worker_pp1: 0%| | 0/8 [00:00<?, ?it/s]2025-10-09 09:14:36,832 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
[rank2]:W1009 09:14:36.832000 297512 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank2]:W1009 09:14:36.832000 297512 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-10-09 09:14:36,834 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
[rank1]:W1009 09:14:36.835000 297511 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank1]:W1009 09:14:36.835000 297511 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-10-09 09:14:36 | INFO | model_worker_pp3 | Created model
Capturing cudagraphs for model_worker_pp3: 0%| | 0/8 [00:00<?, ?it/s][rank2]:W1009 09:14:36.860000 297512 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank2]:W1009 09:14:36.860000 297512 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-10-09 09:14:36,869 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
2025-10-09 09:14:36,872 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
[rank3]:W1009 09:14:36.872000 297513 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank3]:W1009 09:14:36.872000 297513 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
[rank3]:W1009 09:14:36.880000 297513 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank3]:W1009 09:14:36.880000 297513 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-10-09 09:14:36,889 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
[rank1]:W1009 09:14:36.946000 297511 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank1]:W1009 09:14:36.946000 297511 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-10-09 09:14:36,955 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
Loading safetensors files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:04<00:00, 2.90it/s]
2025-10-09 09:14:37 | INFO | model_worker_pp0 | Created model
Capturing cudagraphs for model_worker_pp0: 0%| | 0/8 [00:00<?, ?it/s]2025-10-09 09:14:37,075 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
[rank0]:W1009 09:14:37.075000 297510 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank0]:W1009 09:14:37.075000 297510 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
[rank0]:W1009 09:14:37.083000 297510 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank0]:W1009 09:14:37.083000 297510 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-10-09 09:14:37,093 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False
2025-10-09 09:14:37,169 - INFO - flashinfer.jit: Loading JIT ops: sampling
2025-10-09 09:14:37,185 - INFO - flashinfer.jit: Finished loading JIT ops: sampling
Capturing cudagraphs for model_worker_pp1: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:02<00:00, 3.54it/s]
Warmup loop for model_worker_pp1: 0%| | 0/84 [00:00<?, ?it/s]
2025-10-09 09:14:39,117 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
2025-10-09 09:14:39,140 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
2025-10-09 09:14:39,149 - INFO - flashinfer.jit: Loading JIT ops: page
Capturing cudagraphs for model_worker_pp0: 88%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 7/8 [00:02<00:00, 3.69it/s]
2025-10-09 09:14:39,163 - INFO - flashinfer.jit: Finished loading JIT ops: page
Capturing cudagraphs for model_worker_pp2: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:02<00:00, 3.41it/s]
Warmup loop for model_worker_pp2: 0%| | 0/84 [00:00<?, ?it/s]
2025-10-09 09:14:39,213 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
Capturing cudagraphs for model_worker_pp3: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:02<00:00, 3.38it/s]
2025-10-09 09:14:39,229 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
2025-10-09 09:14:39,238 - INFO - flashinfer.jit: Loading JIT ops: page
2025-10-09 09:14:39,251 - INFO - flashinfer.jit: Finished loading JIT ops: page
Warmup loop for model_worker_pp3: 0%| | 0/84 [00:00<?, ?it/s]
2025-10-09 09:14:39,267 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
2025-10-09 09:14:39,282 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
2025-10-09 09:14:39,291 - INFO - flashinfer.jit: Loading JIT ops: page
2025-10-09 09:14:39,304 - INFO - flashinfer.jit: Finished loading JIT ops: page
Capturing cudagraphs for model_worker_pp0: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:02<00:00, 3.45it/s]
Warmup loop for model_worker_pp0: 0%| | 0/84 [00:00<?, ?it/s]
2025-10-09 09:14:39,425 - INFO - flashinfer.jit: Loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
Warmup loop for model_worker_pp1: 1%|█▊ | 1/84 [00:00<00:26, 3.13it/s]
2025-10-09 09:14:39,441 - INFO - flashinfer.jit: Finished loading JIT ops: batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90
2025-10-09 09:14:39,451 - INFO - flashinfer.jit: Loading JIT ops: page
2025-10-09 09:14:39,463 - INFO - flashinfer.jit: Finished loading JIT ops: page
Warmup loop for model_worker_pp1: 27%|████████████████████████████████████████▊
Traceback (most recent call last):
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/utils.py", line 502, in wrapper
return func(*args, **kwargs)
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/model/pipeline_worker.py", line 278, in pipeline_worker_model_loop
setup_and_run_loop(
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/model/utils.py", line 896, in setup_and_run_loop
run_warmup_batches(
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/model/utils.py", line 483, in run_warmup_batches
run_overlapped_loop(
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/model/utils.py", line 296, in run_overlapped_loop
run_model(run_work)
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/model/pipeline_worker.py", line 213, in run_model
output_batch_state = model_runner.run(
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/model/utils.py", line 849, in run
return self.graphs[graph_index].run(input_batch_state, non_blocking)
File "/opt/conda/envs/tokasaurus/lib/python3.10/site-packages/tokasaurus/model/utils.py", line 618, in run
assert self.output_batch_state.outputs is not None
AssertionError
Error2: if I comment out that assertion as an experiment, it then fails on the next `replace()` call:
input_batch_state.outputs = replace(self.output_batch_state.outputs)
File "/opt/conda/envs/tokasaurus/lib/python3.10/dataclasses.py", line 1424, in replace
raise TypeError("replace() should be called on dataclass instances")
TypeError: replace() should be called on dataclass instances
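Both errors seem to share one root cause: on this rank, `output_batch_state.outputs` is still `None` when the warmup loop reads it, and `dataclasses.replace()` only accepts dataclass instances, so commenting out the assertion just moves the failure one line later. A minimal sketch (using a hypothetical `Outputs` dataclass, not the actual tokasaurus type) reproduces the second error in isolation:

```python
from dataclasses import dataclass, is_dataclass, replace

# Hypothetical stand-in for the outputs dataclass tokasaurus passes between
# pipeline stages; only the shape matters for this repro.
@dataclass
class Outputs:
    logits: int = 0

# replace() works on a populated dataclass instance...
copied = replace(Outputs(logits=1))
assert is_dataclass(copied)

# ...but if a pipeline stage never filled in `outputs`, the field is still
# None, and replace(None) raises the same TypeError as in the traceback.
try:
    replace(None)
    err = None
except TypeError as exc:
    err = str(exc)

print(err)
```

So the assertion at `model/utils.py:618` is not the bug itself; it is catching the missing `outputs` earlier, which suggests the pp>1 path never populated `output_batch_state.outputs` for this worker.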