
LLaDA2.0-mini torch error #28

@AIxyz

python3 /code/dInfer/benchmarks/benchmark.py --model_name /mnt/tenant-home_speed/shared/models/huggingface/LLaDA2.0-mini--572899f-C8 --gen_len 2048 --block_length 32 --gpu 0 --parallel_decoding threshold --threshold 0.9 --cache prefix --use_bd

I am using dInfer v0.2.0 on a single H800-80G GPU to test LLaDA2.0-mini with the command above, but it fails with the error below. What could be the reason for this? @zheng-da

/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]                                                                                                                                                       
INFO 12-21 23:08:34 [__init__.py:216] Automatically detected platform cuda.                                                                                                                   
The input args are listed as follows: Namespace(model_name='/mnt/tenant-home_speed/shared/models/huggingface/LLaDA2.0-mini--572899f-C8', gpu='0', gen_len=2048, prefix_look=0, after_look=0, block_length=32, threshold=0.9, warmup_times=0, low_threshold=0.3, cont_weight=0, parallel_decoding='threshold', use_credit=False, exp_name='exp', cache='prefix', use_tp=False, use_shift=False, use_bd=True, model_type='llada2')
started 1 0 0 Namespace(model_name='/mnt/tenant-home_speed/shared/models/huggingface/LLaDA2.0-mini--572899f-C8', gpu='0', gen_len=2048, prefix_look=0, after_look=0, block_length=32, threshold=0.9, warmup_times=0, low_threshold=0.3, cont_weight=0, parallel_decoding='threshold', use_credit=False, exp_name='exp', cache='prefix', use_tp=False, use_shift=False, use_bd=True, model_type='llada2', tp_size=1, port_offset=0)
WARNING 12-21 23:08:36 [__init__.py:3804] Current vLLM config is not set.                                                                                                                     
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0                                                                                                    
WARNING 12-21 23:08:36 [__init__.py:3804] Current vLLM config is not set.                                                                                                                     
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0                                                                                                    
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0                                                                                                    
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0                                                                                                    
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0                                                                                                    
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0                                                                                                    
INFO 12-21 23:08:36 [parallel_state.py:1165] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0                                                                 
[Loading model]                                                                                                                                                                               
EP Enabled: True                                                                                                                                                                              
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 38.08it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14813/14813 [00:00<00:00, 1478270.36it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:13<00:00,  1.45it/s]
unused_keys []                                                                                                                                                                                
not_inited_keys []                                                                                                                                                                            
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 259/259 [00:00<00:00, 348.66it/s]
/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py:282: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(                                                                                                                                                                              
[rank0]: Traceback (most recent call last):                                                                                                                                                   
[rank0]:   File "/code/dInfer/benchmarks/benchmark.py", line 209, in <module>                                                                                                                 
[rank0]:     main(1, 0, gpus[0], args)                                                                                                                                                        
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context                                                                           
[rank0]:     return func(*args, **kwargs)                                                                                                                                                     
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                     
[rank0]:   File "/code/dInfer/benchmarks/benchmark.py", line 135, in main                                                                                                                     
[rank0]:     dllm.generate(input_ids, gen_length=gen_length, block_length=block_length)                                                                                                       
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
[rank0]:     return func(*args, **kwargs)                                                                                                                                                     
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                     
[rank0]:   File "/code/dInfer/python/dinfer/decoding/generate_uniform.py", line 1074, in naive_batching_generate                                                                              
[rank0]:     self.block_runner.prefill(self.model, x[:, :prefill_length], kv_cache, pos_ids[:, :prefill_length], bd_attn_mask[:,:prefill_length,:prefill_length], self.prefilling_limit, block_length)
[rank0]:   File "/code/dInfer/python/dinfer/decoding/generate_uniform.py", line 168, in prefill                                                                                               
[rank0]:     output = model(prefilling_x.clone(memory_format=torch.contiguous_format), use_cache=True, attention_mask=attn_mask, position_ids=pos_ids.clone(memory_format=torch.contiguous_for
mat))                                                                                                                                                                                         
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^                                                                                                                                                                                         
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl                                                                        
[rank0]:     return self._call_impl(*args, **kwargs)                                                                                                                                          
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                          
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl                                                                                
[rank0]:     return forward_call(*args, **kwargs)                                                                                                                                             
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                             
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 749, in compile_wrapper                                                                           
[rank0]:     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1                                                                                                            
[rank0]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                         
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 923, in _compile_fx_inner
[rank0]:     raise InductorError(e, currentframe()).with_traceback(
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 907, in _compile_fx_inner
[rank0]:     mb_compiled_graph = fx_codegen_and_compile(
[rank0]:                         ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1578, in fx_codegen_and_compile
[rank0]:     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1236, in codegen_and_compile
[rank0]:     _recursive_post_grad_passes(gm, is_inference=is_inference)
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 504, in _recursive_post_grad_passes
[rank0]:     post_grad_passes(gm, is_inference) 
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 208, in post_grad_passes
[rank0]:     GraphTransformObserver(gm, "decompose_auto_functionalized").apply_graph_pass(
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/fx/passes/graph_transform_observer.py", line 85, in apply_graph_pass
[rank0]:     return pass_fn(self.gm.graph)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1232, in decompose_auto_functionalized
[rank0]:     graph_pass.apply(graph)
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/pattern_matcher.py", line 1963, in apply
[rank0]:     entry.apply(m, graph, node)
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/pattern_matcher.py", line 1115, in apply
[rank0]:     self.handler(match, *match.args, **match.kwargs)
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1230, in _
[rank0]:     match.replace_by_example(decomp, flat_args, run_functional_passes=False)
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/pattern_matcher.py", line 309, in replace_by_example
[rank0]:     assert len(graph_with_eager_vals.graph.nodes) == len(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: torch._inductor.exc.InductorError: AssertionError: 

[rank0]: Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"

[rank0]:[W1221 23:09:28.578579446 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
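
The error message above asks for a rerun with `TORCHDYNAMO_VERBOSE=1` and `TORCH_LOGS="+dynamo"` to get the internal stack trace. A minimal sketch of one way to do that from Python follows. The helper names `build_debug_env` and `rerun` are hypothetical and not part of dInfer. The optional `TORCHDYNAMO_DISABLE=1` switch is an assumption on my side: it is meant to check whether the failure disappears when `torch.compile` is bypassed, and it may not behave the same on every PyTorch version.

```python
import os
import subprocess

# Environment variables named in the PyTorch error message above.
DEBUG_VARS = {
    "TORCHDYNAMO_VERBOSE": "1",  # print the internal Dynamo stack trace
    "TORCH_LOGS": "+dynamo",     # extra developer-level Dynamo logging
}


def build_debug_env(base_env=None, disable_compile=False):
    """Return a copy of the environment with the debug variables set.

    base_env defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update(DEBUG_VARS)
    if disable_compile:
        # Assumption: TORCHDYNAMO_DISABLE=1 makes torch.compile fall back to
        # eager execution, which helps tell a compiler bug from a model bug.
        env["TORCHDYNAMO_DISABLE"] = "1"
    return env


def rerun(cmd, disable_compile=False):
    """Run cmd (a list of strings) with the debug environment.

    Returns the exit code of the child process.
    """
    env = build_debug_env(disable_compile=disable_compile)
    return subprocess.run(cmd, env=env).returncode
```

For this report, `cmd` would be the benchmark command at the top of the issue split into a list, for example `["python3", "/code/dInfer/benchmarks/benchmark.py", "--model_name", ...]` with the remaining flags unchanged. Running it once with `disable_compile=False` and once with `disable_compile=True` should show whether the `AssertionError` is specific to the Inductor compilation path.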
