(EngineCore_DP0 pid=272) 2025-12-22 19:08:35.000109: 272 [INFO]: Compilation Successfully Completed for model.MODULE_4aae3bed9043a81c0125+97c2cc02.hlo_module.pb
(EngineCore_DP0 pid=272) INFO:Neuron:Done compilation for the priority HLO in 106.28764724731445 seconds
(EngineCore_DP0 pid=272) INFO:Neuron:Updating the hlo module with optimized layout
(EngineCore_DP0 pid=280) 2025-12-22 19:08:36.000096: 280 [INFO]: Using a cached neff at /cache/neuronxcc-2.22.12471.0+b4a00d10/MODULE_4aae3bed9043a81c0125+97c2cc02/model.neff
(EngineCore_DP0 pid=280) INFO:Neuron:Done compilation for the priority HLO in 106.46791672706604 seconds
(EngineCore_DP0 pid=280) INFO:Neuron:Updating the hlo module with optimized layout
(EngineCore_DP0 pid=276) 2025-12-22 19:08:36.000776: 276 [INFO]: Using a cached neff at /cache/neuronxcc-2.22.12471.0+b4a00d10/MODULE_4aae3bed9043a81c0125+97c2cc02/model.neff
(EngineCore_DP0 pid=276) INFO:Neuron:Done compilation for the priority HLO in 106.47657084465027 seconds
(EngineCore_DP0 pid=276) INFO:Neuron:Updating the hlo module with optimized layout
(EngineCore_DP0 pid=272) INFO:Neuron:Updating the hlo module with optimized layout
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=280) Process EngineCore_DP0:
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] self._init_executor()
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 55, in _init_executor
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] self.collective_rpc("load_model")
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] return func(*args, **kwargs)
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm_neuron/worker/neuron_worker.py", line 86, in load_model
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] self.model_runner.load_model()
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm_neuron/worker/neuronx_distributed_model_runner.py", line 221, in load_model
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] self.model = get_neuron_model(
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] ^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm_neuron/worker/neuronx_distributed_model_loader.py", line 714, in get_neuron_model
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] model.load_weights(model_name_or_path=model_config.model,
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm_neuron/worker/neuronx_distributed_model_loader.py", line 394, in load_weights
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] self._compile_and_load_model(model_name_or_path, neuronx_model_cls,
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm_neuron/worker/neuronx_distributed_model_loader.py", line 240, in _compile_and_load_model
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] self.model.compile(compiled_path)
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/neuronx_distributed_inference/models/application_base.py", line 302, in compile
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] traced_model = self.get_builder(debug).trace(
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/neuronx_distributed/trace/model_builder.py", line 680, in trace
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] self._add_layout_optimization_to_remaining_hlo()
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/neuronx_distributed/trace/model_builder.py", line 1196, in _add_layout_optimization_to_remaining_hlo
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] os.remove(original_hlo_file_name)
(EngineCore_DP0 pid=280) ERROR 12-22 19:08:38 [core.py:708] FileNotFoundError: [Errno 2] No such file or directory: 'context_encoding_model_0.hlo'
(EngineCore_DP0 pid=280) Traceback (most recent call last):
(EngineCore_DP0 pid=280) File "/opt/conda/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=280) self.run()
(EngineCore_DP0 pid=280) File "/opt/conda/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=280) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=280) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=280) raise e
(EngineCore_DP0 pid=280) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=280) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=280) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=280) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=280) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=280) self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=280) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280) File "/opt/conda/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=280) self._init_executor()
(EngineCore_DP0 pid=280) File "/opt/conda/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 55, in _init_executor
(EngineCore_DP0 pid=280) self.collective_rpc("load_model")
(EngineCore_DP0 pid=280) File "/opt/conda/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=280) return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=280) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280) File "/opt/conda/lib/python3.12/site-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=280) return func(*args, **kwargs)
(EngineCore_DP0 pid=280) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280) File "/opt/conda/lib/python3.12/site-packages/vllm_neuron/worker/neuron_worker.py", line 86, in load_model
(EngineCore_DP0 pid=280) self.model_runner.load_model()
(EngineCore_DP0 pid=280) File "/opt/conda/lib/python3.12/site-packages/vllm_neuron/worker/neuronx_distributed_model_runner.py", line 221, in load_model
(EngineCore_DP0 pid=280) self.model = get_neuron_model(
(EngineCore_DP0 pid=280) ^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280) File "/opt/conda/lib/python3.12/site-packages/vllm_neuron/worker/neuronx_distributed_model_loader.py", line 714, in get_neuron_model
(EngineCore_DP0 pid=280) model.load_weights(model_name_or_path=model_config.model,
(EngineCore_DP0 pid=280) File "/opt/conda/lib/python3.12/site-packages/vllm_neuron/worker/neuronx_distributed_model_loader.py", line 394, in load_weights
(EngineCore_DP0 pid=280) self._compile_and_load_model(model_name_or_path, neuronx_model_cls,
(EngineCore_DP0 pid=280) File "/opt/conda/lib/python3.12/site-packages/vllm_neuron/worker/neuronx_distributed_model_loader.py", line 240, in _compile_and_load_model
(EngineCore_DP0 pid=280) self.model.compile(compiled_path)
(EngineCore_DP0 pid=280) File "/opt/conda/lib/python3.12/site-packages/neuronx_distributed_inference/models/application_base.py", line 302, in compile
(EngineCore_DP0 pid=280) traced_model = self.get_builder(debug).trace(
(EngineCore_DP0 pid=280) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280) File "/opt/conda/lib/python3.12/site-packages/neuronx_distributed/trace/model_builder.py", line 680, in trace
(EngineCore_DP0 pid=280) self._add_layout_optimization_to_remaining_hlo()
(EngineCore_DP0 pid=280) File "/opt/conda/lib/python3.12/site-packages/neuronx_distributed/trace/model_builder.py", line 1196, in _add_layout_optimization_to_remaining_hlo
(EngineCore_DP0 pid=280) os.remove(original_hlo_file_name)
(EngineCore_DP0 pid=280) FileNotFoundError: [Errno 2] No such file or directory: 'context_encoding_model_0.hlo'
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=276) Process EngineCore_DP0:
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] self._init_executor()
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 55, in _init_executor
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] self.collective_rpc("load_model")
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] return func(*args, **kwargs)
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm_neuron/worker/neuron_worker.py", line 86, in load_model
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] self.model_runner.load_model()
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm_neuron/worker/neuronx_distributed_model_runner.py", line 221, in load_model
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] self.model = get_neuron_model(
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] ^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm_neuron/worker/neuronx_distributed_model_loader.py", line 714, in get_neuron_model
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] model.load_weights(model_name_or_path=model_config.model,
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm_neuron/worker/neuronx_distributed_model_loader.py", line 394, in load_weights
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] self._compile_and_load_model(model_name_or_path, neuronx_model_cls,
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/vllm_neuron/worker/neuronx_distributed_model_loader.py", line 240, in _compile_and_load_model
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] self.model.compile(compiled_path)
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/neuronx_distributed_inference/models/application_base.py", line 302, in compile
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] traced_model = self.get_builder(debug).trace(
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/neuronx_distributed/trace/model_builder.py", line 680, in trace
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] self._add_layout_optimization_to_remaining_hlo()
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] File "/opt/conda/lib/python3.12/site-packages/neuronx_distributed/trace/model_builder.py", line 1196, in _add_layout_optimization_to_remaining_hlo
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] os.remove(original_hlo_file_name)
(EngineCore_DP0 pid=276) ERROR 12-22 19:08:39 [core.py:708] FileNotFoundError: [Errno 2] No such file or directory: 'context_encoding_model_0.hlo'
(EngineCore_DP0 pid=276) Traceback (most recent call last):
(EngineCore_DP0 pid=276) File "/opt/conda/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=276) self.run()
(EngineCore_DP0 pid=276) File "/opt/conda/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=276) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=276) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=276) raise e
(EngineCore_DP0 pid=276) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=276) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=276) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=276) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=276) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=276) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=276) self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=276) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=276) File "/opt/conda/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=276) self._init_executor()
(EngineCore_DP0 pid=276) File "/opt/conda/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 55, in _init_executor
(EngineCore_DP0 pid=276) self.collective_rpc("load_model")
(EngineCore_DP0 pid=276) File "/opt/conda/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=276) return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=276) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=276) File "/opt/conda/lib/python3.12/site-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=276) return func(*args, **kwargs)
(EngineCore_DP0 pid=276) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=276) File "/opt/conda/lib/python3.12/site-packages/vllm_neuron/worker/neuron_worker.py", line 86, in load_model
(EngineCore_DP0 pid=276) self.model_runner.load_model()
(EngineCore_DP0 pid=276) File "/opt/conda/lib/python3.12/site-packages/vllm_neuron/worker/neuronx_distributed_model_runner.py", line 221, in load_model
(EngineCore_DP0 pid=276) self.model = get_neuron_model(
(EngineCore_DP0 pid=276) ^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=276) File "/opt/conda/lib/python3.12/site-packages/vllm_neuron/worker/neuronx_distributed_model_loader.py", line 714, in get_neuron_model
(EngineCore_DP0 pid=276) model.load_weights(model_name_or_path=model_config.model,
(EngineCore_DP0 pid=276) File "/opt/conda/lib/python3.12/site-packages/vllm_neuron/worker/neuronx_distributed_model_loader.py", line 394, in load_weights
(EngineCore_DP0 pid=276) self._compile_and_load_model(model_name_or_path, neuronx_model_cls,
(EngineCore_DP0 pid=276) File "/opt/conda/lib/python3.12/site-packages/vllm_neuron/worker/neuronx_distributed_model_loader.py", line 240, in _compile_and_load_model
(EngineCore_DP0 pid=276) self.model.compile(compiled_path)
(EngineCore_DP0 pid=276) File "/opt/conda/lib/python3.12/site-packages/neuronx_distributed_inference/models/application_base.py", line 302, in compile
(EngineCore_DP0 pid=276) traced_model = self.get_builder(debug).trace(
(EngineCore_DP0 pid=276) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=276) File "/opt/conda/lib/python3.12/site-packages/neuronx_distributed/trace/model_builder.py", line 680, in trace
(EngineCore_DP0 pid=276) self._add_layout_optimization_to_remaining_hlo()
(EngineCore_DP0 pid=276) File "/opt/conda/lib/python3.12/site-packages/neuronx_distributed/trace/model_builder.py", line 1196, in _add_layout_optimization_to_remaining_hlo
(EngineCore_DP0 pid=276) os.remove(original_hlo_file_name)
Error:
Dockerfile:
Machine:
inf2.48xlarge
OS:
Ubuntu 24.04
Model:
deepseek-ai/DeepSeek-R1-Distill-Llama-8B
vLLM OpenAI Server Config:
Number of server instances:
24/8 = 3
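The division above can be sketched as a small helper. This assumes that 24 is the number of NeuronCores on the machine and 8 is the number of cores given to each server instance; the report does not state either explicitly, and the function name and the "start-end" range format are hypothetical, chosen only for illustration.

```python
def partition_cores(total_cores: int, cores_per_instance: int) -> list[str]:
    """Split core indices 0..total_cores-1 into equal, contiguous ranges.

    Returns one "start-end" string per server instance (inclusive bounds).
    """
    if cores_per_instance <= 0 or total_cores % cores_per_instance != 0:
        raise ValueError("total_cores must be a positive multiple of cores_per_instance")
    return [
        f"{start}-{start + cores_per_instance - 1}"
        for start in range(0, total_cores, cores_per_instance)
    ]


# Assumed values taken from "24/8 = 3" above.
ranges = partition_cores(24, 8)
print(len(ranges), ranges)
```

Each instance gets a disjoint set of cores in this layout, so any contention between the instances would have to come from something else they share, such as the filesystem.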
The cache directory is a local mount in the container and is shared among all three vLLM server instances.
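One possible reading of the traceback, not confirmed by the logs, is a race between the instances: the failing call is `os.remove(original_hlo_file_name)` on the relative path `context_encoding_model_0.hlo`, and all three processes log "Updating the hlo module with optimized layout" within a couple of seconds of each other. If the instances also share a working directory, the first one to delete the file would cause the others to fail. The sketch below reproduces that failure mode in isolation with a temporary directory. The helper names are hypothetical and this is not the `neuronx_distributed` implementation.

```python
import os
import tempfile


def remove_strict(path: str) -> None:
    """Same call as in the traceback: raises if the file is already gone."""
    os.remove(path)


def remove_tolerant(path: str) -> bool:
    """Delete path; return False instead of raising if it no longer exists."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False


def simulate_shared_directory() -> list[str]:
    """Two 'instances' delete the same file from one shared directory."""
    outcomes = []
    with tempfile.TemporaryDirectory() as shared_dir:
        hlo = os.path.join(shared_dir, "context_encoding_model_0.hlo")
        open(hlo, "wb").close()  # stand-in for the real HLO file

        remove_strict(hlo)  # the first instance deletes the file
        outcomes.append("first: removed")

        try:
            remove_strict(hlo)  # the second instance finds nothing to delete
            outcomes.append("second: removed")
        except FileNotFoundError:
            outcomes.append("second: FileNotFoundError")

        # A tolerant delete treats "already gone" as success for cleanup purposes.
        outcomes.append(f"tolerant: {remove_tolerant(hlo)}")
    return outcomes


print(simulate_shared_directory())
```

If this reading is right, starting each server instance from its own working directory (or staggering the first startup so only one instance compiles) would avoid the collision without touching the library code. That is an assumption to verify, not a confirmed fix.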