Description
System Info
Optimum version: 2.0.0
Optimum-neuron version: 0.4.3
Platform: Ubuntu (Deep Learning AMI Neuron Ubuntu 24.04)
Python version: 3.12.3 (I know this does not meet the requirements)

Who can help?
@JingyaHuang I am hoping you can help with this error. I am having trouble exporting the KaLM 12B embedding model to Neuron. I have set up a Deep Learning AMI Neuron instance on EC2 and am running from the Neuron venv. I have confirmed that I can use optimum to export other models, such as BAAI/bge-small-en-v1.5, but unfortunately it does not work with the KaLM model.
Here is the command I am running: optimum-cli export neuron --model tencent/KaLM-Embedding-Gemma3-12B-2511 --sequence_length 512 --batch_size 256 kalm_neuron/
Error traceback:
Using Neuron: --optlevel 2
2026-Feb-03 22:32:35.0530 38614:38726 [0] int nccl_net_ofi_create_plugin(nccl_net_ofi_plugin_t**):213 CCOM WARN NET/OFI Failed to initialize sendrecv protocol
2026-Feb-03 22:32:35.0532 38614:38726 [0] int nccl_net_ofi_create_plugin(nccl_net_ofi_plugin_t**):354 CCOM WARN NET/OFI aws-ofi-nccl initialization failed
2026-Feb-03 22:32:35.0535 38614:38726 [0] ncclResult_t nccl_net_ofi_init_no_atexit_fini_v6(ncclDebugLogger_t):183 CCOM WARN NET/OFI Initializing plugin failed
2026-Feb-03 22:32:35.0538 38614:38726 [0] net_plugin.cc:97 CCOM WARN OFI plugin initNet() failed is EFA enabled?
2026-02-03 22:32:35.000590: 38614 [INFO]: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.22.12471.0+b4a00d10/MODULE_142159770607723651+e30acd3a/model.neff
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/optimum/exporters/neuron/__main__.py", line 905, in <module>
    main()
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/optimum/exporters/neuron/__main__.py", line 877, in main
    main_export(
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/optimum/exporters/neuron/__main__.py", line 685, in main_export
    _, neuron_outputs = export_models(
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/optimum/exporters/neuron/convert.py", line 376, in export_models
    neuron_inputs, neuron_outputs = export(
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/optimum/exporters/neuron/convert.py", line 472, in export
    return export_neuronx(
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/optimum/exporters/neuron/convert.py", line 579, in export_neuronx
    trace_neuronx(
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/optimum/exporters/neuron/convert.py", line 764, in trace_neuronx
    neuron_model = neuronx.trace(
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/torch_neuronx/xla_impl/trace.py", line 606, in trace
    neff_filename, metaneff, flattener, packer, weights = _trace(
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/torch_neuronx/xla_impl/trace.py", line 676, in _trace
    hlo_artifacts = generate_hlo(
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/torch_neuronx/xla_impl/trace.py", line 460, in generate_hlo
    ) = xla_trace(
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/torch_neuronx/xla_impl/hlo_conversion.py", line 601, in xla_trace
    return _xla_trace(func, example_inputs, states, input_output_aliases,
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/torch_neuronx/xla_impl/hlo_conversion.py", line 387, in _xla_trace
    outputs = func(*example_inputs)
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/optimum/exporters/neuron/model_wrappers.py", line 590, in forward
    out_tuple = self.model({"input_ids": input_ids, "attention_mask": attention_mask})
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/sentence_transformers/SentenceTransformer.py", line 1175, in forward
    input = module(input, **module_kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/sentence_transformers/models/Transformer.py", line 262, in forward
    outputs = self.auto_model(**trans_features, **kwargs, return_dict=True)
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/transformers/utils/generic.py", line 1072, in wrapper
    outputs = func(self, *args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/transformers/models/gemma3/modeling_gemma3.py", line 570, in forward
    layer_outputs = decoder_layer(
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/transformers/modeling_layers.py", line 94, in __call__
    return super().__call__(*args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/transformers/models/gemma3/modeling_gemma3.py", line 382, in forward
    hidden_states, self_attn_weights = self.self_attn(
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/transformers/models/gemma3/modeling_gemma3.py", line 321, in forward
    key_states, value_states = past_key_values.update(key_states, value_states, self.layer_idx, cache_kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/transformers/cache_utils.py", line 776, in update
    keys, values = self.layers[layer_idx].update(key_states, value_states, cache_kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/transformers/cache_utils.py", line 207, in update
    self.keys = full_key_states[:, :, -self.sliding_window + 1 :, :]
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/torch/_tensor.py", line 1654, in __torch_function__
    ret = func(*args, **kwargs)
RuntimeError: Value out of range (expected to be in range of [-512, 511], but got -1023)

Traceback (most recent call last):
  File "/opt/aws_neuronx_venv_pytorch_2_9/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/optimum/commands/optimum_cli.py", line 219, in main
    service.run()
  File "/opt/aws_neuronx_venv_pytorch_2_9/lib/python3.12/site-packages/optimum/commands/export/neuronx.py", line 342, in run
    subprocess.run(full_command, shell=True, check=True)
  File "/usr/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m optimum.exporters.neuron --model tencent/KaLM-Embedding-Gemma3-12B-2511 --sequence_length 512 --batch_size 256 kalm_neuron' returned non-zero exit status 1.
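For what it's worth, the numbers in the error seem to line up with Gemma3's sliding-window attention: the failing line in cache_utils.py slices with -self.sliding_window + 1, and -1023 implies sliding_window = 1024, which is larger than the --sequence_length 512 the trace was built with. Here is a minimal sketch of that arithmetic (the 1024 value is inferred from the traceback, not verified against the model config):

```python
# Sketch of the index arithmetic behind the RuntimeError.
# Values are inferred from the traceback; sliding_window = 1024 is an assumption.
sliding_window = 1024   # "-self.sliding_window + 1" produced -1023
seq_len = 512           # the --sequence_length passed to optimum-cli

start = -sliding_window + 1  # slice start used in cache_utils.py line 207
print(start)                 # -1023

# During tracing, indices on a length-512 axis must fall in [-512, 511]
print(-seq_len <= start <= seq_len - 1)  # False -> "Value out of range"
```

If that reading is right, the error would only occur when the export sequence length is smaller than the model's sliding window.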
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction (minimal, reproducible, runnable)
Here is the command I am running: optimum-cli export neuron --model tencent/KaLM-Embedding-Gemma3-12B-2511 --sequence_length 512 --batch_size 256 kalm_neuron/
Expected behavior
I would expect the model to export to the folder I provided.