[Bug]: [NCC_INLA001] neuronx-cc fails to compile Llama-3.3-70B context_encoding_model with NxDI 0.7 — "type must be boolean, but is null" #16

@EzioEzi0

Your current environment

Bug description:

neuronx-cc fails to compile context_encoding_model HLO for Llama-3.3-70B-Instruct when using vllm-neuron 0.3.0 + NxDI 0.7. The token_generation_model compiles successfully, but multiple context_encoding_model buckets fail with an internal error.

Error:

[INTERNAL_ERROR] [NCC_INLA001] Unhandled exception with message:
[json.exception.type_error.302] type must be boolean, but is null

What I've tried (all fail with the same error):

  • TP=32, max_model_len=8192
  • TP=32, max_model_len=4096
  • TP=16, max_model_len=4096

The failure appears specific to the context_encoding_model HLO compilation path: in the configurations tried, it is not sensitive to TP size or sequence length, and the token_generation_model always compiles fine.
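The three configurations listed above could be checked in one pass with a small sweep helper. This is only a sketch, not the script that produced the results in this report: `CONFIGS`, `build_llm`, and `sweep` are hypothetical names, the `LLM(...)` arguments are copied from the reproduction script, and it assumes a compilation failure surfaces as a Python exception during engine construction. It may also be necessary to run each configuration in a fresh process with a matching NEURON_RT_VISIBLE_CORES value, which this sketch does not handle.

```python
# The three (tensor_parallel_size, max_model_len) pairs reported in this issue.
CONFIGS = [(32, 8192), (32, 4096), (16, 4096)]


def build_llm(tp, max_len):
    """Construct the engine with the same arguments as the reproduction script."""
    # Imported lazily so the sweep logic itself can be exercised without vLLM installed.
    from vllm import LLM

    return LLM(
        model="meta-llama/Llama-3.3-70B-Instruct",
        max_num_seqs=4,
        max_model_len=max_len,
        block_size=32,
        num_gpu_blocks_override=1024,
        tensor_parallel_size=tp,
    )


def sweep(configs, build=build_llm):
    """Try each config; record 'ok' or the first line of the raised exception."""
    results = {}
    for tp, max_len in configs:
        try:
            build(tp, max_len)
            results[(tp, max_len)] = "ok"
        except Exception as exc:  # assumption: compile failures raise here
            text = str(exc)
            results[(tp, max_len)] = text.splitlines()[0] if text else type(exc).__name__
    return results
```

Calling `sweep(CONFIGS)` on the instance returns a dictionary keyed by `(tp, max_len)`, which makes it easy to confirm whether every configuration reports the same first error line.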

Reproduction:

from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.3-70B-Instruct",
    max_num_seqs=4,
    max_model_len=4096,
    block_size=32,
    num_gpu_blocks_override=1024,
    tensor_parallel_size=32,
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(temperature=0.8))

Run as test.py with:

export NEURON_RT_VISIBLE_CORES=0-31
export VLLM_PLUGINS=neuron
python test.py

Failed compiler invocation (from logs):

neuronx-cc compile --framework=XLA \
  /tmp/nxd_model/context_encoding_model/_tp0_bk18/model.MODULE_*.hlo_module.pb \
  --target=trn1 --auto-cast=none --model-type=transformer \
  --tensorizer-options=--enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma \
  --lnc=1 -O1 \
  --internal-hlo2tensorizer-options= --modular-flow-mac-threshold=10 --verify-hlo=true

Failed buckets include _tp0_bk16 through _tp0_bk19 and _tp0_bk25 through _tp0_bk29, depending on configuration.

The output of python collect_env.py

Python: 3.12.3
PyTorch: 2.9.0+cu128
OS: Linux-6.17.0-1007-aws-x86_64-with-glibc2.39

Instance Type

trn1.32xlarge

Python Environment (pip list | grep -E "torch|neuron|nki|vllm|nxdi|nixl")

libneuronxla 2.2.14584.0+06ac23d1
neuronx-cc 2.22.12471.0+b4a00d10
neuronx-distributed 0.16.25997+f431c02e
neuronx-distributed-inference 0.7.15063+bafa28d5
optimum-neuron 0.4.3
torch 2.9.0
torch-neuronx 2.9.0.2.11.19912+e48cd891
torch-xla 2.9.0
torchaudio 2.9.0
torchvision 0.24.0
vllm 0.13.0
vllm-neuron 0.3.0

🐛 Describe the bug

neuronx-cc fails to compile the context_encoding_model HLO for Llama-3.3-70B-Instruct using vllm-neuron 0.3.0 + NxDI 0.7 on trn1.32xlarge. The token_generation_model compiles fine (all buckets pass), but context_encoding_model consistently fails on several buckets with:

[INTERNAL_ERROR] [NCC_INLA001] Unhandled exception with message: 
[json.exception.type_error.302] type must be boolean, but is null

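I tried three configurations (TP=32 with max_model_len=8192, TP=32 with max_model_len=4096, and TP=16 with max_model_len=4096) and got the same error every time. The failure appears specific to the context_encoding_model HLO compilation path and is not sensitive to TP size or sequence length.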

# Minimal reproduction
from vllm import LLM, SamplingParams

# Run with: NEURON_RT_VISIBLE_CORES=0-31 VLLM_PLUGINS=neuron python test.py
llm = LLM(
    model="meta-llama/Llama-3.3-70B-Instruct",
    max_num_seqs=4,
    max_model_len=4096,
    block_size=32,
    num_gpu_blocks_override=1024,
    tensor_parallel_size=32,
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(temperature=0.8))

Failed compiler invocation from logs:

neuronx-cc compile --framework=XLA \
  /tmp/nxd_model/context_encoding_model/_tp0_bk18/model.MODULE_*.hlo_module.pb \
  --target=trn1 --auto-cast=none --model-type=transformer \
  --tensorizer-options=--enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma \
  --lnc=1 -O1 \
  --internal-hlo2tensorizer-options= --modular-flow-mac-threshold=10 --verify-hlo=true

Failed buckets: _tp0_bk16 through _tp0_bk19 and _tp0_bk25 through _tp0_bk29, depending on config.
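To make the failed-bucket list easier to compare across configurations, the bucket indices can be pulled out of the failed compiler invocations and collapsed into ranges. The following is a minimal sketch under stated assumptions: `failed_buckets` and `group_ranges` are hypothetical helpers, and the regular expression assumes each failed invocation contains a path of the form `context_encoding_model/_tp0_bkNN/`, as in the invocation shown above. The input should contain only the failing invocations, since the helper does not itself distinguish passing from failing compiles.

```python
import re

# Assumed path shape, taken from the failed invocation in this report:
#   /tmp/nxd_model/context_encoding_model/_tp0_bk18/model....
BUCKET_RE = re.compile(r"context_encoding_model/_tp0_bk(\d+)/")


def failed_buckets(log_text):
    """Return sorted, de-duplicated bucket indices mentioned in the given text."""
    return sorted({int(m) for m in BUCKET_RE.findall(log_text)})


def group_ranges(ids):
    """Collapse sorted indices into ranges, e.g. [16, 17, 18] -> 'bk16-bk18'."""
    parts = []
    start = prev = None
    for i in ids:
        if start is None:
            start = prev = i
        elif i == prev + 1:
            prev = i  # still inside a consecutive run
        else:
            parts.append((start, prev))
            start = prev = i
    if start is not None:
        parts.append((start, prev))
    return ", ".join(f"bk{a}" if a == b else f"bk{a}-bk{b}" for a, b in parts)


print(group_ranges([16, 17, 18, 19, 25, 26, 27, 28, 29]))  # bk16-bk19, bk25-bk29
```

Feeding the failing invocation lines from each run through `failed_buckets` and then `group_ranges` gives a one-line summary per configuration.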
