Initializing and listing devices crashes #1320

@wsmoses

Description

Describe the bug

ubuntu@ip-172-31-40-172:~/Reactant.jl$ TF_CPP_MIN_VLOG_LEVEL=3 TF_CPP_MIN_LOG_LEVEL=0 TF_CPP_MAX_VLOG_LEVEL=3 julia --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.9 (2026-02-06)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using Libdl
       # Load the Python library globally

julia> Libdl.dlopen("/usr/lib/python3.10/config-3.10-x86_64-linux-gnu/libpython3.10.so", Libdl.RTLD_GLOBAL)
       # Initialize the Python interpreter
Ptr{Nothing} @0x000000003f80df30

julia> ccall(:Py_Initialize, Cvoid, ())

julia> println("Python initialized successfully")
Python initialized successfully

julia> using Reactant

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1777926420.394462    9050 ffi_registry.cc:128] Register XLA FFI handler for 'enzymexla_compile_gpu'; platform=CUDA (canonical=cuda), stages=[instantiate, prepare, initialize, execute], metadata={api_version: 0.3, traits: [], state: unknown}, registry=0x7974da8daf18
I0000 00:00:1777926420.394510    9050 ffi_registry.cc:128] Register XLA FFI handler for 'enzymexla_compile_gpu_with_error'; platform=CUDA (canonical=cuda), stages=[instantiate, prepare, initialize, execute], metadata={api_version: 0.3, traits: [], state: unknown}, registry=0x7974da8daf18
I0000 00:00:1777926420.394530    9050 ffi_registry.cc:128] Register XLA FFI handler for 'xla_throw_error'; platform=Host (canonical=host), stages=[execute], metadata={api_version: 0.3, traits: [], state: unknown}, registry=0x7974da8daf18
I0000 00:00:1777926420.394538    9050 ffi_registry.cc:128] Register XLA FFI handler for 'xla_always_throw_error'; platform=Host (canonical=host), stages=[execute], metadata={api_version: 0.3, traits: [], state: unknown}, registry=0x7974da8daf18
I0000 00:00:1777926420.394546    9050 ffi_registry.cc:128] Register XLA FFI handler for 'reactant_julia_callback'; platform=Host (canonical=host), stages=[execute], metadata={api_version: 0.3, traits: [], state: unknown}, registry=0x7974da8daf18

julia> Reactant.devices()
I0000 00:00:1777926420.647839    9050 parse_flags_from_env.cc:214] For env var XLA_FLAGS found arguments:
I0000 00:00:1777926420.647871    9050 parse_flags_from_env.cc:216]   argv[0] = <argv[0]>
I0000 00:00:1777926420.648973    9050 cpu_client.cc:330] PjRtCpuClient created.
2026-05-04 20:27:01.131647: I ./neuron/hlo_validator/hlo_validator_runner.h:220] Registering Verifier.....CollectivesComputeOrderVerifier

2026-05-04 20:27:01.131686: I ./neuron/hlo_validator/hlo_validator_runner.h:222] Registering Verifier.....HloSummarizer

2026-05-04 20:27:01.131694: I ./neuron/hlo_validator/hlo_validator_runner.h:223] Registering Verifier.....FsdpTpPassVerifier

2026-05-04 20:27:01.135261: I ./neuron/hlo_validator/hlo_validator_runner.h:220] Registering Verifier.....CollectivesComputeOrderVerifier

2026-05-04 20:27:01.135284: I ./neuron/hlo_validator/hlo_validator_runner.h:222] Registering Verifier.....HloSummarizer

2026-05-04 20:27:01.135296: I ./neuron/hlo_validator/hlo_validator_runner.h:223] Registering Verifier.....FsdpTpPassVerifier

I0000 00:00:1777926421.135374    9050 pjrt_api.cc:118] GetPjrtApi was found for trainium at /home/ubuntu/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/pjrt_plugin_trainium/libneuronpjrt.so
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1777926421.135585    9050 neuronpjrt.cc:1569] Check failed: hook == nullptr (0x41181080 vs. (null)) code=3: RunNullaryVoidImpl: error condition !(py_module != __null): 
*** Check failure stack trace: ***
    @     0x79745242cbb1  absl::lts_20230802::log_internal::LogMessage::SendToLog()
    @     0x79745242d089  absl::lts_20230802::log_internal::LogMessageFatal::~LogMessageFatal()
    @     0x79744a8eda88  neuron::GetPjrtApi()
    @     0x7974cdbdd0a7  pjrt::LoadPjrtPlugin()
    @     0x7974cb290f34  LoadPjrtPlugin
    @     0x7974cb2912dc  MakeClientUsingPluginAPI
    @     0x79751450d502  (unknown)
    @     0x79751450d85d  (unknown)
    @     0x79751450dab3  (unknown)
    @     0x79751450db93  (unknown)
    @     0x79751450242f  (unknown)
    @     0x797514503de5  (unknown)
    @     0x797514504daf  (unknown)
    @     0x797514504df1  (unknown)
    @     0x7974a756df16  julia_initialize_default_clientsNOT._46697
    @     0x797514501aa1  (unknown)
    @     0x797514501b40  (unknown)
    @     0x797520870335  do_call
    @     0x79752086fdfd  eval_value
    @     0x797520870f68  eval_body
    @     0x797520871b0e  jl_interpret_toplevel_thunk
    @     0x79752088e2ce  jl_toplevel_eval_flex
    @     0x79752088ec1a  jl_toplevel_eval_flex
    @     0x79752088fc86  ijl_toplevel_eval_in
    @     0x7974eaa90ba8  japi1_eval_user_input_10119.2
    @     0x7974ea34f051  julia_repl_backend_loop_10154.2
    @     0x7974ea992908  japi1_YY.start_repl_backendYY.59_10151.1
    @     0x7974eaa3b16f  japi1_start_repl_backend_10734.2
    @     0x7974ea6098f8  julia_YY.run_replYY.76_10235.1
    @     0x7974ea7f46c3  julia_run_repl_10226.1
    @     0x7974ea763c33  jfptr_run_repl_10227.1
    @     0x7974ea586da5  julia_YY.1152_14894.1
    @     0x7974ea9a9c88  jfptr_YY.1152_14895.1
    @     0x79752085f85a  jl_f__call_latest
    @     0x79750c9f5912  julia_run_main_repl_73611.2
    @     0x79750bff7b85  julia__start_73651.2
    @     0x79750ad8ef44  jfptr__start_73652.1
    @     0x7975208c5066  true_main
    @     0x7975208c5aff  jl_repl_entrypoint
    @           0x401089  main

[9050] signal 6 (-6): Aborted
in expression starting at REPL[6]:1
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_ZN4absl12lts_2023080212log_internal10LogMessage21FailWithoutStackTraceEv at /home/ubuntu/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/pjrt_plugin_trainium/libneuronpjrt.so (unknown line)
_ZN4absl12lts_2023080212log_internal10LogMessage3DieEv at /home/ubuntu/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/pjrt_plugin_trainium/libneuronpjrt.so (unknown line)
_ZN4absl12lts_2023080212log_internal10LogMessage9SendToLogEv at /home/ubuntu/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/pjrt_plugin_trainium/libneuronpjrt.so (unknown line)
_ZN4absl12lts_2023080212log_internal15LogMessageFatalD1Ev at /home/ubuntu/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/pjrt_plugin_trainium/libneuronpjrt.so (unknown line)
_ZN6neuron10GetPjrtApiEv.cold at /home/ubuntu/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/pjrt_plugin_trainium/libneuronpjrt.so (unknown line)
_ZN4pjrt14LoadPjrtPluginESt17basic_string_viewIcSt11char_traitsIcEES3_ at /home/ubuntu/.julia/artifacts/a50a9e76a1693cebca768009b78a09562f711ea9/lib/libReactantExtra.so (unknown line)
LoadPjrtPlugin at /home/ubuntu/.julia/artifacts/a50a9e76a1693cebca768009b78a09562f711ea9/lib/libReactantExtra.so (unknown line)
MakeClientUsingPluginAPI at /home/ubuntu/.julia/artifacts/a50a9e76a1693cebca768009b78a09562f711ea9/lib/libReactantExtra.so (unknown line)
MakeClientUsingPluginAPI at /home/ubuntu/Reactant.jl/src/mlir/libMLIR_h.jl:14299
MakeClientUsingPluginAPI at /home/ubuntu/Reactant.jl/src/xla/PJRT/Client.jl:101
#make_pjrt_client#1 at /home/ubuntu/Reactant.jl/src/accelerators/Trainium.jl:40 [inlined]
make_pjrt_client at /home/ubuntu/Reactant.jl/src/accelerators/Trainium.jl:26
unknown function (ip: 0x79751450db92)
#make_client#3 at /home/ubuntu/Reactant.jl/src/accelerators/Registration.jl:71
make_client at /home/ubuntu/Reactant.jl/src/accelerators/Registration.jl:60 [inlined]
#initialize_backends#4 at /home/ubuntu/Reactant.jl/src/accelerators/Registration.jl:107
initialize_backends at /home/ubuntu/Reactant.jl/src/accelerators/Registration.jl:79
unknown function (ip: 0x797514504df0)
initialize_default_clients! at /home/ubuntu/Reactant.jl/src/xla/XLA.jl:284
getproperty at /home/ubuntu/Reactant.jl/src/xla/XLA.jl:96 [inlined]
default_backend at /home/ubuntu/Reactant.jl/src/xla/XLA.jl:164 [inlined]
devices at /home/ubuntu/Reactant.jl/src/Devices.jl:9
unknown function (ip: 0x797514501b3f)
jl_apply at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
do_call at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/interpreter.c:126
eval_value at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/interpreter.c:666
jl_interpret_toplevel_thunk at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/interpreter.c:824
jl_toplevel_eval_flex at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/toplevel.c:943
jl_toplevel_eval_flex at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:430 [inlined]
eval_user_input at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:261
repl_backend_loop at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:368
#start_repl_backend#59 at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:343
start_repl_backend at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:340
#run_repl#76 at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:500
run_repl at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:486
jfptr_run_repl_10227.1 at /home/ubuntu/.julia/juliaup/julia-1.11.9+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_CBMnm.so (unknown line)
#1152 at ./client.jl:439
jfptr_YY.1152_14895.1 at /home/ubuntu/.julia/juliaup/julia-1.11.9+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_CBMnm.so (unknown line)
jl_apply at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
run_main_repl at ./client.jl:423
repl_main at ./client.jl:560 [inlined]
_start at ./client.jl:534
jfptr__start_73652.1 at /home/ubuntu/.julia/juliaup/julia-1.11.9+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
true_main at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/jlapi.c:900
jl_repl_entrypoint at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/src/jlapi.c:1059
main at /cache/build/builder-amdci4-7/julialang/julia-release-1-dot-11/cli/loader_exe.c:58
unknown function (ip: 0x797521429d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 2223755 (Pool: 2223586; Big: 169); GC: 4
Aborted (core dumped)

cc @ptiede

Model Name

N/A

Describe the workload type

setup

Instance Type

trn1.2xlarge

Release version

ubuntu@ip-172-31-40-172:~/Reactant.jl$ apt list --installed | grep -i -e neuron
pip list | grep -i -e neuron -e torch -e transformers -e jax

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

aws-neuronx-collectives/unknown,now 2.31.24.0-1a31ba186 amd64 [installed]
aws-neuronx-dkms/unknown,now 2.27.4.0 all [installed]
aws-neuronx-runtime-lib/unknown,now 2.31.24.0-0b044f4ce amd64 [installed]
aws-neuronx-tools/unknown,now 2.29.22.0-b486b0ade amd64 [installed]
Command 'pip' not found, but can be installed with:
sudo apt install python3-pip

Reproduction Steps

Install Julia 1.11, check out the branch from EnzymeAD/Reactant.jl#2852, then run the REPL session shown under "Describe the bug" above.
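For convenience, the Python setup steps from the REPL session can be wrapped in a small helper. This is only a sketch of the reproduction, not a fix: the function name `init_embedded_python` is hypothetical, the default path is the one copied from this report, and the `Py_IsInitialized` check is an extra sanity step that was not part of the original session.

```julia
using Libdl

# Path copied from the report; adjust it for other Python installations.
const LIBPYTHON = "/usr/lib/python3.10/config-3.10-x86_64-linux-gnu/libpython3.10.so"

"""
    init_embedded_python(libpath = LIBPYTHON) -> Bool

Hypothetical helper: load libpython with RTLD_GLOBAL, start the embedded
interpreter, and return `true` if `Py_IsInitialized` reports it is running.
"""
function init_embedded_python(libpath::AbstractString = LIBPYTHON)
    isfile(libpath) || error("libpython not found at $libpath")
    # RTLD_GLOBAL makes the Python symbols visible to libraries loaded later.
    handle = Libdl.dlopen(libpath, Libdl.RTLD_GLOBAL)
    # Resolve the C API entry points from the handle and call them.
    ccall(Libdl.dlsym(handle, :Py_Initialize), Cvoid, ())
    return ccall(Libdl.dlsym(handle, :Py_IsInitialized), Cint, ()) != 0
end
```

With this defined, the reproduction is: call `init_embedded_python()`, then `using Reactant`, then `Reactant.devices()`. On the setup described here, the abort is expected at the last call.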

Regression Issue

  • Select this option if this issue appears to be a regression.

Possible Solution

No response

Logs/Context/Additional Information

No response
