Describe the bug
Neuron PJRT appears to expose the PJRT async host-to-device transfer manager API entries, but calling PJRT_Client_CreateBuffersForAsyncHostToDevice fails.
This prevents frameworks from using the standard PJRT streaming host-to-device upload path:
- Create destination device buffers,
- Stream host data into them in chunks,
- Retrieve the resulting PJRT_Buffer.
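The three steps above can be sketched in Python to show the shape of the pattern. This is only an illustration: `FakeTransferManager`, `read_chunks`, and `stream_upload` are hypothetical names, the "device" is a plain in-memory bytearray, and nothing here calls the real PJRT C API or any Neuron library. The mock is also synchronous, whereas the real transfer manager is asynchronous.

```python
from typing import Iterable, Iterator


class FakeTransferManager:
    """Hypothetical in-memory stand-in for an async H2D transfer manager."""

    def __init__(self, buffer_size: int) -> None:
        # Step 1: the destination is allocated up front, before any data arrives.
        self._device = bytearray(buffer_size)
        self._complete = False

    def transfer_data(self, data: bytes, offset: int, is_last_transfer: bool) -> None:
        # Step 2: copy one chunk into the destination at the given byte offset.
        if offset + len(data) > len(self._device):
            raise ValueError("chunk exceeds destination buffer")
        self._device[offset:offset + len(data)] = data
        if is_last_transfer:
            self._complete = True

    def retrieve_buffer(self) -> bytes:
        # Step 3: hand back the filled buffer (this mock requires completion first).
        if not self._complete:
            raise RuntimeError("transfer not finished")
        return bytes(self._device)


def read_chunks(source: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield fixed-size slices; a real loader would read them from a checkpoint file."""
    for start in range(0, len(source), chunk_size):
        yield source[start:start + chunk_size]


def stream_upload(chunks: Iterable[bytes], total_size: int) -> bytes:
    """Stream chunks into a pre-allocated destination, one chunk in flight at a time."""
    manager = FakeTransferManager(total_size)
    offset = 0
    for chunk in chunks:
        end = offset + len(chunk)
        manager.transfer_data(chunk, offset, is_last_transfer=(end == total_size))
        offset = end
    return manager.retrieve_buffer()
```

The point of the pattern is that the host side only needs to hold one chunk at a time, rather than the whole tensor, while the destination buffer is filled incrementally.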
The goal is to avoid full host-side bufferization of each tensor. For large checkpoints, requiring a complete tensor to be materialized in host memory before upload adds extra memory pressure, copies, and startup latency.
Instead, uploads have to fall back to whole-buffer paths such as PJRT_Client_BufferFromHostBuffer.
Model Name
N/A. The issue is independent of model execution and occurs during host-to-device buffer upload.
Describe the workload type
Fast model loading
Instance Type
inf2.8xlarge
Release version
python=3.12.12
numpy=2.4.6
jax=0.7.0
jaxlib=0.7.0
jax-neuronx=0.7.0.1.0.8181+1e892be0
libneuronxla=3.0.2891.0+e2a4b1f5
neuronx-cc=2.25.3371.0+f524f7f8
aws-neuronx-runtime-lib=2.32.31.0-0234f5ed2
Reproduction Steps
$ strings libneuronpjrt.so | rg 'CreateBuffersForAsyncHostToDevice|AsyncHostToDeviceTransferManager|BufferFromHostBuffer'
CreateBuffersForAsyncHostToDevice with ShapeSpec and Layout is not implemented on platform:
PJRT_Client_CreateBuffersForAsyncHostToDevice
PJRT_AsyncHostToDeviceTransferManager_Destroy
PJRT_AsyncHostToDeviceTransferManager_TransferData
PJRT_AsyncHostToDeviceTransferManager_RetrieveBuffer
PJRT_AsyncHostToDeviceTransferManager_Device
PJRT_AsyncHostToDeviceTransferManager_BufferCount
PJRT_AsyncHostToDeviceTransferManager_BufferSize
PJRT_AsyncHostToDeviceTransferManager_SetBufferError
PJRT_AsyncHostToDeviceTransferManager_AddMetadata
BufferFromHostBuffer with PjRtMemorySpace is not implemented on platform:
PJRT_Client_BufferFromHostBuffer
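For environments without `strings` or `rg`, the same check can be approximated in Python. This is a rough sketch: `find_transfer_strings` and `scan_library` are hypothetical helper names, and the printable-run heuristic (ASCII runs of at least four characters) only approximates what `strings` does by default.

```python
import re
from pathlib import Path

# Same alternatives as the rg pattern in the command above.
PATTERN = re.compile(
    rb"CreateBuffersForAsyncHostToDevice|AsyncHostToDeviceTransferManager|BufferFromHostBuffer"
)
# Runs of at least four printable ASCII bytes, roughly like `strings`.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def find_transfer_strings(blob: bytes) -> list[str]:
    """Return printable strings in blob that mention the transfer-related names."""
    return [
        match.group().decode("ascii")
        for match in PRINTABLE_RUN.finditer(blob)
        if PATTERN.search(match.group())
    ]


def scan_library(path: str) -> list[str]:
    """Read a shared library from disk and scan it for the transfer-related names."""
    return find_transfer_strings(Path(path).read_bytes())
```

For example, `scan_library("libneuronpjrt.so")` would list the matching strings. As with the `strings` output, finding a name in the binary only shows that the string is embedded there; it does not show that the entry point is implemented.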
Regression Issue
Possible Solution
Implement support for:
PJRT_Client_CreateBuffersForAsyncHostToDevice
PJRT_AsyncHostToDeviceTransferManager_TransferData
PJRT_AsyncHostToDeviceTransferManager_RetrieveBuffer
PJRT_AsyncHostToDeviceTransferManager_Device
PJRT_AsyncHostToDeviceTransferManager_BufferCount
PJRT_AsyncHostToDeviceTransferManager_BufferSize
PJRT_AsyncHostToDeviceTransferManager_Destroy
If async H2D transfer manager support is intentionally unavailable on Neuron, please document the recommended PJRT-level alternative for streaming large host tensors into Neuron device memory without materializing and uploading whole buffers via PJRT_Client_BufferFromHostBuffer.
Logs/Context/Additional Information
No response