PJRT_Client_CreateBuffersForAsyncHostToDevice is present but not implemented #1337

@hugomano

Description

Describe the bug

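Neuron PJRT appears to expose the PJRT async host-to-device (H2D) transfer manager entry points, but calling PJRT_Client_CreateBuffersForAsyncHostToDevice fails, and the library contains a "not implemented on platform" message for this call (see Reproduction Steps).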

This prevents frameworks from using the standard PJRT streaming host-to-device upload path:

  • Create destination device buffers,
  • Stream host data into them in chunks,
  • Retrieve the resulting PJRT_Buffer.

The goal is to avoid full host-side bufferization of each tensor. For large checkpoints, requiring a complete tensor to be materialized in host memory before upload adds extra memory pressure, copies, and startup latency.

Instead, uploads have to fall back to whole-buffer paths such as PJRT_Client_BufferFromHostBuffer.
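
To make the three-step flow above concrete, here is a minimal Python sketch. It does not call PJRT: `ToyTransferManager`, `read_chunks`, and `stream_upload` are hypothetical names, and a `bytearray` stands in for device memory. The only point is to show that with a streaming path the host needs to hold one chunk at a time rather than the whole tensor.

```python
from typing import Iterator, List, Tuple


class ToyTransferManager:
    """Hypothetical in-memory stand-in for an async H2D transfer manager.

    This is NOT the PJRT API; it only mirrors the create / transfer /
    retrieve flow so the host-memory behaviour can be illustrated.
    """

    def __init__(self, buffer_sizes: List[int]) -> None:
        # Step 1: "create destination device buffers" (bytearrays play the device).
        self._buffers = [bytearray(size) for size in buffer_sizes]
        self._received = [0] * len(buffer_sizes)

    def transfer_data(self, index: int, chunk: bytes, offset: int) -> None:
        # Step 2: copy one chunk into the destination at the given byte offset.
        if offset < 0 or offset + len(chunk) > len(self._buffers[index]):
            raise ValueError("chunk does not fit in destination buffer")
        self._buffers[index][offset:offset + len(chunk)] = chunk
        self._received[index] += len(chunk)

    def retrieve_buffer(self, index: int) -> bytes:
        # Step 3: hand back the finished buffer once every byte has arrived.
        if self._received[index] != len(self._buffers[index]):
            raise RuntimeError("buffer is not fully populated yet")
        return bytes(self._buffers[index])


def read_chunks(total_size: int, chunk_size: int) -> Iterator[bytes]:
    """Yield a synthetic tensor chunk by chunk, as a checkpoint reader might."""
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size)
        yield bytes(i % 256 for i in range(start, end))


def stream_upload(total_size: int, chunk_size: int) -> Tuple[bytes, int]:
    """Upload one tensor in chunks; return (device_bytes, largest_host_chunk)."""
    manager = ToyTransferManager([total_size])
    offset = 0
    largest_chunk = 0
    for chunk in read_chunks(total_size, chunk_size):
        manager.transfer_data(0, chunk, offset)
        offset += len(chunk)
        largest_chunk = max(largest_chunk, len(chunk))
    return manager.retrieve_buffer(0), largest_chunk
```

In this toy model, `stream_upload(1000, 256)` fills a 1000-byte destination while never holding more than 256 bytes of tensor data on the host side. A whole-buffer path would need all 1000 bytes materialized before the upload starts.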

Model Name

N/A. The issue is independent of model execution and occurs during host-to-device buffer upload.

Describe the workload type

Fast model loading

Instance Type

inf2.8xlarge

Release version

python=3.12.12
numpy=2.4.6
jax=0.7.0
jaxlib=0.7.0
jax-neuronx=0.7.0.1.0.8181+1e892be0
libneuronxla=3.0.2891.0+e2a4b1f5
neuronx-cc=2.25.3371.0+f524f7f8
aws-neuronx-runtime-lib=2.32.31.0-0234f5ed2

Reproduction Steps

$ strings libneuronpjrt.so | rg 'CreateBuffersForAsyncHostToDevice|AsyncHostToDeviceTransferManager|BufferFromHostBuffer'
CreateBuffersForAsyncHostToDevice with ShapeSpec and Layout is not implemented on platform:
PJRT_Client_CreateBuffersForAsyncHostToDevice
PJRT_AsyncHostToDeviceTransferManager_Destroy
PJRT_AsyncHostToDeviceTransferManager_TransferData
PJRT_AsyncHostToDeviceTransferManager_RetrieveBuffer
PJRT_AsyncHostToDeviceTransferManager_Device
PJRT_AsyncHostToDeviceTransferManager_BufferCount
PJRT_AsyncHostToDeviceTransferManager_BufferSize
PJRT_AsyncHostToDeviceTransferManager_SetBufferError
PJRT_AsyncHostToDeviceTransferManager_AddMetadata
BufferFromHostBuffer with PjRtMemorySpace is not implemented on platform:
PJRT_Client_BufferFromHostBuffer
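
For environments without `strings` or `rg`, the same check can be approximated in Python. This is a rough sketch: `find_printable_strings` and `matching_strings` are hypothetical helper names, and the scan only approximates what `strings` does (runs of printable ASCII of at least four characters). A match only shows that a name or message is embedded in the binary; it does not by itself show whether the entry point works.

```python
import re
from typing import List

# Same alternation as the rg pattern in the command above.
PATTERN = re.compile(
    rb"CreateBuffersForAsyncHostToDevice"
    rb"|AsyncHostToDeviceTransferManager"
    rb"|BufferFromHostBuffer"
)


def find_printable_strings(blob: bytes, min_len: int = 4) -> List[bytes]:
    """Return runs of printable ASCII at least min_len bytes long."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, blob)


def matching_strings(blob: bytes) -> List[str]:
    """Return the printable strings in blob that match PATTERN, in file order."""
    return [
        s.decode("ascii")
        for s in find_printable_strings(blob)
        if PATTERN.search(s)
    ]


# Usage against the real library (path is an assumption; adjust as needed):
#     with open("libneuronpjrt.so", "rb") as f:
#         for line in matching_strings(f.read()):
#             print(line)
```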

Regression Issue

  • Select this option if this issue appears to be a regression.

Possible Solution

Implement support for:

  • PJRT_Client_CreateBuffersForAsyncHostToDevice
  • PJRT_AsyncHostToDeviceTransferManager_TransferData
  • PJRT_AsyncHostToDeviceTransferManager_RetrieveBuffer
  • PJRT_AsyncHostToDeviceTransferManager_Device
  • PJRT_AsyncHostToDeviceTransferManager_BufferCount
  • PJRT_AsyncHostToDeviceTransferManager_BufferSize
  • PJRT_AsyncHostToDeviceTransferManager_Destroy

If async H2D transfer manager support is intentionally unavailable on Neuron, please document the recommended PJRT-level alternative for streaming large host tensors into Neuron device memory without materializing and uploading whole buffers via PJRT_Client_BufferFromHostBuffer.

Logs/Context/Additional Information

No response
