PJRT_Client_CreateBuffersForAsyncHostToDevice is present but not implemented

### Describe the bug

Neuron PJRT appears to expose the PJRT async host-to-device transfer manager API entry points, but calling `PJRT_Client_CreateBuffersForAsyncHostToDevice` fails, and the library binary contains a "not implemented on platform" message for it.

This prevents frameworks from using the standard PJRT streaming host-to-device upload path:
- Create destination device buffers,
- Stream host data into them in chunks,
- Retrieve the resulting `PJRT_Buffer`.

The goal is to avoid buffering each tensor in full on the host. For large checkpoints, requiring a complete tensor to be materialized in host memory before upload adds memory pressure, extra copies, and startup latency.

Instead, uploads have to fall back to whole-buffer paths such as `PJRT_Client_BufferFromHostBuffer`.
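
For illustration, the three-step streaming flow described above can be modeled in a few lines of Python. This is only a sketch: `MockTransferManager`, `read_chunks`, and `stream_upload` are hypothetical names invented for this example, nothing here calls PJRT or Neuron, and the "device" buffer is a plain `bytearray`. It shows why the streaming path only needs one chunk resident on the host at a time.

```python
"""Hypothetical model of the async host-to-device flow (no real PJRT calls)."""
from typing import Iterator


class MockTransferManager:
    """Stand-in for a PJRT async H2D transfer manager; not a real binding."""

    def __init__(self, buffer_sizes: list[int]) -> None:
        # Step 1: destination "device" buffers are allocated up front by size.
        self._buffers = [bytearray(size) for size in buffer_sizes]
        self._received = [0] * len(buffer_sizes)

    def transfer_data(self, index: int, chunk: bytes, offset: int) -> None:
        # Step 2: copy one chunk into the destination at the given byte offset.
        end = offset + len(chunk)
        if end > len(self._buffers[index]):
            raise ValueError("chunk exceeds destination buffer size")
        self._buffers[index][offset:end] = chunk
        self._received[index] += len(chunk)

    def retrieve_buffer(self, index: int) -> bytes:
        # Step 3: hand back the finished buffer once every byte has arrived.
        if self._received[index] != len(self._buffers[index]):
            raise RuntimeError("buffer is not fully transferred yet")
        return bytes(self._buffers[index])


def read_chunks(total_size: int, chunk_size: int) -> Iterator[bytes]:
    """Hypothetical chunk source, e.g. sequential reads from a checkpoint file."""
    for offset in range(0, total_size, chunk_size):
        size = min(chunk_size, total_size - offset)
        yield bytes((offset + i) % 256 for i in range(size))


def stream_upload(total_size: int, chunk_size: int) -> tuple[bytes, int]:
    """Upload one tensor in chunks; return it and the largest chunk held on the host."""
    manager = MockTransferManager([total_size])
    offset = 0
    peak_host_bytes = 0
    for chunk in read_chunks(total_size, chunk_size):
        peak_host_bytes = max(peak_host_bytes, len(chunk))
        manager.transfer_data(0, chunk, offset)
        offset += len(chunk)
    return manager.retrieve_buffer(0), peak_host_bytes
```

In this model, `stream_upload(1000, 256)` holds at most 256 bytes of source data on the host at once, whereas a whole-buffer upload would need all 1000 bytes materialized first.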

### Model Name

N/A. The issue is independent of model execution and occurs during host-to-device buffer upload.

### Describe the workload type

Fast model loading

### Instance Type

inf2.8xlarge


### Release version

python=3.12.12
numpy=2.4.6
jax=0.7.0
jaxlib=0.7.0
jax-neuronx=0.7.0.1.0.8181+1e892be0
libneuronxla=3.0.2891.0+e2a4b1f5
neuronx-cc=2.25.3371.0+f524f7f8
aws-neuronx-runtime-lib=2.32.31.0-0234f5ed2

### Reproduction Steps

```
$ strings libneuronpjrt.so | rg 'CreateBuffersForAsyncHostToDevice|AsyncHostToDeviceTransferManager|BufferFromHostBuffer'
CreateBuffersForAsyncHostToDevice with ShapeSpec and Layout is not implemented on platform:
PJRT_Client_CreateBuffersForAsyncHostToDevice
PJRT_AsyncHostToDeviceTransferManager_Destroy
PJRT_AsyncHostToDeviceTransferManager_TransferData
PJRT_AsyncHostToDeviceTransferManager_RetrieveBuffer
PJRT_AsyncHostToDeviceTransferManager_Device
PJRT_AsyncHostToDeviceTransferManager_BufferCount
PJRT_AsyncHostToDeviceTransferManager_BufferSize
PJRT_AsyncHostToDeviceTransferManager_SetBufferError
PJRT_AsyncHostToDeviceTransferManager_AddMetadata
BufferFromHostBuffer with PjRtMemorySpace is not implemented on platform:
PJRT_Client_BufferFromHostBuffer
```
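
If `strings` or `rg` is not available on the instance, roughly the same check can be done with the Python standard library. This is a sketch, not part of the original reproduction: `find_pjrt_strings` is a hypothetical helper name, and it only approximates `strings` by extracting runs of four or more printable ASCII bytes. Like the command above, it only shows which names and messages are embedded in the binary; it does not call the API.

```python
import re
from pathlib import Path

# Runs of 4+ printable ASCII bytes, approximating the default behavior of strings(1).
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")

# The same three substrings used in the rg pattern above.
KEYWORDS = (
    b"CreateBuffersForAsyncHostToDevice",
    b"AsyncHostToDeviceTransferManager",
    b"BufferFromHostBuffer",
)


def find_pjrt_strings(library_path: str) -> list[str]:
    """Return printable strings in the file that mention any of the keywords."""
    data = Path(library_path).read_bytes()
    matches = []
    for match in PRINTABLE_RUN.finditer(data):
        run = match.group()
        if any(keyword in run for keyword in KEYWORDS):
            matches.append(run.decode("ascii"))
    return matches
```

Calling `find_pjrt_strings("libneuronpjrt.so")` from the directory that contains the library returns the matching strings in file order.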

### Regression Issue

- [ ] Select this option if this issue appears to be a regression.

### Possible Solution

Implement support for:

- `PJRT_Client_CreateBuffersForAsyncHostToDevice`
- `PJRT_AsyncHostToDeviceTransferManager_TransferData`
- `PJRT_AsyncHostToDeviceTransferManager_RetrieveBuffer`
- `PJRT_AsyncHostToDeviceTransferManager_Device`
- `PJRT_AsyncHostToDeviceTransferManager_BufferCount`
- `PJRT_AsyncHostToDeviceTransferManager_BufferSize`
- `PJRT_AsyncHostToDeviceTransferManager_Destroy`


If async H2D transfer manager support is intentionally unavailable on Neuron, please document the recommended PJRT-level alternative for streaming large host tensors into Neuron device memory without materializing and uploading whole buffers via `PJRT_Client_BufferFromHostBuffer`.

### Logs/Context/Additional Information

_No response_

PJRT_Client_CreateBuffersForAsyncHostToDevice is present but not implemented #1337

Describe the bug

Model Name

Describe the workload type

Instance Type

Release version

Reproduction Steps

Regression Issue

Possible Solution

Logs/Context/Additional Information

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

PJRT_Client_CreateBuffersForAsyncHostToDevice is present but not implemented #1337

Description

Describe the bug

Model Name

Describe the workload type

Instance Type

Release version

Reproduction Steps

Regression Issue

Possible Solution

Logs/Context/Additional Information

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions