
Commit b5861a4

Add Python bindings for hipFile (GPU-direct storage on ROCm)
ctypes-based Python package mirroring the cufile API so that LMCache can use hipFile as a drop-in replacement on ROCm.

- Low-level bindings (bindings.py) wrapping libhipfile.so
- High-level API with CuFile/CuFileDriver context managers
- buf_register/buf_deregister helpers matching cufile interface
- Mock-based test suite (34 tests, runs without AMD hardware)
- PyTorch integration example

Ref: #201
Signed-off-by: Boris Glimcher <Boris.Glimcher@emc.com>
1 parent 88b004c commit b5861a4

9 files changed

Lines changed: 1238 additions & 0 deletions

File tree

.gitignore

Lines changed: 6 additions & 0 deletions
```diff
@@ -6,3 +6,9 @@ build/
 
 # Ignore .cache directory generated by clangd
 .cache/
+
+# Python
+__pycache__/
+*.pyc
+.pytest_cache/
+*.egg-info/
```

python/README.md

Lines changed: 194 additions & 0 deletions
# hipfile – Python bindings for AMD hipFile

Python `ctypes`-based bindings for [AMD hipFile](https://github.com/ROCm/hipFile),
the ROCm equivalent of NVIDIA's cuFile, enabling **GPU-direct storage** – data
movement directly between NVMe/filesystem storage and GPU memory, bypassing CPU
staging buffers.

> **Status:** Early-stage community bindings, tracking
> [ROCm/hipFile#201](https://github.com/ROCm/hipFile/issues/201).

---

## Requirements

- Linux (x86_64 or aarch64)
- ROCm installed (tested with ROCm 6.x)
- hipFile library built and installed from [ROCm/hipFile](https://github.com/ROCm/hipFile)
- Python 3.8+

---

## Installation

```bash
# From source
git clone https://github.com/ROCm/hipFile.git
cd hipFile/python
pip install -e .
```

Make sure `libhipfile.so` is on your `LD_LIBRARY_PATH`, or set:

```bash
export HIPFILE_LIB_PATH=/opt/rocm/lib/libhipfile.so
```

---
## Quick start

```python
import ctypes

import hipfile
import torch

with hipfile.CuFileDriver():
    byte_size = 4 * 1024 * 1024  # 4 MB

    # Allocate GPU memory (example using PyTorch)
    tensor = torch.empty(1024 * 1024, dtype=torch.float32, device="cuda")
    gpu_ptr = tensor.data_ptr()

    # Register the GPU buffer, then do the I/O
    hipfile.buf_register(gpu_ptr, byte_size)
    try:
        # use_direct_io=True opens the file with O_DIRECT for best performance
        with hipfile.CuFile("data.bin", "r", use_direct_io=True) as f:
            n = f.read(ctypes.c_void_p(gpu_ptr), byte_size, file_offset=0)
            print(f"Read {n} bytes directly into GPU memory")
    finally:
        hipfile.buf_deregister(gpu_ptr)
```
---

## API overview

### Driver lifecycle

```python
hipfile.hipFileDriverOpen()   # initialise the hipFile driver
hipfile.hipFileDriverClose()  # tear down

props = hipfile.hipFileDriverGetProperties()  # returns hipFileDriverProps_t
print(props.major_version, props.minor_version)

hipfile.hipFileDriverSetMaxDirectIOSize(128)   # KB
hipfile.hipFileDriverSetMaxCacheSize(512)      # KB
hipfile.hipFileDriverSetMaxPinnedMemSize(256)  # KB
```

### Context managers

```python
with hipfile.CuFileDriver():  # open / close driver
    # Register GPU buffer
    hipfile.buf_register(ptr, size)
    try:
        with hipfile.CuFile("data.bin", "r+") as f:  # open / close file
            f.read(ptr, count=size, file_offset=0)
            f.write(ptr, count=size, file_offset=0)
    finally:
        hipfile.buf_deregister(ptr)
```

### Buffer registration

```python
# Direct function calls
hipfile.buf_register(gpu_ptr, size, flags=0)
hipfile.buf_deregister(gpu_ptr)

# Note: a RegisteredBuffer context manager is planned for a future release
```

### Error handling

```python
try:
    hipfile.hipFileDriverOpen()
except hipfile.HipFileError as e:
    print(f"HipFile error: {e}")
```

---

## Running the tests

The test suite uses mocks and runs without real hardware:

```bash
pip install pytest
pytest tests/ -v
```
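The mock approach can be sketched in a few lines: a `unittest.mock.Mock` stands in for the loaded `libhipfile.so` handle, so a wrapper's status-code translation can be exercised without AMD hardware. The `driver_open` helper and `HipFileError` class below are simplified stand-ins for the real bindings, not their actual implementation:

```python
from unittest import mock


# Simplified stand-ins for the real bindings (illustration only).
class HipFileError(RuntimeError):
    pass


def driver_open(lib):
    """Call hipFileDriverOpen on `lib` and translate the status code."""
    status = lib.hipFileDriverOpen()
    if status != 0:
        raise HipFileError(f"hipFileDriverOpen failed with status {status}")


# A Mock stands in for the ctypes handle to libhipfile.so.
lib = mock.Mock()
lib.hipFileDriverOpen.return_value = 0
driver_open(lib)  # status 0: succeeds silently

lib.hipFileDriverOpen.return_value = 5001
try:
    driver_open(lib)
except HipFileError as e:
    print(e)  # hipFileDriverOpen failed with status 5001
```

The real suite follows the same shape, patching the binding layer's library handle rather than passing it in.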
For LMCache integration testing:

```bash
python test_lmcache_integration.py
```

---

## PyTorch example

```bash
python examples/pytorch_example.py --create --count 1048576
```

---

## LMCache Integration

hipFile Python bindings are designed to be a drop-in replacement for NVIDIA's cuFile in applications like LMCache:

```python
# Works with both cuFile and hipFile
try:
    import cufile as gds_lib
except ImportError:
    import hipfile as gds_lib

# Same API for both
with gds_lib.CuFileDriver():
    gds_lib.buf_register(tensor.data_ptr(), tensor.nbytes)
    # ... perform GDS operations ...
```

---

## How it works

hipFile provides a C API for GPU-direct I/O on AMD ROCm hardware. These Python
bindings use `ctypes` to call `libhipfile.so` directly, with no C compilation
needed. The binding layer:

1. Loads `libhipfile.so` at import time (lazy, configurable via `HIPFILE_LIB_PATH`).
2. Declares `argtypes` / `restype` for each API function.
3. Wraps the C types in Pythonic classes with context-manager support.
4. Translates error status codes to `HipFileError` exceptions.
5. Provides a cuFile-compatible API for easy migration.
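The `argtypes`/`restype` step can be illustrated with a function available on any Linux system; here libc's `strlen` stands in for a `libhipfile.so` entry point (the library name is the only thing that would change):

```python
import ctypes
import ctypes.util

# Load a shared library; for hipfile this would be libhipfile.so,
# resolved from HIPFILE_LIB_PATH or the default loader search path.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the C signature so ctypes converts arguments and the
# return value correctly instead of defaulting everything to int.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hipfile"))  # 7
```

Without the declarations, ctypes would still call the function, but with platform-dependent (and sometimes wrong) conversions; declaring them up front is what makes the bindings safe across x86_64 and aarch64.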
---

## Known Limitations

- `RegisteredBuffer` context manager is not yet implemented (use direct `buf_register`/`buf_deregister` calls)
- Some advanced cuFile features may not yet have hipFile equivalents
- Error reporting is less detailed than NVIDIA's cuFile
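Until that context manager lands, `contextlib` gives the same ergonomics in a few lines. This is a sketch only: the register/deregister callables are parameters so it stays testable without the library, but in practice they would be `hipfile.buf_register` and `hipfile.buf_deregister`:

```python
from contextlib import contextmanager


@contextmanager
def registered_buffer(ptr, size, register, deregister):
    """Keep a GPU buffer registered for the duration of the with-block.

    `register`/`deregister` should behave like hipfile.buf_register /
    hipfile.buf_deregister (hypothetical injection for illustration).
    """
    register(ptr, size)
    try:
        yield ptr
    finally:
        deregister(ptr)  # runs even if the body raises


# Usage with stand-in callables that just record the calls:
calls = []
with registered_buffer(42, 4096,
                       lambda p, s: calls.append(("reg", p, s)),
                       lambda p: calls.append(("dereg", p))):
    pass
print(calls)  # [('reg', 42, 4096), ('dereg', 42)]
```

The `try`/`finally` guarantees deregistration on error paths, which is exactly the pattern the README's examples spell out by hand.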
---

## Contributing

PRs welcome! The main tracking issue for official bindings is
[ROCm/hipFile#201](https://github.com/ROCm/hipFile/issues/201).

python/examples/pytorch_example.py

Lines changed: 88 additions & 0 deletions
```python
"""
examples/pytorch_example.py
----------------------------
Demonstrates loading a tensor from disk directly into GPU memory using
hipFile + PyTorch on an AMD ROCm system.

Requirements:
- AMD GPU with ROCm installed
- hipFile library (see https://github.com/ROCm/hipFile)
- PyTorch with ROCm support (pip install torch --index-url https://download.pytorch.org/whl/rocm6.1)
- This package: pip install hipfile (or python -m pip install -e .)

Usage::

    python pytorch_example.py --file /path/to/float32_tensor.bin --count 1048576
"""

import argparse
import os
import struct
import time

import hipfile


def create_test_file(path: str, n_floats: int = 1024 * 1024) -> None:
    """Write n_floats random float32 values to a file for testing."""
    import random
    data = struct.pack(f"{n_floats}f", *[random.random() for _ in range(n_floats)])
    with open(path, "wb") as f:
        f.write(data)
    print(f"Created test file: {path} ({len(data)} bytes)")


def load_tensor_gpu_direct(filepath: str, n_floats: int):
    """Load float32 data from file directly into a GPU tensor via hipFile."""
    try:
        import torch
    except ImportError:
        print("PyTorch not available – skipping GPU demo.")
        return

    if not torch.cuda.is_available():
        print("No GPU available (torch.cuda.is_available() = False).")
        return

    dtype = torch.float32
    byte_size = n_floats * dtype.itemsize

    # 1. Allocate device tensor
    tensor = torch.empty(n_floats, dtype=dtype, device="cuda")
    dev_ptr = tensor.data_ptr()  # raw integer GPU pointer

    # 2. Initialise hipFile driver, register the GPU buffer, do the I/O
    with hipfile.CuFileDriver():
        hipfile.buf_register(dev_ptr, byte_size)
        try:
            with hipfile.CuFile(filepath, "r", use_direct_io=True) as hf:
                t0 = time.perf_counter()
                n_read = hf.read(dev_ptr, byte_size, file_offset=0)
                elapsed = time.perf_counter() - t0
        finally:
            hipfile.buf_deregister(dev_ptr)

    bw_gb = (n_read / elapsed) / 1e9
    print(f"GPU-direct read: {n_read} bytes in {elapsed*1000:.2f} ms "
          f"({bw_gb:.2f} GB/s)")
    print(f"Tensor first 5 values: {tensor[:5].tolist()}")


def main():
    parser = argparse.ArgumentParser(description="hipFile + PyTorch example")
    parser.add_argument("--file", default="/tmp/test_hipfile.bin",
                        help="Path to float32 binary file")
    parser.add_argument("--count", type=int, default=1024 * 1024,
                        help="Number of float32 values")
    parser.add_argument("--create", action="store_true",
                        help="Create a test file before reading")
    args = parser.parse_args()

    if args.create or not os.path.exists(args.file):
        create_test_file(args.file, args.count)

    load_tensor_gpu_direct(args.file, args.count)


if __name__ == "__main__":
    main()
```

python/hipfile/__init__.py

Lines changed: 62 additions & 0 deletions
```python
"""
hipfile – Python bindings for AMD hipFile (GPU-direct storage).

Drop-in replacement for the cufile Python package.
Mirrors cufile/__init__.py structure with hip* naming.

Quick start::

    import ctypes
    import hipfile

    with hipfile.CuFile("data.bin", "r+") as f:
        f.write(ctypes.c_void_p(gpu_ptr), size, file_offset=0)
"""

# High-level (mirrors cufile.cufile exports)
from .hipfile import (
    CuFile,
    CuFileDriver,
    buf_register,
    buf_deregister,
)

# Low-level convenience functions (mirrors cufile.bindings exports)
from .bindings import (
    HipFileError,
    hipFileDriverOpen,
    hipFileDriverClose,
    hipFileHandleRegister,
    hipFileHandleDeregister,
    hipFileBufRegister,
    hipFileBufDeregister,
    hipFileRead,
    hipFileWrite,
    hipFileHandle_t,
    hipFileStatus,
    hipFileDescr,
    DescrUnion,
)

__version__ = "0.1.0"

__all__ = [
    # High-level
    "CuFile",
    "CuFileDriver",
    "buf_register",
    "buf_deregister",
    "HipFileError",
    # Low-level
    "hipFileDriverOpen",
    "hipFileDriverClose",
    "hipFileHandleRegister",
    "hipFileHandleDeregister",
    "hipFileBufRegister",
    "hipFileBufDeregister",
    "hipFileRead",
    "hipFileWrite",
    "hipFileHandle_t",
    "hipFileStatus",
    "hipFileDescr",
    "DescrUnion",
]
```
