Regarding the multiprocessing problem, we have arrived at a point where keeping the changes minimal is more difficult than re-thinking the entire thing imo.
So I'm starting again from 4 years ago, when this was introduced: #848.
class H5NodeProxy(object):
    def raw_keys(self):
        # !!!!!! THIS CAN SEGFAULT when the node is h5py.File !!!!!!
        for node in self.children:
            yield node.name


class H5FileProxy(H5NodeProxy):
    def raw_keys(self):
        if isinstance(self.file, h5py.File):
            # The root node is a native h5py.File object
            file_path = self.file.filename
            data_path = self.name
            return HDF5Utils.safe_hdf5_group_keys(file_path, data_path=data_path)
        else:
            return super().raw_keys()
Note that the SEGFAULTs happen when looking at a file that is being written to (so online).
So in summary: when we are getting the keys of an h5py.File instance we can get a SEGFAULT, and we want to protect ourselves from it.
Protecting against SEGFAULTs can only be done by using a subprocess. We cannot pass the actual h5py.File object, so we need to pass the file_path and data_path (which is always going to be "/" in this case). A minimal sketch of why a subprocess is the only viable protection is shown right below.
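Just to illustrate the idea (this is not part of the proposed change): a segfault in a spawned child only kills the child, and the parent sees it as a negative exitcode. The _crash helper below is purely hypothetical and forces a NULL dereference via ctypes; the exitcode check assumes a POSIX platform (Linux/macOS).

import ctypes
import multiprocessing
import signal


def _crash():
    # Hypothetical helper: dereference a NULL pointer to trigger SIGSEGV
    ctypes.string_at(0)


if __name__ == "__main__":
    ctx = multiprocessing.get_context("spawn")
    p = ctx.Process(target=_crash)
    p.start()
    p.join()
    # On Linux the child dies with -SIGSEGV (-11) while the parent keeps running
    print(p.exitcode, p.exitcode == -signal.SIGSEGV)

The same crash in the main process would take the whole application down, which is why the protection has to live in a subprocess.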
The reason why we don't expect safe_hdf5_group_keys to raise an error is that we already have self.file, which is an opened file object. So any failure would most likely be due to a SEGFAULT. I think this is where the bad design starts to creep in, because this assumption makes the implementation of safe_hdf5_group_keys unclear.
So if we forget about what calls this and just look at a function that protects against crashing the main process but otherwise behaves like normal h5py (you get exceptions when HDF5 raises exceptions):
def safe_hdf5_group_keys(file_path, data_path="/"):
    """Return a list of keys for the HDF5 group or [] if the process crashes (segfault)."""
    return _run_with_segfault_protection(
        _get_hdf5_group_keys,
        file_path,
        data_path,
        default=[],
    )


def _get_hdf5_group_keys(file_path, data_path):
    import h5py

    with h5py.File(file_path, "r") as f:
        return list(f[data_path].keys())
def _run_with_segfault_protection(func, *args, default=None, timeout=5, **kwargs):
    """
    Run `func(*args, **kwargs)` in a subprocess to isolate crashes.

    - Returns the result if the subprocess succeeds
    - Returns `default` if the subprocess crashes or hangs
    - Re-raises Python exceptions raised by `func` in the subprocess
    - Falls back to direct execution if multiprocessing is unavailable
    """
    queue = None
    p = None
    try:
        import multiprocessing

        ctx = multiprocessing.get_context("spawn")
        queue = ctx.Queue(maxsize=1)
        p = ctx.Process(
            target=_subprocess_worker,
            args=(queue, func, args, kwargs),
        )
        p.start()
    except Exception:
        # Fallback: multiprocessing unavailable or not working
        if queue is not None:
            try:
                queue.close()
            except Exception:
                pass
        return func(*args, **kwargs)

    try:
        p.join(timeout)

        # Subprocess still alive? force kill
        if p.is_alive():
            try:
                p.terminate()  # graceful attempt
            except Exception:
                pass
            try:
                p.kill()  # forceful attempt
            except Exception:
                pass
            p.join()
            return default

        # If subprocess crashed
        if p.exitcode != 0:
            return default

        try:
            status, payload = queue.get(timeout=1)
        except Exception:
            return default

        if status == "ok":
            return payload
        else:
            # Re-raise the Python exception raised inside the subprocess
            raise payload
    finally:
        # Safe cleanup
        try:
            queue.close()
        except Exception:
            pass
        try:
            p.close()
        except Exception:
            pass
def _subprocess_worker(queue, func, args, kwargs):
    try:
        result = func(*args, **kwargs)
        queue.put(("ok", result))
    except Exception as exc:
        # We treat Python exceptions as normal results
        queue.put(("error", exc))
The big difference with the current implementation is that we make a distinction between a normal Python exception and a SEGFAULT.
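For illustration, this is what that distinction looks like from the caller's side (the file names are made up, and this describes the intended contract rather than tested code):

# Group exists: behaves like plain h5py and returns the keys
keys = safe_hdf5_group_keys("/data/scan.h5", data_path="/")

# Group missing: the normal Python exception propagates, as with plain h5py
try:
    safe_hdf5_group_keys("/data/scan.h5", data_path="/does_not_exist")
except KeyError:
    pass

# File being written to concurrently: a crash in the subprocess is contained
# and the default is returned instead of killing the main process
keys = safe_hdf5_group_keys("/data/still_being_written.h5")  # -> [] on segfault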