BUG: scitex.io.save() dispatch ignores registered .parquet and .feather savers #276

@ywatanabe1989

Summary

scitex.io.save(df, "foo.parquet") (and .feather) warns "Unsupported file format" and silently does nothing, even though standalone scitex-io registers both extensions and scitex_io.get_saver('.parquet') returns the valid _save_parquet function.

The umbrella's save() dispatcher (in scitex/io/_save.py) is not consulting the same registry that list_formats() / get_saver() consult.

Reproducer

```python
import scitex as stx, pandas as pd, os, tempfile

# Standalone registry says: parquet is supported, get_saver returns a function
print('.parquet listed:', '.parquet' in stx.io.list_formats()['save']['builtin'])  # True
print(stx.io.get_saver('.parquet'))  # <function _save_parquet at ...>

# Umbrella save() does not dispatch to it
with tempfile.TemporaryDirectory() as td:
    p = os.path.join(td, "x.parquet")
    stx.io.save(pd.DataFrame({'a': [1.0]}), p, verbose=True)
    # WARN: Unsupported file format. .../x.parquet was not saved.
    assert not os.path.exists(p)

# Feather: same pattern
with tempfile.TemporaryDirectory() as td:
    p = os.path.join(td, "x.feather")
    stx.io.save(pd.DataFrame({'a': [1.0]}), p, verbose=True)
    assert not os.path.exists(p)
```

Likely root cause

scitex/io/_save.py carries its own extension→saver mapping (used by the umbrella's save() dispatcher) which has not been updated to match scitex_io's registry after the standalonization. Three sources of truth now exist:

  1. scitex_io.list_formats()['save']['builtin'] — includes .parquet, .feather
  2. scitex_io.get_saver('.parquet') → returns _save_parquet
  3. Umbrella _save.py dispatcher → unknown — falls through to "Unsupported"
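The mismatch listed above could be caught by a small drift check that compares the two sets of extensions. The sketch below is self-contained and uses hypothetical stand-in lists; it assumes the umbrella's mapping can be enumerated by extension, which has not been confirmed against scitex/io/_save.py.

```python
def find_dispatch_drift(registry_exts, dispatcher_exts):
    """Return extensions the registry supports but the dispatcher lacks."""
    return sorted(set(registry_exts) - set(dispatcher_exts))


# Hypothetical stand-ins: in practice the first list would come from
# scitex_io.list_formats()['save']['builtin'] and the second from whatever
# mapping the umbrella dispatcher uses internally (an assumption).
registry_exts = [".csv", ".pkl", ".parquet", ".feather"]
dispatcher_exts = [".csv", ".pkl"]

missing = find_dispatch_drift(registry_exts, dispatcher_exts)
print("Registered but not dispatched:", missing)
```

Run as a unit test, a check like this would fail as soon as the registry gains a format the dispatcher does not know about.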

Suggested fix

Have scitex.io.save() delegate dispatch entirely to scitex_io.get_saver(ext):

```python
ext = _os.path.splitext(spath)[1].lower()
saver = scitex_io.get_saver(ext)
if saver is None:
    print(f"WARN: Unsupported file format. {spath} was not saved.")
    return
saver(obj, spath_final, **kwargs)
```

This makes the umbrella delegate to the single source of truth and prevents future drift.

Impact

Hit while building a feature-extraction pipeline that needed parquet for ~4 000 small DataFrames per patient cohort. Pivoted to .pkl as workaround. The silent dispatch failure is dangerous because subsequent stx.io.load(<path>) raises FileNotFoundError far from the actual write-site failure.
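Until the dispatcher is fixed, a caller-side guard can surface the failure at the write site instead of at load time. The sketch below is one possible workaround, not part of scitex; `save_or_raise` and the two demo savers are hypothetical names introduced for illustration.

```python
import os
import tempfile


def save_or_raise(save_fn, obj, path, **kwargs):
    """Call save_fn, then fail loudly if the target file was not written."""
    save_fn(obj, path, **kwargs)
    if not os.path.exists(path):
        raise OSError(f"save() returned without writing {path}")
    return path


def _silent_noop_save(obj, path, **kwargs):
    # Mimics the reported behaviour: returns without writing anything.
    return None


def _working_save(obj, path, **kwargs):
    with open(path, "w", encoding="utf-8") as f:
        f.write(str(obj))


with tempfile.TemporaryDirectory() as td:
    ok_path = save_or_raise(_working_save, "data", os.path.join(td, "x.txt"))
    wrote_ok = os.path.basename(ok_path) == "x.txt"

    try:
        save_or_raise(_silent_noop_save, "data", os.path.join(td, "x.parquet"))
        raised = False
    except OSError:
        raised = True
```

In a real pipeline, `save_fn` would be `stx.io.save`, so a skipped write stops the run immediately.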

Cross-ref

Originally filed (and closed as misdirected) at scitex-io#25 and scitex-io#26.
