Skip to content

Serialize subject-level indexer#85

Open
gkiar wants to merge 4 commits into
mainfrom
serial-index
Open

Serialize subject-level indexer#85
gkiar wants to merge 4 commits into
mainfrom
serial-index

Conversation

@gkiar
Copy link
Copy Markdown
Collaborator

@gkiar gkiar commented May 20, 2026

Summary

Refactored the indexer to shift parallelism from subject-level to dataset-level, simplifying the architecture for future generalization.

Changes

  • bids2table/_indexing.py:
    • Removed max_workers, chunksize, executor_cls from index_dataset signature
    • Replaced _pmap with sequential for loop in index_dataset body
    • Updated docstring to reflect sequential subject indexing
  • bids2table/_pathlib.py:
    • Generalized exception handling to catch all exceptions (not just ImportError) when importing/configuring cloudpathlib
    • Falls back to pathlib.Path when cloudpathlib clients fail to initialize
  • bids2table/__main__.py:
    • Removed parallelism arguments from index_dataset call in CLI
    • Kept batch_index_dataset parallelism unchanged (dataset-level only)
  • tests/conftest.py:
    • Added pytest hook to skip cloud tests when cloudpathlib isn't available or functional
  • tests/pybids/test_layout.py:
    • Picked a better dataset, since previous choice didn't test session-level functionality.
  • pyproject.toml:
    • Updated author list to be more modern and accurate.

Rationale

  • Subject-level parallelism provided minimal efficiency gains while constraining future architectural generalization (such as supporting indexing of bids datasets that do not start with subject-level directories)
  • Dataset-level parallelism via batch_index_dataset remains intact
  • Cloudpathlib exception handling now gracefully handles runtime configuration failures (not just import failures)

@gkiar gkiar requested a review from kaitj May 20, 2026 21:00
@github-actions
Copy link
Copy Markdown

Coverage

Coverage Report
FileStmtsMissCoverMissing
__init__.py70100% 
__main__.py64592%101, 127, 155, 159, 163
_entities.py112199%129
_indexing.py211597%150, 159–160, 386, 424
_logging.py31487%30, 37, 39–40
_metadata.py48491%39–40, 66, 71
_pathlib.py17382%12–13, 15
_version.py110100% 
pybids
   __init__.py40100% 
   _bidsfile.py381365%71–73, 77–79, 83–85, 89–91, 95
   _layout.py1564571%63, 72, 81, 104, 114–115, 118, 140–141, 156–157, 173–174, 177–181, 186, 188–189, 192–193, 228, 233, 241, 322–324, 389–394, 396, 399–404, 406, 462, 482
   _utils.py13561%47–50, 52
TOTAL7128588% 

Tests Skipped Failures Errors Time
100 0 💤 0 ❌ 0 🔥 15.636s ⏱️

@kaitj
Copy link
Copy Markdown
Contributor

kaitj commented May 21, 2026

Will take a closer look at this soon, but exception handling for the cloud paths should be more gracefully handled now. Stemmed from having both s3 and cloud extra dependencies, so it was possible that gcs dependencies never got installed if only running pip install bids2table[s3].

(s3 only installation will be deprecated in next major release, so this will also no longer be an issue)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants