Serialize subject-level indexer by gkiar · Pull Request #85 · childmindresearch/bids2table

gkiar · 2026-05-20T21:00:03Z

Summary

Refactored the indexer to shift parallelism from subject-level to dataset-level, simplifying the architecture for future generalization.

Changes

bids2table/_indexing.py:
- Removed max_workers, chunksize, and executor_cls from the index_dataset signature
- Replaced _pmap with a sequential for loop in the index_dataset body
- Updated docstring to reflect sequential subject indexing
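For readers comparing before and after, here is a minimal sketch of the new control flow. It assumes subjects are the sub-* directories directly under the dataset root. The names index_subject and index_dataset_sketch are hypothetical stand-ins, not the code in bids2table/_indexing.py, and the per-file records are simplified placeholders for whatever the real indexer emits. Where the old code fanned subjects out through _pmap, this version walks them in a plain loop:

```python
from __future__ import annotations

from pathlib import Path


def index_subject(sub_dir: Path) -> list[dict]:
    """Hypothetical per-subject indexer: one record per file under sub_dir."""
    return [
        {"subject": sub_dir.name, "path": p.relative_to(sub_dir.parent).as_posix()}
        for p in sorted(sub_dir.rglob("*"))
        if p.is_file()
    ]


def index_dataset_sketch(root: str | Path) -> list[dict]:
    """Index every sub-* directory sequentially (no executor, no chunking)."""
    root = Path(root)
    records: list[dict] = []
    # A plain loop takes the place of the former parallel map over subjects.
    for sub_dir in sorted(root.glob("sub-*")):
        if sub_dir.is_dir():
            records.extend(index_subject(sub_dir))
    return records
```

Because there is no executor, there are also no max_workers, chunksize, or executor_cls knobs to pass through, which is what lets the signature shrink.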
bids2table/_pathlib.py:
- Generalized exception handling to catch all exceptions (not just ImportError) when importing/configuring cloudpathlib
- Falls back to pathlib.Path when cloudpathlib clients fail to initialize
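The fallback pattern can be sketched as follows. The helper resolve_path_cls and its configure callback are hypothetical, and defaulting to cloudpathlib.AnyPath is an assumption about what _pathlib.py exposes. The point is only that catching Exception rather than ImportError also covers a library that imports cleanly but fails while setting up its clients:

```python
from __future__ import annotations

import importlib
from pathlib import Path
from typing import Callable, Optional


def resolve_path_cls(
    module_name: str = "cloudpathlib",
    attr: str = "AnyPath",
    configure: Optional[Callable[[object], None]] = None,
) -> type:
    """Return a cloud-aware path class, or pathlib.Path if anything goes wrong.

    Catching Exception (not just ImportError) means a library that imports
    fine but fails during client setup also triggers the fallback.
    """
    try:
        module = importlib.import_module(module_name)
        path_cls = getattr(module, attr)
        if configure is not None:
            configure(module)  # hypothetical client setup; may raise anything
        return path_cls
    except Exception:
        return Path
```

Passing a configure callback that raises (for example, because no credentials are set up) yields pathlib.Path, the same result as a failed import.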
bids2table/__main__.py:
- Removed parallelism arguments from index_dataset call in CLI
- Kept batch_index_dataset parallelism unchanged (dataset-level only)
tests/conftest.py:
- Added pytest hook to skip cloud tests when cloudpathlib isn't available or functional
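Below is a minimal conftest.py sketch of such a hook. It assumes cloud tests carry a hypothetical @pytest.mark.cloud marker; the real hook may identify cloud tests and probe cloudpathlib differently. Here "not functional" is approximated by any exception raised on import:

```python
import importlib

import pytest


def cloud_backend_usable() -> bool:
    """True if cloudpathlib imports cleanly; any failure counts as unusable."""
    try:
        importlib.import_module("cloudpathlib")
    except Exception:
        return False
    return True


def pytest_collection_modifyitems(config, items):
    """Skip tests marked 'cloud' (hypothetical marker) when the backend is unusable."""
    if cloud_backend_usable():
        return
    skip_cloud = pytest.mark.skip(reason="cloudpathlib not available or not functional")
    for item in items:
        if "cloud" in item.keywords:
            item.add_marker(skip_cloud)
```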
tests/pybids/test_layout.py:
- Switched to a dataset that has sessions, since the previous choice didn't exercise session-level functionality.
pyproject.toml:
- Updated the author list to be current and accurate.

Rationale

- Subject-level parallelism provided minimal efficiency gains while constraining future architectural generalization (such as indexing BIDS datasets whose top level is not made up of subject directories).
- Dataset-level parallelism via batch_index_dataset remains intact.
- Cloudpathlib exception handling now also covers runtime configuration failures, not just import failures.
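To illustrate where the parallelism now lives, here is a hedged sketch in which each worker handles one whole dataset and walks its subjects serially. batch_sketch and count_subjects are hypothetical and do not reproduce the signature of batch_index_dataset. ThreadPoolExecutor is the default only so the example runs anywhere; swapping in ProcessPoolExecutor would require the job to be a picklable module-level function:

```python
from __future__ import annotations

from concurrent.futures import Executor, ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def count_subjects(root: str | Path) -> int:
    """Hypothetical per-dataset job: visits subjects one at a time."""
    n = 0
    for sub_dir in sorted(Path(root).glob("sub-*")):
        if sub_dir.is_dir():
            n += 1
    return n


def batch_sketch(
    roots: list[str | Path],
    job: Callable[[str | Path], int] = count_subjects,
    max_workers: int = 4,
    executor_cls: type[Executor] = ThreadPoolExecutor,
) -> list[int]:
    """Run one job per dataset in parallel; parallelism is across datasets only."""
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(job, roots))
```

Executor.map preserves input order, so the results line up with the list of dataset roots.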


github-actions · 2026-05-20T21:01:21Z

Coverage Report

File	Stmts	Miss	Cover	Missing
__init__.py	7	0	100%
__main__.py	64	5	92%	101, 127, 155, 159, 163
_entities.py	112	1	99%	129
_indexing.py	211	5	97%	150, 159–160, 386, 424
_logging.py	31	4	87%	30, 37, 39–40
_metadata.py	48	4	91%	39–40, 66, 71
_pathlib.py	17	3	82%	12–13, 15
_version.py	11	0	100%
pybids
__init__.py	4	0	100%
_bidsfile.py	38	13	65%	71–73, 77–79, 83–85, 89–91, 95
_layout.py	156	45	71%	63, 72, 81, 104, 114–115, 118, 140–141, 156–157, 173–174, 177–181, 186, 188–189, 192–193, 228, 233, 241, 322–324, 389–394, 396, 399–404, 406, 462, 482
_utils.py	13	5	61%	47–50, 52
TOTAL	712	85	88%

Tests	Skipped	Failures	Errors	Time
100	0 💤	0 ❌	0 🔥	15.636s ⏱️

kaitj · 2026-05-21T15:11:17Z

Will take a closer look at this soon, but cloud-path exceptions should now be handled more gracefully. The problem stemmed from having both s3 and cloud extras, so the GCS dependencies might never be installed if you only ran pip install bids2table[s3].

(s3 only installation will be deprecated in next major release, so this will also no longer be an issue)
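The install-time gap described above can be checked directly. This sketch assumes the s3 extra provides boto3 and the GCS backend needs google-cloud-storage (importable as google.cloud.storage); both mappings, and the "s3"/"gs" labels, are hypothetical and should be checked against pyproject.toml. Under an s3-only install, missing_backends() would be expected to report "gs":

```python
from __future__ import annotations

import importlib.util

# Hypothetical mapping from backend label to the module it needs.
BACKENDS = {
    "s3": "boto3",
    "gs": "google.cloud.storage",
}


def module_available(name: str) -> bool:
    """True if `name` can be located; parent packages may get imported."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ModuleNotFoundError, ValueError):
        # A missing parent package (e.g. no 'google') raises instead of returning None.
        return False


def missing_backends(backends: dict[str, str] = BACKENDS) -> list[str]:
    """Labels of backends whose required module cannot be found."""
    return [label for label, mod in backends.items() if not module_available(mod)]
```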

gkiar added 4 commits May 20, 2026 16:53

serialized subject-level indexing

1da91c0

generalized cloudpath failures if the lib is installed but not configured

24b6f39

fixed bug in test to use ds that has sessions

ca9654e

updated author list

f8c966b

gkiar requested a review from kaitj May 20, 2026 21:00

gkiar mentioned this pull request May 21, 2026

[WIP] Generalize entities #86

Open

gkiar wants to merge 4 commits into main from serial-index
