[WIP] Generalize entities #86
Open
gkiar wants to merge 12 commits into
…d cohort nesting working
… without descriptions
effigies reviewed May 21, 2026
```python
name = get_entity_name(entity_type)
if not name:
    return ""
return f"{name}-[a-zA-Z0-9]+"
```
Contributor
If you want to be very correct, you can look up the format:

```python
fmt = schema.objects.entities[entity_type].format
pattern = schema.objects.formats[fmt].pattern
return f'{name}-{pattern}'
```

But the main thing that's actually missing here is `+` in labels:
Suggested change

```diff
-return f"{name}-[a-zA-Z0-9]+"
+return f"{name}-[a-zA-Z0-9+]+"
```
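A minimal sketch of the schema lookup the reviewer describes, using a hand-written dict in place of the loaded schema. The dict layout (`entities[...]["format"]`, `formats[...]["pattern"]`) is an assumption modeled on the `schema.objects.entities` / `schema.objects.formats` attributes quoted in the review, and the `label` pattern including `+` reflects the reviewer's note rather than a value read from `bidsschematools`. `entity_pattern` is a hypothetical name.

```python
from __future__ import annotations

import re

# Hand-written stand-in for the loaded BIDS schema (assumption: mirrors the
# objects.entities / objects.formats layout; values are illustrative).
SCHEMA = {
    "entities": {
        "subject": {"name": "sub", "format": "label"},
        "run": {"name": "run", "format": "index"},
    },
    "formats": {
        "label": {"pattern": "[0-9a-zA-Z+]+"},
        "index": {"pattern": "[0-9]+"},
    },
}


def entity_pattern(entity_type: str, schema: dict = SCHEMA) -> str:
    """Build '<name>-<format pattern>' for an entity, or '' if it is unknown."""
    entity = schema["entities"].get(entity_type)
    if entity is None:
        # Same fallback as the helper under review: unknown entity -> empty string.
        return ""
    fmt = entity["format"]
    pattern = schema["formats"][fmt]["pattern"]
    return f"{entity['name']}-{pattern}"


# The hardcoded pattern rejects '+' in labels; the schema-derived one accepts it.
assert re.fullmatch("sub-[a-zA-Z0-9]+", "sub-01+02") is None
assert re.fullmatch(entity_pattern("subject"), "sub-01+02") is not None
```

Reading the pattern through the entity's format also keeps index-valued entities such as `run` restricted to digits, which a single shared regex would not.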
effigies reviewed May 21, 2026
```python
@lru_cache(maxsize=None)
def _find_bidsignore(start: PathT) -> PathT | None:
```
Contributor
IDK if you actually want this. The BIDS validator only supports a `.bidsignore` at the dataset root (one per dataset), not git-like per-directory ignore files.
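Following the reviewer's point, a minimal sketch of root-only `.bidsignore` handling: patterns are read once per dataset root (cached with `functools.lru_cache`) instead of being searched for upward from every path. `read_bidsignore` and `is_bidsignored` are hypothetical names, and the matching is a simplification assumed for illustration: it uses `fnmatch` rather than full gitignore semantics (no negation, no `**`, no anchoring rules).

```python
from __future__ import annotations

from fnmatch import fnmatchcase
from functools import lru_cache
from pathlib import Path, PurePosixPath


@lru_cache(maxsize=None)
def read_bidsignore(root: Path) -> tuple[str, ...]:
    """Read patterns from the single .bidsignore at the dataset root (cached per root)."""
    ignore_file = root / ".bidsignore"
    if not ignore_file.is_file():
        return ()
    patterns = []
    for line in ignore_file.read_text().splitlines():
        line = line.strip()
        # Skip blank lines and comments; drop leading/trailing slashes.
        if line and not line.startswith("#"):
            patterns.append(line.strip("/"))
    return tuple(patterns)


def is_bidsignored(path: Path, root: Path) -> bool:
    """Return True if `path` (inside `root`) matches a root-level .bidsignore pattern.

    Simplified matching (an assumption, not the validator's algorithm): a pattern
    without "/" is compared with the name of the path and of each ancestor below
    the root; a pattern with "/" is compared with their root-relative paths.
    Raises ValueError if `path` is not inside `root`.
    """
    rel = PurePosixPath(path.relative_to(root).as_posix())
    # The path itself plus every ancestor directory below the root.
    candidates = [rel] + [p for p in rel.parents if p != PurePosixPath(".")]
    for pattern in read_bidsignore(root):
        for cand in candidates:
            target = cand.as_posix() if "/" in pattern else cand.name
            if fnmatchcase(target, pattern):
                return True
    return False
```

Keying the cache on the dataset root means one file read per dataset, and nested datasets (for example under `derivatives/`) would each be checked against their own root's `.bidsignore`.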
Downstream/On top of #85
PR Contribution Summary
This PR generalizes the indexing layer from subject-only discovery to entity-based crawling, and replaces hardcoded patterns with schema-derived logic wherever possible.
Architecture: Subject → Entity
- `_index_bids_subject_dir` → `_index_bids_entity_dir`: indexes any entity directory (`sub-*`, `tpl-*`, etc.)
- `_find_bids_subject_dirs` → `_find_bids_entity_dirs`: discovers any entity type at a dataset root
- `_is_bids_subject_dir` → `_is_bids_entity_dir`: checks arbitrary entity type by name
- `_entities.py`

Template and Cohort Support
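A minimal sketch of the generic check that lets `tpl-*` directories be found the same way as `sub-*`: one entity-directory test parameterized by prefix, applied to the children of a dataset root. `is_entity_dir` and `find_entity_dirs` are hypothetical stand-ins for `_is_bids_entity_dir` and `_find_bids_entity_dirs`, and the label pattern (letters, digits, `+`) is assumed here rather than read from the schema.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed label format (letters, digits, "+"); a real implementation would
# derive it from the schema.
LABEL_PATTERN = "[0-9a-zA-Z+]+"


def is_entity_dir(path: Path, prefix: str) -> bool:
    """True if `path` is a directory named '<prefix>-<label>' (e.g. sub-01, tpl-MNI)."""
    name_ok = re.fullmatch(f"{re.escape(prefix)}-{LABEL_PATTERN}", path.name) is not None
    return name_ok and path.is_dir()


def find_entity_dirs(root: Path, prefixes: list[str]) -> dict[str, list[Path]]:
    """Group the entity directories directly under `root` by prefix."""
    found: dict[str, list[Path]] = {prefix: [] for prefix in prefixes}
    for child in sorted(root.iterdir()):
        for prefix in prefixes:
            if is_entity_dir(child, prefix):
                found[prefix].append(child)
    return found
```

Because `_` is not part of the label pattern, a combined name such as `sub-01_ses-01` is not accepted as a `sub` directory.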
- `tpl-*` and `cohort-*` directories are indexed alongside `sub-*`
- `_is_bids_dataset` derivative checks look for subject OR template entity dirs

Schema-Driven Discovery
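A rough sketch of reading valid entity subdirectories from directory rules, including `oneOf` expansion. The nested dict stands in for `rules.directories` after `Namespace` → `dict` conversion; its keys and layout (`level`, `entity`, `subdirs`, `oneOf` groups) are assumptions for illustration, not the actual BIDS schema contents. `subdir_names`, `entity_child_dirs`, and `all_root_entity_types` are hypothetical counterparts of `_get_subdir_names`, `get_entity_child_dirs`, and `get_all_root_entity_types`.

```python
from __future__ import annotations

# Illustrative stand-in for rules.directories (assumed layout, not real schema data).
DIRECTORY_RULES = {
    "raw": {
        "root": {"level": "root", "subdirs": [{"oneOf": ["subject"]}, "code", "derivatives"]},
        "subject": {"level": "entity", "entity": "subject", "subdirs": [{"oneOf": ["session", "datatype"]}]},
        "session": {"level": "entity", "entity": "session", "subdirs": ["datatype"]},
        "datatype": {"level": "datatype"},
        "code": {"level": "directory"},
        "derivatives": {"level": "directory"},
    },
    "derivative": {
        "root": {"level": "root", "subdirs": [{"oneOf": ["subject", "template"]}]},
        "subject": {"level": "entity", "entity": "subject", "subdirs": [{"oneOf": ["session", "datatype"]}]},
        "template": {"level": "entity", "entity": "template", "subdirs": [{"oneOf": ["cohort", "datatype"]}]},
        "cohort": {"level": "entity", "entity": "cohort", "subdirs": ["datatype"]},
        "session": {"level": "entity", "entity": "session", "subdirs": ["datatype"]},
        "datatype": {"level": "datatype"},
    },
}


def subdir_names(rule: dict) -> list[str]:
    """Flatten a rule's subdirs, expanding {"oneOf": [...]} groups."""
    names: list[str] = []
    for entry in rule.get("subdirs", []):
        if isinstance(entry, dict) and "oneOf" in entry:
            names.extend(entry["oneOf"])
        else:
            names.append(entry)
    return names


def entity_child_dirs(rules: dict, dataset_type: str, parent_rule: str) -> list[str]:
    """Subdirectory rules under `parent_rule` that are entity directories."""
    type_rules = rules[dataset_type]
    return [
        name
        for name in subdir_names(type_rules[parent_rule])
        if type_rules.get(name, {}).get("level") == "entity"
    ]


def all_root_entity_types(rules: dict) -> list[str]:
    """Root-level entity types across all dataset types, deduplicated in order."""
    seen: list[str] = []
    for dataset_type in rules:
        for name in entity_child_dirs(rules, dataset_type, "root"):
            if name not in seen:
                seen.append(name)
    return seen
```

With these rules, `entity_child_dirs(DIRECTORY_RULES, "derivative", "template")` yields `["cohort"]`, which is how cohort nesting under templates falls out of the rules rather than a hardcoded list.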
- `get_entity_child_dirs(dataset_type, parent_rule)`: reads valid entity subdirectories from `rules.directories`
- `get_file_entity_prefixes()`: root-level entity name prefixes derived from schema
- `get_all_root_entity_types()`: deduplicated root entity types across all dataset types
- `get_all_dataset_types()`: enumerates schema-defined dataset types
- `_BIDS_JSON_SIDECAR_EXCEPTION_SUFFIXES`: derived from `rules.files` (currently `coordsystem`, `description`)
- `_BIDS_DATATYPE_PATTERN`: built from entity names at schema init
- `_ensure_dict()` helper: centralizes `bidsschematools` `Namespace` → `dict` conversion

Derivative Detection
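A minimal sketch of derivative detection that does not require `dataset_description.json`: a directory directly under `derivatives/` counts as a derivative dataset if it contains at least one valid `sub-*` or `tpl-*` directory, while combined names such as `sub-01_ses-01` fail the label pattern. `is_derivative_dataset` is hypothetical, and the fixed `(sub|tpl)` prefixes stand in for prefixes that would be read from the schema. When a description is present, `DatasetType` decides, defaulting to `raw` as the BIDS specification does.

```python
from __future__ import annotations

import json
import re
from pathlib import Path

# Assumed entity-directory pattern: subject or template prefix plus a label
# (letters, digits, "+"). A real implementation would build this from the schema.
ENTITY_DIR = re.compile(r"(?:sub|tpl)-[0-9a-zA-Z+]+")


def is_derivative_dataset(path: Path) -> bool:
    """Return True if `path` looks like a derivative dataset."""
    desc = path / "dataset_description.json"
    if desc.is_file():
        # An explicit description decides; DatasetType defaults to "raw".
        return json.loads(desc.read_text()).get("DatasetType", "raw") == "derivative"
    # Without a description, only directories directly inside derivatives/ qualify...
    if path.parent.name != "derivatives":
        return False
    # ...and only if they contain at least one valid entity directory.
    return any(
        child.is_dir() and ENTITY_DIR.fullmatch(child.name) is not None
        for child in path.iterdir()
    )
```

A plain `rglob("dataset_description.json")` would miss the description-less pipelines that this check accepts.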
- `_is_bids_dataset()` and `_get_dataset_type()` detect derivative datasets without `dataset_description.json` by checking inside `derivatives/` for valid entity subdirectories
- `sub-*_ses-*` directories (spec-invalid)

Generic Filtering and .bidsignore
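A sketch of how a generic `filters` dict could be applied to a file's parsed entities, assuming that multiple patterns for one entity are OR-ed and different entities are AND-ed (consistent with the "multi-value" and "cross-entity AND" test cases in the PR's testing list, but not confirmed from the code). `matches_filters` and the short entity keys (`sub`, `ses`) are illustrative; treating a file that lacks a filtered entity as a non-match is also an assumption.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def matches_filters(entities: dict[str, str], filters: dict[str, list[str]]) -> bool:
    """Return True if `entities` satisfies every entry in `filters`.

    Glob patterns for the same entity are OR-ed; different entities are AND-ed.
    A file missing a filtered entity does not match (assumed policy).
    An empty `filters` dict matches everything.
    """
    for entity, patterns in filters.items():
        value = entities.get(entity)
        if value is None:
            return False
        if not any(fnmatchcase(value, pattern) for pattern in patterns):
            return False
    return True
```

For example, `matches_filters({"sub": "01", "ses": "pre"}, {"sub": ["0*"], "ses": ["pre", "post"]})` returns `True`, while changing the session patterns to `["post"]` makes it `False`.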
- `include_subjects` → generic `filters` dict mapping any entity name to glob patterns
- `--filter`/`-f` CLI argument replaces `--subjects` (deprecated, backward-compatible)
- `.bidsignore` support via `_is_bidsignored` with cached upward search
- `batch_index_dataset` to workers

Dataset Metadata Columns
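A stdlib-only sketch of populating `dataset_name`, `dataset_type`, and `bids_version` from `dataset_description.json`, with the read memoized by `functools.lru_cache`. The helper names are hypothetical and building the actual Arrow schema is left out; `Name`, `DatasetType`, and `BIDSVersion` are the standard description fields, and defaulting `DatasetType` to `raw` follows the specification's backward-compatible default.

```python
from __future__ import annotations

import json
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def read_dataset_description(root: Path) -> dict:
    """Parse root/dataset_description.json once per root; {} if absent.

    The returned dict is shared by all callers, so treat it as read-only.
    """
    desc = root / "dataset_description.json"
    if not desc.is_file():
        return {}
    with desc.open() as f:
        return json.load(f)


def dataset_metadata_columns(root: Path) -> dict[str, str | None]:
    """Values for the per-file dataset_name / dataset_type / bids_version columns."""
    desc = read_dataset_description(root)
    return {
        "dataset_name": desc.get("Name"),
        # Only apply the "raw" default when a description actually exists.
        "dataset_type": desc.get("DatasetType", "raw") if desc else None,
        "bids_version": desc.get("BIDSVersion"),
    }
```

Because results are cached, an edited description is not re-read until `read_dataset_description.cache_clear()` is called, the same concern `clear_schema_caches()` addresses for the schema.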
- `dataset_name`, `dataset_type`, `bids_version` added to Arrow schema, populated from `dataset_description.json`
- `clear_schema_caches()` exposed as public API for schema reload safety

Code Cleanup
- `get_all_entity_prefixes`, `get_required_entity_types`
- `_get_subdir_names()` for `oneOf` expansion
- `_read_dataset_description` with `@lru_cache` to deduplicate reads
- `_resolve_entity_dirs`: extracts entity discovery into `_discover_entity_dirs`

Testing
- `test_derivative_detection`: 5 scenarios including no-description derivatives and invalid combined entity dirs
- `test_index_dataset_filters`: single, multi-value, glob, and cross-entity AND filters
- `test_batch_index_dataset_filters`: filter forwarding through parallel workers
- `test_index_dataset_bidsignore`: `.bidsignore` exclusion
- `@templateflow_available`
- `test_is_bids_subject_dir` → `test_is_bids_entity_dir`
- `test_find_bids_datasets` is now skipped (`@pytest.mark.skip`); the `rglob("dataset_description.json")` baseline no longer matches the schema-correct derivative detection

Impact
We should now be less fragile across schema updates, and we can correctly index derivative datasets that use entity types other than subject and session (namely template and cohort), so this can be used across a wide range of the field's projects.