Skip to content

feat: Allow schema paths/objects to be passed to API functions#74

Open
effigies wants to merge 24 commits into
childmindresearch:mainfrom
effigies:rf/schemaadapter
Open

feat: Allow schema paths/objects to be passed to API functions#74
effigies wants to merge 24 commits into
childmindresearch:mainfrom
effigies:rf/schemaadapter

Conversation

@effigies
Copy link
Copy Markdown
Contributor

@effigies effigies commented May 4, 2026

This is a backwards-incompatible break of the current API.

Along the lines of #70 (comment), this introduces a public type:

SchemaSpec: TypeAlias = Namespace | str | PathT | None

This allows a schema reference or schema object to be passed to any of:

  • validate_bids_entities
  • get_arrow_schema
  • get_column_names
  • index_dataset
  • batch_index_dataset

These functions turn the schema spec into a BIDSSchemaAdapter object, which encapsulates the components that are used by bids2table, and should be extensible going forward:

@dataclass(frozen=True)
class BIDSSchemaAdapter:
    bids_version: str
    schema_version: str
    entity_schema: dict[str, dict[str, Any]] = field(hash=False)

The only current direct consumer is:

def entity_arrow_schema(adapter: BIDSSchemaAdapter) -> pa.Schema: ...

The only current consumer of pa.Schema is:

def _pyarrow_validate_entities(entities: dict[str, Any], *, pa_schema: pa.Schema) -> tuple[dict[str, BIDSValue], dict[str, Any]]: ...

Which is called by the dataset indexer and the public API function validate_bids_entities. The main reason for this indirection is that pa.Schema is passed to _index_dataset_subject_dir and I didn't want to refactor that more than necessary, but this seems like a good, future-thinking approach anyway: By pulling the validation metadata out of the pyarrow schema, a parquet file contains what is needed to validate itself, without needing to look up and load the original schema.

That said, that isn't done here, and I could refactor these functions to take a BIDSSchemaAdapter, which will save a round-trip of metadata into and out of pyarrow.Schema.

Closes #70.
Replaces #71.

@effigies effigies force-pushed the rf/schemaadapter branch from dce0fa0 to e8464c1 Compare May 4, 2026 02:31
@codecov
Copy link
Copy Markdown

codecov Bot commented May 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.66%. Comparing base (c1d67b2) to head (e8464c1).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #74      +/-   ##
==========================================
+ Coverage   95.15%   95.66%   +0.51%     
==========================================
  Files           7        8       +1     
  Lines         495      531      +36     
==========================================
+ Hits          471      508      +37     
+ Misses         24       23       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@effigies effigies marked this pull request as ready for review May 4, 2026 02:32
Copy link
Copy Markdown
Contributor

@kaitj kaitj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks great. I think there is minimal to no difference in performance, but I'd like to add / perform a quick benchmark with this before any merging.

@nx10, any additional thoughts, comments?

@effigies
Copy link
Copy Markdown
Contributor Author

effigies commented May 4, 2026

Yes, please, if you have benchmarks I would be happy to add them to my success criteria while contributing to this project.

@effigies
Copy link
Copy Markdown
Contributor Author

effigies commented May 9, 2026

The test failure is due to a misconfiguration of "Pytest coverage comment": https://github.com/childmindresearch/bids2table/actions/runs/25566684415/job/75051826576?pr=74

@effigies effigies force-pushed the rf/schemaadapter branch from 977e0ba to 6fb748f Compare May 21, 2026 16:34
@github-actions
Copy link
Copy Markdown

Coverage

Coverage Report
FileStmtsMissCoverMissing
__init__.py100100% 
__main__.py69888%101, 110, 135, 137–138, 164, 168, 172
_entities.py930100% 
_indexing.py214597%149, 158–159, 418, 458
_logging.py31487%30, 37, 39–40
_metadata.py48491%39–40, 66, 71
_pathlib.py21576%16–17, 19–20, 22
_schema.py520100% 
_version.py110100% 
pybids
   __init__.py40100% 
   _bidsfile.py381365%71–73, 77–79, 83–85, 89–91, 95
   _layout.py1564571%63, 72, 81, 104, 114–115, 118, 140–141, 156–157, 173–174, 177–181, 186, 188–189, 192–193, 228, 233, 241, 322–324, 389–394, 396, 399–404, 406, 462, 482
   _utils.py13561%47–50, 52
TOTAL7608988% 

Tests Skipped Failures Errors Time
131 1 💤 0 ❌ 0 🔥 13.845s ⏱️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Replace global schema state with schema adapter dataclass

2 participants