Add benchmarking script by kaitj · Pull Request #76 · childmindresearch/bids2table

kaitj · 2026-05-06T17:23:04Z

Adds CI to benchmark new PRs against both the main branch (dev) and the previous release. Benchmarks run on a small dataset from the bids-examples submodule (labelled "local") and a subset from OpenNeuro (labelled "remote"), focusing primarily on performance.

Index sizes are negligible in these benchmarks given how small the datasets are. If index size is of interest, it should be measured on datasets larger than what fits on the gh runners. We can consider adding these using self-hosted runners with larger datasets.

I've disabled benchmarking against tags for now due to missing dependencies in older versions, but it should be re-enabled following the next release. Benchmarks against the main branch should work once this is merged, so I'm leaving that in for now.

Below is an example of what the benchmark results would look like (run on a fork):

Closes #75

effigies · 2026-05-07T15:57:19Z

I would make the comparisons ratios instead of differences, e.g., new/old (1 is no change, <1 is faster, >1 is slower). A difference of 40ms is huge if the baseline is 100ms, not so much if it's 20s.
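To illustrate the point about ratios, here is a minimal sketch. The function name `speed_ratio` is hypothetical and not part of the PR; the timings are the two baselines mentioned above (100 ms and 20 s) with the same 40 ms added to each.

```python
def speed_ratio(new_seconds: float, old_seconds: float) -> float:
    """Return new/old: 1.0 is no change, <1 is faster, >1 is slower."""
    if old_seconds <= 0:
        raise ValueError("baseline time must be positive")
    return new_seconds / old_seconds


# The same 40 ms absolute difference at two very different baselines.
small_baseline = speed_ratio(0.140, 0.100)    # 100 ms -> 140 ms
large_baseline = speed_ratio(20.040, 20.000)  # 20 s -> 20.04 s

print(f"{small_baseline:.3f} vs {large_baseline:.3f}")
```

The first ratio is about 1.4 (a 40% slowdown), while the second is about 1.002, which is why the ratio is easier to interpret than the raw difference.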

kaitj · 2026-05-11T14:26:24Z

Was chatting with @nx10 about this on Friday. From his past experience, benchmarking on the gh actions runners doesn't seem to be very reliable.

Given this, one option is an updated script (with the requested changes) that can be used to run the benchmarks locally. Another option to explore is self-hosted runners, which may be more reliable for benchmarking.

kaitj · 2026-05-14T15:27:01Z

f7a737d folds the CI logic into a locally run benchmarking script that generates a markdown file with the results (the same content the action was posting as a comment; see screenshot).

Note that in its current state this is not meant to be exhaustive or comprehensive, but it should at least give us a quick look at how changes may affect performance.
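For readers without the screenshot, a rough sketch of what generating such a markdown report could look like. This is not the script from the PR: `render_markdown`, the column layout, and the numbers are all assumptions made up for illustration (the numbers are placeholders, not measurements). Only the "local" and "remote" labels come from the description above.

```python
def render_markdown(results):
    """Render (label, old_seconds, new_seconds) tuples as a markdown table.

    Hypothetical helper; the real script's output format may differ.
    """
    lines = [
        "| Benchmark | Old (s) | New (s) | Ratio (new/old) |",
        "| --- | ---: | ---: | ---: |",
    ]
    # Sort by label so the report is stable from run to run.
    for label, old, new in sorted(results):
        lines.append(f"| {label} | {old:.3f} | {new:.3f} | {new / old:.2f} |")
    return "\n".join(lines) + "\n"


# Placeholder timings, not real benchmark results.
report = render_markdown([("remote", 2.0, 1.0), ("local", 1.0, 1.5)])
print(report)
```

Writing `report` to a `.md` file would give something that can be pasted into a PR comment by hand.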

gkiar · 2026-05-20T21:17:09Z

Forgive my failing to find this in the code, but how many retries are we doing here? To @nx10's point, benchmarking in gh-actions isn't super consistent, but I don't think that's a big deal... We can still get an apples-to-apples comparison if we run all jobs on the same node under the same conditions, and all we're looking for, to @effigies' point, is a sense of the ratio for faster/slower.

I think the more important piece is to have ~10–20 retries each so we have a consistent estimate.

I'd also suggest a margin of error (e.g., 5%), below which we consider performance unchanged, since it'll rarely be perfectly identical.
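A minimal sketch of how repeated runs and a margin of error could be combined. The `classify` name, the choice of the median as the summary statistic, and the sample timings are assumptions for illustration; only the 5% default comes from the suggestion above.

```python
from statistics import median


def classify(new_runs, old_runs, tolerance=0.05):
    """Compare repeated timings and label the change.

    Uses the ratio of medians (new/old); anything within `tolerance`
    of 1.0 is reported as unchanged.
    """
    ratio = median(new_runs) / median(old_runs)
    if abs(ratio - 1.0) <= tolerance:
        label = "unchanged"
    elif ratio < 1.0:
        label = "faster"
    else:
        label = "slower"
    return ratio, label


# Made-up timings in seconds, ten retries each.
old = [1.00, 1.02, 0.99, 1.01, 1.00, 1.03, 0.98, 1.00, 1.01, 1.00]
new = [1.02, 1.03, 1.01, 1.02, 1.04, 1.02, 1.00, 1.03, 1.02, 1.01]
print(classify(new, old))
```

With these placeholder numbers the medians differ by about 2%, so the result is labelled "unchanged" at the 5% tolerance.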

nx10 · 2026-05-20T21:26:43Z

From past experience, continuous benchmarking and microbenchmarking on GHA generally don't work well. They can catch large regressions, but even then it takes care - stochastic sampling, cache warmups, and controlling for runner-to-runner variance all matter.

nx10 · 2026-05-20T21:28:26Z

To Greg's point: I think the margin of error on GHA would be even higher, maybe 10%, maybe 20%.
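As a rough sketch of two of the precautions mentioned above (cache warmups and randomized sampling order), assuming both versions can be called from one process on the same machine. `interleaved_timings` and the toy workloads are hypothetical, and "stochastic sampling" is interpreted here as shuffling the run order in each round, which may not be exactly what was meant. This does nothing about runner-to-runner variance beyond keeping both candidates on one runner.

```python
import random
import time


def interleaved_timings(candidates, repeats=10, warmups=2, seed=0):
    """Time zero-argument callables with warmups and shuffled run order.

    candidates: dict mapping a name to a zero-argument callable.
    Returns a dict mapping each name to a list of durations in seconds.
    """
    rng = random.Random(seed)

    # Warm up caches (filesystem, imports, allocator) before measuring.
    for fn in candidates.values():
        for _ in range(warmups):
            fn()

    timings = {name: [] for name in candidates}
    for _ in range(repeats):
        # Shuffle the order each round so slow drift in machine load
        # does not systematically favour one candidate.
        order = list(candidates)
        rng.shuffle(order)
        for name in order:
            start = time.perf_counter()
            candidates[name]()
            timings[name].append(time.perf_counter() - start)
    return timings


# Toy workloads standing in for the "old" and "new" code paths.
runs = interleaved_timings(
    {"old": lambda: sum(range(10_000)), "new": lambda: sum(range(10_000))},
    repeats=5,
)
print({name: len(values) for name, values in runs.items()})
```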

- Mark tests with "cloud" and/or "benchmark" as needed
- Combine the "dev" and "benchmark" dependencies, since keeping them separate was causing import issues with pytest (alternatively, use a `try-except` block for the optional dependency import)
- Replace pandas with polars in the dev dependencies (for benchmarking)
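The `try-except` alternative mentioned above could look roughly like this. It is a sketch only: `optional_import` and `HAS_POLARS` are hypothetical names, and the PR went with combining the dependency groups instead.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# polars is only needed for benchmarking, so treat it as optional.
pl = optional_import("polars")
HAS_POLARS = pl is not None
```

A test module could then skip benchmark tests when `HAS_POLARS` is false. pytest also offers `pytest.importorskip("polars")`, which skips the calling test or module if the import fails.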

github-actions · 2026-05-21T20:35:17Z

Coverage Report

File	Stmts	Miss	Cover	Missing
__init__.py	9	0	100%
__main__.py	69	8	88%	101, 110, 135, 137–138, 164, 168, 172
_entities.py	112	1	99%	129
_indexing.py	212	5	97%	150, 159–160, 409, 447
_logging.py	31	4	87%	30, 37, 39–40
_metadata.py	48	4	91%	39–40, 66, 71
_pathlib.py	21	5	76%	16–17, 19–20, 22
_version.py	11	0	100%
pybids
__init__.py	4	0	100%
_bidsfile.py	38	13	65%	71–73, 77–79, 83–85, 89–91, 95
_layout.py	156	45	71%	63, 72, 81, 104, 114–115, 118, 140–141, 156–157, 173–174, 177–181, 186, 188–189, 192–193, 228, 233, 241, 322–324, 389–394, 396, 399–404, 406, 462, 482
_utils.py	13	5	61%	47–50, 52
TOTAL	724	90	87%

Tests	Skipped	Failures	Errors	Time
100	1 💤	0 ❌	0 🔥	13.061s ⏱️

kaitj force-pushed the ci/benchmark branch 4 times, most recently from 978193e to b35fc76 Compare May 6, 2026 19:45

kaitj force-pushed the ci/benchmark branch from dae6dcf to f7a737d Compare May 14, 2026 15:15

kaitj marked this pull request as ready for review May 14, 2026 15:34

kaitj requested a review from nx10 May 14, 2026 15:35

kaitj changed the title ~~Add benchmarking CI~~ Add benchmarking script May 14, 2026

kaitj force-pushed the ci/benchmark branch from f7a737d to bb5898f Compare May 15, 2026 15:25

kaitj added 7 commits May 21, 2026 16:08

Tests + dependencies for benchmarks

bef3eaf

Add scripts for benchmarking

fcad52b

Add benchmark CI

7241133

Register benchmark pytest marker

93f1914

Fix benchmark workflow bugs

2b2dbcf

- Switch to shortened SHA for PR
- Add PR for unique output file artifact
- Disable comparison against tag due to lack of dependency group
- Add step to comment on PR
- Sort labels for comment

Add benchmarking script

3354ce4

- Fold CI scripts into local benchmark script
- Remove CI workflow
- Use importlib for pytest for identical file names across different test modules

kaitj force-pushed the ci/benchmark branch from bb5898f to 3354ce4 Compare May 21, 2026 20:34

childmindresearch deleted a comment from github-actions Bot May 21, 2026

kaitj marked this pull request as draft May 21, 2026 20:45

Add benchmarking script #76: kaitj wants to merge 7 commits into main from ci/benchmark

4 participants
