bids2table

Index BIDS datasets fast, locally or in the cloud.

Installation

Install the core package using pip:

pip install bids2table

Variants

Some features need extra dependencies. Choose the option that matches your use case:

| If you want to... | Run this command |
| --- | --- |
| Add cloud storage support (S3, GCS) | `pip install bids2table[cloud]` |
| Enable `pybids` compatibility | `pip install bids2table[pybids]` |
| Install everything | `pip install bids2table[cloud,pybids]` |

Warning

Deprecation Warning: Previous versions used bids2table[s3] for cloud support. While the s3 extra still works for now, it will be removed in the next major release. Please update your installation scripts to use [cloud].

Development Version

To try the latest features from the main branch, install directly from GitHub:

pip install "bids2table[cloud,pybids] @ git+https://github.com/childmindresearch/bids2table.git"

Usage

To run these examples, you will need to clone the bids-examples repo.

git clone -b 1.9.0 https://github.com/bids-standard/bids-examples.git

Finding BIDS datasets

You can search a directory for valid BIDS datasets using b2t2 find:

(bids2table) clane$ b2t2 find bids-examples | head -n 10
bids-examples/asl002
bids-examples/ds002
bids-examples/ds005
bids-examples/asl005
bids-examples/ds051
bids-examples/eeg_rishikesh
bids-examples/asl004
bids-examples/asl003
bids-examples/ds003
bids-examples/eeg_cbm
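
For intuition, a directory search like this can be approximated in plain Python. The sketch below is not bids2table's implementation. It assumes a simplified heuristic: a directory counts as a dataset root if it contains `dataset_description.json`, which the BIDS specification requires at the root of every dataset. The helper name `find_dataset_roots` is hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def find_dataset_roots(root):
    """Yield directories under `root` that contain dataset_description.json.

    Hypothetical helper: a simplified heuristic, not bids2table's own logic.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        if "dataset_description.json" in filenames:
            yield Path(dirpath)


# Build a tiny throwaway tree so the sketch runs without bids-examples.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["ds001", "ds002/derivatives/fmriprep", "not_bids"]:
        (Path(tmp) / name).mkdir(parents=True)
    for name in ["ds001", "ds002/derivatives/fmriprep"]:
        desc = Path(tmp) / name / "dataset_description.json"
        desc.write_text(json.dumps({"Name": name, "BIDSVersion": "1.9.0"}))

    # Collect results while the temporary tree still exists.
    found = [p.relative_to(tmp).as_posix() for p in find_dataset_roots(tmp)]

print(found)
```

Because the walk descends into every subdirectory, nested datasets such as a derivatives folder are reported alongside top-level ones.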

Indexing datasets from the command line

Indexing datasets is done with b2t2 index. Here we index a single example dataset, saving the output as a parquet file.

(bids2table) clane$ b2t2 index -o ds102.parquet bids-examples/ds102
ds102: 100%|███████████████████████████████████████| 26/26 [00:00<00:00, 154.12it/s, sub=26, N=130]
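
BIDS file names encode metadata as `key-value` entities separated by underscores, followed by a suffix and an extension. As a rough illustration of the information an index can draw on, the sketch below splits one file name into those parts. It is a simplified stand-in under stated assumptions, not bids2table's parser or output schema, and `parse_bids_name` is a hypothetical helper.

```python
def parse_bids_name(filename):
    """Split a BIDS-style file name into entities, suffix, and extension.

    Hypothetical helper for illustration; it does no validation.
    """
    # Everything after the first dot is treated as the extension.
    stem, dot, ext = filename.partition(".")
    # The last underscore-separated token is the suffix; the rest are entities.
    *pairs, suffix = stem.split("_")
    entities = {}
    for pair in pairs:
        key, _, value = pair.partition("-")
        entities[key] = value
    return {"entities": entities, "suffix": suffix, "ext": dot + ext}


record = parse_bids_name("sub-01_task-rest_run-1_bold.nii.gz")
print(record)
```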

You can also index a list of datasets. Note that each iteration in the progress bar represents one dataset.

(bids2table) clane$ b2t2 index -o bids-examples.parquet bids-examples/*
100%|████████████████████████████████████████████| 87/87 [00:00<00:00, 113.59it/s, ds=None, N=9727]

You can pipe the output of b2t2 find to b2t2 index to create an index of all datasets under a root directory.

(bids2table) clane$ b2t2 find bids-examples | b2t2 index -o bids-examples.parquet
97it [00:01, 96.05it/s, ds=ieeg_filtered_speech, N=10K]

The resulting index will include both top-level datasets (as in the previous command) as well as nested derivative datasets.

Indexing datasets hosted on S3

bids2table supports indexing datasets hosted on S3 via cloudpathlib. To use this functionality, install bids2table with the cloud extra (the deprecated s3 extra also still works for now). Alternatively, install cloudpathlib directly:

pip install cloudpathlib[s3]

As an example, here we index all datasets on OpenNeuro:

(bids2table) clane$ b2t2 index -o openneuro.parquet \
  -j 8 --use-threads s3://openneuro.org/ds*
100%|█████████████████████████████████████| 1408/1408 [12:25<00:00,  1.89it/s, ds=ds006193, N=1.2M]

Using 8 threads, we can index all ~1400 OpenNeuro datasets (1.2M files) in less than 15 minutes.
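
Threads tend to help with remote storage because the work is dominated by waiting on the network rather than by CPU. The sketch below shows that general pattern with Python's standard `concurrent.futures`. It is not bids2table's implementation: `index_one` is a hypothetical stand-in that only simulates latency with `time.sleep`, and the dataset names are generated placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def index_one(dataset):
    """Hypothetical stand-in for indexing one remote dataset."""
    time.sleep(0.05)  # simulate waiting on the network
    return dataset, len(dataset)  # placeholder result, not a real index


# Placeholder dataset names for illustration only.
datasets = [f"ds{i:06d}" for i in range(16)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    # map() returns results in input order even though tasks run concurrently.
    results = list(pool.map(index_one, datasets))
elapsed = time.perf_counter() - start

print(f"indexed {len(results)} datasets in {elapsed:.2f}s")
```

With eight workers, the sixteen simulated tasks overlap their waits instead of running one after another, which is the effect `-j 8 --use-threads` is aiming for.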

Indexing datasets from Python

You can also index datasets using the Python API.

import bids2table as b2t2
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Index a single dataset.
tab = b2t2.index_dataset("bids-examples/ds102")

# Find and index a batch of datasets.
tabs = b2t2.batch_index_dataset(
    b2t2.find_bids_datasets("bids-examples"),
)
tab = pa.concat_tables(tabs)

# Index a dataset on S3.
tab = b2t2.index_dataset("s3://openneuro.org/ds000224")

# Save as parquet.
pq.write_table(tab, "ds000224.parquet")

# Convert to a pandas dataframe.
df = tab.to_pandas(types_mapper=pd.ArrowDtype)
