Merged
62 commits
2a8e673
Tutorial template
Claptar Jan 15, 2026
0aa4453
Extended info command
Claptar Jan 15, 2026
7c73603
Add export command to CLI for exporting HDF5 objects in various formats
Claptar Jan 15, 2026
48e7efc
Add tests for info command and entry type detection
Claptar Jan 15, 2026
cf8bb08
Add optional dependency for images support
Claptar Jan 15, 2026
ec48226
Add import command for importing data into h5ad files
Claptar Jan 15, 2026
e1db36f
Refactor CLI commands for exporting and importing dataframes, arrays,…
Claptar Jan 15, 2026
ea404a8
Remove export_table function from CLI commands
Claptar Jan 15, 2026
f87cc77
Refactor export_table function and update imports in CLI commands
Claptar Jan 15, 2026
ddf6f21
Rename 'object' option to 'entry' in info command and add depth optio…
Claptar Jan 23, 2026
99b6335
Refactor show_info function to replace obj_path with entry_path and a…
Claptar Jan 23, 2026
99a9834
Refactor get_entry_type function to replace 'obj' with 'entry' and im…
Claptar Jan 23, 2026
82de268
Refactor info command tests to replace 'object' with 'entry' flag and…
Claptar Jan 23, 2026
aaeddb2
Enhance axis_len tests to validate error handling for non-existent ax…
Claptar Jan 23, 2026
a7d23e2
Added element specs for .h5ad files
Claptar Jan 23, 2026
1bda8d8
Refactor info and export_dataframe commands to use arguments instead …
Claptar Jan 23, 2026
4c6b185
Enhance get_entry_type function to support legacy categorical and dat…
Claptar Jan 23, 2026
d46981f
Refactor read_categorical_column and col_chunk_as_strings to support …
Claptar Jan 23, 2026
1cebbbf
Add support for legacy v0.1.0 h5ad files; implement tests for legacy …
Claptar Jan 23, 2026
b5ea898
Update show_info and _show_types_tree functions to exclude '__categor…
Claptar Jan 23, 2026
e0c727a
Enhance export_table function to support both modern and legacy dataf…
Claptar Jan 23, 2026
f14868f
Enhance console initialization in CLI to ensure Rich output is visibl…
Claptar Jan 23, 2026
9929e5b
Add tests for export_dataframe command with various options and flags
Claptar Jan 23, 2026
bf67289
Update import statements in __init__.py to include additional export …
Claptar Jan 23, 2026
295548d
Add Zarr element specifications documentation
Claptar Jan 23, 2026
6a454f0
Enhance export functionality to support awkward-array type and improv…
Claptar Jan 23, 2026
4b5d1bd
Refactor CLI export commands to improve parameter naming and enhance …
Claptar Jan 23, 2026
4d4cfde
Add tests for exporting legacy v0.1.0 dataframe and improve output va…
Claptar Jan 23, 2026
f380cb6
Refactor error handling in subset and read functions to use more spec…
Claptar Jan 26, 2026
95b6553
Refactor CLI options for chunk processing: update parameter names and…
Claptar Jan 26, 2026
1695f25
Refactor export_mtx function: update chunk_elements description, impr…
Claptar Jan 26, 2026
559b40d
Enhance export functions: add detailed docstrings for export_npy and …
Claptar Jan 26, 2026
36e5e8b
Add in-memory option to export_sparse command for improved performance
Claptar Jan 26, 2026
878421d
Refactor export_dict and export_json functions: update output paramet…
Claptar Jan 26, 2026
47fc283
Refactor export_image function: update output parameter to use Option…
Claptar Jan 26, 2026
595d81b
HUGE REFACTOR: Add format-specific import/export helpers for various …
Claptar Jan 26, 2026
6d8903b
HUGE REFACTOR: Add core functionality for handling .h5ad and .zarr st…
Claptar Jan 26, 2026
4f6d2e5
HUGE REFACTOR: Refactor h5ad command modules: update imports and stre…
Claptar Jan 26, 2026
4b09cf5
HUGE REFACTOR:Add initial implementation of Store class and backend d…
Claptar Jan 26, 2026
55558f9
HUGE REFACTOR:Add utility functions for path normalization in h5ad mo…
Claptar Jan 26, 2026
d40e849
HUGE REFACTOR: Update CLI to support .zarr stores alongside .h5ad, en…
Claptar Jan 26, 2026
bfec2b2
HUGE REFACTOR: Remove unused functions and imports from info.py and r…
Claptar Jan 26, 2026
dd14d3e
HUGE REFACTOR: Enhance export tests for Zarr support, adding new test…
Claptar Jan 26, 2026
eee67ae
HUGE REFACTOR: Rename job from 'test' to 'tests' and enhance test mat…
Claptar Jan 26, 2026
a2dc8f7
Update README to include support for .zarr stores and enhance feature…
Claptar Jan 26, 2026
b4c58a6
Renamed docs
Claptar Jan 26, 2026
d418e4d
Update README to add tutorial reference and ensure proper formatting
Claptar Jan 26, 2026
333925f
Update dependencies in pyproject.toml: remove obsolete images section…
Claptar Jan 26, 2026
bc7711f
Add GET_STARTED.md for initial setup and usage instructions
Claptar Jan 26, 2026
e28d5d3
Rename option '--types' to '--tree' in info command for clarity and u…
Claptar Jan 27, 2026
0674018
Add support for copying HDF5 groups in copy_tree function
Claptar Jan 27, 2026
e660047
Implement subset_matrix_entry function for handling dense and sparse …
Claptar Jan 27, 2026
949f37e
Rename --types flag to --tree in info command tests for clarity
Claptar Jan 27, 2026
3056ed4
Add tests for subsetting H5AD files with sparse matrices and variable…
Claptar Jan 27, 2026
3047c7a
Exclude 'obs_names' and 'var_names' from keys in group processing for…
Claptar Jan 27, 2026
361a2ae
Update GET_STARTED.md to include additional output examples for `info…
Claptar Jan 27, 2026
eeb34d9
Refactor subset command to require output path or use --inplace optio…
Claptar Jan 27, 2026
cb36c17
Update GET_STARTED.md to modify subset command syntax for clarity
Claptar Jan 27, 2026
6ba8336
Add inplace subsetting test for subset_h5ad function and fix dataset …
Claptar Jan 27, 2026
a4cbb6b
Refactor subset command tests to use --output flag for output file sp…
Claptar Jan 27, 2026
982efd4
Update uv.lock
Claptar Jan 27, 2026
e3acef8
Disable caching in UV setup for consistent test environment
Claptar Jan 27, 2026
31 changes: 23 additions & 8 deletions .github/workflows/tests.yml
@@ -11,7 +11,7 @@ concurrency:
   cancel-in-progress: true

 jobs:
-  test:
+  tests:
     runs-on: ubuntu-latest
     timeout-minutes: 20

@@ -23,6 +23,21 @@ jobs:
       fail-fast: false
       matrix:
         python-version: ["3.12"] # add "3.13" if you want
+        module:
+          - name: cli
+            tests: tests/test_cli.py
+          - name: export
+            tests: tests/test_export.py
+          - name: import
+            tests: tests/test_import.py
+          - name: info-read
+            tests: tests/test_info_read.py
+          - name: subset
+            tests: tests/test_subset.py
+          - name: zarr
+            tests: tests/test_zarr.py
+
+    name: tests (${{ matrix.module.name }})

     steps:
       - uses: actions/checkout@v4
@@ -35,36 +50,36 @@ jobs:
       - name: Set up uv
         uses: astral-sh/setup-uv@v3
         with:
-          enable-cache: true
+          enable-cache: false

       - name: Install dependencies (frozen)
         run: uv sync --extra dev --frozen

       - name: Run tests with coverage
         run: |
-          uv run pytest -v \
+          uv run pytest -v -W default ${{ matrix.module.tests }} \
             --cov=h5ad \
             --cov-report=term-missing \
             --cov-report=xml \
             --cov-report=html \
-            --junitxml=pytest-results.xml
+            --junitxml=pytest-results-${{ matrix.module.name }}.xml

       - name: Publish test results summary
         uses: EnricoMi/publish-unit-test-result-action@v2
         if: always()
         with:
-          files: pytest-results.xml
-          check_name: Test Results
+          files: pytest-results-${{ matrix.module.name }}.xml
+          check_name: Test Results (${{ matrix.module.name }})

       - name: Upload coverage artifacts
         uses: actions/upload-artifact@v4
         if: always()
         with:
-          name: coverage
+          name: coverage-${{ matrix.module.name }}
           path: |
             coverage.xml
             htmlcov/
-            pytest-results.xml
+            pytest-results-${{ matrix.module.name }}.xml
           retention-days: 30

       - name: Upload coverage to Codecov
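
The workflow change above splits the single test job into one job per test module, and gives each job its own JUnit file and artifact name. The sketch below is only an illustration of how that matrix expands, not part of the PR: `expand_matrix` and the dictionary keys are hypothetical names, while the module list and the name patterns are copied from the diff.

```python
from itertools import product

# Values copied from the workflow matrix in the diff.
PYTHON_VERSIONS = ["3.12"]
MODULES = [
    {"name": "cli", "tests": "tests/test_cli.py"},
    {"name": "export", "tests": "tests/test_export.py"},
    {"name": "import", "tests": "tests/test_import.py"},
    {"name": "info-read", "tests": "tests/test_info_read.py"},
    {"name": "subset", "tests": "tests/test_subset.py"},
    {"name": "zarr", "tests": "tests/test_zarr.py"},
]


def expand_matrix(python_versions, modules):
    """Return one job description per (python-version, module) pair.

    Hypothetical helper that mimics how a GitHub Actions matrix is expanded.
    """
    jobs = []
    for version, module in product(python_versions, modules):
        jobs.append(
            {
                "job_name": f"tests ({module['name']})",
                "python_version": version,
                "pytest_target": module["tests"],
                "junit_file": f"pytest-results-{module['name']}.xml",
                "artifact_name": f"coverage-{module['name']}",
            }
        )
    return jobs


jobs = expand_matrix(PYTHON_VERSIONS, MODULES)
for job in jobs:
    print(job["job_name"], job["pytest_target"], job["artifact_name"])

# Artifact names must not repeat within a single workflow run.
artifact_names = [job["artifact_name"] for job in jobs]
print("unique artifact names:", len(set(artifact_names)) == len(artifact_names))
```

With the single Python version in the matrix this yields six jobs with distinct artifact names. If "3.13" were added as the inline comment suggests, the names would repeat across versions because they contain only the module name, and `actions/upload-artifact@v4` rejects a second upload under an existing name unless overwriting is enabled, so the version would likely need to become part of the name as well.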
60 changes: 22 additions & 38 deletions README.md
@@ -1,18 +1,20 @@
# h5ad CLI

A command-line tool for exploring huge `.h5ad` (AnnData) files without loading them fully into memory. Streams data directly from disk for efficient inspection of structure, metadata, and matrices.
A command-line tool for exploring huge AnnData stores (`.h5ad` and `.zarr`) without loading them fully into memory. Streams data directly from disk for efficient inspection of structure, metadata, and matrices.

## Features

- **`info`** – Show file structure and dimensions (`n_obs × n_var`)
- **`table`** – Export obs/var metadata to CSV with chunked streaming
- **`subset`** – Filter h5ad files by cell/gene names (supports dense and sparse CSR/CSC matrices)
- Memory-efficient chunked processing for large files
- Rich terminal output with colors and progress bars
- Streaming access to very large `.h5ad` and `.zarr` stores
- Auto-detects `.h5ad` files vs `.zarr` directories
- Chunked processing for dense and sparse matrices (CSR/CSC)
- Rich terminal output with progress indicators

## Installation

Using [uv](https://docs.astral.sh/uv/) (recommended):
```bash
git clone https://github.com/cellgeni/h5ad-cli.git
cd h5ad-cli
uv sync
```

@@ -21,45 +23,27 @@ For development and testing:
uv sync --extra dev
```

See [docs/TESTING.md](docs/TESTING.md) for testing documentation.

## Usage
Invoke any subcommand via `uv run h5ad ...`:

```bash
uv run h5ad --help
```

#### Examples

**Inspect overall structure and axis sizes:**
Alternative with pip:
```bash
uv run h5ad info data.h5ad
git clone https://github.com/cellgeni/h5ad-cli.git
cd h5ad-cli
pip install .
```

**Export full obs metadata to CSV:**
For development and testing with pip:
```bash
uv run h5ad table data.h5ad --axis obs --out obs_metadata.csv
pip install -e ".[dev]"
```

**Export selected obs columns to stdout:**
```bash
uv run h5ad table data.h5ad --axis obs --cols cell_type,donor
```
See [docs/TESTING.md](docs/TESTING.md) for testing documentation.

**Export var metadata with custom chunk size:**
```bash
uv run h5ad table data.h5ad --axis var --chunk-rows 5000 --out var_metadata.csv
```
## Commands (Overview)

**Subset by cell names:**
```bash
uv run h5ad subset input.h5ad output.h5ad --obs cells.txt
```
Run help at any level (e.g. `uv run h5ad --help`, `uv run h5ad export --help`).

**Subset by both cells and genes:**
```bash
uv run h5ad subset input.h5ad output.h5ad --obs cells.txt --var genes.txt
```
- `info` – read-only inspection of store layout, shapes, and type hints; supports drilling into paths like `obsm/X_pca` or `uns`.
- `subset` – stream and write a filtered copy based on obs/var name lists, preserving dense and sparse matrix encodings.
- `export` – extract data from a store; subcommands: `dataframe` (obs/var to CSV), `array` (dense to `.npy`), `sparse` (CSR/CSC to `.mtx`), `dict` (JSON), `image` (PNG).
- `import` – write new data into a store; subcommands: `dataframe` (CSV → obs/var), `array` (`.npy`), `sparse` (`.mtx`), `dict` (JSON).

All commands stream from disk, so even multi-GB `.h5ad` files remain responsive.
See [docs/GET_STARTED.md](docs/GET_STARTED.md) for a short tutorial.
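
The updated README says the tool auto-detects `.h5ad` files versus `.zarr` directories. The detection code itself is not visible in this part of the diff, so the following is a minimal sketch of one way such a check could work, under stated assumptions: `detect_store_kind` is a hypothetical name, a file counts as HDF5 if it starts with the standard HDF5 signature, and a directory counts as Zarr if it contains `.zgroup` (Zarr v2) or `zarr.json` (Zarr v3) at its root.

```python
import tempfile
from pathlib import Path

# First 8 bytes of an HDF5 file that has no user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def detect_store_kind(path):
    """Return "zarr" for a Zarr directory store and "h5ad" for an HDF5 file.

    Hypothetical helper for illustration; not the h5ad-cli implementation.
    """
    p = Path(path)
    if p.is_dir():
        # Zarr v2 groups carry a .zgroup file; Zarr v3 nodes carry zarr.json.
        if (p / ".zgroup").exists() or (p / "zarr.json").exists():
            return "zarr"
        raise ValueError(f"{p} is a directory without Zarr metadata")
    if p.is_file():
        with p.open("rb") as fh:
            if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return "h5ad"
        raise ValueError(f"{p} does not start with the HDF5 signature")
    raise FileNotFoundError(p)


# Demo on throwaway paths with fake contents, just enough to exercise the checks.
with tempfile.TemporaryDirectory() as tmp:
    zarr_dir = Path(tmp) / "data.zarr"
    zarr_dir.mkdir()
    (zarr_dir / ".zgroup").write_text('{"zarr_format": 2}')

    h5_file = Path(tmp) / "data.h5ad"
    h5_file.write_bytes(HDF5_SIGNATURE + b"\x00" * 8)

    print(detect_store_kind(zarr_dir))
    print(detect_store_kind(h5_file))
```

Checking content rather than the file extension keeps the sketch robust to renamed stores, at the cost of missing HDF5 files that place the signature after a user block.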