Basic CLI #1
Open: gaurav wants to merge 68 commits into main from basic-implementation-in-uv
Commits (68):
fcc27c2  This initializes a uv package in this repository. (gaurav)
876353d  Added basic CLI. (gaurav)
ec1d1f0  Add /data to the .gitignore. (gaurav)
eff8f26  Initial implementation of a basic xref query-er. (gaurav)
4d04e2a  Added a method to look up a particular identifier. (gaurav)
8531cb7  Added CURIE expansion/recursive lookup. (gaurav)
a1aeec6  Added a basic ConcordTester. (gaurav)
bb1eb99  Added labels via NodeNorm. (gaurav)
40c3338  Midnight commit: attempting to improve expansion. (gaurav)
8c41112  Added some improvements. (gaurav)
239c89f  Added a CLAUDE.md by Claude.ai. (gaurav)
8132fe1  Reorganized file slightly. (gaurav)
bd00972  Claude wrote some tests. (gaurav)
9cc06bc  Improved downloader using Claude. (gaurav)
da8bb0c  Added MD5 download functionality. (gaurav)
8f36b74  Removed empty model file. (gaurav)
0534fd8  Attempted to rename this package to babel-explorer. (gaurav)
0b3a9f5  Add comprehensive pytest suite for all core modules (gaurav)
8535202  Merge branch 'main' into basic-implementation-in-uv (gaurav)
ff0dacc  Added uv.lock (not sure why it wasn't added previously). (gaurav)
bacc72d  Update CLAUDE.md (gaurav)
96d9609  Update pyproject.toml (gaurav)
0c33e7e  Update src/babel_explorer/core/babel_xrefs.py (gaurav)
1aff013  Update src/babel_explorer/core/nodenorm.py (gaurav)
af76c15  Replace MD5 checksumming with HTTP header caching and freshness window (gaurav)
fb41da0  Added some CURIEs to test. (gaurav)
5c544a2  Partially changed --expand to --recurse. (gaurav)
280212a  More fully changed --expand to --recurse. (gaurav)
b522e6e  Add pytest-xdist for parallel test execution (gaurav)
e137c31  Replace Python recursion in get_curie_xrefs with DuckDB WITH RECURSIVE (gaurav)
b115d02  Fix xdist race condition: skip test-data cleanup in parallel runs (gaurav)
5a0f758  Made output a bit prettier. (gaurav)
be2fa36  Update src/babel_explorer/core/nodenorm.py (gaurav)
2b2aa7f  Simplify babel_xrefs: extract helper, remove dead fetches, fix defaul… (gaurav)
3cdd19c  Fix LabeledCrossReference: make it a frozen dataclass subclass (gaurav)
c7a3f16  Fix BabelDownloader: use tempfile.gettempdir() when local_path is None (gaurav)
c6635bc  Fix test-concord: guard against None from get_clique_identifiers (gaurav)
d74110e  Potential fix for pull request finding (gaurav)
4338bdc  Potential fix for pull request finding (gaurav)
be3e427  Potential fix for pull request finding (gaurav)
f8b718b  Potential fix for pull request finding (gaurav)
48c8e96  Potential fix for pull request finding (gaurav)
8fb37d6  Potential fix for pull request finding (gaurav)
49f5c3b  Potential fix for pull request finding (gaurav)
c952c12  Fix DuckDB connection leaks by using context managers (gaurav)
b0539bb  Fix and simplify test mocks for context manager protocol (gaurav)
6319212  Add configurable HTTP timeout to NodeNorm and BabelDownloader (gaurav)
b634d11  Fix _etag_matches docstring to match actual behavior (gaurav)
a7eb8c1  Got rid of ignore_curies_in_expansion, which is no longer used. (gaurav)
7163a64  Add ruff CI and fix all lint errors (gaurav)
f9e549e  Rename lint workflow to CI and add unit test job (gaurav)
7d1d5ca  Improved documentation. (gaurav)
0d9e32c  Add tests for parse_duration() in cli.py (gaurav)
0ca35eb  Add CliRunner tests for xrefs, ids, and test-concord commands (gaurav)
17782a2  Reformatted code with ruff. (gaurav)
d34c5c3  Update src/babel_explorer/cli.py (gaurav)
ec0a71c  Cache get_identifier() locals in _to_labeled_xref; root-anchor lib/ i… (gaurav)
67be81e  Fix bugs and gaps identified in PR #1 code review (gaurav)
4199ae2  Run integration tests on push to master and weekly on Tuesdays (gaurav)
17f6b09  Add module, class, and method docstrings to new files in PR #1 (gaurav)
8216b0b  Fix LabeledCrossReference biolink_type fields to list[str]; simplify … (gaurav)
ac418ff  Address PR #1 review: frozen Identifier, atomic rename, fail-open HEA… (gaurav)
06cd300  Sync CLAUDE.md with current code (gaurav)
46f0863  Merge branch 'basic-implementation-in-uv' of github.com:TranslatorSRI… (gaurav)
bf1c48c  Address PR #1 review: fix six correctness and quality issues (gaurav)
49e27c0  Add --format [text|json|tsv|csv] option to all CLI commands (gaurav)
aecae50  Add console format with rich color highlighting; replace text default (gaurav)
f7cde3a  Fix Identifier.from_dict splitting string fields into characters (gaurav)
@@ -0,0 +1,36 @@
+name: CI
+
+on:
+  pull_request:
+  push:
+    branches: [main]
+  schedule:
+    - cron: "0 17 * * 2" # Tuesdays at 12pm EST (17:00 UTC); 1pm during EDT
+  workflow_dispatch:
+
+jobs:
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: astral-sh/setup-uv@v5
+      - run: uv sync --group dev
+      - run: uv run ruff check src/ tests/
+      - run: uv run ruff format --check src/ tests/
+
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: astral-sh/setup-uv@v5
+      - run: uv sync --group dev
+      - run: uv run pytest -v -m "not integration"
+
+  integration-test:
+    runs-on: ubuntu-latest
+    if: github.event_name != 'pull_request'
+    steps:
+      - uses: actions/checkout@v4
+      - uses: astral-sh/setup-uv@v5
+      - run: uv sync --group dev
+      - run: uv run pytest -v -m "integration and not slow"
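Reviewer note on the `schedule` comment in the workflow above: the claim that `0 17 * * 2` fires at 12pm EST and 1pm EDT can be checked with a little datetime arithmetic. The sketch below is illustrative only. It assumes fixed offsets of UTC-5 for EST and UTC-4 for EDT rather than a real time zone database, and the helper name `local_time` and the sample date are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets; a real check would use a time zone database instead.
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")

# 2025-01-07 is a Tuesday; cron day-of-week 2 also means Tuesday (0 = Sunday).
# The date only pins the weekday, so it is reused for both offsets.
RUN_UTC = datetime(2025, 1, 7, 17, 0, tzinfo=timezone.utc)


def local_time(run_utc: datetime, tz: timezone) -> str:
    """Render a UTC run time as HH:MM plus the zone name."""
    return run_utc.astimezone(tz).strftime("%H:%M %Z")


if __name__ == "__main__":
    print(local_time(RUN_UTC, EST))  # 12:00 EST
    print(local_time(RUN_UTC, EDT))  # 13:00 EDT
```

Under those assumptions the comment in the workflow is consistent: the job runs at noon Eastern in winter and one hour later on the wall clock during daylight saving time.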
@@ -0,0 +1 @@
+3.11
@@ -0,0 +1,151 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+babel-explorer is a tool for querying and exploring Babel intermediate files. It allows users to discover why two biological/chemical identifiers are considered identical by the Babel system, which handles cross-references between different ontology and database identifiers (e.g., MONDO, HP, UMLS, HGNC).
+
+## Development Setup
+
+This project uses **uv** for package management:
+
+```bash
+# Install dependencies
+uv sync
+
+# Install with dev dependencies
+uv sync --group dev
+
+# Run the CLI
+uv run babel-explorer --help
+```
+
+## Commands
+
+### Running the Application
+
+```bash
+# Get cross-references for one or more CURIEs
+uv run babel-explorer xrefs MONDO:0004979
+
+# Get cross-references with expansion (recursive lookup)
+uv run babel-explorer xrefs MONDO:0004979 --recurse
+
+# Get cross-references with labels from NodeNorm
+uv run babel-explorer xrefs MONDO:0004979 --labels
+
+# Get ID records for CURIEs
+uv run babel-explorer ids MONDO:0004979
+
+# Test concordance changes with NodeNorm
+uv run babel-explorer test-concord MONDO:0004979 HP:0000001
+
+# Use custom Babel server or local directory
+uv run babel-explorer xrefs MONDO:0004979 --local-dir data/2025nov19 --babel-url https://stars.renci.org:443/var/babel_outputs/2025nov19/
+```
+
+### Development Commands
+
+```bash
+# Run all tests (includes large file downloads)
+uv run pytest -v
+
+# Run unit tests only (fast, no network)
+uv run pytest -v -m "not integration"
+
+# Run integration tests without 2GB+ downloads
+uv run pytest -v -m "integration and not slow"
+
+# Run a single test file
+uv run pytest -v tests/test_nodenorm.py
+
+# Run linter
+uv run ruff check
+
+# Format code
+uv run ruff format
+```
+
+## Architecture
+
+### Core Components
+
+1. **BabelDownloader** (`src/babel_explorer/core/downloader.py`):
+   - Downloads Babel intermediate files from a remote HTTP(S) server using Python's `requests` library (streaming downloads)
+   - Caches files locally in configurable directory (default: `data/2025nov19/`)
+   - Uses `@functools.lru_cache` to avoid re-downloading
+   - **Important**: Requires network access but no external tools like `wget`
+
+2. **BabelXRefs** (`src/babel_explorer/core/babel_xrefs.py`):
+   - Main query engine for cross-references
+   - Uses DuckDB to query Parquet files (`Concord.parquet`, `Identifiers.parquet`)
+   - Supports recursive expansion of cross-references via a single `WITH RECURSIVE` query
+   - Uses ephemeral in-memory DuckDB connections (nothing written to disk)
+
+3. **NodeNorm** (`src/babel_explorer/core/nodenorm.py`):
+   - Integration with NodeNormalization API (https://nodenormalization-sri.renci.org/)
+   - Fetches labels, biolink types, and equivalent identifiers for CURIEs
+   - Uses `@functools.lru_cache` for performance
+   - Optional component for label enrichment
+
+4. **CLI** (`src/babel_explorer/cli.py`):
+   - Click-based command-line interface
+   - Three main commands: `xrefs`, `ids`, `test-concord`
+
+### Data Flow
+
+1. User provides CURIEs via CLI
+2. BabelDownloader ensures required Parquet files are downloaded
+3. BabelXRefs queries files using DuckDB
+4. If `--labels` or `--recurse` flags are set, NodeNorm is queried for additional metadata
+5. Results are printed to stdout
+
+### Key Design Patterns
+
+- **Lazy downloading**: Files are only downloaded when first accessed
+- **LRU caching**: Heavy use of `@functools.lru_cache` to avoid redundant downloads and API calls
+- **Recursive expansion**: The `--recurse` flag recursively follows all cross-references to build complete graphs
+- **DuckDB for querying**: In-memory SQL queries against Parquet files for fast lookups
+
+## Testing
+
+### Test Structure
+
+Tests live in `tests/` and are split into fast **unit tests** (mocked, no network) and slower **integration tests** (real downloads and API calls). Pytest markers control which tests run:
+
+- **`@pytest.mark.integration`** — requires network access (downloads Parquet files or calls NodeNorm API)
+- **`@pytest.mark.slow`** — downloads very large files (2 GB+)
+
+| File | Unit | Integration | Slow | Total |
+|------|------|-------------|------|-------|
+| `tests/test_downloader.py` | 41 | 4 | 1 | 46 |
+| `tests/test_babel_xrefs.py` | 23 | 20 | 3 | 46 |
+| `tests/test_nodenorm.py` | 20 | 13 | 0 | 33 |
+| `tests/test_cli.py` | 24 | 0 | 0 | 24 |
+
+### Test Infrastructure
+
+- **`tests/conftest.py`** — Session-scoped fixtures that download Parquet files once and share them across all integration tests. Teardown removes the `data/test/` directory so the next run starts fresh.
+- **`tests/constants.py`** — Shared constants (URLs, file paths) and `load_curies()` helper.
+- **`tests/data/valid_curies.txt`** — One CURIE per line (`#` comments allowed). Integration tests are parametrized over this list — adding a new line automatically expands test coverage.
+
+### Key Dataclasses
+
+- **`Identifier`** — Frozen dataclass for a normalized NodeNorm entry (curie, label, biolink_type, taxa, description). Returned by `NodeNorm.get_identifier()` and `get_clique_identifiers()`.
+- **`CrossReference`** — Frozen dataclass for Concord.parquet rows (filename, subj, pred, obj)
+- **`LabeledCrossReference`** — Extends CrossReference with labels and biolink types from NodeNorm
+- **`IdentifierRecord`** — Frozen dataclass for Identifiers.parquet rows (curie + dynamic extra fields). Returned by `BabelXRefs.get_curie_ids()`.
+
+## Important Notes
+
+- **Data directory**: The `data/` directory is gitignored and contains downloaded Parquet files and generated DuckDB databases
+- **Babel versions**: The default Babel version is `2025nov19`, but this can be customized via `--local-dir` and `--babel-url`
+
+## File Locations
+
+- Source code: `src/babel_explorer/`
+- Tests: `tests/`
+- Test CURIEs: `tests/data/valid_curies.txt`
+- Downloaded Babel files: `data/<version>/duckdb/*.parquet`
+- Entry point: `src/babel_explorer/cli.py`
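Reviewer note: the `WITH RECURSIVE` expansion and the frozen `CrossReference` dataclass described in the file above can be sketched as follows. This is a minimal illustration, not the code in `babel_xrefs.py`. It uses the standard-library `sqlite3` module as a stand-in for DuckDB so that it runs without extra dependencies. The `concord` table, the sample rows, the `EX:` identifiers, the `.tsv` filenames, and the function names `build_demo_connection` and `expand_xrefs` are all hypothetical; only the four field names come from the description above.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class CrossReference:
    """One Concord row; the four field names follow the description above."""

    filename: str
    subj: str
    pred: str
    obj: str


# Hypothetical rows standing in for Concord.parquet; the EX: CURIEs are made up.
DEMO_ROWS = [
    ("a.tsv", "MONDO:0004979", "skos:exactMatch", "EX:1"),
    ("b.tsv", "EX:1", "oio:hasDbXref", "EX:2"),
    ("c.tsv", "EX:3", "skos:exactMatch", "EX:2"),
    ("d.tsv", "HP:0000001", "skos:exactMatch", "EX:9"),  # unrelated clique
]

# UNION (not UNION ALL) drops CURIEs already reached, so cycles terminate.
RECURSIVE_SQL = """
WITH RECURSIVE reached(curie) AS (
    SELECT ?
    UNION
    SELECT CASE WHEN c.subj = r.curie THEN c.obj ELSE c.subj END
    FROM concord AS c
    JOIN reached AS r ON r.curie IN (c.subj, c.obj)
)
SELECT c.filename, c.subj, c.pred, c.obj
FROM concord AS c
JOIN reached AS r ON r.curie = c.subj
ORDER BY c.filename
"""


def build_demo_connection() -> sqlite3.Connection:
    """Create an in-memory database holding the demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE concord (filename TEXT, subj TEXT, pred TEXT, obj TEXT)"
    )
    conn.executemany("INSERT INTO concord VALUES (?, ?, ?, ?)", DEMO_ROWS)
    return conn


def expand_xrefs(conn: sqlite3.Connection, curie: str) -> list[CrossReference]:
    """Follow cross-references in both directions, starting from `curie`."""
    rows = conn.execute(RECURSIVE_SQL, (curie,)).fetchall()
    return [CrossReference(*row) for row in rows]


if __name__ == "__main__":
    demo = build_demo_connection()
    for xref in expand_xrefs(demo, "MONDO:0004979"):
        print(xref)
    demo.close()
```

Starting from `MONDO:0004979`, the query reaches `EX:1`, `EX:2`, and `EX:3` and returns the first three rows, while the unrelated `HP:0000001` row is left out. Doing the traversal in one SQL statement keeps the loop inside the database engine, which is presumably the motivation for commit e137c31.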
@@ -0,0 +1,7 @@
+# Future Work
+
+## Deduplicate CLI option blocks
+
+`--local-dir`, `--babel-url`, and `--check-download` are copy-pasted between the
+`xrefs` and `ids` commands in `cli.py`. Extract a `@common_babel_options` Click
+decorator so defaults are defined in one place and can't drift.
@@ -1,2 +1,56 @@
 # Babel Explorer
-Software for querying and exporting Babel intermediate files
+Software for querying and exploring Babel intermediate files.
+
+babel-explorer allows you to discover why two biological/chemical identifiers are considered identical by the [Babel](https://github.com/TranslatorSRI/Babel) system, which handles cross-references between different ontology and database identifiers (e.g., MONDO, HP, UMLS, HGNC).
+
+## Setup
+
+This project uses [uv](https://docs.astral.sh/uv/) for package management:
+
+```bash
+uv sync --group dev
+```
+
+## Usage
+
+```bash
+# Get cross-references for one or more CURIEs
+uv run babel-explorer xrefs MONDO:0004979
+
+# Get cross-references with expansion (recursive lookup)
+uv run babel-explorer xrefs MONDO:0004979 --recurse
+
+# Get cross-references with labels from NodeNorm
+uv run babel-explorer xrefs MONDO:0004979 --labels
+
+# Get ID records for CURIEs
+uv run babel-explorer ids MONDO:0004979
+
+# Test concordance changes with NodeNorm
+uv run babel-explorer test-concord MONDO:0004979 HP:0000001
+```
+
+## Testing
+
+Tests are split into fast **unit tests** (mocked, no network) and slower **integration tests** (real file downloads and API calls), controlled by pytest markers.
+
+```bash
+# Unit tests only — fast, no network required
+uv run pytest -v -m "not integration"
+
+# Integration tests without 2GB+ downloads
+uv run pytest -v -m "integration and not slow"
+
+# Full suite including large file downloads
+uv run pytest -v
+```
+
+### Adding Test CURIEs
+
+Integration tests are parametrized over the CURIEs listed in `tests/data/valid_curies.txt`. Add a new CURIE on its own line to automatically expand test coverage:
+
+```
+# tests/data/valid_curies.txt
+MONDO:0004979
+HP:0000001
+```
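Reviewer note: the CURIE list format shown above (one CURIE per line, `#` comments allowed) is simple enough to sketch a reader for. The code below is a guess at what a helper like `load_curies()` might do, not the implementation in `tests/constants.py`. The function signatures, the `parse_curies` helper, and the choice to treat `#` as a comment marker anywhere on a line are assumptions.

```python
from pathlib import Path


def parse_curies(text: str) -> list[str]:
    """Return the CURIEs in `text`, skipping blank lines and `#` comments."""
    curies = []
    for raw_line in text.splitlines():
        # Assumption: `#` starts a comment anywhere on the line.
        line = raw_line.split("#", 1)[0].strip()
        if line:
            curies.append(line)
    return curies


def load_curies(path) -> list[str]:
    """Read a CURIE list file such as tests/data/valid_curies.txt."""
    return parse_curies(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    sample = "# tests/data/valid_curies.txt\nMONDO:0004979\n\nHP:0000001\n"
    print(parse_curies(sample))  # ['MONDO:0004979', 'HP:0000001']
```

A list produced this way could be passed to `pytest.mark.parametrize`, which would match the README's statement that adding a line expands coverage automatically.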
@@ -0,0 +1,35 @@
+[project]
+name = "babel-explorer"
+version = "0.1.0"
+description = "Tool for querying and exploring Babel APIs and intermediate files"
+readme = "README.md"
+requires-python = ">=3.11"
+dependencies = [
+    "click>=8.3.1",
+    "duckdb>=1.4.2",
+    "requests>=2.32.5",
+    "rich>=13",
+    "tqdm>=4.67.0",
+]
+
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+
+[dependency-groups]
+dev = [
+    "filelock>=3.16",
+    "pytest>=8.3.5",
+    "pytest-xdist[psutil]>=3.6",
+    "ruff>=0.11.0",
+]
+
+[project.scripts]
+babel-explorer = "babel_explorer.cli:cli"
+
+[tool.pytest.ini_options]
+addopts = "-n auto"
+markers = [
+    "integration: tests requiring network access (deselect with '-m \"not integration\"')",
+    "slow: tests downloading very large files 2GB+ (deselect with '-m \"not slow\"')",
+]
Empty file.
Empty file.