A CLI tool that extracts biological named entities from BioSample records with LLMs (Ollama) and maps them to ontology terms.
Documentation: https://dbcls.github.io/bsllmner-mk2
- Extract mode (bsllmner2_extract) -- Performs Named Entity Recognition (NER) over BioSample metadata and emits structured JSON.
- Select mode (bsllmner2_select) -- Runs the same NER pass, searches each extracted value against ontologies (Cellosaurus, Cell Ontology, UBERON, MONDO, ChEBI, NCBI Gene, Plant Ontology), and lets the LLM pick the best ontology term per field.
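
The Select mode flow described above (extract, search, pick) can be pictured with a small toy sketch. This is not the tool's code: the function names, the in-memory `TOY_ONTOLOGY`, and the `TOY:` identifiers are all hypothetical, and both LLM steps are replaced by simple string matching so that the example runs on its own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    term_id: str  # placeholder IDs, not real ontology accessions
    label: str


# Hypothetical stand-in for the ontology files Select mode searches.
TOY_ONTOLOGY = {
    "cell_line": [Term("TOY:0001", "HeLa"), Term("TOY:0002", "HeLa S3")],
    "tissue": [Term("TOY:0101", "liver")],
}


def extract(record: dict) -> dict:
    """Stand-in for the LLM NER pass: keep non-empty fields we can map."""
    return {f: v for f, v in record.items() if f in TOY_ONTOLOGY and v}


def search(field: str, value: str) -> list:
    """Case-insensitive substring search over the toy ontology."""
    needle = value.lower()
    return [t for t in TOY_ONTOLOGY[field] if needle in t.label.lower()]


def select(value: str, candidates: list) -> Optional[Term]:
    """Stand-in for the LLM choice: prefer an exact label match,
    otherwise fall back to the first candidate (or None)."""
    for term in candidates:
        if term.label.lower() == value.lower():
            return term
    return candidates[0] if candidates else None


def run_select(record: dict) -> dict:
    """Chain the three steps: extract, search, then pick one term per field."""
    extracted = extract(record)
    return {f: select(v, search(f, v)) for f, v in extracted.items()}


if __name__ == "__main__":
    print(run_select({"cell_line": "hela", "tissue": "Liver", "note": "n/a"}))
```

In this sketch the search step returns two candidates for "hela" and the selection step narrows them to one, which is the role the README assigns to the LLM in Select mode.
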
docker compose up -d --build
docker compose exec app bsllmner2_extract \
--bs-entries tests/data/example_biosample.json \
--model llama3.1:70b --debug

A complete walkthrough -- including ontology preparation and Select mode -- is in Getting Started.
Basics
- Getting Started -- First-run walkthrough.
- Installation -- Docker Compose, uv, host requirements.
Modes
- Extract Mode -- NER pipeline.
- Select Mode -- NER + ontology mapping pipeline.
Reference
- CLI -- bsllmner2_extract / bsllmner2_select options.
- Data Formats -- Input/output schemas.
- Configuration -- Environment variables and Ollama tuning.
Operations
- Ontology Preparation -- Building the OWL files Select mode consumes.
- ChIP-Atlas -- Processing ChIP-Atlas data (hg38 / mm10).
Contributing
- Development -- Local development setup.
- Testing -- pytest, mypy, ruff, mutmut, model evaluation.
- Benchmarking -- Reading performance and accuracy data.
- Original repository: sh-ikeda/bsllmner
- Related paper: https://doi.org/10.1101/2025.02.17.638570
Released under the MIT License. See LICENSE.