Skip to content

dbcls/bsllmner-mk2

Repository files navigation

bsllmner-mk2

A CLI tool that extracts biological named entities from BioSample records with LLMs (Ollama) and maps them to ontology terms.

Documentation: https://dbcls.github.io/bsllmner-mk2

Capabilities

  • Extract mode (bsllmner2_extract) -- Performs Named Entity Recognition (NER) over BioSample metadata and emits structured JSON.
  • Select mode (bsllmner2_select) -- Runs the same NER pass, searches each extracted value against ontologies (Cellosaurus, Cell Ontology, UBERON, MONDO, ChEBI, NCBI Gene, Plant Ontology), and lets the LLM pick the best ontology term per field.

Quick Start

docker compose up -d --build
docker compose exec app bsllmner2_extract \
  --bs-entries tests/data/example_biosample.json \
  --model llama3.1:70b --debug

A complete walkthrough -- including ontology preparation and Select mode -- is in Getting Started.

Documentation

Basics

Modes

Reference

  • CLI -- bsllmner2_extract / bsllmner2_select options.
  • Data Formats -- Input/output schemas.
  • Configuration -- Environment variables and Ollama tuning.

Operations

Contributing

  • Development -- Local development setup.
  • Testing -- pytest, mypy, ruff, mutmut, model evaluation.
  • Benchmarking -- Reading performance and accuracy data.

Related Resources

License

Released under the MIT License. See LICENSE.

About

A tool for extracting biological named entities from BioSample records using Large Language Models (LLMs) and mapping them to ontology terms.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors