
Training: DataLoader, feature cache, AMP, lazy package exports#3

Open
cropsgg wants to merge 1 commit into pr/02-clinvar-data-loader from pr/03-training-pipeline

@cropsgg cropsgg commented Apr 15, 2026

Summary

Updates training/inference plumbing so pipelines consume AppSettings: batched DNABERT encoding, optional on-disk feature cache, DataLoader-based training, optional AMP, and lazy imports in bloom_dnabert/__init__.py so importing light modules does not require every heavy dependency at import time.

Base branch: pr/02-clinvar-data-loader (stacked — merge after #2 or retarget once earlier PRs land).
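
As a concrete illustration, here is a minimal sketch of the settings-driven DataLoader + optional-AMP loop. The field values (batch size, workers, AMP flag) and the toy model/data are stand-ins for AppSettings and the real precomputed features; the actual wiring lives in classifier.py.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Illustrative stand-ins for AppSettings fields (names/values are assumptions).
batch_size, num_workers = 32, 0
use_amp = torch.cuda.is_available()
device = "cuda" if use_amp else "cpu"

# Toy tensors standing in for precomputed Bloom/DNABERT features.
features = torch.randn(256, 768)
labels = torch.randint(0, 2, (256,))
loader = DataLoader(TensorDataset(features, labels),
                    batch_size=batch_size, shuffle=True, num_workers=num_workers)

model = nn.Sequential(nn.Linear(768, 128), nn.ReLU(), nn.Linear(128, 2)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)  # passthrough when AMP is off
loss_fn = nn.CrossEntropyLoss()

for x, y in loader:
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad(set_to_none=True)
    # Mixed precision only when enabled; otherwise runs in full precision.
    with torch.autocast(device_type=device, enabled=use_amp):
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()  # scaled backward guards against fp16 underflow
    scaler.step(optimizer)
    scaler.update()
```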

What changed

  • bloom_dnabert/feature_cache.py — Fingerprinted cache for precomputed Bloom/DNABERT features when data.feature_cache_dir is set (see the sketch after this list).
  • bloom_dnabert/classifier.py — HybridClassifierPipeline and BloomGuidedPipeline read batch sizes, workers, AMP, max length, etc. from settings.
  • bloom_dnabert/dnabert_wrapper.py — Batched encoding and token-level outputs aligned with the training loops.
  • bloom_dnabert/bloom_filter.py — Aligns with the config paths and seed-loading APIs used by the app/CLI.
  • bloom_dnabert/__init__.py — Lazy __getattr__ exports for torch/transformers-heavy submodules.
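
The cache sketch referenced above. It assumes a SHA-256 fingerprint over the input sequences plus the encoding config, with one .pt file per entry; the real feature_cache.py may key entries differently.

```python
import hashlib
import json
from pathlib import Path

import torch

def _fingerprint(sequences, config):
    # Key entries by the input sequences plus the encoding config, so a
    # change to either one invalidates the cached features.
    payload = json.dumps({"seqs": list(sequences), "cfg": config}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_features(sequences, config, cache_dir, compute_fn):
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{_fingerprint(sequences, config)}.pt"
    if path.exists():
        return torch.load(path)        # cache hit: skip DNABERT encoding
    features = compute_fn(sequences)   # cache miss: compute once, persist
    torch.save(features, path)
    return features
```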

Why this PR exists

Scales experimentation beyond notebook-style loops: reproducible batching via DataLoader, less redundant DNABERT work thanks to the feature cache, and a cleaner import graph for tests and partial installs.
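
The import-graph cleanup uses module-level __getattr__ (PEP 562). A minimal sketch with assumed symbol names (FeatureCache is hypothetical; the real __init__.py maps its own exports):

```python
# bloom_dnabert/__init__.py (sketch)
import importlib

_LAZY = {
    "HybridClassifierPipeline": ".classifier",
    "BloomGuidedPipeline": ".classifier",
    "FeatureCache": ".feature_cache",  # assumed class name
}

def __getattr__(name):
    # Resolve heavy torch/transformers submodules on first attribute access,
    # so `import bloom_dnabert` stays cheap for light modules and tests.
    if name in _LAZY:
        module = importlib.import_module(_LAZY[name], __name__)
        return getattr(module, name)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```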

Dependencies

Stacked on #2 (pr/02-clinvar-data-loader); merge after it lands, as noted above.

How to verify

Run the classifier/bloom tests in an environment that has torch and the dependencies from requirements.txt installed (full suite: pytest tests/).
