Releases: RyanSeanPhillips/LabIndex
v0.1.0 - Initial Release
LabIndex v0.1.0 - Initial Release
NLP-Assisted Lab Directory Indexing and File Relationship Discovery
LabIndex builds a local SQLite index of your lab/network drives, enabling
fast search, intelligent file discovery, and automatic relationship
detection between data files and notes—all without ever modifying your
source files.
Key Features
Tiered Metadata Extraction
- Tier 0: Fast file inventory (path, size, timestamps, extension-based categorization)
- Tier 1: Pattern matching for standardized formats (ABF, SMRX headers, spreadsheet columns)
- Tier 2: NLP extraction for semi-structured text (notes, references, entity detection)
- Tier 3: LLM reading for complex/ambiguous notes (budget-controlled with gating conditions)
Automatic Link Detection
- Animal ID matching from paths and filenames
- Filename similarity with fuzzy matching and numeric suffix handling
- Content reference detection (finds mentions of files in notes)
- Folder proximity analysis (sibling/parent relationships)
- 48+ ML features for link confidence scoring
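As an illustration of the filename-similarity signal, here is a minimal sketch built on Python's standard `difflib`. The function names `split_numeric_suffix` and `filename_similarity` are hypothetical and are not taken from the LabIndex codebase; the actual features and scoring may differ.

```python
import re
from difflib import SequenceMatcher
from pathlib import PurePath


def split_numeric_suffix(stem: str) -> tuple[str, str]:
    """Split a stem like 'mouse266_003' into ('mouse266_', '003')."""
    match = re.match(r"^(.*?)(\d+)$", stem)
    return (match.group(1), match.group(2)) if match else (stem, "")


def filename_similarity(a: str, b: str) -> float:
    """Score two filenames in [0, 1], ignoring extension and trailing digits."""
    base_a, _ = split_numeric_suffix(PurePath(a).stem.lower())
    base_b, _ = split_numeric_suffix(PurePath(b).stem.lower())
    # SequenceMatcher.ratio() is a generic fuzzy score, used here as a stand-in
    return SequenceMatcher(None, base_a, base_b).ratio()
```

Stripping the trailing run of digits before comparing means that files from the same numbered series, such as `mouse266_001.abf` and `mouse266_002.abf`, compare as identical bases, while unrelated names score low.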
Human-in-the-Loop Review
- Confidence-based routing: auto-accept (>0.95), human review (0.4-0.95), auto-reject (<0.4)
- Review queue prioritized by uncertainty
- Corrections feed back into ML training data
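The routing thresholds above can be sketched as a small function. This is an illustrative sketch, not LabIndex's implementation: the names `route_link` and `review_priority` are hypothetical, and treating a confidence of 0.5 as "most uncertain" is an assumption about how the review queue might be ordered.

```python
def route_link(confidence: float,
               accept_above: float = 0.95,
               reject_below: float = 0.4) -> str:
    """Route a candidate link using the thresholds from the release notes."""
    if confidence > accept_above:
        return "auto_accept"
    if confidence < reject_below:
        return "auto_reject"
    # Everything in [0.4, 0.95] goes to a human reviewer
    return "human_review"


def review_priority(confidence: float) -> float:
    """Higher value = more uncertain (assumes 0.5 is maximally uncertain)."""
    return 1.0 - abs(confidence - 0.5) * 2.0


# Order a review queue so the most uncertain links come first
queue = sorted([0.9, 0.5, 0.45], key=review_priority, reverse=True)
```

Values exactly on a boundary (0.4 or 0.95) fall into human review here, which matches the strict inequalities in the list above.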
LLM Integration
- Claude (Anthropic API) adapter with native tool calling
- Ollama adapter for local/free inference
- Budget-controlled usage with gating conditions
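One way to picture budget-controlled gating is a check that escalates to an LLM only when cheaper tiers were unsure and some budget remains. The sketch below is an assumption about the general idea; `LLMBudget`, `should_use_llm`, the call-count budget, and the 0.6 threshold are all hypothetical and do not come from the LabIndex source.

```python
from dataclasses import dataclass


@dataclass
class LLMBudget:
    """Hypothetical budget tracker: caps the number of LLM calls per run."""
    max_calls: int
    used_calls: int = 0

    def try_spend(self) -> bool:
        """Reserve one call if budget remains; return whether it was granted."""
        if self.used_calls >= self.max_calls:
            return False
        self.used_calls += 1
        return True


def should_use_llm(tier2_confidence: float, budget: LLMBudget,
                   ambiguity_threshold: float = 0.6) -> bool:
    """Gate: escalate only when cheaper tiers were unsure and budget allows."""
    if tier2_confidence >= ambiguity_threshold:
        return False  # cheaper tiers were confident enough; no LLM needed
    return budget.try_spend()
```

Checking the confidence condition before touching the budget keeps confident cases from consuming any of the allowance.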
Additional Features
- Graph Visualization: Interactive QGraphicsView-based file relationship explorer
- Full-Text Search: SQLite FTS5 for fast content search
- Read-Only Safety: All file access through read-only facade; never modifies source files
- MVVM Architecture: Clean separation with PyQt6 desktop application
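For readers unfamiliar with FTS5, the following sketch shows the kind of full-text query it enables, using only Python's built-in `sqlite3` module. The table name `notes_fts`, its columns, and the sample rows are made up for illustration and do not reflect LabIndex's actual schema. It assumes the local SQLite build has FTS5 enabled, which is typical but not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An FTS5 virtual table indexes every listed column for full-text search
conn.execute("CREATE VIRTUAL TABLE notes_fts USING fts5(path, content)")
conn.executemany(
    "INSERT INTO notes_fts (path, content) VALUES (?, ?)",
    [
        ("notes/day1.txt", "Recorded mouse 266 baseline session"),
        ("notes/day2.txt", "Calibrated rig; no animals recorded"),
    ],
)


def search_notes(query: str) -> list[str]:
    """Return paths of matching rows, best match first."""
    rows = conn.execute(
        "SELECT path FROM notes_fts WHERE notes_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [path for (path,) in rows]


hits = search_notes("mouse 266")  # both terms must appear (implicit AND)
```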
Supported File Types
| Category | Extensions |
|---|---|
| Physiology Data | .abf, .smrx, .smr, .edf, .tdms |
| Analysis Files | .npz, .npy, .mat, .h5 |
| Documents | .pdf, .docx, .doc, .txt, .md |
| Spreadsheets | .xlsx, .xls, .csv |
| Presentations | .pptx, .ppt |
| Code | .py, .m, .r, .ipynb |
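The extension-based categorization used in Tier 0 can be pictured as a simple lookup built from the table above. This is a minimal sketch; the names `CATEGORY_BY_EXTENSION` and `categorize`, and the `"Other"` fallback for unlisted extensions, are assumptions rather than LabIndex's actual API.

```python
from pathlib import PurePath

# Invert the category table into an extension -> category lookup
CATEGORY_BY_EXTENSION = {
    ext: category
    for category, extensions in {
        "Physiology Data": (".abf", ".smrx", ".smr", ".edf", ".tdms"),
        "Analysis Files": (".npz", ".npy", ".mat", ".h5"),
        "Documents": (".pdf", ".docx", ".doc", ".txt", ".md"),
        "Spreadsheets": (".xlsx", ".xls", ".csv"),
        "Presentations": (".pptx", ".ppt"),
        "Code": (".py", ".m", ".r", ".ipynb"),
    }.items()
    for ext in extensions
}


def categorize(path: str) -> str:
    """Map a file path to a category by its (case-insensitive) extension."""
    return CATEGORY_BY_EXTENSION.get(PurePath(path).suffix.lower(), "Other")
```

Because only the path string is inspected, this step never needs to open the file, which is what makes a Tier 0 pass fast.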
Architecture
labindex/
├── labindex_core/        # Headless library (importable as API)
│   ├── services/         # Crawler, Extractor, Linker, ML Trainer, etc.
│   ├── adapters/         # SQLite, LLM (Claude/Ollama), Filesystem
│   └── extractors/       # 12+ file-type-specific extractors
│
└── labindex_app/         # PyQt6 desktop application
    ├── viewmodels/       # MVVM ViewModels
    └── views/            # UI components including graph visualization
Installation
git clone https://github.com/RyanSeanPhillips/LabIndex.git
cd LabIndex
pip install -e ".[dev,extraction]"
python run.py
Usage as API
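from labindex_core.adapters.sqlite_db import SqliteDB
# ReadOnlyFS is used below but was never imported; this module path is an
# assumption, so verify it against the repository before running.
from labindex_core.adapters.readonly_fs import ReadOnlyFS
from labindex_core.services.crawler import CrawlerService
from labindex_core.services.search import SearchService

# Open (or create) the local SQLite index
db = SqliteDB("my_index.db")
# The crawler reads files only through the read-only filesystem adapter
crawler = CrawlerService(ReadOnlyFS(), db)
search = SearchService(db)

# Register a root directory, crawl it, then query the index
root = crawler.add_root("/path/to/data", "My Project")
crawler.crawl_root(root.root_id)
results = search.search("mouse 266")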
Related Projects
- https://github.com/RyanSeanPhillips/PhysioMetrics - Multi-modal
physiological data analysis platform
License
MIT License