Releases: RyanSeanPhillips/LabIndex

v0.1.0 - Initial Release

02 Feb 14:21

LabIndex v0.1.0 - Initial Release

NLP-Assisted Lab Directory Indexing and File Relationship Discovery

LabIndex builds a local SQLite index of your lab/network drives, enabling
fast search, intelligent file discovery, and automatic relationship
detection between data files and notes—all without ever modifying your
source files.

Key Features

Tiered Metadata Extraction

  • Tier 0: Fast file inventory (path, size, timestamps, extension-based
    categorization)
  • Tier 1: Pattern matching for standardized formats (ABF, SMRX
    headers, spreadsheet columns)
  • Tier 2: NLP extraction for semi-structured text (notes, references,
    entity detection)
  • Tier 3: LLM reading for complex/ambiguous notes (budget-controlled
    with gating conditions)
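
As a rough illustration of what a Tier 0 pass might look like, the sketch below walks a directory tree and records path, size, modification time, and an extension-based category using only filesystem metadata. The names FileRecord, CATEGORY_BY_EXT, and inventory are hypothetical and are not taken from LabIndex's code; the category map is a small subset of the extensions listed under Supported File Types.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator

# Hypothetical extension map (illustrative subset, not LabIndex's actual table).
CATEGORY_BY_EXT = {
    ".abf": "physiology", ".smrx": "physiology", ".edf": "physiology",
    ".npz": "analysis", ".mat": "analysis",
    ".txt": "document", ".md": "document", ".pdf": "document",
    ".xlsx": "spreadsheet", ".csv": "spreadsheet",
    ".py": "code", ".ipynb": "code",
}


@dataclass(frozen=True)
class FileRecord:
    path: str
    size_bytes: int
    mtime: float
    category: str


def inventory(root: str) -> Iterator[FileRecord]:
    """Yield one record per file from stat() metadata; file contents are never opened."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            try:
                st = full.stat()
            except OSError:
                continue  # skip files that vanished or cannot be stat'ed
            category = CATEGORY_BY_EXT.get(full.suffix.lower(), "other")
            yield FileRecord(str(full), st.st_size, st.st_mtime, category)
```

Because this pass only calls stat(), it stays cheap on network drives; the higher tiers would then decide which of these records are worth opening.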

Automatic Link Detection

  • Animal ID matching from paths and filenames
  • Filename similarity with fuzzy matching and numeric suffix handling
  • Content reference detection (finds mentions of files in notes)
  • Folder proximity analysis (sibling/parent relationships)
  • 48+ ML features for link confidence scoring
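
To make the signals above concrete, here is a minimal sketch of three of them (animal ID matching, fuzzy filename similarity with trailing counters ignored, and folder proximity) computed with the Python standard library. The regular expressions, the function names, and the "mouse266" / "m266" ID convention are assumptions for illustration only; LabIndex's real feature set and ID patterns may differ.

```python
import re
from difflib import SequenceMatcher
from pathlib import PurePath

# Assumed ID convention: "mouse266", "mouse_266", or "m266" (not preceded by a letter).
ANIMAL_ID = re.compile(r"(?<![A-Za-z])(?:mouse|m)[_\-\s]?(\d{2,5})", re.IGNORECASE)
# A trailing counter such as "_001" at the end of a filename stem.
TRAILING_NUM = re.compile(r"[_\-\s]?\d+$")


def animal_ids(path: str) -> set[str]:
    """Collect the numeric part of every animal ID found anywhere in a path."""
    return set(ANIMAL_ID.findall(path))


def stem_similarity(a: str, b: str) -> float:
    """Fuzzy similarity of two filenames, ignoring extension and trailing counter."""
    def base(name: str) -> str:
        stem = PurePath(name).stem.lower()
        return TRAILING_NUM.sub("", stem)
    return SequenceMatcher(None, base(a), base(b)).ratio()


def link_features(data_path: str, note_path: str) -> dict[str, float]:
    """Build a small feature vector for one candidate (data file, note) pair."""
    shared = animal_ids(data_path) & animal_ids(note_path)
    same_folder = PurePath(data_path).parent == PurePath(note_path).parent
    return {
        "shared_animal_id": 1.0 if shared else 0.0,
        "name_similarity": stem_similarity(data_path, note_path),
        "same_folder": 1.0 if same_folder else 0.0,
    }
```

A classifier would consume a vector like this (extended with many more features) and emit a single confidence score per candidate link.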

Human-in-the-Loop Review

  • Confidence-based routing: auto-accept (>0.95), human review (0.4-0.95),
    auto-reject (<0.4)
  • Review queue prioritized by uncertainty
  • Corrections feed back into ML training data
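
The routing thresholds above can be expressed as a small function. This is a sketch under stated assumptions: the Route enum, the function names, and the choice to treat a confidence of 0.5 as maximally uncertain are illustrative, not LabIndex's actual implementation.

```python
from enum import Enum


class Route(Enum):
    AUTO_ACCEPT = "auto_accept"
    HUMAN_REVIEW = "human_review"
    AUTO_REJECT = "auto_reject"


ACCEPT_ABOVE = 0.95  # thresholds taken from the release notes
REJECT_BELOW = 0.40


def route(confidence: float) -> Route:
    """Map a link confidence score to one of the three routing outcomes."""
    if confidence > ACCEPT_ABOVE:
        return Route.AUTO_ACCEPT
    if confidence < REJECT_BELOW:
        return Route.AUTO_REJECT
    return Route.HUMAN_REVIEW


def review_priority(confidence: float) -> float:
    """Higher value means more uncertain; assumes 0.5 is the most uncertain score."""
    return 1.0 - abs(confidence - 0.5) * 2.0


def review_queue(scores: list[float]) -> list[float]:
    """Return only the scores needing human review, most uncertain first."""
    pending = [s for s in scores if route(s) is Route.HUMAN_REVIEW]
    return sorted(pending, key=review_priority, reverse=True)
```

Scores exactly at 0.95 or 0.4 fall into human review here, which matches the inclusive range given in the bullet above.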

LLM Integration

  • Claude (Anthropic API) adapter with native tool calling
  • Ollama adapter for local/free inference
  • Budget-controlled usage with gating conditions
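
One way budget control and gating could fit together is sketched below. Everything in it is hypothetical: the LLMBudget class, the should_use_llm function, the default cutoffs, and the 4-characters-per-token estimate are assumptions made for illustration, and the release notes do not specify which gating conditions LabIndex actually uses.

```python
from dataclasses import dataclass


@dataclass
class LLMBudget:
    """Tracks how many calls and tokens have been spent in one indexing run."""
    max_calls: int
    max_tokens: int
    calls_used: int = 0
    tokens_used: int = 0

    def can_spend(self, est_tokens: int) -> bool:
        return (self.calls_used < self.max_calls
                and self.tokens_used + est_tokens <= self.max_tokens)

    def record(self, tokens: int) -> None:
        self.calls_used += 1
        self.tokens_used += tokens


def should_use_llm(note_text: str, tier2_confidence: float, budget: LLMBudget,
                   min_chars: int = 200, confidence_cutoff: float = 0.6) -> bool:
    """Gate: escalate only long notes that cheaper tiers could not resolve."""
    est_tokens = len(note_text) // 4  # crude heuristic, not a real tokenizer
    return (len(note_text) >= min_chars
            and tier2_confidence < confidence_cutoff
            and budget.can_spend(est_tokens))
```

The same gate works for either adapter; with a local Ollama model the budget could simply be set much higher.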

Additional Features

  • Graph Visualization: Interactive QGraphicsView-based file
    relationship explorer
  • Full-Text Search: SQLite FTS5 for fast content search
  • Read-Only Safety: All file access goes through a read-only facade
    that never modifies source files
  • MVVM Architecture: Clean separation with PyQt6 desktop application
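
For readers unfamiliar with SQLite FTS5, the following minimal sketch shows the general technique using Python's built-in sqlite3 module. The table name file_text, its columns, and the sample rows are invented for illustration and do not reflect LabIndex's actual schema; it also assumes the local SQLite build has FTS5 compiled in, which is common but not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the path is stored but not tokenized; content is searchable.
conn.execute("CREATE VIRTUAL TABLE file_text USING fts5(path UNINDEXED, content)")
conn.executemany(
    "INSERT INTO file_text (path, content) VALUES (?, ?)",
    [
        ("notes/2024-01-10.txt", "Recorded mouse 266 in rig 2, file rec_001.abf"),
        ("notes/2024-01-11.txt", "Calibration only, no animals"),
    ],
)


def search(conn: sqlite3.Connection, query: str) -> list[str]:
    """Return matching paths, best match first (FTS5's built-in rank ordering)."""
    rows = conn.execute(
        "SELECT path FROM file_text WHERE file_text MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]


print(search(conn, "mouse 266"))
```

Space-separated terms are combined with an implicit AND. Terms containing punctuation such as dots generally need to be wrapped in double quotes inside the query string so that FTS5 parses them as a string rather than as query syntax.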

Supported File Types

Category          Extensions
Physiology Data   .abf, .smrx, .smr, .edf, .tdms
Analysis Files    .npz, .npy, .mat, .h5
Documents         .pdf, .docx, .doc, .txt, .md
Spreadsheets      .xlsx, .xls, .csv
Presentations     .pptx, .ppt
Code              .py, .m, .r, .ipynb

Architecture

labindex/
├── labindex_core/      # Headless library (importable as API)
│   ├── services/       # Crawler, Extractor, Linker, ML Trainer, etc.
│   ├── adapters/       # SQLite, LLM (Claude/Ollama), Filesystem
│   └── extractors/     # 12+ file-type-specific extractors
│
└── labindex_app/       # PyQt6 desktop application
    ├── viewmodels/     # MVVM ViewModels
    └── views/          # UI components including graph visualization

Installation

git clone https://github.com/RyanSeanPhillips/LabIndex.git
cd LabIndex
pip install -e ".[dev,extraction]"
python run.py

Usage as API

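from labindex_core.adapters.sqlite_db import SqliteDB
# Assumed module path for ReadOnlyFS; confirm it under labindex_core/adapters/
from labindex_core.adapters.readonly_fs import ReadOnlyFS
from labindex_core.services.crawler import CrawlerService
from labindex_core.services.search import SearchService

# Open (or create) the local SQLite index
db = SqliteDB("my_index.db")
# The crawler reads source files only through the read-only filesystem adapter
crawler = CrawlerService(ReadOnlyFS(), db)
search = SearchService(db)

# Register a root directory, crawl it, then query the index
root = crawler.add_root("/path/to/data", "My Project")
crawler.crawl_root(root.root_id)
results = search.search("mouse 266")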

Related Projects

- https://github.com/RyanSeanPhillips/PhysioMetrics - Multi-modal
physiological data analysis platform

License

MIT License