This plan builds on the existing foundation to complete the Adaptive Linking System with a minimal chat-first UI that allows iterative testing and refinement.
Approach: Build in small, testable increments. Each phase produces a working system that can be tested before moving on.
MVVM Compliance: All changes follow the existing pattern: ViewModels hold state and emit signals, Services contain the business logic, and Views display state and handle user input.
✅ Handler system (base + registry)
✅ GenericTextHandler, GenericDataHandler, SpreadsheetHandler
✅ ContextReader (20-line context windows)
✅ MLTrainer (RandomForest/XGBoost)
✅ AdaptiveLinkingService (orchestration)
✅ FeatureExtractor (30+ features including timestamps)
✅ LinkAuditor (LLM validation)
✅ StrategyExplorerVM
✅ Persistent SQLite database
✅ ReadOnlyFS (safety layer)
✅ Timestamp features (just added)
Goal: Get a working chat interface that can talk to Ollama
File: src/labindex_app/views/chat_widget.py
class ChatWidget(QWidget):
"""Simple chat interface with message history."""
message_sent = pyqtSignal(str) # Emitted when user sends message
Components:
- QScrollArea with message bubbles
- QLineEdit for input
- Send button
- Folder drop zone

File: src/labindex_app/viewmodels/chat_vm.py
class ChatViewModel(BaseViewModel):
"""Manages chat state and LLM interaction."""
Signals:
- message_received(str, str) # (role, content)
- thinking_started()
- thinking_finished()
Methods:
- send_message(text: str)
- clear_history()
Uses: OllamaLLM for responses

File: src/labindex_app/views/main_window.py
class LabIndexMainWindow(QMainWindow):
"""Main window with chat panel."""
Layout:
┌─────────────────────────────────────────┐
│ LabIndex [Settings] │
├─────────────────────────────────────────┤
│ │
│ Chat Widget │
│ │
├─────────────────────────────────────────┤
│ [Folder drop zone] [Send] │
└─────────────────────────────────────────┘

python -m labindex_app.main
# Should open window, type message, get Ollama response

Estimated effort: Small (foundation exists)
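Before wiring up PyQt, the ChatViewModel flow can be sketched Qt-free: plain callback lists stand in for pyqtSignal, and `llm` is any object with a `complete(prompt)` method (an assumed stand-in for OllamaLLM, whose real interface may differ):

```python
class ChatViewModelSketch:
    """Holds chat state; 'emits' via plain callbacks instead of pyqtSignal."""

    def __init__(self, llm):
        self.llm = llm            # stand-in for OllamaLLM
        self.history = []         # list of (role, content) tuples
        self.on_message = []      # callbacks: message_received(role, content)

    def _emit(self, role, content):
        self.history.append((role, content))
        for cb in self.on_message:
            cb(role, content)

    def send_message(self, text: str):
        self._emit("user", text)
        reply = self.llm.complete(text)  # real VM would call the LLM off the UI thread
        self._emit("assistant", reply)

    def clear_history(self):
        self.history.clear()


class EchoLLM:
    """Trivial fake LLM, so the flow is testable without Ollama running."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


vm = ChatViewModelSketch(EchoLLM())
received = []
vm.on_message.append(lambda role, content: received.append(role))
vm.send_message("hello")
```

The same shape maps one-to-one onto the real class: each callback list becomes a pyqtSignal, and `send_message` moves the LLM call into a worker thread.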
Goal: Drag a folder, see indexing progress, get summary
File: src/labindex_app/viewmodels/indexing_vm.py
class IndexingViewModel(BaseViewModel):
"""Manages folder indexing state."""
Signals:
- indexing_started(str) # folder path
- indexing_progress(int, int) # current, total
- indexing_finished(dict) # stats summary
Methods:
- start_indexing(folder_path: str)
- cancel_indexing()
Uses: CrawlerService, ExtractorService

When user drops folder or types path:
- ChatVM detects folder path
- Triggers IndexingVM
- Shows progress in chat
- When done, summarizes findings
- Inline progress bar in chat
- "Indexed 245 files in 32 folders"
- "Found: 156 data files, 34 notes files"
# Drag folder → see progress → get summary

Estimated effort: Small
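One way the ChatVM could detect that a chat message is actually a folder path (a hypothetical helper, not part of the existing codebase):

```python
import os

def extract_folder_path(message: str):
    """Return the message as an absolute folder path if it points at an
    existing directory, else None. Strips quotes and whitespace first."""
    candidate = message.strip().strip('"').strip("'")
    if candidate and os.path.isdir(candidate):
        return os.path.abspath(candidate)
    return None

# Demo against a directory guaranteed to exist: the current one.
detected = extract_folder_path(" . ")
not_a_path = extract_folder_path("What files do I have?")
```

If the helper returns a path, ChatVM hands it to IndexingVM; otherwise the message goes to the LLM as usual.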
Goal: User shows examples, LLM discovers patterns
File: src/labindex_core/services/signature_learner.py
class SignatureLearner:
"""Learns content signatures from example files."""
def learn_from_examples(
self,
example_file_ids: List[int],
file_type_name: str
) -> ContentSignature:
"""
LLM reads examples and discovers keywords.
1. Read content from all example files
2. Ask LLM: "What keywords are common?"
3. Return structured ContentSignature
"""
def propose_file_type(
self,
file_id: int
) -> Tuple[str, float]:
"""
Score file against known signatures.
Returns (best_type, confidence).
"""User flow:
User: "These are photometry notes"
[selects 3 files in browse panel]
Bot: Analyzing examples...
I found these patterns:
• Keywords: 415nm, 470nm, ROI, GCaMP, GRABNE
• Mouse ID format: R-XXXXXX
• Usually in same folder as FP_data*
[✓ Looks right] [✗ Wrong] [Adjust...]
- Save to database as ContentSignature records
- Associate with file type name
- Track who created (user confirmed vs LLM proposed)
# Show examples → LLM proposes patterns → user confirms

Estimated effort: Medium
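The learning step can be prototyped without the LLM: the heuristic below simply proposes the tokens shared by all example files as keywords (the real SignatureLearner would ask the LLM instead; `ContentSignatureSketch` is a minimal stand-in for the real record class):

```python
import re
from dataclasses import dataclass, field

@dataclass
class ContentSignatureSketch:
    file_type: str
    keywords: set = field(default_factory=set)

def propose_signature(example_texts, file_type):
    """Keywords = tokens that appear in every example (LLM stand-in)."""
    token_sets = [set(re.findall(r"[A-Za-z0-9\-]+", t.lower()))
                  for t in example_texts]
    common = set.intersection(*token_sets) if token_sets else set()
    # Drop very short tokens unlikely to be meaningful keywords.
    keywords = {t for t in common if len(t) > 2}
    return ContentSignatureSketch(file_type=file_type, keywords=keywords)

examples = [
    "415nm and 470nm channels, ROI drawn over GCaMP",
    "Recorded 415nm / 470nm, GCaMP signal in ROI 2",
]
sig = propose_signature(examples, "photometry_notes")
```

Swapping this intersection heuristic for an LLM prompt changes only the body of `propose_signature`; the returned signature shape stays the same.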
Goal: After learning patterns, re-extract metadata from matching files
File: src/labindex_core/services/reextractor.py
class ReextractionService:
"""Re-extracts metadata using learned patterns."""
def identify_candidates(
self,
signature: ContentSignature
) -> List[FileRecord]:
"""Find files that might match this signature."""
def reextract_batch(
self,
file_ids: List[int],
signature: ContentSignature
) -> ReextractionStats:
"""
Re-read files and extract enriched metadata.
Stores results in ContentRecord.entities
"""Add to ContentRecord.entities:
{
"wavelengths": [415, 470],
"mouse_ids": ["R-266018", "R-266019"],
"chambers": [1, 2],
"data_refs": ["FP_data_0", "FP_data_1"],
"experimenters": ["ANB", "JRS"],
"file_type_hint": "photometry_notes",
"signature_confidence": 0.87
}

Bot: I'll now scan 45 candidate notes files for these patterns.
Progress: [████████░░] 80%
Done! Enriched metadata for 38 files:
• 28 matched photometry_notes pattern
• 10 matched pleth_notes pattern
# After learning → re-extraction runs → metadata enriched

Estimated effort: Medium
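Candidate identification reduces to scoring each file against the signature's keywords. A keyword-hit-ratio sketch (a plain heuristic; the real `signature_confidence` would come from richer features):

```python
def score_against_signature(text, keywords):
    """Fraction of signature keywords found in the text (0.0 to 1.0)."""
    if not keywords:
        return 0.0
    lowered = text.lower()
    hits = sum(1 for kw in keywords if kw.lower() in lowered)
    return hits / len(keywords)

def identify_candidates(files, keywords, threshold=0.5):
    """files: list of (file_id, text). Return ids scoring >= threshold."""
    return [fid for fid, text in files
            if score_against_signature(text, keywords) >= threshold]

keywords = {"415nm", "470nm", "ROI", "GCaMP"}
files = [
    (1, "415nm/470nm photometry, GCaMP in ROI 1"),  # all 4 keywords hit
    (2, "plethysmography chamber pressures"),        # no keywords hit
]
matched = identify_candidates(files, keywords)
score = score_against_signature(files[0][1], keywords)
```

In the real service, `files` would come from a database query over indexed content rather than an in-memory list.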
Goal: Add folder tree and search alongside chat
File: src/labindex_app/views/browse_panel.py
class BrowsePanel(QWidget):
"""Folder tree and search results."""
Components:
- QLineEdit for search
- QTreeView for folder structure
- QListWidget for search results
- File type filter chips
Signals:
- file_selected(int) # file_id
- files_selected(List[int]) # for bulk operations

File: src/labindex_app/viewmodels/browse_vm.py
class BrowseViewModel(BaseViewModel):
"""Manages browse/search state."""
Properties:
- folder_tree: TreeModel
- search_results: List[FileRecord]
- selected_file: Optional[FileRecord]
Methods:
- search(query: str)
- filter_by_type(file_type: str)
- select_file(file_id: int)

┌─────────────────────────────────────────────────────────────────┐
│ LabIndex [🔍 Search] [Settings]│
├────────────────────┬────────────────────────────────────────────┤
│ │ │
│ 📁 Browse │ 💬 Chat │
│ ───────────── │ │
│ │ [messages...] │
│ ▼ 📂 Experiments │ │
│ ▼ 📂 GRABNE... │ │
│ 📊 data.abf │ │
│ 📝 notes.txt │ │
│ │ │
│ ───────────── │ │
│ 🏷️ Types │ │
│ 📊 Data (245) │ │
│ 📝 Notes (34) │ │
│ │ │
├────────────────────┴────────────────────────────────────────────┤
│ Type a message or drag a folder... [Send] │
└─────────────────────────────────────────────────────────────────┘
- Click file in tree → show details in chat
- Right-click → "Show this as example"
- Drag files to chat → "These are photometry notes"
# Browse tree works, search works, connects to chat

Estimated effort: Medium
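The BrowseViewModel's search and filter logic, sketched without Qt models (`FileRecordSketch` is a hypothetical minimal stand-in for the real record class):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FileRecordSketch:
    file_id: int
    name: str
    file_type: str

class BrowseViewModelSketch:
    def __init__(self, records: List[FileRecordSketch]):
        self._all = records
        self.search_results: List[FileRecordSketch] = []
        self.selected_file: Optional[FileRecordSketch] = None

    def search(self, query: str):
        """Case-insensitive substring match over file names."""
        q = query.lower()
        self.search_results = [r for r in self._all if q in r.name.lower()]

    def filter_by_type(self, file_type: str):
        self.search_results = [r for r in self._all if r.file_type == file_type]

    def select_file(self, file_id: int):
        self.selected_file = next(
            (r for r in self._all if r.file_id == file_id), None)

vm = BrowseViewModelSketch([
    FileRecordSketch(1, "FP_data_0.abf", "data"),
    FileRecordSketch(2, "notes.txt", "notes"),
])
vm.search("fp_data")
vm.select_file(2)
```

In the real VM, each of these methods would end by emitting a signal so the QTreeView/QListWidget can refresh.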
Goal: Label multiple files at once via tree selection
File: src/labindex_app/dialogs/bulk_label_dialog.py
class BulkLabelDialog(QDialog):
"""Assign labels to multiple files."""
Features:
- Shows selected files in tree
- Dropdown for label type
- Quick actions: "All .txt → notes"
- Confirm button

Each label action:
- Updates file category in database
- Creates training example for ML
- Increments "pending labels" counter
# Select files → bulk label → training data created

Estimated effort: Small
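A sketch of the three effects of one bulk-label action, using plain dicts in place of the real database records (field names here are assumptions for illustration):

```python
def apply_bulk_labels(files, label, training_examples, counters):
    """files: dict of file_id -> record dict. Mutates records in place,
    appends one training example per file, bumps the pending counter."""
    for fid in files:
        files[fid]["category"] = label
        training_examples.append({"file_id": fid, "label": label})
    counters["pending_labels"] = counters.get("pending_labels", 0) + len(files)

files = {1: {"category": None}, 2: {"category": None}}
training_examples = []
counters = {}
apply_bulk_labels(files, "notes", training_examples, counters)
```

The real dialog would do the same three updates inside one database transaction so a failed label can't leave the counter out of sync.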
Goal: Train button, auto-training, view performance
File: src/labindex_app/views/training_panel.py
class TrainingPanel(QWidget):
"""ML model training controls."""
Shows:
- Current model accuracy
- Training examples count
- "Train Now" button
- Auto-train threshold setting

Background training:
- Runs in QThread
- Progress bar in UI
- Notification when complete
- Auto-triggers at 50+ new labels
# Label files → train model → see accuracy improvement

Estimated effort: Small
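The auto-train trigger itself is a simple threshold check (the 50-label threshold comes from the plan; `train` is any callable, which in the real app would hand off to MLTrainer inside a QThread so the UI stays responsive):

```python
def maybe_auto_train(pending_labels: int, threshold: int, train) -> bool:
    """Call train() and report True once pending labels reach the threshold."""
    if pending_labels >= threshold:
        train()
        return True
    return False

trained = []
fired_early = maybe_auto_train(12, 50, lambda: trained.append("run"))
fired = maybe_auto_train(50, 50, lambda: trained.append("run"))
```

After a successful run, the caller would reset the pending-labels counter so training doesn't re-fire on every new label.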
Goal: Index multiple folders, cross-link between them
File: src/labindex_app/views/roots_panel.py
class RootsPanel(QWidget):
"""Manage indexed folders."""
Shows:
- List of indexed roots
- File counts per root
- "Add Folder" button
- "Refresh" per rootUpdate LinkerService to:
- Find links across roots (e.g., surgery notes → experiments)
- Use animal ID as linking key
- Track cross-root relationships
# Add surgery folder → links to experiment data → animal-centric view

Estimated effort: Medium
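Cross-root linking on the animal ID can be sketched as grouping files from every root by the mouse IDs they mention (the `R-` + six digits format is taken from the learned signature example; paths and texts here are illustrative):

```python
import re
from collections import defaultdict

MOUSE_ID = re.compile(r"R-\d{6}")  # mouse ID format: R-XXXXXX

def link_across_roots(files_by_root):
    """files_by_root: root name -> list of (path, text) pairs.
    Returns mouse_id -> list of (root, path) for every file mentioning it."""
    by_animal = defaultdict(list)
    for root, files in files_by_root.items():
        for path, text in files:
            for mid in set(MOUSE_ID.findall(text)):
                by_animal[mid].append((root, path))
    return dict(by_animal)

links = link_across_roots({
    "surgery":     [("surgery/log.txt", "Implanted R-266018 on 3/14")],
    "experiments": [("exp/notes.txt", "R-266018 chamber 1; R-266019 chamber 2")],
})
```

Any animal whose ID appears under more than one root yields a cross-root relationship, which is exactly the surgery-notes-to-experiments link the plan describes.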
Goal: View, edit, merge, disable extraction rules
File: src/labindex_app/views/rules_panel.py
class RulesPanel(QWidget):
"""Manage extraction rules and signatures."""
Shows:
- Active rules with match counts
- Edit dialog for each rule
- Merge suggestions
- Disable/enable toggle
- Health warnings (unused rules, conflicts)

Periodic check for:
- Rules with 0 matches → suggest delete
- Overlapping rules → suggest merge
- Slow rules → suggest optimization
# View rules → see health warnings → take suggested actions

Estimated effort: Small
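A sketch of the periodic health check over rules, with the two cheapest checks from the list above: zero-match rules and duplicate patterns (the `name`/`match_count`/`pattern` field names are assumptions, not the real schema):

```python
def rule_health_warnings(rules):
    """rules: list of dicts with 'name', 'match_count', 'pattern'.
    Returns warnings: unused rules and pairs sharing an identical pattern."""
    warnings = []
    for rule in rules:
        if rule["match_count"] == 0:
            warnings.append(("suggest_delete", rule["name"]))
    seen = {}  # pattern -> first rule name that used it
    for rule in rules:
        if rule["pattern"] in seen:
            warnings.append(("suggest_merge", seen[rule["pattern"]], rule["name"]))
        else:
            seen[rule["pattern"]] = rule["name"]
    return warnings

warnings = rule_health_warnings([
    {"name": "wavelengths",     "match_count": 28, "pattern": r"\d{3}nm"},
    {"name": "old_wavelengths", "match_count": 0,  "pattern": r"\d{3}nm"},
])
```

Overlap detection on non-identical patterns (and the slow-rule check) would need match-set comparison and timing data, but the warning shape stays the same.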
Goal: Visual exploration of file relationships
File: src/labindex_app/views/graph_widget.py
class GraphWidget(QWidget):
"""Interactive relationship graph."""
Uses: pyqtgraph or matplotlib for rendering
Features:
- Nodes = files (colored by type)
- Edges = links (thickness = confidence)
- Click node → show details
- Filter by type/folder
- Layout options (tree, radial, force-directed)

File: src/labindex_app/viewmodels/graph_vm.py
class GraphViewModel(BaseViewModel):
"""Manages graph state."""
Properties:
- nodes: List[GraphNode]
- edges: List[GraphEdge]
- selected_node: Optional[int]
- layout: str
Methods:
- load_subgraph(center_file_id: int, depth: int)
- filter_by_type(types: List[str])
- change_layout(layout: str)

# View graph → click nodes → see relationships

Estimated effort: Medium
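`load_subgraph(center_file_id, depth)` is a breadth-first walk over the link table. A sketch treating links as an undirected edge list (the real VM would also carry confidence and type per edge):

```python
from collections import deque

def load_subgraph(center, depth, edges):
    """edges: list of (src, dst) file-id pairs, treated as undirected.
    Returns the set of node ids within `depth` hops of `center`."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {center}
    frontier = deque([(center, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue  # don't expand past the requested depth
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, d + 1))
    return seen

edges = [(1, 2), (2, 3), (3, 4)]
nodes = load_subgraph(1, 2, edges)
```

Capping the walk at a small depth keeps the rendered graph readable even when the full link table is large.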
Phase 1: Minimal Chat UI ← START HERE (foundation)
↓
Phase 2: Folder Indexing ← Core functionality
↓
Phase 3: LLM Example Learning ← Makes it "smart"
↓
Phase 4: Re-extraction Pipeline ← Applies learning
↓
Phase 5: Browse Panel ← Better UX
↓
Phase 6: Bulk Labeling ← Training data
↓
Phase 7: ML Training ← Improves over time
↓
Phase 8: Multi-Root Support ← Full capability
↓
Phase 9: Rules Manager ← Maintainability
↓
Phase 10: Graph Visualization ← Nice to have
✅ Timestamp features - Added to FeatureVector:
- time_created_delta_hours, time_modified_delta_hours
- created_within_1h, created_within_24h, created_within_7d
- modified_within_1h, modified_within_24h
- src_size_bytes, dst_size_bytes
These are automatically extracted and contribute to link scoring.
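A sketch of how the modified-time half of these features can be computed for a (src, dst) candidate pair; the created-time features are analogous (feature names follow the list above, but this is illustrative, not the FeatureExtractor's actual code):

```python
def timestamp_features(src_mtime, dst_mtime):
    """Compute modified-time features for a (src, dst) candidate pair.
    Inputs are epoch seconds; the delta is absolute, in hours."""
    delta_hours = abs(src_mtime - dst_mtime) / 3600.0
    return {
        "time_modified_delta_hours": delta_hours,
        "modified_within_1h": delta_hours <= 1,
        "modified_within_24h": delta_hours <= 24,
    }

feats = timestamp_features(1_700_000_000, 1_700_000_000 + 7200)  # 2 hours apart
```

The boolean buckets give the tree models coarse, noise-tolerant splits alongside the raw delta.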
src/labindex_app/
├── main.py # Application entry point
├── viewmodels/
│ ├── __init__.py
│ ├── base.py # Existing
│ ├── chat_vm.py # Phase 1
│ ├── indexing_vm.py # Phase 2
│ ├── browse_vm.py # Phase 5
│ ├── training_vm.py # Phase 7
│ ├── graph_vm.py # Phase 10
│ ├── strategy_explorer_vm.py # Existing
│ └── candidate_review_vm.py # Existing
├── views/
│ ├── __init__.py
│ ├── main_window.py # Phase 1
│ ├── chat_widget.py # Phase 1
│ ├── browse_panel.py # Phase 5
│ ├── training_panel.py # Phase 7
│ ├── roots_panel.py # Phase 8
│ ├── rules_panel.py # Phase 9
│ └── graph_widget.py # Phase 10
└── dialogs/
├── __init__.py
└── bulk_label_dialog.py # Phase 6
src/labindex_core/
├── services/
│ ├── signature_learner.py # Phase 3 (NEW)
│ ├── reextractor.py # Phase 4 (NEW)
│ └── ... (existing services)
└── ... (existing structure)
Each phase has a checkpoint. After completing a phase:
- Manual test - Use the UI, verify behavior
- Integration test - Run test script against examples folder
- Regression check - Ensure previous features still work
All changes follow the pattern:
- ViewModels: Hold state, emit signals, call services
- Views: Display state, handle user input, connect to VM signals
- Services: Business logic, database access, LLM calls
- No direct View → Service calls
- No business logic in Views
The existing BaseViewModel class is used for all new ViewModels.
To begin implementation:
cd LabIndex
# Create the app package structure
mkdir -p src/labindex_app/views
mkdir -p src/labindex_app/dialogs
# Start with Phase 1
# Create chat_widget.py, chat_vm.py, main_window.py

Then follow each phase in order, testing at each checkpoint.