# SPINE Database (spine-db)

Standalone metadata indexing and browsing system for SPINE production runs.

**Key Feature**: Works with cloud databases (e.g., Supabase) for **multi-site deployments** - index files from S3DF, NERSC, or anywhere, all writing to one central database.

## Quick Start

### 1. Install

```bash
# From this repo root
pip install -e .

# Or from PyPI (once published)
# pip install spine-db
```

### 2. Set Up Database

**Option A: Cloud Database (Recommended for Multi-Site)**

For production at multiple sites (S3DF, NERSC, etc.):

```bash
# Sign up at supabase.com (free, no credit card)
# Create project → get connection string
DB_URL="postgresql://user:pass@db.xyz.supabase.co:5432/postgres"

# Works from anywhere - S3DF, NERSC, your laptop
```

**Option B: Local Testing with SQLite**

```bash
# Database created automatically on first use
DB_URL="sqlite:///spine_files.db"
```

**Option C: Local PostgreSQL at S3DF/NERSC**

```bash
# Install postgres
conda install postgresql
initdb ~/postgres_data
pg_ctl -D ~/postgres_data start

# Create database
createdb spine_db
DB_URL="postgresql:///spine_db"
```

Initialize the schema once:

```bash
spine-db setup --db $DB_URL
```
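
This creates the `spine_files` table and its indexes (see the Database Schema section below).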

### 3. Index Your Files

```bash
# Index from glob pattern
spine-db inject --db $DB_URL --source /path/to/output/*.h5

# Index from file list
spine-db inject --db $DB_URL --source-list file_list.txt

# Re-index existing files
spine-db inject --db $DB_URL --source output/*.h5 --no-skip-existing
```

### 4. Launch Web UI

```bash
spine-db app --db $DB_URL --port 8050
```

Open your browser to http://localhost:8050

## Features

**Current**:

- **Database Schema**: `spine_files` table tracking file_path, spine_version, model_name, dataset_name, created_at
- **Multi-Site Support**: Works with cloud databases - S3DF and NERSC can write to the same database
- **Metadata Extractor**: Reads HDF5 root attributes and falls back to inferring metadata from file paths
- **Indexer**: CLI tool with glob patterns, file lists, and skip/re-index options
- **Web UI**:
  - Filter by version, model, dataset
  - Sort by creation date (newest first)
  - Adjustable result limit
  - Full file path tooltips
  - Total and filtered counts

**Future Enhancements**:

- Server-side pagination for large result sets
- Semantic version parsing (major.minor.patch) for better filtering
- Advanced analytics and histograms
- Export filtered results to CSV/file lists
- REST API for programmatic access
- Detailed metadata panel on row click

## Architecture

```
src/spine_db/
├── __init__.py       # Package metadata
├── cli.py            # CLI entry point
├── schema.py         # SQLAlchemy models
├── extractor.py      # HDF5 metadata extraction
├── indexer.py        # Indexer logic
├── setup.py          # Database setup helper
├── app.py            # Dash web UI
└── README.md         # This file
```

## Usage Examples

### Testing Locally

```bash
# Index some files
spine-db inject \
  --db sqlite:///test_spine_files.db \
  --source jobs/*/output/*.h5

# Launch UI
spine-db app \
  --db sqlite:///test_spine_files.db \
  --debug
```

### Production Setup

**Multi-Site with Cloud Database** (S3DF + NERSC → Supabase):

```bash
# 1. Create free Supabase project at supabase.com
#    Get connection string from project settings

# 2. Index from S3DF
ssh s3df.slac.stanford.edu
spine-db inject \
  --db postgresql://user:pass@db.xyz.supabase.co:5432/postgres \
  --source-list /sdf/data/neutrino/spine_outputs.txt

# 3. Index from NERSC (same database!)
ssh perlmutter-p1.nersc.gov
spine-db inject \
  --db postgresql://user:pass@db.xyz.supabase.co:5432/postgres \
  --source /global/cfs/cdirs/dune/www/data/*.h5

# 4. Run UI anywhere (or deploy to Render/Railway for free)
spine-db app \
  --db postgresql://user:pass@db.xyz.supabase.co:5432/postgres \
  --host 0.0.0.0 \
  --port 8050
```

**Local PostgreSQL at S3DF**:

```bash
# 1. Set up PostgreSQL
conda install postgresql
initdb ~/postgres_data
pg_ctl -D ~/postgres_data start
createdb spine_db

# 2. Index production outputs
spine-db inject \
  --db postgresql:///spine_db \
  --source-list /path/to/production_files.txt

# 3. Run UI (access via SSH tunnel)
spine-db app \
  --db postgresql:///spine_db \
  --host 0.0.0.0 \
  --port 8050

# From your laptop:
# ssh -L 8050:localhost:8050 user@s3df.slac.stanford.edu
# Then open http://localhost:8050
```

### Extracting Metadata

```bash
# Test metadata extraction on a file
python -m spine_db.extractor /path/to/output.h5
```
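
For illustration, here is a minimal standalone sketch of the approach the extractor takes (read root attributes, fall back to inferring fields from the path). The attribute names and the directory convention below are assumptions for the example, not the package's actual keys - those live in `extractor.py`:

```python
# Rough sketch of HDF5 root-attribute extraction with a path fallback.
# The attribute names and the directory convention are hypothetical.
from pathlib import Path

import h5py


def sketch_extract(file_path: str) -> dict:
    meta = {"file_path": str(Path(file_path).resolve())}
    with h5py.File(file_path, "r") as f:
        for key in ("spine_version", "model_name", "dataset_name"):
            if key in f.attrs:
                value = f.attrs[key]
                # h5py returns bytes for some string attributes
                meta[key] = value.decode() if isinstance(value, bytes) else str(value)
    # Fall back to the directory layout for anything still missing,
    # e.g. a hypothetical .../<dataset_name>/output/<file>.h5 convention
    parts = Path(file_path).resolve().parts
    if "dataset_name" not in meta and len(parts) >= 3:
        meta["dataset_name"] = parts[-3]
    return meta


if __name__ == "__main__":
    import sys

    print(sketch_extract(sys.argv[1]))
```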

## Database Schema

```sql
CREATE TABLE spine_files (
    id SERIAL PRIMARY KEY,
    file_path VARCHAR UNIQUE NOT NULL,
    spine_version VARCHAR,
    spine_prod_version VARCHAR,
    model_name VARCHAR,
    dataset_name VARCHAR,
    run INTEGER,
    subrun INTEGER,
    event_min INTEGER,
    event_max INTEGER,
    num_events INTEGER,
    created_at TIMESTAMP NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_spine_files_file_path ON spine_files(file_path);
CREATE INDEX idx_spine_files_created_at ON spine_files(created_at);
CREATE INDEX idx_spine_files_model_name ON spine_files(model_name);
CREATE INDEX idx_spine_files_dataset_name ON spine_files(dataset_name);
CREATE INDEX idx_spine_files_run ON spine_files(run);
CREATE INDEX idx_spine_files_subrun ON spine_files(subrun);
```
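
The same table expressed as a SQLAlchemy 2.x declarative model - a sketch of what `schema.py` plausibly contains, not necessarily its exact contents:

```python
# Sketch of the spine_files table as a SQLAlchemy 2.x declarative model.
# Mirrors the SQL above; the real model lives in spine_db/schema.py and
# may differ in details.
from datetime import datetime
from typing import Optional

from sqlalchemy import DateTime, Integer, String, func
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class SpineFile(Base):
    __tablename__ = "spine_files"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    file_path: Mapped[str] = mapped_column(String, unique=True, index=True)
    spine_version: Mapped[Optional[str]] = mapped_column(String)
    spine_prod_version: Mapped[Optional[str]] = mapped_column(String)
    model_name: Mapped[Optional[str]] = mapped_column(String, index=True)
    dataset_name: Mapped[Optional[str]] = mapped_column(String, index=True)
    run: Mapped[Optional[int]] = mapped_column(Integer, index=True)
    subrun: Mapped[Optional[int]] = mapped_column(Integer, index=True)
    event_min: Mapped[Optional[int]] = mapped_column(Integer)
    event_max: Mapped[Optional[int]] = mapped_column(Integer)
    num_events: Mapped[Optional[int]] = mapped_column(Integer)
    created_at: Mapped[datetime] = mapped_column(
        DateTime, nullable=False, server_default=func.now(), index=True
    )
```

With a model like this, a setup step can be as simple as `Base.metadata.create_all(engine)`, and until the REST API lands, programmatic access is a plain SQLAlchemy session query against this table.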

## Database Size Estimates

Each row takes roughly 350 bytes (file path, version, model, dataset, timestamp, plus index overhead).

**Examples**:
- 1,000 files = ~350 KB
- 10,000 files = ~3.5 MB
- 100,000 files = ~35 MB
- 1,000,000 files = ~350 MB

The **Supabase free tier (500 MB)** therefore holds roughly 1.4 million files - several years of heavy production.
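
These figures are straight multiplication; a quick sanity check:

```python
# Sanity-check the size estimates above.
ROW_BYTES = 350  # approximate per-row cost, indexes included

for n_files in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n_files:>9,} files ~ {n_files * ROW_BYTES / 1e6:.2f} MB")

# Rows that fit in the 500 MB Supabase free tier
print(f"500 MB holds ~{500e6 / ROW_BYTES:,.0f} rows")
```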

## Integration with spine-prod

### Automatic Indexing After Job Completion

Add indexing to your `submit.py` workflow:

```python
import glob
import os
import subprocess

# After job completion (dry_run, job_ids, and job_dir come from the
# surrounding submit.py context)
if not dry_run and job_ids:
    # Index the output files produced by the finished jobs
    output_files = glob.glob(f"{job_dir}/output/*.h5")
    if output_files:
        subprocess.run([
            "spine-db", "inject",
            "--db", os.environ.get("SPINE_DB_URL", "sqlite:///spine_files.db"),
            "--source", *output_files,
        ])
```
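
Note the fallback: if `SPINE_DB_URL` is unset, the hook still works and simply writes to a local SQLite file.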

### Future: Pipeline Integration

In pipelines, add an indexing stage:

```yaml
stages:
  - name: reconstruction
    config: infer/icarus/latest.cfg
    files: data/*.root

  - name: index
    depends_on: [reconstruction]
    # Custom indexer stage
    script: |
      spine-db inject \
        --db $SPINE_DB_URL \
        --source {{ reconstruction.output }}
```

## Contributing

This is Phase 1 - a minimal working system. Future phases will:

1. Add robust pagination and versioning (Phase 2)
2. Implement security and deployment (Phase 3)
3. Extract to a separate repo with migrations (Phase 4)
4. Add advanced features (Phase 5)

Feedback and contributions welcome!

## License

Same as SPINE and spine-prod.